Databricks Important Commands Cheat Sheet (SQL + Python)
This post is a quick Databricks commands cheat sheet for certification exam preparation. It covers the most important SQL and Python (PySpark) commands for Delta Lake, the Lakehouse architecture, Unity Catalog, Auto Loader, Structured Streaming, and performance optimization.
1. Basic Spark & DataFrame Commands (Python)
Start Spark Session (usually auto in Databricks)
# Spark session is usually available as `spark` in Databricks
spark.range(5).show()
Read CSV File
df = spark.read.option("header", "true").csv("/mnt/data/sales.csv")
df.show()
Write DataFrame as Parquet
df.write.mode("overwrite").parquet("/mnt/data/sales_parquet")
Display DataFrame in Notebook
display(df)
2. Delta Lake – Table Creation & Writes
Create Delta Table from DataFrame (Path)
df.write.format("delta").mode("overwrite").save("/mnt/delta/sales")
Create Delta Table as Managed Table
df.write.format("delta").mode("overwrite").saveAsTable("sales_delta")
SQL – Create Delta Table
CREATE TABLE sales_delta_sql (
  id BIGINT,
  amount DOUBLE,
  country STRING
)
USING DELTA;
SQL – Insert into Delta Table
INSERT INTO sales_delta_sql VALUES (1, 100.0, 'SG'), (2, 250.5, 'IN');
3. Delta Lake – Time Travel & History
View Table History
DESCRIBE HISTORY sales_delta_sql;
Time Travel by Version
SELECT * FROM sales_delta_sql VERSION AS OF 2;
Time Travel by Timestamp
SELECT * FROM sales_delta_sql TIMESTAMP AS OF '2026-02-28 10:00:00'; -- timestamp must not be later than the table's latest commit
4. Delta Lake – Update, Merge & Delete
SQL – UPDATE
UPDATE sales_delta_sql
SET amount = amount * 1.1
WHERE country = 'SG';
SQL – DELETE
DELETE FROM sales_delta_sql
WHERE amount < 50;
SQL – MERGE (Upsert)
MERGE INTO target t
USING source s
ON t.id = s.id
WHEN MATCHED THEN
  UPDATE SET t.amount = s.amount
WHEN NOT MATCHED THEN
  INSERT (id, amount, country) VALUES (s.id, s.amount, s.country);
5. Optimization – OPTIMIZE, Z-ORDER, VACUUM
OPTIMIZE Delta Table
OPTIMIZE sales_delta_sql;
OPTIMIZE with Z-ORDER
OPTIMIZE sales_delta_sql
ZORDER BY (country);
VACUUM to Remove Old Files
VACUUM sales_delta_sql RETAIN 168 HOURS; -- 7 days
6. Auto Loader – Incremental Ingestion
Python – Auto Loader from Cloud Storage
df_auto = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", "/mnt/checkpoints/sales_autoloader/schema")  # required for schema inference
    .option("header", "true")
    .load("/mnt/raw/sales/"))
(df_auto
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/sales_autoloader")
    .outputMode("append")
    .start("/mnt/delta/sales_autoloader"))
7. Structured Streaming with Delta
Read Stream from Delta
stream_df = (spark.readStream
    .format("delta")
    .load("/mnt/delta/sales_stream"))
Write Stream to Delta
(stream_df
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/sales_stream_out")
    .outputMode("append")
    .start("/mnt/delta/sales_stream_out"))
SQL – DLT Streaming Table (simplified; runs only inside a Delta Live Tables pipeline, not in a regular notebook)
CREATE OR REFRESH STREAMING LIVE TABLE sales_stream_silver
AS SELECT * FROM cloud_files("/mnt/raw/sales", "csv");
8. Delta Live Tables (DLT) – Basic Commands
Python DLT Example
import dlt

@dlt.table
def sales_bronze():
    return (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .load("/mnt/raw/sales"))

@dlt.table
def sales_silver():
    return dlt.read("sales_bronze").select("id", "amount", "country")
9. Unity Catalog – Databases, Tables & Grants
List Catalogs
SHOW CATALOGS;
Set Current Catalog & Schema
USE CATALOG main;
USE SCHEMA sales_db; -- schema name only; the catalog was set above
Create Schema
CREATE SCHEMA IF NOT EXISTS main.sales_db;
Grant Permissions on Table
GRANT SELECT ON TABLE main.sales_db.sales_delta_sql TO `analyst_role`;
Revoke Permission
REVOKE SELECT ON TABLE main.sales_db.sales_delta_sql FROM `analyst_role`;
10. Useful Utility Commands for Exams
Describe Table
DESCRIBE EXTENDED sales_delta_sql;
Show Tables
SHOW TABLES IN main.sales_db;
Convert Parquet to Delta
CONVERT TO DELTA parquet.`/mnt/data/sales_parquet`;
Python – Convert to Delta via spark.sql
spark.sql("""
    CONVERT TO DELTA parquet.`/mnt/data/sales_parquet`
""")
Conclusion
This Databricks commands cheat sheet covers the most frequently used SQL and Python snippets for Delta Lake, Lakehouse, Auto Loader, DLT, Unity Catalog and optimization. These commands are highly relevant for Databricks certification exams and real-world projects. Use this page as a quick reference while practicing in Databricks notebooks.