Friday, 19 December 2025

Databricks Important Commands Cheat Sheet (SQL + Python)

This post is a quick Databricks commands cheat sheet for certification exam preparation. It covers the most important SQL and Python (PySpark) commands for Delta Lake, the Lakehouse, Unity Catalog, Auto Loader, Structured Streaming, and optimization.

1. Basic Spark & DataFrame Commands (Python)

Start Spark Session (usually auto in Databricks)

# Spark session is usually available as `spark` in Databricks
spark.range(5).show()

Read CSV File

df = spark.read.option("header", "true").csv("/mnt/data/sales.csv")
df.show()

Write DataFrame as Parquet

df.write.mode("overwrite").parquet("/mnt/data/sales_parquet")

Display DataFrame in Notebook

display(df)

2. Delta Lake – Table Creation & Writes

Create Delta Table from DataFrame (Path)

df.write.format("delta").mode("overwrite").save("/mnt/delta/sales")

Create Delta Table as Managed Table

df.write.format("delta").mode("overwrite").saveAsTable("sales_delta")

SQL – Create Delta Table

CREATE TABLE sales_delta_sql (
  id BIGINT,
  amount DOUBLE,
  country STRING
)
USING DELTA;

SQL – Insert into Delta Table

INSERT INTO sales_delta_sql VALUES (1, 100.0, 'SG'), (2, 250.5, 'IN');

3. Delta Lake – Time Travel & History

View Table History

DESCRIBE HISTORY sales_delta_sql;

Time Travel by Version

SELECT * FROM sales_delta_sql VERSION AS OF 2;

Time Travel by Timestamp

SELECT * FROM sales_delta_sql TIMESTAMP AS OF '2025-12-01T10:00:00Z';  -- must be a timestamp within the table's retained history
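
Python – Time Travel via Reader Options

The same time travel is available from Python through the Delta reader options (a sketch using the table from the SQL examples above):

df_v2 = (spark.read
  .format("delta")
  .option("versionAsOf", 2)          # or .option("timestampAsOf", "2025-12-01")
  .table("sales_delta_sql"))
df_v2.show()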

4. Delta Lake – Update, Merge & Delete

SQL – UPDATE

UPDATE sales_delta_sql
SET amount = amount * 1.1
WHERE country = 'SG';

SQL – DELETE

DELETE FROM sales_delta_sql
WHERE amount < 50;

SQL – MERGE (Upsert)

MERGE INTO target t
USING source s
ON t.id = s.id
WHEN MATCHED THEN
  UPDATE SET t.amount = s.amount
WHEN NOT MATCHED THEN
  INSERT (id, amount, country) VALUES (s.id, s.amount, s.country);
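
Python – MERGE via DeltaTable API

The same upsert can be written in Python with the DeltaTable API. This is a sketch: `source_df` is a placeholder for whatever DataFrame holds the incoming rows, and the table/column names follow the SQL example above.

from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "target")

(target.alias("t")
  .merge(source_df.alias("s"), "t.id = s.id")     # source_df: your incoming data
  .whenMatchedUpdate(set={"amount": "s.amount"})
  .whenNotMatchedInsert(values={
      "id": "s.id", "amount": "s.amount", "country": "s.country"})
  .execute())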

5. Optimization – OPTIMIZE, Z-ORDER, VACUUM

OPTIMIZE Delta Table

OPTIMIZE sales_delta_sql;

OPTIMIZE with Z-ORDER

OPTIMIZE sales_delta_sql
ZORDER BY (country);

VACUUM to Remove Old Files

VACUUM sales_delta_sql RETAIN 168 HOURS;  -- 7 days
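
Python – OPTIMIZE and VACUUM via DeltaTable API

Both maintenance operations also have Python equivalents on DeltaTable (a sketch, assuming the table from the SQL examples above):

from delta.tables import DeltaTable

dt = DeltaTable.forName(spark, "sales_delta_sql")
dt.optimize().executeZOrderBy("country")  # compact small files + Z-order by country
dt.vacuum(168)                            # retain 168 hours (7 days) of old files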

6. Auto Loader – Incremental Ingestion

Python – Auto Loader from Cloud Storage

from pyspark.sql.functions import col

df_auto = (spark.readStream
  .format("cloudFiles")
  .option("cloudFiles.format", "csv")
  .option("cloudFiles.schemaLocation", "/mnt/checkpoints/sales_autoloader_schema")  # required for schema inference
  .option("header", "true")
  .load("/mnt/raw/sales/"))

(df_auto
  .writeStream
  .format("delta")
  .option("checkpointLocation", "/mnt/checkpoints/sales_autoloader")
  .outputMode("append")
  .start("/mnt/delta/sales_autoloader"))

7. Structured Streaming with Delta

Read Stream from Delta

stream_df = (spark.readStream
  .format("delta")
  .load("/mnt/delta/sales_stream"))

Write Stream to Delta

(stream_df
  .writeStream
  .format("delta")
  .option("checkpointLocation", "/mnt/checkpoints/sales_stream_out")
  .outputMode("append")
  .start("/mnt/delta/sales_stream_out"))
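
Trigger Options for Batch-Style Runs

For scheduled, batch-style incremental runs, the same stream can use the availableNow trigger, which processes all available data and then stops (a sketch reusing the paths above):

(stream_df
  .writeStream
  .format("delta")
  .option("checkpointLocation", "/mnt/checkpoints/sales_stream_out")
  .trigger(availableNow=True)   # process everything available, then stop
  .start("/mnt/delta/sales_stream_out"))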

SQL – Streaming Table (Delta Live Tables)

CREATE OR REFRESH STREAMING LIVE TABLE sales_stream_silver
AS SELECT * FROM cloud_files("/mnt/raw/sales", "csv");

Note: this syntax runs only inside a Delta Live Tables pipeline, not in a regular SQL cell.

8. Delta Live Tables (DLT) – Basic Commands

Python DLT Example

import dlt

@dlt.table
def sales_bronze():
    return spark.readStream.format("cloudFiles") \
        .option("cloudFiles.format", "csv") \
        .load("/mnt/raw/sales")

@dlt.table
def sales_silver():
    return dlt.read("sales_bronze").select("id", "amount", "country")

9. Unity Catalog – Databases, Tables & Grants

List Catalogs

SHOW CATALOGS;

Set Current Catalog & Schema

USE CATALOG main;
USE SCHEMA sales_db;  -- after USE CATALOG main

Create Schema

CREATE SCHEMA IF NOT EXISTS main.sales_db;

Grant Permissions on Table

GRANT SELECT ON TABLE main.sales_db.sales_delta_sql TO `analyst_role`;

Revoke Permission

REVOKE SELECT ON TABLE main.sales_db.sales_delta_sql FROM `analyst_role`;
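
Show Grants on Table

To verify which privileges are in place, Unity Catalog supports SHOW GRANTS:

SHOW GRANTS ON TABLE main.sales_db.sales_delta_sql;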

10. Useful Utility Commands for Exams

Describe Table

DESCRIBE EXTENDED sales_delta_sql;
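
Describe Delta Table Detail

For Delta tables specifically, DESCRIBE DETAIL is also worth knowing; it returns the table's location, format, file count, and size:

DESCRIBE DETAIL sales_delta_sql;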

Show Tables

SHOW TABLES IN main.sales_db;

Convert Parquet to Delta

CONVERT TO DELTA parquet.`/mnt/data/sales_parquet`;

Python – Convert to Delta Using spark.sql

spark.sql("""
  CONVERT TO DELTA parquet.`/mnt/data/sales_parquet`
""")

Conclusion

This Databricks commands cheat sheet covers the most frequently used SQL and Python snippets for Delta Lake, Lakehouse, Auto Loader, DLT, Unity Catalog and optimization. These commands are highly relevant for Databricks certification exams and real-world projects. Use this page as a quick reference while practicing in Databricks notebooks.
