Monday, 16 February 2026

How to Move Files from One S3 Bucket to Another Using Databricks

Introduction

Moving files between S3 buckets is a common requirement in enterprise pipelines. For example, raw files may land in one bucket, then after validation they must be moved to a processed or archive bucket. Databricks can help automate this flow.

Step 1: Define Source and Destination Buckets

source_bucket = "s3a://source-bucket-name/input/"
target_bucket = "s3a://target-bucket-name/archive/"

Step 2: List the Source Files

source_files = dbutils.fs.ls(source_bucket)
display(source_files)

Step 3: Copy Files to the Target Bucket

# Note: dbutils.fs.ls returns only the top level of the prefix. This loop
# copies files at that level; for nested folders, pass recurse=True to cp.
for file in source_files:
    dbutils.fs.cp(file.path, target_bucket + file.name)

Step 4: Validate the Target Bucket

Always confirm the copied files are available in the destination bucket.

display(dbutils.fs.ls(target_bucket))
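Eyeballing the listing works for a handful of files, but a programmatic check is safer. A minimal sketch: compare the two listings by file name and flag anything missing. The helper below only assumes the objects returned by `dbutils.fs.ls` expose a `name` attribute (which they do as `FileInfo` objects); the commented notebook usage is illustrative.

```python
def missing_files(source_files, target_files):
    """Return the set of source file names absent from the target listing."""
    source_names = {f.name for f in source_files}
    target_names = {f.name for f in target_files}
    return source_names - target_names

# In a Databricks notebook:
# missing = missing_files(dbutils.fs.ls(source_bucket),
#                         dbutils.fs.ls(target_bucket))
# assert not missing, f"Copy incomplete, missing: {missing}"
```

Name-based comparison catches missing files but not truncated ones; for stronger guarantees you can also compare the `size` attribute of each pair.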

Step 5: Delete Files from the Source Bucket

Once validation is complete, remove the original files so the move operation is complete.

for file in source_files:
    dbutils.fs.rm(file.path)

Step 6: Add Logging and Error Handling

In production, add try-except blocks, audit logs, and row/file counts to avoid accidental data loss.
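One way to sketch this, with the copy and delete operations injected as callables so the logic can be tested outside Databricks (the function name `move_files` and its return shape are this example's own conventions, not a Databricks API):

```python
import logging

logger = logging.getLogger("s3_mover")

def move_files(files, target_prefix, copy_fn, remove_fn):
    """Copy each file to target_prefix, then delete the original.

    copy_fn / remove_fn are injected for testability; in a notebook,
    pass dbutils.fs.cp and dbutils.fs.rm. A file is deleted only after
    its copy succeeds, so a failure never loses data.
    Returns (moved, failed) lists of file names for audit logging.
    """
    moved, failed = [], []
    for f in files:
        try:
            copy_fn(f.path, target_prefix + f.name)
            remove_fn(f.path)
            moved.append(f.name)
            logger.info("Moved %s", f.name)
        except Exception:
            failed.append(f.name)
            logger.exception("Failed to move %s", f.name)
    return moved, failed

# In a Databricks notebook:
# moved, failed = move_files(dbutils.fs.ls(source_bucket), target_bucket,
#                            dbutils.fs.cp, dbutils.fs.rm)
```

Because the delete runs only after a successful copy, the worst failure mode is a duplicate (file present in both buckets), which is recoverable, rather than a lost file, which is not.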

Conclusion

S3 has no native move or rename operation, so moving files from one bucket to another in Databricks is handled as a copy-then-delete sequence. With validation between the two phases, this pattern is reliable and works well for archive, backup, and multi-stage ingestion pipelines.
