Monday, 2 February 2026

How to Delete Files from an S3 Bucket Using Databricks

Introduction

In many data pipelines, old files must be removed from S3 after processing. Databricks provides filesystem utilities (dbutils.fs) for managing files stored in cloud buckets. This guide walks through the steps for deleting files from S3, from inspecting the path to confirming the removal.

Step 1: List Files Before Deletion

Always inspect the target path before deleting any file.

display(dbutils.fs.ls("s3a://your-bucket-name/archive-test/"))

Step 2: Identify the Exact File or Folder

Make sure you are pointing to the correct file path, especially in production environments.
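One way to double-check the target is to filter the listing from Step 1 for an exact file-name match before issuing any delete. A minimal sketch; the bucket and file names are placeholders, and the matching helper itself is plain Python:

```python
def matching_paths(listing, target_name):
    """Return paths from a dbutils.fs.ls-style listing whose name matches exactly."""
    return [f.path for f in listing if f.name == target_name]

# On Databricks (dbutils is available in notebooks):
# files = dbutils.fs.ls("s3a://your-bucket-name/archive-test/")
# print(matching_paths(files, "file1.csv"))
```

An empty result means the file is not where you expect it, which is a cheap signal to stop before deleting anything.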

Step 3: Delete a Single File

# Second argument False means non-recursive: only this single file is removed
dbutils.fs.rm("s3a://your-bucket-name/archive-test/file1.csv", False)

Step 4: Delete an Entire Folder

Use recursive deletion for folders.

# Second argument True enables recursion: the folder and everything under it is removed
dbutils.fs.rm("s3a://your-bucket-name/archive-test/old_files/", True)
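Before running the real recursive delete, it can help to do a dry run that prints exactly what would be removed. A sketch, assuming the dbutils convention that directory names in a listing end with "/"; the path is a placeholder:

```python
def collect_files(ls_fn, path):
    """Recursively collect file paths under `path` using a dbutils.fs.ls-style function."""
    out = []
    for f in ls_fn(path):
        if f.name.endswith("/"):   # directories end with "/" in dbutils listings
            out.extend(collect_files(ls_fn, f.path))
        else:
            out.append(f.path)
    return out

# Dry run on Databricks before the real recursive delete:
# print(collect_files(dbutils.fs.ls, "s3a://your-bucket-name/archive-test/old_files/"))
```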

Step 5: Recheck the Path

List files again to confirm the deletion worked as expected.

display(dbutils.fs.ls("s3a://your-bucket-name/archive-test/"))

Important Precautions

  • Never run recursive delete on the wrong root folder
  • Test in non-production first
  • Keep backups or archive copies before permanent removal
  • Control delete permissions using IAM policies
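The precautions above can be partly enforced in code with an allow-list guard that refuses to delete anything outside an approved prefix, or the prefix root itself. A minimal sketch; the allow-list values are placeholders you would replace with your own paths:

```python
ALLOWED_PREFIXES = ("s3a://your-bucket-name/archive-test/",)  # placeholder allow-list

def is_safe_to_delete(path, allowed=ALLOWED_PREFIXES):
    """Allow deletion only inside an approved prefix, never of the prefix root itself."""
    return any(path.startswith(p) and path != p for p in allowed)

# Guarded delete on Databricks:
# target = "s3a://your-bucket-name/archive-test/old_files/"
# if is_safe_to_delete(target):
#     dbutils.fs.rm(target, True)
```

A guard like this is a complement to, not a replacement for, IAM-level restrictions.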

Conclusion

Deleting S3 files from Databricks is straightforward, but it must be done carefully. A good practice is to archive files first and permanently delete them only after validation.
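The archive-first pattern can be sketched as a copy followed by a delete. The paths and archive location below are placeholders; the helper that builds the destination path is plain string handling:

```python
def archive_destination(src_path, archive_root):
    """Map a source path to its archive location, preserving the file name."""
    name = src_path.rstrip("/").rsplit("/", 1)[-1]
    return archive_root.rstrip("/") + "/" + name

# On Databricks: copy to the archive area first, delete only after validation.
# src = "s3a://your-bucket-name/archive-test/file1.csv"
# dst = archive_destination(src, "s3a://your-bucket-name/archive/")
# dbutils.fs.cp(src, dst)
# dbutils.fs.rm(src, False)
```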
