Last updated: 19 November 2022
In this blog, I am going to play around with Spark on Databricks running on AWS.
Set up a free trial Databricks account.

Created a new S3 bucket spark-learning in the cheapest region, Ohio (us-east-2).
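The bucket can also be created from code instead of the console. Here is a minimal sketch using boto3, assuming default AWS credentials are already configured; note that bucket names are globally unique, so spark-learning may need a different name in practice.

```python
import boto3

# Create the bucket in Ohio (us-east-2). Outside us-east-1, the region must be
# passed explicitly via CreateBucketConfiguration.
s3 = boto3.client("s3", region_name="us-east-2")
s3.create_bucket(
    Bucket="spark-learning",  # bucket names are globally unique; change if taken
    CreateBucketConfiguration={"LocationConstraint": "us-east-2"},
)
```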

# high concurrency is for a shared cluster
Cluster Mode = Standard
# Turn off Autopilot
Terminate after = 15 mins
# 10 cents an hour. Cheapest I can find.
Worker, Driver Type = m6gd.large
Num workers = 1
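The same cluster can also be created programmatically through the Databricks Clusters API 2.0 instead of the UI. The sketch below is just an illustration of the config above; the workspace URL, token, and spark_version string are placeholders, not values from this setup.

```python
import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"                                   # placeholder

cluster_spec = {
    "cluster_name": "spark-learning",
    "spark_version": "11.3.x-scala2.12",  # assumed runtime; pick one your workspace lists
    "node_type_id": "m6gd.large",         # worker type from the config above
    "driver_node_type_id": "m6gd.large",  # driver type from the config above
    "num_workers": 1,
    "autotermination_minutes": 15,        # matches "Terminate after = 15 mins"
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=cluster_spec,
)
resp.raise_for_status()
print(resp.json())  # returns the new cluster_id on success
```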

Let’s first create a table. This can be done in the workspace console via
Data -> Table -> Create table. For data, we are using this telecommunication dataset from
Kaggle. We will upload the CSV file directly into the Databricks file system (DBFS)
through the table creator UI, and then create a Spark notebook in Databricks.
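As a first sanity check in the notebook, a minimal sketch of reading the uploaded CSV back into a DataFrame. The DBFS path and file name below are assumptions based on the default upload location; use whatever path the table creator shows after the upload finishes.

```python
# Runs inside a Databricks notebook, where `spark` and `display` are predefined.
# The path is an assumption -- the table creator UI shows the actual
# /FileStore/tables/... location after the upload.
df = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("/FileStore/tables/telecom_churn.csv")
)

display(df)       # quick look at the rows in the notebook UI
df.printSchema()  # confirm the inferred column types
```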

In order to tear down the resources, we need to terminate (or delete) the cluster and delete the spark-learning S3 bucket.
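Emptying and deleting the bucket can be scripted as well. A rough sketch with boto3, assuming the same credentials and bucket name as above:

```python
import boto3

# Teardown sketch: a bucket must be empty before it can be deleted.
s3 = boto3.resource("s3", region_name="us-east-2")
bucket = s3.Bucket("spark-learning")

bucket.objects.all().delete()  # remove every object first
bucket.delete()                # then remove the bucket itself
```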