In this notebook we work with the Spotify songs dataset from Kaggle. Specifically, we work with nested data types, where the columns are of type ARRAY or MAP.

Kaggle: Spotify Dataset 1921–2020, 160k+ Tracks

Problem Statement:

Recently, I needed to work with Spark DataFrames containing Map datatypes for one of our projects. I realized that Map and Array are the two most commonly used nested datatypes, so I explored in detail how we can create, query, explode, and implode columns of array and map datatypes. I created this notebook to be a handy reference for myself. Please feel free to check out this notebook on if you also…

Anindya Saha

Machine Learning Platform Engineer at Lyft Inc.
