pandas¶pandas¶pandas is a popular Python package with excellent functionality for data analysis tasks.
A DataFrame is a pandas data structure, much like a table:
Typically, rows are numerically labelled, starting at zero, while columns have names in strings.
However, a DataFrame can have strings for row labels and they don't have to be unique.
Let's jump in with an example. Fork the repl, remembering as always to add your credentials to a .env file.
!!! info
The walkthrough will highlight a handful of pandas techniques used for creating the Seeder app, but if you would like a more complete understanding of pandas and these techniques, please consult the excellent documentation.
Not much new to see in main.py; we've just substituted use of the .get_recommmendations() method for the .get_item() method we used previously to get individual track data:
recs = s.get_recommendations(track_id)
for track in recs:
...
print(f'{track_name} - {track_artist}')
.get_recommendataions() returns a list containing data for several tracks, over which we iterate to print the track_name and track_artist pandas has also been imported at the top of the script, and given the alias of pdIn spotter.py we'll find .get_recommendations() at the bottom, but also notice a small modification to .get_item():
def get_item(self, category, item_id=None, params=None):
...
resp = requests.get(f'{self.api_root}{category}/{item_id}', params=params, headers=self.headers)
get_item() method no longer requires an item_id, and can also accept a params argumentdef get_recommendations(self, seed, number=6):
recs = self.get_item('recommendations', params={'seed_tracks': seed, 'limit': 100})
.get_item() to fetch some recommendationstrack_id defined in main.py is the value for the seed_tracks parameter limit of 100 items to be returnedrecs_df = pd.DataFrame(recs['tracks'])
recs_df.head(3)
| album | artists | available_markets | disc_number | duration_ms | explicit | external_ids | external_urls | href | id | is_local | name | popularity | preview_url | track_number | type | uri | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | {'album_type': 'ALBUM', 'artists': [{'external... | [{'external_urls': {'spotify': 'https://open.s... | [AD, AE, AL, AR, AT, AU, BA, BE, BG, BH, BO, B... | 1 | 291369 | False | {'isrc': 'GBAYE9600152'} | {'spotify': 'https://open.spotify.com/track/0m... | https://api.spotify.com/v1/tracks/0mIZErjrNQzO... | 0mIZErjrNQzOTDtn4UYflo | False | Sun Hits The Sky | 48 | https://p.scdn.co/mp3-preview/7591511d739043e4... | 6 | track | spotify:track:0mIZErjrNQzOTDtn4UYflo |
| 1 | {'album_type': 'ALBUM', 'artists': [{'external... | [{'external_urls': {'spotify': 'https://open.s... | [AE, AL, AR, AT, AU, BE, BG, BH, BO, BR, BY, C... | 1 | 351626 | False | {'isrc': 'GBAAN9500078'} | {'spotify': 'https://open.spotify.com/track/2f... | https://api.spotify.com/v1/tracks/2fXKyAyPrEa2... | 2fXKyAyPrEa24c6PJyqznF | False | Common People | 69 | None | 3 | track | spotify:track:2fXKyAyPrEa24c6PJyqznF |
| 2 | {'album_type': 'ALBUM', 'artists': [{'external... | [{'external_urls': {'spotify': 'https://open.s... | [AD, AE, AL, AR, AT, AU, BA, BE, BG, BH, BO, B... | 1 | 182680 | False | {'isrc': 'GBFFS0400061'} | {'spotify': 'https://open.spotify.com/track/7m... | https://api.spotify.com/v1/tracks/7mAF2MJdbNT7... | 7mAF2MJdbNT75VrVcgwT6F | False | Hounds of Love | 55 | https://p.scdn.co/mp3-preview/c98ce02fe11caae9... | 14 | track | spotify:track:7mAF2MJdbNT75VrVcgwT6F |
tracks key in recs was a list of 100 dictionaries, each one relating to a given trackpd.DataFrame() function has created a DataFrame from the list, with:.head() method allows us to show just the first few records .apply() and lambda functions¶In this part of the code, we're going to extract the number of tracks we tracks we're looking for (which is 6 by default), but only take those which are available in as many markets as possible.
We'll do this by:
available_markets for each trackmax() number found.head() method to fetch the first 6 tracks from the remaining tracksrecs_df['availability'] = recs_df['available_markets'].apply(lambda x: len(x))
recs_df.head(3)
| album | artists | available_markets | disc_number | duration_ms | explicit | external_ids | external_urls | href | id | is_local | name | popularity | preview_url | track_number | type | uri | availability | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | {'album_type': 'ALBUM', 'artists': [{'external... | [{'external_urls': {'spotify': 'https://open.s... | [AD, AE, AL, AR, AT, AU, BA, BE, BG, BH, BO, B... | 1 | 291369 | False | {'isrc': 'GBAYE9600152'} | {'spotify': 'https://open.spotify.com/track/0m... | https://api.spotify.com/v1/tracks/0mIZErjrNQzO... | 0mIZErjrNQzOTDtn4UYflo | False | Sun Hits The Sky | 48 | https://p.scdn.co/mp3-preview/7591511d739043e4... | 6 | track | spotify:track:0mIZErjrNQzOTDtn4UYflo | 91 |
| 1 | {'album_type': 'ALBUM', 'artists': [{'external... | [{'external_urls': {'spotify': 'https://open.s... | [AE, AL, AR, AT, AU, BE, BG, BH, BO, BR, BY, C... | 1 | 351626 | False | {'isrc': 'GBAAN9500078'} | {'spotify': 'https://open.spotify.com/track/2f... | https://api.spotify.com/v1/tracks/2fXKyAyPrEa2... | 2fXKyAyPrEa24c6PJyqznF | False | Common People | 69 | None | 3 | track | spotify:track:2fXKyAyPrEa24c6PJyqznF | 80 |
| 2 | {'album_type': 'ALBUM', 'artists': [{'external... | [{'external_urls': {'spotify': 'https://open.s... | [AD, AE, AL, AR, AT, AU, BA, BE, BG, BH, BO, B... | 1 | 182680 | False | {'isrc': 'GBFFS0400061'} | {'spotify': 'https://open.spotify.com/track/7m... | https://api.spotify.com/v1/tracks/7mAF2MJdbNT7... | 7mAF2MJdbNT75VrVcgwT6F | False | Hounds of Love | 55 | https://p.scdn.co/mp3-preview/c98ce02fe11caae9... | 14 | track | spotify:track:7mAF2MJdbNT75VrVcgwT6F | 92 |
availability has been created (scroll right to see it).apply() method was used on the available_markets column, with a lambda function used as its argumentlambda function means "take the original value as x, and return len(x), i.e. the length of the list"Lambda functions are also known as anonymous functions, and tend to be found where the logic is only required for a short period of time, i.e. it will not be re-used.
tracks = recs_df[recs_df['availability'] == recs_df['availability'].max()].head(6).copy()
tracks.head(2)
| album | artists | available_markets | disc_number | duration_ms | explicit | external_ids | external_urls | href | id | is_local | name | popularity | preview_url | track_number | type | uri | availability | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | {'album_type': 'ALBUM', 'artists': [{'external... | [{'external_urls': {'spotify': 'https://open.s... | [AD, AE, AL, AR, AT, AU, BA, BE, BG, BH, BO, B... | 1 | 182680 | False | {'isrc': 'GBFFS0400061'} | {'spotify': 'https://open.spotify.com/track/7m... | https://api.spotify.com/v1/tracks/7mAF2MJdbNT7... | 7mAF2MJdbNT75VrVcgwT6F | False | Hounds of Love | 55 | https://p.scdn.co/mp3-preview/c98ce02fe11caae9... | 14 | track | spotify:track:7mAF2MJdbNT75VrVcgwT6F | 92 |
| 5 | {'album_type': 'ALBUM', 'artists': [{'external... | [{'external_urls': {'spotify': 'https://open.s... | [AD, AE, AL, AR, AT, AU, BA, BE, BG, BH, BO, B... | 1 | 414080 | False | {'isrc': 'GBDGG0200012'} | {'spotify': 'https://open.spotify.com/track/2w... | https://api.spotify.com/v1/tracks/2wzTHuw8nhFS... | 2wzTHuw8nhFSXUU27x0PFa | False | There Goes The Fear | 51 | None | 3 | track | spotify:track:2wzTHuw8nhFSXUU27x0PFa | 92 |
df [where the ['availablility] column in that df is equal to == the .max() value in that same column ]".head() of the result, containing the number of values specified.head() again to display the first two rows of the resulttracks['features'] = tracks['id'].apply(lambda x: s.get_item('audio-features', x))
tracks[['availability', 'features']]
| availability | features | |
|---|---|---|
| 2 | 92 | {'danceability': 0.36, 'energy': 0.895, 'key':... |
| 5 | 92 | {'danceability': 0.499, 'energy': 0.975, 'key'... |
| 6 | 92 | {'danceability': 0.491, 'energy': 0.686, 'key'... |
| 9 | 92 | {'danceability': 0.45, 'energy': 0.656, 'key':... |
| 10 | 92 | {'danceability': 0.508, 'energy': 0.826, 'key'... |
| 14 | 92 | {'danceability': 0.47, 'energy': 0.959, 'key':... |
.apply() again, this time incorporating our .get_item() method and passing the value in the id column as one of the arguments to get audio-features data for each tracktracks DataFrame by providing a list of columns in [square brackets ]reset_index()¶tracks = tracks.reset_index(drop=True)
drop=True parameter means that our original index values won't be stored (drop=False would have resulted in the creation of a new column containing them) features = pd.json_normalize(tracks['features'])
features.head(2)
| danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.360 | 0.895 | 9 | -3.795 | 1 | 0.1170 | 0.032900 | 0.000001 | 0.0600 | 0.424 | 168.214 | audio_features | 7mAF2MJdbNT75VrVcgwT6F | spotify:track:7mAF2MJdbNT75VrVcgwT6F | https://api.spotify.com/v1/tracks/7mAF2MJdbNT7... | https://api.spotify.com/v1/audio-analysis/7mAF... | 182680 | 4 |
| 1 | 0.499 | 0.975 | 5 | -4.504 | 1 | 0.0538 | 0.000086 | 0.072100 | 0.0258 | 0.509 | 109.997 | audio_features | 2wzTHuw8nhFSXUU27x0PFa | spotify:track:2wzTHuw8nhFSXUU27x0PFa | https://api.spotify.com/v1/tracks/2wzTHuw8nhFS... | https://api.spotify.com/v1/audio-analysis/2wzT... | 414080 | 4 |
json_normalize() function is used here to unpack the dictionaries returned by our .get_item() method calls requesting audio-features for each tracktracks_full = pd.concat([tracks, features], axis=1)
print(tracks_full.shape)
tracks_full.head(2)
(6, 37)
| album | artists | available_markets | disc_number | duration_ms | explicit | external_ids | external_urls | href | id | ... | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | {'album_type': 'ALBUM', 'artists': [{'external... | [{'external_urls': {'spotify': 'https://open.s... | [AD, AE, AL, AR, AT, AU, BA, BE, BG, BH, BO, B... | 1 | 182680 | False | {'isrc': 'GBFFS0400061'} | {'spotify': 'https://open.spotify.com/track/7m... | https://api.spotify.com/v1/tracks/7mAF2MJdbNT7... | 7mAF2MJdbNT75VrVcgwT6F | ... | 0.0600 | 0.424 | 168.214 | audio_features | 7mAF2MJdbNT75VrVcgwT6F | spotify:track:7mAF2MJdbNT75VrVcgwT6F | https://api.spotify.com/v1/tracks/7mAF2MJdbNT7... | https://api.spotify.com/v1/audio-analysis/7mAF... | 182680 | 4 |
| 1 | {'album_type': 'ALBUM', 'artists': [{'external... | [{'external_urls': {'spotify': 'https://open.s... | [AD, AE, AL, AR, AT, AU, BA, BE, BG, BH, BO, B... | 1 | 414080 | False | {'isrc': 'GBDGG0200012'} | {'spotify': 'https://open.spotify.com/track/2w... | https://api.spotify.com/v1/tracks/2wzTHuw8nhFS... | 2wzTHuw8nhFSXUU27x0PFa | ... | 0.0258 | 0.509 | 109.997 | audio_features | 2wzTHuw8nhFSXUU27x0PFa | spotify:track:2wzTHuw8nhFSXUU27x0PFa | https://api.spotify.com/v1/tracks/2wzTHuw8nhFS... | https://api.spotify.com/v1/audio-analysis/2wzT... | 414080 | 4 |
2 rows × 37 columns
axis=1 means that we do so in a column-wise manner.shape attribute of a DataFrame contains the number of (rows, columns) The pandas .join() and merge() methods are useful for more complex SQL-like joins using specific columns as keys.
dupes = tracks_full.columns.duplicated()
dupes
array([False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, True, True, True, False, False, True,
False])
.duplicated() is a method which identifies any repeat instances of column names.loc¶tracks_all = tracks_full.loc[:,~dupes].copy()
tracks_all.shape
(6, 33)
.loc is used to locate items in our DataFrame, refers to the rows; here : means 'all rows', refers to the columns; ~ means 'not' or 'opposite'.copy() makes a new object so that subsequent operations on tracks_all can be made without tracks_full being affected (to save memory, pandas may not otherwise store such a subset as a distinct object) Sometimes our tracks will have multplie artists, and it would be nice to be able to use all of their names in our app.
Here we again use .apply() with a lambda function, incorporating the .join() method and a list comprehension to amalgamate the name values for all entries in artists:
tracks_all['artist_string'] = tracks_all['artists'].apply(lambda x: ', '.join([a['name'] for a in x]))
list(tracks_all['artist_string'])
['The Futureheads', 'Doves', 'The Stone Roses', 'Dexys Midnight Runners', 'Stereophonics', 'Primal Scream']
tracks_data = tracks_all.to_dict(orient="records")
tracks_data[0].keys()
dict_keys(['album', 'artists', 'available_markets', 'disc_number', 'duration_ms', 'explicit', 'external_ids', 'external_urls', 'href', 'id', 'is_local', 'name', 'popularity', 'preview_url', 'track_number', 'type', 'uri', 'availability', 'features', 'danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness', 'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo', 'track_href', 'analysis_url', 'time_signature', 'artist_string'])
.to_dict() DataFrame method, with the orient="records" parameter, to return a list of dictionaries, each one containing data about a given trackaudio-features and our new artist_string key:value pairs) as seen previously when calling .json() on a Response object returned from a get_item() request