pandas
¶pandas
¶pandas
is a popular Python package with excellent functionality for data analysis tasks.
A DataFrame is a pandas data structure, much like a table:
Typically, rows are numerically labelled, starting at zero, while columns have names in strings.
However, a DataFrame can have strings for row labels and they don't have to be unique.
Let's jump in with an example. Fork the repl, remembering as always to add your credentials to a .env
file.
!!! info
The walkthrough will highlight a handful of pandas
techniques used for creating the Seeder app, but if you would like a more complete understanding of pandas
and these techniques, please consult the excellent documentation.
Not much new to see in main.py
; we've just substituted use of the .get_recommmendations()
method for the .get_item()
method we used previously to get individual track data:
recs = s.get_recommendations(track_id)
for track in recs:
...
print(f'{track_name} - {track_artist}')
.get_recommendataions()
returns a list containing data for several tracks, over which we iterate to print the track_name
and track_artist
pandas
has also been imported at the top of the script, and given the alias of pd
In spotter.py
we'll find .get_recommendations()
at the bottom, but also notice a small modification to .get_item()
:
def get_item(self, category, item_id=None, params=None):
...
resp = requests.get(f'{self.api_root}{category}/{item_id}', params=params, headers=self.headers)
get_item()
method no longer requires an item_id
, and can also accept a params
argumentdef get_recommendations(self, seed, number=6):
recs = self.get_item('recommendations', params={'seed_tracks': seed, 'limit': 100})
.get_item()
to fetch some recommendations
track_id
defined in main.py
is the value for the seed_tracks
parameter limit
of 100
items to be returnedrecs_df = pd.DataFrame(recs['tracks'])
recs_df.head(3)
album | artists | available_markets | disc_number | duration_ms | explicit | external_ids | external_urls | href | id | is_local | name | popularity | preview_url | track_number | type | uri | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | {'album_type': 'ALBUM', 'artists': [{'external... | [{'external_urls': {'spotify': 'https://open.s... | [AD, AE, AL, AR, AT, AU, BA, BE, BG, BH, BO, B... | 1 | 291369 | False | {'isrc': 'GBAYE9600152'} | {'spotify': 'https://open.spotify.com/track/0m... | https://api.spotify.com/v1/tracks/0mIZErjrNQzO... | 0mIZErjrNQzOTDtn4UYflo | False | Sun Hits The Sky | 48 | https://p.scdn.co/mp3-preview/7591511d739043e4... | 6 | track | spotify:track:0mIZErjrNQzOTDtn4UYflo |
1 | {'album_type': 'ALBUM', 'artists': [{'external... | [{'external_urls': {'spotify': 'https://open.s... | [AE, AL, AR, AT, AU, BE, BG, BH, BO, BR, BY, C... | 1 | 351626 | False | {'isrc': 'GBAAN9500078'} | {'spotify': 'https://open.spotify.com/track/2f... | https://api.spotify.com/v1/tracks/2fXKyAyPrEa2... | 2fXKyAyPrEa24c6PJyqznF | False | Common People | 69 | None | 3 | track | spotify:track:2fXKyAyPrEa24c6PJyqznF |
2 | {'album_type': 'ALBUM', 'artists': [{'external... | [{'external_urls': {'spotify': 'https://open.s... | [AD, AE, AL, AR, AT, AU, BA, BE, BG, BH, BO, B... | 1 | 182680 | False | {'isrc': 'GBFFS0400061'} | {'spotify': 'https://open.spotify.com/track/7m... | https://api.spotify.com/v1/tracks/7mAF2MJdbNT7... | 7mAF2MJdbNT75VrVcgwT6F | False | Hounds of Love | 55 | https://p.scdn.co/mp3-preview/c98ce02fe11caae9... | 14 | track | spotify:track:7mAF2MJdbNT75VrVcgwT6F |
tracks
key in recs
was a list of 100 dictionaries, each one relating to a given trackpd.DataFrame()
function has created a DataFrame from the list, with:.head()
method allows us to show just the first few records .apply()
and lambda
functions¶In this part of the code, we're going to extract the number
of tracks we tracks we're looking for (which is 6
by default), but only take those which are available in as many markets as possible.
We'll do this by:
available_markets
for each trackmax()
number found.head()
method to fetch the first 6
tracks from the remaining tracksrecs_df['availability'] = recs_df['available_markets'].apply(lambda x: len(x))
recs_df.head(3)
album | artists | available_markets | disc_number | duration_ms | explicit | external_ids | external_urls | href | id | is_local | name | popularity | preview_url | track_number | type | uri | availability | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | {'album_type': 'ALBUM', 'artists': [{'external... | [{'external_urls': {'spotify': 'https://open.s... | [AD, AE, AL, AR, AT, AU, BA, BE, BG, BH, BO, B... | 1 | 291369 | False | {'isrc': 'GBAYE9600152'} | {'spotify': 'https://open.spotify.com/track/0m... | https://api.spotify.com/v1/tracks/0mIZErjrNQzO... | 0mIZErjrNQzOTDtn4UYflo | False | Sun Hits The Sky | 48 | https://p.scdn.co/mp3-preview/7591511d739043e4... | 6 | track | spotify:track:0mIZErjrNQzOTDtn4UYflo | 91 |
1 | {'album_type': 'ALBUM', 'artists': [{'external... | [{'external_urls': {'spotify': 'https://open.s... | [AE, AL, AR, AT, AU, BE, BG, BH, BO, BR, BY, C... | 1 | 351626 | False | {'isrc': 'GBAAN9500078'} | {'spotify': 'https://open.spotify.com/track/2f... | https://api.spotify.com/v1/tracks/2fXKyAyPrEa2... | 2fXKyAyPrEa24c6PJyqznF | False | Common People | 69 | None | 3 | track | spotify:track:2fXKyAyPrEa24c6PJyqznF | 80 |
2 | {'album_type': 'ALBUM', 'artists': [{'external... | [{'external_urls': {'spotify': 'https://open.s... | [AD, AE, AL, AR, AT, AU, BA, BE, BG, BH, BO, B... | 1 | 182680 | False | {'isrc': 'GBFFS0400061'} | {'spotify': 'https://open.spotify.com/track/7m... | https://api.spotify.com/v1/tracks/7mAF2MJdbNT7... | 7mAF2MJdbNT75VrVcgwT6F | False | Hounds of Love | 55 | https://p.scdn.co/mp3-preview/c98ce02fe11caae9... | 14 | track | spotify:track:7mAF2MJdbNT75VrVcgwT6F | 92 |
availability
has been created (scroll right to see it).apply()
method was used on the available_markets
column, with a lambda
function used as its argumentlambda
function means "take the original value as x
, and return len(x)
, i.e. the length of the list"Lambda functions are also known as anonymous functions, and tend to be found where the logic is only required for a short period of time, i.e. it will not be re-used.
tracks = recs_df[recs_df['availability'] == recs_df['availability'].max()].head(6).copy()
tracks.head(2)
album | artists | available_markets | disc_number | duration_ms | explicit | external_ids | external_urls | href | id | is_local | name | popularity | preview_url | track_number | type | uri | availability | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | {'album_type': 'ALBUM', 'artists': [{'external... | [{'external_urls': {'spotify': 'https://open.s... | [AD, AE, AL, AR, AT, AU, BA, BE, BG, BH, BO, B... | 1 | 182680 | False | {'isrc': 'GBFFS0400061'} | {'spotify': 'https://open.spotify.com/track/7m... | https://api.spotify.com/v1/tracks/7mAF2MJdbNT7... | 7mAF2MJdbNT75VrVcgwT6F | False | Hounds of Love | 55 | https://p.scdn.co/mp3-preview/c98ce02fe11caae9... | 14 | track | spotify:track:7mAF2MJdbNT75VrVcgwT6F | 92 |
5 | {'album_type': 'ALBUM', 'artists': [{'external... | [{'external_urls': {'spotify': 'https://open.s... | [AD, AE, AL, AR, AT, AU, BA, BE, BG, BH, BO, B... | 1 | 414080 | False | {'isrc': 'GBDGG0200012'} | {'spotify': 'https://open.spotify.com/track/2w... | https://api.spotify.com/v1/tracks/2wzTHuw8nhFS... | 2wzTHuw8nhFSXUU27x0PFa | False | There Goes The Fear | 51 | None | 3 | track | spotify:track:2wzTHuw8nhFSXUU27x0PFa | 92 |
df
[
where the [
'availablility]
column in that df
is equal to ==
the .max()
value in that same column ]
".head()
of the result, containing the number
of values specified.head()
again to display the first two rows of the resulttracks['features'] = tracks['id'].apply(lambda x: s.get_item('audio-features', x))
tracks[['availability', 'features']]
availability | features | |
---|---|---|
2 | 92 | {'danceability': 0.36, 'energy': 0.895, 'key':... |
5 | 92 | {'danceability': 0.499, 'energy': 0.975, 'key'... |
6 | 92 | {'danceability': 0.491, 'energy': 0.686, 'key'... |
9 | 92 | {'danceability': 0.45, 'energy': 0.656, 'key':... |
10 | 92 | {'danceability': 0.508, 'energy': 0.826, 'key'... |
14 | 92 | {'danceability': 0.47, 'energy': 0.959, 'key':... |
.apply()
again, this time incorporating our .get_item()
method and passing the value in the id
column as one of the arguments to get audio-features
data for each tracktracks
DataFrame by providing a list of columns in [
square brackets ]
reset_index()
¶tracks = tracks.reset_index(drop=True)
drop=True
parameter means that our original index values won't be stored (drop=False
would have resulted in the creation of a new column containing them) features = pd.json_normalize(tracks['features'])
features.head(2)
danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.360 | 0.895 | 9 | -3.795 | 1 | 0.1170 | 0.032900 | 0.000001 | 0.0600 | 0.424 | 168.214 | audio_features | 7mAF2MJdbNT75VrVcgwT6F | spotify:track:7mAF2MJdbNT75VrVcgwT6F | https://api.spotify.com/v1/tracks/7mAF2MJdbNT7... | https://api.spotify.com/v1/audio-analysis/7mAF... | 182680 | 4 |
1 | 0.499 | 0.975 | 5 | -4.504 | 1 | 0.0538 | 0.000086 | 0.072100 | 0.0258 | 0.509 | 109.997 | audio_features | 2wzTHuw8nhFSXUU27x0PFa | spotify:track:2wzTHuw8nhFSXUU27x0PFa | https://api.spotify.com/v1/tracks/2wzTHuw8nhFS... | https://api.spotify.com/v1/audio-analysis/2wzT... | 414080 | 4 |
json_normalize()
function is used here to unpack the dictionaries returned by our .get_item()
method calls requesting audio-features
for each tracktracks_full = pd.concat([tracks, features], axis=1)
print(tracks_full.shape)
tracks_full.head(2)
(6, 37)
album | artists | available_markets | disc_number | duration_ms | explicit | external_ids | external_urls | href | id | ... | liveness | valence | tempo | type | id | uri | track_href | analysis_url | duration_ms | time_signature | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | {'album_type': 'ALBUM', 'artists': [{'external... | [{'external_urls': {'spotify': 'https://open.s... | [AD, AE, AL, AR, AT, AU, BA, BE, BG, BH, BO, B... | 1 | 182680 | False | {'isrc': 'GBFFS0400061'} | {'spotify': 'https://open.spotify.com/track/7m... | https://api.spotify.com/v1/tracks/7mAF2MJdbNT7... | 7mAF2MJdbNT75VrVcgwT6F | ... | 0.0600 | 0.424 | 168.214 | audio_features | 7mAF2MJdbNT75VrVcgwT6F | spotify:track:7mAF2MJdbNT75VrVcgwT6F | https://api.spotify.com/v1/tracks/7mAF2MJdbNT7... | https://api.spotify.com/v1/audio-analysis/7mAF... | 182680 | 4 |
1 | {'album_type': 'ALBUM', 'artists': [{'external... | [{'external_urls': {'spotify': 'https://open.s... | [AD, AE, AL, AR, AT, AU, BA, BE, BG, BH, BO, B... | 1 | 414080 | False | {'isrc': 'GBDGG0200012'} | {'spotify': 'https://open.spotify.com/track/2w... | https://api.spotify.com/v1/tracks/2wzTHuw8nhFS... | 2wzTHuw8nhFSXUU27x0PFa | ... | 0.0258 | 0.509 | 109.997 | audio_features | 2wzTHuw8nhFSXUU27x0PFa | spotify:track:2wzTHuw8nhFSXUU27x0PFa | https://api.spotify.com/v1/tracks/2wzTHuw8nhFS... | https://api.spotify.com/v1/audio-analysis/2wzT... | 414080 | 4 |
2 rows × 37 columns
axis=1
means that we do so in a column-wise manner.shape
attribute of a DataFrame contains the number of (rows
, columns
) The pandas .join()
and merge()
methods are useful for more complex SQL-like joins using specific columns as keys.
dupes = tracks_full.columns.duplicated()
dupes
array([False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, True, True, False, False, True, False])
.duplicated()
is a method which identifies any repeat instances of column names.loc
¶tracks_all = tracks_full.loc[:,~dupes].copy()
tracks_all.shape
(6, 33)
.loc
is used to locate items in our DataFrame,
refers to the rows; here :
means 'all rows',
refers to the columns; ~
means 'not' or 'opposite'.copy()
makes a new object so that subsequent operations on tracks_all
can be made without tracks_full
being affected (to save memory, pandas
may not otherwise store such a subset as a distinct object) Sometimes our tracks will have multplie artists, and it would be nice to be able to use all of their names in our app.
Here we again use .apply()
with a lambda
function, incorporating the .join()
method and a list comprehension to amalgamate the name
values for all entries in artists
:
tracks_all['artist_string'] = tracks_all['artists'].apply(lambda x: ', '.join([a['name'] for a in x]))
list(tracks_all['artist_string'])
['The Futureheads', 'Doves', 'The Stone Roses', 'Dexys Midnight Runners', 'Stereophonics', 'Primal Scream']
tracks_data = tracks_all.to_dict(orient="records")
tracks_data[0].keys()
dict_keys(['album', 'artists', 'available_markets', 'disc_number', 'duration_ms', 'explicit', 'external_ids', 'external_urls', 'href', 'id', 'is_local', 'name', 'popularity', 'preview_url', 'track_number', 'type', 'uri', 'availability', 'features', 'danceability', 'energy', 'key', 'loudness', 'mode', 'speechiness', 'acousticness', 'instrumentalness', 'liveness', 'valence', 'tempo', 'track_href', 'analysis_url', 'time_signature', 'artist_string'])
.to_dict()
DataFrame method, with the orient="records"
parameter, to return a list of dictionaries, each one containing data about a given trackaudio-features
and our new artist_string
key:value pairs) as seen previously when calling .json()
on a Response
object returned from a get_item()
request