chartify
The Chartify package was created by Spotify, so it seems rude not to make use of it in our project :)
There are several popular Python data visualization packages, one of which is Bokeh.
Chartify is built on top of Bokeh, simplifying the creation of certain types of charts while retaining the ability to modify the underlying Bokeh Figure object.
Take a look at the GitHub repo for more information and examples.
Let's take a look at an example which makes use of our Spotify data. Fork the repl, remembering as always to create your own .env file should you wish to re-run the code.
Things should look familiar, until we get to:
tidy = tidy_data(recs)
We'll take a look at tidy.py to see what this function is doing.
One of the most important requirements for data visualization is to ensure our data is suitably structured.
Tidy data is described here as follows:
Each variable is a column, each observation is a row, and each type of observational unit is a table.
Beyond being tidy, we may also need to apply certain transformations to our datasets, such as stacking or pivoting, so it can be used with our visualization tools of choice.
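To make the idea of tidy data concrete, here is a minimal sketch using a small, made-up table (the values and the `wide` name are purely illustrative). It uses pandas' `melt` as one possible way to reshape a wide table into a tidy one; it is not necessarily what `tidy_data` does.

```python
import pandas as pd

# Hypothetical "wide" table: one row per track, one column per feature.
wide = pd.DataFrame({
    "Track": [1, 2],
    "energy": [0.65, 0.77],
    "tempo": [80.9, 108.0],
})

# Reshape so that each row holds a single observation:
# one track, one feature, one value.
tidy = wide.melt(id_vars="Track", var_name="Feature", value_name="Value")
print(tidy)
```

In the reshaped table, every measurement sits on its own row, which is the shape many plotting tools expect.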
set_index()
df = pd.DataFrame(tracks_data)
track_numbers = list(range(1, len(df) + 1))
df['Track'] = track_numbers
df = df.set_index('Track')
df.head(2)
album | artists | available_markets | disc_number | duration_ms | explicit | external_ids | external_urls | href | id | ... | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | track_href | analysis_url | time_signature | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Track | |||||||||||||||||||||
1 | {'album_type': 'ALBUM', 'artists': [{'external... | [{'external_urls': {'spotify': 'https://open.s... | [AD, AE, AL, AR, AT, AU, BA, BE, BG, BH, BO, B... | 1 | 272946 | False | {'isrc': 'GBBLK0300012'} | {'spotify': 'https://open.spotify.com/track/6S... | https://api.spotify.com/v1/tracks/6SXy02aTZU3y... | 6SXy02aTZU3ysoGUixYCz0 | ... | 0 | 0.0361 | 0.2170 | 0.000458 | 0.334 | 0.571 | 80.897 | https://api.spotify.com/v1/tracks/6SXy02aTZU3y... | https://api.spotify.com/v1/audio-analysis/6SXy... | 4 |
2 | {'album_type': 'ALBUM', 'artists': [{'external... | [{'external_urls': {'spotify': 'https://open.s... | [AD, AE, AL, AR, AT, AU, BA, BE, BG, BH, BO, B... | 1 | 199066 | False | {'isrc': 'GBBRL9691749'} | {'spotify': 'https://open.spotify.com/track/5b... | https://api.spotify.com/v1/tracks/5beiMMlsINDI... | 5beiMMlsINDI5fxRdF0D42 | ... | 0 | 0.0270 | 0.0123 | 0.000000 | 0.155 | 0.677 | 108.030 | https://api.spotify.com/v1/tracks/5beiMMlsINDI... | https://api.spotify.com/v1/audio-analysis/5bei... | 4 |
2 rows × 34 columns
Create a DataFrame from tracks_data (which is a list of dictionaries).
Add a Track column which, if there are six tracks, is simply the numbers 1 to 6.
Set the Track column as the index.
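The same steps can be tried in isolation with a tiny stand-in for `tracks_data`. The dictionaries below are invented for illustration and contain far fewer fields than the real Spotify response.

```python
import pandas as pd

# Hypothetical stand-in for tracks_data: a list of dictionaries, one per track.
tracks_data = [
    {"name": "Track A", "tempo": 80.0},
    {"name": "Track B", "tempo": 120.0},
    {"name": "Track C", "tempo": 100.0},
]

df = pd.DataFrame(tracks_data)

# Number the tracks from 1 rather than using the default 0-based index.
df["Track"] = range(1, len(df) + 1)
df = df.set_index("Track")
print(df)
```

With a 1-based Track index, the labels on a chart match the way we would naturally count the tracks.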
.copy()
cols = ['popularity', 'danceability', 'energy', 'loudness', 'valence', 'tempo']
stats = df[cols]
stats.head(2)
popularity | danceability | energy | loudness | valence | tempo | |
---|---|---|---|---|---|---|
Track | ||||||
1 | 65 | 0.495 | 0.653 | -6.769 | 0.571 | 80.897 |
2 | 47 | 0.654 | 0.773 | -6.484 | 0.677 | 108.030 |
df3 = stats / stats.mean()
df3.head(2)
popularity | danceability | energy | loudness | valence | tempo | |
---|---|---|---|---|---|---|
Track | ||||||
1 | 1.164179 | 0.947368 | 0.794082 | 1.103731 | 0.783444 | 0.635385 |
2 | 0.841791 | 1.251675 | 0.940008 | 1.057260 | 0.928882 | 0.848494 |
Each value is divided by the .mean() of its column.
This will allow us to more easily compare the values of each metric between tracks using the same chart configuration.
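As a rough sketch of what dividing by the column mean does, here is the same operation on a two-track table with made-up numbers chosen so the arithmetic is easy to follow.

```python
import pandas as pd

# Hypothetical values for two tracks.
stats = pd.DataFrame(
    {"popularity": [60, 40], "tempo": [90.0, 110.0]},
    index=pd.Index([1, 2], name="Track"),
)

# stats.mean() gives one mean per column; the division is applied column by column.
relative = stats / stats.mean()
print(relative)
```

After this step every column averages 1, so features measured on very different scales can share one y-axis. Note that for a column whose mean is negative, such as loudness in the output above, a ratio above 1 means the value is further below zero than average.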
.stack()
df4 = df3.stack().to_frame().reset_index()
df4.columns=['Track', 'Feature', 'Value']
df4.head(3)
Track | Feature | Value | |
---|---|---|---|
0 | 1 | popularity | 1.164179 |
1 | 1 | danceability | 0.947368 |
2 | 1 | energy | 0.794082 |
.stack() and its sister .unstack() can be very useful for transforming DataFrames into the appropriate format for a given chart type.
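The round trip between the two shapes can be sketched on a small example. The `wide`, `long_form` and `back` names and the values are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical wide table: Track as the index, one column per feature.
wide = pd.DataFrame(
    {"energy": [0.8, 0.9], "tempo": [0.6, 0.8]},
    index=pd.Index([1, 2], name="Track"),
)

# stack() moves the column labels into the index, giving one row per (Track, feature).
long_form = wide.stack().to_frame().reset_index()
long_form.columns = ["Track", "Feature", "Value"]

# unstack() reverses the operation, turning the Feature labels back into columns.
back = long_form.set_index(["Track", "Feature"])["Value"].unstack()
print(long_form)
print(back)
```

Going from `wide` to `long_form` is the direction needed here; `unstack()` is handy when a chart or table wants one column per feature instead.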
For bar plots, Chartify will want us to specify categorical_columns and numeric_column, so we need a row for each data point, with the categorical data (in our case Feature) as a value in each row.
feature = 'tempo'
ch = chartify.Chart(x_axis_type='categorical')
Set the feature variable to 'tempo' (we'll be able to change this to any other value found in the Feature column of the tidy DataFrame).
Create a chartify.Chart() object with the given argument for x_axis_type.
There are various examples of Chartify code on GitHub.
ch.plot.bar(
data_frame=tidy[tidy['Feature'] == feature],
...
Call the .plot.bar() method of our chartify.Chart() object.
data_frame is a subset of the tidy DataFrame, containing only rows with the given feature.
categorical_columns='Track',
numeric_column='Value',
categorical_order_by='labels',
categorical_order_ascending=True)
categorical_columns (which we set before as being the x_axis_type) will be just Track.
numeric_column will therefore be represented on the y_axis.
The categorical_order... parameters determine the order of the x_axis.
Refer to the examples and documentation for more detail on these methods and parameters.
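The `data_frame` argument above relies on boolean filtering in pandas. A minimal sketch of that filter on its own, using a small invented `tidy` table, might look like this:

```python
import pandas as pd

# Hypothetical tidy table with two tracks and two features.
tidy = pd.DataFrame({
    "Track": [1, 1, 2, 2],
    "Feature": ["energy", "tempo", "energy", "tempo"],
    "Value": [0.8, 0.6, 0.9, 0.8],
})

feature = "tempo"

# Keep only the rows for the chosen feature: one row (one bar) per track.
subset = tidy[tidy["Feature"] == feature]
print(subset)
```

Because the subset holds exactly one row per track, passing it to the bar plot yields one bar per track for the selected feature.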
Our data has been plotted on a bar chart, with some helpful placeholders for various label and title attributes which we can change.
ch.set_title(f"Track comparison by {feature}")
ch.set_subtitle(None)
ch.set_source_label('Data source: Spotify')
ch.axes.set_xaxis_label('Track')
ch.axes.set_yaxis_label('Value')
chartify.Chart(blank_labels=False, layout='slide_100%', x_axis_type='categorical', y_axis_type='linear')
Here the title, source label and axis labels are set, and the subtitle is set to None.
filename = f'charts/{feature}.png'
ch.save(filename, format='png')
The .save() method allows us to save the chart in various formats.
svg, png and html can be used for the format parameter.
In the Seeder app, the svg format has been used, which scales better than png (without loss of image quality). The html format may be useful if you want to incorporate interactivity into your charts.
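When switching between formats, it can help to build the filename from the feature and format together and to make sure the output folder exists. The helper below is a hypothetical convenience, not part of Chartify or the Seeder app; `chart_path` and `SUPPORTED_FORMATS` are names invented for this sketch.

```python
from pathlib import Path
import tempfile

# The three formats named in the text above.
SUPPORTED_FORMATS = ("svg", "png", "html")


def chart_path(feature, fmt, base_dir="charts"):
    """Return the output path for a feature's chart, creating the folder if needed."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format: {fmt!r}")
    folder = Path(base_dir)
    folder.mkdir(parents=True, exist_ok=True)
    return folder / f"{feature}.{fmt}"


# Demonstrated in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    path = chart_path("tempo", "svg", base_dir=tmp)
    print(path.name)  # tempo.svg
```

The resulting path could then be passed to the chart, for example `ch.save(str(chart_path(feature, 'svg')), format='svg')`, assuming `ch` is the chart object created earlier.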