Data Visualization with Chartify

Chartify package

The Chartify package has been created by Spotify, so it seems rude not to make use of it in our project :)

There are several popular Python data visualization packages, one of which is Bokeh.

Chartify is built on top of Bokeh, simplifying the creation of certain types of charts while retaining the ability to modify the underlying Bokeh Figure object.

Take a look at the GitHub repo for more information and examples.

Example repl

Let's take a look at an example which makes use of our Spotify data. Fork the repl, remembering as always to create your own .env file should you wish to re-run the code.

Things should look familiar, until we get to:

tidy = tidy_data(recs)

We'll take a look at tidy.py to see what this function is doing.

Tidying data

One of the most important requirements for data visualization is to ensure that our data is suitably structured.

Tidy data is described here as follows:

Each variable is a column, each observation is a row, and each type of observational unit is a table.

Beyond being tidy, we may also need to apply certain transformations to our datasets, such as stacking or pivoting, so it can be used with our visualization tools of choice.

set_index()

df = pd.DataFrame(tracks_data)
track_numbers = list(range(1, len(df) + 1))
df['Track'] = track_numbers
df = df.set_index('Track')
df.head(2)
album artists available_markets disc_number duration_ms explicit external_ids external_urls href id ... mode speechiness acousticness instrumentalness liveness valence tempo track_href analysis_url time_signature
Track
1 {'album_type': 'ALBUM', 'artists': [{'external... [{'external_urls': {'spotify': 'https://open.s... [AD, AE, AL, AR, AT, AU, BA, BE, BG, BH, BO, B... 1 272946 False {'isrc': 'GBBLK0300012'} {'spotify': 'https://open.spotify.com/track/6S... https://api.spotify.com/v1/tracks/6SXy02aTZU3y... 6SXy02aTZU3ysoGUixYCz0 ... 0 0.0361 0.2170 0.000458 0.334 0.571 80.897 https://api.spotify.com/v1/tracks/6SXy02aTZU3y... https://api.spotify.com/v1/audio-analysis/6SXy... 4
2 {'album_type': 'ALBUM', 'artists': [{'external... [{'external_urls': {'spotify': 'https://open.s... [AD, AE, AL, AR, AT, AU, BA, BE, BG, BH, BO, B... 1 199066 False {'isrc': 'GBBRL9691749'} {'spotify': 'https://open.spotify.com/track/5b... https://api.spotify.com/v1/tracks/5beiMMlsINDI... 5beiMMlsINDI5fxRdF0D42 ... 0 0.0270 0.0123 0.000000 0.155 0.677 108.030 https://api.spotify.com/v1/tracks/5beiMMlsINDI... https://api.spotify.com/v1/audio-analysis/5bei... 4

2 rows × 34 columns

  • we create a DataFrame from tracks_data (which is a list of dictionaries)
  • we add a Track column which, if there are six tracks, simply holds the numbers 1 to 6
  • we set these values as the DataFrame index
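
As a minimal sketch of the same steps on a toy input, the snippet below builds a DataFrame from a short list of dictionaries and gives it a 1-based Track index. The tracks_data values here are invented for illustration and are not real Spotify output.

```python
import pandas as pd

# Hypothetical stand-in for the Spotify response: one dict per track.
tracks_data = [
    {"name": "Track A", "popularity": 65, "tempo": 80.9},
    {"name": "Track B", "popularity": 47, "tempo": 108.0},
    {"name": "Track C", "popularity": 52, "tempo": 120.5},
]

df = pd.DataFrame(tracks_data)

# Number the tracks from 1 rather than using pandas' default 0-based index.
df["Track"] = list(range(1, len(df) + 1))

# set_index() moves the Track column out of the data and into the index.
df = df.set_index("Track")

print(df.index.tolist())
```

After set_index(), Track is no longer an ordinary column, so rows are looked up by track number, for example df.loc[2].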

.copy()

cols = ['popularity', 'danceability', 'energy', 'loudness', 'valence', 'tempo']

stats = df[cols].copy()
stats.head(2)
       popularity  danceability  energy  loudness  valence    tempo
Track
1              65         0.495   0.653    -6.769    0.571   80.897
2              47         0.654   0.773    -6.484    0.677  108.030
  • we've created a subset of the original data, containing only the columns needed for our chart
  • .copy() makes this subset an independent DataFrame, so later changes to stats do not affect df
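
A small sketch of what .copy() guarantees, using an invented two-row DataFrame rather than the real track data: a change made to the copied subset leaves the original untouched.

```python
import pandas as pd

# Invented values, for illustration only.
df = pd.DataFrame(
    {"popularity": [65, 47], "tempo": [80.897, 108.030], "name": ["A", "B"]},
    index=pd.Index([1, 2], name="Track"),
)

cols = ["popularity", "tempo"]

# .copy() returns an independent DataFrame, so later edits cannot affect df.
stats = df[cols].copy()

# Modify the subset only.
stats.loc[1, "popularity"] = 0

print(df.loc[1, "popularity"])     # original value is unchanged
print(stats.loc[1, "popularity"])  # the copy holds the new value
```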

Relative values

df3 = stats / stats.mean()
df3.head(2)
       popularity  danceability    energy  loudness   valence     tempo
Track
1        1.164179      0.947368  0.794082  1.103731  0.783444  0.635385
2        0.841791      1.251675  0.940008  1.057260  0.928882  0.848494
  • here we have divided the values in each column by the .mean() of each column
  • our values are now all relative rather than absolute

Because every metric is now expressed relative to its own mean, we can compare tracks on any metric using the same chart configuration.
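
To make the division concrete, here is a minimal sketch with invented numbers chosen so the means are easy to check by hand (50 for popularity, 100 for tempo). The name relative is only illustrative.

```python
import pandas as pd

# Invented values, for illustration only.
stats = pd.DataFrame(
    {"popularity": [60, 40], "tempo": [90.0, 110.0]},
    index=pd.Index([1, 2], name="Track"),
)

# stats.mean() is a Series of column means; the division aligns on
# column names, so each column is divided by its own mean.
relative = stats / stats.mean()

print(relative)
```

Each column of the result has a mean of 1, so a value of 1.2 reads as "20% above the average track". One caveat that is visible in the real output above: loudness has a negative mean, so the louder track 2 (-6.484) ends up with a smaller relative value than track 1 (-6.769).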

.stack()

df4 = df3.stack().to_frame().reset_index()
df4.columns = ['Track', 'Feature', 'Value']
df4.head(3)
   Track       Feature     Value
0      1    popularity  1.164179
1      1  danceability  0.947368
2      1        energy  0.794082

.stack() and its sister .unstack() can be very useful for transforming DataFrames into the appropriate format for a given chart type.

For bar plots, Chartify expects us to specify categorical_columns and numeric_column, so we need one row per data point, with the category (in our case the Feature) stored as a value in each row.
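
The reshaping can be sketched on a tiny, invented wide table. The names wide, stacked and restored are illustrative only; the point is to show the intermediate result of .stack() and that .unstack() reverses it.

```python
import pandas as pd

# Invented relative values, for illustration only.
wide = pd.DataFrame(
    {"popularity": [1.2, 0.8], "tempo": [0.9, 1.1]},
    index=pd.Index([1, 2], name="Track"),
)

# stack() moves the column labels into an inner index level,
# giving a Series indexed by (Track, feature name).
stacked = wide.stack()

# Turn the Series back into a DataFrame with one row per data point.
tidy = stacked.to_frame().reset_index()
tidy.columns = ["Track", "Feature", "Value"]
print(tidy)

# unstack() reverses the operation, restoring one column per feature.
restored = stacked.unstack()
```

Two tracks with two features each become four rows in the tidy table, which is the shape the bar plot needs.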

Creating a chart

feature = 'tempo'
ch = chartify.Chart(x_axis_type='categorical')
  • we've set the feature variable to 'tempo' (we'll be able to modify this to any other value found in the Feature column of the tidy DataFrame)
  • we've instantiated a chartify.Chart() object with the given argument for x_axis_type

There are various examples of Chartify code on GitHub.

ch.plot.bar(
    data_frame=tidy[tidy['Feature'] == feature],
    ...
  • we will use the .plot.bar() method of our chartify.Chart() object
  • our data_frame is a subset of the tidy DataFrame, containing only rows with the given feature
categorical_columns='Track',
numeric_column='Value', 
categorical_order_by='labels',
categorical_order_ascending=True)
  • our categorical_columns will be just Track, which is plotted on the x_axis (the axis we set to 'categorical' via x_axis_type)
  • the numeric_column will therefore be represented on the y_axis
  • the categorical_order... parameters will determine the order of the x_axis
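
Putting the pieces of the call together, a sketch might look like the following. The helper names select_feature and bar_chart_for are hypothetical (they are not part of the repl or of Chartify), the tidy data is invented, and the chart function assumes Chartify is installed.

```python
import pandas as pd


def select_feature(tidy: pd.DataFrame, feature: str) -> pd.DataFrame:
    """Return only the rows of the tidy DataFrame for one feature."""
    return tidy[tidy["Feature"] == feature]


def bar_chart_for(tidy: pd.DataFrame, feature: str):
    """Build a Chartify bar chart of Value by Track for one feature."""
    import chartify  # imported here so the pandas helper works without it

    ch = chartify.Chart(x_axis_type="categorical")
    ch.plot.bar(
        data_frame=select_feature(tidy, feature),
        categorical_columns="Track",
        numeric_column="Value",
        categorical_order_by="labels",
        categorical_order_ascending=True,
    )
    return ch


# Invented tidy data, for illustration only.
tidy = pd.DataFrame(
    {
        "Track": [1, 1, 2, 2],
        "Feature": ["popularity", "tempo", "popularity", "tempo"],
        "Value": [1.2, 0.9, 0.8, 1.1],
    }
)
print(select_feature(tidy, "tempo"))
```

Keeping the filtering in its own function makes it easy to check which rows reach the chart before any plotting happens.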

Refer to the examples and documentation for more detail on these methods and parameters.

[Figure: bar chart]

Our data has been plotted on a bar chart, with some helpful placeholders for various label and title attributes which we can change.

Modifying chart attributes

ch.set_title(f"Track comparison by {feature}")
ch.set_subtitle(None)
ch.set_source_label('Data source: Spotify')
ch.axes.set_xaxis_label('Track')
ch.axes.set_yaxis_label('Value')
chartify.Chart(blank_labels=False,
layout='slide_100%',
x_axis_type='categorical',
y_axis_type='linear')
  • we can set the various chart attributes as required (or remove them using None)
  • plots can be further customized; refer to the documentation

Saving charts

filename = f'charts/{feature}.png'
ch.save(filename, format='png')
  • the .save() method allows us to save the chart in various formats
  • svg, png and html can be used for the format parameter

In the Seeder app, the svg format has been used, which scales better than png (without loss of image quality). The html format may be useful if you want to incorporate interactivity into your charts.
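
When saving several charts, it can help to build the filename in one place. The helper below is a hypothetical sketch, not part of the repl: chart_path is an invented name, and it assumes the output folder may not exist yet, so it creates it first. It also restricts the format to the three values mentioned above.

```python
import tempfile
from pathlib import Path


def chart_path(feature: str, fmt: str = "svg", folder: str = "charts") -> str:
    """Return the output path for a chart, creating the folder if needed."""
    if fmt not in {"svg", "png", "html"}:
        raise ValueError(f"unsupported format: {fmt}")
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    return str(out_dir / f"{feature}.{fmt}")


# Demonstrated in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    filename = chart_path("tempo", fmt="svg", folder=str(Path(tmp) / "charts"))
    print(filename)
    # With a Chartify chart object `ch`, the next step would be:
    # ch.save(filename, format="svg")
```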