Animated choropleth map with discrete colors using Python and Plotly

Mahshad Nejati
7 min read · Sep 2, 2020


Since I was a little boy I have been obsessed with maps! I remember buying paper maps of different continents and pinning them on my bedroom wall. Back then, maps were static shapes that conveyed a limited amount of information. Fast forward to today, and we barely pass a day without benefiting from the advancements of digital maps.

This tutorial is also available on my personal website here.

Image taken from page 1130 of ‘History of England and the British Empire’. Photo by British Library on Unsplash

Plotly is a data visualization library that provides a wide variety of basic charts, statistical charts, scientific charts, financial charts, maps, 3D charts, animated graphs, etc. for different types of visualization applications. Plotly Express is an easy-to-use, high-level interface to Plotly that operates on a variety of data and makes it easy to create professional-looking graphs. We can use either Mapbox or Plotly’s built-in maps for visualization. We will use Plotly’s built-in maps in this tutorial.

Animated maps are useful when we want to show a situation or value change in a variety of geographic regions over the course of time.

Dataset

Since COVID-19 has been a hot topic during the past few months and most readers are familiar with how quickly the virus can spread, we use the COVID-19 dataset from the Government of Canada’s website for this tutorial. The ultimate objective is to create an animated choropleth map of Canadian provinces that shows the spread of COVID-19 week by week.

Geojson data

For choropleth maps we need a .geojson file which, in simple terms, describes the boundaries of each region as sets of coordinates. Other information related to the boundary, such as name and description, is usually included in the .geojson file. For the Canadian provinces, you can either download the boundary file from Statistics Canada and convert it to geojson, or use one of the readily available geojson files. For this project I am using the geojson file taken from Carto.
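To make the structure concrete, here is a minimal sketch of what such a file contains. The two features, their rectangular coordinates and the name values are made up for illustration; only the cartodb_id property name comes from the Carto file used later in this tutorial.

```python
import json

# A tiny hand-written GeoJSON document. The coordinates are hypothetical
# rectangles, not real provincial boundaries.
sample_geojson = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"cartodb_id": 1, "name": "Province A"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[-120, 49], [-110, 49], [-110, 60], [-120, 60], [-120, 49]]]
      }
    },
    {
      "type": "Feature",
      "properties": {"cartodb_id": 2, "name": "Province B"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[-110, 49], [-102, 49], [-102, 60], [-110, 60], [-110, 49]]]
      }
    }
  ]
}
"""

mp_sample = json.loads(sample_geojson)

# Each feature carries its boundary (geometry) plus descriptive properties
ids_to_names = {
    feature["properties"]["cartodb_id"]: feature["properties"]["name"]
    for feature in mp_sample["features"]
}
print(ids_to_names)
```

The id stored under properties is what later ties a row of the dataset to a shape on the map.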

Data cleaning

We need the date, number of cases, province id and province name. Since we want a weekly time frame, we create a timeframe column that shows the month name and the week number within that month, as shown below. These adjustments can be made either in Excel or directly in Python. The cleaned dataset file is available on my GitHub.

Dataset after a few adjustments to the date format and the weekly time frame
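If you prefer to build the timeframe column in Python, the following is one possible sketch. The sample rows, the date column name and the exact label format ("March - Week 1") are assumptions for illustration and may differ from the cleaned file on GitHub.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset has one row per province per day
sample = pd.DataFrame({
    "date": ["2020-03-01", "2020-03-08", "2020-03-31", "2020-04-15"],
    "cases": [0, 12, 850, 4200],
})
sample["date"] = pd.to_datetime(sample["date"])

# Week of the month: days 1-7 -> week 1, days 8-14 -> week 2, and so on
week_of_month = (sample["date"].dt.day - 1) // 7 + 1

# Assumed label format: month name followed by the week number of that month
sample["timeframe"] = (
    sample["date"].dt.month_name() + " - Week " + week_of_month.astype(str)
)
print(sample["timeframe"].tolist())
```

With this definition, a month can have up to five weeks, the fifth covering days 29 to 31.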

As you can see in the dataset, we have the total number of cases for each day for each province. Using ‘cases’ as the color value would give us a continuous color legend. Since we want discrete colors, we add a new column called ‘category’ and assign a category to each row based on its number of cases, as below:

import pandas as pd

df = pd.read_csv("ca_pr_day_n.csv")
df['category'] = ''

# Categorize the number of cases and assign a category to each row.
# Upper bounds are inclusive so that boundary values (1,001, 5,001, ...)
# always land in a category.
def set_cat(row):
    cases = row['cases']
    if cases == 0:
        return '0'
    if 0 < cases <= 1000:
        return '1 - 1,000'
    if 1000 < cases <= 5000:
        return '1,001 - 5,000'
    if 5000 < cases <= 10000:
        return '5,001 - 10,000'
    if 10000 < cases <= 30000:
        return '10,001 - 30,000'
    if 30000 < cases <= 50000:
        return '30,001 - 50,000'
    if cases > 50000:
        return '50,001 and higher'

df = df.assign(category=df.apply(set_cat, axis=1))
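As an alternative to a row-by-row function, pandas can do the same binning in one call with pd.cut. This is only a sketch on a small made-up data frame (demo and its values are hypothetical); the labels are the same ones used in this tutorial.

```python
import pandas as pd

# Hypothetical case counts, including values that sit exactly on a boundary
demo = pd.DataFrame({"cases": [0, 1, 1000, 1001, 5000, 5001, 30000, 75000]})

labels = ['0', '1 - 1,000', '1,001 - 5,000', '5,001 - 10,000',
          '10,001 - 30,000', '30,001 - 50,000', '50,001 and higher']

# Bins are right-inclusive: (-1, 0], (0, 1000], (1000, 5000], ...
bins = [-1, 0, 1000, 5000, 10000, 30000, 50000, float("inf")]

# astype(str) turns the categorical result into plain strings
demo["category"] = pd.cut(demo["cases"], bins=bins, labels=labels).astype(str)
print(demo)
```

Because every bin includes its upper edge, a value such as 1,001 or 5,001 always receives a label.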

Map creation

With the data cleaned and grouped in the data frame, we can now proceed to create the choropleth map:

import pandas as pd
import plotly.express as px
import json

df = pd.read_csv("ca_pr_day_n.csv")
df['category'] = ''

# Categorize the number of cases and assign a category to each row.
# Upper bounds are inclusive so that boundary values (1,001, 5,001, ...)
# always land in a category.
def set_cat(row):
    cases = row['cases']
    if cases == 0:
        return '0'
    if 0 < cases <= 1000:
        return '1 - 1,000'
    if 1000 < cases <= 5000:
        return '1,001 - 5,000'
    if 5000 < cases <= 10000:
        return '5,001 - 10,000'
    if 10000 < cases <= 30000:
        return '10,001 - 30,000'
    if 30000 < cases <= 50000:
        return '30,001 - 50,000'
    if cases > 50000:
        return '50,001 and higher'

df = df.assign(category=df.apply(set_cat, axis=1))

# Assign mp to the geojson data
with open("canada_provinces.geojson", "r") as geo:
    mp = json.load(geo)

# Create choropleth map
fig = px.choropleth(
    df,
    locations="cartodb_id",
    geojson=mp,
    featureidkey="properties.cartodb_id",
    color="category",
    color_discrete_map={
        '0': '#fffcfc',
        '1 - 1,000': '#ffdbdb',
        '1,001 - 5,000': '#ffbaba',
        '5,001 - 10,000': '#ff9e9e',
        '10,001 - 30,000': '#ff7373',
        '30,001 - 50,000': '#ff4d4d',
        '50,001 and higher': '#ff0d0d',
    },
    category_orders={
        'category': [
            '0',
            '1 - 1,000',
            '1,001 - 5,000',
            '5,001 - 10,000',
            '10,001 - 30,000',
            '30,001 - 50,000',
            '50,001 and higher',
        ]
    },
    animation_frame="timeframe",
    scope='north america',
    title='<b>COVID-19 cases in Canadian provinces</b>',
    labels={'cases': 'Number of Cases',
            'category': 'Category'},
    hover_name='province',
    hover_data={
        'cases': True,
        'cartodb_id': False,
    },
    # height=900,
    locationmode='geojson-id',
)

# Adjust map layout stylings
fig.update_layout(
    showlegend=True,
    legend_title_text='<b>Total Number of Cases</b>',
    font={"size": 16, "color": "#808080", "family": "calibri"},
    margin={"r": 0, "t": 40, "l": 0, "b": 0},
    legend=dict(orientation='v'),
    geo=dict(bgcolor='rgba(0,0,0,0)', lakecolor='#e0fffe'),
)

# Adjust map geo options
fig.update_geos(showcountries=False, showcoastlines=False,
                showland=False, fitbounds="locations",
                subunitcolor='white')
fig.show()

A few points worth mentioning here:

  • The more accurate you want your map (the color-filled layer) to be, the more boundary polygons you need in your geojson file, which results in a bigger geojson file.
  • When using geojson files, you need to point featureidkey at the property that holds the id of each province in the geojson file.
  • We set category_orders to sort the legend the way we want. Otherwise, the legend entries appear in the order in which they are stored in the data frame.
  • animation_frame is the feature that animates the map based on the frames it creates from the associated column. Without setting animation_frame we get a static map.
  • A larger number of frames can make the visual heavier to render.
  • We can use HTML tags to adjust styling. For example: newline <br>, bold <b></b>, italics <i></i>, hyperlinks <a href='...'></a>. The tags <em>, <sup>, <sub> and <span> are also supported.
  • If you are running a 32-bit version of Python and receive a “MemoryError”, you probably need to uninstall it and install 64-bit Python instead.
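On the featureidkey point above: a province whose id does not match any feature in the geojson file is simply not drawn, so a quick check before plotting can save some head-scratching. The helper below is a hypothetical sketch (unmatched_ids and geo_stub are names I made up), shown on a stub instead of the real file.

```python
def unmatched_ids(data_ids, geojson, property_name):
    """Return ids present in the data but absent from the geojson features."""
    geo_ids = {f["properties"][property_name] for f in geojson["features"]}
    return sorted(set(data_ids) - geo_ids)

# Hypothetical stand-ins for the loaded geojson and the data frame column
geo_stub = {"features": [{"properties": {"cartodb_id": 1}},
                         {"properties": {"cartodb_id": 2}}]}

print(unmatched_ids([1, 2, 3], geo_stub, "cartodb_id"))  # → [3]
```

With the real data this would be called as unmatched_ids(df['cartodb_id'], mp, 'cartodb_id'); an empty list means every row has a shape to color.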

Troubleshooting

Running the above code produces unusual behavior in the map, as shown below:

Map is not functioning as expected

The reason is that we are using discrete colors, in other words mapping one color to each category, and every frame of the animation needs to include all of the possible categories. We therefore add every category to every single frame with the following code:

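catg = df['category'].unique()
dts = df['timeframe'].unique()

# DataFrame.append was removed in pandas 2.0, so build the filler rows
# first and concatenate them in one step
filler = pd.DataFrame([
    {'timeframe': tf, 'cases': 'N', 'cartodb_id': '0', 'category': i}
    for tf in dts
    for i in catg
])
df = pd.concat([df, filler], ignore_index=True)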

The code above adds each distinct category to each time frame, so we can be sure that all of the time frames contain all of the categories.
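To confirm that the data frame is ready for animation, we can list the frames that still lack a category. This is a small sketch on made-up rows; frames_missing_categories is a hypothetical helper name, and the column names follow the ones used in this tutorial.

```python
import pandas as pd

def frames_missing_categories(frame_df, categories):
    """Map each timeframe to the categories it does not contain."""
    missing = {}
    for tf, group in frame_df.groupby("timeframe", sort=False):
        present = set(group["category"])
        absent = [c for c in categories if c not in present]
        if absent:
            missing[tf] = absent
    return missing

# Hypothetical rows: the second frame has no '1 - 1,000' row
demo = pd.DataFrame({
    "timeframe": ["March - Week 1", "March - Week 1", "March - Week 2"],
    "category": ["0", "1 - 1,000", "0"],
})

print(frames_missing_categories(demo, ["0", "1 - 1,000"]))
```

An empty dictionary means every frame contains every category.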

Final functional code

The full functional code is as below:

import pandas as pd
import plotly.express as px
import json

df = pd.read_csv("ca_pr_day_n.csv")
df['category'] = ''

# Categorize the number of cases and assign a category to each row.
# Upper bounds are inclusive so that boundary values (1,001, 5,001, ...)
# always land in a category.
def set_cat(row):
    cases = row['cases']
    if cases == 0:
        return '0'
    if 0 < cases <= 1000:
        return '1 - 1,000'
    if 1000 < cases <= 5000:
        return '1,001 - 5,000'
    if 5000 < cases <= 10000:
        return '5,001 - 10,000'
    if 10000 < cases <= 30000:
        return '10,001 - 30,000'
    if 30000 < cases <= 50000:
        return '30,001 - 50,000'
    if cases > 50000:
        return '50,001 and higher'

df = df.assign(category=df.apply(set_cat, axis=1))

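# Add all available categories to each time frame
catg = df['category'].unique()
dts = df['timeframe'].unique()

# DataFrame.append was removed in pandas 2.0, so build the filler rows
# first and concatenate them in one step
filler = pd.DataFrame([
    {'timeframe': tf, 'cases': 'N', 'cartodb_id': '0', 'category': i}
    for tf in dts
    for i in catg
])
df = pd.concat([df, filler], ignore_index=True)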

# Assign mp to the geojson data
with open("canada_provinces.geojson", "r") as geo:
    mp = json.load(geo)

# Create choropleth map
fig = px.choropleth(
    df,
    locations="cartodb_id",
    geojson=mp,
    featureidkey="properties.cartodb_id",
    color="category",
    color_discrete_map={
        '0': '#fffcfc',
        '1 - 1,000': '#ffdbdb',
        '1,001 - 5,000': '#ffbaba',
        '5,001 - 10,000': '#ff9e9e',
        '10,001 - 30,000': '#ff7373',
        '30,001 - 50,000': '#ff4d4d',
        '50,001 and higher': '#ff0d0d',
    },
    category_orders={
        'category': [
            '0',
            '1 - 1,000',
            '1,001 - 5,000',
            '5,001 - 10,000',
            '10,001 - 30,000',
            '30,001 - 50,000',
            '50,001 and higher',
        ]
    },
    animation_frame="timeframe",
    scope='north america',
    title='<b>COVID-19 cases in Canadian provinces</b>',
    labels={'cases': 'Number of Cases',
            'category': 'Category'},
    hover_name='province',
    hover_data={
        'cases': True,
        'cartodb_id': False,
    },
    # height=900,
    locationmode='geojson-id',
)

# Adjust map layout styling
fig.update_layout(
    showlegend=True,
    legend_title_text='<b>Total Number of Cases</b>',
    font={"size": 16, "color": "#808080", "family": "calibri"},
    margin={"r": 0, "t": 40, "l": 0, "b": 0},
    legend=dict(orientation='v'),
    geo=dict(bgcolor='rgba(0,0,0,0)', lakecolor='#e0fffe'),
)

# Adjust map geo options
fig.update_geos(showcountries=False, showcoastlines=False,
                showland=False, fitbounds="locations",
                subunitcolor='white')
fig.show()

The outcome will look and work like this:

Final animated choropleth map with discrete colors

You can check the outcome here. All the files associated with this tutorial are available on my GitHub.

There are certainly other ways to make such maps, or to do the same with more optimized code. Please feel free to leave a comment and share your opinions or code.

Conclusion

Plotly Express, as a high-level interface to Plotly, makes it convenient to visualize data in Python. Plotly’s Graph Objects, on the other hand, give us more freedom to create complex visuals and apply more customizations, but they require more lines of code and greater effort.


Mahshad Nejati

Business Intelligence | Data Analytics | F1 Enthusiast