47 of 100: Stacked bar chart in matplotlib

At the beginning of the year I challenged myself to recreate all 100 visualizations from the "1 dataset, 100 visualizations" project using Python and matplotlib, and I am sharing the code for all of them with you.

Note: Data Viz Project is copyright Ferdio and available under a Creative Commons Attribution – Non Commercial – No Derivatives 4.0 International license. I asked Ferdio, and they told me they used a design tool to create all the plots.

Collaborate

There are a ton of improvements that can be made to the code, so let me know in the comments about any improvements you make and I will update the post accordingly!

This is the original viz that we are trying to recreate in matplotlib:

Import the packages

We will need the following packages:

import matplotlib.pyplot as plt
import pandas as pd

Generate the data

We could actually go from numpy to matplotlib, but most data projects use pandas to transform the data, so I am using a pandas dataframe as the starting point.

color_dict = {(2022,"Norway"): "#9194A3", (2004,"Norway"): "#2B314D",
              (2022,"Denmark"): "#E2AFA5", (2004,"Denmark"): "#A54836",
              (2022,"Sweden"): "#C4D6F8", (2004,"Sweden"): "#5375D4",
              }

xy_ticklabel_color = "#C8C9C9"
xlabel_color = "#101628"
grand_totals_color = "#101628"
grid_color = "#C8C9C9"
datalabels_color = "#2B314D"

data = {
    "year": [2004, 2022, 2004, 2022, 2004, 2022],
    "countries" : [ "Denmark", "Denmark", "Norway", "Norway","Sweden", "Sweden",],
    "sites": [4,10,5,8,13,15]
}
df= pd.DataFrame(data)
index  year  countries  sites
0      2004  Denmark    4
1      2022  Denmark    10
2      2004  Norway     5
3      2022  Norway     8
4      2004  Sweden     13
5      2022  Sweden     15

We need to create the year labels and the subtotals for each country, then sort the data and attach a color to each row.

# Two-digit year label, e.g. 2004 -> '04
df['year_lbl'] = "'" + df['year'].astype(str).str[-2:]
# Total sites per country across both years
df['sub_total'] = df.groupby('countries')['sites'].transform('sum')

# Custom sort: 2022 rows first, then 2004; within each year Denmark, Norway, Sweden
sort_order_dict = {"Denmark": 1, "Sweden": 3, "Norway": 2, 2004: 5, 2022: 4}
df = df.sort_values(by=['year', 'countries'], key=lambda x: x.map(sort_order_dict))
# Add the color based on the color dictionary
df['color'] = df.set_index(['year', 'countries']).index.map(color_dict.get)
index  year  countries  sites  year_lbl  sub_total  color
1      2022  Denmark    10     '22       14         #E2AFA5
3      2022  Norway     8      '22       13         #9194A3
5      2022  Sweden     15     '22       28         #C4D6F8
0      2004  Denmark    4      '04       14         #A54836
2      2004  Norway     5      '04       13         #2B314D
4      2004  Sweden     13     '04       28         #5375D4
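
The custom sort works because pandas applies the `key` function to each column listed in `by` independently, so a single dictionary can hold the ranks for both the years and the countries. Here is a minimal sketch of that idea on a smaller, made-up frame; the names `rank` and `by_rank` are mine and not part of the original code.

```python
import pandas as pd

# Small illustrative frame, deliberately out of order
df = pd.DataFrame({
    "year": [2004, 2022, 2004, 2022],
    "countries": ["Sweden", "Sweden", "Denmark", "Denmark"],
})

# One lookup table for both columns: lower rank sorts first
rank = {"Denmark": 1, "Sweden": 3, 2004: 5, 2022: 4}

def by_rank(column):
    # Receives one column (a Series) at a time and returns its ranks
    return column.map(rank)

out = df.sort_values(by=["year", "countries"], key=by_rank)
print(out)
```

Any value that is missing from the dictionary maps to NaN, which `sort_values` places last by default, so it is worth checking that every year and country has a rank.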

Define the variables

countries = df.countries.unique()
years = df.year.unique()
x = len(df.countries.unique())
codes = df.year_lbl
colors = df.color

Plot the chart

fig, ax = plt.subplots(figsize=(5,5),facecolor = "#FFFFFF")
fig.tight_layout(pad=3.0)

# Both years start at zero: 2022 is drawn first, then 2004 in front of it
for year in years:
    y = df[df["year"] == year]["sites"].values
    ax.bar(countries, y, width=0.5)

# ax.patches follows the row order of df, so colors and labels line up
for bar, color, code in zip(ax.patches, colors, codes):
    bar.set_facecolor(color)
    ax.text(
        bar.get_x() + bar.get_width() / 2,
        bar.get_y() + bar.get_height() - 0.2,  # just below the top of the bar
        code,
        ha="center", va="top",
        color="w", weight="light",
    )

ax.tick_params(axis='x', which='major', length=0, labelsize=14,colors= xy_ticklabel_color,pad =15)
ax.tick_params(axis='y', which='major',  labelsize=14,colors= xy_ticklabel_color,pad =15)
ax.set_ylim(0,16)
ax.set_axisbelow(True) #set the grid lines in the BACK
ax.spines[['top','left','right','bottom']].set_visible(False)
ax.axhline(y=0, xmin =0, xmax=1, color = xy_ticklabel_color)

The result:
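
A note on the stacking: the plotting loop in this post draws both years from a zero baseline, so the 2004 bars sit in front of the 2022 bars rather than on top of them. If you would rather have the segments add up to the country totals, one option is the `bottom` argument of `ax.bar`. This is only a sketch of that variation, not the chart recreated above; `sites_by_year` and `bottom` are names I made up for the example.

```python
import matplotlib.pyplot as plt
import numpy as np

countries = ["Denmark", "Norway", "Sweden"]
# Same numbers as the dataframe in this post, keyed by year
sites_by_year = {2004: [4, 5, 13], 2022: [10, 8, 15]}

fig, ax = plt.subplots(figsize=(5, 5))
bottom = np.zeros(len(countries))  # running height of each stack
for year, sites in sites_by_year.items():
    ax.bar(countries, sites, width=0.5, bottom=bottom, label=str(year))
    bottom = bottom + np.array(sites)  # next segment starts where this one ends

ax.set_ylim(0, 30)
ax.legend()
```

With cumulative stacking, the top of each stack equals the `sub_total` column computed earlier (14, 13 and 28), so the y-axis limit has to grow accordingly.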
