How to plot stacked event duration (Gantt Charts) using Python Pandas

I have a Pandas DataFrame containing the date that a stream gage started measuring flow and the date that the station was decommissioned. I want to generate a plot showing these dates graphically. Here is a sample of my DataFrame:

import pandas as pd

data = {'index': [40623, 40637, 40666, 40697, 40728, 40735, 40742, 40773, 40796, 40819, 40823, 40845, 40867, 40887, 40945, 40964, 40990, 41040, 41091, 41100], 'StationId': ['UTAHDWQ-5932100', 'UTAHDWQ-5932230', 'UTAHDWQ-5932240', 'UTAHDWQ-5932250', 'UTAHDWQ-5932253', 'UTAHDWQ-5932254', 'UTAHDWQ-5932280', 'UTAHDWQ-5932290', 'UTAHDWQ-5932750', 'UTAHDWQ-5983753', 'UTAHDWQ-5983754', 'UTAHDWQ-5983755', 'UTAHDWQ-5983756', 'UTAHDWQ-5983757', 'UTAHDWQ-5983759', 'UTAHDWQ-5983760', 'UTAHDWQ-5983775', 'UTAHDWQ-5989066', 'UTAHDWQ-5996780', 'UTAHDWQ-5996800'], 'amin': ['1994-07-19 13:15:00', '2006-03-16 13:55:00', '1980-10-31 16:00:00', '1981-06-11 17:45:00', '2006-06-28 13:15:00', '2006-06-28 13:55:00', '1981-06-11 15:30:00', '1992-06-10 15:45:00', '2005-10-03 16:30:00', '2006-04-25 09:56:00', '2006-04-25 11:05:00', '2006-04-25 13:50:00', '2006-04-25 14:20:00', '2006-04-25 12:45:00', '2008-04-08 13:03:00', '2008-04-08 13:15:00', '2008-04-15 12:47:00', '2005-10-04 10:15:00', '1995-03-09 13:59:00', '1995-03-09 15:13:00'], 'amax': ['1998-06-30 14:51:00', '2007-01-24 12:55:00', '2007-07-31 11:35:00', '1990-08-01 08:30:00', '2007-01-24 13:35:00', '2007-01-24 14:05:00', '2006-08-22 16:00:00', '1998-06-30 11:33:00', '2005-10-22 15:00:00', '2006-04-25 10:00:00', '2008-04-08 12:16:00', '2008-04-08 09:10:00', '2008-04-08 09:30:00', '2008-04-08 11:27:00', '2008-04-08 13:05:00', '2008-04-08 13:23:00', '2009-04-07 13:15:00', '2005-10-05 11:40:00', '1996-03-14 10:40:00', '1996-03-14 11:05:00']}
df = pd.DataFrame(data)
df.set_index('index', inplace=True)

# display(df.head())

             StationId                 amin                 amax
index                                                           
40623  UTAHDWQ-5932100  1994-07-19 13:15:00  1998-06-30 14:51:00
40637  UTAHDWQ-5932230  2006-03-16 13:55:00  2007-01-24 12:55:00
40666  UTAHDWQ-5932240  1980-10-31 16:00:00  2007-07-31 11:35:00
40697  UTAHDWQ-5932250  1981-06-11 17:45:00  1990-08-01 08:30:00
40728  UTAHDWQ-5932253  2006-06-28 13:15:00  2007-01-24 13:35:00

I want to create a plot similar to this (please note that I did not make this plot using the above data):
How to plot stacked event duration (Gantt Charts) using Python Pandas

The plot does not have to have the text shown along each line, just the y-axis with station names.

While this may seem like a niche application of pandas, I know several scientists that would benefit from this plotting ability.

The closest answer I could find is here:

The last answer is closest to suiting my needs.

While I would prefer a way to do it through the Pandas wrapper, I would be open and grateful to a straight matplotlib solution.

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

  • I think you are trying to create a gantt plot.
  • This suggests using hlines
  • Tested in matplotlib 3.4.2
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as dt

# using df from the OP

# convert columns to a datetime dtype
df.amin = pd.to_datetime(df.amin)
df.amax = pd.to_datetime(df.amax)

fig, ax = plt.subplots(figsize=(8, 5))
ax = ax.xaxis_date()
ax = plt.hlines(df.index, dt.date2num(df.amin), dt.date2num(df.amax))

How to plot stacked event duration (Gantt Charts) using Python Pandas

  • The following code also works
# using df from the OP

df.amin = pd.to_datetime(df.amin)
df.amax = pd.to_datetime(df.amax)

fig, ax = plt.subplots(figsize=(8, 5))
ax = plt.hlines(df.index, df.amin, df.amax)

Method 2

You can use Bokeh (a python library) to make gantt chart and its really beautiful.
Here is a code I copied from a twiiter.
http://nbviewer.jupyter.org/gist/quebbs/10416d9fb954020688f2

from bokeh.plotting import figure, show, output_notebook, output_file
from bokeh.models import ColumnDataSource, Range1d
from bokeh.models.tools import HoverTool
from datetime import datetime
from bokeh.charts import Bar
output_notebook()
#output_file('GanntChart.html') #use this to create a standalone html file to send to others
import pandas as ps

DF=ps.DataFrame(columns=['Item','Start','End','Color'])
Items=[
    ['Contract Review & Award','2015-7-22','2015-8-7','red'],
    ['Submit SOW','2015-8-10','2015-8-14','gray'],
    ['Initial Field Study','2015-8-17','2015-8-21','gray'],
    ['Topographic Procesing','2015-9-1','2016-6-1','gray'],
    ['Init. Hydrodynamic Modeling','2016-1-2','2016-3-15','gray'],
    ['Prepare Suitability Curves','2016-2-1','2016-3-1','gray'],
    ['Improvement Conceptual Designs','2016-5-1','2016-6-1','gray'],
    ['Retrieve Water Level Data','2016-8-15','2016-9-15','gray'],
    ['Finalize Hydrodynamic Models','2016-9-15','2016-10-15','gray'],
    ['Determine Passability','2016-9-15','2016-10-1','gray'],
    ['Finalize Improvement Concepts','2016-10-1','2016-10-31','gray'],
    ['Stakeholder Meeting','2016-10-20','2016-10-21','blue'],
    ['Completion of Project','2016-11-1','2016-11-30','red']
    ] #first items on bottom

for i,Dat in enumerate(Items[::-1]):
    DF.loc[i]=Dat

#convert strings to datetime fields:
DF['Start_dt']=ps.to_datetime(DF.Start)
DF['End_dt']=ps.to_datetime(DF.End)


G=figure(title='Project Schedule',x_axis_type='datetime',width=800,height=400,y_range=DF.Item.tolist(),
        x_range=Range1d(DF.Start_dt.min(),DF.End_dt.max()), tools='save')

hover=HoverTool(tooltips="Task: @Item<br>
Start: @Start<br>
End: @End")
G.add_tools(hover)

DF['ID']=DF.index+0.8
DF['ID1']=DF.index+1.2
CDS=ColumnDataSource(DF)
G.quad(left='Start_dt', right='End_dt', bottom='ID', top='ID1',source=CDS,color="Color")
#G.rect(,"Item",source=CDS)
show(G)

Method 3

It’s possible to do this with horizontal bars too: broken_barh(xranges, yrange, **kwargs)

Method 4

While I do not know of any way to do this in MatplotLib, you may want to take a look at options with visualizing the data in the way you want by using D3, for example, with this library:

https://github.com/jiahuang/d3-timeline

If you must do it with Matplotlib, here is one way in which it has been done:

Matplotlib timelines


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x