Visualization
We use the standard convention for referencing the matplotlib API:
In [1]: import matplotlib.pyplot as plt

In [2]: plt.close('all')
The examples on this page also assume that NumPy and pandas have been imported as np and pd.
We provide the basics in pandas to easily create decent looking plots. See the ecosystem section for visualization libraries that go beyond the basics documented here.
Note
All calls to np.random are seeded with 123456.
Basic Plotting: plot
We will demonstrate the basics; see the cookbook for some advanced strategies.
The plot method on Series and DataFrame is just a simple wrapper around plt.plot():
In [3]: ts = pd.Series(np.random.randn(1000),
   ...:                index=pd.date_range('1/1/2000', periods=1000))
   ...:

In [4]: ts = ts.cumsum()

In [5]: ts.plot()
Out[5]: <matplotlib.axes._subplots.AxesSubplot at 0x7f2b22f16550>
If the index consists of dates, it calls gcf().autofmt_xdate() to try to format the x-axis nicely as per above.
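As a rough, self-contained illustration of the wrapper behaviour described above, the sketch below plots a random walk and checks that the line drawn on the returned Axes carries the Series values. The non-interactive Agg backend and the variable names are assumptions made so the script runs without a display; they are not part of the original session.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(123456)

# A random walk indexed by consecutive calendar days.
ts = pd.Series(np.random.randn(1000),
               index=pd.date_range("2000-01-01", periods=1000)).cumsum()

ax = ts.plot()              # returns the matplotlib Axes that was drawn on
line = ax.get_lines()[0]    # the single line created for the Series

# The y-data of the plotted line should be the Series values.
series_line_matches = np.allclose(line.get_ydata(), ts.values)
print(series_line_matches)

plt.close("all")
```

Because plot() returns a regular matplotlib Axes, any further customisation can be done with the usual matplotlib calls on that object.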
On DataFrame, plot() is a convenience to plot all of the columns with labels:
In [6]: df = pd.DataFrame(np.random.randn(1000, 4),
   ...:                   index=ts.index, columns=list('ABCD'))
   ...:

In [7]: df = df.cumsum()

In [8]: plt.figure();

In [9]: df.plot();
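To make the "all of the columns with labels" behaviour concrete, here is a minimal sketch that plots a four-column DataFrame and inspects the resulting Axes. It assumes a headless Agg backend and rebuilds the data itself rather than reusing ts from the session above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(123456)

df = pd.DataFrame(np.random.randn(1000, 4),
                  index=pd.date_range("2000-01-01", periods=1000),
                  columns=list("ABCD")).cumsum()

ax = df.plot()

# One line per column, and a legend entry named after each column.
n_lines = len(ax.get_lines())
labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(n_lines, labels)

plt.close("all")
```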
You can plot one column versus another using the x and y keywords in plot():
In [10]: df3 = pd.DataFrame(np.random.randn(1000, 2), columns=['B', 'C']).cumsum()

In [11]: df3['A'] = pd.Series(list(range(len(df))))

In [12]: df3.plot(x='A', y='B')
Out[12]: <matplotlib.axes._subplots.AxesSubplot at 0x7f2b29a8af98>
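The following sketch, under the assumption of a headless Agg backend, shows what the x and y keywords select: the line on the returned Axes takes its x-values from column A and its y-values from column B, while column C is not drawn.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(123456)

df3 = pd.DataFrame(np.random.randn(1000, 2), columns=["B", "C"]).cumsum()
df3["A"] = np.arange(len(df3))   # a simple increasing column to use as x

ax = df3.plot(x="A", y="B")
line = ax.get_lines()[0]

# Column A supplies the x-values and column B the y-values.
x_matches = np.allclose(np.asarray(line.get_xdata(), dtype=float), df3["A"].values)
y_matches = np.allclose(line.get_ydata(), df3["B"].values)
print(x_matches, y_matches, len(ax.get_lines()))

plt.close("all")
```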
Note
For more formatting and styling options, see formatting below.
Other Plots
Plotting methods allow for a handful of plot styles other than the default line plot. These methods can be provided as the kind keyword argument to plot(), and include:
- ‘bar’ or ‘barh’ for bar plots
- ‘hist’ for histogram
- ‘box’ for boxplot
- ‘kde’ or ‘density’ for density plots
- ‘area’ for area plots
- ‘scatter’ for scatter plots
- ‘hexbin’ for hexagonal bin plots
- ‘pie’ for pie plots
For example, a bar plot can be created the following way:
In [13]: plt.figure();

In [14]: df.iloc[5].plot(kind='bar');
You can also create these other plots using the methods DataFrame.plot.<kind> instead of providing the kind keyword argument. This makes it easier to discover plot methods and the specific arguments they use:
In [15]: df = pd.DataFrame()

In [16]: df.plot.<TAB>  # noqa: E225, E999
df.plot.area     df.plot.barh     df.plot.density  df.plot.hist     df.plot.line     df.plot.scatter
df.plot.bar      df.plot.box      df.plot.hexbin   df.plot.kde      df.plot.pie
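As a small check that the two spellings are interchangeable, the sketch below draws the same Series once with kind='bar' and once with the plot.bar() accessor, then compares the bar heights. The example data and the Agg backend are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical example data: one labelled row of values.
row = pd.Series([1.0, 2.0, 3.0], index=["A", "B", "C"])

fig, (ax1, ax2) = plt.subplots(1, 2)
row.plot(kind="bar", ax=ax1)   # keyword form
row.plot.bar(ax=ax2)           # accessor form

# Each bar is a Rectangle patch whose height is the plotted value.
heights_kind = [p.get_height() for p in ax1.patches]
heights_accessor = [p.get_height() for p in ax2.patches]
print(heights_kind, heights_accessor)

plt.close("all")
```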
In addition to these kinds, there are the DataFrame.hist() and DataFrame.boxplot() methods, which use a separate interface.
Finally, there are several plotting functions in pandas.plotting that take a Series or DataFrame as an argument. These include:
- Scatter Matrix
- Andrews Curves
- Parallel Coordinates
- Lag Plot
- Autocorrelation Plot
- Bootstrap Plot
- RadViz
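Two of the functions listed above can be sketched as follows. This is only an illustration under assumptions: the data is made up, the backend is the headless Agg backend, and only scatter_matrix and lag_plot are shown.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import lag_plot, scatter_matrix

np.random.seed(123456)

df = pd.DataFrame(np.random.randn(200, 3), columns=["a", "b", "c"])

# scatter_matrix takes a DataFrame and returns a grid of Axes,
# one row and one column per DataFrame column.
axes = scatter_matrix(df, alpha=0.5)
grid_shape = axes.shape

# lag_plot takes a Series and scatters y(t) against y(t + 1),
# so it draws one point fewer than there are observations.
plt.figure()
lag_ax = lag_plot(df["a"])
n_points = len(lag_ax.collections[0].get_offsets())
print(grid_shape, n_points)

plt.close("all")
```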
Plots may also be adorned with error bars or tables.
Bar plots
For labeled, non-time series data, you may wish to produce a bar plot:
In [17]: plt.figure();

In [18]: df.iloc[5].plot.bar()
Out[18]: <matplotlib.axes._subplots.AxesSubplot at 0x7f2b2830d048>

In [19]: plt.axhline(0, color='k');
Calling a DataFrame’s plot.bar() method produces a multiple bar plot:
In [20]: df2 = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])

In [21]: df2.plot.bar();
To produce a stacked bar plot, pass stacked=True:
In [22]: df2.plot.bar(stacked=True);
To get horizontal bar plots, use the barh method:
In [23]: df2.plot.barh(stacked=True);
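To show what stacked=True does to the geometry, the sketch below uses a tiny two-column frame (made-up values, headless Agg backend assumed) and reads back where each bar starts. In the stacked plots, the bars for column b begin where the bars for column a end.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical small frame so the stacking is easy to follow.
small = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

# Vertical stacked bars: patches are added column by column (a, then b).
ax_v = small.plot.bar(stacked=True)
bottoms = [p.get_y() for p in ax_v.patches]
heights = [p.get_height() for p in ax_v.patches]

# Horizontal stacked bars: the same idea, measured along the x-axis.
ax_h = small.plot.barh(stacked=True)
lefts = [p.get_x() for p in ax_h.patches]
widths = [p.get_width() for p in ax_h.patches]
print(bottoms, heights, lefts, widths)

plt.close("all")
```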
Histograms
Histograms can be drawn by using the DataFrame.plot.hist() and Series.plot.hist() methods.
In [24]: df4 = pd.DataFrame({'a': np.random.randn(1000) + 1, 'b': np.random.randn(1000),
   ....:                     'c': np.random.randn(1000) - 1}, columns=['a', 'b', 'c'])
   ....:

In [25]: plt.figure();

In [26]: df4.plot.hist(alpha=0.5)
Out[26]: <matplotlib.axes._subplots.AxesSubplot at 0x7f2b23e905f8>
A histogram can be stacked using stacked=True. The number of bins can be changed using the bins keyword.
In [27]: plt.figure();

In [28]: df4.plot.hist(stacked=True, bins=20)
Out[28]: <matplotlib.axes._subplots.AxesSubplot at 0x7f2b23ef2208>
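The effect of bins can be checked by counting the rectangles on the returned Axes. The sketch below assumes a headless Agg backend and regenerates the data, so its random values differ from the session above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(123456)

df4 = pd.DataFrame({"a": np.random.randn(1000) + 1,
                    "b": np.random.randn(1000),
                    "c": np.random.randn(1000) - 1},
                   columns=["a", "b", "c"])

ax = df4.plot.hist(stacked=True, bins=20)

# Each column contributes one rectangle per bin.
n_patches = len(ax.patches)

# The columns share bin edges spanning all of the data, so the
# bar heights add up to the total number of observations.
total_count = int(round(sum(p.get_height() for p in ax.patches)))
print(n_patches, total_count)

plt.close("all")
```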
You can pass other keywords supported by matplotlib hist. For example, horizontal and cumulative histograms can be drawn with orientation='horizontal' and cumulative=True.
In [29]: plt.figure();

In [30]: df4['a'].plot.hist(orientation='horizontal', cumulative=True)
Out[30]: <matplotlib.axes._subplots.AxesSubplot at 0x7f2b280a0978>
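A brief sketch of how these two matplotlib keywords change the result, assuming a headless Agg backend and freshly generated data: with orientation='horizontal' the counts are carried by the bar widths, and with cumulative=True those widths never decrease and end at the number of observations.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(123456)

s = pd.Series(np.random.randn(1000) + 1)

ax = s.plot.hist(orientation="horizontal", cumulative=True)

# For a horizontal histogram, each bar's width is its (cumulative) count.
widths = [p.get_width() for p in ax.patches]
is_non_decreasing = all(a <= b for a, b in zip(widths, widths[1:]))
last_width = int(round(widths[-1]))
print(is_non_decreasing, last_width)

plt.close("all")
```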
See the hist method and the matplotlib hist documentation for more.
The existing DataFrame.hist interface for plotting histograms can still be used.
In [31]: plt.figure();

In [32]: df['A'].diff().hist()
Out[32]: <matplotlib.axes._subplots.AxesSubplot at 0x7f2b222aab00>
DataFrame.hist() plots the histograms of the columns on multiple subplots:
In [33]: plt.figure()
Out[33]: <Figure size 640x480 with 0 Axes>

In [34]: df.diff().hist(color='k', alpha=0.5, bins=50)
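The multiple-subplot behaviour can be inspected through the return value. In the sketch below (headless Agg backend assumed, data regenerated for self-containment), DataFrame.hist() returns a grid of Axes with one titled subplot per column.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(123456)

df = pd.DataFrame(np.random.randn(1000, 4),
                  index=pd.date_range("2000-01-01", periods=1000),
                  columns=list("ABCD")).cumsum()

# diff() leaves a NaN in the first row; hist() ignores missing values.
axes = df.diff().hist(color="k", alpha=0.5, bins=50)

grid_shape = axes.shape                              # grid of subplots
titles = [ax.get_title() for ax in axes.ravel()]     # one title per column
print(grid_shape, titles)

plt.close("all")
```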
© 2008–2012, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
Licensed under the 3-clause BSD License.
https://pandas.pydata.org/pandas-docs/version/0.24.2/user_guide/visualization.html