Time Series / Date functionality
pandas has proven very successful as a tool for working with time series data, especially in the financial data analysis space. Using the NumPy datetime64 and timedelta64 dtypes, we have consolidated a large number of features from other Python libraries like scikits.timeseries as well as created a tremendous amount of new functionality for manipulating time series data.
In working with time series data, we will frequently seek to:
- generate sequences of fixed-frequency dates and time spans
- conform or convert time series to a particular frequency
- compute “relative” dates based on various non-standard time increments (e.g. 5 business days before the last business day of the year), or “roll” dates forward or backward
pandas provides a relatively compact and self-contained set of tools for performing the above tasks.
Create a range of dates:
# 72 hours starting with midnight Jan 1st, 2011
In [1]: rng = pd.date_range('1/1/2011', periods=72, freq='H')

In [2]: rng[:5]
Out[2]:
DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 01:00:00',
               '2011-01-01 02:00:00', '2011-01-01 03:00:00',
               '2011-01-01 04:00:00'],
              dtype='datetime64[ns]', freq='H')
Index pandas objects with dates:
In [3]: ts = pd.Series(np.random.randn(len(rng)), index=rng)

In [4]: ts.head()
Out[4]:
2011-01-01 00:00:00    0.469112
2011-01-01 01:00:00   -0.282863
2011-01-01 02:00:00   -1.509059
2011-01-01 03:00:00   -1.135632
2011-01-01 04:00:00    1.212112
Freq: H, dtype: float64
Change frequency and fill gaps:
# to 45 minute frequency and forward fill
In [5]: converted = ts.asfreq('45Min', method='pad')

In [6]: converted.head()
Out[6]:
2011-01-01 00:00:00    0.469112
2011-01-01 00:45:00    0.469112
2011-01-01 01:30:00   -0.282863
2011-01-01 02:15:00   -1.509059
2011-01-01 03:00:00   -1.135632
Freq: 45T, dtype: float64
Resample the series to a daily frequency:
# Daily means
In [7]: ts.resample('D').mean()
Out[7]:
2011-01-01   -0.319569
2011-01-02   -0.337703
2011-01-03    0.117258
Freq: D, dtype: float64
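The third task listed in the introduction, computing "relative" dates and rolling dates forward or backward, can be sketched with offset objects. This is a minimal illustration rather than the documentation's own example: it assumes the `pd.offsets.BYearEnd` and `pd.offsets.BDay` offsets, and the anchor dates and variable names are chosen here for demonstration only. Note that `BDay` treats every weekday as a business day and does not account for holidays.

```python
import pandas as pd

# Any date inside 2011 serves as the starting point (illustrative choice).
anchor = pd.Timestamp("2011-06-15")

# Move forward to the last business day of the year.
# 2011-12-31 is a Saturday, so this lands on Friday 2011-12-30.
last_bday_of_year = anchor + pd.offsets.BYearEnd()

# Step back 5 business days (weekends are skipped, holidays are not).
five_bdays_before = last_bday_of_year - pd.offsets.BDay(5)

# "Roll" a non-business date to the nearest business day in either direction.
saturday = pd.Timestamp("2011-01-01")
rolled_forward = pd.offsets.BDay().rollforward(saturday)  # following Monday
rolled_back = pd.offsets.BDay().rollback(saturday)        # preceding Friday

print(last_bday_of_year, five_bdays_before, rolled_forward, rolled_back)
```

Dates that already fall on a business day are returned unchanged by `rollforward` and `rollback`, which is what distinguishes rolling from adding an offset.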
Overview
The following table lists the time-related classes pandas provides and how to create them.
Class | Remarks | How to create |
---|---|---|
Timestamp | Represents a single timestamp | to_datetime, Timestamp |
DatetimeIndex | Index of Timestamp | to_datetime, date_range, bdate_range, DatetimeIndex |
Period | Represents a single time span | Period |
PeriodIndex | Index of Period | period_range, PeriodIndex |
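As a rough illustration of the "How to create" column, the sketch below builds one object of each class using the constructors and functions named in the table. The dates, frequencies, and variable names are arbitrary choices made for this example.

```python
import pandas as pd

# Timestamp: a single point in time, from the constructor or to_datetime.
stamp = pd.Timestamp("2012-05-01")
parsed = pd.to_datetime("2012-05-01")

# DatetimeIndex: an index of Timestamps, from date_range or to_datetime on a list.
dates = pd.date_range("2012-05-01", periods=3, freq="D")
parsed_many = pd.to_datetime(["2012-05-01", "2012-05-02"])

# Period: a single time span (here, the whole month of May 2012).
month = pd.Period("2012-05", freq="M")

# PeriodIndex: an index of Periods, from period_range.
months = pd.period_range("2012-01", periods=3, freq="M")

for obj in (stamp, parsed, dates, parsed_many, month, months):
    print(type(obj).__name__)
```

Passing a scalar to `to_datetime` yields a `Timestamp`, while passing a list yields a `DatetimeIndex`, which is why the function appears in both rows of the table.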
Timestamps vs. Time Spans
Timestamped data is the most basic type of time series data: it associates values with points in time. For pandas objects, this means using points in time as the labels for the data.
In [8]: pd.Timestamp(datetime(2012, 5, 1))
Out[8]: Timestamp('2012-05-01 00:00:00')

In [9]: pd.Timestamp('2012-05-01')
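To make the contrast in this section's title concrete, the following sketch compares a point in time with a time span. It is only an illustration under assumed values: the specific date, the monthly frequency, and the variable names are not taken from the text above.

```python
import pandas as pd

# A Timestamp marks one instant.
point = pd.Timestamp("2012-05-15 09:30")

# A Period covers an interval; with freq="M" it spans all of May 2012.
span = pd.Period("2012-05", freq="M")

# The span has a start and an end, and the instant falls between them.
contains_point = span.start_time <= point <= span.end_time

# Converting the instant to a monthly period gives the span that holds it.
same_month = point.to_period("M") == span

print(span.start_time, span.end_time, contains_point, same_month)
```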
© 2008–2012, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
Licensed under the 3-clause BSD License.
https://pandas.pydata.org/pandas-docs/version/0.23.4/timeseries.html