Time Series / Date functionality
pandas has proven very successful as a tool for working with time series data, especially in the financial data analysis space. Using the NumPy datetime64 and timedelta64 dtypes, we have consolidated a large number of features from other Python libraries like scikits.timeseries as well as created a tremendous amount of new functionality for manipulating time series data.
In working with time series data, we will frequently seek to:
- generate sequences of fixed-frequency dates and time spans
- conform or convert time series to a particular frequency
- compute “relative” dates based on various non-standard time increments (e.g. 5 business days before the last business day of the year), or “roll” dates forward or backward
pandas provides a relatively compact and self-contained set of tools for performing the above tasks.
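As a minimal sketch of the third task, the snippet below computes the example given in the list (5 business days before the last business day of the year) and rolls a weekend date forward and backward. It assumes the BDay and BYearEnd offsets from pandas.tseries.offsets and an arbitrarily chosen anchor date in 2011. BDay skips weekends only, not holidays.

```python
import pandas as pd
from pandas.tseries.offsets import BDay, BYearEnd

# Arbitrary anchor date, chosen only for illustration.
anchor = pd.Timestamp("2011-06-15")

# Last business day of the anchor's year (2011-12-31 is a Saturday).
last_bday = anchor + BYearEnd()

# Five business days before that date (weekends skipped, holidays ignored).
five_before = last_bday - BDay(5)

# "Roll" a Saturday to the nearest business day in either direction.
saturday = pd.Timestamp("2011-01-01")
rolled_forward = BDay().rollforward(saturday)  # following Monday
rolled_back = BDay().rollback(saturday)        # preceding Friday

print(last_bday.date(), five_before.date())
print(rolled_forward.date(), rolled_back.date())
```

A date that already falls on a business day is returned unchanged by rollforward and rollback.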
Create a range of dates:

    # 72 hours starting with midnight Jan 1st, 2011
    In [1]: rng = pd.date_range('1/1/2011', periods=72, freq='H')

    In [2]: rng[:5]
    Out[2]:
    DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 01:00:00',
                   '2011-01-01 02:00:00', '2011-01-01 03:00:00',
                   '2011-01-01 04:00:00'],
                  dtype='datetime64[ns]', freq='H')
Index pandas objects with dates:

    In [3]: ts = pd.Series(np.random.randn(len(rng)), index=rng)

    In [4]: ts.head()
    Out[4]:
    2011-01-01 00:00:00    0.469112
    2011-01-01 01:00:00   -0.282863
    2011-01-01 02:00:00   -1.509059
    2011-01-01 03:00:00   -1.135632
    2011-01-01 04:00:00    1.212112
    Freq: H, dtype: float64
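One practical consequence of a date index is that values can be selected by date. The sketch below is illustrative only: it uses a hypothetical hourly series with deterministic values (0 to 71) rather than random data, and it builds the index with an offset object because the spelling of string frequency aliases varies across pandas versions.

```python
import pandas as pd

# Hypothetical hourly series covering three days, with values 0..71.
idx = pd.date_range("2011-01-01", periods=72, freq=pd.offsets.Hour())
ts = pd.Series(range(72), index=idx)

# A date string selects every observation that falls on that day.
second_day = ts.loc["2011-01-02"]

# Slicing with date strings includes both endpoints.
morning = ts.loc["2011-01-01 06:00":"2011-01-01 08:00"]

# An exact Timestamp selects a single value.
noon_day3 = ts.loc[pd.Timestamp("2011-01-03 12:00")]

print(len(second_day), morning.tolist(), noon_day3)
```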
Change frequency and fill gaps:

    # to 45 minute frequency and forward fill
    In [5]: converted = ts.asfreq('45Min', method='pad')

    In [6]: converted.head()
    Out[6]:
    2011-01-01 00:00:00    0.469112
    2011-01-01 00:45:00    0.469112
    2011-01-01 01:30:00   -0.282863
    2011-01-01 02:15:00   -1.509059
    2011-01-01 03:00:00   -1.135632
    Freq: 45T, dtype: float64
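To make the effect of method='pad' concrete, the following sketch compares asfreq with and without a fill method on a small hypothetical series with known values. Without a fill method, timestamps on the new grid that have no exact match in the original index become NaN. With 'pad', each takes the most recent earlier observation. Offset objects are used in place of string aliases here only to stay independent of alias spelling in a given pandas version.

```python
import pandas as pd

# Hypothetical hourly series: 00:00 -> 10, 01:00 -> 20, 02:00 -> 30, 03:00 -> 40.
idx = pd.date_range("2011-01-01", periods=4, freq=pd.offsets.Hour())
ts = pd.Series([10.0, 20.0, 30.0, 40.0], index=idx)

# New 45-minute grid: 00:00, 00:45, 01:30, 02:15, 03:00.
every_45 = pd.offsets.Minute(45)

# No fill method: only exact matches (00:00 and 03:00) keep a value.
no_fill = ts.asfreq(every_45)

# Forward fill: each new timestamp takes the last observation at or before it.
padded = ts.asfreq(every_45, method="pad")

print(no_fill)
print(padded)
```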
Resample:

    # Daily means
    In [7]: ts.resample('D').mean()
    Out[7]:
    2011-01-01   -0.319569
    2011-01-02   -0.337703
    2011-01-03    0.117258
    Freq: D, dtype: float64
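Resampling to daily means can be read as a time-based group-by: the observations are bucketed by calendar day and each bucket is averaged. The sketch below checks that reading on a hypothetical deterministic series (values 0 to 71, one per hour), so the daily means can be verified by hand.

```python
import pandas as pd

# Hypothetical hourly series covering exactly three days, with values 0..71.
idx = pd.date_range("2011-01-01", periods=72, freq=pd.offsets.Hour())
ts = pd.Series(range(72), index=idx, dtype="float64")

# Daily means via resample: each day averages 24 consecutive integers.
daily = ts.resample("D").mean()

# The same numbers from an explicit group-by on the calendar day.
by_day = ts.groupby(ts.index.normalize()).mean()

print(daily)
```

Unlike asfreq, which only selects or fills values on the new grid, resample aggregates all observations that fall inside each bin.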
Overview
The following table shows the types of time-related classes pandas can handle and how to create them.
Class | Remarks | How to create |
---|---|---|
Timestamp | Represents a single time stamp | to_datetime, Timestamp |
DatetimeIndex | Index of Timestamp | to_datetime, date_range, DatetimeIndex |
Period | Represents a single time span | Period |
PeriodIndex | Index of Period | period_range, PeriodIndex |
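The constructors listed in the table can be exercised as follows. This is a minimal sketch with arbitrarily chosen dates. It also shows that to_datetime returns a Timestamp for a scalar input and a DatetimeIndex for a list-like input.

```python
import pandas as pd

# Timestamp: a single point in time.
stamp = pd.Timestamp("2012-05-01")
stamp_from_str = pd.to_datetime("2012-05-01")  # scalar in, Timestamp out

# DatetimeIndex: an index of Timestamps, built three ways.
dti_parsed = pd.to_datetime(["2012-05-01", "2012-05-02"])  # list in, index out
dti_range = pd.date_range("2012-05-01", periods=2, freq="D")
dti_direct = pd.DatetimeIndex(["2012-05-01", "2012-05-02"])

# Period: a single time span (here, the month of May 2012).
month = pd.Period("2012-05", freq="M")

# PeriodIndex: an index of Periods, built two ways.
pi_range = pd.period_range("2012-01", periods=3, freq="M")
pi_direct = pd.PeriodIndex(["2012-01", "2012-02", "2012-03"], freq="M")

print(type(stamp_from_str).__name__, type(dti_parsed).__name__)
print(type(month).__name__, type(pi_range).__name__)
```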
Time Stamps vs. Time Spans
Time-stamped data is the most basic type of time series data: it associates values with points in time. For pandas objects, this means labeling values with points in time, each represented by a Timestamp.

    # assumes: from datetime import datetime
    In [8]: pd.Timestamp(datetime(2012, 5, 1))
    Out[8]: Timestamp('2012-05-01 00:00:00')

    In [9]: pd.Timestamp('2012-05-01')
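To contrast the two notions named in this section's heading, the sketch below places a Timestamp (a point in time) next to a Period (a span of time) and checks that the point falls inside the span. The specific date and the monthly frequency are arbitrary choices for illustration.

```python
import pandas as pd

# A time stamp: one instant.
stamp = pd.Timestamp("2012-05-01 09:30")

# A time span: the whole month of May 2012.
span = pd.Period("2012-05", freq="M")

# A Period exposes its boundaries as Timestamps.
start, end = span.start_time, span.end_time

# The instant lies within the span.
inside = start <= stamp <= end

# Converting the stamp to a monthly period gives back the same span.
same_span = stamp.to_period("M") == span

print(start, end, inside, same_span)
```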
© 2008–2012, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
Licensed under the 3-clause BSD License.
https://pandas.pydata.org/pandas-docs/version/0.20.3/timeseries.html