pandas arrays
For most data types, pandas uses NumPy arrays as the concrete objects contained within an Index, Series, or DataFrame.
For some data types, pandas extends NumPy’s type system. String aliases for these types can be found at dtypes.
Kind of Data | pandas Data Type | Scalar | Array |
---|---|---|---|
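TZ-aware datetime | DatetimeTZDtype | Timestamp | Datetime data |
Timedeltas | (none) | Timedelta | Timedelta data |
Period (time spans) | PeriodDtype | Period | Timespan data |
Intervals | IntervalDtype | Interval | Interval data |
Nullable Integer | Int64Dtype, ... | (none) | Nullable integer |
Categorical | CategoricalDtype | (none) | Categorical data |
Sparse | SparseDtype | (none) | Sparse data |
Strings | StringDtype | str | Text data |
Boolean (with NA) | BooleanDtype | bool | Boolean data with missing values |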
pandas and third-party libraries can extend NumPy’s type system (see Extension types). The top-level array() function can be used to create a new array, which may be stored in a Series, Index, or as a column in a DataFrame.
array | Create an array. |
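As a minimal sketch of the top-level array() function described above, the following creates two extension arrays and stores them in a Series and a DataFrame. The variable names are illustrative only, and the inferred dtype shown assumes a pandas version with nullable integer support (1.0 or later).

```python
import pandas as pd

# With no dtype given, pandas infers one; integers with a missing
# value become the nullable Int64 extension type.
ints = pd.array([1, 2, None])
print(ints.dtype)  # Int64

# A dtype can also be requested explicitly through its string alias.
flags = pd.array([True, None, False], dtype="boolean")

# The resulting arrays can back a Series or a DataFrame column.
ser = pd.Series(ints)
df = pd.DataFrame({"flag": flags})
print(ser.dtype, df["flag"].dtype)  # Int64 boolean
```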
Datetime data
NumPy cannot natively represent timezone-aware datetimes. pandas supports this with the arrays.DatetimeArray extension array, which can hold timezone-naive or timezone-aware values.
Timestamp, a subclass of datetime.datetime, is pandas’ scalar type for timezone-naive or timezone-aware datetime data.
Timestamp | pandas replacement for the Python datetime.datetime object. |
Properties
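Timestamp.asm8 | Return numpy datetime64 format in nanoseconds. |
Timestamp.dayofweek | Return day of the week. |
Timestamp.day_of_week | Return day of the week. |
Timestamp.dayofyear | Return the day of the year. |
Timestamp.day_of_year | Return the day of the year. |
Timestamp.days_in_month | Return the number of days in the month. |
Timestamp.daysinmonth | Return the number of days in the month. |
Timestamp.is_leap_year | Return True if year is a leap year. |
Timestamp.is_month_end | Return True if date is last day of month. |
Timestamp.is_month_start | Return True if date is first day of month. |
Timestamp.is_quarter_end | Return True if date is last day of the quarter. |
Timestamp.is_quarter_start | Return True if date is first day of the quarter. |
Timestamp.is_year_end | Return True if date is last day of the year. |
Timestamp.is_year_start | Return True if date is first day of the year. |
Timestamp.quarter | Return the quarter of the year. |
Timestamp.tz | Alias for tzinfo. |
Timestamp.week | Return the week number of the year. |
Timestamp.weekofyear | Return the week number of the year. |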
Methods
Timestamp.astimezone | Convert tz-aware Timestamp to another time zone. |
Timestamp.ceil | Return a new Timestamp ceiled to this resolution. |
Timestamp.combine | Combine date, time into datetime with same date and time fields. |
Timestamp.ctime | Return ctime() style string. |
Timestamp.date | Return date object with same year, month and day. |
Timestamp.day_name | Return the day name of the Timestamp with specified locale. |
Timestamp.dst | Return self.tzinfo.dst(self). |
Timestamp.floor | Return a new Timestamp floored to this resolution. |
Timestamp.freqstr | Return the frequency string associated with the Timestamp. |
Timestamp.fromordinal | Passed an ordinal, translate and convert to a Timestamp. |
Timestamp.fromtimestamp | Transform timestamp[, tz] to tz's local time from POSIX timestamp. |
Timestamp.isocalendar | Return a 3-tuple containing ISO year, week number, and weekday. |
Timestamp.isoformat | Return a string in ISO 8601 format, YYYY-MM-DDT[HH[:MM[:SS[.mmm[uuu]]]]][+HH:MM]; takes an optional sep argument. |
Timestamp.isoweekday | Return the day of the week represented by the date. |
Timestamp.month_name | Return the month name of the Timestamp with specified locale. |
Timestamp.normalize | Normalize Timestamp to midnight, preserving tz information. |
Timestamp.now | Return new Timestamp object representing current time local to tz. |
Timestamp.replace | Implements datetime.replace, handles nanoseconds. |
Timestamp.round | Round the Timestamp to the specified resolution. |
Timestamp.strftime | Return a string representing the given POSIX timestamp controlled by an explicit format string. |
Timestamp.strptime | Function is not implemented. |
Timestamp.time | Return time object with same time but with tzinfo=None. |
Timestamp.timestamp | Return POSIX timestamp as float. |
Timestamp.timetuple | Return time tuple, compatible with time.localtime(). |
Timestamp.timetz | Return time object with same time and tzinfo. |
Timestamp.to_datetime64 | Return a numpy.datetime64 object with 'ns' precision. |
Timestamp.to_numpy | Convert the Timestamp to a NumPy datetime64. |
Timestamp.to_julian_date | Convert Timestamp to a Julian Date. |
Timestamp.to_period | Return a period of which this timestamp is an observation. |
Timestamp.to_pydatetime | Convert a Timestamp object to a native Python datetime object. |
Timestamp.today | Return the current time in the local timezone. |
Timestamp.toordinal | Return proleptic Gregorian ordinal. |
Timestamp.tz_convert | Convert tz-aware Timestamp to another time zone. |
Timestamp.tz_localize | Convert naive Timestamp to local time zone, or remove timezone from tz-aware Timestamp. |
Timestamp.tzname | Return self.tzinfo.tzname(self). |
Timestamp.utcfromtimestamp | Construct a naive UTC datetime from a POSIX timestamp. |
Timestamp.utcnow | Return a new Timestamp representing UTC day and time. |
Timestamp.utcoffset | Return self.tzinfo.utcoffset(self). |
Timestamp.utctimetuple | Return UTC time tuple, compatible with time.localtime(). |
Timestamp.weekday | Return the day of the week represented by the date. |
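To illustrate a few of the Timestamp members listed above, here is a short sketch that builds a timezone-naive Timestamp, localizes it, and converts it to another zone. The date and the time zones are arbitrary choices made for the example.

```python
import datetime

import pandas as pd

ts = pd.Timestamp("2021-03-14 15:09:26")   # timezone-naive
print(isinstance(ts, datetime.datetime))   # True
print(ts.day_name())                       # Sunday
print(ts.floor("min"))                     # 2021-03-14 15:09:00

# Attach a timezone to the naive value, then view the same instant
# on a different wall clock.
utc = ts.tz_localize("UTC")
tokyo = utc.tz_convert("Asia/Tokyo")
print(tokyo)          # 2021-03-15 00:09:26+09:00
print(utc == tokyo)   # True: both describe the same instant
```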
A collection of timestamps may be stored in an arrays.DatetimeArray. For timezone-aware data, the .dtype of a DatetimeArray is a DatetimeTZDtype. For timezone-naive data, np.dtype("datetime64[ns]") is used.
If the data are tz-aware, then every value in the array must have the same timezone.
arrays.DatetimeArray | Pandas ExtensionArray for tz-naive or tz-aware datetime data. |
DatetimeTZDtype | An ExtensionDtype for timezone-aware datetime data. |
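The difference between the two dtypes can be seen with a small sketch. It obtains DatetimeArray instances through the .array attribute of a DatetimeIndex, which is one convenient route rather than the only one; the time zone is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# .array exposes the extension array that backs an Index or Series.
naive = pd.date_range("2021-01-01", periods=3, freq="D").array
aware = pd.date_range("2021-01-01", periods=3, freq="D", tz="Europe/Berlin").array

print(type(naive).__name__)   # DatetimeArray
print(naive.dtype)            # a plain NumPy datetime64 dtype
print(aware.dtype)            # a DatetimeTZDtype carrying the zone

# Timezone-naive data uses a NumPy dtype; timezone-aware data uses
# the pandas extension dtype, with one timezone for the whole array.
print(isinstance(naive.dtype, np.dtype))            # True
print(isinstance(aware.dtype, pd.DatetimeTZDtype))  # True
print(aware.dtype.tz)                               # Europe/Berlin
```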
Timedelta data
NumPy can natively represent timedeltas. pandas provides Timedelta for symmetry with Timestamp.
Timedelta | Represents a duration, the difference between two dates or times. |
Properties
Timedelta.asm8 | Return a numpy timedelta64 array scalar view. |
Timedelta.components | Return a components namedtuple-like. |
Timedelta.days | Number of days. |
Timedelta.delta | Return the timedelta in nanoseconds (ns), for internal compatibility. |
Timedelta.microseconds | Number of microseconds (>= 0 and less than 1 second). |
Timedelta.nanoseconds | Return the number of nanoseconds (n), where 0 <= n < 1 microsecond. |
Timedelta.seconds | Number of seconds (>= 0 and less than 1 day). |
Timedelta.view | Array view compatibility. |
Methods
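Timedelta.ceil | Return a new Timedelta ceiled to this resolution. |
Timedelta.floor | Return a new Timedelta floored to this resolution. |
Timedelta.isoformat | Format Timedelta as ISO 8601 Duration like P[n]Y[n]M[n]DT[n]H[n]M[n]S, where the [n]s are replaced by the values. |
Timedelta.round | Round the Timedelta to the specified resolution. |
Timedelta.to_pytimedelta | Convert a pandas Timedelta object into a Python timedelta object. |
Timedelta.to_timedelta64 | Return a numpy.timedelta64 object with 'ns' precision. |
Timedelta.to_numpy | Convert the Timedelta to a NumPy timedelta64. |
Timedelta.total_seconds | Total seconds in the duration. |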
A collection of timedeltas may be stored in a TimedeltaArray.
arrays.TimedeltaArray | Pandas ExtensionArray for timedelta data. |
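The following sketch exercises some of the Timedelta properties and methods listed above and then builds a TimedeltaArray. The duration values are arbitrary, and the array is obtained through pd.to_timedelta(...).array as one possible route.

```python
import datetime

import pandas as pd

td = pd.Timedelta(days=1, hours=2, minutes=30)
print(td.days)             # 1
print(td.seconds)          # 9000 (2.5 hours, excluding whole days)
print(td.total_seconds())  # 95400.0
print(td.isoformat())      # P1DT2H30M0S
print(td.floor("h"))       # 1 days 02:00:00

# Round trip to the standard-library type.
print(td.to_pytimedelta() == datetime.timedelta(days=1, hours=2, minutes=30))  # True

# A collection of timedeltas backed by a TimedeltaArray.
durations = pd.to_timedelta(["1 days", "12 hours"]).array
print(type(durations).__name__)  # TimedeltaArray
```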
Timespan data
pandas represents spans of time as Period objects.
Period
Period | Represents a period of time. |
Properties
Period.day | Get day of the month that a Period falls on. |
Period.dayofweek | Day of the week the period lies in, with Monday=0 and Sunday=6. |
Period.day_of_week | Day of the week the period lies in, with Monday=0 and Sunday=6. |
Period.dayofyear | Return the day of the year. |
Period.day_of_year | Return the day of the year. |
Period.days_in_month | Get the total number of days in the month that this period falls on. |
Period.daysinmonth | Get the total number of days of the month that the Period falls in. |
Period.hour | Get the hour of the day component of the Period. |
Period.minute | Get minute of the hour component of the Period. |
Period.qyear | Fiscal year the Period lies in according to its starting-quarter. |
Period.second | Get the second component of the Period. |
Period.start_time | Get the Timestamp for the start of the period. |
Period.week | Get the week of the year on the given Period. |
Period.weekday | Day of the week the period lies in, with Monday=0 and Sunday=6. |
Methods
Period.asfreq | Convert Period to desired frequency, at the start or end of the interval. |
Period.strftime | Return the string representation of the Period, depending on the selected fmt. |
Period.to_timestamp | Return the Timestamp representation of the Period. |
A collection of periods may be stored in an arrays.PeriodArray. Every period in a PeriodArray must have the same freq.
arrays.PeriodArray | Pandas ExtensionArray for storing Period data. |
PeriodDtype | An ExtensionDtype for Period data. |
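As a brief sketch of Period and PeriodArray, the example below creates a monthly period, converts it to a daily one, and builds an array whose elements share a single frequency. The dates are arbitrary, and pd.period_range(...).array is just one way to obtain a PeriodArray.

```python
import pandas as pd

p = pd.Period("2021-03", freq="M")
print(p.start_time)              # 2021-03-01 00:00:00
print(p.days_in_month)           # 31
print(p.asfreq("D", how="end"))  # 2021-03-31
print(p.strftime("%Y/%m"))       # 2021/03

# All elements of a PeriodArray share one frequency, which is
# recorded on the array's PeriodDtype.
periods = pd.period_range("2021-01", periods=3, freq="M").array
print(type(periods).__name__)    # PeriodArray
print(periods.dtype)             # period[M]
```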
Interval data
Arbitrary intervals can be represented as Interval objects.
Interval | Immutable object implementing an Interval, a bounded slice-like interval. |
Properties
Interval.closed | Whether the interval is closed on the left-side, right-side, both or neither. |
Interval.closed_left | Check if the interval is closed on the left side. |
Interval.closed_right | Check if the interval is closed on the right side. |
Interval.is_empty | Indicates if an interval is empty, meaning it contains no points. |
Interval.left | Left bound for the interval. |
Interval.length | Return the length of the Interval. |
Interval.mid | Return the midpoint of the Interval. |
Interval.open_left | Check if the interval is open on the left side. |
Interval.open_right | Check if the interval is open on the right side. |
Interval.overlaps | Check whether two Interval objects overlap. |
Interval.right | Right bound for the interval. |
A collection of intervals may be stored in an arrays.IntervalArray.
arrays.IntervalArray | Pandas array for interval data that are closed on the same side. |
IntervalDtype | An ExtensionDtype for Interval data. |
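A short sketch of Interval and IntervalArray follows. The bounds are arbitrary, and IntervalArray.from_breaks is used here as one of several available constructors.

```python
import pandas as pd

iv = pd.Interval(0, 5, closed="right")   # the interval (0, 5]
print(iv.length)   # 5
print(iv.mid)      # 2.5
print(0 in iv)     # False: the left side is open
print(5 in iv)     # True: the right side is closed

# Two intervals overlap when they share a point, here the closed
# endpoint 5.
print(iv.overlaps(pd.Interval(5, 8, closed="left")))  # True

# Every interval in an IntervalArray is closed on the same side.
arr = pd.arrays.IntervalArray.from_breaks([0, 1, 2, 3])
print(len(arr), arr.closed)  # 3 right
```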
Nullable integer
numpy.ndarray cannot natively represent integer data with missing values. pandas provides this through arrays.IntegerArray.
arrays.IntegerArray | Array of integer (optional missing) values. |
Int8Dtype | An ExtensionDtype for int8 integer data. |
Int16Dtype | An ExtensionDtype for int16 integer data. |
Int32Dtype | An ExtensionDtype for int32 integer data. |
Int64Dtype | An ExtensionDtype for int64 integer data. |
UInt8Dtype | An ExtensionDtype for uint8 integer data. |
UInt16Dtype | An ExtensionDtype for uint16 integer data. |
UInt32Dtype | An ExtensionDtype for uint32 integer data. |
UInt64Dtype | An ExtensionDtype for uint64 integer data. |
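To make the contrast with NumPy concrete, this sketch shows that a NumPy array holding a missing value falls back to float64, while an IntegerArray keeps an integer dtype and marks the gap with pd.NA. The values are arbitrary.

```python
import numpy as np
import pandas as pd

# NumPy has no integer missing value, so NaN forces a float dtype.
print(np.array([1, 2, np.nan]).dtype)  # float64

# The nullable extension type keeps integers and stores a mask.
arr = pd.array([1, 2, None], dtype="Int64")
print(arr.dtype)   # Int64
print(arr + 1)     # missing values propagate through arithmetic
print(arr.sum())   # 3 (missing values are skipped by default)

# The dtype can also be passed as an instance instead of an alias.
small = pd.array([1, None], dtype=pd.UInt8Dtype())
print(small.dtype)  # UInt8
```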
Categorical data
pandas defines a custom data type for representing data that can take only a limited, fixed set of values. The dtype of a Categorical can be described by a pandas.api.types.CategoricalDtype.
CategoricalDtype | Type for categorical data with the categories and orderedness. |
CategoricalDtype.categories | An Index containing the unique categories allowed. |
CategoricalDtype.ordered | Whether the categories have an ordered relationship. |
Categorical data can be stored in a pandas.Categorical.
Categorical | Represent a categorical variable in classic R / S-plus fashion. |
The alternative Categorical.from_codes() constructor can be used when you already have the categories and integer codes:
Categorical.from_codes | Make a Categorical type from codes and categories or dtype. |
The dtype information is available on the Categorical:
Categorical.dtype | The CategoricalDtype for this instance. |
Categorical.categories | The categories of this categorical. |
Categorical.ordered | Whether the categories have an ordered relationship. |
Categorical.codes | The category codes of this categorical. |
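The two constructors and the dtype attributes above can be compared with a small sketch. The category labels are made up for illustration.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

levels = ["low", "mid", "high"]

# Build from raw values; pandas computes the integer codes.
direct = pd.Categorical(["low", "high", "low", "mid"],
                        categories=levels, ordered=True)

# Build from codes that are already known; each code indexes
# into the categories.
from_codes = pd.Categorical.from_codes([0, 2, 0, 1],
                                       categories=levels, ordered=True)

print(list(direct) == list(from_codes))  # True
print(list(direct.codes))                # [0, 2, 0, 1]
print(list(direct.categories))           # ['low', 'mid', 'high']
print(direct.ordered)                    # True
print(direct.dtype == CategoricalDtype(levels, ordered=True))  # True
```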
np.asarray(categorical) works by implementing the array interface. Be aware that this converts the Categorical back to a NumPy array, so category and order information is not preserved.
Categorical.__array__ | The numpy array interface. |
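The loss of category information can be checked directly. In this sketch the unused category "c" and the ordering exist only on the Categorical, not on the converted array; the values are illustrative.

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["b", "a", "b"], categories=["a", "b", "c"], ordered=True)

values = np.asarray(cat)
print(type(values).__name__)          # ndarray
print(values.tolist())                # ['b', 'a', 'b']

# The plain ndarray carries no categories or ordering.
print(hasattr(values, "categories"))  # False
print(list(cat.categories))           # ['a', 'b', 'c']
```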
A Categorical can be stored in a Series or DataFrame. To create a Series of dtype category, use cat = s.astype(dtype) or Series(..., dtype=dtype), where dtype is either the string 'category' or an instance of CategoricalDtype.
If the Series is of dtype CategoricalDtype, Series.cat can be used to change the categorical data. See Categorical accessor for more.
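Both ways of specifying the dtype, and a simple use of the Series.cat accessor, might look like the sketch below. The data and the replacement labels are invented for the example.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

s = pd.Series(["a", "b", "a"])

# Option 1: the string alias; categories are inferred from the data.
as_str = s.astype("category")
print(list(as_str.cat.categories))  # ['a', 'b']

# Option 2: a CategoricalDtype instance, which fixes the categories
# and their order up front.
dtype = CategoricalDtype(categories=["b", "a"], ordered=True)
as_dtype = pd.Series(["a", "b", "a"], dtype=dtype)
print(list(as_dtype.cat.codes))     # [1, 0, 1]
print(as_dtype.cat.ordered)         # True

# The .cat accessor changes the categorical data, here by renaming.
renamed = as_dtype.cat.rename_categories({"a": "alpha", "b": "beta"})
print(list(renamed))                # ['alpha', 'beta', 'alpha']
```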
Sparse data
Data where a single value is repeated many times (e.g. 0 or NaN) may be stored efficiently as an arrays.SparseArray.
arrays.SparseArray | An ExtensionArray for storing sparse data. |
SparseDtype | Dtype for data stored in SparseArray. |
The Series.sparse accessor may be used to access sparse-specific attributes and methods if the Series contains sparse values. See Sparse accessor for more.
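As a rough illustration of the storage saving, the sketch below stores a mostly-zero array sparsely and inspects it through the Series.sparse accessor. The array size and the non-zero values are arbitrary.

```python
import numpy as np
import pandas as pd

# 1000 values, of which only two differ from the fill value 0.0.
dense = np.zeros(1000)
dense[[10, 500]] = [1.5, 2.5]

sparse = pd.arrays.SparseArray(dense, fill_value=0.0)
print(sparse.nbytes < dense.nbytes)  # True: only two values are stored

ser = pd.Series(sparse)
print(ser.sparse.npoints)     # 2
print(ser.sparse.density)     # 0.002
print(ser.sparse.fill_value)  # 0.0

# Convert back to an ordinary dense Series when needed.
restored = ser.sparse.to_dense()
print(restored.sum())         # 4.0
```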
Text data
When working with text data, where each valid element is a string or missing, we recommend using StringDtype (with the alias "string").
arrays.StringArray | Extension array for string data. |
arrays.ArrowStringArray | Extension array for string data in a pyarrow.ChunkedArray. |
StringDtype | Extension dtype for string data. |
The Series.str accessor is available for Series backed by an arrays.StringArray. See String handling for more.
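A minimal sketch of the "string" alias and the Series.str accessor is shown below. The sample words are arbitrary, and the concrete array class backing the Series can depend on the pandas version and configured storage, so the example only inspects the dtype.

```python
import pandas as pd

s = pd.Series(["apple", None, "cherry"], dtype="string")
print(isinstance(s.dtype, pd.StringDtype))  # True

# Missing entries stay missing through string operations.
upper = s.str.upper()
lengths = s.str.len()
print(upper[0])               # APPLE
print(lengths[2])             # 6
print(s.isna().tolist())      # [False, True, False]
print(upper.isna().tolist())  # [False, True, False]
```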
Boolean data with missing values
The boolean dtype (with the alias "boolean") provides support for storing boolean data (True, False values) with missing values, which is not possible with a bool numpy.ndarray.
arrays.BooleanArray | Array of boolean (True/False) data with missing values. |
BooleanDtype | Extension dtype for boolean data. |
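The sketch below contrasts a NumPy array, which falls back to object dtype when a missing value is present, with a BooleanArray. The logical operations assume the three-valued (Kleene) logic that pandas documents for this dtype, where a missing operand yields a definite result only when the other operand decides the outcome.

```python
import numpy as np
import pandas as pd

# NumPy cannot keep a bool dtype once a missing value is present.
print(np.array([True, False, None]).dtype)  # object

mask = pd.array([True, False, None], dtype="boolean")
print(mask.dtype)  # boolean

# NA | True is True regardless of the missing value, while
# NA & True remains unknown.
either = mask | True
both = mask & True
print(list(either))        # [True, True, True]
print(both[2] is pd.NA)    # True
```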
© 2008–2021, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
Licensed under the 3-clause BSD License.
https://pandas.pydata.org/pandas-docs/version/1.3.4/reference/arrays.html