Pandas Arrays

For most data types, pandas uses NumPy arrays as the concrete objects contained within an Index, Series, or DataFrame.

For some data types, pandas extends NumPy’s type system.

Kind of Data | Pandas Data Type | Scalar | Array
TZ-aware datetime | DatetimeTZDtype | Timestamp | Datetime Data
Timedeltas | (none) | Timedelta | Timedelta Data
Period (time spans) | PeriodDtype | Period | Timespan Data
Intervals | IntervalDtype | Interval | Interval Data
Nullable Integer | Int64Dtype, … | (none) | Nullable Integer
Categorical | CategoricalDtype | (none) | Categorical Data
Sparse | SparseDtype | (none) | Sparse Data

Pandas and third-party libraries can extend NumPy’s type system (see Extension Types). The top-level array() function can be used to create a new array, which may be stored in a Series, in an Index, or as a column in a DataFrame.

array(data[, dtype, copy]) Create an array.
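
As a minimal sketch of the function above, the snippet below builds two extension arrays with pd.array() and stores them in a Series and a DataFrame. The variable names, the sample values, and the "period[M]" and "Int64" dtype strings are choices made for this illustration only.

```python
import pandas as pd

# Let pd.array() build extension arrays from plain Python lists.
periods = pd.array(["2019-01", "2019-02", "2019-03"], dtype="period[M]")
nullable = pd.array([1, 2, None], dtype="Int64")

# The resulting arrays can back a Series or a DataFrame column.
ser = pd.Series(periods)
frame = pd.DataFrame({"month": periods, "count": nullable})

print(type(periods).__name__)  # PeriodArray
print(frame.dtypes)
```

Passing an explicit dtype is what selects the extension array here; without it, pandas infers a type from the data.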

Datetime Data

NumPy cannot natively represent timezone-aware datetimes. Pandas supports this with the arrays.DatetimeArray extension array, which can hold timezone-naive or timezone-aware values.

Timestamp, a subclass of datetime.datetime, is pandas’ scalar type for timezone-naive or timezone-aware datetime data.

Timestamp Pandas replacement for datetime.datetime

Properties

Timestamp.asm8
Timestamp.day
Timestamp.dayofweek
Timestamp.dayofyear
Timestamp.days_in_month
Timestamp.daysinmonth
Timestamp.fold
Timestamp.hour
Timestamp.is_leap_year
Timestamp.is_month_end
Timestamp.is_month_start
Timestamp.is_quarter_end
Timestamp.is_quarter_start
Timestamp.is_year_end
Timestamp.is_year_start
Timestamp.max
Timestamp.microsecond
Timestamp.min
Timestamp.minute
Timestamp.month
Timestamp.nanosecond
Timestamp.quarter
Timestamp.resolution Return resolution describing the smallest difference between two times that can be represented by a Timestamp object.
Timestamp.second
Timestamp.tz Alias for tzinfo
Timestamp.tzinfo
Timestamp.value
Timestamp.week
Timestamp.weekofyear
Timestamp.year

Methods

Timestamp.astimezone Convert tz-aware Timestamp to another time zone.
Timestamp.ceil Return a new Timestamp ceiled to this resolution.
Timestamp.combine(date, time) date, time -> datetime with same date and time fields
Timestamp.ctime Return ctime() style string.
Timestamp.date Return date object with same year, month and day.
Timestamp.day_name Return the day name of the Timestamp with specified locale.
Timestamp.dst Return self.tzinfo.dst(self).
Timestamp.floor Return a new Timestamp floored to this resolution.
Timestamp.freq
Timestamp.freqstr
Timestamp.fromordinal(ordinal[, freq, tz]) Convert a passed ordinal to a Timestamp. Note: by definition, there cannot be any tz info on the ordinal itself.
Timestamp.fromtimestamp(ts) timestamp[, tz] -> tz’s local time from POSIX timestamp.
Timestamp.isocalendar Return a 3-tuple containing ISO year, week number, and weekday.
Timestamp.isoformat
Timestamp.isoweekday Return the day of the week represented by the date.
Timestamp.month_name Return the month name of the Timestamp with specified locale.
Timestamp.normalize Normalize Timestamp to midnight, preserving tz information.
Timestamp.now([tz]) Returns new Timestamp object representing current time local to tz.
Timestamp.replace Implements datetime.replace and also handles nanoseconds.
Timestamp.round Round the Timestamp to the specified resolution
Timestamp.strftime format -> strftime() style string.
Timestamp.strptime string, format -> new datetime parsed from a string (like time.strptime()).
Timestamp.time Return time object with same time but with tzinfo=None.
Timestamp.timestamp Return POSIX timestamp as float.
Timestamp.timetuple Return time tuple, compatible with time.localtime().
Timestamp.timetz Return time object with same time and tzinfo.
Timestamp.to_datetime64 Returns a numpy.datetime64 object with ‘ns’ precision
Timestamp.to_julian_date Convert Timestamp to a Julian Date.
Timestamp.to_period Return a period of which this timestamp is an observation.
Timestamp.to_pydatetime Convert a Timestamp object to a native Python datetime object.
Timestamp.today(cls[, tz]) Return the current time in the local timezone.
Timestamp.toordinal Return proleptic Gregorian ordinal.
Timestamp.tz_convert Convert tz-aware Timestamp to another time zone.
Timestamp.tz_localize Convert naive Timestamp to local time zone, or remove timezone from tz-aware Timestamp.
Timestamp.tzname Return self.tzinfo.tzname(self).
Timestamp.utcfromtimestamp(ts) Construct a naive UTC datetime from a POSIX timestamp.
Timestamp.utcnow() Return a new Timestamp representing UTC day and time.
Timestamp.utcoffset Return self.tzinfo.utcoffset(self).
Timestamp.utctimetuple Return UTC time tuple, compatible with time.localtime().
Timestamp.weekday Return the day of the week represented by the date.
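
A few of the properties and methods listed above can be combined as in the following sketch. The date, the zone names, and the variable names are arbitrary choices for illustration, not values taken from the reference itself.

```python
import pandas as pd

naive = pd.Timestamp("2019-03-01 13:45:30")

# Calendar helpers and rounding on a single scalar.
weekday_name = naive.day_name()
start_of_day = naive.floor("D")

# Attach a zone to the naive value, then view the same instant elsewhere.
utc = naive.tz_localize("UTC")
tokyo = utc.tz_convert("Asia/Tokyo")

print(weekday_name)   # Friday
print(start_of_day)   # 2019-03-01 00:00:00
print(tokyo.hour)     # 22
```

tz_localize() interprets a naive value in a zone, while tz_convert() re-expresses an already aware value, so utc and tokyo compare equal even though their wall-clock fields differ.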

A collection of timestamps may be stored in an arrays.DatetimeArray. For timezone-aware data, the .dtype of a DatetimeArray is a DatetimeTZDtype. For timezone-naive data, np.dtype("datetime64[ns]") is used.

If the data are tz-aware, then every value in the array must have the same timezone.

arrays.DatetimeArray(values[, dtype, freq, copy]) Pandas ExtensionArray for tz-naive or tz-aware datetime data.
DatetimeTZDtype([unit, tz]) A np.dtype duck-typed class, suitable for holding a custom datetime with tz dtype.
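
The difference between the naive and aware dtypes can be seen with a short sketch like the one below. The sample dates and the "America/New_York" zone are assumptions made for the example, and Series.array is used only to peek at the backing array.

```python
import pandas as pd

stamps = pd.to_datetime(["2019-01-01 09:00", "2019-01-02 09:00"])

# Same wall-clock values, stored without and with a timezone.
naive = pd.Series(stamps)
aware = pd.Series(stamps.tz_localize("America/New_York"))

print(naive.dtype)                  # a NumPy datetime64 dtype
print(aware.dtype)                  # a DatetimeTZDtype carrying the zone
print(type(aware.array).__name__)   # DatetimeArray
```

Because the timezone lives on the dtype, it applies to every element of the aware Series at once.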

Timedelta Data

NumPy can natively represent timedeltas. Pandas provides Timedelta for symmetry with Timestamp.

Timedelta Represents a duration, the difference between two dates or times.

Properties

Timedelta.asm8 Return a numpy timedelta64 array scalar view.
Timedelta.components Return a namedtuple-like Components object.
Timedelta.days Number of days.
Timedelta.delta Return the timedelta in nanoseconds (ns), for internal compatibility.
Timedelta.freq
Timedelta.is_populated
Timedelta.max
Timedelta.microseconds Number of microseconds (>= 0 and less than 1 second).
Timedelta.min
Timedelta.nanoseconds Return the number of nanoseconds (n), where 0 <= n < 1 microsecond.
Timedelta.resolution Return a string representing the lowest timedelta resolution.
Timedelta.seconds Number of seconds (>= 0 and less than 1 day).
Timedelta.value
Timedelta.view Array view compatibility.

Methods

Timedelta.ceil Return a new Timedelta ceiled to this resolution.
Timedelta.floor Return a new Timedelta floored to this resolution.
Timedelta.isoformat Format Timedelta as ISO 8601 Duration like P[n]Y[n]M[n]DT[n]H[n]M[n]S, where the [n] s are replaced by the values.
Timedelta.round Round the Timedelta to the specified resolution
Timedelta.to_pytimedelta Return an actual datetime.timedelta object. Note: nanosecond resolution, if any, is lost.
Timedelta.to_timedelta64 Returns a numpy.timedelta64 object with ‘ns’ precision
Timedelta.total_seconds Total duration of timedelta in seconds (to ns precision)

A collection of timedeltas may be stored in a TimedeltaArray.

arrays.TimedeltaArray(values[, dtype, freq, …]) Pandas ExtensionArray for timedelta data.
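
As an illustrative sketch, subtracting two Timestamp objects yields a Timedelta, and a collection of durations stored in a Series is backed by a TimedeltaArray. The specific dates and duration strings below are made up for the example.

```python
import pandas as pd

# Subtracting two Timestamps produces a Timedelta scalar.
delta = pd.Timestamp("2019-01-02 06:00") - pd.Timestamp("2019-01-01")

print(delta.days)             # 1
print(delta.seconds)          # 21600
print(delta.total_seconds())  # 108000.0

# A collection of durations, parsed from strings.
durations = pd.Series(pd.to_timedelta(["1 days", "36 hours"]))
print(durations.dt.total_seconds().tolist())  # [86400.0, 129600.0]
```

Note that .seconds holds only the part of the duration below one day, whereas total_seconds() covers the whole duration.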

Timespan Data

Pandas represents spans of times as Period objects.

Period

Period Represents a period of time

Properties

Period.day Get day of the month that a Period falls on.
Period.dayofweek Day of the week the period lies in, with Monday=0 and Sunday=6.
Period.dayofyear Return the day of the year.
Period.days_in_month Get the total number of days in the month that this period falls on.
Period.daysinmonth Get the total number of days of the month that the Period falls in.
Period.end_time
Period.freq
Period.freqstr
Period.hour Get the hour of the day component of the Period.
Period.is_leap_year
Period.minute Get minute of the hour component of the Period.
Period.month
Period.ordinal
Period.quarter
Period.qyear Fiscal year the Period lies in according to its starting-quarter.
Period.second Get the second component of the Period.
Period.start_time Get the Timestamp for the start of the period.
Period.week Get the week of the year on the given Period.
Period.weekday Day of the week the period lies in, with Monday=0 and Sunday=6.
Period.weekofyear
Period.year

Methods

Period.asfreq Convert Period to desired frequency, either at the start or end of the interval
Period.now
Period.strftime Returns the string representation of the Period, depending on the selected fmt.
Period.to_timestamp Return the Timestamp representation of the Period at the target frequency at the specified end (how) of the Period

A collection of periods may be stored in an arrays.PeriodArray. Every period in a PeriodArray must have the same freq.

arrays.PeriodArray(values[, freq, dtype, copy]) Pandas ExtensionArray for storing Period data.
PeriodDtype A Period duck-typed class, suitable for holding a period with freq dtype.
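
The following sketch shows a monthly Period scalar and a Series of periods sharing one frequency. The monthly frequency and the dates are arbitrary choices for the example.

```python
import pandas as pd

month = pd.Period("2019-01", freq="M")

print(month.start_time)              # 2019-01-01 00:00:00
print(month.asfreq("D", how="end"))  # 2019-01-31
print(month + 1)                     # 2019-02

# Every element of a period-typed Series shares the same freq.
months = pd.Series(pd.period_range("2019-01", periods=3, freq="M"))
print(months.dtype)                  # period[M]
```

Adding an integer to a Period shifts it by that many multiples of its own frequency, which is why month + 1 lands on February.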

Interval Data

Arbitrary intervals can be represented as Interval objects.

Interval Immutable object implementing an Interval, a bounded slice-like interval.

Properties

Interval.closed Whether the interval is closed on the left-side, right-side, both or neither
Interval.closed_left Check if the interval is closed on the left side.
Interval.closed_right Check if the interval is closed on the right side.
Interval.left Left bound for the interval
Interval.length Return the length of the Interval
Interval.mid Return the midpoint of the Interval
Interval.open_left Check if the interval is open on the left side.
Interval.open_right Check if the interval is open on the right side.
Interval.overlaps Check whether two Interval objects overlap.
Interval.right Right bound for the interval

A collection of intervals may be stored in an arrays.IntervalArray.
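
A small sketch of the Interval scalar and of an IntervalArray built from break points follows. The bounds are invented for illustration, and from_breaks() is only one of several constructors that could be used here.

```python
import pandas as pd

iv = pd.Interval(0, 5, closed="right")  # the interval (0, 5]

print(iv.length, iv.mid)  # 5 2.5
print(0 in iv, 5 in iv)   # False True

# [5, 10) shares the closed endpoint 5 with (0, 5], so the two overlap.
print(iv.overlaps(pd.Interval(5, 10, closed="left")))  # True

# Consecutive breaks become adjacent intervals, all closed on the same side.
bins = pd.arrays.IntervalArray.from_breaks([0, 1, 2, 3])
print(len(bins), bins.closed)  # 3 right
```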

arrays.IntervalArray Pandas array for interval data that are closed on the same side.
IntervalDtype An Interval duck-typed class, suitable for holding an interval dtype.

Nullable Integer

numpy.ndarray cannot natively represent integer-data with missing values. Pandas provides this through arrays.IntegerArray.
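
To illustrate the point, the sketch below compares a nullable integer array with what plain NumPy-backed storage does for the same values. The sample data and the "Int64" dtype string are assumptions chosen for the example.

```python
import pandas as pd

# With the nullable dtype, the missing value does not force a float cast.
ints = pd.array([1, None, 3], dtype="Int64")
ser = pd.Series(ints)

print(ser.dtype)            # Int64
print(ser.isna().tolist())  # [False, True, False]
print(ser.sum())            # 4

# Without it, pandas falls back to float64 to make room for NaN.
as_float = pd.Series([1, None, 3])
print(as_float.dtype)       # float64
```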

arrays.IntegerArray(values, mask[, copy]) Array of integer (optional missing) values.
Int8Dtype
Int16Dtype
Int32Dtype
Int64Dtype
UInt8Dtype
UInt16Dtype
UInt32Dtype
UInt64Dtype

Categorical Data

Pandas defines a custom data type for representing data that can take only a limited, fixed set of values. The dtype of a Categorical can be described by a pandas.api.types.CategoricalDtype.

CategoricalDtype([categories, ordered]) Type for categorical data with the categories and orderedness
CategoricalDtype.categories An Index containing the unique categories allowed.
CategoricalDtype.ordered Whether the categories have an ordered relationship.

Categorical data can be stored in a pandas.Categorical.

Categorical(values[, categories, ordered, …]) Represents a categorical variable in classic R / S-plus fashion

The alternative Categorical.from_codes() constructor can be used when you have the categories and integer codes already:

Categorical.from_codes(codes[, categories, …]) Make a Categorical type from codes and categories or dtype.

The dtype information is available on the Categorical:

Categorical.dtype The CategoricalDtype for this instance
Categorical.categories The categories of this categorical.
Categorical.ordered Whether the categories have an ordered relationship.
Categorical.codes The category codes of this categorical.
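
The two constructors and the attributes above can be tried out with a sketch like this one. The category labels and variable names are hypothetical and chosen only for the example.

```python
import pandas as pd

levels = ["low", "medium", "high"]

# Build from values, stating the allowed categories and their order.
ratings = pd.Categorical(["low", "high", "low"], categories=levels, ordered=True)
print(ratings.codes.tolist())    # [0, 2, 0]
print(list(ratings.categories))  # ['low', 'medium', 'high']

# Build the same data from integer codes that index into the categories.
rebuilt = pd.Categorical.from_codes([0, 2, 0], categories=levels, ordered=True)
print(list(rebuilt))             # ['low', 'high', 'low']
```

Each code is the position of the value within categories, so both constructors end up with the same dtype and the same contents.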

np.asarray(categorical) works because Categorical implements the array interface. Be aware that this converts the Categorical back to a NumPy array, so the categories and order information are not preserved.

Categorical.__array__([dtype]) The numpy array interface.
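
A brief sketch of the conversion described above, using made-up labels: the values survive the round trip to NumPy, but the unused category and the ordering do not.

```python
import numpy as np
import pandas as pd

ratings = pd.Categorical(
    ["low", "high", "low"],
    categories=["low", "medium", "high"],
    ordered=True,
)

# np.asarray() goes through Categorical.__array__ and returns a plain ndarray.
plain = np.asarray(ratings)

print(type(plain).__name__)          # ndarray
print(plain.tolist())                # ['low', 'high', 'low']
print(hasattr(plain, "categories"))  # False
```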

A Categorical can be stored in a Series or DataFrame. To create a Series of dtype category, use cat = s.astype(dtype) or Series(..., dtype=dtype) where dtype is either

  • the string 'category'
  • an instance of CategoricalDtype.

If the Series is of dtype CategoricalDtype, Series.cat can be used to change the categorical data. See Categorical Accessor for more.
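
Both ways of requesting a categorical Series, along with a couple of Series.cat calls, are sketched below. The shirt-size labels are hypothetical, and add_categories() is just one example of what the accessor offers.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

sizes = ["M", "S", "L", "M"]

# An explicit CategoricalDtype fixes the categories and their order.
dtype = CategoricalDtype(categories=["S", "M", "L"], ordered=True)
shirts = pd.Series(sizes).astype(dtype)

# The string 'category' infers unordered categories from the data.
inferred = pd.Series(sizes, dtype="category")

print(shirts.cat.codes.tolist())      # [1, 0, 2, 1]
print(shirts.max())                   # L
print(list(inferred.cat.categories))  # ['L', 'M', 'S']

# Series.cat returns a new Series with modified categories.
extended = shirts.cat.add_categories(["XL"])
print(list(extended.cat.categories))  # ['S', 'M', 'L', 'XL']
```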

Sparse Data

Data where a single value is repeated many times (e.g. 0 or NaN) may be stored efficiently as a SparseArray.

SparseArray(data[, sparse_index, index, …]) An ExtensionArray for storing sparse data.
SparseDtype([dtype, fill_value]) Dtype for data stored in SparseArray.

The Series.sparse accessor may be used to access sparse-specific attributes and methods if the Series contains sparse values. See Sparse Accessor for more.
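
As a rough sketch, the snippet below stores a mostly-zero sequence sparsely and inspects it through the accessor. It assumes SparseArray is reachable as pd.arrays.SparseArray, which is where recent pandas versions expose it; the data values are invented for the example.

```python
import pandas as pd

# Only values that differ from fill_value are physically stored.
sparse = pd.arrays.SparseArray([0, 0, 1, 0, 0, 2], fill_value=0)
print(sparse.sp_values.tolist())  # [1, 2]
print(sparse.density)             # fraction of stored points, here 2 of 6

# Sparse-specific attributes of a Series go through the .sparse accessor.
ser = pd.Series(sparse)
print(ser.sparse.fill_value)         # 0
print(ser.sparse.to_dense().tolist())  # [0, 0, 1, 0, 0, 2]
```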

© 2008–2012, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
Licensed under the 3-clause BSD License.
https://pandas.pydata.org/pandas-docs/version/0.24.2/reference/arrays.html