pandas.crosstab
-
pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False)
[source] -
Compute a simple cross-tabulation of two (or more) factors. By default computes a frequency table of the factors unless an array of values and an aggregation function are passed
Parameters: index : array-like, Series, or list of arrays/Series
Values to group by in the rows
columns : array-like, Series, or list of arrays/Series
Values to group by in the columns
values : array-like, optional
Array of values to aggregate according to the factors. Requires
aggfunc
be specified.aggfunc : function, optional
If specified, requires
values
be specified as wellrownames : sequence, default None
If passed, must match number of row arrays passed
colnames : sequence, default None
If passed, must match number of column arrays passed
margins : boolean, default False
Add row/column margins (subtotals)
margins_name : string, default ‘All’
Name of the row / column that will contain the totals when margins is True.
New in version 0.21.0.
dropna : boolean, default True
Do not include columns whose entries are all NaN
normalize : boolean, {‘all’, ‘index’, ‘columns’}, or {0,1}, default False
Normalize by dividing all values by the sum of values.
- If passed ‘all’ or
True
, will normalize over all values. - If passed ‘index’ will normalize over each row.
- If passed ‘columns’ will normalize over each column.
- If margins is
True
, will also normalize margin values.
New in version 0.18.1.
Returns: crosstab : DataFrame
Notes
Any Series passed will have their name attributes used unless row or column names for the cross-tabulation are specified.
Any input passed containing Categorical data will have all of its categories included in the cross-tabulation, even if the actual data does not contain any instances of a particular category.
In the event that there aren’t overlapping indexes an empty DataFrame will be returned.
Examples
>>> a = np.array(["foo", "foo", "foo", "foo", "bar", "bar", ... "bar", "bar", "foo", "foo", "foo"], dtype=object) >>> b = np.array(["one", "one", "one", "two", "one", "one", ... "one", "two", "two", "two", "one"], dtype=object) >>> c = np.array(["dull", "dull", "shiny", "dull", "dull", "shiny", ... "shiny", "dull", "shiny", "shiny", "shiny"], ... dtype=object)
>>> pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c']) ... b one two c dull shiny dull shiny a bar 1 2 1 0 foo 2 2 1 2
>>> foo = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c']) >>> bar = pd.Categorical(['d', 'e'], categories=['d', 'e', 'f']) >>> crosstab(foo, bar) # 'c' and 'f' are not represented in the data, ... # but they still will be counted in the output ... col_0 d e f row_0 a 1 0 0 b 0 1 0 c 0 0 0
- If passed ‘all’ or
© 2008–2012, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
Licensed under the 3-clause BSD License.
https://pandas.pydata.org/pandas-docs/version/0.22.0/generated/pandas.crosstab.html