pandas.DataFrame.replace
-
DataFrame.replace(to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad')
[source] -
Replace values given in
to_replace
withvalue
.Values of the DataFrame are replaced with other values dynamically. This differs from updating with
.loc
or.iloc
, which require you to specify a location to update with some value.Parameters: to_replace : str, regex, list, dict, Series, int, float, or None
How to find the values that will be replaced.
-
numeric, str or regex:
- numeric: numeric values equal to
to_replace
will be replaced withvalue
- str: string exactly matching
to_replace
will be replaced withvalue
- regex: regexs matching
to_replace
will be replaced withvalue
- numeric: numeric values equal to
-
list of str, regex, or numeric:
- First, if
to_replace
andvalue
are both lists, they must be the same length. - Second, if
regex=True
then all of the strings in both lists will be interpreted as regexs otherwise they will match directly. This doesn’t matter much forvalue
since there are only a few possible substitution regexes you can use. - str, regex and numeric rules apply as above.
- First, if
-
dict:
- Dicts can be used to specify different replacement values for different existing values. For example,
{'a': 'b', 'y': 'z'}
replaces the value ‘a’ with ‘b’ and ‘y’ with ‘z’. To use a dict in this way thevalue
parameter should beNone
. - For a DataFrame a dict can specify that different values should be replaced in different columns. For example,
{'a': 1, 'b': 'z'}
looks for the value 1 in column ‘a’ and the value ‘z’ in column ‘b’ and replaces these values with whatever is specified invalue
. Thevalue
parameter should not beNone
in this case. You can treat this as a special case of passing two lists except that you are specifying the column to search in. - For a DataFrame nested dictionaries, e.g.,
{'a': {'b': np.nan}}
, are read as follows: look in column ‘a’ for the value ‘b’ and replace it with NaN. Thevalue
parameter should beNone
to use a nested dict in this way. You can nest regular expressions as well. Note that column names (the top-level dictionary keys in a nested dictionary) cannot be regular expressions.
- Dicts can be used to specify different replacement values for different existing values. For example,
-
None:
- This means that the
regex
argument must be a string, compiled regular expression, or list, dict, ndarray or Series of such elements. Ifvalue
is alsoNone
then this must be a nested dictionary or Series.
- This means that the
See the examples section for examples of each of these.
value : scalar, dict, list, str, regex, default None
Value to replace any values matching
to_replace
with. For a DataFrame a dict of values can be used to specify which value to use for each column (columns not in the dict will not be filled). Regular expressions, strings and lists or dicts of such objects are also allowed.inplace : boolean, default False
If True, in place. Note: this will modify any other views on this object (e.g. a column from a DataFrame). Returns the caller if this is True.
limit : int, default None
Maximum size gap to forward or backward fill.
regex : bool or same types as
to_replace
, default FalseWhether to interpret
to_replace
and/orvalue
as regular expressions. If this isTrue
thento_replace
must be a string. Alternatively, this could be a regular expression or a list, dict, or array of regular expressions in which caseto_replace
must beNone
.method : {‘pad’, ‘ffill’, ‘bfill’,
None
}The method to use when for replacement, when
to_replace
is a scalar, list or tuple andvalue
isNone
.Changed in version 0.23.0: Added to DataFrame.
Returns: DataFrame
Object after replacement.
Raises: AssertionError
- If
regex
is not abool
andto_replace
is notNone
.
TypeError
- If
to_replace
is adict
andvalue
is not alist
,dict
,ndarray
, orSeries
- If
to_replace
isNone
andregex
is not compilable into a regular expression or is a list, dict, ndarray, or Series. - When replacing multiple
bool
ordatetime64
objects and the arguments toto_replace
does not match the type of the value being replaced
ValueError
- If a
list
or anndarray
is passed toto_replace
andvalue
but they are not the same length.
See also
-
DataFrame.fillna
- Fill NA values
-
DataFrame.where
- Replace values based on boolean condition
-
Series.str.replace
- Simple string replacement.
Notes
- Regex substitution is performed under the hood with
re.sub
. The rules for substitution forre.sub
are the same. - Regular expressions will only substitute on strings, meaning you cannot provide, for example, a regular expression matching floating point numbers and expect the columns in your frame that have a numeric dtype to be matched. However, if those floating point numbers are strings, then you can do this.
- This method has a lot of options. You are encouraged to experiment and play with this method to gain intuition about how it works.
- When dict is used as the
to_replace
value, it is like key(s) in the dict are the to_replace part and value(s) in the dict are the value parameter.
Examples
Scalar `to_replace` and `value`
>>> s = pd.Series([0, 1, 2, 3, 4]) >>> s.replace(0, 5) 0 5 1 1 2 2 3 3 4 4 dtype: int64
>>> df = pd.DataFrame({'A': [0, 1, 2, 3, 4], ... 'B': [5, 6, 7, 8, 9], ... 'C': ['a', 'b', 'c', 'd', 'e']}) >>> df.replace(0, 5) A B C 0 5 5 a 1 1 6 b 2 2 7 c 3 3 8 d 4 4 9 e
List-like `to_replace`
>>> df.replace([0, 1, 2, 3], 4) A B C 0 4 5 a 1 4 6 b 2 4 7 c 3 4 8 d 4 4 9 e
>>> df.replace([0, 1, 2, 3], [4, 3, 2, 1]) A B C 0 4 5 a 1 3 6 b 2 2 7 c 3 1 8 d 4 4 9 e
>>> s.replace([1, 2], method='bfill') 0 0 1 3 2 3 3 3 4 4 dtype: int64
dict-like `to_replace`
>>> df.replace({0: 10, 1: 100}) A B C 0 10 5 a 1 100 6 b 2 2 7 c 3 3 8 d 4 4 9 e
>>> df.replace({'A': 0, 'B': 5}, 100) A B C 0 100 100 a 1 1 6 b 2 2 7 c 3 3 8 d 4 4 9 e
>>> df.replace({'A': {0: 100, 4: 400}}) A B C 0 100 5 a 1 1 6 b 2 2 7 c 3 3 8 d 4 400 9 e
Regular expression `to_replace`
>>> df = pd.DataFrame({'A': ['bat', 'foo', 'bait'], ... 'B': ['abc', 'bar', 'xyz']}) >>> df.replace(to_replace=r'^ba.$', value='new', regex=True) A B 0 new abc 1 foo new 2 bait xyz
>>> df.replace({'A': r'^ba.$'}, {'A': 'new'}, regex=True) A B 0 new abc 1 foo bar 2 bait xyz
>>> df.replace(regex=r'^ba.$', value='new') A B 0 new abc 1 foo new 2 bait xyz
>>> df.replace(regex={r'^ba.$':'new', 'foo':'xyz'}) A B 0 new abc 1 xyz new 2 bait xyz
>>> df.replace(regex=[r'^ba.$', 'foo'], value='new') A B 0 new abc 1 new new 2 bait xyz
Note that when replacing multiple
bool
ordatetime64
objects, the data types in theto_replace
parameter must match the data type of the value being replaced:>>> df = pd.DataFrame({'A': [True, False, True], ... 'B': [False, True, False]}) >>> df.replace({'a string': 'new value', True: False}) # raises Traceback (most recent call last): ... TypeError: Cannot compare types 'ndarray(dtype=bool)' and 'str'
This raises a
TypeError
because one of thedict
keys is not of the correct type for replacement.Compare the behavior of
s.replace({'a': None})
ands.replace('a', None)
to understand the pecularities of theto_replace
parameter:>>> s = pd.Series([10, 'a', 'a', 'b', 'a'])
When one uses a dict as the
to_replace
value, it is like the value(s) in the dict are equal to thevalue
parameter.s.replace({'a': None})
is equivalent tos.replace(to_replace={'a': None}, value=None, method=None)
:>>> s.replace({'a': None}) 0 10 1 None 2 None 3 b 4 None dtype: object
When
value=None
andto_replace
is a scalar, list or tuple,replace
uses the method parameter (default ‘pad’) to do the replacement. So this is why the ‘a’ values are being replaced by 10 in rows 1 and 2 and ‘b’ in row 4 in this case. The commands.replace('a', None)
is actually equivalent tos.replace(to_replace='a', value=None, method='pad')
:>>> s.replace('a', None) 0 10 1 10 2 10 3 b 4 b dtype: object
-
© 2008–2012, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
Licensed under the 3-clause BSD License.
https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.DataFrame.replace.html