pandas.read_stata
- pandas.read_stata(filepath_or_buffer, convert_dates=True, convert_categoricals=True, index_col=None, convert_missing=False, preserve_dtypes=True, columns=None, order_categoricals=True, chunksize=None, iterator=False, compression='infer', storage_options=None)
-
Read Stata file into DataFrame.
- Parameters
-
- filepath_or_buffer:str, path object or file-like object
-
Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.dta.
If you want to pass in a path object, pandas accepts any os.PathLike.
By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via the builtin open function) or StringIO.
- convert_dates:bool, default True
-
Convert date variables to DataFrame time values.
- convert_categoricals:bool, default True
-
Read value labels and convert columns to Categorical/Factor variables.
- index_col:str, optional
-
Column to set as index.
- convert_missing:bool, default False
-
Flag indicating whether to convert missing values to their Stata representations. If False, missing values are replaced with nan. If True, columns containing missing values are returned with object data types and missing values are represented by StataMissingValue objects.
- preserve_dtypes:bool, default True
-
Preserve Stata datatypes. If False, numeric data are upcast to pandas default types for foreign data (float64 or int64).
- columns:list or None
-
Columns to retain. Columns will be returned in the given order. None returns all columns.
- order_categoricals:bool, default True
-
Flag indicating whether converted categorical data are ordered.
- chunksize:int, default None
-
Return a StataReader object for iteration. Each iteration yields a chunk with the given number of lines.
- iterator:bool, default False
-
Return StataReader object.
- compression:str or dict, default 'infer'
-
If string, specifies compression mode. If dict, value at key ‘method’ specifies compression mode. Compression mode must be one of {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}. If compression mode is ‘infer’ and filepath_or_buffer is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, or ‘.xz’ (otherwise no compression). If dict and compression mode is one of {‘zip’, ‘gzip’, ‘bz2’}, or inferred as one of the above, other entries are passed as additional compression options.
- storage_options:dict, optional
-
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib as header options. For other URLs (e.g. starting with “s3://” and “gcs://”) the key-value pairs are forwarded to fsspec. Please see fsspec and urllib for more details.
- Returns
-
- DataFrame or StataReader
See also
io.stata.StataReader
-
Low-level reader for Stata data files.
DataFrame.to_stata
-
Export Stata data files.
Notes
Categorical variables read through an iterator may not have the same categories and dtype. This occurs when a variable stored in a DTA file is associated with an incomplete set of value labels that only label a strict subset of the values.
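As a rough illustration of how convert_categoricals and order_categoricals affect a labelled column, the sketch below writes a small categorical column with DataFrame.to_stata and reads it back three ways. It does not reproduce the incomplete-label case described in the note. The data, the column names, and the temporary file name are assumptions made for the example.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data: one categorical column with string labels.
source = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "level": pd.Categorical(["low", "high", "low"], categories=["low", "high"]),
    }
)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "levels.dta")  # hypothetical file name
    source.to_stata(path, write_index=False)

    # Default: value labels are applied and the result is an ordered categorical.
    labelled = pd.read_stata(path)

    # Same labels, but the categorical is not marked as ordered.
    unordered = pd.read_stata(path, order_categoricals=False)

    # Skip the value labels and keep the underlying integer codes.
    raw = pd.read_stata(path, convert_categoricals=False)

print(labelled["level"].cat.categories.tolist())
print(raw["level"].tolist())
```

Keeping the raw codes can be useful when chunks are combined later, since the labels can then be applied once to the combined data.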
Examples
Read a Stata dta file:
>>> df = pd.read_stata('filename.dta')
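A self-contained variant of the call above, sketched under the assumption that a small file is first written with DataFrame.to_stata into a temporary directory. It also shows the columns, index_col, and preserve_dtypes parameters. The sample data and file name are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data used only for this round trip.
source = pd.DataFrame({"id": [1, 2, 3], "score": [0.5, 1.5, 2.5]})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.dta")  # hypothetical file name
    source.to_stata(path, write_index=False)

    # Read every column with the dtypes stored in the file.
    full = pd.read_stata(path)

    # Keep only the listed columns.
    only_score = pd.read_stata(path, columns=["score"])

    # Use one of the stored columns as the index.
    indexed = pd.read_stata(path, index_col="id")

    # Upcast numeric data to the pandas defaults (int64 / float64).
    upcast = pd.read_stata(path, preserve_dtypes=False)

# With preserve_dtypes=True, integers may come back in a narrower Stata type.
print(full.dtypes)
print(upcast.dtypes)
```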
Read a Stata dta file in 10,000-line chunks:
>>> itr = pd.read_stata('filename.dta', chunksize=10000)
>>> for chunk in itr:
...     do_something(chunk)
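The chunked pattern can also be tried end to end. The sketch below assumes a five-row file written with DataFrame.to_stata and uses the returned StataReader as a context manager so that the file handle is closed once iteration finishes. The data, file name, and chunk size are chosen only for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data: five rows, read back two rows at a time.
source = pd.DataFrame({"x": [0, 1, 2, 3, 4]})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "chunks.dta")  # hypothetical file name
    source.to_stata(path, write_index=False)

    sizes = []
    total = 0
    # With chunksize set, read_stata returns a StataReader instead of a DataFrame.
    with pd.read_stata(path, chunksize=2) as reader:
        for chunk in reader:
            sizes.append(len(chunk))       # rows in this chunk
            total += int(chunk["x"].sum())  # running aggregate across chunks

print(sizes)
print(total)
```

Aggregating per chunk, as done here, keeps memory use bounded by the chunk size rather than by the size of the file.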
© 2008–2021, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
Licensed under the 3-clause BSD License.
https://pandas.pydata.org/pandas-docs/version/1.3.4/reference/api/pandas.read_stata.html