pandas.DataFrame.to_parquet
- DataFrame.to_parquet(path=None, engine='auto', compression='snappy', index=None, partition_cols=None, storage_options=None, **kwargs)[source]
-
Write a DataFrame to the binary parquet format.
This function writes the dataframe as a parquet file. You can choose different parquet backends, and have the option of compression. See the user guide for more details.
- Parameters
-
- path:str or file-like object, default None
-
If a string, it will be used as Root Directory path when writing a partitioned dataset. By file-like object, we refer to objects with a write() method, such as a file handle (e.g. via builtin open function) or io.BytesIO. The engine fastparquet does not accept file-like objects. If path is None, a bytes object is returned.
Changed in version 1.2.0: previously this was “fname”.
- engine:{‘auto’, ‘pyarrow’, ‘fastparquet’}, default ‘auto’
-
Parquet library to use. If ‘auto’, then the option io.parquet.engine is used. The default io.parquet.engine behavior is to try ‘pyarrow’, falling back to ‘fastparquet’ if ‘pyarrow’ is unavailable.
- compression:{‘snappy’, ‘gzip’, ‘brotli’, None}, default ‘snappy’
-
Name of the compression to use. Use None for no compression.
- index:bool, default None
-
If True, include the dataframe’s index(es) in the file output. If False, they will not be written to the file. If None, similar to True, the dataframe’s index(es) will be saved. However, instead of being saved as values, the RangeIndex will be stored as a range in the metadata so it doesn’t require much space and is faster. Other indexes will be included as columns in the file output.
- partition_cols:list, optional, default None
-
Column names by which to partition the dataset. Columns are partitioned in the order they are given. Must be None if path is not a string.
- storage_options:dict, optional
-
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib as header options. For other URLs (e.g. starting with “s3://” and “gcs://”) the key-value pairs are forwarded to fsspec. Please see fsspec and urllib for more details.
New in version 1.2.0.
- **kwargs
-
Additional arguments passed to the parquet library. See the pandas IO documentation for more details.
- Returns
-
- bytes if no path argument is provided else None
See also
read_parquet
-
Read a parquet file.
DataFrame.to_csv
-
Write a csv file.
DataFrame.to_sql
-
Write to a sql table.
DataFrame.to_hdf
-
Write to hdf.
Notes
This function requires either the fastparquet or pyarrow library.
Examples
>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_parquet('df.parquet.gzip',
...               compression='gzip')
>>> pd.read_parquet('df.parquet.gzip')
   col1  col2
0     1     3
1     2     4
If you want to get a buffer to the parquet content you can use an io.BytesIO object, as long as you don’t use partition_cols, which creates multiple files.
>>> import io
>>> f = io.BytesIO()
>>> df.to_parquet(f)
>>> f.seek(0)
0
>>> content = f.read()
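If path is left as None, the serialized parquet content is returned as bytes instead of being written anywhere; a minimal sketch, continuing with the df from above and assuming one of the supported engines is installed:
>>> raw = df.to_parquet()
>>> isinstance(raw, bytes)
True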
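To leave the index out of the file entirely, pass index=False; the file name here is only illustrative:
>>> df.to_parquet('df_noindex.parquet', index=False)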
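Passing partition_cols writes a partitioned dataset: path is treated as a root directory and one subdirectory is created per partition value. A sketch with an illustrative directory name, assuming the pyarrow engine:
>>> df.to_parquet('df_partitioned', partition_cols=['col1'])
This produces subdirectories such as df_partitioned/col1=1/, each holding a parquet file, and pd.read_parquet('df_partitioned') reads them back as a single DataFrame.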
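Remote paths are handled through fsspec, and credentials can be supplied via storage_options. This is only a sketch: the bucket name and keys below are placeholders, and writing to “s3://” URLs additionally requires s3fs:
>>> df.to_parquet('s3://my-bucket/df.parquet',
...               storage_options={'key': '<access-key>',
...                                'secret': '<secret-key>'})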
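Extra keyword arguments are forwarded to the underlying parquet library; for instance, with engine='pyarrow' they reach pyarrow.parquet.write_table. The row_group_size argument below is a pyarrow option rather than a pandas one, shown only as an example of engine-specific tuning:
>>> df.to_parquet('df_tuned.parquet',
...               engine='pyarrow',
...               row_group_size=1000)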