pandas.DataFrame.to_parquet

DataFrame.to_parquet(path=None, engine='auto', compression='snappy', index=None, partition_cols=None, storage_options=None, **kwargs)[source]

Write a DataFrame to the binary parquet format.

This function writes the dataframe as a parquet file. You can choose different parquet backends, and have the option of compression. See the user guide for more details.

Parameters
path:str or file-like object, default None

If a string, it will be used as Root Directory path when writing a partitioned dataset. By file-like object, we refer to objects with a write() method, such as a file handle (e.g. via builtin open function) or io.BytesIO. The engine fastparquet does not accept file-like objects. If path is None, a bytes object is returned.

Changed in version 1.2.0.

Previously this was “fname”

engine:{‘auto’, ‘pyarrow’, ‘fastparquet’}, default ‘auto’

Parquet library to use. If ‘auto’, then the option io.parquet.engine is used. The default io.parquet.engine behavior is to try ‘pyarrow’, falling back to ‘fastparquet’ if ‘pyarrow’ is unavailable.

compression:{‘snappy’, ‘gzip’, ‘brotli’, None}, default ‘snappy’

Name of the compression to use. Use None for no compression.

index:bool, default None

If True, include the dataframe’s index(es) in the file output. If False, they will not be written to the file. If None, similar to True the dataframe’s index(es) will be saved. However, instead of being saved as values, the RangeIndex will be stored as a range in the metadata so it doesn’t require much space and is faster. Other indexes will be included as columns in the file output.

partition_cols:list, optional, default None

Column names by which to partition the dataset. Columns are partitioned in the order they are given. Must be None if path is not a string.

storage_options:dict, optional

Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib as header options. For other URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are forwarded to fsspec. Please see fsspec and urllib for more details.

New in version 1.2.0.

**kwargs

Additional arguments passed to the parquet library. See pandas io for more details.

Returns
bytes if no path argument is provided else None

See also

read_parquet

Read a parquet file.

DataFrame.to_csv

Write a csv file.

DataFrame.to_sql

Write to a sql table.

DataFrame.to_hdf

Write to hdf.

Notes

This function requires either the fastparquet or pyarrow library.

Examples

>>> df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
>>> df.to_parquet('df.parquet.gzip',
...               compression='gzip')  
>>> pd.read_parquet('df.parquet.gzip')  
   col1  col2
0     1     3
1     2     4

If you want to get a buffer to the parquet content you can use a io.BytesIO object, as long as you don’t use partition_cols, which creates multiple files.

>>> import io
>>> f = io.BytesIO()
>>> df.to_parquet(f)
>>> f.seek(0)
0
>>> content = f.read()

© 2008–2021, AQR Capital Management, LLC, Lambda Foundry, Inc. and PyData Development Team
Licensed under the 3-clause BSD License.
https://pandas.pydata.org/pandas-docs/version/1.3.4/reference/api/pandas.DataFrame.to_parquet.html