pudl.load.parquet
Load PUDL data into an Apache Parquet dataset.
Currently this module is only used for the EPA CEMS hourly dataset, but it will also be used for other long tables that are too big for SQLite to handle gracefully.
Module Contents
Functions
epacems_to_parquet(df, root_path) | Write an EPA CEMS dataframe out to a partitioned Parquet dataset.
Attributes
EPACEMS_ARROW_SCHEMA | Schema defining efficient data types for EPA CEMS Parquet outputs.
- pudl.load.parquet.EPACEMS_ARROW_SCHEMA[source]
Schema defining efficient data types for EPA CEMS Parquet outputs.
- pudl.load.parquet.epacems_to_parquet(df, root_path)[source]
Write an EPA CEMS dataframe out to a partitioned Parquet dataset.
- Parameters
df (pandas.DataFrame) – Dataframe containing the data to be output.
root_path (path-like) – The top level directory for the partitioned dataset.
- Returns
None