`pudl.load.parquet`

Load PUDL data into an Apache Parquet dataset.

Currently, this module is used only for the EPA CEMS hourly dataset, but it will also be used for other long tables that are too big for SQLite to handle gracefully.

Module Contents

Functions

epacems_to_parquet(df, root_path)

Write an EPA CEMS dataframe out to a partitioned Parquet dataset.

Attributes

`logger`
`INT_NULLABLE`
`INT_NOT_NULL`
`STR_NOT_NULL`
`TIMESTAMP`
`FLOAT_NULLABLE`
`FLOAT_NOT_NULL`
`DICT_NULLABLE`
`EPACEMS_ARROW_SCHEMA`	Schema defining efficient data types for EPA CEMS Parquet outputs.

pudl.load.parquet.logger

pudl.load.parquet.INT_NULLABLE

pudl.load.parquet.INT_NOT_NULL

pudl.load.parquet.STR_NOT_NULL

pudl.load.parquet.TIMESTAMP

pudl.load.parquet.FLOAT_NULLABLE

pudl.load.parquet.FLOAT_NOT_NULL

pudl.load.parquet.DICT_NULLABLE

pudl.load.parquet.EPACEMS_ARROW_SCHEMA

Schema defining efficient data types for EPA CEMS Parquet outputs.

pudl.load.parquet.epacems_to_parquet(df, root_path)

Write an EPA CEMS dataframe out to a partitioned Parquet dataset.

Parameters

df (pandas.DataFrame) – Dataframe containing the data to be output.
root_path (path-like) – The top level directory for the partitioned dataset.

Returns

None
