pudl.convert.epacems_to_parquet

Process raw EPA CEMS data into a Parquet dataset outside of the PUDL ETL.

This script transforms the raw EPA CEMS data from zip-compressed CSV files into an Apache Parquet dataset partitioned by year and state.
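To make the partitioning concrete, here is a minimal sketch of how a dataset partitioned by year and state might be laid out on disk. It assumes hive-style `key=value` directory names, a common convention for partitioned Parquet datasets that this page does not confirm for the script's output. The `epacems` root directory and both helper functions are hypothetical.

```python
from pathlib import Path


def partition_path(root, year, state):
    """Return the directory for one (year, state) partition.

    Assumes hive-style ``key=value`` directory names; the real script's
    output layout may differ.
    """
    return Path(root) / f"year={year}" / f"state={state}"


def list_partitions(root, years, states):
    """Return one partition directory per (year, state) combination."""
    return [partition_path(root, y, s) for y in years for s in states]


# "epacems" is a hypothetical output directory name.
paths = list_partitions("epacems", [2018, 2019], ["CO", "ID"])
for p in paths:
    print(p.as_posix())
```

With this kind of layout, a reader that understands partitioned datasets can load a single year or state without scanning the rest of the data.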

Processing the EPA CEMS data requires information that’s stored in the main PUDL database, so to run this script, you must already have a PUDL database available on your system.

Module Contents

Functions

parse_command_line(argv)

Parse command line arguments. See the -h option.

main()

Convert zipped EPA CEMS Hourly data to Apache Parquet format.

Attributes

logger

pudl.convert.epacems_to_parquet.logger
pudl.convert.epacems_to_parquet.parse_command_line(argv)

Parse command line arguments. See the -h option.

Parameters

argv (list of str) – Command line arguments, including caller filename.

Returns

Dictionary of command line arguments and their parsed values.

Return type

dict
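The contract described above (take the full argument list including the caller filename, return a dictionary of parsed values) can be sketched with the standard library's `argparse`. This is an illustration rather than the module's actual implementation. The function name `parse_command_line_sketch` and the `--years` and `--states` options are hypothetical, and it assumes `argv` is a list shaped like `sys.argv`.

```python
import argparse


def parse_command_line_sketch(argv):
    """Parse a sys.argv-style list and return the values as a dict.

    Hypothetical stand-in for parse_command_line(); the option names
    below are assumptions, not the script's documented flags.
    """
    parser = argparse.ArgumentParser(
        description="Convert EPA CEMS data to Parquet (illustrative only)."
    )
    parser.add_argument(
        "--years", nargs="+", type=int, default=[],
        help="Years of data to process (hypothetical option).",
    )
    parser.add_argument(
        "--states", nargs="+", default=[],
        help="State abbreviations to process (hypothetical option).",
    )
    # argv[0] is the caller filename, so only the remainder is parsed.
    args = parser.parse_args(argv[1:])
    # vars() converts the argparse.Namespace into a plain dictionary.
    return vars(args)


parsed = parse_command_line_sketch(
    ["epacems_to_parquet", "--years", "2018", "2019", "--states", "CO"]
)
print(parsed)
```

Passing the argument list in explicitly, rather than reading `sys.argv` inside the function, keeps the parser easy to exercise from tests.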

pudl.convert.epacems_to_parquet.main()

Convert zipped EPA CEMS Hourly data to Apache Parquet format.