pudl.convert.epacems_to_parquet

Process raw EPA CEMS data into a Parquet dataset outside of the PUDL ETL.

This script converts the raw EPA CEMS data from zip-compressed CSV files into an Apache Parquet dataset partitioned by year and state.
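To make the partitioning idea concrete, the sketch below reads CSV rows out of a zip archive and groups them by year and state, using only the Python standard library. It is not the script's implementation: the column names (`state`, `op_date`), the `year=.../state=...` key layout, and the helpers `partition_rows` and `make_sample_zip` are all assumptions made for illustration, and the real script writes Parquet files rather than just grouping rows in memory.

```python
import csv
import io
import zipfile
from collections import defaultdict


def partition_rows(zip_bytes):
    """Group CSV rows from a zip archive by a hypothetical year/state key."""
    partitions = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith(".csv"):
                continue
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                for row in csv.DictReader(text):
                    # Assumes ISO-formatted dates, so the year is the first 4 chars.
                    year = row["op_date"][:4]
                    # Assumed key layout; the real dataset layout may differ.
                    key = f"year={year}/state={row['state']}"
                    partitions[key].append(row)
    return dict(partitions)


def make_sample_zip():
    """Build a tiny in-memory zip archive with made-up sample rows."""
    sample_csv = (
        "state,op_date,gross_load_mw\n"
        "CO,2019-01-01,10.5\n"
        "CO,2019-01-02,11.0\n"
        "WY,2020-06-30,7.25\n"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("sample.csv", sample_csv)
    return buffer.getvalue()


if __name__ == "__main__":
    for key, rows in sorted(partition_rows(make_sample_zip()).items()):
        print(key, len(rows))
```

Each key stands in for one partition of the output dataset. A reader interested in a single state and year would then only need to touch the files under the matching partition.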

Processing the EPA CEMS data requires information that’s stored in the main PUDL database, so to run this script, you must already have a PUDL database available on your system.

Module Contents

Functions

parse_command_line(argv)

Parse command line arguments. See the -h option.

main()

Convert zipped EPA CEMS Hourly data to Apache Parquet format.

Attributes

pudl.convert.epacems_to_parquet.logger

pudl.convert.epacems_to_parquet.parse_command_line(argv)

Parse command line arguments. See the -h option.

Parameters:

argv (list of str) – Command line arguments, including the caller filename as the first element.

Returns:

Dictionary of command line arguments and their parsed values.

Return type:

dict
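As a rough illustration of the contract described above (an argv-style list in, a dictionary out), here is a minimal sketch built on the standard library's `argparse`. The function name `parse_command_line_sketch` and the `--years` and `--states` options are hypothetical placeholders, not the script's documented interface; run the real script with `-h` to see its actual options.

```python
import argparse


def parse_command_line_sketch(argv):
    """Parse an argv-style list (caller filename first) into a dict."""
    parser = argparse.ArgumentParser(description="Illustrative parser only.")
    # Hypothetical options, chosen only to show the pattern.
    parser.add_argument("--years", nargs="+", type=int, default=[],
                        help="Years to process (assumed option).")
    parser.add_argument("--states", nargs="+", default=[],
                        help="State abbreviations to process (assumed option).")
    # argv[0] is the caller filename, so parsing starts at argv[1].
    namespace = parser.parse_args(argv[1:])
    # vars() turns the argparse Namespace into a plain dictionary.
    return vars(namespace)


if __name__ == "__main__":
    example_argv = ["epacems_to_parquet", "--years", "2019", "2020",
                    "--states", "CO", "WY"]
    print(parse_command_line_sketch(example_argv))
```

Passing the full argv list rather than relying on `sys.argv` inside the function keeps the parser easy to call from tests.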

pudl.convert.epacems_to_parquet.main()

Convert zipped EPA CEMS Hourly data to Apache Parquet format.