pudl.extract.epacems module

Retrieve data from EPA CEMS hourly zipped CSVs.

This module pulls data from EPA’s published CSV files.

pudl.extract.epacems.extract(epacems_years, states, data_dir)

Coordinate the extraction of EPA CEMS hourly DataFrames.

Parameters

epacems_years (list) – list of years for which to read CEMS data
states (list) – list of states for which to read CEMS data
data_dir (path-like) – Path to the top directory of the PUDL datastore.

Yields

dict – a dictionary mapping states (keys) to DataFrames of CEMS data (values)
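
Because the function yields rather than returns, nothing is read until the caller iterates over it. The sketch below shows one way the yielded dictionaries might be consumed. It does not call PUDL: `fake_extract` is a hypothetical stand-in that yields one single-entry dictionary per year and state (an assumption about chunking, not something this page states), and plain lists of rows take the place of DataFrames. `collect_by_state` and the row fields are likewise made up for illustration.

```python
from collections import defaultdict


def fake_extract(epacems_years, states, data_dir):
    """Hypothetical stand-in for pudl.extract.epacems.extract.

    Yields one dict per (year, state) pair, keyed by state. Real values
    would be DataFrames; lists of rows are used so this runs anywhere.
    """
    for year in epacems_years:
        for state in states:
            rows = [{"year": year, "state": state, "hour": h} for h in range(2)]
            yield {state: rows}


def collect_by_state(chunks):
    """Merge the yielded dicts into a single dict of state -> all rows."""
    merged = defaultdict(list)
    for chunk in chunks:
        for state, rows in chunk.items():
            merged[state].extend(rows)
    return dict(merged)


# data_dir is ignored by the stand-in; a real call needs a datastore path.
result = collect_by_state(fake_extract([2017, 2018], ["CO", "ID"], "unused"))
print({state: len(rows) for state, rows in result.items()})
```

Collecting everything in memory, as done here, is only reasonable for small inputs; with real hourly data it is probably better to process each yielded dictionary as it arrives.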

Todo

This is really slow. Can we do some parallel processing?

pudl.extract.epacems.read_cems_csv(filename)

Reads one CEMS CSV file.

Note that some columns are not read. See epacems_columns_to_ignores.
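
The idea of reading a zipped CSV while skipping unwanted columns can be illustrated with the standard library alone. This is a rough sketch under stated assumptions, not the PUDL implementation: `read_zipped_csv`, the `COLUMNS_TO_IGNORE` set, and the sample column names and values are all hypothetical, and the archive is built in memory so the example runs without downloading anything.

```python
import csv
import io
import zipfile

# Hypothetical ignore list; the real one is defined by PUDL.
COLUMNS_TO_IGNORE = {"FACILITY_NAME", "FAC_ID"}


def read_zipped_csv(zip_file, ignore=COLUMNS_TO_IGNORE):
    """Read the first CSV in a zip archive, dropping ignored columns."""
    with zipfile.ZipFile(zip_file) as archive:
        name = archive.namelist()[0]
        with archive.open(name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.DictReader(text)
            # Build the rows while the archive member is still open.
            return [
                {col: val for col, val in row.items() if col not in ignore}
                for row in reader
            ]


# Build a tiny in-memory archive with made-up data.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "sample.csv",
        "STATE,FACILITY_NAME,FAC_ID,OP_DATE,GLOAD\n"
        "CO,Example Plant,1,01-01-2018,42\n",
    )
buffer.seek(0)

rows = read_zipped_csv(buffer)
print(rows)
```

Filtering at read time keeps the ignored columns from ever reaching the caller, which matters more as file sizes grow. Values stay as strings here because `csv.DictReader` performs no type conversion.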