pudl.extract.epacems module

Retrieve data from EPA CEMS hourly zipped CSVs.

This module pulls data from EPA’s published CSV files.

pudl.extract.epacems.extract(epacems_years, states, data_dir)

Coordinate the extraction of EPA CEMS hourly DataFrames.

Parameters
  • epacems_years (list) – Years of CEMS data to extract.

  • states (list) – States whose CEMS data should be extracted.

  • data_dir (path-like) – Path to the top directory of the PUDL datastore.

Yields

dict – a dictionary mapping states (keys) to DataFrames of CEMS data (values)
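A minimal sketch of the generator pattern described above, assuming a caller-supplied reader `read_fn(year, state)` that returns one DataFrame. `extract_sketch` and `read_fn` are hypothetical names, not PUDL's API, and the real function's key format and file layout may differ. Yielding one small dict per year and state means only one chunk of hourly data needs to be held in memory at a time.

```python
from typing import Callable, Dict, Iterable, Iterator

import pandas as pd


def extract_sketch(
    epacems_years: Iterable[int],
    states: Iterable[str],
    read_fn: Callable[[int, str], pd.DataFrame],
) -> Iterator[Dict[str, pd.DataFrame]]:
    """Yield one {state: DataFrame} dict per (year, state) pair.

    read_fn is a hypothetical stand-in for locating and parsing the CEMS
    data for one year and state; it is not part of PUDL.
    """
    states = list(states)  # allow re-iterating states for every year
    for year in epacems_years:
        for state in states:
            # Lazily yield so only this state-year's data is materialized now.
            yield {state: read_fn(year, state)}


# Usage with a fake reader standing in for real CSV parsing.
def fake_read(year: int, state: str) -> pd.DataFrame:
    return pd.DataFrame({"year": [year], "state": [state]})


chunks = list(extract_sketch([2017, 2018], ["CO", "ID"], fake_read))
print(len(chunks))  # 4: one dict per year/state pair
```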

Todo

This is really slow. Can we do some parallel processing?
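One possible direction for this Todo, sketched with the standard library's `concurrent.futures.ThreadPoolExecutor`. It assumes a hypothetical reader `read_fn(year, state)` that returns one DataFrame (not part of PUDL) and keys the results by `(year, state)`; it illustrates the idea under those assumptions and is not PUDL's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Callable, Dict, Iterable, Tuple

import pandas as pd


def extract_parallel_sketch(
    epacems_years: Iterable[int],
    states: Iterable[str],
    read_fn: Callable[[int, str], pd.DataFrame],
    max_workers: int = 4,
) -> Dict[Tuple[int, str], pd.DataFrame]:
    """Read every (year, state) pair concurrently.

    read_fn is hypothetical; it stands in for reading one state-year of data.
    """
    pairs = list(product(epacems_years, states))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, so they line up with pairs.
        frames = pool.map(lambda pair: read_fn(*pair), pairs)
        return dict(zip(pairs, frames))


# Usage with a fake reader standing in for real CSV parsing.
def fake_read(year: int, state: str) -> pd.DataFrame:
    return pd.DataFrame({"year": [year], "state": [state]})


frames = extract_parallel_sketch([2017, 2018], ["CO", "ID"], fake_read)
print(sorted(frames))  # [(2017, 'CO'), (2017, 'ID'), (2018, 'CO'), (2018, 'ID')]
```

Threads mainly help when time is spent waiting on disk or network I/O. If CSV parsing itself is the bottleneck, a `ProcessPoolExecutor` may scale better, but it requires a picklable, module-level reader instead of a lambda. Unlike a generator, this version holds every DataFrame in memory at once.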

pudl.extract.epacems.read_cems_csv(filename)

Read one CEMS CSV file.

Some columns are not read; see epacems_columns_to_ignore.

Parameters

filename (str) – Path of the CSV file to read.

Returns

A DataFrame containing the contents of the CSV file.

Return type

pandas.DataFrame
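A minimal sketch of how one zipped CSV might be read while skipping unwanted columns, using pandas' `read_csv` with a callable `usecols`. `read_cems_csv_sketch` and `COLUMNS_TO_IGNORE` are hypothetical stand-ins, and the sample column names are illustrative only, not PUDL's actual ignore list. pandas infers zip compression from the `.zip` extension, provided the archive holds a single file.

```python
import tempfile
import zipfile
from pathlib import Path

import pandas as pd

# Hypothetical ignore list for illustration; PUDL keeps its own elsewhere.
COLUMNS_TO_IGNORE = {"FAC_ID", "UNIT_ID"}


def read_cems_csv_sketch(filename) -> pd.DataFrame:
    """Read one (optionally zipped) CSV, skipping ignored columns."""
    return pd.read_csv(
        filename,
        index_col=False,
        # A callable usecols keeps every column for which it returns True.
        usecols=lambda col: col not in COLUMNS_TO_IGNORE,
    )


# Build a tiny zipped CSV in a temporary directory so the example runs
# without any real EPA data.
tmp_dir = Path(tempfile.mkdtemp())
zip_path = tmp_dir / "sample.zip"
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("sample.csv", "STATE,FAC_ID,UNIT_ID,GLOAD (MW)\nCO,3,1,150.5\n")

df = read_cems_csv_sketch(zip_path)
print(list(df.columns))  # ['STATE', 'GLOAD (MW)']
```

Filtering at read time with `usecols` avoids ever materializing the dropped columns, which matters for large hourly files.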