pudl.extract.epacems#

Retrieve data from EPA CEMS hourly zipped CSVs.

This module pulls data from EPA’s published CSV files.

Module Contents#

Classes#

EpaCemsPartition

Represents an EpaCems partition identifying a unique resource file.

EpaCemsDatastore

Helper class to extract EpaCems resources from datastore.

Functions#

extract(year, state, ds)

Coordinate the extraction of EPA CEMS hourly DataFrames.

Attributes#

logger

RENAME_DICT

A dictionary containing EPA CEMS column names (keys) and replacement names to use when reading those columns into PUDL (values).

IGNORE_COLS

The set of EPA CEMS columns to ignore when reading data.

pudl.extract.epacems.logger[source]#
pudl.extract.epacems.RENAME_DICT[source]#

A dictionary containing EPA CEMS column names (keys) and replacement names to use when reading those columns into PUDL (values).

Type:

dict

pudl.extract.epacems.IGNORE_COLS[source]#

The set of EPA CEMS columns to ignore when reading data.

Type:

set

class pudl.extract.epacems.EpaCemsPartition[source]#

Bases: NamedTuple

Represents an EpaCems partition identifying a unique resource file.

year :str[source]#
state :str[source]#
get_key()[source]#

Returns hashable key for use with EpaCemsDatastore.

get_filters()[source]#

Returns filters for retrieving given partition resource from Datastore.

get_monthly_file(month: int) → pathlib.Path[source]#

Returns the filename (without suffix) that contains the monthly data.
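As an illustration of how a NamedTuple partition with these three methods might behave, here is a minimal sketch. It is not the PUDL source: the class name `PartitionSketch`, the lowercase state convention, the filter keys, and the `<year><state><month>` file naming are all assumptions made for the example.

```python
from pathlib import Path
from typing import NamedTuple


class PartitionSketch(NamedTuple):
    """Hypothetical stand-in for EpaCemsPartition (not the PUDL implementation)."""

    year: str
    state: str

    def get_key(self) -> tuple[str, str]:
        # Assumption: a (year, lowercase state) tuple serves as the hashable key.
        return (self.year, self.state.lower())

    def get_filters(self) -> dict[str, str]:
        # Assumption: the datastore is filtered with one keyword per field.
        return {"year": self.year, "state": self.state.lower()}

    def get_monthly_file(self, month: int) -> Path:
        # Assumption: monthly files are named <year><state><two-digit month>.
        return Path(f"{self.year}{self.state.lower()}{month:02d}")


part = PartitionSketch(year="2019", state="CO")
print(part.get_key())
print(part.get_monthly_file(3))
```

Because a NamedTuple is immutable and hashable, a key built this way can be used directly as a dictionary key, for example in a cache of already-opened archives.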

class pudl.extract.epacems.EpaCemsDatastore(datastore: pudl.workspace.datastore.Datastore)[source]#

Helper class to extract EpaCems resources from datastore.

EpaCems resources are identified by a year and a state. Each of these zip files contains monthly zip files that in turn contain CSV files. This class implements the get_data_frame method, which concatenates the tables for a given state and year across all months.

get_data_frame(partition: EpaCemsPartition) → pandas.DataFrame[source]#

Constructs dataframe holding data for a given (year, state) partition.
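The nested layout described above (an outer zip holding monthly zips, each holding CSV files) can be read entirely in memory. The sketch below shows one way to do that with the standard library's `zipfile` and `pandas.concat`. The function names, file names, and the `hour` and `load` columns are invented for the example and do not reflect how EpaCemsDatastore is actually written.

```python
import io
import zipfile

import pandas as pd


def make_zip(members: dict[str, bytes]) -> bytes:
    """Build an in-memory zip archive from a name -> payload mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, payload in members.items():
            zf.writestr(name, payload)
    return buf.getvalue()


def frames_from_nested_zip(outer_bytes: bytes) -> pd.DataFrame:
    """Concatenate every CSV found inside the monthly zips of an outer zip."""
    frames = []
    with zipfile.ZipFile(io.BytesIO(outer_bytes)) as outer:
        for monthly_name in sorted(outer.namelist()):
            # Each member of the outer archive is itself a zip archive.
            monthly_bytes = outer.read(monthly_name)
            with zipfile.ZipFile(io.BytesIO(monthly_bytes)) as monthly:
                for csv_name in monthly.namelist():
                    with monthly.open(csv_name) as csv_file:
                        frames.append(pd.read_csv(csv_file))
    return pd.concat(frames, ignore_index=True)


# Made-up fixture: two monthly archives for one state-year.
jan = make_zip({"2019co01.csv": b"hour,load\n0,10\n1,11\n"})
feb = make_zip({"2019co02.csv": b"hour,load\n0,20\n"})
outer = make_zip({"2019co01.zip": jan, "2019co02.zip": feb})

df = frames_from_nested_zip(outer)
print(len(df))  # 3 rows: two from January, one from February
```

Passing `ignore_index=True` gives the combined frame a fresh 0..n-1 index instead of repeating each month's own row numbers.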

_csv_to_dataframe(csv_file) → pandas.DataFrame[source]#

Convert a CEMS csv file into a pandas.DataFrame.

Parameters:

csv_file (file-like object) – data to be read

Returns:

A DataFrame containing the contents of the CSV file.
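One plausible way to combine a rename mapping with a set of ignored columns when reading a CSV is sketched below. `RENAME_SKETCH` and `IGNORE_SKETCH` are small hypothetical stand-ins for RENAME_DICT and IGNORE_COLS, whose real contents are not listed on this page, and `csv_to_dataframe_sketch` is not the library's own method.

```python
import io

import pandas as pd

# Hypothetical stand-ins: the real RENAME_DICT and IGNORE_COLS are not shown here.
RENAME_SKETCH = {"STATE": "state", "OP_HOUR": "operating_hour"}
IGNORE_SKETCH = {"FAC_ID"}


def csv_to_dataframe_sketch(csv_file) -> pd.DataFrame:
    """Read a CSV, skipping ignored columns and renaming the rest."""
    return pd.read_csv(
        csv_file,
        # A callable usecols keeps only the columns for which it returns True.
        usecols=lambda col: col not in IGNORE_SKETCH,
    ).rename(columns=RENAME_SKETCH)


raw = io.StringIO("STATE,OP_HOUR,FAC_ID\nCO,0,17\nCO,1,17\n")
df = csv_to_dataframe_sketch(raw)
print(list(df.columns))  # ['state', 'operating_hour']
```

Filtering with `usecols` at read time means the ignored columns are never loaded, which matters for large hourly files.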

pudl.extract.epacems.extract(year: int, state: str, ds: pudl.workspace.datastore.Datastore)[source]#

Coordinate the extraction of EPA CEMS hourly DataFrames.

Parameters:
  • year – report year of the data to extract

  • state – state of the data to extract

  • ds – Initialized datastore

Yields:

pandas.DataFrame – A single state-year of EPA CEMS hourly emissions data.