`pudl.extract.epacems`

Retrieve data from EPA CEMS hourly zipped CSVs.

This module pulls data from EPA’s published CSV files.

Module Contents

Classes

`EpaCemsPartition`	Represents an EpaCems partition identifying a unique resource file.
`EpaCemsDatastore`	Helper class to extract EpaCems resources from the datastore.

Functions

extract(epacems_years, states, ds: pudl.workspace.datastore.Datastore)

Coordinate the extraction of EPA CEMS hourly DataFrames.

Attributes

`logger`
`RENAME_DICT`	A dictionary containing EPA CEMS column names (keys) and replacement names to use when reading those columns into PUDL (values).
`IGNORE_COLS`	The set of EPA CEMS columns to ignore when reading data.

pudl.extract.epacems.logger

pudl.extract.epacems.RENAME_DICT

A dictionary containing EPA CEMS column names (keys) and replacement names to use when reading those columns into PUDL (values).

Type: dict

pudl.extract.epacems.IGNORE_COLS

The set of EPA CEMS columns to ignore when reading data.

Type: set

class pudl.extract.epacems.EpaCemsPartition

Bases: NamedTuple

Represents an EpaCems partition identifying a unique resource file.

year: str

state: str

get_key(self): Returns a hashable key for use with EpaCemsDatastore.

get_filters(self): Returns filters for retrieving the given partition resource from the Datastore.

get_monthly_file(self, month: int) → pathlib.Path: Returns the filename (without suffix) that contains the monthly data.
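To make the role of the partition concrete, here is a minimal sketch of a NamedTuple with the three methods listed above. The class name `PartitionSketch`, the lower-casing of the state code, and the `<year><state><month>` filename pattern are assumptions for illustration, not the documented PUDL behavior.

```python
from pathlib import Path
from typing import NamedTuple


class PartitionSketch(NamedTuple):
    """Hypothetical stand-in for EpaCemsPartition (not the PUDL source)."""

    year: str
    state: str

    def get_key(self):
        # A tuple of strings is hashable, so it can be used as a dict key.
        return (self.year, self.state.lower())

    def get_filters(self):
        # Keyword filters a datastore lookup might accept (assumed shape).
        return {"year": self.year, "state": self.state.lower()}

    def get_monthly_file(self, month: int) -> Path:
        # Assumed naming scheme: year, state, two-digit month, no suffix.
        return Path(f"{self.year}{self.state.lower()}{month:02d}")


part = PartitionSketch(year="2019", state="ID")
cache = {part.get_key(): "cached resource"}  # the key works as a dict key
print(part.get_filters())
print(part.get_monthly_file(3))
```

Because a NamedTuple is immutable, a partition can be passed around freely without risk of the year or state changing after a key has been computed from it.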

class pudl.extract.epacems.EpaCemsDatastore(datastore: pudl.workspace.datastore.Datastore)

Helper class to extract EpaCems resources from the datastore.

EpaCems resources are identified by a year and a state. Each of these zip files contains monthly zip files that in turn contain CSV files. This class implements the get_data_frame method, which concatenates the monthly tables for a given state and year into a single table.

get_data_frame(self, partition: EpaCemsPartition) → pandas.DataFrame: Constructs a dataframe holding data for a given (year, state) partition.
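The nested layout described above (a state-year zip holding monthly zips, each holding a CSV) can be sketched as follows. This is an illustration under stated assumptions, not the PUDL implementation: the helper names `make_zip` and `read_state_year`, the member naming scheme, and the toy columns are all hypothetical, and the archive is built in memory so the example runs without any downloaded data.

```python
import io
import zipfile

import pandas as pd


def make_zip(members: dict) -> bytes:
    """Build an in-memory zip archive from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, payload in members.items():
            zf.writestr(name, payload)
    return buf.getvalue()


def read_state_year(outer_bytes: bytes, stem: str) -> pd.DataFrame:
    """Concatenate the monthly CSVs nested inside one state-year archive."""
    frames = []
    with zipfile.ZipFile(io.BytesIO(outer_bytes)) as outer:
        for month in range(1, 13):
            name = f"{stem}{month:02d}"  # assumed member naming scheme
            inner_bytes = outer.read(f"{name}.zip")
            with zipfile.ZipFile(io.BytesIO(inner_bytes)) as inner:
                with inner.open(f"{name}.csv") as csv_file:
                    frames.append(pd.read_csv(csv_file))
    # One table for the whole year, with a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)


# Toy data: twelve monthly archives holding one row each.
monthly = {}
for m in range(1, 13):
    name = f"2019id{m:02d}"
    csv_bytes = f"month,value\n{m},{m * 10}\n".encode()
    monthly[f"{name}.zip"] = make_zip({f"{name}.csv": csv_bytes})
outer_bytes = make_zip(monthly)

df = read_state_year(outer_bytes, "2019id")
print(df.shape)
```

Collecting the monthly frames in a list and calling `pd.concat` once avoids repeatedly copying a growing DataFrame inside the loop.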

_csv_to_dataframe(self, csv_file) → pandas.DataFrame

Convert a CEMS csv file into a pandas.DataFrame.

Note that some columns are not read. See pudl.constants.epacems_columns_to_ignore. Data types for the columns are specified in pudl.constants.epacems_csv_dtypes and names of the output columns are set by pudl.constants.epacems_rename_dict.

Parameters: csv_file (file-like object) – The data to be read.
Returns: A DataFrame containing the contents of the CSV file.
Return type: pandas.DataFrame
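The combination of skipped columns, explicit dtypes, and renamed outputs might look roughly like the sketch below. The names `IGNORE`, `DTYPES`, `RENAME`, and `csv_to_dataframe_sketch`, and the column names and mappings in them, are illustrative assumptions rather than the values PUDL uses.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the ignore set, dtype map, and rename map.
IGNORE = {"FACILITY_NAME"}
DTYPES = {"STATE": str, "ORISPL_CODE": "int64", "GLOAD (MW)": "float64"}
RENAME = {
    "STATE": "state",
    "ORISPL_CODE": "plant_id_eia",
    "GLOAD (MW)": "gross_load_mw",
}


def csv_to_dataframe_sketch(csv_file) -> pd.DataFrame:
    """Read a CSV, skipping ignored columns, then rename what remains."""
    df = pd.read_csv(
        csv_file,
        # A callable usecols keeps every column it returns True for.
        usecols=lambda col: col not in IGNORE,
        dtype=DTYPES,
    )
    return df.rename(columns=RENAME)


raw = io.StringIO(
    "STATE,FACILITY_NAME,ORISPL_CODE,GLOAD (MW)\n"
    "ID,Example Plant,1234,55.5\n"
)
frame = csv_to_dataframe_sketch(raw)
print(frame.dtypes)
```

Filtering columns at read time, rather than dropping them afterwards, keeps the ignored columns from ever being parsed into memory.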

pudl.extract.epacems.extract(epacems_years, states, ds: pudl.workspace.datastore.Datastore)

Coordinate the extraction of EPA CEMS hourly DataFrames.

Parameters

epacems_years (list) – The years of CEMS data to extract, as 4-digit integers.
states (list) – The states whose CEMS data we want to extract, indicated by 2-letter US state codes.
ds (Datastore) – Initialized datastore

Yields

pandas.DataFrame – A single state-year of EPA CEMS hourly emissions data.
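Since the function yields one state-year at a time, it behaves like a generator. The sketch below shows that pattern with a hypothetical `extract_sketch` function and a stub loader in place of a real datastore. The iteration order (years outer, states inner) is an assumption.

```python
from typing import Callable, Iterable, Iterator


def extract_sketch(
    years: Iterable[int],
    states: Iterable[str],
    load: Callable[[int, str], object],
) -> Iterator[object]:
    """Yield one loaded state-year at a time instead of building a list."""
    states = list(states)  # allow re-iteration for every year
    for year in years:
        for state in states:
            yield load(year, state)


calls = []


def fake_load(year: int, state: str) -> str:
    # Stub loader: records the request and returns a label, not a DataFrame.
    calls.append((year, state))
    return f"{year}-{state}"


gen = extract_sketch([2018, 2019], ["ID", "CO"], fake_load)
first = next(gen)          # only the first state-year is loaded here
calls_after_first = list(calls)
remaining = list(gen)      # consuming the generator loads the rest
```

Yielding each state-year lazily means a caller can process and release one table before the next is read, which matters when a full year of hourly data for many states would not fit in memory at once.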
