pudl.extract.epacems
Retrieve data from EPA CEMS hourly zipped CSVs.
This module pulls data from EPA’s published CSV files.
Module Contents
Classes
EpaCemsPartition | Represents an EpaCems partition identifying a unique resource file.
EpaCemsDatastore | Helper class to extract EpaCems resources from datastore.
Functions
extract(epacems_years, states, ds) | Coordinate the extraction of EPA CEMS hourly DataFrames.
Attributes
RENAME_DICT | A dictionary containing EPA CEMS column names (keys) and replacement names (values).
IGNORE_COLS | The set of EPA CEMS columns to ignore when reading data.
- pudl.extract.epacems.RENAME_DICT[source]
A dictionary containing EPA CEMS column names (keys) and replacement names to use when reading those columns into PUDL (values).
- Type
- pudl.extract.epacems.IGNORE_COLS[source]
The set of EPA CEMS columns to ignore when reading data.
- Type
- class pudl.extract.epacems.EpaCemsPartition[source]
Bases: NamedTuple
Represents an EpaCems partition identifying a unique resource file.
- get_monthly_file(self, month: int) → pathlib.Path [source]
Returns the filename (without suffix) that contains the monthly data.
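As a rough illustration of how a NamedTuple partition can derive a monthly filename, here is a minimal sketch. The class name `PartitionSketch`, its `year` and `state` fields, and the `<year><state><month>` naming scheme are assumptions made for this example; the actual scheme used by `EpaCemsPartition` may differ.

```python
from pathlib import Path
from typing import NamedTuple


class PartitionSketch(NamedTuple):
    """Hypothetical stand-in for EpaCemsPartition (not the PUDL class)."""

    year: str
    state: str

    def get_monthly_file(self, month: int) -> Path:
        # Assumed naming scheme: year, lowercase state, two-digit month,
        # with no file suffix.
        return Path(f"{self.year}{self.state.lower()}{month:02}")


part = PartitionSketch(year="2019", state="ID")
monthly_name = part.get_monthly_file(3)
```

Because the partition is a NamedTuple, it is immutable and hashable, so it can serve directly as a dictionary key or cache key for a resource file.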
- class pudl.extract.epacems.EpaCemsDatastore(datastore: pudl.workspace.datastore.Datastore)[source]
Helper class to extract EpaCems resources from datastore.
EpaCems resources are identified by a year and a state. Each of these zip files contains monthly zip files that in turn contain CSV files. This class implements the get_data_frame method, which concatenates the tables for a given state and year across all months.
- get_data_frame(self, partition: EpaCemsPartition) → pandas.DataFrame [source]
Constructs dataframe holding data for a given (year, state) partition.
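The nested layout described above (an outer zip holding monthly zips, each holding CSV files) can be read entirely in memory. The following is a minimal sketch of that idea, not the PUDL implementation: the helper names `make_zip` and `read_nested_zip`, the archive member names, and the CSV columns are all hypothetical.

```python
import io
import zipfile

import pandas as pd


def make_zip(members: dict[str, bytes]) -> bytes:
    """Build an in-memory zip archive (used only to create sample data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, payload in members.items():
            zf.writestr(name, payload)
    return buf.getvalue()


def read_nested_zip(outer_bytes: bytes) -> pd.DataFrame:
    """Concatenate every CSV found inside every inner zip of an outer zip."""
    frames = []
    with zipfile.ZipFile(io.BytesIO(outer_bytes)) as outer:
        # Sort so that monthly archives are processed in a stable order.
        for inner_name in sorted(outer.namelist()):
            inner_bytes = outer.read(inner_name)
            with zipfile.ZipFile(io.BytesIO(inner_bytes)) as inner:
                for csv_name in inner.namelist():
                    with inner.open(csv_name) as csv_file:
                        frames.append(pd.read_csv(csv_file))
    return pd.concat(frames, ignore_index=True)


# Hypothetical sample data: two monthly archives for one state-year.
jan = make_zip({"2019id01.csv": b"OP_HOUR,GLOAD\n0,10\n1,11\n"})
feb = make_zip({"2019id02.csv": b"OP_HOUR,GLOAD\n0,20\n"})
outer = make_zip({"2019id01.zip": jan, "2019id02.zip": feb})
df = read_nested_zip(outer)
```

Reading the inner archives through `io.BytesIO` avoids writing temporary files to disk, at the cost of holding each monthly archive in memory while it is parsed.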
- _csv_to_dataframe(self, csv_file) → pandas.DataFrame [source]
Convert a CEMS csv file into a pandas.DataFrame.
Note that some columns are not read. See pudl.constants.epacems_columns_to_ignore. Data types for the columns are specified in pudl.constants.epacems_csv_dtypes and names of the output columns are set by pudl.constants.epacems_rename_dict.
- Parameters
csv (file-like object) – data to be read
- Returns
A DataFrame containing the contents of the CSV file.
- Return type
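The three steps in the description above (skip ignored columns, apply explicit dtypes, rename the output columns) map onto standard pandas calls. This is a minimal sketch under stated assumptions: the `IGNORE`, `DTYPES`, and `RENAME` values and the function name `csv_to_dataframe_sketch` are hypothetical placeholders, not the contents of the PUDL constants.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the ignore set, dtype map, and rename dict.
IGNORE = {"FACILITY_NAME"}
DTYPES = {"STATE": "string", "GLOAD": "float64"}
RENAME = {"STATE": "state", "GLOAD": "gross_load_mw"}


def csv_to_dataframe_sketch(csv_file) -> pd.DataFrame:
    """Read a CSV, dropping ignored columns and renaming the rest."""
    df = pd.read_csv(
        csv_file,
        # A callable usecols keeps every column not in the ignore set.
        usecols=lambda col: col not in IGNORE,
        dtype=DTYPES,
    )
    return df.rename(columns=RENAME)


raw = io.StringIO("STATE,FACILITY_NAME,GLOAD\nID,Example Plant,12.5\n")
df = csv_to_dataframe_sketch(raw)
```

Filtering with `usecols` at read time means ignored columns are never parsed, which can matter for large hourly files.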
- pudl.extract.epacems.extract(epacems_years, states, ds: pudl.workspace.datastore.Datastore)[source]
Coordinate the extraction of EPA CEMS hourly DataFrames.
- Parameters
- Yields
pandas.DataFrame – A single state-year of EPA CEMS hourly emissions data.
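Since the function yields one DataFrame per state-year rather than returning a single large table, a generator is a natural fit. The sketch below shows that pattern only; `extract_sketch`, the `load` callback, `fake_load`, and the year-then-state iteration order are assumptions made for illustration, not the PUDL implementation.

```python
from typing import Callable, Iterable, Iterator

import pandas as pd


def extract_sketch(
    years: Iterable[int],
    states: Iterable[str],
    load: Callable[[int, str], pd.DataFrame],
) -> Iterator[pd.DataFrame]:
    """Yield one DataFrame per (year, state) pair, loading lazily."""
    states = list(states)  # allow re-iteration for every year
    for year in years:
        for state in states:
            yield load(year, state)


def fake_load(year: int, state: str) -> pd.DataFrame:
    # Hypothetical loader standing in for a datastore lookup.
    return pd.DataFrame({"year": [year], "state": [state]})


frames = list(extract_sketch([2018, 2019], ["ID", "CO"], fake_load))
```

Yielding one state-year at a time lets a caller process and discard each table before the next one is loaded, so the full dataset never has to fit in memory at once.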