pudl.extract.epacems module
Retrieve data from EPA CEMS hourly zipped CSVs.
This module pulls data from EPA’s published CSV files.
pudl.extract.epacems.CSV_DTYPES = {'CO2_MASS': <class 'float'>, 'CO2_MASS (tons)': <class 'float'>, 'CO2_MASS_MEASURE_FLG': StringDtype, 'FAC_ID': Int64Dtype(), 'GLOAD': <class 'float'>, 'GLOAD (MW)': <class 'float'>, 'HEAT_INPUT': <class 'float'>, 'HEAT_INPUT (mmBtu)': <class 'float'>, 'NOX_MASS': <class 'float'>, 'NOX_MASS (lbs)': <class 'float'>, 'NOX_MASS_MEASURE_FLG': StringDtype, 'NOX_RATE': <class 'float'>, 'NOX_RATE (lbs/mmBtu)': <class 'float'>, 'NOX_RATE_MEASURE_FLG': StringDtype, 'OP_DATE': StringDtype, 'OP_HOUR': Int64Dtype(), 'OP_TIME': <class 'float'>, 'ORISPL_CODE': Int64Dtype(), 'SLOAD': <class 'float'>, 'SLOAD (1000 lbs)': <class 'float'>, 'SLOAD (1000lb/hr)': <class 'float'>, 'SO2_MASS': <class 'float'>, 'SO2_MASS (lbs)': <class 'float'>, 'SO2_MASS_MEASURE_FLG': StringDtype, 'STATE': StringDtype, 'UNITID': StringDtype, 'UNIT_ID': Int64Dtype()}
A dictionary containing column names (keys) and data types (values) for EPA CEMS.
- Type: dict
class pudl.extract.epacems.EpaCemsDatastore(datastore: pudl.workspace.datastore.Datastore)
Bases: object
Helper class to extract EpaCems resources from datastore.
EpaCems resources are identified by a year and a state. Each of these zip files contains monthly zip files that in turn contain CSV files. This class implements the get_data_frame method, which concatenates the tables for a given state and year across all months.
get_data_frame(partition: pudl.extract.epacems.EpaCemsPartition) → pandas.core.frame.DataFrame
Constructs a dataframe holding data for a given (year, state) partition.
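The nested layout described for EpaCemsDatastore (one archive per year and state, holding monthly archives that each hold a CSV) can be illustrated with a minimal, standard-library-only sketch. It is not the PUDL implementation: the helper names make_zip and read_partition, the file-name stem "2019id", the choice to skip missing months, and the toy rows are all assumptions made for illustration, and the real method returns a pandas DataFrame rather than a list of dicts.

```python
import csv
import io
import zipfile


def make_zip(files: dict) -> bytes:
    """Build an in-memory zip archive from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, payload in files.items():
            zf.writestr(name, payload)
    return buf.getvalue()


def read_partition(outer_zip: bytes, stem: str, months=range(1, 13)) -> list:
    """Concatenate the CSV rows of every monthly archive found in outer_zip.

    `stem` is a hypothetical file-name prefix; the month number is appended
    as two digits. Months that are absent from the archive are skipped.
    """
    rows = []
    with zipfile.ZipFile(io.BytesIO(outer_zip)) as outer:
        available = set(outer.namelist())
        for month in months:
            name = f"{stem}{month:02}"
            if f"{name}.zip" not in available:
                continue
            # Open the monthly archive directly from memory.
            with zipfile.ZipFile(io.BytesIO(outer.read(f"{name}.zip"))) as inner:
                text = inner.read(f"{name}.csv").decode("utf-8")
            rows.extend(csv.DictReader(io.StringIO(text, newline="")))
    return rows


# Toy data: two monthly archives nested inside one (year, state) archive.
jan = make_zip({"2019id01.csv": b"STATE,OP_HOUR\nID,0\nID,1\n"})
feb = make_zip({"2019id02.csv": b"STATE,OP_HOUR\nID,0\n"})
outer = make_zip({"2019id01.zip": jan, "2019id02.zip": feb})

rows = read_partition(outer, "2019id")
```

Reading each inner archive from memory avoids unpacking anything to disk, which is one plausible reason to keep the data zipped at both levels.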
class pudl.extract.epacems.EpaCemsPartition(year: str, state: str)
Bases: tuple
Represents an EpaCems partition identifying a unique resource file.
get_monthly_file(month: int) → pathlib.Path
Returns the filename (without suffix) that contains the monthly data.
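As a rough illustration of how a tuple-based partition can produce suffix-free monthly file names, the sketch below defines a hypothetical stand-in class named Partition. The naming scheme (year, lower-case state, zero-padded month) is an assumption, not something this page confirms.

```python
from pathlib import Path
from typing import NamedTuple


class Partition(NamedTuple):
    """Hypothetical stand-in for EpaCemsPartition."""

    year: str
    state: str

    def get_monthly_file(self, month: int) -> Path:
        # Assumed naming scheme: year, lower-case state, zero-padded month.
        return Path(f"{self.year}{self.state.lower()}{month:02}")


part = Partition(year="2019", state="ID")

# Returning the name without a suffix lets the caller derive both the
# monthly archive name and the name of the CSV file inside it.
zip_name = part.get_monthly_file(3).with_suffix(".zip")
csv_name = part.get_monthly_file(3).with_suffix(".csv")
```

Because the class is a tuple, instances are hashable and compare by value, so they can serve directly as dictionary keys or cache keys.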
pudl.extract.epacems.IGNORE_COLS = {'CO2_RATE', 'CO2_RATE (tons/mmBtu)', 'CO2_RATE_MEASURE_FLG', 'FACILITY_NAME', 'SO2_RATE', 'SO2_RATE (lbs/mmBtu)', 'SO2_RATE_MEASURE_FLG'}
The set of EPA CEMS columns to ignore when reading data.
- Type: set
pudl.extract.epacems.RENAME_DICT = {'CO2_MASS': 'co2_mass_tons', 'CO2_MASS (tons)': 'co2_mass_tons', 'CO2_MASS_MEASURE_FLG': 'co2_mass_measurement_code', 'FAC_ID': 'facility_id', 'GLOAD': 'gross_load_mw', 'GLOAD (MW)': 'gross_load_mw', 'HEAT_INPUT': 'heat_content_mmbtu', 'HEAT_INPUT (mmBtu)': 'heat_content_mmbtu', 'NOX_MASS': 'nox_mass_lbs', 'NOX_MASS (lbs)': 'nox_mass_lbs', 'NOX_MASS_MEASURE_FLG': 'nox_mass_measurement_code', 'NOX_RATE': 'nox_rate_lbs_mmbtu', 'NOX_RATE (lbs/mmBtu)': 'nox_rate_lbs_mmbtu', 'NOX_RATE_MEASURE_FLG': 'nox_rate_measurement_code', 'OP_DATE': 'op_date', 'OP_HOUR': 'op_hour', 'OP_TIME': 'operating_time_hours', 'ORISPL_CODE': 'plant_id_eia', 'SLOAD': 'steam_load_1000_lbs', 'SLOAD (1000 lbs)': 'steam_load_1000_lbs', 'SLOAD (1000lb/hr)': 'steam_load_1000_lbs', 'SO2_MASS': 'so2_mass_lbs', 'SO2_MASS (lbs)': 'so2_mass_lbs', 'SO2_MASS_MEASURE_FLG': 'so2_mass_measurement_code', 'STATE': 'state', 'UNITID': 'unitid', 'UNIT_ID': 'unit_id_epa'}
A dictionary containing EPA CEMS column names (keys) and replacement names to use when reading those columns into PUDL (values).
- Type: dict
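The three module constants work together when a raw CSV is read: ignored columns are dropped, the remaining values are cast, and the columns are renamed. The sketch below shows that flow using only the standard library. The dictionaries are short excerpts of the constants documented above, with plain str and int standing in for the pandas dtypes; the function clean_rows and the sample rows are hypothetical, and the module itself may well do this through pandas instead.

```python
import csv
import io

# Excerpts of the documented constants, trimmed for brevity. Plain Python
# types stand in for StringDtype and Int64Dtype() in this sketch.
IGNORE_COLS = {"FACILITY_NAME", "SO2_RATE (lbs/mmBtu)"}
RENAME_DICT = {
    "STATE": "state",
    "ORISPL_CODE": "plant_id_eia",
    "GLOAD": "gross_load_mw",
    "GLOAD (MW)": "gross_load_mw",
}
CSV_DTYPES = {"STATE": str, "ORISPL_CODE": int, "GLOAD": float, "GLOAD (MW)": float}


def clean_rows(csv_text: str) -> list:
    """Drop ignored columns, cast values, and rename the remaining columns."""
    cleaned = []
    for raw in csv.DictReader(io.StringIO(csv_text, newline="")):
        row = {}
        for col, value in raw.items():
            if col in IGNORE_COLS:
                continue
            # Empty cells become None rather than failing the cast.
            row[RENAME_DICT[col]] = CSV_DTYPES[col](value) if value != "" else None
        cleaned.append(row)
    return cleaned


# Made-up sample rows; the second row has a missing gross load value.
sample = (
    "STATE,FACILITY_NAME,ORISPL_CODE,GLOAD (MW)\n"
    "ID,Example Plant,123,45.5\n"
    "ID,Example Plant,123,\n"
)
rows = clean_rows(sample)
```

Note that both 'GLOAD' and 'GLOAD (MW)' map to the same output name, so files that label the column either way end up with identical column names after renaming.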
pudl.extract.epacems.extract(epacems_years, states, ds: pudl.workspace.datastore.Datastore)
Coordinate the extraction of EPA CEMS hourly DataFrames.
- Parameters
- Yields
dict – a dictionary with a single EPA CEMS tabular data resource name as the key, having the form “hourly_emissions_epacems_YEAR_STATE” where YEAR is a 4-digit number and STATE is a lowercase 2-letter code for a US state. The value is a pandas.DataFrame containing all the raw EPA CEMS hourly emissions data for the indicated state and year.
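The yielding pattern described here, one single-entry dictionary per state and year, can be sketched as a plain generator. Everything below is illustrative: extract_sketch and load_partition are hypothetical names, the loader returns a placeholder list instead of a DataFrame, and the iteration order (years in the outer loop) is an assumption.

```python
from itertools import product


def load_partition(year, state):
    """Hypothetical placeholder for the real per-partition loader."""
    return [{"year": year, "state": state}]


def extract_sketch(years, states, loader=load_partition):
    """Yield one single-entry dict per (year, state) combination."""
    for year, state in product(years, states):
        # Key format taken from the Yields description: lower-case state code.
        key = f"hourly_emissions_epacems_{year}_{state.lower()}"
        yield {key: loader(year, state)}


resources = list(extract_sketch([2018, 2019], ["ID", "CO"]))
keys = [next(iter(resource)) for resource in resources]
```

Yielding one partition at a time means a consumer only needs to hold a single state-year table in memory, which matters for hourly data of this size.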