pudl.transform.epacems module

Routines specific to cleaning up EPA CEMS hourly data.

pudl.transform.epacems.add_facility_id_unit_id_epa(df)

Harmonize columns that are added later.

The datapackage validation checks for consistent column names, and these two columns aren’t present before August 2008, so this adds them in.

Parameters: df (pandas.DataFrame) – A CEMS dataframe
Returns: The same DataFrame, guaranteed to have integer facility_id and unit_id_epa columns.
Return type: pandas.DataFrame
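
The column harmonization described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PUDL implementation: the helper name add_missing_id_columns is hypothetical, and using the nullable "Int64" dtype for rows that predate the columns is an assumption about how missing values might be represented.

```python
import pandas as pd


def add_missing_id_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for add_facility_id_unit_id_epa."""
    out = df.copy()
    for col in ("facility_id", "unit_id_epa"):
        if col not in out.columns:
            # Assumption: fill absent columns with missing values in a
            # nullable integer dtype so the column type stays integer.
            out[col] = pd.array([pd.NA] * len(out), dtype="Int64")
        else:
            # Column already present: only normalize its dtype.
            out[col] = out[col].astype("Int64")
    return out


# A frame shaped like early data, without the two later columns.
early = pd.DataFrame({"gross_load_mw": [100.0, 120.0]})
fixed = add_missing_id_columns(early)
```

After the call, fixed has the same rows as early plus the two ID columns, so frames from before and after the columns were introduced share one schema.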

pudl.transform.epacems.correct_gross_load_mw(df)

Fix values of gross load that are wrong by orders of magnitude.
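
The documentation does not say how the correction is made. One plausible approach, shown here purely as a sketch, is to rescale values above a plausibility cutoff. The cutoff (MAX_PLAUSIBLE_MW), the scale factor (SCALE), the helper name, and the gross_load_mw column name are all assumptions made for illustration.

```python
import pandas as pd

MAX_PLAUSIBLE_MW = 2000.0  # hypothetical cutoff, not taken from PUDL
SCALE = 1000.0  # hypothetical: treat oversized values as kW reported as MW


def rescale_implausible_load(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for correct_gross_load_mw."""
    out = df.copy()
    # Missing values compare as False, so they are left untouched.
    too_big = out["gross_load_mw"] > MAX_PLAUSIBLE_MW
    out.loc[too_big, "gross_load_mw"] = out.loc[too_big, "gross_load_mw"] / SCALE
    return out


raw = pd.DataFrame({"gross_load_mw": [350.0, 410000.0, None]})
cleaned = rescale_implausible_load(raw)
```

Only the second row exceeds the assumed cutoff, so it is the only value rescaled; the input frame is not modified.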

pudl.transform.epacems.fix_up_dates(df, plant_utc_offset)

Fix the dates for the CEMS data.

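Parameters:
df (pandas.DataFrame) – A CEMS hourly dataframe
plant_utc_offset – Per-plant UTC offsets, used to look up each plant's timezone by plant ID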

Returns: The same data, with an op_datetime_utc column added and the op_date and op_hour columns removed.
Return type: pandas.DataFrame
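
A minimal sketch of the date fix, assuming plant_utc_offset is a DataFrame with one row per plant. The helper name to_utc, the plant_id and utc_offset column names, and the ISO date format are assumptions; only op_date, op_hour, and op_datetime_utc come from the description above.

```python
import pandas as pd


def to_utc(df: pd.DataFrame, plant_utc_offset: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for fix_up_dates."""
    # Look up each row's UTC offset by plant ID (assumed key: plant_id).
    out = df.merge(plant_utc_offset, on="plant_id", how="left")
    # Combine the local date and hour into one local timestamp.
    local = pd.to_datetime(out["op_date"], format="%Y-%m-%d") + pd.to_timedelta(
        out["op_hour"], unit="h"
    )
    # UTC time is local time minus the offset from UTC.
    out["op_datetime_utc"] = local - out["utc_offset"]
    return out.drop(columns=["op_date", "op_hour", "utc_offset"])


hourly = pd.DataFrame(
    {
        "plant_id": [3, 3],
        "op_date": ["2008-08-01", "2008-08-01"],
        "op_hour": [0, 23],
    }
)
offsets = pd.DataFrame(
    {"plant_id": [3], "utc_offset": pd.to_timedelta([-6], unit="h")}
)
result = to_utc(hourly, offsets)
```

With a fixed offset of six hours behind UTC, local midnight becomes 06:00 UTC on the same day, and local hour 23 rolls over to 05:00 UTC on the following day.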

pudl.transform.epacems.harmonize_eia_epa_orispl(df)

Harmonize the ORISPL code to match the EIA data – NOT YET IMPLEMENTED.

Note that this transformation needs to be run before fix_up_dates, because fix_up_dates uses the plant ID to look up timezones.

Parameters: df (pandas.DataFrame) – A CEMS hourly dataframe for one year-month-state
Returns: The same data, with the ORISPL plant codes corrected to match the EIA plant IDs.
Return type: pandas.DataFrame

Todo

Actually implement the function…

pudl.transform.epacems.transform(epacems_raw_dfs, datapkg_dir)

Transform EPA CEMS hourly data for use in datapackage export.

Todo

Incomplete docstring.