pudl.transform.epacems module

Module to perform data cleaning functions on EPA CEMS data tables.

pudl.transform.epacems.add_facility_id_unit_id_epa(df)

Harmonize columns that are added later.

The datapackage validation checks for consistent column names, but the facility_id and unit_id_epa columns aren’t present in data from before August 2008, so this function adds them when they are missing.

Parameters

df (pandas.DataFrame) – A CEMS dataframe

Returns

The same DataFrame, guaranteed to have integer facility_id and unit_id_epa columns.

Return type

pandas.DataFrame
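As an illustration of the idea, here is a minimal sketch of adding the two columns when they are absent. It is not the PUDL implementation: the function name add_id_columns_sketch is hypothetical, and the use of the nullable Int64 dtype (so that missing IDs can coexist with an integer type) is an assumption.

```python
import pandas as pd


def add_id_columns_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with integer facility_id and unit_id_epa columns."""
    out = df.copy()
    for col in ("facility_id", "unit_id_epa"):
        if col not in out.columns:
            # Assumption: records from before the columns existed get missing IDs.
            out[col] = pd.array([pd.NA] * len(out), dtype="Int64")
        else:
            # Assumption: the nullable Int64 dtype is an acceptable "int" type.
            out[col] = out[col].astype("Int64")
    return out


# Hypothetical early data that lacks both ID columns.
early = pd.DataFrame({"gross_load_mw": [100.0, 120.0]})
fixed = add_id_columns_sketch(early)
```

With this approach, dataframes from before and after August 2008 end up with the same column names and dtypes, which is what a consistency check on column names would need.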

pudl.transform.epacems.correct_gross_load_mw(df)

Fix values of gross load that are wrong by orders of magnitude.

Parameters

df (pandas.DataFrame) – A CEMS dataframe

Returns

The same DataFrame with corrected gross load values.

Return type

pandas.DataFrame
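The docstring does not say how wrong values are detected. One plausible approach, sketched below, is to treat any load above a plausibility cutoff as having been reported in kW and divide it by 1000. The cutoff value, the column name gross_load_mw, and the function name correct_gross_load_sketch are all assumptions made for this example, not details confirmed by the documentation.

```python
import pandas as pd

# Assumption: no single unit plausibly produces more than this many MW.
ASSUMED_MAX_PLAUSIBLE_MW = 2000


def correct_gross_load_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Rescale implausibly large gross load values by a factor of 1000."""
    out = df.copy()
    # Missing values compare as False, so they are left untouched.
    too_big = out["gross_load_mw"] > ASSUMED_MAX_PLAUSIBLE_MW
    out.loc[too_big, "gross_load_mw"] = out.loc[too_big, "gross_load_mw"] / 1000
    return out


sample = pd.DataFrame({"gross_load_mw": [450.0, 512000.0, None]})
corrected = correct_gross_load_sketch(sample)
```

In this sample, 512000.0 is rescaled to 512.0, while 450.0 and the missing value are unchanged.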

pudl.transform.epacems.fix_up_dates(df, plant_utc_offset)

Fix the dates for the CEMS data.

Transformations include:

  • Account for timezone differences with offset from UTC.

Parameters

df (pandas.DataFrame) – A CEMS hourly dataframe for one year-month-state.

plant_utc_offset (pandas.DataFrame) – A dataframe of plants’ timezones.

Returns

The same data, with an op_datetime_utc column added and the op_date and op_hour columns removed.

Return type

pandas.DataFrame
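The conversion described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the actual implementation: the join key plant_id_eia, the utc_offset column holding a timedelta, the ISO date strings in the sample data, and the function name fix_up_dates_sketch are all hypothetical.

```python
import pandas as pd


def fix_up_dates_sketch(df: pd.DataFrame, plant_utc_offset: pd.DataFrame) -> pd.DataFrame:
    """Combine op_date and op_hour into a single UTC timestamp column."""
    # Assumption: plant_utc_offset has one row per plant, keyed by plant_id_eia,
    # with a timedelta column named utc_offset.
    out = df.merge(plant_utc_offset, on="plant_id_eia", how="left")
    local = pd.to_datetime(out["op_date"]) + pd.to_timedelta(out["op_hour"], unit="h")
    # Local time = UTC + offset, so UTC = local time - offset.
    out["op_datetime_utc"] = local - out["utc_offset"]
    return out.drop(columns=["op_date", "op_hour", "utc_offset"])


hourly = pd.DataFrame(
    {
        "plant_id_eia": [3, 3],
        "op_date": ["2019-01-01", "2019-01-01"],
        "op_hour": [0, 23],
        "gross_load_mw": [100.0, 120.0],
    }
)
offsets = pd.DataFrame({"plant_id_eia": [3], "utc_offset": [pd.Timedelta(hours=-6)]})
result = fix_up_dates_sketch(hourly, offsets)
```

For a plant at UTC-6, hour 0 on 2019-01-01 becomes 06:00 UTC on the same day, and hour 23 becomes 05:00 UTC on 2019-01-02.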

pudl.transform.epacems.harmonize_eia_epa_orispl(df)

Harmonize the ORISPL code to match the EIA data – NOT YET IMPLEMENTED.

The EIA plant IDs and CEMS ORISPL codes almost match, but not quite. EPA has compiled a crosswalk that maps one set of IDs to the other, but we haven’t integrated it yet. It can be found at:

https://github.com/USEPA/camd-eia-crosswalk

Note that this transformation needs to be run before fix_up_dates, because fix_up_dates uses the plant ID to look up timezones.

Parameters

df (pandas.DataFrame) – A CEMS hourly dataframe for one year-month-state.

Returns

The same data, with the ORISPL plant codes corrected to match the EIA plant IDs.

Return type

pandas.DataFrame

Todo

Actually implement the function…
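Since the function is not implemented, the following is only a sketch of how a crosswalk lookup might work once one is integrated. The column names orispl_code and plant_id_eia, the two-column crosswalk layout, and the choice to keep the original code when no crosswalk entry exists are all assumptions for illustration; the layout of the actual EPA crosswalk file may differ.

```python
import pandas as pd


def harmonize_orispl_sketch(df: pd.DataFrame, crosswalk: pd.DataFrame) -> pd.DataFrame:
    """Map ORISPL codes to EIA plant IDs using a crosswalk table."""
    # Assumption: each ORISPL code maps to at most one EIA plant ID.
    mapping = crosswalk.drop_duplicates("orispl_code").set_index("orispl_code")["plant_id_eia"]
    out = df.copy()
    # Assumption: codes missing from the crosswalk already match the EIA ID.
    out["plant_id_eia"] = (
        out["orispl_code"].map(mapping).fillna(out["orispl_code"]).astype("int64")
    )
    return out


# Hypothetical data: only the last code differs between the two ID systems.
cems = pd.DataFrame({"orispl_code": [3, 55, 880001]})
xwalk = pd.DataFrame({"orispl_code": [880001], "plant_id_eia": [50001]})
harmonized = harmonize_orispl_sketch(cems, xwalk)
```

Because the timezone lookup in fix_up_dates depends on the plant ID, a step like this would need to run first.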

pudl.transform.epacems.transform(epacems_raw_dfs, datapkg_dir)

Transform EPA CEMS hourly data for use in datapackage export.

Todo

Incomplete docstring.