pudl.glue.eia_epacems

Extract, clean, and normalize the EPA-EIA crosswalk.

This module defines functions that read the raw EPA-EIA crosswalk file, clean up its column names, and split it into three distinct normalized tables for integration into the database. The mapping of EIA plant and generator IDs to EPA plant and unit IDs has many gaps, so for the time being these tables are sparse.
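To make "sparse" concrete, the following minimal sketch measures what fraction of crosswalk rows have every ID needed for a given mapping populated. The column names (`plant_id_epa`, `unit_id_epa`, `plant_id_eia`, `generator_id`), the helper `mapping_coverage`, and the toy values are hypothetical and are not taken from the real crosswalk file.

```python
from typing import List

import numpy as np
import pandas as pd

# Toy crosswalk with hypothetical columns; NaN marks a gap in the EPA-EIA mapping.
crosswalk = pd.DataFrame({
    "plant_id_epa": [3, 3, 10, 55],
    "unit_id_epa": ["1", "2", "A", "CT1"],
    "plant_id_eia": [3, 3, np.nan, 55],
    "generator_id": ["1", np.nan, np.nan, "CT1"],
})


def mapping_coverage(df: pd.DataFrame, cols: List[str]) -> float:
    """Return the fraction of rows in which every column in `cols` is non-null."""
    return float(df[cols].notna().all(axis=1).mean())


# In this toy data, plant-level links are more complete than generator-level ones.
plant_cov = mapping_coverage(crosswalk, ["plant_id_epa", "plant_id_eia"])  # 0.75
gen_cov = mapping_coverage(
    crosswalk, ["plant_id_eia", "generator_id", "unit_id_epa"]
)  # 0.5
```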

The EPA, in conjunction with the EIA, plans to release a crosswalk with fewer gaps at the beginning of 2021. Until then, this module reads and cleans the currently available crosswalk.

The raw crosswalk file was obtained from Greg Schivley. His methods for filling in some of the gaps are not included in this version of the module. https://github.com/grgmiller/EPA-EIA-Unit-Crosswalk

Module Contents

Functions

grab_n_clean_epa_orignal()

Retrieve and clean column names for the original EPA-EIA crosswalk file.

split_tables(df: pandas.DataFrame) → Dict[str, pandas.DataFrame]

Split the cleaned EIA-EPA crosswalk table into three normalized tables.

grab_clean_split() → Dict[str, pandas.DataFrame]

Clean raw crosswalk data, drop nans, and return split tables.

Attributes

logger

pudl.glue.eia_epacems.logger
pudl.glue.eia_epacems.grab_n_clean_epa_orignal()

Retrieve and clean column names for the original EPA-EIA crosswalk file.

Returns

A version of the EPA-EIA crosswalk containing only relevant columns. Column names are clear and programmatically accessible.

Return type

pandas.DataFrame
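The cleaning step can be sketched as follows, assuming a CSV whose headers need normalizing and a hand-written rename map. The raw headers in `RAW_CSV`, the `RENAME` mapping, and the helper names `simplify_column` and `clean_crosswalk` are all hypothetical; the real function's input file and the columns it keeps or renames may differ.

```python
import io
import re

import pandas as pd

# Hypothetical raw headers; the real crosswalk file's headers may differ.
RAW_CSV = """ORIS CODE,Unit ID,EIA ORIS,EIA Generator ID,Facility Name,Notes
3,1,3,1,Barry,
3,2,3,,Barry,unmatched generator
"""

# Hypothetical mapping from simplified raw names to clear, PUDL-style names.
RENAME = {
    "oris_code": "plant_id_epa",
    "unit_id": "unit_id_epa",
    "eia_oris": "plant_id_eia",
    "eia_generator_id": "generator_id",
}


def simplify_column(name: str) -> str:
    """Lowercase, trim, and collapse runs of non-alphanumerics into underscores."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def clean_crosswalk(raw: str) -> pd.DataFrame:
    """Read the raw CSV, simplify headers, keep only the ID columns, and rename them."""
    df = pd.read_csv(io.StringIO(raw), dtype=str)  # empty fields become NaN
    df.columns = [simplify_column(c) for c in df.columns]
    return df[list(RENAME)].rename(columns=RENAME)


cleaned = clean_crosswalk(RAW_CSV)
# columns: plant_id_epa, unit_id_epa, plant_id_eia, generator_id
```

Reading every column as a string keeps IDs such as unit "CT1" or zero-padded codes intact; converting plant IDs to integers could be a later, explicit step.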

pudl.glue.eia_epacems.split_tables(df: pandas.DataFrame) → Dict[str, pandas.DataFrame]

Split the cleaned EIA-EPA crosswalk table into three normalized tables.

Parameters

df – a DataFrame of relevant, readable columns from the EIA-EPA crosswalk. Output of grab_n_clean_epa_orignal().

Returns

A dictionary of three normalized DataFrames built from the data in the original crosswalk file: EPA plant ID to EPA unit ID; EPA plant ID to EIA plant ID; and EIA plant ID to EIA generator ID to EPA unit ID. None of the tables contain NaN values.
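One way to build the three tables is to take a column subset for each mapping, drop rows with gaps, and de-duplicate. This is a minimal sketch under those assumptions, not the module's implementation; `split_crosswalk`, the dictionary keys, and the column names are hypothetical.

```python
from typing import Dict, List

import pandas as pd


def split_crosswalk(df: pd.DataFrame) -> Dict[str, pd.DataFrame]:
    """Split a cleaned crosswalk into three normalized, NaN-free tables.

    The dictionary keys and column names are hypothetical placeholders.
    """
    def subset(cols: List[str]) -> pd.DataFrame:
        # Keep one mapping's columns, drop rows with gaps, then de-duplicate.
        return df[cols].dropna().drop_duplicates().reset_index(drop=True)

    return {
        # EPA plant ID -> EPA unit ID
        "plant_unit_epa": subset(["plant_id_epa", "unit_id_epa"]),
        # EPA plant ID -> EIA plant ID
        "plant_epa_eia": subset(["plant_id_epa", "plant_id_eia"]),
        # EIA plant ID -> EIA generator ID -> EPA unit ID
        "gen_eia_unit_epa": subset(["plant_id_eia", "generator_id", "unit_id_epa"]),
    }


toy = pd.DataFrame({
    "plant_id_epa": [3, 3, 3, 10],
    "unit_id_epa": ["1", "2", "2", "A"],
    "plant_id_eia": [3, 3, 3, None],
    "generator_id": ["1", None, None, None],
})
tables = split_crosswalk(toy)
```

In the toy data, the duplicated row for EPA unit "2" collapses to one entry, and EPA plant 10 appears only in the EPA plant-to-unit table because it has no EIA match.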

pudl.glue.eia_epacems.grab_clean_split() → Dict[str, pandas.DataFrame]

Clean raw crosswalk data, drop nans, and return split tables.

Returns

A dictionary of three normalized DataFrames built from the data in the original crosswalk file: EPA plant ID to EPA unit ID; EPA plant ID to EIA plant ID; and EIA plant ID to EIA generator ID to EPA unit ID.
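The summary lists the steps as clean, drop NaNs, then split. The sketch below follows that order by dropping every row that contains a NaN before splitting; whether the module drops NaNs across whole rows or per table is an assumption here. `clean_then_split`, `toy_load`, `toy_split`, and their columns are hypothetical stand-ins, not the module's functions.

```python
from typing import Callable, Dict

import numpy as np
import pandas as pd

Tables = Dict[str, pd.DataFrame]


def clean_then_split(load: Callable[[], pd.DataFrame],
                     split: Callable[[pd.DataFrame], Tables]) -> Tables:
    """Load a cleaned crosswalk, drop every row containing a NaN, then split it."""
    return split(load().dropna())


def toy_load() -> pd.DataFrame:
    # Hypothetical cleaned crosswalk: only the first row is fully mapped.
    return pd.DataFrame({
        "plant_id_epa": [3, 3, 10],
        "unit_id_epa": ["1", "2", "A"],
        "plant_id_eia": [3.0, 3.0, np.nan],
        "generator_id": ["1", np.nan, np.nan],
    })


def toy_split(df: pd.DataFrame) -> Tables:
    # Hypothetical two-table splitter, enough to show the effect of dropna().
    return {
        "plant_unit_epa": df[["plant_id_epa", "unit_id_epa"]].drop_duplicates(),
        "plant_epa_eia": df[["plant_id_epa", "plant_id_eia"]].drop_duplicates(),
    }


tables = clean_then_split(toy_load, toy_split)
```

With whole-row dropping, only the fully mapped row survives, so the EPA plant-to-unit table loses pairs (plant 3 unit "2", plant 10 unit "A") that were complete on their own. Dropping NaNs per table after splitting would keep them.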