pudl.glue.eia_epacems
Extract, clean, and normalize the EPA-EIA crosswalk.
This module defines functions that read the raw EPA-EIA crosswalk file, clean up the column names, and separate it into three distinct normalized tables for integration into the database. There are many gaps in the mapping of EIA plant and generator IDs to EPA plant and unit IDs, so for the time being these tables are sparse.
The EPA, in conjunction with the EIA, plans to release a crosswalk with fewer gaps at the beginning of 2021. Until then, this module reads and cleans the currently available crosswalk.
The raw crosswalk file was obtained from Greg Schivley. His methods for filling in some of the gaps are not included in this version of the module. https://github.com/grgmiller/EPA-EIA-Unit-Crosswalk
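To make the sparseness described above concrete, the following is a minimal sketch of how one might measure the gaps in a crosswalk-like table with pandas. The DataFrame, its column names, and its values are hypothetical stand-ins for illustration, not the actual crosswalk contents.

```python
import pandas as pd

# Hypothetical stand-in for the crosswalk; the column names and values
# are illustrative assumptions, not the real file's contents.
crosswalk = pd.DataFrame(
    {
        "plant_id_epa": [3, 3, 10, 26],
        "unitid_epa": ["1", "2", "A", "B1"],
        "plant_id_eia": [3, 3, None, 26],
        "generator_id_eia": ["1", None, None, "GEN1"],
    }
)

# Share of rows that are missing each identifier.
missing_share = crosswalk.isna().mean()

# Rows for which the full EPA-to-EIA mapping is available.
complete = crosswalk.dropna()

print(missing_share)
print(f"{len(complete)} of {len(crosswalk)} rows are fully mapped")
```

A check like this is only a rough diagnostic. It says how many rows lack an ID, not why the mapping is missing.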
Module Contents
Functions
grab_n_clean_epa_orignal() | Retrieve and clean column names for the original EPA-EIA crosswalk file.
split_tables(df) | Split the cleaned EIA-EPA crosswalk table into three normalized tables.
grab_clean_split() | Clean raw crosswalk data, drop nans, and return split tables.
Attributes
- pudl.glue.eia_epacems.grab_n_clean_epa_orignal()
Retrieve and clean column names for the original EPA-EIA crosswalk file.
- Returns
a version of the EPA-EIA crosswalk containing only relevant columns. Column names are clear and programmatically accessible.
- Return type
pandas.DataFrame
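As an illustration of the kind of cleanup described for this function, here is a minimal sketch that normalizes column headers and keeps only the relevant columns. It is not the module's implementation: the helper names, the raw headers, and the list of relevant columns are all hypothetical.

```python
import re

import pandas as pd

# Assumed set of relevant columns after cleaning (hypothetical names).
RELEVANT_COLUMNS = [
    "plant_id_epa",
    "unitid_epa",
    "plant_id_eia",
    "generator_id_eia",
]


def simplify(name: str) -> str:
    """Lowercase a header and collapse non-alphanumeric runs into underscores."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def clean_columns(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rename headers, then keep only relevant columns."""
    renamed = raw.rename(columns=simplify)
    return renamed[RELEVANT_COLUMNS]


# Hypothetical raw headers, chosen only to exercise the cleanup.
raw = pd.DataFrame(
    {
        "Plant ID (EPA)": [3],
        " UnitID EPA": ["1"],
        "Plant ID (EIA)": [3],
        "Generator ID (EIA)": ["1"],
        "Notes": ["manual match"],
    }
)

cleaned = clean_columns(raw)
print(list(cleaned.columns))
```

Normalizing headers to lowercase snake case makes every column usable as a plain attribute or dictionary key, which is one reading of "programmatically accessible" above.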
- pudl.glue.eia_epacems.split_tables(df)
Split the cleaned EIA-EPA crosswalk table into three normalized tables.
- Parameters
df (pandas.DataFrame) – a DataFrame of relevant, readable columns from the EIA-EPA crosswalk. Output of grab_n_clean_epa_orignal().
- Returns
a dictionary of three normalized DataFrames built from the data in the original crosswalk file: EPA plant ID to EPA unit ID; EPA plant ID to EIA plant ID; and EIA plant ID to EIA generator ID to EPA unit ID. The tables contain no NaN values.
- Return type
dict
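The split described above can be sketched roughly as follows. This is an assumption-laden illustration, not the module's code: the function name, the dictionary keys, and the column names are hypothetical, and each table is built by selecting its ID columns, dropping rows with missing values, and dropping duplicates.

```python
import pandas as pd


def split_crosswalk(df: pd.DataFrame) -> dict:
    """Hypothetical sketch: split a cleaned crosswalk into three tables."""

    def normalize(columns):
        # Keep only the given ID columns, drop incomplete rows, de-duplicate.
        return df[columns].dropna().drop_duplicates().reset_index(drop=True)

    # Dictionary keys are illustrative names, not the module's actual keys.
    return {
        "epa_plant_to_unit": normalize(["plant_id_epa", "unitid_epa"]),
        "epa_plant_to_eia_plant": normalize(["plant_id_epa", "plant_id_eia"]),
        "eia_gen_to_epa_unit": normalize(
            ["plant_id_eia", "generator_id_eia", "unitid_epa"]
        ),
    }


# Hypothetical cleaned crosswalk with gaps in the EIA columns.
cleaned = pd.DataFrame(
    {
        "plant_id_epa": [3, 3, 3, 10],
        "unitid_epa": ["1", "1", "2", "A"],
        "plant_id_eia": [3, 3, 3, None],
        "generator_id_eia": ["1", "2", None, None],
    }
)

tables = split_crosswalk(cleaned)
for name, table in tables.items():
    print(name, len(table))
```

Dropping missing values per table rather than on the whole crosswalk keeps a valid EPA plant-to-unit pair even when its EIA match is missing.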
- pudl.glue.eia_epacems.grab_clean_split()
Clean raw crosswalk data, drop nans, and return split tables.
- Returns
a dictionary of three normalized DataFrames built from the data in the original crosswalk file: EPA plant ID to EPA unit ID; EPA plant ID to EIA plant ID; and EIA plant ID to EIA generator ID to EPA unit ID.
- Return type
dict