pudl.transform.eia861#

Module to perform data cleaning functions on EIA861 data tables.

All transformations include: - Replace . values with NA.

Module Contents#

Functions#

_filter_class_cols(df, class_list)

_filter_non_class_cols(df, class_list)

_ba_code_backfill(df)

Backfill Balancing Authority Codes based on codes in later years.

_tidy_class_dfs(df, df_name, idx_cols, class_list, ...)

_drop_dupes(df, df_name, subset)

_check_for_dupes(df, df_name, subset)

_early_transform(df)

Fix EIA na values and convert year column to date.

_compare_totals(data_cols, idx_cols, class_type, df_name)

Compare reported totals with sum of component columns.

_clean_nerc(df, idx_cols)

Clean NERC region entries and make new rows for multiple nercs.

_compare_nerc_physical_w_nerc_operational(...)

Show df rows where physical NERC region does not match operational region.

_pct_to_mw(df, pct_col)

Turn pct col into mw capacity using total capacity col.

_make_yn_bool(df_object)

Turn Y/N reporting into True or False boolean statements for df or series.

_thousand_to_one(df_object)

Turn reporting in thousands of dollars to regular dollars for df or series.

service_territory(tfr_dfs)

Transform the EIA 861 utility service territory table.

balancing_authority(tfr_dfs)

Transform the EIA 861 Balancing Authority table.

balancing_authority_assn(tfr_dfs)

Compile a balancing authority, utility, state association table.

utility_assn(tfr_dfs)

Harvest a Utility-Date-State Association Table.

_harvest_associations(dfs, cols)

Compile all unique, non-null combinations of values cols within dfs.

normalize_balancing_authority(tfr_dfs)

Finish the normalization of the balancing_authority_eia861 table.

sales(tfr_dfs)

Transform the EIA 861 Sales table.

advanced_metering_infrastructure(tfr_dfs)

Transform the EIA 861 Advanced Metering Infrastructure table.

demand_response(tfr_dfs)

Transform the EIA 861 Demand Response table.

demand_side_management(tfr_dfs)

Transform the EIA 861 Demand Side Management table.

distributed_generation(tfr_dfs)

Transform the EIA 861 Distributed Generation table.

distribution_systems(tfr_dfs)

Transform the EIA 861 Distribution Systems table.

dynamic_pricing(tfr_dfs)

Transform the EIA 861 Dynamic Pricing table.

energy_efficiency(tfr_dfs)

Transform the EIA 861 Energy Efficiency table.

green_pricing(tfr_dfs)

Transform the EIA 861 Green Pricing table.

mergers(tfr_dfs)

Transform the EIA 861 Mergers table.

net_metering(tfr_dfs)

Transform the EIA 861 Net Metering table.

non_net_metering(tfr_dfs)

Transform the EIA 861 Non-Net Metering table.

operational_data(tfr_dfs)

Transform the EIA 861 Operational Data table.

reliability(tfr_dfs)

Transform the EIA 861 Reliability table.

utility_data(tfr_dfs)

Transform the EIA 861 Utility Data table.

transform(raw_dfs[, eia861_settings])

Transform EIA 861 DataFrames.

Attributes#

pudl.transform.eia861.logger[source]#
pudl.transform.eia861.BA_ID_NAME_FIXES :pandas.DataFrame[source]#
pudl.transform.eia861.EIA_FIPS_COUNTY_FIXES :pandas.DataFrame[source]#
pudl.transform.eia861.BA_NAME_FIXES :pandas.DataFrame[source]#
pudl.transform.eia861.NERC_SPELLCHECK :dict[str, str][source]#
pudl.transform.eia861._filter_class_cols(df, class_list)[source]#
pudl.transform.eia861._filter_non_class_cols(df, class_list)[source]#
pudl.transform.eia861._ba_code_backfill(df)[source]#

Backfill Balancing Authority Codes based on codes in later years.

Note

The BA Code to ID mapping can change from year to year. If a Balancing Authority is bought by another entity, the code may change, but the old EIA BA ID will be retained.

Parameters:

ba_eia861 (pandas.DataFrame) – The transformed EIA 861 Balancing Authority dataframe (balancing_authority_eia861).

Returns:

The balancing_authority_eia861 dataframe, but with many fewer NA values in the balancing_authority_code_eia column.

Return type:

pandas.DataFrame

pudl.transform.eia861._tidy_class_dfs(df, df_name, idx_cols, class_list, class_type, keep_totals=False)[source]#
pudl.transform.eia861._drop_dupes(df, df_name, subset)[source]#
pudl.transform.eia861._check_for_dupes(df, df_name, subset)[source]#
pudl.transform.eia861._early_transform(df)[source]#

Fix EIA na values and convert year column to date.

pudl.transform.eia861._compare_totals(data_cols, idx_cols, class_type, df_name)[source]#

Compare reported totals with sum of component columns.

Parameters:
  • data_cols (pd.DataFrame) – A DataFrame containing only the columns with normalized information.

  • idx_cols (list) – A list of the primary keys for the given denormalized DataFrame.

  • class_type (str) – The name (either ‘customer_class’ or ‘tech_class’) of the column for which you’d like to compare totals to components.

  • df_name (str) – The name of the dataframe.

pudl.transform.eia861._clean_nerc(df, idx_cols)[source]#

Clean NERC region entries and make new rows for multiple nercs.

This function examines reported NERC regions and makes sure the output column of the same name has reliable, singular NERC region acronyms. To do so, this function identifies entries where there are two or more NERC regions specified in a single cell (such as SPP & ERCOT) and makes new, duplicate rows for each NERC region. It also converts non-recognized reported nerc regions to ‘UNK’.

Parameters:
  • df (pandas.DataFrame) – A DataFrame with the column ‘nerc_region’ to be cleaned.

  • idx_cols (list) – A list of the primary keys.

Returns:

A DataFrame with correct and clean nerc regions.

Return type:

pandas.DataFrame

pudl.transform.eia861._compare_nerc_physical_w_nerc_operational(df: pandas.DataFrame) pandas.DataFrame[source]#

Show df rows where physical NERC region does not match operational region.

In the Utility Data table, there is the ‘nerc_region’ index column, otherwise interpreted as nerc region in which the utility is physically located and the ‘nerc_regions_of_operation’ column that depicts the nerc regions the utility operates in. In most cases, these two columns are the same, however there are certain instances where this is not true. There are also instances where a utility operates in multiple nerc regions in which case one row will match and another row will not. The output of this function in a table that shows only the utilities where the physical nerc region does not match the operational region ever, meaning there is no additional row for the same utlity during the same year where there is a match between the cols.

Parameters:

df – The utility_data_nerc_eia861 table output from the utility_data() function.

Returns:

A DataFrame with rows for utilities where NO listed operating nerc region matches the “physical location” nerc region column that’s a part of the index.

pudl.transform.eia861._pct_to_mw(df, pct_col)[source]#

Turn pct col into mw capacity using total capacity col.

pudl.transform.eia861._make_yn_bool(df_object)[source]#

Turn Y/N reporting into True or False boolean statements for df or series.

pudl.transform.eia861._thousand_to_one(df_object)[source]#

Turn reporting in thousands of dollars to regular dollars for df or series.

pudl.transform.eia861.service_territory(tfr_dfs)[source]#

Transform the EIA 861 utility service territory table.

Transformations include:

  • Homogenize spelling of county names.

  • Add field for state/county FIPS code.

Parameters:

tfr_dfs (dict) – A dictionary of DataFrame objects in which pages from EIA861 form (keys) correspond to normalized DataFrames of values from that page (values).

Returns:

a dictionary of pandas.DataFrame objects in which pages from EIA861 form

(keys) correspond to normalized DataFrames of values from that page (values).

Return type:

dict

pudl.transform.eia861.balancing_authority(tfr_dfs)[source]#

Transform the EIA 861 Balancing Authority table.

Transformations include:

  • Fill in balancing authrority IDs based on date, utility ID, and BA Name.

  • Backfill balancing authority codes based on BA ID.

  • Fix BA code and ID typos.

Parameters:

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns:

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type:

dict

pudl.transform.eia861.balancing_authority_assn(tfr_dfs)[source]#

Compile a balancing authority, utility, state association table.

For the years up through 2012, the only BA-Util information that’s available comes from the balancing_authority_eia861 table, and it does not include any state-level information. However, there is utility-state association information in the sales_eia861 and other data tables.

For the years from 2013 onward, there’s explicit BA-Util-State information in the data tables (e.g. sales_eia861). These observed associations can be compiled to give us a picture of which BA-Util-State associations exist. However, we need to merge in the balancing authority IDs since the data tables only contain the balancing authority codes.

Parameters:

tfr_dfs (dict) – A dictionary of transformed EIA 861 dataframes. This must include any dataframes from which we want to compile BA-Util-State associations, which means this function has to be called after all the basic transformfunctions that depend on only a single raw table.

Returns:

a dictionary of transformed dataframes. This function both compiles the association table, and finishes the normalization of the balancing authority table. It may be that once the harvesting process incorporates the EIA 861, some or all of this functionality should be pulled into the phase-2 transform functions.

Return type:

dict

pudl.transform.eia861.utility_assn(tfr_dfs)[source]#

Harvest a Utility-Date-State Association Table.

pudl.transform.eia861._harvest_associations(dfs, cols)[source]#

Compile all unique, non-null combinations of values cols within dfs.

Find all unique, non-null combinations of the columns cols in the dataframes dfs within records that are selected by query. All of cols must be present in each of the dfs.

Parameters:
  • dfs (iterable of pandas.DataFrame) – The DataFrames in which to search for

  • cols (iterable of str) – Labels of columns for which to find unique, non-null combinations of values.

Raises:

ValueError – if no associations for cols are found in dfs.

Returns:

A dataframe containing all the unique, non-null combinations of values found in cols.

Return type:

pandas.DataFrame

pudl.transform.eia861.normalize_balancing_authority(tfr_dfs)[source]#

Finish the normalization of the balancing_authority_eia861 table.

The balancing_authority_assn_eia861 table depends on information that is only available in the UN-normalized form of the balancing_authority_eia861 table, so and also on having access to a bunch of transformed data tables, so it can compile the observed combinations of report dates, balancing authorities, states, and utilities. This means that we have to hold off on the final normalization of the balancing_authority_eia861 table until the rest of the transform process is over.

pudl.transform.eia861.sales(tfr_dfs)[source]#

Transform the EIA 861 Sales table.

Transformations include:

  • Remove rows with utility ids 88888 and 99999.

  • Tidy data by customer class.

  • Drop primary key duplicates.

  • Convert 1000s of dollars into dollars.

  • Convert data_observed field I/O into boolean.

  • Map full spelling onto code values.

pudl.transform.eia861.advanced_metering_infrastructure(tfr_dfs)[source]#

Transform the EIA 861 Advanced Metering Infrastructure table.

Transformations include:

  • Tidy data by customer class.

  • Drop total_meters columns (it’s calculable with other fields).

Parameters:

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns:

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type:

dict

pudl.transform.eia861.demand_response(tfr_dfs)[source]#

Transform the EIA 861 Demand Response table.

Transformations include:

  • Fill in NA balancing authority codes with UNK (because it’s part of the primary key).

  • Tidy subset of the data by customer class.

  • Drop duplicate rows based on primary keys.

  • Convert 1000s of dollars into dollars.

Parameters:

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns:

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type:

dict

pudl.transform.eia861.demand_side_management(tfr_dfs)[source]#

Transform the EIA 861 Demand Side Management table.

In 2013, the EIA changed the contents of the 861 form so that information pertaining to demand side management was no longer housed in a single table, but rather two seperate ones pertaining to energy efficiency and demand response. While the pre and post 2013 tables contain similar information, one column in the pre-2013 demand side management table may not have an obvious column equivalent in the post-2013 energy efficiency or demand response data. We’ve addressed this by keeping the demand side management and energy efficiency and demand response tables seperate. Use the DSM table for pre 2013 data and the EE / DR tables for post 2013 data. Despite the uncertainty of comparing across these years, the data are similar and we hope to provide a cohesive dataset in the future with all years and comprable columns combined.

Transformations include:

  • Clean up NERC codes and ensure one per row.

  • Remove demand_side_management and data_observed columns (they are all the same).

  • Tidy subset of the data by customer class.

  • Convert Y/N columns to booleans.

  • Convert 1000s of dollars into dollars.

Parameters:

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns:

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type:

dict

pudl.transform.eia861.distributed_generation(tfr_dfs)[source]#

Transform the EIA 861 Distributed Generation table.

Transformations include:

  • Map full spelling onto code values.

  • Convert pre-2010 percent values in mw values.

  • Remove total columns calculable with other fields.

  • Tidy subset of the data by tech class.

  • Tidy subset of the data by fuel class.

Parameters:

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns:

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type:

dict

pudl.transform.eia861.distribution_systems(tfr_dfs)[source]#

Transform the EIA 861 Distribution Systems table.

Transformations include:

  • No additional transformations.

Parameters:

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns:

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type:

dict

pudl.transform.eia861.dynamic_pricing(tfr_dfs)[source]#

Transform the EIA 861 Dynamic Pricing table.

Transformations include:

  • Tidy subset of the data by customer class.

  • Convert Y/N columns to booleans.

Parameters:

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns:

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type:

dict

pudl.transform.eia861.energy_efficiency(tfr_dfs)[source]#

Transform the EIA 861 Energy Efficiency table.

Transformations include:

  • Tidy subset of the data by customer class.

  • Drop website column (almost no valid information).

  • Convert 1000s of dollars into dollars.

Parameters:

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns:

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type:

dict

pudl.transform.eia861.green_pricing(tfr_dfs)[source]#

Transform the EIA 861 Green Pricing table.

Transformations include:

  • Tidy subset of the data by customer class.

  • Convert 1000s of dollars into dollars.

Parameters:

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns:

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type:

dict

pudl.transform.eia861.mergers(tfr_dfs)[source]#

Transform the EIA 861 Mergers table.

Parameters:

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns:

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type:

dict

pudl.transform.eia861.net_metering(tfr_dfs)[source]#

Transform the EIA 861 Net Metering table.

Transformations include:

  • Remove rows with utility ids 99999.

  • Tidy subset of the data by customer class.

  • Tidy subset of the data by tech class.

Parameters:

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns:

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type:

dict

pudl.transform.eia861.non_net_metering(tfr_dfs)[source]#

Transform the EIA 861 Non-Net Metering table.

Transformations include:

  • Remove rows with utility ids 99999.

  • Drop duplicate rows.

  • Tidy subset of the data by customer class.

  • Tidy subset of the data by tech class.

Parameters:

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns:

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type:

dict

pudl.transform.eia861.operational_data(tfr_dfs)[source]#

Transform the EIA 861 Operational Data table.

Transformations include:

  • Remove rows with utility ids 88888.

  • Remove rows with NA utility id.

  • Clean up NERC codes and ensure one per row.

  • Convert data_observed field I/O into boolean.

  • Tidy subset of the data by revenue class.

  • Convert 1000s of dollars into dollars.

Parameters:

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns:

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type:

dict

pudl.transform.eia861.reliability(tfr_dfs)[source]#

Transform the EIA 861 Reliability table.

Transformations include:

  • Tidy subset of the data by reliability standard.

  • Convert Y/N columns to booleans.

  • Map full spelling onto code values.

  • Drop duplicate rows.

Parameters:

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns:

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type:

dict

pudl.transform.eia861.utility_data(tfr_dfs)[source]#

Transform the EIA 861 Utility Data table.

Transformations include:

  • Remove rows with utility ids 88888.

  • Clean up NERC codes and ensure one per row.

  • Tidy subset of the data by NERC region.

  • Tidy subset of the data by RTO.

  • Convert Y/N columns to booleans.

Parameters:

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns:

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type:

dict

pudl.transform.eia861.transform(raw_dfs, eia861_settings: pudl.settings.Eia861Settings = Eia861Settings())[source]#

Transform EIA 861 DataFrames.

Parameters:
  • raw_dfs (dict) – a dictionary of tab names (keys) and DataFrames (values). This can be generated by pudl.

  • eia861_settings – Object containing validated settings relevant to EIA 861.

Returns:

A dictionary of DataFrame objects in which pages from EIA 861 form (keys) corresponds to a normalized DataFrame of values from that page (values).

Return type:

dict