pudl.transform.eia861 module

Module to perform data cleaning functions on EIA861 data tables.

All transformations include:

  • Replace . values with NA.
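As a rough illustration of the NA replacement shared by all tables, the sketch below uses plain pandas on a made-up frame. The column names are hypothetical and this is not the module's actual code.

```python
import numpy as np
import pandas as pd

# Hypothetical raw table in which "." marks a missing value.
raw = pd.DataFrame(
    {
        "utility_id_eia": [1, 2, 3],
        "sales_mwh": ["100", ".", "250"],
    }
)

# Replace cells that are exactly "." with NaN, then convert to numeric.
cleaned = raw.replace(".", np.nan)
cleaned["sales_mwh"] = pd.to_numeric(cleaned["sales_mwh"])
```

Only cells whose entire value is "." are replaced, so decimal points inside numbers are left alone.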

pudl.transform.eia861.advanced_metering_infrastructure(tfr_dfs)[source]

Transform the EIA 861 Advanced Metering Infrastructure table.

Transformations include:

  • Tidy data by customer class.

  • Drop the total_meters column (it can be calculated from other fields).

Parameters

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type

dict
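The two steps listed for advanced_metering_infrastructure (tidying by customer class and dropping the total) might look roughly like the following. The column names and the `_meters` suffix are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical wide-format input: one column per customer class, plus a total.
wide = pd.DataFrame(
    {
        "utility_id_eia": [10, 20],
        "report_date": ["2019-01-01", "2019-01-01"],
        "residential_meters": [100, 300],
        "commercial_meters": [20, 40],
        "total_meters": [120, 340],
    }
)

# Drop the total, since it is the sum of the per-class columns.
wide = wide.drop(columns=["total_meters"])

# Melt the per-class columns into (customer_class, meters) rows.
tidy = wide.melt(
    id_vars=["utility_id_eia", "report_date"],
    var_name="customer_class",
    value_name="meters",
)
tidy["customer_class"] = tidy["customer_class"].str.replace(
    "_meters", "", regex=False
)
```

After tidying, each row holds one observation for one utility, date, and customer class.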

pudl.transform.eia861.balancing_authority(tfr_dfs)[source]

Transform the EIA 861 Balancing Authority table.

Transformations include:

  • Fill in balancing authority IDs based on date, utility ID, and BA Name.

  • Backfill balancing authority codes based on BA ID.

  • Fix BA code and ID typos.

Parameters

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type

dict
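The backfilling of balancing authority codes based on BA ID could be approximated with a grouped backfill, as sketched here. The column names and values are hypothetical, and the real function may handle more cases.

```python
import pandas as pd

# Hypothetical BA table: early years lack a code for BA ID 100.
ba = pd.DataFrame(
    {
        "report_date": pd.to_datetime(
            ["2012-01-01", "2013-01-01", "2014-01-01", "2013-01-01"]
        ),
        "balancing_authority_id_eia": [100, 100, 100, 200],
        "balancing_authority_code_eia": [None, None, "ABC", "XYZ"],
    }
)

# Within each BA ID, order by date and fill missing codes from later years.
ba = ba.sort_values(["balancing_authority_id_eia", "report_date"])
ba["balancing_authority_code_eia"] = ba.groupby("balancing_authority_id_eia")[
    "balancing_authority_code_eia"
].bfill()
```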

pudl.transform.eia861.balancing_authority_assn(tfr_dfs)[source]

Compile a balancing authority, utility, state association table.

For the years up through 2012, the only BA-Util information that’s available comes from the balancing_authority_eia861 table, and it does not include any state-level information. However, there is utility-state association information in the sales_eia861 and other data tables.

For the years from 2013 onward, there’s explicit BA-Util-State information in the data tables (e.g. sales_eia861). These observed associations can be compiled to give us a picture of which BA-Util-State associations exist. However, we need to merge in the balancing authority IDs since the data tables only contain the balancing authority codes.

Parameters

tfr_dfs (dict) – A dictionary of transformed EIA 861 dataframes. This must include any dataframes from which we want to compile BA-Util-State associations, which means this function has to be called after all the basic transform functions that depend on only a single raw table.

Returns

A dictionary of transformed dataframes. This function both compiles the association table and finishes the normalization of the balancing authority table. It may be that once the harvesting process incorporates the EIA 861, some or all of this functionality should be pulled into the phase-2 transform functions.

Return type

dict
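For the 2013-onward case described above, compiling observed associations and merging in the balancing authority IDs might be sketched as follows. The table contents, codes, and IDs are invented for illustration, and the column names are assumptions.

```python
import pandas as pd

# Hypothetical data table that reports BA codes and states, but not BA IDs.
sales = pd.DataFrame(
    {
        "report_date": ["2015-01-01"] * 3,
        "utility_id_eia": [1, 1, 2],
        "state": ["CO", "CO", "WY"],
        "balancing_authority_code_eia": ["AAA", "AAA", "BBB"],
    }
)

# Hypothetical BA table that maps codes to IDs for each report date.
ba = pd.DataFrame(
    {
        "report_date": ["2015-01-01"] * 2,
        "balancing_authority_code_eia": ["AAA", "BBB"],
        "balancing_authority_id_eia": [1001, 1002],
    }
)

key_cols = [
    "report_date",
    "balancing_authority_code_eia",
    "utility_id_eia",
    "state",
]

# Keep each observed BA-Util-State combination once, then attach the BA ID.
observed = sales[key_cols].drop_duplicates()
assn = observed.merge(
    ba, on=["report_date", "balancing_authority_code_eia"], how="left"
)
```

A left merge keeps every observed association even when no matching BA ID is found, which makes unmatched codes easy to spot afterwards.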

pudl.transform.eia861.demand_response(tfr_dfs)[source]

Transform the EIA 861 Demand Response table.

Transformations include:

  • Fill in NA balancing authority codes with UNK (because it’s part of the primary key).

  • Tidy subset of the data by customer class.

  • Drop duplicate rows based on primary keys.

  • Convert 1000s of dollars into dollars.

Parameters

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type

dict
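Three of the demand_response steps listed above (filling NA codes with UNK, dropping primary-key duplicates, and converting thousands of dollars to dollars) are sketched below. The primary-key columns and the payment column names are assumptions.

```python
import pandas as pd

# Hypothetical input with a missing BA code and a duplicated record.
dr = pd.DataFrame(
    {
        "utility_id_eia": [1, 1, 2],
        "state": ["CO", "CO", "WY"],
        "report_date": ["2015-01-01"] * 3,
        "balancing_authority_code_eia": [None, None, "BBB"],
        "incentive_payment_thousand_usd": [1.5, 1.5, 2.0],
    }
)

# Assumed primary key for this sketch.
pk = ["utility_id_eia", "state", "report_date", "balancing_authority_code_eia"]

# A primary-key column cannot hold NA, so use a placeholder code instead.
dr["balancing_authority_code_eia"] = dr["balancing_authority_code_eia"].fillna("UNK")

# Keep only the first row for each primary-key combination.
dr = dr.drop_duplicates(subset=pk)

# Convert thousands of dollars into dollars.
dr["incentive_payment"] = dr["incentive_payment_thousand_usd"] * 1000
dr = dr.drop(columns=["incentive_payment_thousand_usd"])
```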

pudl.transform.eia861.demand_side_management(tfr_dfs)[source]

Transform the EIA 861 Demand Side Management table.

In 2013, the EIA changed the contents of the 861 form so that information pertaining to demand side management was no longer housed in a single table, but rather in two separate ones pertaining to energy efficiency and demand response. While the pre- and post-2013 tables contain similar information, a column in the pre-2013 demand side management table may not have an obvious equivalent in the post-2013 energy efficiency or demand response data. We’ve addressed this by keeping the demand side management, energy efficiency, and demand response tables separate. Use the DSM table for pre-2013 data and the EE / DR tables for post-2013 data. Despite the uncertainty of comparing across these years, the data are similar, and we hope to provide a cohesive dataset in the future with all years and comparable columns combined.

Transformations include:

  • Clean up NERC codes and ensure one per row.

  • Remove demand_side_management and data_observed columns (they are all the same).

  • Tidy subset of the data by customer class.

  • Convert Y/N columns to booleans.

  • Convert 1000s of dollars into dollars.

Parameters

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type

dict
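The Y/N-to-boolean conversion mentioned above can be sketched with a simple mapping onto pandas' nullable boolean type. The column name is hypothetical, and treating any other value as missing is an assumption of this sketch.

```python
import pandas as pd

# Hypothetical Y/N column with one missing answer.
dsm = pd.DataFrame(
    {
        "utility_id_eia": [1, 2, 3],
        "has_time_responsive_program": ["Y", "N", None],
    }
)

# Map Y/N onto True/False; anything else becomes NA in the nullable dtype.
yn_map = {"Y": True, "N": False}
dsm["has_time_responsive_program"] = (
    dsm["has_time_responsive_program"].map(yn_map).astype("boolean")
)
```

The nullable `boolean` dtype keeps missing answers distinct from an explicit "N".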

pudl.transform.eia861.distributed_generation(tfr_dfs)[source]

Transform the EIA 861 Distributed Generation table.

Transformations include:

  • Map full spelling onto code values.

  • Convert pre-2010 percent values into MW values.

  • Remove total columns calculable with other fields.

  • Tidy subset of the data by tech class.

  • Tidy subset of the data by fuel class.

Parameters

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type

dict
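The percent-to-MW conversion for pre-2010 data might work along the lines below. This sketch assumes that the early years report a capacity share as a percentage of a total capacity column; that assumption, and all column names, are illustrative and not confirmed by the text above.

```python
import pandas as pd

# Hypothetical input: backup capacity is a percent before 2010 (assumed),
# and is already in MW from 2010 onward.
dg = pd.DataFrame(
    {
        "report_year": [2008, 2012],
        "total_capacity_mw": [40.0, 80.0],
        "backup_capacity": [25.0, 16.0],
    }
)

# Rows that still need converting from percent to MW.
is_pct = dg["report_year"] < 2010
pct_as_mw = dg["backup_capacity"] / 100 * dg["total_capacity_mw"]

# Keep the reported value where it is already in MW, else use the conversion.
dg["backup_capacity_mw"] = dg["backup_capacity"].where(~is_pct, pct_as_mw)
```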

pudl.transform.eia861.distribution_systems(tfr_dfs)[source]

Transform the EIA 861 Distribution Systems table.

Transformations include:

  • No additional transformations.

Parameters

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type

dict

pudl.transform.eia861.dynamic_pricing(tfr_dfs)[source]

Transform the EIA 861 Dynamic Pricing table.

Transformations include:

  • Tidy subset of the data by customer class.

  • Convert Y/N columns to booleans.

Parameters

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type

dict

pudl.transform.eia861.energy_efficiency(tfr_dfs)[source]

Transform the EIA 861 Energy Efficiency table.

Transformations include:

  • Tidy subset of the data by customer class.

  • Drop website column (almost no valid information).

  • Convert 1000s of dollars into dollars.

Parameters

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type

dict

pudl.transform.eia861.green_pricing(tfr_dfs)[source]

Transform the EIA 861 Green Pricing table.

Transformations include:

  • Tidy subset of the data by customer class.

  • Convert 1000s of dollars into dollars.

Parameters

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type

dict

pudl.transform.eia861.mergers(tfr_dfs)[source]

Transform the EIA 861 Mergers table.

Transformations include:

  • Map full spelling onto code values.

  • Retain leading zeros in the zip code field.

Parameters

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type

dict
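Leading zeros are typically lost when zip codes are read as integers. One way to restore them, shown here as a sketch with a hypothetical column name, is to zero-pad the values as five-character strings.

```python
import pandas as pd

# Hypothetical input where zip codes were parsed as integers.
mergers = pd.DataFrame({"zip_code": [2139, 80302, 501]})

# Store zip codes as strings, padded on the left to five characters.
mergers["zip_code"] = mergers["zip_code"].astype(str).str.zfill(5)
```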

pudl.transform.eia861.net_metering(tfr_dfs)[source]

Transform the EIA 861 Net Metering table.

Transformations include:

  • Remove rows with utility ids 99999.

  • Tidy subset of the data by customer class.

  • Tidy subset of the data by tech class.

Parameters

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type

dict
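Tidying by both customer class and tech class, after removing the placeholder utility ID 99999, could be sketched as below. The `{customer_class}_{tech_class}_capacity_mw` naming pattern is an assumption chosen to keep the example short.

```python
import pandas as pd

# Hypothetical wide input, including a row for placeholder utility ID 99999.
nm = pd.DataFrame(
    {
        "utility_id_eia": [1, 99999],
        "residential_pv_capacity_mw": [1.0, 9.0],
        "residential_wind_capacity_mw": [0.5, 9.0],
        "commercial_pv_capacity_mw": [2.0, 9.0],
        "commercial_wind_capacity_mw": [0.0, 9.0],
    }
)

# Remove rows reported under the placeholder utility ID.
nm = nm[nm["utility_id_eia"] != 99999]

# Melt to long format, then split the old column name into its two classes.
long = nm.melt(
    id_vars=["utility_id_eia"], var_name="variable", value_name="capacity_mw"
)
parts = long["variable"].str.extract(
    r"^(?P<customer_class>[a-z]+)_(?P<tech_class>[a-z]+)_capacity_mw$"
)
tidy = pd.concat([long[["utility_id_eia"]], parts, long[["capacity_mw"]]], axis=1)
```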

pudl.transform.eia861.non_net_metering(tfr_dfs)[source]

Transform the EIA 861 Non-Net Metering table.

Transformations include:

  • Remove rows with utility ids 99999.

  • Drop duplicate rows.

  • Tidy subset of the data by customer class.

  • Tidy subset of the data by tech class.

Parameters

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type

dict

pudl.transform.eia861.normalize_balancing_authority(tfr_dfs)[source]

Finish the normalization of the balancing_authority_eia861 table.

The balancing_authority_assn_eia861 table depends on information that is only available in the un-normalized form of the balancing_authority_eia861 table, and also on having access to several transformed data tables, so that it can compile the observed combinations of report dates, balancing authorities, states, and utilities. This means that we have to hold off on the final normalization of the balancing_authority_eia861 table until the rest of the transform process is over.

pudl.transform.eia861.operational_data(tfr_dfs)[source]

Transform the EIA 861 Operational Data table.

Transformations include:

  • Remove rows with utility ids 88888.

  • Remove rows with NA utility id.

  • Clean up NERC codes and ensure one per row.

  • Convert data_observed field I/O into boolean.

  • Tidy subset of the data by revenue class.

  • Convert 1000s of dollars into dollars.

Parameters

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type

dict
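Two of the operational_data steps, putting one NERC code on each row and converting the data_observed flag to a boolean, are sketched here. The "/" delimiter and the reading of "O" as observed (True) and "I" as imputed (False) are assumptions.

```python
import pandas as pd

# Hypothetical input: one row lists two NERC regions in a single cell.
od = pd.DataFrame(
    {
        "utility_id_eia": [1, 2],
        "nerc_region": ["SERC/RFC", "wecc"],
        "data_observed": ["O", "I"],
    }
)

# Assumed meaning: "O" is observed (True), "I" is imputed (False).
od["data_observed"] = od["data_observed"].map({"O": True, "I": False})

# Standardize case, split multi-region cells, and give each code its own row.
od["nerc_region"] = od["nerc_region"].str.upper().str.split("/")
od = od.explode("nerc_region").reset_index(drop=True)
```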

pudl.transform.eia861.reliability(tfr_dfs)[source]

Transform the EIA 861 Reliability table.

Transformations include:

  • Tidy subset of the data by reliability standard.

  • Convert Y/N columns to booleans.

  • Map full spelling onto code values.

  • Drop duplicate rows.

Parameters

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type

dict

pudl.transform.eia861.sales(tfr_dfs)[source]

Transform the EIA 861 Sales table.

Transformations include:

  • Remove rows with utility ids 88888 and 99999.

  • Tidy data by customer class.

  • Drop primary key duplicates.

  • Convert 1000s of dollars into dollars.

  • Convert data_observed field I/O into boolean.

  • Map full spelling onto code values.

pudl.transform.eia861.service_territory(tfr_dfs)[source]

Transform the EIA 861 utility service territory table.

Transformations include:

  • Homogenize spelling of county names.

  • Add field for state/county FIPS code.

Parameters

tfr_dfs (dict) – A dictionary of DataFrame objects in which pages from EIA861 form (keys) correspond to normalized DataFrames of values from that page (values).

Returns

A dictionary of pandas.DataFrame objects in which pages from EIA861 form (keys) correspond to normalized DataFrames of values from that page (values).

Return type

dict
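Homogenizing county names and attaching FIPS codes might look like the sketch below. The small lookup table stands in for whatever FIPS source the real function uses, and the column names and cleanup rules are assumptions.

```python
import pandas as pd

# Hypothetical service territory rows with inconsistent county spellings.
st = pd.DataFrame(
    {
        "state": ["CO", "CO"],
        "county": ["  boulder county", "DENVER"],
    }
)

# Stand-in lookup table of state/county FIPS codes.
fips_lookup = pd.DataFrame(
    {
        "state": ["CO", "CO"],
        "county": ["Boulder", "Denver"],
        "county_id_fips": ["08013", "08031"],
    }
)

# Strip whitespace, drop a trailing "county", and normalize capitalization.
st["county"] = (
    st["county"]
    .str.strip()
    .str.replace(r"\s+county$", "", case=False, regex=True)
    .str.title()
)

# Attach the FIPS code; unmatched counties would be left as NA.
st = st.merge(fips_lookup, on=["state", "county"], how="left")
```

Keeping the FIPS codes as strings preserves their leading zeros.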

pudl.transform.eia861.transform(raw_dfs, eia861_tables=('service_territory_eia861', 'balancing_authority_eia861', 'sales_eia861', 'advanced_metering_infrastructure_eia861', 'demand_response_eia861', 'demand_side_management_eia861', 'distributed_generation_eia861', 'distribution_systems_eia861', 'dynamic_pricing_eia861', 'energy_efficiency_eia861', 'green_pricing_eia861', 'mergers_eia861', 'net_metering_eia861', 'non_net_metering_eia861', 'operational_data_eia861', 'reliability_eia861', 'utility_data_eia861'))[source]

Transform EIA 861 DataFrames.

Parameters
  • raw_dfs (dict) – a dictionary of tab names (keys) and DataFrames (values). This can be generated by pudl.

  • eia861_tables (tuple) – A tuple containing the names of the EIA 861 tables that can be pulled into PUDL.

Returns

A dictionary of DataFrame objects in which pages from EIA 861 form (keys) correspond to normalized DataFrames of values from that page (values).

Return type

dict
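One plausible way for a top-level function like transform to apply per-table functions is a dispatch dictionary keyed by table name, sketched below. The helper names (`_sales`, `_mergers`, `transform_sketch`) are hypothetical stand-ins and do not reflect the module's actual implementation.

```python
import pandas as pd


def _sales(tfr_dfs):
    """Stand-in per-table transform: mutate the dict and return it."""
    tfr_dfs["sales_eia861"] = tfr_dfs["sales_eia861"].drop_duplicates()
    return tfr_dfs


def _mergers(tfr_dfs):
    """Stand-in per-table transform: drop rows that are entirely NA."""
    tfr_dfs["mergers_eia861"] = tfr_dfs["mergers_eia861"].dropna(how="all")
    return tfr_dfs


# Hypothetical mapping from table name to its transform function.
TRANSFORM_FUNCS = {"sales_eia861": _sales, "mergers_eia861": _mergers}


def transform_sketch(raw_dfs, tables=("sales_eia861", "mergers_eia861")):
    """Apply the transform for each requested table that is present."""
    # Copy the inputs so the caller's raw DataFrames are not modified.
    tfr_dfs = {name: df.copy() for name, df in raw_dfs.items()}
    for name, func in TRANSFORM_FUNCS.items():
        if name in tables and name in tfr_dfs:
            tfr_dfs = func(tfr_dfs)
    return tfr_dfs


raw_dfs = {
    "sales_eia861": pd.DataFrame({"utility_id_eia": [1, 1, 2]}),
    "mergers_eia861": pd.DataFrame({"utility_id_eia": [3, None]}),
}
out = transform_sketch(raw_dfs, tables=("sales_eia861",))
```

Passing the whole dictionary to each function mirrors the tfr_dfs convention documented above, where a transform can read other tables as well as its own.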

pudl.transform.eia861.utility_assn(tfr_dfs)[source]

Harvest a Utility-Date-State Association Table.

pudl.transform.eia861.utility_data(tfr_dfs)[source]

Transform the EIA 861 Utility Data table.

Transformations include:

  • Remove rows with utility ids 88888.

  • Clean up NERC codes and ensure one per row.

  • Tidy subset of the data by NERC region.

  • Tidy subset of the data by RTO.

  • Convert Y/N columns to booleans.

Parameters

tfr_dfs (dict) – A dictionary of transformed EIA 861 DataFrames, keyed by table name. It will be mutated by this function.

Returns

A dictionary of transformed EIA 861 dataframes, keyed by table name.

Return type

dict