pudl.transform.eia923 module

Module to perform data cleaning functions on EIA923 data tables.

pudl.transform.eia923.boiler_fuel(eia923_dfs, eia923_transformed_dfs)

Transforms the boiler_fuel_eia923 table.

Transformations include:

  • Remove fields that are reported elsewhere.

  • Drop rows in which the plant and boiler id values are NA.

  • Replace "." values with NA.

  • Create a fuel_type_code_pudl field that organizes fuel types into clean, distinguishable categories.

  • Combine year and month columns into a single date column.
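
As a rough illustration of two of the steps above (replacing "." placeholders and combining year and month into a date), here is a minimal pandas sketch. The column names and sample rows are assumptions made up for this example, and the code is not taken from the PUDL implementation.

```python
import pandas as pd

# Hypothetical raw rows; these column names are assumptions for illustration.
raw = pd.DataFrame(
    {
        "report_year": [2019, 2019],
        "report_month": [1, 2],
        "fuel_consumed_units": [".", "120.5"],
    }
)

cleaned = raw.copy()

# Replace the "." placeholder with a missing value, then convert to numbers.
cleaned["fuel_consumed_units"] = pd.to_numeric(
    cleaned["fuel_consumed_units"].replace(".", float("nan"))
)

# Combine year and month into one date column (first day of each month).
cleaned["report_date"] = pd.to_datetime(
    {"year": cleaned["report_year"], "month": cleaned["report_month"], "day": 1}
)
cleaned = cleaned.drop(columns=["report_year", "report_month"])
```

Using the first day of the month as the date is a choice made for this sketch; the documentation above does not say which day the real function assigns.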

Parameters
  • eia923_dfs (dict) – Each entry in this dictionary of DataFrame objects corresponds to a page from the EIA923 form, as reported in the Excel spreadsheets they distribute.

  • eia923_transformed_dfs (dict) – A dictionary of DataFrame objects in which pages from EIA923 form (keys) correspond to normalized DataFrames of values from that page (values).

Returns

eia923_transformed_dfs, a dictionary of DataFrame objects in which pages from EIA923 form (keys) correspond to normalized DataFrames of values from that page (values).

Return type

dict

pudl.transform.eia923.coalmine(eia923_dfs, eia923_transformed_dfs)

Transforms the coalmine_eia923 table.

Transformations include:

  • Remove fields that are reported elsewhere.

  • Drop duplicate rows that share an MSHA ID.
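
The deduplication step might look something like the sketch below. The column names and the choice to keep the first row for each MSHA ID are assumptions for illustration; the documentation above does not state which duplicate is kept.

```python
import pandas as pd

# Hypothetical coal mine records; the column names are assumptions.
coalmines = pd.DataFrame(
    {
        "mine_id_msha": [100, 100, 200],
        "mine_name": ["Alpha", "Alpha", "Beta"],
    }
)

# Keep one row per MSHA ID (pandas keeps the first occurrence by default).
deduped = coalmines.drop_duplicates(subset=["mine_id_msha"]).reset_index(drop=True)
```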

Parameters
  • eia923_dfs (dict) – Each entry in this dictionary of DataFrame objects corresponds to a page from the EIA923 form, as reported in the Excel spreadsheets they distribute.

  • eia923_transformed_dfs (dict) – A dictionary of DataFrame objects in which pages from EIA923 form (keys) correspond to normalized DataFrames of values from that page (values).

Returns

eia923_transformed_dfs, a dictionary of DataFrame objects in which pages from EIA923 form (keys) correspond to normalized DataFrames of values from that page (values).

Return type

dict

pudl.transform.eia923.fuel_receipts_costs(eia923_dfs, eia923_transformed_dfs)

Transforms the fuel_receipts_costs_eia923 dataframe.

Transformations include:

  • Remove fields that are reported elsewhere.

  • Replace "." values with NA.

  • Standardize code values.

  • Fix dates.

  • Replace invalid mercury content values with NA.

Fuel cost is reported in cents per MMBtu. This function converts it from cents to dollars.
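
The unit conversion amounts to dividing by 100. A minimal sketch, assuming hypothetical column names for the reported and converted values:

```python
import pandas as pd

# Hypothetical fuel cost values in cents per MMBtu; column names are assumptions.
frc = pd.DataFrame({"fuel_cost_per_mmbtu_cents": [250.0, 1030.0, None]})

# 100 cents per dollar; missing values stay missing after the division.
frc["fuel_cost_per_mmbtu"] = frc["fuel_cost_per_mmbtu_cents"] / 100.0
```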

Parameters
  • eia923_dfs (dict) – Each entry in this dictionary of DataFrame objects corresponds to a page from the EIA923 form, as reported in the Excel spreadsheets they distribute.

  • eia923_transformed_dfs (dict) – A dictionary of DataFrame objects in which pages from EIA923 form (keys) correspond to normalized DataFrames of values from that page (values).

Returns

eia923_transformed_dfs, a dictionary of DataFrame objects in which pages from EIA923 form (keys) correspond to normalized DataFrames of values from that page (values).

Return type

dict

pudl.transform.eia923.generation(eia923_dfs, eia923_transformed_dfs)

Transforms the generation_eia923 table.

Transformations include:

  • Drop rows with NA for generator id.

  • Remove fields that are reported elsewhere.

  • Replace "." values with NA.

  • Drop generator-date row duplicates (all have no data).
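
The two row-dropping steps above could be sketched as follows. The column names, and the assumption that a generator is identified by its plant ID plus generator ID, are illustrative rather than taken from the PUDL source.

```python
import pandas as pd

# Hypothetical generation records; the column names are assumptions.
gen = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1, 2],
        "generator_id": ["A", "A", None, "B"],
        "report_date": pd.to_datetime(["2019-01-01"] * 4),
        "net_generation_mwh": [None, None, 5.0, 7.0],
    }
)

# Drop rows that have no generator id.
gen = gen.dropna(subset=["generator_id"])

# Drop repeated generator-date rows, keeping the first of each group.
gen = gen.drop_duplicates(subset=["plant_id_eia", "generator_id", "report_date"])
```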

Parameters
  • eia923_dfs (dict) – Each entry in this dictionary of DataFrame objects corresponds to a page from the EIA923 form, as reported in the Excel spreadsheets they distribute.

  • eia923_transformed_dfs (dict) – A dictionary of DataFrame objects in which pages from EIA923 form (keys) correspond to normalized DataFrames of values from that page (values).

Returns

eia923_transformed_dfs, a dictionary of DataFrame objects in which pages from EIA923 form (keys) correspond to normalized DataFrames of values from that page (values).

Return type

dict

pudl.transform.eia923.generation_fuel(eia923_dfs, eia923_transformed_dfs)

Transforms the generation_fuel_eia923 table.

Transformations include:

  • Remove fields that are reported elsewhere.

  • Replace "." values with NA.

  • Remove rows with a utility id of 99999.

  • Create a fuel_type_code_pudl field that organizes fuel types into clean, distinguishable categories.

  • Combine year and month columns into a single date column.
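
To illustrate the utility id filter and the creation of a simplified fuel category, here is a small sketch. The mapping below is a hypothetical stand-in, not the mapping PUDL uses, and the input column names are assumptions.

```python
import pandas as pd

# Hypothetical mapping from reported fuel codes to broad categories.
FUEL_CATEGORIES = {"BIT": "coal", "SUB": "coal", "NG": "gas", "DFO": "oil"}

# Hypothetical generation fuel records; the column names are assumptions.
gf = pd.DataFrame(
    {
        "utility_id_eia": [123, 99999, 456],
        "fuel_type": ["BIT", "NG", "NG"],
    }
)

# Remove rows reported under the placeholder utility id 99999.
gf = gf[gf["utility_id_eia"] != 99999].copy()

# Derive the simplified category; codes missing from the mapping become NaN.
gf["fuel_type_code_pudl"] = gf["fuel_type"].map(FUEL_CATEGORIES)
```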

Parameters
  • eia923_dfs (dict) – Each entry in this dictionary of DataFrame objects corresponds to a page from the EIA923 form, as reported in the Excel spreadsheets they distribute.

  • eia923_transformed_dfs (dict) – A dictionary of DataFrame objects in which pages from EIA923 form (keys) correspond to normalized DataFrames of values from that page (values).

Returns

eia923_transformed_dfs, a dictionary of DataFrame objects in which pages from EIA923 form (keys) correspond to normalized DataFrames of values from that page (values).

Return type

dict

pudl.transform.eia923.plants(eia923_dfs, eia923_transformed_dfs)

Transforms the plants_eia923 table.

Much of the static plant information is reported repeatedly and is scattered across several different pages of EIA 923. The data frame that this function uses is assembled from those pages. For uniformity, it is passed in via the same dictionary of dataframes that all the other ingest functions use.

Transformations include:

  • Map full spelling onto code values.

  • Convert Y/N columns to booleans.

  • Remove excess white space around values.

  • Drop duplicate rows.
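
Three of the steps above (whitespace removal, Y/N conversion, and deduplication) can be sketched with pandas as shown below. The column names and sample values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical plant records; the column names are assumptions.
plants = pd.DataFrame(
    {
        "plant_name": [" Big Creek ", "Big Creek", "Elm Station"],
        "combined_heat_power": ["Y", "Y", "N"],
    }
)

# Remove excess white space around string values.
plants["plant_name"] = plants["plant_name"].str.strip()

# Convert the Y/N column to booleans.
plants["combined_heat_power"] = plants["combined_heat_power"].map(
    {"Y": True, "N": False}
)

# Rows that became identical after cleaning are dropped here.
plants = plants.drop_duplicates().reset_index(drop=True)
```

Stripping white space before deduplicating matters in this sketch: the first two rows only become identical once the padding is removed.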

Parameters
  • eia923_dfs (dictionary of pandas.DataFrame) – Each entry in this dictionary of DataFrame objects corresponds to a page from the EIA 923 form, as reported in the Excel spreadsheets they distribute.

  • eia923_transformed_dfs (dict) – A dictionary of DataFrame objects in which pages from EIA923 form (keys) correspond to normalized DataFrames of values from that page (values).

Returns

eia923_transformed_dfs, a dictionary of DataFrame objects in which pages from EIA923 form (keys) correspond to normalized DataFrames of values from that page (values).

Return type

dict

pudl.transform.eia923.transform(eia923_raw_dfs, eia923_tables=('generation_fuel_eia923', 'boiler_fuel_eia923', 'generation_eia923', 'coalmine_eia923', 'fuel_receipts_costs_eia923'))

Transforms all the EIA 923 tables.

Parameters
  • eia923_raw_dfs (dict) – A dictionary of tab names (keys) and DataFrames (values), as generated by pudl.extract.eia923.extract().

  • eia923_tables (tuple) – A tuple containing the EIA923 tables that can be pulled into PUDL.

Returns

A dictionary with table names as keys and pandas.DataFrame objects as values, where the contents of the DataFrames correspond to cleaned and normalized PUDL database tables, ready for loading.

Return type

dict
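
The per-table functions in this module all share one signature: they take the raw dictionary plus the dictionary of results so far, and they return the updated results. One plausible way to drive such functions from a tuple of table names is sketched below. Every name here (transform_sketch, the _clean_* helpers, the table and page names) is hypothetical, and this is not the PUDL implementation.

```python
import pandas as pd


def _clean_a(raw_dfs, transformed_dfs):
    """Hypothetical per-table step: drop duplicate rows from page_a."""
    transformed_dfs["table_a"] = raw_dfs["page_a"].drop_duplicates()
    return transformed_dfs


def _clean_b(raw_dfs, transformed_dfs):
    """Hypothetical per-table step: drop rows with missing values from page_b."""
    transformed_dfs["table_b"] = raw_dfs["page_b"].dropna()
    return transformed_dfs


# Map each table name to the function that builds it.
TRANSFORM_FUNCTIONS = {"table_a": _clean_a, "table_b": _clean_b}


def transform_sketch(raw_dfs, tables=("table_a", "table_b")):
    """Run the selected per-table functions and collect their outputs."""
    transformed_dfs = {}
    for table in tables:
        # An unknown table name raises KeyError in this sketch.
        transformed_dfs = TRANSFORM_FUNCTIONS[table](raw_dfs, transformed_dfs)
    return transformed_dfs


raw_dfs = {
    "page_a": pd.DataFrame({"x": [1, 1, 2]}),
    "page_b": pd.DataFrame({"y": [None, 3.0]}),
}
result = transform_sketch(raw_dfs)
```

Threading the results dictionary through each call mirrors the documented signatures, where every function both receives and returns eia923_transformed_dfs.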