pudl.extract.eia860#

Retrieve data from EIA Form 860 spreadsheets for analysis.

This module pulls data from EIA’s published Excel spreadsheets.

This code is intended for analyzing EIA Form 860 data.

Module Contents#

Classes#

Extractor

Extractor for the Excel dataset EIA860.

Functions#

eia_years_from_settings(context)

Return set of years for EIA in settings.

load_single_year(→ dict[str, pandas.DataFrame])

Load a single year of EIA data from file.

merge_eia860_years(→ dict[str, pandas.DataFrame])

Merge yearly EIA-860 dataframes.

eia860_raw_dfs(→ dict[str, pandas.DataFrame])

All loaded EIA860 dataframes.

extract_eia860(context, eia860_raw_dfs)

Extract raw EIA data from Excel sheets into dataframes.

Attributes#

pudl.extract.eia860.logger[source]#

class pudl.extract.eia860.Extractor(*args, **kwargs)[source]#

Bases: pudl.extract.excel.GenericExtractor

Extractor for the Excel dataset EIA860.

process_raw(df, page, **partition)[source]#

Apply necessary pre-processing to the dataframe.

  • Rename columns based on our compiled spreadsheet metadata

  • Add report_year if it is missing

  • Add a flag indicating if record came from EIA 860, or EIA 860M

  • Fix any generator_id values with leading zeroes.
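The four steps above can be sketched with plain pandas. This is a minimal illustration, not the method's actual body: the `COLUMN_MAP` dictionary, the `data_source` flag column, and the function name `process_raw_sketch` are hypothetical, and the sketch assumes that "fixing" a `generator_id` means stripping leading zeroes from all-numeric IDs only.

```python
import pandas as pd

# Hypothetical column map; the real mapping comes from the compiled
# spreadsheet metadata and is not shown on this page.
COLUMN_MAP = {"Plant Code": "plant_id_eia", "Generator ID": "generator_id"}


def process_raw_sketch(
    df: pd.DataFrame, year: int, source: str = "eia860"
) -> pd.DataFrame:
    """Illustrate the four pre-processing steps on one raw page."""
    # 1. Rename raw spreadsheet headers to standardized column names.
    df = df.rename(columns=COLUMN_MAP)
    # 2. Add report_year only if the page does not already carry it.
    if "report_year" not in df.columns:
        df["report_year"] = year
    # 3. Flag which form the record came from (hypothetical column name).
    df["data_source"] = source
    # 4. Strip leading zeroes from all-numeric generator IDs (assumed meaning
    #    of "fix"); IDs containing letters are left untouched.
    if "generator_id" in df.columns:
        df["generator_id"] = (
            df["generator_id"]
            .astype(str)
            .str.replace(r"^0+(\d+)$", r"\1", regex=True)
        )
    return df
```

Under these assumptions, a raw ID such as `"007"` would become `"7"`, while an alphanumeric ID such as `"0A1"` would be left alone.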

static get_dtypes(page, **partition)[source]#

Returns dtypes for plant id columns.
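As a rough illustration of why explicit dtypes for plant ID columns can matter, the sketch below passes a dtype mapping to a pandas reader. The column name `Plant ID`, the nullable `Int64` choice, and the use of `read_csv` as a stand-in for reading an Excel page are all assumptions made for the example, not details confirmed by this page.

```python
import io

import pandas as pd

# Hypothetical dtype mapping of the kind get_dtypes might return.
plant_id_dtypes = {"Plant ID": pd.Int64Dtype()}

# A tiny CSV stands in for a spreadsheet page; the second row has no plant ID.
raw_page = io.StringIO("Plant ID,Plant Name\n1,Alpha\n,Beta\n")

# With a nullable integer dtype, the missing ID becomes <NA> rather than
# forcing the whole column to float.
page_df = pd.read_csv(raw_page, dtype=plant_id_dtypes)
```

Without the explicit dtype, pandas would infer `float64` for a numeric column with missing values, so plant IDs would show up as `1.0` instead of `1`.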

pudl.extract.eia860.raw_table_names = ('raw_eia860__boiler_cooling', 'raw_eia860__boiler_generator_assn', 'raw_eia860__boiler_info',...[source]#

pudl.extract.eia860.eia_years_from_settings(context)[source]#

Return set of years for EIA in settings.

These will be used to kick off worker processes to load each year of data in parallel.

pudl.extract.eia860.load_single_year(context, year: int) → dict[str, pandas.DataFrame][source]#

Load a single year of EIA data from file.

Parameters:
  • context – dagster keyword that provides access to resources and config.

  • year – Year to load.

Returns:

Loaded data as a dictionary of dataframes.

pudl.extract.eia860.merge_eia860_years(yearly_dfs: list[dict[str, pandas.DataFrame]]) → dict[str, pandas.DataFrame][source]#

Merge yearly EIA-860 dataframes.
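The signature suggests a fan-in step: a list of per-year dictionaries goes in, and one dictionary comes out. A minimal sketch of that shape follows, assuming that "merge" means row-wise concatenation of dataframes sharing the same key. The function name `merge_years_sketch` is hypothetical, and the real implementation may handle missing pages or column alignment differently.

```python
import pandas as pd


def merge_years_sketch(
    yearly_dfs: list[dict[str, pd.DataFrame]],
) -> dict[str, pd.DataFrame]:
    """Concatenate per-year dataframes key by key (assumed behavior)."""
    grouped: dict[str, list[pd.DataFrame]] = {}
    for year_dfs in yearly_dfs:
        for page, df in year_dfs.items():
            # Collect every year's dataframe for this page under one key.
            grouped.setdefault(page, []).append(df)
    # Stack each page's yearly frames into a single dataframe.
    return {
        page: pd.concat(dfs, ignore_index=True) for page, dfs in grouped.items()
    }
```

A page that appears in only some years would still be returned, containing just the rows from the years where it exists.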

pudl.extract.eia860.eia860_raw_dfs() → dict[str, pandas.DataFrame][source]#

All loaded EIA860 dataframes.

This asset creates a dynamic graph of ops to load EIA860 data in parallel.
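The fan-out pattern described here (one unit of work per year, results collected afterwards) can be illustrated without Dagster. The sketch below substitutes a standard-library thread pool for Dagster's dynamic ops, so it shows only the general shape of the parallel load. The loader `fake_load_single_year` is a hypothetical stand-in that fabricates a one-row page instead of reading a spreadsheet.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def fake_load_single_year(year: int) -> dict[str, pd.DataFrame]:
    """Hypothetical stand-in for a per-year loader; reads no real files."""
    return {"generator": pd.DataFrame({"report_year": [year]})}


def load_years_in_parallel(years: set[int]) -> list[dict[str, pd.DataFrame]]:
    """Fan out one task per year and collect results in sorted year order."""
    with ThreadPoolExecutor() as pool:
        # Executor.map yields results in input order, not completion order.
        return list(pool.map(fake_load_single_year, sorted(years)))
```

The returned list has the same shape as the `yearly_dfs` argument of `merge_eia860_years`, which is presumably where the per-year results are combined.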

pudl.extract.eia860.extract_eia860(context, eia860_raw_dfs)[source]#

Extract raw EIA data from Excel sheets into dataframes.

Parameters:

context – dagster keyword that provides access to resources and config.

Returns:

A tuple of extracted EIA dataframes.