pudl.extract.eia861 module

Retrieve data from EIA Form 861 spreadsheets for analysis.

This module pulls data from EIA’s published Excel spreadsheets.

This code is for use analyzing EIA Form 861 data.

class pudl.extract.eia861.ExtractorExcel(dataset_name, years, pudl_settings)[source]

Bases: object

A class for converting Excel files into DataFrames.

create_dfs(years)[source]

Create a dict of pages (keys) to DataFrames (values) from a dataset.

Parameters

years (list) – a list of years

Returns

A dictionary of pages (key) to DataFrames (values)

Return type

dict

get_column_map(year, file_name, page_name)[source]

Given a year and page, return the information needed to read that page from Excel.

Parameters
  • year (int) –

  • file_name (str) –

  • page_name (str) –

Returns

sheet_loc, skiprows, column_map, all_columns

get_file(yr, file_name)[source]

Construct the appropriate path for a given EIA 861 Excel file.

Parameters
  • year (int) – The year that we’re trying to read data for.

  • file_name (str) – A string containing part of the file name for a given EIA 860 file (e.g. ‘Generat’)

Returns

Path to EIA 861 spreadsheets corresponding to a given year.

Return type

str

Raises

ValueError – If the requested year is not in the list of working years for EIA 861.
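The behaviour described above (reject years outside the working list, otherwise return the path to a matching spreadsheet) can be sketched roughly as follows. This is a minimal illustration, not the actual PUDL implementation: the function name `find_year_file`, the `data_dir/<year>/` directory layout, the substring match on the file name, and the `working_years` argument are all assumptions made for the example.

```python
from pathlib import Path
from typing import Iterable


def find_year_file(data_dir: str, year: int, file_name: str,
                   working_years: Iterable[int]) -> str:
    """Return the path of the spreadsheet for ``year`` whose name contains ``file_name``.

    Hypothetical helper: assumes files live under ``data_dir/<year>/``.
    """
    if year not in working_years:
        raise ValueError(f"Year {year} is not in the list of working years.")

    year_dir = Path(data_dir) / str(year)  # assumed layout, one folder per year
    # Keep only files whose name contains the partial file name.
    matches = sorted(p for p in year_dir.glob("*") if file_name in p.name)
    if not matches:
        raise FileNotFoundError(f"No file matching {file_name!r} in {year_dir}")
    return str(matches[0])
```

Validating the year before touching the file system keeps the error message specific: a caller asking for an unsupported year gets a ValueError rather than a less informative missing-file error.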

get_meta(meta_name, file_name)[source]

Grab the metadata file.

Parameters
  • meta_name (str) – the name of the top level metadata.

  • file_name (str) – if the metadata is in a nested subdirectory (such as ‘column_maps’ or ‘tab_maps’) the file_name is the file name. This name should correspond to the name of the Excel file being extracted.

Returns

pandas.DataFrame

get_page(years, page_name, file_name)[source]

Get a page from several years of Excel files and convert it to a single DataFrame.

Parameters
  • years (list) –

  • page_name (str) –

  • file_name (str) –
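As a rough illustration of combining one page across several years, the sketch below takes per-year DataFrames that have already been read from Excel, renames their columns with a per-year mapping, and concatenates them. It is not the PUDL implementation: the function name `combine_years`, both argument names, and the added `report_year` column are assumptions made for the example.

```python
import pandas as pd


def combine_years(raw_by_year, column_map_by_year):
    """Concatenate one page from several years into a single DataFrame.

    raw_by_year: dict of year -> DataFrame as read from that year's sheet.
    column_map_by_year: dict of year -> {raw column name: common column name}.
    Both structures are hypothetical stand-ins for the extractor's metadata.
    """
    frames = []
    for year in sorted(raw_by_year):
        # Map this year's spreadsheet headers onto a shared set of names.
        df = raw_by_year[year].rename(columns=column_map_by_year[year])
        # Record which year each row came from (assumed column name).
        df = df.assign(report_year=year)
        frames.append(df)
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True, sort=False)
```

Renaming before concatenating matters because column headers in the source spreadsheets can differ from year to year; without a shared set of names, the combined frame would end up with separate, partly empty columns for the same field.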

get_path_name(yr, file_name)[source]

Get the ExcelFile file path name.

get_xlsx_dict(years, file_name)[source]

Read in Excel files to create Excel objects.

Rather than reading in the same Excel files several times, we can just read them each in once (one per year) and use the ExcelFile object to refer back to the data in memory.

Parameters
  • years (list) – The years that we’re trying to read data for.

  • file_name (str) – Name of the Excel file.
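The read-once idea described above can be sketched as a dictionary keyed by year, so each workbook is opened a single time and reused for every page. This is a minimal sketch under stated assumptions, not the module's actual code: `build_xlsx_dict`, `path_for_year`, and `open_workbook` are hypothetical names, and the opener is passed in as a parameter so the example does not depend on any particular file layout.

```python
from typing import Any, Callable, Dict, Iterable


def build_xlsx_dict(
    years: Iterable[int],
    path_for_year: Callable[[int], str],
    open_workbook: Callable[[str], Any],
) -> Dict[int, Any]:
    """Open one workbook per year and return the open objects keyed by year."""
    xlsx_dict: Dict[int, Any] = {}
    for year in years:
        # Skip years that are already loaded so each file is opened only once.
        if year not in xlsx_dict:
            xlsx_dict[year] = open_workbook(path_for_year(year))
    return xlsx_dict
```

With pandas, `open_workbook` could be `pandas.ExcelFile`; individual pages can then be read from the cached object with `ExcelFile.parse(sheet_name=..., skiprows=...)` instead of reopening the file for each page.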