pudl.extract.epaipm module

Retrieve data from EPA’s Integrated Planning Model (IPM) v6.

Unlike most of the PUDL data sources, IPM is not an annual timeseries. This module assumes that only v6 will be used as input, so only a limited number of files is involved.

This module was written by @gschivley

class pudl.extract.epaipm.EpaIpmDatastore(datastore: pudl.workspace.datastore.Datastore)

Bases: object

Helper for extracting EpaIpm dataframes from Datastore.

SETTINGS = (
    TableSettings(table_name='transmission_single_epaipm', file='table_3-21_annual_transmission_capabilities_of_u.s._model_regions_in_epa_platform_v6_-_2021.xlsx', excel_settings={'skiprows': 3, 'usecols': 'B:F', 'index_col': [0, 1]}),
    TableSettings(table_name='transmission_joint_epaipm', file='table_3-5_transmission_joint_ipm.csv', excel_settings={}),
    TableSettings(table_name='load_curves_epaipm', file='table_2-2_load_duration_curves_used_in_epa_platform_v6.xlsx', excel_settings={'skiprows': 3, 'usecols': 'B:AB'}),
    TableSettings(table_name='plant_region_map_epaipm_active', file='needs_v6_november_2018_reference_case_0.xlsx', excel_settings={'sheet_name': 'NEEDS v6_Active', 'usecols': 'C,I'}),
    TableSettings(table_name='plant_region_map_epaipm_retired', file='needs_v6_november_2018_reference_case_0.xlsx', excel_settings={'sheet_name': 'NEEDS v6_Retired_Through2021', 'usecols': 'C,I'})
)

get_dataframe(table_name: str) → pandas.core.frame.DataFrame

Retrieve the file for the specified table from the epaipm archive and parse it into a DataFrame.

Parameters
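  • table_name – name of the table to retrieve; must match the table_name of an entry in SETTINGS. The pandas arguments used to parse the file come from that entry's excel_settings rather than from a separate parameter.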

Returns

Pandas dataframe of EPA IPM data.
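A minimal sketch of how a table's excel_settings could drive parsing, assuming the archived file is available as raw bytes and that the parser is chosen from the file extension. read_ipm_table is a hypothetical helper, not part of pudl, and how the bytes are fetched from the Datastore is deliberately left out.

```python
from io import BytesIO
from typing import Any, Dict

import pandas as pd


def read_ipm_table(file_name: str, raw: bytes, excel_settings: Dict[str, Any]) -> pd.DataFrame:
    """Hypothetical helper: parse one archived IPM file using its TableSettings kwargs."""
    buffer = BytesIO(raw)  # in-memory stand-in for a file pulled from the archive
    suffix = file_name.rsplit(".", 1)[-1].lower()
    if suffix == "csv":
        # e.g. table_3-5_transmission_joint_ipm.csv, whose excel_settings is {}
        return pd.read_csv(buffer, **excel_settings)
    if suffix in ("xlsx", "xls"):
        # skiprows, usecols, index_col and sheet_name are all standard
        # pandas.read_excel keyword arguments.
        return pd.read_excel(buffer, **excel_settings)
    raise ValueError(f"Unsupported IPM file type: {file_name}")
```

Keeping the parsing options as plain keyword dictionaries means that, in a design like this, supporting a new table needs only a new SETTINGS entry rather than new parsing code.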

get_table_settings(table_name: str) → pudl.extract.epaipm.TableSettings

Returns TableSettings for a given table_name.
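One way such a lookup could work is a search of SETTINGS by table_name. This is a sketch under assumptions: find_table_settings is a hypothetical stand-in, the SETTINGS tuple below copies only two of the documented entries, and raising KeyError for an unknown name is an assumed behavior, not confirmed by the source.

```python
from typing import Any, Dict, NamedTuple, Sequence


class TableSettings(NamedTuple):
    # Mirrors the documented fields and default.
    table_name: str
    file: str
    excel_settings: Dict[str, Any] = {}


SETTINGS = (
    TableSettings("transmission_joint_epaipm", "table_3-5_transmission_joint_ipm.csv"),
    TableSettings(
        "plant_region_map_epaipm_active",
        "needs_v6_november_2018_reference_case_0.xlsx",
        {"sheet_name": "NEEDS v6_Active", "usecols": "C,I"},
    ),
)


def find_table_settings(settings: Sequence[TableSettings], table_name: str) -> TableSettings:
    """Hypothetical stand-in for get_table_settings: linear search by table name."""
    for ts in settings:
        if ts.table_name == table_name:
            return ts
    # Assumed failure mode for an unknown table name.
    raise KeyError(f"No TableSettings for table {table_name!r}")
```

A linear scan is adequate for five entries; a dict built once with {ts.table_name: ts for ts in SETTINGS} would be the natural choice if the list grew.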

class pudl.extract.epaipm.TableSettings(table_name: str, file: str, excel_settings: Dict[str, Any] = {})

Bases: tuple

Describes how to locate and load one EPA IPM table: its name, source file, and pandas parsing settings.

excel_settings: Dict[str, Any]

Alias for field number 2

file: str

Alias for field number 1

table_name: str

Alias for field number 0
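The "Alias for field number N" entries are what a named tuple generates for each field, so attribute access and integer indexing refer to the same values. The sketch below assumes TableSettings is a typing.NamedTuple with the documented fields; it also shows a caveat of the documented {} default, which is one dict object shared by every instance that omits excel_settings.

```python
from typing import Any, Dict, NamedTuple


class TableSettings(NamedTuple):
    """Sketch mirroring the documented fields (assumed to be a typing.NamedTuple)."""

    table_name: str
    file: str
    excel_settings: Dict[str, Any] = {}


ts = TableSettings(
    table_name="load_curves_epaipm",
    file="table_2-2_load_duration_curves_used_in_epa_platform_v6.xlsx",
    excel_settings={"skiprows": 3, "usecols": "B:AB"},
)

# Field 0, 1 and 2 are aliases for the named attributes.
assert ts[0] is ts.table_name
assert ts[1] is ts.file
assert ts[2] is ts.excel_settings

# Being a tuple, an instance unpacks positionally in field order.
name, file_name, settings = ts

# Caveat: instances that omit excel_settings share the same default dict.
a = TableSettings("a", "a.csv")
b = TableSettings("b", "b.csv")
shared_default = a.excel_settings is b.excel_settings
```

Because of that shared default, code consuming these settings should treat excel_settings as read-only and copy it before modifying.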

pudl.extract.epaipm.extract(epaipm_tables: List[str], ds: pudl.workspace.datastore.Datastore) → Dict[str, pandas.core.frame.DataFrame]

Extracts data from IPM files.

Parameters
  • epaipm_tables (iterable) – A tuple or list of table names to extract

  • ds (pudl.workspace.datastore.Datastore) – Initialized datastore

Returns

Dictionary mapping each requested table name to a DataFrame of extracted (but not yet transformed) data.

Return type

dict
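The extraction loop can be sketched as follows, assuming extract calls get_dataframe once per requested name and keys the result by table name. extract_tables, TableReader and FakeReader are hypothetical; FakeReader stands in for an EpaIpmDatastore so the sketch runs without the EPA archive.

```python
from typing import Dict, Iterable, Protocol

import pandas as pd


class TableReader(Protocol):
    """Anything with a get_dataframe(table_name) method, e.g. an EpaIpmDatastore."""

    def get_dataframe(self, table_name: str) -> pd.DataFrame: ...


def extract_tables(table_names: Iterable[str], reader: TableReader) -> Dict[str, pd.DataFrame]:
    """Hypothetical sketch: one raw (untransformed) DataFrame per requested table."""
    return {name: reader.get_dataframe(name) for name in table_names}


class FakeReader:
    """Stand-in reader for illustration; returns a tiny frame tagged with the table name."""

    def get_dataframe(self, table_name: str) -> pd.DataFrame:
        return pd.DataFrame({"table": [table_name]})


raw = extract_tables(["load_curves_epaipm", "transmission_joint_epaipm"], FakeReader())
```

Depending only on a get_dataframe method keeps the loop independent of where the files come from, which is what makes the fake reader a drop-in replacement here.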