pudl.extract.epaipm module

Retrieve data from EPA’s Integrated Planning Model (IPM) v6.

Unlike most PUDL data sources, IPM is not an annual time series. This module assumes that only v6 will be used as input, so there are only a limited number of files to handle.

This module was written by @gschivley.

pudl.extract.epaipm.create_dfs_epaipm(files, data_dir)

Makes a dictionary mapping pages (keys) to dataframes (values) for the EPA IPM tabs.

Parameters
  • files (list) – A list of EPA IPM files.

  • data_dir (path-like) – Path to the top directory of the PUDL datastore.

Returns

A dictionary mapping pages (keys) to dataframes (values).

Return type

dict
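
The page-to-dataframe mapping described above can be illustrated with a minimal sketch. It is not the PUDL implementation: `create_dfs_sketch`, `DEMO_SOURCES`, and the column names are hypothetical, and in-memory CSV text stands in for the real IPM files so the example runs without a datastore.

```python
import io

import pandas as pd

# Hypothetical stand-in for files in the datastore: page name -> CSV text.
# The column names and values are invented for this demo.
DEMO_SOURCES = {
    "single_transmission": "region_from,region_to,capacity_mw\nA,B,100\n",
    "joint_transmission": "constraint,capacity_mw\nA+B to C,300\nA to B+C,150\n",
}


def create_dfs_sketch(files, sources=DEMO_SOURCES):
    """Map each requested page name to a DataFrame (illustrative only)."""
    return {page: pd.read_csv(io.StringIO(sources[page])) for page in files}


demo_dfs = create_dfs_sketch(["single_transmission", "joint_transmission"])
print({page: df.shape for page, df in demo_dfs.items()})
```

Keying the result by page name lets later steps look up a table by name rather than relying on the order in which the files were read.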

pudl.extract.epaipm.extract(epaipm_tables, data_dir)

Extracts data from IPM files.

Parameters
  • epaipm_tables (iterable) – A tuple or list of table names to extract

  • data_dir (path-like) – Path to the top directory of the PUDL datastore.

Returns

dictionary of DataFrames with extracted (but not yet transformed) data from each file.

Return type

dict

pudl.extract.epaipm.get_epaipm_file(filename, read_file_args, data_dir)

Reads in files to create dataframes.

No need to use ExcelFile objects with the IPM files because each file is only a single sheet.

Parameters
  • filename (str) – Name of the IPM file to read; the documented values are [‘single_transmission’, ‘joint_transmission’].

  • read_file_args (dict) – Dictionary of keyword arguments passed to the pandas read_* function.

  • data_dir (path-like) – Path to the top directory of the PUDL datastore.

Returns

The EPA IPM data read from the file. (The description above says the function creates dataframes and does not use ExcelFile objects.)

Return type

pandas.DataFrame
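
As a rough illustration of reading a single-sheet file with caller-supplied reader arguments, the sketch below dispatches on the file suffix and forwards `read_file_args` to a pandas reader. It is an assumption-laden sketch, not the PUDL code: `read_ipm_file_sketch` is a hypothetical name, the CSV branch is assumed from the "read_*" wording, and the demo file contents are invented.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_ipm_file_sketch(path, read_file_args):
    """Read one single-sheet file into a DataFrame (illustrative only)."""
    path = Path(path)
    if path.suffix == ".xlsx":
        # Only one sheet per file, so pd.read_excel is enough here and
        # there is no need to keep an ExcelFile object around.
        return pd.read_excel(path, **read_file_args)
    if path.suffix == ".csv":
        # Assumed branch: the source only says "pandas read_*".
        return pd.read_csv(path, **read_file_args)
    raise ValueError(f"Unsupported file type: {path.suffix}")


# Demo with a throwaway CSV; the column names and values are made up.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.csv"
    demo_path.write_text("junk header line\nregion,capacity_mw\nA,100\nB,250\n")
    # skiprows=1 drops the junk line so the real header row is used.
    demo_df = read_ipm_file_sketch(demo_path, {"skiprows": 1})
```

Passing the reader options as a dictionary keeps per-file quirks (rows to skip, columns to use) in configuration instead of in the reading function.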

pudl.extract.epaipm.get_epaipm_name(file, data_dir)

Returns the path to the appropriate EPA IPM Excel file.

Parameters
  • file (str) – The file that we’re trying to read data for.

  • data_dir (path-like) – Path to the top directory of the PUDL datastore.

Returns

The path to the EPA IPM spreadsheet.

Return type

str
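
One plausible way to resolve a short file key to a path is to glob for it under the datastore directory. The sketch below assumes a hypothetical layout (`<data_dir>/epaipm/`) and a hypothetical helper name, `find_ipm_file_sketch`; the real datastore layout and matching rule may differ.

```python
import tempfile
from pathlib import Path


def find_ipm_file_sketch(file, data_dir):
    """Return the path (as str) of the one file whose name contains `file`."""
    # Hypothetical layout: <data_dir>/epaipm/<name containing the key>.<ext>
    ipm_dir = Path(data_dir) / "epaipm"
    matches = sorted(ipm_dir.glob(f"*{file}*"))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"Expected exactly one file matching {file!r} in {ipm_dir}, "
            f"found {len(matches)}."
        )
    return str(matches[0])


# Demo against a temporary directory; the file name is invented.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "epaipm").mkdir()
    (Path(tmp) / "epaipm" / "demo_single_transmission.xlsx").touch()
    found_name = Path(find_ipm_file_sketch("single_transmission", tmp)).name
```

Requiring exactly one match makes an ambiguous or missing file fail loudly at lookup time rather than later during parsing.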