pudl.extract.phmsagas

Retrieves data from PHMSA natural gas spreadsheets for analysis.

This module pulls data from PHMSA’s published Excel spreadsheets.

Module Contents

Classes

Extractor

Extractor for the PHMSA Excel dataset.

Functions

extract_phmsagas(context, raw_phmsagas__all_dfs)

Extract raw PHMSA gas data from Excel sheets into dataframes.

Attributes

pudl.extract.phmsagas.logger
class pudl.extract.phmsagas.Extractor(*args, **kwargs)

Bases: pudl.extract.excel.GenericExtractor

Extractor for the PHMSA Excel dataset.

process_renamed(newdata: pandas.DataFrame, page: str, **partition)

Drop columns that get mapped to other assets.

Older years of PHMSA data ship as a single Excel tab in the raw data, while newer years are split across multiple tabs. To extract the data into tables that follow the newer format without duplicating the older data, the older pages have to be split into multiple tables by column. To keep each table from containing every column from those older years, the data is filtered down to the list of columns specified for the page, and a warning is emitted.
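The column filtering described above can be illustrated with a minimal sketch. This is not the PUDL implementation: the helper name `select_page_columns`, the `expected_columns` argument, and the column names in the example are all hypothetical, and the real extractor takes the per-page column list from its own metadata.

```python
import logging

logger = logging.getLogger(__name__)


def select_page_columns(columns, expected_columns, page):
    """Return the columns to keep for ``page``, preserving their order.

    Hypothetical helper: ``expected_columns`` stands in for the list of
    columns specified for the page.
    """
    expected = set(expected_columns)
    keep = [col for col in columns if col in expected]
    dropped = [col for col in columns if col not in expected]
    if dropped:
        # Warn rather than fail: the dropped columns belong to other tables.
        logger.warning(
            "Page %s: dropping columns mapped to other tables: %s", page, dropped
        )
    return keep


# Hypothetical column names from an older, single-tab sheet.
old_sheet_columns = ["report_year", "operator_id", "mains_miles", "services_count"]
kept = select_page_columns(
    old_sheet_columns,
    ["report_year", "operator_id", "mains_miles"],
    page="yearly_distribution",
)
```

With a pandas DataFrame, the same idea would presumably be applied as `newdata = newdata[select_page_columns(newdata.columns, expected_columns, page)]`.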

pudl.extract.phmsagas.raw_table_names = ('raw_phmsagas__yearly_distribution',...
pudl.extract.phmsagas.raw_phmsagas__all_dfs
pudl.extract.phmsagas.extract_phmsagas(context, raw_phmsagas__all_dfs)

Extract raw PHMSA gas data from Excel sheets into dataframes.

Parameters:

context – Dagster keyword that provides access to resources and config.

Returns:

A tuple of extracted PHMSA gas dataframes.
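As a rough sketch of how a mapping of extracted pages could become the returned tuple, the example below selects one entry per raw table name. Everything here is an assumption made for illustration: the helper `to_output_tuple`, the idea that page names map to table names via the `raw_phmsagas__` prefix, the ordering, and the second table name are hypothetical, and plain strings stand in for dataframes.

```python
def to_output_tuple(all_dfs, raw_table_names, prefix="raw_phmsagas__"):
    """Return one dataframe per raw table name, in ``raw_table_names`` order.

    Hypothetical helper: assumes ``all_dfs`` is keyed by page name and that
    prepending ``prefix`` yields the raw table name.
    """
    named = {prefix + page: df for page, df in all_dfs.items()}
    # Pages that are not listed in raw_table_names are ignored.
    return tuple(named[name] for name in raw_table_names if name in named)


# The second table name is made up for this example.
example_table_names = (
    "raw_phmsagas__yearly_distribution",
    "raw_phmsagas__example_page",
)
example_dfs = {
    "example_page": "df_b",
    "yearly_distribution": "df_a",
    "unused_page": "df_c",
}
result = to_output_tuple(example_dfs, example_table_names)
```

In the real module the function is passed a Dagster `context` and may wrap each dataframe in a Dagster output object; the sketch omits both.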