pudl.transform.ferc714

Transformation of the FERC Form 714 data.

Module Contents

Functions

_pre_process(→ pandas.DataFrame)

A simple placeholder transform function, for use until the real ones are written.

_post_process(→ pandas.DataFrame)

Uniform post-processing of FERC 714 tables.

_standardize_offset_codes(→ pandas.DataFrame)

Convert to standardized UTC offset abbreviations.

core_ferc714__respondent_id(→ pandas.DataFrame)

Transform the FERC 714 respondent IDs, names, and EIA utility IDs.

out_ferc714__hourly_planning_area_demand(...)

Transform the hourly demand time series by Planning Area.

Attributes

logger

OFFSET_CODE_FIXES

OFFSET_CODE_FIXES_BY_YEAR

BAD_RESPONDENTS

Fake respondent IDs for database test entities.

OFFSET_CODES

A mapping of timezone offset codes to Timedelta offsets from UTC.

TZ_CODES

Mapping between standardized time offset codes and canonical timezones.

EIA_CODE_FIXES

Overrides of FERC 714 respondent IDs with wrong or missing EIA Codes.

RENAME_COLS

pudl.transform.ferc714.logger
pudl.transform.ferc714.OFFSET_CODE_FIXES
pudl.transform.ferc714.OFFSET_CODE_FIXES_BY_YEAR
pudl.transform.ferc714.BAD_RESPONDENTS = [319, 99991, 99992, 99993, 99994, 99995]

Fake respondent IDs for database test entities.

pudl.transform.ferc714.OFFSET_CODES

A mapping of timezone offset codes to Timedelta offsets from UTC.

Note that the FERC 714 instructions state that all hourly demand is to be reported in STANDARD time for whatever timezone is being used. Even though many respondents use daylight saving / standard time abbreviations, a large majority do appear to conform to using a single UTC offset throughout the year. There are 6 instances in which the timezone associated with reporting changed from one year to the next, and these result in duplicate records, which are dropped.
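As a minimal sketch of how a code-to-Timedelta mapping of this kind can be used, the example below converts local standard-time timestamps to UTC. The dictionary is a hypothetical four-entry subset built from the standard US offsets, not the actual contents of OFFSET_CODES, and the column names `datetime_local` and `datetime_utc` and the function `local_to_utc` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical subset for illustration; the real OFFSET_CODES contents
# are not reproduced on this page.
EXAMPLE_OFFSET_CODES = {
    "EST": pd.Timedelta(hours=-5),
    "CST": pd.Timedelta(hours=-6),
    "MST": pd.Timedelta(hours=-7),
    "PST": pd.Timedelta(hours=-8),
}


def local_to_utc(df: pd.DataFrame) -> pd.DataFrame:
    """Add a UTC timestamp column using each row's fixed UTC offset."""
    out = df.copy()
    # Look up the offset from UTC for every standardized code.
    out["utc_offset"] = out["utc_offset_code"].map(EXAMPLE_OFFSET_CODES)
    # local = UTC + offset, so UTC = local - offset.
    out["datetime_utc"] = out["datetime_local"] - out["utc_offset"]
    return out


demand = pd.DataFrame(
    {
        "datetime_local": pd.to_datetime(["2020-01-01 00:00", "2020-01-01 00:00"]),
        "utc_offset_code": ["EST", "PST"],
    }
)
converted = local_to_utc(demand)
```

Because each code maps to one fixed offset, the conversion is a plain subtraction and needs no daylight saving logic.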

pudl.transform.ferc714.TZ_CODES

Mapping between standardized time offset codes and canonical timezones.
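The sketch below shows one way a code-to-timezone mapping could be applied. The dictionary entries are assumptions based on common IANA timezone names, not the actual contents of TZ_CODES, and the `timezone` column name is hypothetical.

```python
import pandas as pd

# Hypothetical mapping for illustration; the real TZ_CODES may differ.
EXAMPLE_TZ_CODES = {
    "EST": "America/New_York",
    "CST": "America/Chicago",
    "MST": "America/Denver",
    "PST": "America/Los_Angeles",
}

codes = pd.Series(["EST", "PST", "CST"], name="utc_offset_code")
# Translate each standardized offset code into a canonical timezone name.
timezones = codes.map(EXAMPLE_TZ_CODES).rename("timezone")

# A UTC timestamp can then be viewed in the respondent's canonical timezone.
ts_utc = pd.Timestamp("2020-07-01 12:00", tz="UTC")
ts_local = ts_utc.tz_convert(timezones.iloc[0])
```

Unlike a fixed offset, a canonical timezone observes daylight saving time, so noon UTC in July falls at 08:00 in America/New_York rather than 07:00.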

pudl.transform.ferc714.EIA_CODE_FIXES

Overrides of FERC 714 respondent IDs with wrong or missing EIA Codes.

pudl.transform.ferc714.RENAME_COLS
pudl.transform.ferc714._pre_process(df: pandas.DataFrame, table_name: str) → pandas.DataFrame

A simple placeholder transform function, for use until the real ones are written.

  • Removes footnote columns ending with _f

  • Drops report_prd, spplmnt_num, and row_num columns

  • Excludes records which pertain to bad (test) respondents.
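The three steps above can be sketched roughly as follows. This is not the module's implementation: the function name `pre_process_sketch`, the `respondent_col` parameter, and the raw column name `respondent_id` are assumptions, and the `table_name` argument of the real function is omitted.

```python
import pandas as pd

# Test respondent IDs, as listed for BAD_RESPONDENTS on this page.
BAD_RESPONDENTS = [319, 99991, 99992, 99993, 99994, 99995]


def pre_process_sketch(
    df: pd.DataFrame, respondent_col: str = "respondent_id"
) -> pd.DataFrame:
    """Drop footnote and bookkeeping columns, then remove test respondents."""
    # Footnote columns are identified by their "_f" suffix.
    footnote_cols = [col for col in df.columns if col.endswith("_f")]
    out = df.drop(columns=footnote_cols)
    # Bookkeeping columns; errors="ignore" tolerates tables that lack some.
    out = out.drop(columns=["report_prd", "spplmnt_num", "row_num"], errors="ignore")
    # Keep only rows that do not belong to a fake (test) respondent.
    is_bad = out[respondent_col].isin(BAD_RESPONDENTS)
    return out[~is_bad].reset_index(drop=True)


raw = pd.DataFrame(
    {
        "respondent_id": [101, 99991, 102],
        "demand": [10.0, 20.0, 30.0],
        "demand_f": ["a", None, None],
        "report_prd": [12, 12, 12],
        "row_num": [1, 2, 3],
    }
)
clean = pre_process_sketch(raw)
```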

pudl.transform.ferc714._post_process(df: pandas.DataFrame, table_name: str) → pandas.DataFrame

Uniform post-processing of FERC 714 tables.

Applies standard data types and ensures that the tables generally conform to the schemas we have defined for them.

Parameters:

df – A dataframe to be post-processed.

Returns:

The post-processed dataframe.

pudl.transform.ferc714._standardize_offset_codes(df: pandas.DataFrame, offset_fixes) → pandas.DataFrame

Convert to standardized UTC offset abbreviations.

This function ensures that all of the 3-4 letter abbreviations used to indicate a timestamp’s localized offset from UTC are standardized, so that they can be used to make the timestamps timezone aware. The standard abbreviations we’re using are:

  • “HST”: Hawaii Standard Time

  • “AKST”: Alaska Standard Time

  • “AKDT”: Alaska Daylight Time

  • “PST”: Pacific Standard Time

  • “PDT”: Pacific Daylight Time

  • “MST”: Mountain Standard Time

  • “MDT”: Mountain Daylight Time

  • “CST”: Central Standard Time

  • “CDT”: Central Daylight Time

  • “EST”: Eastern Standard Time

  • “EDT”: Eastern Daylight Time

In some cases different respondents use the same non-standard abbreviations to indicate different offsets, and so the fixes are applied on a per-respondent basis, as defined by offset_fixes.

Parameters:
  • df (pandas.DataFrame) – A DataFrame containing a utc_offset_code column that needs to be standardized.

  • offset_fixes (dict) – A dictionary with respondent_id_ferc714 values as the keys, and a dictionary mapping non-standard UTC offset codes to the standardized UTC offset codes as the value.

Returns:

Standardized UTC offset codes.
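A minimal sketch of per-respondent fixes, assuming the offset_fixes structure described above (respondent ID mapped to a dictionary of code replacements). The function name, the respondent IDs, and the non-standard code "EPT" are made up for illustration, and the real function may differ in what it returns.

```python
import pandas as pd


def standardize_offset_codes_sketch(df: pd.DataFrame, offset_fixes: dict) -> pd.Series:
    """Return utc_offset_code with respondent-specific replacements applied."""
    codes = df["utc_offset_code"].copy()
    for respondent_id, fixes in offset_fixes.items():
        # Restrict each set of fixes to the rows of a single respondent.
        is_respondent = df["respondent_id_ferc714"] == respondent_id
        # Codes without a listed fix are passed through unchanged.
        codes.loc[is_respondent] = codes.loc[is_respondent].map(
            lambda code: fixes.get(code, code)
        )
    return codes


# Hypothetical data: two respondents use "EPT" to mean different offsets.
hourly = pd.DataFrame(
    {
        "respondent_id_ferc714": [101, 101, 102],
        "utc_offset_code": ["EPT", "EST", "EPT"],
    }
)
example_fixes = {101: {"EPT": "EST"}, 102: {"EPT": "EDT"}}
standardized = standardize_offset_codes_sketch(hourly, example_fixes)
```

Keying the fixes by respondent is what lets the same non-standard abbreviation resolve to different standard codes.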

pudl.transform.ferc714.core_ferc714__respondent_id(raw_ferc714__respondent_id: pandas.DataFrame) → pandas.DataFrame

Transform the FERC 714 respondent IDs, names, and EIA utility IDs.

Clean up FERC-714 respondent names and manually assign EIA utility IDs to a few FERC Form 714 respondents (including PacifiCorp) that report planning area demand but whose corresponding EIA utility IDs are, for some reason, not provided by FERC.

Parameters:

raw_ferc714__respondent_id – Raw table describing the FERC 714 Respondents.

Returns:

A clean(er) version of the FERC-714 respondents table.
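The manual ID assignment can be pictured as a dictionary of overrides applied on top of the reported values. In the sketch below, the respondent IDs, EIA codes, the `eia_code` column name, and the function name are all invented for illustration; the structure of the real EIA_CODE_FIXES is not shown on this page.

```python
import pandas as pd

# Made-up overrides: respondent ID -> corrected EIA code.
EXAMPLE_EIA_CODE_FIXES = {101: 5555, 102: 6666}


def apply_eia_code_fixes_sketch(df: pd.DataFrame, fixes: dict) -> pd.DataFrame:
    """Overwrite wrong or missing EIA codes for the listed respondents."""
    out = df.copy()
    for respondent_id, eia_code in fixes.items():
        is_respondent = out["respondent_id_ferc714"] == respondent_id
        out.loc[is_respondent, "eia_code"] = eia_code
    return out


respondents = pd.DataFrame(
    {
        "respondent_id_ferc714": [100, 101, 102],
        # Respondent 101 has a wrong code (0); respondent 102 has none.
        "eia_code": pd.array([1111, 0, None], dtype="Int64"),
    }
)
fixed = apply_eia_code_fixes_sketch(respondents, EXAMPLE_EIA_CODE_FIXES)
```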

pudl.transform.ferc714.out_ferc714__hourly_planning_area_demand(raw_ferc714__hourly_planning_area_demand: pandas.DataFrame) → pandas.DataFrame

Transform the hourly demand time series by Planning Area.

Transformations include:

  • Clean UTC offset codes.

  • Replace UTC offset codes with UTC offset and timezone.

  • Drop 25th hour rows.

  • Set records with 0 UTC code to 0 demand.

  • Drop duplicate rows.

  • Flip negative signs for reported demand.

Parameters:

raw_ferc714__hourly_planning_area_demand – Raw table containing hourly demand time series by Planning Area.

Returns:

Clean(er) version of the hourly demand time series by Planning Area.
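Two of the transformations listed above, dropping duplicate rows and flipping negative signs, can be sketched on a small long-format table. The column names `datetime_utc` and `demand_mwh` and the function name are assumptions, the raw table layout is not shown on this page, and treating "flip negative signs" as taking the absolute value is an interpretation rather than a confirmed detail.

```python
import pandas as pd


def tidy_demand_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows and make all reported demand non-negative."""
    out = df.drop_duplicates().copy()
    # Assumption: a negative value is a sign error, so keep its magnitude.
    out["demand_mwh"] = out["demand_mwh"].abs()
    return out.reset_index(drop=True)


hourly = pd.DataFrame(
    {
        "respondent_id_ferc714": [101, 101, 101],
        "datetime_utc": pd.to_datetime(
            ["2020-01-01 05:00", "2020-01-01 05:00", "2020-01-01 06:00"]
        ),
        "demand_mwh": [50.0, 50.0, -40.0],
    }
)
tidy = tidy_demand_sketch(hourly)
```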