pudl.transform.ferc714

Transformation of the FERC Form 714 data.

Module Contents

Functions

_standardize_offset_codes(df, offset_fixes)

Convert to standardized UTC offset abbreviations.

_log_dupes(df, dupe_cols)

A macro to report the number of duplicate hours found.

respondent_id(tfr_dfs)

Transform the FERC 714 respondent IDs, names, and EIA utility IDs.

demand_hourly_pa(tfr_dfs)

Transform the hourly demand time series by Planning Area.

id_certification(tfr_dfs)

A stub transform function.

gen_plants_ba(tfr_dfs)

A stub transform function.

demand_monthly_ba(tfr_dfs)

A stub transform function.

net_energy_load_ba(tfr_dfs)

A stub transform function.

adjacency_ba(tfr_dfs)

A stub transform function.

interchange_ba(tfr_dfs)

A stub transform function.

lambda_hourly_ba(tfr_dfs)

A stub transform function.

lambda_description(tfr_dfs)

A stub transform function.

description_pa(tfr_dfs)

A stub transform function.

demand_forecast_pa(tfr_dfs)

A stub transform function.

_early_transform(raw_df)

A simple transform function to use until the real ones are written.

transform(raw_dfs, ferc714_settings: pudl.settings.Ferc714Settings = Ferc714Settings())

Prepare the raw FERC 714 dataframes for loading into the PUDL database.

Attributes

logger

OFFSET_CODE_FIXES

OFFSET_CODE_FIXES_BY_YEAR

BAD_RESPONDENTS

Fake respondent IDs for database test entities.

OFFSET_CODES

A mapping of timezone offset codes to Timedelta offsets from UTC.

TZ_CODES

Mapping between standardized time offset codes and canonical timezones.

EIA_CODE_FIXES

Overrides of FERC 714 respondent IDs with wrong or missing EIA Codes.

RENAME_COLS

pudl.transform.ferc714.logger
pudl.transform.ferc714.OFFSET_CODE_FIXES
pudl.transform.ferc714.OFFSET_CODE_FIXES_BY_YEAR
pudl.transform.ferc714.BAD_RESPONDENTS = [319, 99991, 99992, 99993, 99994, 99995]

Fake respondent IDs for database test entities.

pudl.transform.ferc714.OFFSET_CODES

A mapping of timezone offset codes to Timedelta offsets from UTC.

Note that the FERC 714 instructions state that all hourly demand is to be reported in STANDARD time for whatever timezone is being used. Even though many respondents use daylight savings / standard time abbreviations, a large majority do appear to conform to using a single UTC offset throughout the year. There are 6 instances in which the timezone associated with reporting changed from one year to the next, and these result in duplicate records, which are dropped.
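As a rough illustration of how a code-to-offset mapping like this can be used, the sketch below converts a naive local timestamp to UTC. The dictionary `EXAMPLE_OFFSET_CODES` and the helper `local_to_utc` are hypothetical names, and `datetime.timedelta` stands in for the Timedelta type mentioned above; the actual contents of `OFFSET_CODES` are not shown on this page.

```python
from datetime import datetime, timedelta, timezone

# Illustrative stand-in for OFFSET_CODES (assumed shape: code -> offset from UTC).
EXAMPLE_OFFSET_CODES = {
    "HST": timedelta(hours=-10),
    "AKST": timedelta(hours=-9),
    "PST": timedelta(hours=-8),
    "MST": timedelta(hours=-7),
    "CST": timedelta(hours=-6),
    "EST": timedelta(hours=-5),
}


def local_to_utc(local_naive: datetime, offset_code: str) -> datetime:
    """Convert a naive local timestamp to a timezone-aware UTC timestamp."""
    offset = EXAMPLE_OFFSET_CODES[offset_code]
    # Local time = UTC + offset, so UTC = local time - offset.
    return (local_naive - offset).replace(tzinfo=timezone.utc)


# Noon Eastern Standard Time on a hypothetical reporting date.
utc_time = local_to_utc(datetime(2019, 7, 1, 12), "EST")
```

Because the offsets are fixed, a respondent that reports in standard time all year maps to UTC without any daylight saving gaps or repeated hours.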

pudl.transform.ferc714.TZ_CODES

Mapping between standardized time offset codes and canonical timezones.

pudl.transform.ferc714.EIA_CODE_FIXES

Overrides of FERC 714 respondent IDs with wrong or missing EIA Codes.

pudl.transform.ferc714.RENAME_COLS
pudl.transform.ferc714._standardize_offset_codes(df, offset_fixes)

Convert to standardized UTC offset abbreviations.

This function ensures that all of the 3-4 letter abbreviations used to indicate a timestamp’s localized offset from UTC are standardized, so that they can be used to make the timestamps timezone aware. The standard abbreviations we’re using are:

  • “HST”: Hawaii Standard Time

  • “AKST”: Alaska Standard Time

  • “AKDT”: Alaska Daylight Time

  • “PST”: Pacific Standard Time

  • “PDT”: Pacific Daylight Time

  • “MST”: Mountain Standard Time

  • “MDT”: Mountain Daylight Time

  • “CST”: Central Standard Time

  • “CDT”: Central Daylight Time

  • “EST”: Eastern Standard Time

  • “EDT”: Eastern Daylight Time

In some cases different respondents use the same non-standard abbreviations to indicate different offsets, and so the fixes are applied on a per-respondent basis, as defined by offset_fixes.

Parameters
  • df (pandas.DataFrame) – A DataFrame containing a utc_offset_code column that needs to be standardized.

  • offset_fixes (dict) – A dictionary with respondent_id_ferc714 values as the keys, and a dictionary mapping non-standard UTC offset codes to the standardized UTC offset codes as the value.

Returns

Standardized UTC offset codes.
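The per-respondent replacement described above might look roughly like the following. This is a minimal sketch, not the module's implementation: the function name `standardize_offset_codes_sketch`, the respondent IDs, and the non-standard code "EPT" are made up for illustration, and the sketch assumes the DataFrame also carries a respondent_id_ferc714 column.

```python
import pandas as pd


def standardize_offset_codes_sketch(df: pd.DataFrame, offset_fixes: dict) -> pd.Series:
    """Apply respondent-specific replacements to the utc_offset_code column."""
    codes = df["utc_offset_code"].copy()
    for respondent_id, fixes in offset_fixes.items():
        # Only touch rows belonging to this respondent, since the same
        # non-standard code can mean different offsets for different respondents.
        is_respondent = df["respondent_id_ferc714"] == respondent_id
        codes.loc[is_respondent] = codes.loc[is_respondent].replace(fixes)
    return codes


# Hypothetical data: two respondents use "EPT" to mean different things.
example_df = pd.DataFrame(
    {
        "respondent_id_ferc714": [101, 101, 102],
        "utc_offset_code": ["EPT", "EST", "EPT"],
    }
)
example_fixes = {101: {"EPT": "EST"}, 102: {"EPT": "EDT"}}
fixed = standardize_offset_codes_sketch(example_df, example_fixes)
```

Keying the fixes by respondent keeps one respondent's interpretation of an ambiguous code from leaking into another's records.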

pudl.transform.ferc714._log_dupes(df, dupe_cols)

A macro to report the number of duplicate hours found.

pudl.transform.ferc714.respondent_id(tfr_dfs)

Transform the FERC 714 respondent IDs, names, and EIA utility IDs.

This consists primarily of dropping test respondents and manually assigning EIA utility IDs to a few FERC Form 714 respondents that report planning area demand, but which don’t have their corresponding EIA utility IDs provided by FERC for some reason (including PacifiCorp).

Parameters

tfr_dfs (dict) – A dictionary of (partially) transformed dataframes, to be cleaned up.

Returns

The input dictionary of dataframes, but with a finished respondent_id_ferc714 dataframe.

Return type

dict
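The two steps described for this function (dropping test respondents and overriding EIA IDs) could be sketched as below. The column name `eia_code`, the override mapping, and the helper name are assumptions made for illustration; only the respondent IDs 99991 and 99992 come from the documented BAD_RESPONDENTS list.

```python
import pandas as pd

# Subset of the documented BAD_RESPONDENTS list.
EXAMPLE_BAD_RESPONDENTS = [99991, 99992]
# Hypothetical override: respondent 201 should map to EIA ID 12345.
EXAMPLE_EIA_CODE_FIXES = {201: 12345}


def clean_respondents_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Drop test respondents and apply manual EIA ID overrides."""
    is_bad = df["respondent_id_ferc714"].isin(EXAMPLE_BAD_RESPONDENTS)
    out = df[~is_bad].copy()
    # Look up an override for each respondent; unmatched rows become NaN
    # and fall back to the originally reported value.
    overrides = out["respondent_id_ferc714"].map(EXAMPLE_EIA_CODE_FIXES)
    out["eia_code"] = overrides.fillna(out["eia_code"])
    return out


respondents = pd.DataFrame(
    {
        "respondent_id_ferc714": [201, 202, 99991],
        "eia_code": [float("nan"), 555.0, 1.0],
    }
)
cleaned = clean_respondents_sketch(respondents)
```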

pudl.transform.ferc714.demand_hourly_pa(tfr_dfs)

Transform the hourly demand time series by Planning Area.

Transformations include:

  • Clean UTC offset codes.

  • Replace UTC offset codes with UTC offset and timezone.

  • Drop 25th hour rows.

  • Set records with 0 UTC code to 0 demand.

  • Drop duplicate rows.

  • Flip negative signs for reported demand.

Parameters

tfr_dfs (dict) – A dictionary of (partially) transformed dataframes, to be cleaned up.

Returns

The input dictionary of dataframes, but with a finished pa_demand_hourly_ferc714 dataframe.

Return type

dict
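A few of the listed transformations can be sketched with pandas as follows. This is an illustrative outline only: the column names `local_datetime`, `utc_datetime`, and `demand_mwh`, the offsets dictionary, and the use of an absolute value to flip negative signs are all assumptions, not details taken from the module.

```python
import pandas as pd

# Hypothetical stand-in for the offset-code mapping.
EXAMPLE_OFFSETS = {
    "EST": pd.Timedelta(hours=-5),
    "CST": pd.Timedelta(hours=-6),
}


def clean_hourly_demand_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Convert local hours to UTC, drop duplicate hours, and fix signs."""
    out = df.copy()
    # Replace the UTC offset code with the offset itself.
    out["utc_offset"] = out["utc_offset_code"].map(EXAMPLE_OFFSETS)
    # Local time = UTC + offset, so UTC = local time - offset.
    out["utc_datetime"] = out["local_datetime"] - out["utc_offset"]
    # Keep one record per respondent and UTC hour.
    out = out.drop_duplicates(subset=["respondent_id_ferc714", "utc_datetime"])
    # Treat negative reported demand as a sign error.
    out["demand_mwh"] = out["demand_mwh"].abs()
    return out.drop(columns=["utc_offset_code", "local_datetime"]).reset_index(drop=True)


hourly = pd.DataFrame(
    {
        "respondent_id_ferc714": [1, 1, 2],
        "local_datetime": pd.to_datetime(["2019-01-01 00:00"] * 3),
        "utc_offset_code": ["EST", "EST", "CST"],
        "demand_mwh": [-100, -100, 50],
    }
)
demand = clean_hourly_demand_sketch(hourly)
```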

pudl.transform.ferc714.id_certification(tfr_dfs)

A stub transform function.

pudl.transform.ferc714.gen_plants_ba(tfr_dfs)

A stub transform function.

pudl.transform.ferc714.demand_monthly_ba(tfr_dfs)

A stub transform function.

pudl.transform.ferc714.net_energy_load_ba(tfr_dfs)

A stub transform function.

pudl.transform.ferc714.adjacency_ba(tfr_dfs)

A stub transform function.

pudl.transform.ferc714.interchange_ba(tfr_dfs)

A stub transform function.

pudl.transform.ferc714.lambda_hourly_ba(tfr_dfs)

A stub transform function.

pudl.transform.ferc714.lambda_description(tfr_dfs)

A stub transform function.

pudl.transform.ferc714.description_pa(tfr_dfs)

A stub transform function.

pudl.transform.ferc714.demand_forecast_pa(tfr_dfs)

A stub transform function.

pudl.transform.ferc714._early_transform(raw_df)

A simple transform function to use until the real ones are written.

  • Removes footnotes columns ending with _f

  • Drops report_prd, spplmnt_num, and row_num columns

  • Excludes records which pertain to bad (test) respondents.
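The three cleanup steps listed above could be written along these lines. This is a hedged sketch rather than the module's code: the helper name and the assumption that the respondent column is already called respondent_id_ferc714 are illustrative, while the bad respondent IDs are the ones documented for BAD_RESPONDENTS.

```python
import pandas as pd

# Values documented for BAD_RESPONDENTS.
EXAMPLE_BAD_RESPONDENTS = [319, 99991, 99992, 99993, 99994, 99995]


def early_transform_sketch(raw_df: pd.DataFrame) -> pd.DataFrame:
    """Drop footnote and bookkeeping columns, then remove test respondents."""
    # Footnote columns are identified by their "_f" suffix.
    footnote_cols = [col for col in raw_df.columns if col.endswith("_f")]
    out = raw_df.drop(columns=footnote_cols)
    # errors="ignore" lets the sketch run on tables missing some of these columns.
    out = out.drop(columns=["report_prd", "spplmnt_num", "row_num"], errors="ignore")
    is_bad = out["respondent_id_ferc714"].isin(EXAMPLE_BAD_RESPONDENTS)
    return out[~is_bad].reset_index(drop=True)


raw = pd.DataFrame(
    {
        "respondent_id_ferc714": [1, 319],
        "demand": [10, 20],
        "demand_f": ["a", "b"],
        "report_prd": [12, 12],
        "row_num": [1, 2],
    }
)
early = early_transform_sketch(raw)
```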

pudl.transform.ferc714.transform(raw_dfs, ferc714_settings: pudl.settings.Ferc714Settings = Ferc714Settings())

Prepare the raw FERC 714 dataframes for loading into the PUDL database.

Parameters
  • raw_dfs (dict) – A dictionary of raw pandas.DataFrame objects, as read out of the original FERC 714 CSV files. Generated by the pudl.extract.ferc714.extract() function.

  • tables (iterable) – The set of PUDL tables within FERC 714 that we should process. Typically set to all of them, unless

Returns

A dictionary of pandas.DataFrame objects that are ready to be output in a data package / database table.

Return type

dict