pudl.output.ferc714

Functions & classes for compiling derived aspects of the FERC Form 714 data.

Module Contents

Classes

Respondents

A class coordinating compilation of data related to FERC 714 Respondents.

Functions

add_dates(rids_ferc714, report_dates)

Broadcast respondent data across dates.

categorize_eia_code(eia_codes, ba_ids, util_ids, priority='balancing_authority')

Categorize FERC 714 eia_codes as either balancing authority or utility IDs.

Attributes

ASSOCIATIONS

Adjustments to balancing authority-utility associations from EIA 861.

UTILITIES

Balancing authorities to treat as utilities in associations from EIA 861.

pudl.output.ferc714.ASSOCIATIONS :List[Dict[str, Any]][source]

Adjustments to balancing authority-utility associations from EIA 861.

The changes are applied locally to EIA 861 tables.

  • id (int): EIA balancing authority identifier (balancing_authority_id_eia).

  • from (int): Reference year, to use as a template for target years.

  • to (List[int]): Target years, in the closed interval format [minimum, maximum]. Rows in balancing_authority_eia861 are added (if missing) for every target year with the attributes from the reference year. Rows in balancing_authority_assn_eia861 are added (or replaced, if existing) for every target year with the utility associations from the reference year. Rows in service_territory_eia861 are added (if missing) for every target year with the nearest year’s associated utilities’ counties.

  • exclude (Optional[List[str]]): Utilities to exclude, by state (two-letter code). Rows are excluded from balancing_authority_assn_eia861 with target year and state.
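As a rough illustration of the entry format described above, the following sketch builds one hypothetical entry and expands its closed `to` interval into explicit target years. The ID, years, and state are invented for the example, and `target_years` is a hypothetical helper, not part of the module.

```python
from typing import Any, Dict, List

# Hypothetical entry: the ID, years, and state below are made up.
example_association: Dict[str, Any] = {
    "id": 12345,           # balancing_authority_id_eia (invented value)
    "from": 2010,          # reference year used as the template
    "to": [2006, 2009],    # closed interval [minimum, maximum]
    "exclude": ["TX"],     # optional two-letter state codes
}


def target_years(entry: Dict[str, Any]) -> List[int]:
    """Expand the closed interval in entry['to'] into a list of years."""
    low, high = entry["to"]
    # range() excludes its upper bound, so add 1 to keep the interval closed.
    return list(range(low, high + 1))


years = target_years(example_association)
```

Both endpoints are included because the interval is documented as closed.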

pudl.output.ferc714.UTILITIES :List[Dict[str, Any]][source]

Balancing authorities to treat as utilities in associations from EIA 861.

The changes are applied locally to EIA 861 tables.

  • id (int): EIA balancing authority (BA) identifier (balancing_authority_id_eia). Rows for id are removed from balancing_authority_eia861.

  • reassign (Optional[bool]): Whether to reassign utilities to parent BAs. Rows for id as BA in balancing_authority_assn_eia861 are removed. Utilities assigned to id for a given year are reassigned to the BAs for which id is an associated utility.

  • replace (Optional[bool]): Whether to remove rows where id is a utility in balancing_authority_assn_eia861. Applies only if reassign=True.
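One possible reading of the reassign behaviour can be sketched on plain tuples. This is an assumption-laden illustration, not the module's implementation: associations are modelled as hypothetical `(year, ba_id, utility_id)` tuples, `reassign_utilities` is an invented name, and the `replace` option is not modelled.

```python
from typing import Dict, Set, Tuple

Association = Tuple[int, int, int]  # (year, ba_id, utility_id); assumed layout


def reassign_utilities(assn: Set[Association], ba_id: int) -> Set[Association]:
    """Drop rows where ba_id acts as a BA and move its utilities to its parents."""
    # Parent BAs per year: BAs that list ba_id as an associated utility.
    parents: Dict[int, Set[int]] = {}
    for year, ba, util in assn:
        if util == ba_id and ba != ba_id:
            parents.setdefault(year, set()).add(ba)

    result: Set[Association] = set()
    for year, ba, util in assn:
        if ba == ba_id:
            # Row is removed; the utility is re-attached to each parent BA
            # found for that year (none found means the row simply disappears).
            for parent in parents.get(year, set()):
                result.add((year, parent, util))
        else:
            result.add((year, ba, util))
    return result


# Invented data: BA 99 is itself a utility under BA 1 in 2010 only.
associations = {(2010, 1, 99), (2010, 99, 5), (2010, 99, 6), (2011, 99, 7)}
reassigned = reassign_utilities(associations, ba_id=99)
```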

pudl.output.ferc714.add_dates(rids_ferc714, report_dates)[source]

Broadcast respondent data across dates.

Parameters
  • rids_ferc714 (pandas.DataFrame) – A simple FERC 714 Respondent ID dataframe, without any date information.

  • report_dates (ordered collection of datetime) – Dates for which each respondent should be given a record.

Raises

ValueError – if a report_date column exists in rids_ferc714.

Returns

Dataframe having all the same columns as the input rids_ferc714 with the addition of a report_date column, but with all records associated with each respondent_id_ferc714 duplicated on a per-date basis.

Return type

pandas.DataFrame
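The broadcasting described above can be approximated with a cross join. This is a minimal sketch rather than the function's actual implementation; `add_dates_sketch` and the `respondent_name` column are hypothetical.

```python
import pandas as pd


def add_dates_sketch(rids: pd.DataFrame, report_dates) -> pd.DataFrame:
    """Repeat every respondent row once per report date."""
    if "report_date" in rids.columns:
        raise ValueError("rids already contains a report_date column")
    dates = pd.DataFrame({"report_date": pd.to_datetime(list(report_dates))})
    # A cross join pairs every respondent row with every date.
    return rids.merge(dates, how="cross")


rids = pd.DataFrame(
    {
        "respondent_id_ferc714": [101, 102],
        "respondent_name": ["A", "B"],  # hypothetical extra column
    }
)
out = add_dates_sketch(rids, ["2019-01-01", "2020-01-01"])
```

With two respondents and two dates, the result has four rows and one extra `report_date` column.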

pudl.output.ferc714.categorize_eia_code(eia_codes, ba_ids, util_ids, priority='balancing_authority')[source]

Categorize FERC 714 eia_codes as either balancing authority or utility IDs.

Most FERC 714 respondent IDs are associated with an eia_code which refers to either a balancing_authority_id_eia or a utility_id_eia, but the data give no indication as to which type of ID each one is. This is further complicated by the fact that, when an entity acts as both a utility and a balancing authority, EIA uses the same numerical ID to refer to it in most but not all cases.

This function associates a respondent_type of utility, balancing_authority or pandas.NA with each input eia_code using the following rules:

  • If an eia_code appears only in util_ids, the respondent_type will be utility.

  • If an eia_code appears only in ba_ids, the respondent_type will be balancing_authority.

  • If an eia_code appears in neither set of IDs, the respondent_type will be pandas.NA.

  • If an eia_code appears in both sets of IDs, whichever respondent_type has been selected with the priority flag will be assigned.

Note that the vast majority of balancing_authority_id_eia values also show up as utility_id_eia values, but only a small subset of the utility_id_eia values are associated with balancing authorities. If you use priority="utility" you should probably also be specifically compiling the list of Utility IDs because you know they should take precedence. If you use utility priority with all utility IDs

Parameters
  • eia_codes (ordered collection of ints) – A collection of IDs which may be either associated with EIA balancing authorities or utilities, to be categorized.

  • ba_ids (ordered collection of ints) – A collection of IDs which should be interpreted as belonging to EIA Balancing Authorities.

  • util_ids (ordered collection of ints) – A collection of IDs which should be interpreted as belonging to EIA Utilities.

  • priority (str) – Which respondent_type to give priority to if the eia_code shows up in both util_ids and ba_ids. Must be one of “utility” or “balancing_authority”. The default is “balancing_authority”.

Returns

A dataframe containing 2 columns: eia_code and respondent_type.

Return type

pandas.DataFrame
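The categorization rules described for this function can be sketched as follows. This is an illustrative reimplementation under the documented rules, not the library's code; `categorize_sketch` is a hypothetical name and the IDs are invented.

```python
import pandas as pd


def categorize_sketch(eia_codes, ba_ids, util_ids, priority="balancing_authority"):
    """Label each code as utility, balancing_authority, or pandas.NA."""
    if priority not in ("utility", "balancing_authority"):
        raise ValueError(f"Unknown priority: {priority!r}")
    ba_set, util_set = set(ba_ids), set(util_ids)
    types = []
    for code in eia_codes:
        in_ba, in_util = code in ba_set, code in util_set
        if in_ba and in_util:
            types.append(priority)               # ambiguous: priority decides
        elif in_ba:
            types.append("balancing_authority")  # only a BA ID
        elif in_util:
            types.append("utility")              # only a utility ID
        else:
            types.append(pd.NA)                  # unknown ID
    return pd.DataFrame({"eia_code": list(eia_codes), "respondent_type": types})


# Invented IDs: 3 appears in both sets, 4 in neither.
categorized = categorize_sketch([1, 2, 3, 4], ba_ids=[1, 3], util_ids=[2, 3])
```

Only the code found in both sets depends on the `priority` argument.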

class pudl.output.ferc714.Respondents(pudl_out, pudl_settings=None, ba_ids=None, util_ids=None, priority='balancing_authority', limit_by_state=True)[source]

Bases: object

A class coordinating compilation of data related to FERC 714 Respondents.

The FERC 714 Respondents themselves are not complex as they are reported, but various ambiguities and the need to associate service territories with them mean that there are many derived aspects related to them which we repeatedly need to compile in a self-consistent way. This class allows you to choose several parameters for that compilation, and then easily access the resulting derived tabular outputs.

Some of these derived attributes are computationally expensive, and so they are cached internally. You can force a new computation in most cases by using update=True in the access methods. However, this functionality is not fully implemented because we’re still depending on the interim ETL processes for the FERC 714 and EIA 861 data, and we don’t want to trigger whole new ETL runs every time a derived value is updated.
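The cache-unless-update pattern can be sketched in a few lines. `CachedOutputs` and `derived_table` are hypothetical names used only for illustration, and the class's real internals may differ.

```python
class CachedOutputs:
    """Toy stand-in showing an update=False caching access method."""

    def __init__(self):
        self._cache = {}
        self.compute_count = 0  # counts how often the expensive step runs

    def _expensive_compute(self):
        self.compute_count += 1
        return [1, 2, 3]  # placeholder for a derived table

    def derived_table(self, update=False):
        # Recompute only on first access or when the caller asks for it.
        if update or "derived_table" not in self._cache:
            self._cache["derived_table"] = self._expensive_compute()
        return self._cache["derived_table"]


outputs = CachedOutputs()
first = outputs.derived_table()              # computes
second = outputs.derived_table()             # served from the cache
third = outputs.derived_table(update=True)   # forces recomputation
```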

pudl_out

The PUDL output object which should be used to obtain PUDL data.

Type

pudl.output.pudltabl.PudlTabl

pudl_settings

A dictionary of settings indicating where data related to PUDL can be found. Needed to obtain US Census DP1 data which has the county geometries.

Type

dict or None

ba_ids

EIA IDs that should be treated as referring to balancing authorities in the respondent categorization process. If None, all known values of balancing_authority_id_eia will be used.

Type

ordered collection or None

util_ids

EIA IDs that should be treated as referring to utilities in the respondent categorization process. If None, all known values of utility_id_eia will be used.

Type

ordered collection or None

priority

Which type of entity should take priority in the categorization of FERC 714 respondents. Must be either utility or balancing_authority. The default is balancing_authority.

Type

str

limit_by_state

Whether to limit respondent service territories to the states where they have documented activity in the EIA 861. Currently this is only implemented for Balancing Authorities.

Type

bool

balancing_authority_eia861(self) → pandas.DataFrame[source]

Modified balancing_authority_eia861 table.

balancing_authority_assn_eia861(self) → pandas.DataFrame[source]

Modified balancing_authority_assn_eia861 table.

service_territory_eia861(self) → pandas.DataFrame[source]

Modified service_territory_eia861 table.

annualize(self, update=False)[source]

Broadcast respondent data across all years with reported demand.

The FERC 714 Respondent IDs and names are reported in their own table, without any reference to individual years, but much of the information we are associating with them varies annually. This method creates an annualized version of the respondent table, with each respondent having an entry corresponding to every year in which hourly demand was reported in the FERC 714 dataset as a whole. This necessarily means that many of the respondents will end up having entries for years in which they reported no demand, and that’s fine. They can be filtered later.

categorize(self, update=False)[source]

Annualized respondents with respondent_type assigned if possible.

Categorize each respondent as either a utility or a balancing_authority using the parameters stored in the instance of the class. While categorization can also be done without annualizing, this function annualizes as well, since we are adding the respondent_type in order to be able to compile service territories for the respondent, which vary annually.

summarize_demand(self, update=False)[source]

Compile annualized, categorized respondents and summarize values.

Calculated summary values include:

  • Total reported electricity demand per respondent (demand_annual_mwh)

  • Reported per-capita electricity demand (demand_annual_per_capita_mwh)

  • Population density (population_density_km2)

  • Demand density (demand_density_mwh_km2)

These metrics are helpful for identifying suspicious changes in the compiled annual geometries for the planning areas.
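The summary metrics can be derived from annual demand, population, and area roughly as shown in this sketch. The output column names come from the description of this method; the `population` and `area_km2` input columns and all values are assumptions made for the example.

```python
import pandas as pd

# Invented annual values for two respondents.
annual = pd.DataFrame(
    {
        "respondent_id_ferc714": [101, 102],
        "demand_annual_mwh": [1_000_000.0, 250_000.0],
        "population": [500_000, 100_000],  # assumed input column
        "area_km2": [2_000.0, 500.0],      # assumed input column
    }
)

summary = annual.assign(
    demand_annual_per_capita_mwh=lambda df: df["demand_annual_mwh"] / df["population"],
    population_density_km2=lambda df: df["population"] / df["area_km2"],
    demand_density_mwh_km2=lambda df: df["demand_annual_mwh"] / df["area_km2"],
)
```

A sudden jump in either density for one respondent between consecutive years would suggest that the area, and therefore the compiled geometry, has changed.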

fipsify(self, update=False)[source]

Annual respondents with the county FIPS IDs for their service territories.

Given the respondent_type associated with each respondent (either utility or balancing_authority) compile a list of counties that are part of their service territory on an annual basis, and merge those into the annualized respondent table. This results in a very long dataframe, since there are thousands of counties and many of them are served by more than one entity.

Currently respondents categorized as utility will include any county that appears in the service_territory_eia861 table in association with that utility ID in each year, while for balancing_authority respondents, some counties can be excluded based on state (if self.limit_by_state==True).
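For utility respondents, the county lookup amounts to a merge on ID and date, sketched below. The column layout of both frames (including `county_id_fips`) and all values are assumptions for illustration, and the state-based filtering for balancing authorities is not shown.

```python
import pandas as pd

# Invented annualized, categorized respondents.
respondents = pd.DataFrame(
    {
        "respondent_id_ferc714": [101, 101],
        "eia_code": [555, 555],
        "respondent_type": ["utility", "utility"],
        "report_date": pd.to_datetime(["2019-01-01", "2020-01-01"]),
    }
)

# Invented service territory rows: one row per utility, year, and county.
territory = pd.DataFrame(
    {
        "utility_id_eia": [555, 555, 555],
        "report_date": pd.to_datetime(["2019-01-01", "2019-01-01", "2020-01-01"]),
        "county_id_fips": ["08013", "08031", "08013"],  # assumed column name
    }
)

# Align the key names, then attach every matching county to each respondent-year.
with_counties = respondents.merge(
    territory.rename(columns={"utility_id_eia": "eia_code"}),
    on=["eia_code", "report_date"],
    how="left",
)
```

Each respondent-year gains one row per associated county, which is why the real output grows so long.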

georef_counties(self, update=False)[source]

Annual respondents with all associated county-level geometries.

Given the county FIPS codes associated with each respondent in each year, pull in associated geometries from the US Census DP1 dataset, so we can do spatial analyses. This keeps each county record independent – so there will be many records for each respondent in each year. This is fast, and still good for mapping, and retains all of the FIPS IDs so you can also still do ID based analyses.

georef_respondents(self, update=False)[source]

Annual respondents with a single all-encompassing geometry for each year.

Given the county FIPS codes associated with each respondent in each year, compile a geometry for the respondent’s entire service territory annually. This results in just a single record per respondent per year, but it is computationally expensive, and you lose the information about which counties are associated with the respondent in that year. It is still useful for merging in other annual data like total demand, so you can see which respondent-years have both reported demand and decent geometries, calculate their areas to see if something changed from year to year, etc.