`pudl.output.ferc1`

A collection of denormalized FERC assets and helper functions.

Module Contents

Classes

`NodeId`	The primary keys which identify a node in a calculation tree.
`OffByFactoid`	A calculated factoid which is off by one other factoid.
`Exploder`	Get unique, granular datapoints from a set of related, nested FERC1 tables.
`XbrlCalculationForestFerc1`	A class for manipulating groups of hierarchically nested XBRL calculations.

Functions

`get_core_ferc1_asset_description`(→ str)	Get the asset description portion of a core FERC Form 1 asset.
`_out_ferc1__yearly_plants_utilities`(→ pandas.DataFrame)	A denormalized table containing FERC plant and utility names and IDs.
`out_ferc1__yearly_steam_plants_sched402`(→ pandas.DataFrame)	Select and join some useful fields from the FERC Form 1 steam table.
`out_ferc1__yearly_small_plants_sched410`(→ pandas.DataFrame)	Pull a useful dataframe related to the FERC Form 1 small plants.
`out_ferc1__yearly_hydroelectric_plants_sched406`(...)	Pull a useful dataframe related to the FERC Form 1 hydro plants.
`out_ferc1__yearly_pumped_storage_plants_sched408`(...)	Pull a dataframe of FERC Form 1 Pumped Storage plant data.
`out_ferc1__yearly_steam_plants_fuel_sched402`(...)	Pull a useful dataframe related to FERC Form 1 fuel information.
`out_ferc1__yearly_purchased_power_and_exchanges_sched326`(...)	Pull a useful dataframe of FERC Form 1 Purchased Power data.
`out_ferc1__yearly_plant_in_service_sched204`(...)	Pull a dataframe of FERC Form 1 Electric Plant in Service data.
`out_ferc1__yearly_balance_sheet_assets_sched110`(...)	Pull a useful dataframe of FERC Form 1 balance sheet assets data.
`out_ferc1__yearly_balance_sheet_liabilities_sched110`(...)	Pull a useful dataframe of FERC Form 1 balance sheet liabilities data.
`out_ferc1__yearly_cash_flows_sched120`(→ pandas.DataFrame)	Pull a useful dataframe of FERC Form 1 cash flow data.
`out_ferc1__yearly_depreciation_summary_sched336`(...)	Pull a useful dataframe of FERC Form 1 depreciation amortization data.
`out_ferc1__yearly_energy_dispositions_sched401`(...)	Pull a useful dataframe of FERC Form 1 energy dispositions data.
`out_ferc1__yearly_energy_sources_sched401`(...)	Pull a useful dataframe of FERC Form 1 energy sources data.
`out_ferc1__yearly_operating_expenses_sched320`(...)	Pull a useful dataframe of FERC Form 1 operating expenses data.
`out_ferc1__yearly_operating_revenues_sched300`(...)	Pull a useful dataframe of FERC Form 1 operating revenues data.
`out_ferc1__yearly_depreciation_changes_sched219`(...)	Pull a useful dataframe of FERC Form 1 depreciation changes data.
`out_ferc1__yearly_depreciation_by_function_sched219`(...)	Pull a useful dataframe of FERC Form 1 depreciation by function data.
`out_ferc1__yearly_sales_by_rate_schedules_sched304`(...)	Pull a useful dataframe of FERC Form 1 sales by rate schedules data.
`out_ferc1__yearly_income_statements_sched114`(...)	Pull a useful dataframe of FERC Form 1 income statements data.
`out_ferc1__yearly_other_regulatory_liabilities_sched278`(...)	Pull a useful dataframe of FERC Form 1 other regulatory liabilities data.
`out_ferc1__yearly_retained_earnings_sched118`(...)	Pull a useful dataframe of FERC Form 1 retained earnings data.
`out_ferc1__yearly_transmission_lines_sched422`(...)	Pull a useful dataframe of FERC Form 1 transmission lines data.
`out_ferc1__yearly_utility_plant_summary_sched200`(...)	Pull a useful dataframe of FERC Form 1 utility plant summary data.
`out_ferc1__yearly_all_plants`(→ pandas.DataFrame)	Combine the steam, small generators, hydro, and pumped storage tables.
`out_ferc1__yearly_steam_plants_fuel_by_plant_sched402`(...)	Summarize FERC fuel data by plant for output.
`calc_annual_capital_additions_ferc1`(→ pandas.DataFrame)	Calculate annual capital additions for FERC1 steam records.
`add_mean_cap_additions`(steam_df)	Add mean capital additions over lifetime of plant.
`_out_ferc1__detailed_tags`(→ pandas.DataFrame)	Grab the stored tables of tags and add inferred dimension.
`_get_tags`(→ pandas.DataFrame)	Grab tags from a stored CSV file and apply `make_xbrl_factoid_dimensions_explicit()`.
`_aggregatable_dimension_tags`(→ pandas.DataFrame)
`exploded_table_asset_factory`(→ dagster.AssetsDefinition)	Create an exploded table based on a set of related input tables.
`create_exploded_table_assets`(...)	Create a list of exploded FERC Form 1 assets.
`in_explosion_tables`(→ bool)	Determine if any of a list of table_names is in the list of the explosion tables.
`nodes_to_df`(→ pandas.DataFrame)	Construct a dataframe from a list of nodes, including their annotations.
`_propagate_tags_leafward`(→ networkx.DiGraph)	Push a parent's tags down to its descendants.
`_propagate_tag_rootward`(→ networkx.DiGraph)	Set the tag for nodes when all of its children have same tag.
`_propagate_tags_to_corrections`(→ networkx.DiGraph)
`check_tag_propagation_compared_to_compiled_tags`(df, ...)	Check if tags got propagated.
`check_for_correction_xbrl_factoids_with_tag`(df, ...)	Check if any correction records have tags.
`out_ferc1__yearly_rate_base`(→ pandas.DataFrame)	Make a table of granular utility rate-base data.

Attributes

`logger`
`EXPLOSION_CALCULATION_TOLERANCES`
`MANUAL_DBF_METADATA_FIXES`	Manually compiled metadata from DBF-only or PUDL-generated xbrl_factoids.
`EXPLOSION_ARGS`
`exploded_ferc1_assets`

pudl.output.ferc1.logger

pudl.output.ferc1.EXPLOSION_CALCULATION_TOLERANCES: dict[str, pudl.transform.ferc1.GroupMetricChecks]

pudl.output.ferc1.MANUAL_DBF_METADATA_FIXES: dict[str, dict[str, str]]

Manually compiled metadata from DBF-only or PUDL-generated xbrl_factoids.

Note: the factoids beginning with “less” could be removed once we stop assuming that the calculation components in a given explosion form a tree and instead treat them as a DAG. These xbrl_factoids were added in transform.ferc1 and could be removed after that transition.

pudl.output.ferc1.get_core_ferc1_asset_description(asset_name: str) → str

Get the asset description portion of a core FERC Form 1 asset.

This is useful when programmatically constructing output assets from core assets using asset factories.

Parameters:: asset_name – The name of the core asset.
Returns:: asset_description – The asset description portion of the asset name.
Return type:: str
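A minimal sketch of what this helper might do, assuming (from the paired asset names on this page, such as core_ferc1__yearly_cash_flows_sched120 and out_ferc1__yearly_cash_flows_sched120) that the description is everything after the core_ferc1__ prefix. `asset_description_sketch` is a hypothetical stand-in, not the PUDL implementation.

```python
import re


def asset_description_sketch(asset_name: str) -> str:
    """Hypothetical helper: return the part of a core asset name after the prefix."""
    match = re.fullmatch(r"core_ferc1__(\w+)", asset_name)
    if match is None:
        raise ValueError(f"Not a core FERC Form 1 asset name: {asset_name}")
    return match.group(1)


# An asset factory could then build the matching output asset name.
core_name = "core_ferc1__yearly_cash_flows_sched120"
out_name = f"out_ferc1__{asset_description_sketch(core_name)}"
print(out_name)  # out_ferc1__yearly_cash_flows_sched120
```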

pudl.output.ferc1._out_ferc1__yearly_plants_utilities(core_pudl__assn_ferc1_pudl_plants: pandas.DataFrame, core_pudl__assn_ferc1_pudl_utilities: pandas.DataFrame) → pandas.DataFrame

A denormalized table containing FERC plant and utility names and IDs.

pudl.output.ferc1.out_ferc1__yearly_steam_plants_sched402(_out_ferc1__yearly_plants_utilities: pandas.DataFrame, _out_ferc1__yearly_steam_plants_sched402_with_plant_ids: pandas.DataFrame) → pandas.DataFrame

Select and join some useful fields from the FERC Form 1 steam table.

Select the FERC Form 1 steam plant table entries and add the reporting utility’s name and the PUDL IDs for the plant and utility, for readability and for integration with other tables that have PUDL IDs. Also calculate capacity_factor from net_generation_mwh and capacity_mw.

Parameters:

_out_ferc1__yearly_plants_utilities – Denormalized dataframe of FERC Form 1 plants and utilities data.
_out_ferc1__yearly_steam_plants_sched402_with_plant_ids – The FERC Form 1 steam table with imputed plant IDs to group plants across report years.

Returns:

A DataFrame containing useful fields from the FERC Form 1 steam table.
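The steps described above can be sketched with toy data. This assumes a left merge on utility_id_ferc1 and a capacity factor defined as net generation divided by capacity times 8,760 hours (leap years ignored); the frames and column names are illustrative assumptions, not the real PUDL schema.

```python
import pandas as pd

HOURS_PER_YEAR = 8760  # assumption: ignores leap years

steam = pd.DataFrame(
    {
        "utility_id_ferc1": [1, 1, 2],
        "plant_name_ferc1": ["alpha", "beta", "gamma"],
        "capacity_mw": [100.0, 50.0, 0.0],
        "net_generation_mwh": [438_000.0, 219_000.0, 0.0],
    }
)
utilities = pd.DataFrame(
    {
        "utility_id_ferc1": [1, 2],
        "utility_id_pudl": [101, 202],
        "utility_name_ferc1": ["Utility A", "Utility B"],
    }
)

# Denormalize: attach utility names and PUDL IDs to every plant record.
out = steam.merge(utilities, on="utility_id_ferc1", how="left", validate="many_to_one")

# Capacity factor = actual generation / maximum possible generation.
# Zero-capacity records would divide by zero, so they are masked as NaN.
max_gen_mwh = out["capacity_mw"].where(out["capacity_mw"] > 0) * HOURS_PER_YEAR
out["capacity_factor"] = out["net_generation_mwh"] / max_gen_mwh
print(out[["plant_name_ferc1", "utility_name_ferc1", "capacity_factor"]])
```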

pudl.output.ferc1.out_ferc1__yearly_small_plants_sched410(core_ferc1__yearly_small_plants_sched410: pandas.DataFrame, _out_ferc1__yearly_plants_utilities: pandas.DataFrame) → pandas.DataFrame

Pull a useful dataframe related to the FERC Form 1 small plants.

pudl.output.ferc1.out_ferc1__yearly_hydroelectric_plants_sched406(core_ferc1__yearly_hydroelectric_plants_sched406: pandas.DataFrame, _out_ferc1__yearly_plants_utilities: pandas.DataFrame) → pandas.DataFrame

Pull a useful dataframe related to the FERC Form 1 hydro plants.

pudl.output.ferc1.out_ferc1__yearly_pumped_storage_plants_sched408(core_ferc1__yearly_pumped_storage_plants_sched408: pandas.DataFrame, _out_ferc1__yearly_plants_utilities: pandas.DataFrame) → pandas.DataFrame

Pull a dataframe of FERC Form 1 Pumped Storage plant data.

pudl.output.ferc1.out_ferc1__yearly_steam_plants_fuel_sched402(core_ferc1__yearly_steam_plants_fuel_sched402: pandas.DataFrame, _out_ferc1__yearly_plants_utilities: pandas.DataFrame) → pandas.DataFrame

Pull a useful dataframe related to FERC Form 1 fuel information.

This function pulls the FERC Form 1 fuel data, and joins in the name of the reporting utility, as well as the PUDL IDs for that utility and the plant, allowing integration with other PUDL tables. Useful derived values include:

fuel_consumed_mmbtu (total fuel heat content consumed)
fuel_consumed_total_cost (total cost of that fuel)
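The derived values above are products of per-unit quantities. A minimal sketch, assuming the fuel table carries columns named fuel_consumed_units, fuel_mmbtu_per_unit, and fuel_cost_per_unit_burned (assumed names for illustration, with toy values):

```python
import pandas as pd

fuel = pd.DataFrame(
    {
        "fuel_type_code_pudl": ["coal", "gas"],
        "fuel_consumed_units": [1_000.0, 50_000.0],  # e.g. tons of coal, mcf of gas
        "fuel_mmbtu_per_unit": [20.0, 1.03],
        "fuel_cost_per_unit_burned": [40.0, 3.0],
    }
)

# Total heat content consumed = quantity burned * heat content per unit.
fuel["fuel_consumed_mmbtu"] = fuel["fuel_consumed_units"] * fuel["fuel_mmbtu_per_unit"]
# Total cost of that fuel = quantity burned * cost per unit burned.
fuel["fuel_consumed_total_cost"] = (
    fuel["fuel_consumed_units"] * fuel["fuel_cost_per_unit_burned"]
)
```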

pudl.output.ferc1.out_ferc1__yearly_purchased_power_and_exchanges_sched326(core_ferc1__yearly_purchased_power_and_exchanges_sched326: pandas.DataFrame, core_pudl__assn_ferc1_pudl_utilities: pandas.DataFrame) → pandas.DataFrame

Pull a useful dataframe of FERC Form 1 Purchased Power data.

pudl.output.ferc1.out_ferc1__yearly_plant_in_service_sched204(core_ferc1__yearly_plant_in_service_sched204: pandas.DataFrame, core_pudl__assn_ferc1_pudl_utilities: pandas.DataFrame) → pandas.DataFrame

Pull a dataframe of FERC Form 1 Electric Plant in Service data.

pudl.output.ferc1.out_ferc1__yearly_balance_sheet_assets_sched110(core_ferc1__yearly_balance_sheet_assets_sched110: pandas.DataFrame, core_pudl__assn_ferc1_pudl_utilities: pandas.DataFrame) → pandas.DataFrame

Pull a useful dataframe of FERC Form 1 balance sheet assets data.

pudl.output.ferc1.out_ferc1__yearly_balance_sheet_liabilities_sched110(core_ferc1__yearly_balance_sheet_liabilities_sched110: pandas.DataFrame, core_pudl__assn_ferc1_pudl_utilities: pandas.DataFrame) → pandas.DataFrame

Pull a useful dataframe of FERC Form 1 balance sheet liabilities data.

pudl.output.ferc1.out_ferc1__yearly_cash_flows_sched120(core_ferc1__yearly_cash_flows_sched120: pandas.DataFrame, core_pudl__assn_ferc1_pudl_utilities: pandas.DataFrame) → pandas.DataFrame

Pull a useful dataframe of FERC Form 1 cash flow data.

pudl.output.ferc1.out_ferc1__yearly_depreciation_summary_sched336(core_ferc1__yearly_depreciation_summary_sched336: pandas.DataFrame, core_pudl__assn_ferc1_pudl_utilities: pandas.DataFrame) → pandas.DataFrame

Pull a useful dataframe of FERC Form 1 depreciation amortization data.

pudl.output.ferc1.out_ferc1__yearly_energy_dispositions_sched401(core_ferc1__yearly_energy_dispositions_sched401: pandas.DataFrame, core_pudl__assn_ferc1_pudl_utilities: pandas.DataFrame) → pandas.DataFrame

Pull a useful dataframe of FERC Form 1 energy dispositions data.

pudl.output.ferc1.out_ferc1__yearly_energy_sources_sched401(core_ferc1__yearly_energy_sources_sched401: pandas.DataFrame, core_pudl__assn_ferc1_pudl_utilities: pandas.DataFrame) → pandas.DataFrame

Pull a useful dataframe of FERC Form 1 energy sources data.

pudl.output.ferc1.out_ferc1__yearly_operating_expenses_sched320(core_ferc1__yearly_operating_expenses_sched320: pandas.DataFrame, core_pudl__assn_ferc1_pudl_utilities: pandas.DataFrame) → pandas.DataFrame

Pull a useful dataframe of FERC Form 1 operating expenses data.

pudl.output.ferc1.out_ferc1__yearly_operating_revenues_sched300(core_ferc1__yearly_operating_revenues_sched300: pandas.DataFrame, core_pudl__assn_ferc1_pudl_utilities: pandas.DataFrame) → pandas.DataFrame

Pull a useful dataframe of FERC Form 1 operating revenues data.

pudl.output.ferc1.out_ferc1__yearly_depreciation_changes_sched219(core_ferc1__yearly_depreciation_changes_sched219: pandas.DataFrame, core_pudl__assn_ferc1_pudl_utilities: pandas.DataFrame) → pandas.DataFrame

Pull a useful dataframe of FERC Form 1 depreciation changes data.

pudl.output.ferc1.out_ferc1__yearly_depreciation_by_function_sched219(core_ferc1__yearly_depreciation_by_function_sched219: pandas.DataFrame, core_pudl__assn_ferc1_pudl_utilities: pandas.DataFrame) → pandas.DataFrame

Pull a useful dataframe of FERC Form 1 depreciation by function data.

pudl.output.ferc1.out_ferc1__yearly_sales_by_rate_schedules_sched304(core_ferc1__yearly_sales_by_rate_schedules_sched304: pandas.DataFrame, core_pudl__assn_ferc1_pudl_utilities: pandas.DataFrame) → pandas.DataFrame

Pull a useful dataframe of FERC Form 1 sales by rate schedules data.

pudl.output.ferc1.out_ferc1__yearly_income_statements_sched114(core_ferc1__yearly_income_statements_sched114: pandas.DataFrame, core_pudl__assn_ferc1_pudl_utilities: pandas.DataFrame) → pandas.DataFrame

Pull a useful dataframe of FERC Form 1 income statements data.

pudl.output.ferc1.out_ferc1__yearly_other_regulatory_liabilities_sched278(core_ferc1__yearly_other_regulatory_liabilities_sched278: pandas.DataFrame, core_pudl__assn_ferc1_pudl_utilities: pandas.DataFrame) → pandas.DataFrame

Pull a useful dataframe of FERC Form 1 other regulatory liabilities data.

pudl.output.ferc1.out_ferc1__yearly_retained_earnings_sched118(core_ferc1__yearly_retained_earnings_sched118: pandas.DataFrame, core_pudl__assn_ferc1_pudl_utilities: pandas.DataFrame) → pandas.DataFrame

Pull a useful dataframe of FERC Form 1 retained earnings data.

pudl.output.ferc1.out_ferc1__yearly_transmission_lines_sched422(core_ferc1__yearly_transmission_lines_sched422: pandas.DataFrame, core_pudl__assn_ferc1_pudl_utilities: pandas.DataFrame) → pandas.DataFrame

Pull a useful dataframe of FERC Form 1 transmission lines data.

pudl.output.ferc1.out_ferc1__yearly_utility_plant_summary_sched200(core_ferc1__yearly_utility_plant_summary_sched200: pandas.DataFrame, core_pudl__assn_ferc1_pudl_utilities: pandas.DataFrame) → pandas.DataFrame

Pull a useful dataframe of FERC Form 1 utility plant summary data.

pudl.output.ferc1.out_ferc1__yearly_all_plants(out_ferc1__yearly_steam_plants_sched402: pandas.DataFrame, out_ferc1__yearly_small_plants_sched410: pandas.DataFrame, out_ferc1__yearly_hydroelectric_plants_sched406: pandas.DataFrame, out_ferc1__yearly_pumped_storage_plants_sched408: pandas.DataFrame) → pandas.DataFrame

Combine the steam, small generators, hydro, and pumped storage tables.

While this table may have many purposes, the main one is to prepare it for integration with the EIA Master Unit List (MUL). All subtables included in this output table must have PUDL IDs. Table prepping involves ensuring that the individual tables can merge correctly (e.g., equivalent columns have the same names), both with each other and with the EIA MUL.
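One way to picture the prep work: rename mismatched columns, check that every subtable carries PUDL IDs, label each record with its source, and concatenate. The plant_type label and the capacity_of_plant_mw column are hypothetical, chosen only to illustrate the renaming step.

```python
import pandas as pd

steam = pd.DataFrame({"plant_id_pudl": [1], "capacity_mw": [500.0]})
hydro = pd.DataFrame({"plant_id_pudl": [2], "capacity_mw": [80.0]})
# Hypothetical: suppose one subtable names the same quantity differently.
small = pd.DataFrame({"plant_id_pudl": [3], "capacity_of_plant_mw": [5.0]})
small = small.rename(columns={"capacity_of_plant_mw": "capacity_mw"})

tables = {"steam": steam, "hydro": hydro, "small": small}
# Every subtable must carry PUDL IDs so it can later be matched to EIA data.
assert all("plant_id_pudl" in df.columns for df in tables.values())

# Label each record with its source table, then stack them into one table.
all_plants = pd.concat(
    [df.assign(plant_type=name) for name, df in tables.items()],
    ignore_index=True,
)
```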

pudl.output.ferc1.out_ferc1__yearly_steam_plants_fuel_by_plant_sched402(context, core_ferc1__yearly_steam_plants_fuel_sched402: pandas.DataFrame, _out_ferc1__yearly_plants_utilities: pandas.DataFrame) → pandas.DataFrame

Summarize FERC fuel data by plant for output.

This is mostly a wrapper around pudl.analysis.record_linkage.classify_plants_ferc1.fuel_by_plant_ferc1() which calculates some summary values on a per-plant basis (as indicated by utility_id_ferc1 and plant_name_ferc1) related to fuel consumption.

Parameters:

context – Dagster context object
core_ferc1__yearly_steam_plants_fuel_sched402 – Normalized FERC fuel table.
_out_ferc1__yearly_plants_utilities – Denormalized table of FERC1 plant & utility IDs.

Returns:

A DataFrame with fuel use summarized by plant.
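fuel_by_plant_ferc1() is not reproduced here. As a rough illustration of per-plant fuel summaries keyed on utility_id_ferc1 and plant_name_ferc1, the sketch pivots heat input by fuel type for each plant-year, then computes fuel fractions and a primary fuel. The output column names are hypothetical.

```python
import pandas as pd

fuel = pd.DataFrame(
    {
        "report_year": [2020, 2020, 2020],
        "utility_id_ferc1": [1, 1, 1],
        "plant_name_ferc1": ["alpha", "alpha", "beta"],
        "fuel_type_code_pudl": ["coal", "gas", "gas"],
        "fuel_consumed_mmbtu": [750.0, 250.0, 400.0],
    }
)
plant_keys = ["report_year", "utility_id_ferc1", "plant_name_ferc1"]

# One row per plant-year, one column per fuel type (missing fuels become 0).
by_plant = fuel.pivot_table(
    index=plant_keys,
    columns="fuel_type_code_pudl",
    values="fuel_consumed_mmbtu",
    aggfunc="sum",
    fill_value=0.0,
)
# Share of each plant's total heat input contributed by each fuel.
fractions = by_plant.div(by_plant.sum(axis=1), axis=0).add_suffix("_fraction_mmbtu")
# The fuel with the largest heat input is treated as the primary fuel.
fractions["primary_fuel_by_mmbtu"] = by_plant.idxmax(axis=1)
fractions = fractions.reset_index()
```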

pudl.output.ferc1.calc_annual_capital_additions_ferc1(steam_df: pandas.DataFrame, window: int = 3) → pandas.DataFrame

Calculate annual capital additions for FERC1 steam records.

Convert the capex_total column into annual capital additions. The capex_total column is the cumulative capital poured into the plant over time, so taking the annual difference should yield the annual capital additions. This function also generates a rolling average to smooth out large annual fluctuations.

Parameters:

steam_df – result of prep_plants_ferc()
window – number of years for window to generate rolling average. Argument for pudl.helpers.generate_rolling_avg()

Returns:

Augmented version of steam_df with two additional columns: capex_annual_addition and capex_annual_addition_rolling.

Return type:

pandas.DataFrame
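A minimal sketch of the difference-then-smooth idea using plain pandas instead of pudl.helpers.generate_rolling_avg(), assuming plant_id_ferc1 identifies a plant across report years. Note that diff() treats consecutive rows as consecutive years; gaps in reporting would need extra handling.

```python
import pandas as pd

steam = pd.DataFrame(
    {
        "plant_id_ferc1": [1, 1, 1, 1],
        "report_year": [2018, 2019, 2020, 2021],
        "capex_total": [100.0, 130.0, 130.0, 190.0],  # cumulative capital, $
    }
)

window = 3
steam = steam.sort_values(["plant_id_ferc1", "report_year"])
# Annual addition = this year's cumulative capex minus last year's, per plant.
steam["capex_annual_addition"] = steam.groupby("plant_id_ferc1")["capex_total"].diff()
# Smooth year-to-year swings with a trailing rolling mean within each plant.
steam["capex_annual_addition_rolling"] = steam.groupby("plant_id_ferc1")[
    "capex_annual_addition"
].transform(lambda s: s.rolling(window, min_periods=1).mean())
```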

pudl.output.ferc1.add_mean_cap_additions(steam_df)

Add mean capital additions over lifetime of plant.

class pudl.output.ferc1.NodeId

Bases: NamedTuple

The primary keys which identify a node in a calculation tree.

Since NodeId is just a NamedTuple, a list of NodeId instances can also be used to index into a pandas.DataFrame that uses these fields as its index. This is convenient because many networkx functions and methods return iterable containers of graph nodes, which can be turned into lists and used directly to index into dataframes.

The additional dimensions (utility_type, plant_status, and plant_function) each have a small number of allowable values, which we could further impose as constraints on the values here using Pydantic if we wanted.

table_name: str

xbrl_factoid: str

utility_type: str | pandas._libs.missing.NAType

plant_status: str | pandas._libs.missing.NAType

plant_function: str | pandas._libs.missing.NAType
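The NamedTuple indexing described above can be demonstrated with a local stand-in class that has the same field names (typed as plain str here, without NAType) and a toy dataframe; the table and factoid names are illustrative.

```python
from typing import NamedTuple

import pandas as pd


class NodeId(NamedTuple):
    """Local stand-in with the same fields as pudl.output.ferc1.NodeId."""

    table_name: str
    xbrl_factoid: str
    utility_type: str
    plant_status: str
    plant_function: str


df = pd.DataFrame(
    {
        "table_name": ["plant_in_service", "plant_in_service", "balance_sheet_assets"],
        "xbrl_factoid": ["steam_production", "hydro_production", "utility_plant"],
        "utility_type": ["electric", "electric", "total"],
        "plant_status": ["in_service", "in_service", "total"],
        "plant_function": ["steam", "hydro", "total"],
        "value": [10.0, 5.0, 15.0],
    }
).set_index(list(NodeId._fields))

# A list of NodeId tuples (e.g. nodes returned by networkx) selects rows directly.
nodes = [
    NodeId("plant_in_service", "hydro_production", "electric", "in_service", "hydro"),
    NodeId("balance_sheet_assets", "utility_plant", "total", "total", "total"),
]
subset = df.loc[nodes]
```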

class pudl.output.ferc1.OffByFactoid

Bases: NamedTuple

A calculated factoid which is off by one other factoid.

A factoid where a sizeable majority of utilities are using a non-standard and non-reported calculation to generate it. These calculated factoids are either missing one factoid, or include an additional factoid not included in the FERC metadata. Thus, the calculations are ‘off by’ this factoid.

table_name: str

xbrl_factoid: str

utility_type: str | pandas._libs.missing.NAType

plant_status: str | pandas._libs.missing.NAType

plant_function: str | pandas._libs.missing.NAType

table_name_off_by: str

xbrl_factoid_off_by: str

utility_type_off_by: str | pandas._libs.missing.NAType

plant_status_off_by: str | pandas._libs.missing.NAType

plant_function_off_by: str | pandas._libs.missing.NAType

pudl.output.ferc1._out_ferc1__detailed_tags(_core_ferc1__table_dimensions) → pandas.DataFrame

Grab the stored tables of tags and add inferred dimension.

pudl.output.ferc1._get_tags(file_name: str, _core_ferc1__table_dimensions: pandas.DataFrame) → pandas.DataFrame

Grab tags from a stored CSV file and apply make_xbrl_factoid_dimensions_explicit().

pudl.output.ferc1._aggregatable_dimension_tags(_core_ferc1__table_dimensions: pandas.DataFrame, dimension: Literal[plant_status, plant_function]) → pandas.DataFrame

pudl.output.ferc1.exploded_table_asset_factory(root_table: str, table_names: list[str], seed_nodes: list[NodeId], group_metric_checks: pudl.transform.ferc1.GroupMetricChecks, off_by_facts: list[OffByFactoid], io_manager_key: str | None = None) → dagster.AssetsDefinition

Create an exploded table based on a set of related input tables.

pudl.output.ferc1.EXPLOSION_ARGS

pudl.output.ferc1.create_exploded_table_assets() → list[dagster.AssetsDefinition]

Create a list of exploded FERC Form 1 assets.

Returns:: A list of AssetsDefinitions where each asset is an exploded FERC Form 1 table.

pudl.output.ferc1.exploded_ferc1_assets

class pudl.output.ferc1.Exploder(table_names: list[str], root_table: str, metadata_xbrl_ferc1: pandas.DataFrame, calculation_components_xbrl_ferc1: pandas.DataFrame, seed_nodes: list[NodeId], tags: pandas.DataFrame = pd.DataFrame(), group_metric_checks: pudl.transform.ferc1.GroupMetricChecks = GroupMetricChecks(), off_by_facts: list[OffByFactoid] = None)

Get unique, granular datapoints from a set of related, nested FERC1 tables.

property calc_idx: list[str]

Primary key columns for calculations in this explosion.

exploded_calcs()

Remove any calculation components that aren’t relevant to the explosion.

At the end of this process several things should be true:

Only parents with table_name in the explosion tables should be retained.
All calculations in which any components were outside of the tables in the explosion should be turned into leaves – i.e. they should be replaced with a single calculation component filled with NA values.
Every remaining calculation component must also appear as a parent (if it is a leaf, then it will have a single null calculation component).
There should be no records where only one of table_name or xbrl_factoid are null. They should either both or neither be null.
table_name_parent and xbrl_factoid_parent should be non-null.
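The invariants listed above can be checked mechanically. This hypothetical helper assumes exploded_calcs has table_name_parent, xbrl_factoid_parent, table_name, and xbrl_factoid columns, and that each leaf is represented by a single row with null component columns.

```python
import pandas as pd


def check_exploded_calcs(calcs: pd.DataFrame, explosion_tables: list[str]) -> list[str]:
    """Hypothetical helper: describe every violated invariant (empty list if none)."""
    problems = []
    parent_cols = ["table_name_parent", "xbrl_factoid_parent"]
    if calcs[parent_cols].isna().to_numpy().any():
        problems.append("null parent table_name or xbrl_factoid")
    if not calcs["table_name_parent"].isin(explosion_tables).all():
        problems.append("parent table outside the explosion")
    table_null = calcs["table_name"].isna()
    if (table_null != calcs["xbrl_factoid"].isna()).any():
        problems.append("only one of table_name / xbrl_factoid is null")
    # Every non-null component must also show up as a parent somewhere.
    parents = set(zip(calcs["table_name_parent"], calcs["xbrl_factoid_parent"]))
    components = calcs[~table_null]
    children = set(zip(components["table_name"], components["xbrl_factoid"]))
    if children - parents:
        problems.append("component that never appears as a parent")
    return problems


calcs = pd.DataFrame(
    {
        "table_name_parent": ["t1", "t1", "t1", "t1"],
        "xbrl_factoid_parent": ["total", "total", "a", "b"],
        "table_name": ["t1", "t1", None, None],  # leaves a and b: null component
        "xbrl_factoid": ["a", "b", None, None],
        "weight": [1.0, 1.0, None, None],
    }
)
print(check_exploded_calcs(calcs, ["t1"]))  # []
```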

add_sizable_minority_corrections_to_calcs(exploded_calcs: pandas.DataFrame) → pandas.DataFrame

Add correction calculation records for the sizable minority of utilities whose calculations are off by one other factoid.

exploded_meta() → pandas.DataFrame

Combine the metadata of a set of interrelated tables for use in the Exploder.

Any calculations containing components that are part of tables outside the set of exploded tables will be converted to reported values with an empty calculation. Then we verify that all referenced calculation components actually appear as their own records within the concatenated metadata dataframe.

calculation_forest() → XbrlCalculationForestFerc1

Construct a calculation forest based on class attributes.

dimensions() → list[str]

Get all of the column names for the other dimensions.

exploded_pks() → list[str]

Get the joint primary keys of the exploded tables.

value_col() → str

Get the value column for the exploded tables.

prep_table_to_explode(table_name: str, table_df: pandas.DataFrame) → pandas.DataFrame

Assign table name and rename factoid column in preparation for explosion.

boom(tables_to_explode: dict[str, pandas.DataFrame]) → pandas.DataFrame

Explode a set of nested tables.

There are five main stages of this process:

Prep all of the individual tables for explosion.
Concatenate all of the tables together.
Remove duplication in the concatenated exploded table.
Annotate the fine-grained data with additional metadata.
Validate that calculated top-level values are correct. (not implemented)

Parameters:: tables_to_explode – dictionary of table name (key) to transformed table (value).
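A rough sketch of the first three stages under simplifying assumptions: each table's factoid column name is known via a hypothetical FACTOID_COLS mapping, and deduplication is reduced to dropping exact duplicate rows, which is much simpler than what the explosion actually needs.

```python
import pandas as pd

# Hypothetical per-table name of the column that holds XBRL factoid names.
FACTOID_COLS = {"table_a": "asset_type", "table_b": "liability_type"}


def prep(table_name: str, df: pd.DataFrame) -> pd.DataFrame:
    """Stage 1: tag rows with their source table and unify the factoid column."""
    return df.assign(table_name=table_name).rename(
        columns={FACTOID_COLS[table_name]: "xbrl_factoid"}
    )


tables = {
    "table_a": pd.DataFrame({"asset_type": ["cash", "cash"], "ending_balance": [5.0, 5.0]}),
    "table_b": pd.DataFrame({"liability_type": ["debt"], "ending_balance": [3.0]}),
}
# Stage 2: concatenate; stage 3 (simplified): drop exact duplicate rows.
exploded = pd.concat(
    [prep(name, df) for name, df in tables.items()], ignore_index=True
).drop_duplicates(ignore_index=True)
```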

initial_explosion_concatenation(tables_to_explode: dict[str, pandas.DataFrame]) → pandas.DataFrame

Concatenate all of the tables for the explosion.

Merge in some basic pieces of each table’s metadata and add table_name. At this point in the explosion, there will be a lot of duplication in the output.

calculate_intertable_non_total_calculations(exploded: pandas.DataFrame) → pandas.DataFrame

Calculate the inter-table non-total calculable xbrl_factoids.

add_sizable_minority_corrections(calculated_df: pandas.DataFrame) → pandas.DataFrame

Identify and fix the utilities that report calculations off by one other fact.

We noticed that a sizable minority of utilities report some calculated values with a different set of child subcomponents. Our tooling for these calculations expects all utilities to report in the same manner, so we have identified the handful of worst offending calculable xbrl_factoids in self.off_by_facts. This method identifies data corrections for those self.off_by_facts and adds them into the exploded data table.

The data corrections are identified by calculating the absolute difference between the reported value and the value calculable from the standard set of subcomponents (via pudl.transform.ferc1.calculate_values_from_components()) and finding the child factoids whose value equals that absolute difference. This indicates that the calculable parent factoid is off by the corresponding child fact.

Relatedly, add_sizable_minority_corrections_to_calcs() adds these self.off_by_facts to self.exploded_calcs.
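The matching step described above can be sketched for a single parent record. find_off_by_factoids is a hypothetical helper, and np.isclose stands in for whatever tolerance the real code applies.

```python
import numpy as np
import pandas as pd


def find_off_by_factoids(
    reported: float, components: pd.Series, candidates: pd.Series
) -> list[str]:
    """Hypothetical helper: candidates whose value equals |reported - calculated|."""
    calculated = components.sum()  # value from the standard subcomponents
    abs_diff = abs(reported - calculated)
    matches = np.isclose(candidates.to_numpy(), abs_diff)
    return candidates.index[matches].tolist()


components = pd.Series({"a": 50.0, "b": 50.0})  # standard children of the parent
candidates = pd.Series({"c": 30.0, "d": 12.0})  # factoids the parent might be off by
print(find_off_by_factoids(130.0, components, candidates))  # ['c']
```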

reconcile_intertable_calculations(exploded: pandas.DataFrame) → pandas.DataFrame

Generate calculated values for inter-table calculated factoids.

This function sums components of calculations for a given factoid when the components originate entirely or partially outside of the table. It also accounts for components that only sum to a factoid within a particular dimension (e.g., for an electric utility or for plants whose plant_function is “in_service”). This returns a dataframe with a “calculated_value” column.

Parameters:: exploded – concatenated tables for table explosion.
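A minimal sketch of the core summation: join each calculation component to its reported value, apply the weight, and sum by parent. It ignores the dimension handling described above and covers a single utility-year; the table and factoid names are illustrative.

```python
import pandas as pd

# Reported values for every factoid (one utility-year, for brevity).
exploded = pd.DataFrame(
    {
        "table_name": ["assets", "plant", "plant"],
        "xbrl_factoid": ["utility_plant", "steam", "hydro"],
        "value": [16.0, 10.0, 5.0],
    }
)
# Calculation components: parent = sum(weight * child).
calcs = pd.DataFrame(
    {
        "table_name_parent": ["assets", "assets"],
        "xbrl_factoid_parent": ["utility_plant", "utility_plant"],
        "table_name": ["plant", "plant"],
        "xbrl_factoid": ["steam", "hydro"],
        "weight": [1.0, 1.0],
    }
)

# Look up each child's reported value, weight it, and sum within each parent.
child_values = calcs.merge(exploded, on=["table_name", "xbrl_factoid"], how="left")
calculated = (
    child_values.assign(weighted=child_values["value"] * child_values["weight"])
    .groupby(["table_name_parent", "xbrl_factoid_parent"], as_index=False)["weighted"]
    .sum()
    .rename(
        columns={
            "table_name_parent": "table_name",
            "xbrl_factoid_parent": "xbrl_factoid",
            "weighted": "calculated_value",
        }
    )
)
# Attach calculated values to the parent records; non-parents get NaN.
result = exploded.merge(calculated, on=["table_name", "xbrl_factoid"], how="left")
```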

pudl.output.ferc1.in_explosion_tables(table_name: str, in_explosion_table_names: list[str]) → bool

Determine if any of a list of table_names is in the list of the explosion tables.

Parameters:

table_name – table name. Typically from the source_tables element of an XBRL calculation component.
in_explosion_table_names – list of tables involved in a particular set of exploded tables.

class pudl.output.ferc1.XbrlCalculationForestFerc1(/, **data: Any)

Bases: pydantic.BaseModel

A class for manipulating groups of hierarchically nested XBRL calculations.

We expect that the facts reported in high-level FERC tables like core_ferc1__yearly_income_statements_sched114 and core_ferc1__yearly_balance_sheet_assets_sched110 should be calculable from many individually reported granular values, based on the calculations encoded in the XBRL Metadata, and that these relationships should have a hierarchical tree structure. Several individual values from the higher level tables will appear as root nodes at the top of each hierarchy, and the leaves in the underlying tree structure are the individually reported non-calculated values that make them up. Because the top-level tables have several distinct values in them, composed of disjoint sets of reported values, we have a forest (a group of several trees) rather than a single tree.

The information required to build a calculation forest is most readily found in Exploder.exploded_calcs(). A list of seed nodes can also be supplied, indicating which nodes must be present in the resulting forest. This can be used to prune irrelevant portions of the overall forest out of the exploded metadata. If no seeds are provided, then all of the nodes referenced in the exploded_calcs input dataframe will be used as seeds.

This class makes heavy use of networkx to manage the graph that we build from calculation relationships.

property parent_cols: list[str]

Construct parent_cols based on the provided calc_cols.

calc_cols: list[str]

exploded_calcs: pandas.DataFrame

seeds: list[NodeId] = []

tags: pandas.DataFrame

group_metric_checks: pudl.transform.ferc1.GroupMetricChecks

model_config

unique_associations()

Ensure parent-child associations in exploded calculations are unique.

calcs_have_required_cols()

Ensure exploded calculations include all required columns.

calc_parents_notna()

Ensure that parent table_name and xbrl_factoid columns are non-null.

classmethod tags_have_required_cols(v: pandas.DataFrame, info: pydantic.ValidationInfo) → pandas.DataFrame

Ensure tagging dataframe contains all required index columns.

classmethod tags_cols_notnull(v: pandas.DataFrame) → pandas.DataFrame

Ensure all tags have non-null table_name and xbrl_factoid.

classmethod single_valued_tags(v: pandas.DataFrame, info: pydantic.ValidationInfo) → pandas.DataFrame

Ensure all tags have unique values.

seeds_within_bounds()

Ensure that all seeds are present within exploded_calcs index.

For some reason this validator is being run before exploded_calcs has been added to the values dictionary, which doesn’t make sense, since “seeds” is defined after exploded_calcs in the model.

exploded_calcs_to_digraph(exploded_calcs: pandas.DataFrame) → networkx.DiGraph

Construct networkx.DiGraph of all calculations in exploded_calcs.

First we construct a directed graph based on the calculation components. The “parent” or “source” nodes are the results of the calculations, and the “child” or “target” nodes are the individual calculation components. The structure of the directed graph is determined entirely by the primary key columns in the calculation components table.

Then we compile a dictionary of node attributes, based on the individual calculation components in the exploded calcs dataframe.
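A minimal sketch of building the graph with networkx.from_pandas_edgelist, assuming nodes are identified by a single xbrl_factoid string rather than the full NodeId, and that leaves appear as rows with a null component that must be dropped before adding edges. The calculation weight becomes an edge attribute.

```python
import networkx as nx
import pandas as pd

calcs = pd.DataFrame(
    {
        "xbrl_factoid_parent": ["total", "total", "a", "a", "b", "c", "d"],
        "xbrl_factoid": ["a", "b", "c", "d", None, None, None],
        "weight": [1.0, -1.0, 1.0, 1.0, None, None, None],
    }
)

# Leaves carry a single null component; they need no outgoing edge.
edges = calcs.dropna(subset=["xbrl_factoid"])
# Parent ("source") -> child ("target"), with the weight stored on each edge.
graph = nx.from_pandas_edgelist(
    edges,
    source="xbrl_factoid_parent",
    target="xbrl_factoid",
    edge_attr="weight",
    create_using=nx.DiGraph,
)
```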

node_attrs() → dict[NodeId, dict[str, dict[str, str]]]

Construct a dictionary of node attributes for application to the forest.

Node attributes consist of the manually assigned tags.

edge_attrs() → dict[Any, Any]

Construct a dictionary of edge attributes for application to the forest.

The only edge attribute is the calculation component weight.

annotated_forest() → networkx.DiGraph

Annotate the calculation forest with node calculation weights and tags.

The annotated forest should have exactly the same structure as the forest, but with additional data associated with each of the nodes. This method also does some error checking to try and ensure that the weights and tags that are being associated with the forest are internally self-consistent.

We check whether there are multiple different weights associated with the same node in the calculation components. There are a few instances where this is expected, but if there are a lot of conflicting weights, something is probably wrong.

We check whether any of the nodes that were orphaned (never connected to the graph) or that were pruned in the course of enforcing a forest structure had manually assigned tags (e.g. indicating whether they contribute to rate base). If they do, then the final exploded data table may not capture all of the manually assigned metadata, and we either need to edit the metadata, or figure out why those nodes aren’t being included in the final calculation forest.
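Both checks reduce to simple groupby and membership operations. A sketch with toy (table_name, xbrl_factoid) node keys, assuming weights live in a weight column and manually tagged nodes are available as a set:

```python
import pandas as pd

calcs = pd.DataFrame(
    {
        "table_name": ["t1", "t1", "t2"],
        "xbrl_factoid": ["a", "a", "b"],
        "weight": [1.0, -1.0, 1.0],
    }
)
# Nodes that appear as components with more than one distinct weight.
n_weights = calcs.groupby(["table_name", "xbrl_factoid"])["weight"].nunique()
conflicting_weights = n_weights[n_weights > 1].index.tolist()

# Nodes dropped from the forest (orphaned or pruned) that carried manual tags.
lost_nodes = [("t2", "b"), ("t3", "z")]
tagged_nodes = {("t2", "b"), ("t1", "a")}
lost_tagged = [node for node in lost_nodes if node in tagged_nodes]
```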

propagate_node_attributes(annotated_forest: networkx.DiGraph)

Propagate tags.

Propagate tags leafward, rootward, and to the _correction nodes.
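One possible reading of leafward and rootward propagation on a networkx.DiGraph whose edges point from parent to child. The in_rate_base tag name and its values are hypothetical, existing tags are never overwritten here (setdefault), which may differ from the real behavior, and correction-node handling is omitted.

```python
import networkx as nx

# Edges point from parent (calculated value) to child (component).
forest = nx.DiGraph(
    [("total", "a"), ("total", "b"), ("a", "a1"), ("a", "a2"), ("b", "b1")]
)
nx.set_node_attributes(forest, {"a": "yes", "b1": "no"}, name="in_rate_base")


def propagate_leafward(graph: nx.DiGraph, tag: str) -> nx.DiGraph:
    """Copy each node's tag to its untagged descendants."""
    # Topological order visits parents before children, so tags cascade down.
    for node in nx.topological_sort(graph):
        value = graph.nodes[node].get(tag)
        if value is None:
            continue
        for child in graph.successors(node):
            graph.nodes[child].setdefault(tag, value)
    return graph


def propagate_rootward(graph: nx.DiGraph, tag: str) -> nx.DiGraph:
    """Tag an untagged parent when all of its children share one tag."""
    # Reverse topological order visits children before their parents.
    for node in reversed(list(nx.topological_sort(graph))):
        children = list(graph.successors(node))
        child_tags = {graph.nodes[child].get(tag) for child in children}
        if children and len(child_tags) == 1 and None not in child_tags:
            graph.nodes[node].setdefault(tag, child_tags.pop())
    return graph


propagate_rootward(propagate_leafward(forest, "in_rate_base"), "in_rate_base")
```

Here a1 and a2 inherit "yes" from a, b picks up "no" from its only child b1, and total stays untagged because its children disagree.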

check_lost_tags(lost_nodes: list[NodeId]) → None

Check whether any of the input lost nodes were also tagged nodes.

static check_conflicting_tags(annotated_forest: networkx.DiGraph) → None

Check for conflicts between ancestor and descendant tags.

At this point, we have just applied the manually compiled tags to the nodes in the forest, and haven’t yet propagated them down to the leaves. It’s possible that ancestor nodes (closer to the roots) might have tags associated with them that are in conflict with descendant nodes (closer to the leaves). If that’s the case then when we propagate the tags to the leaves, whichever tag is propagated last will end up taking precedence.

These kinds of conflicts are probably due to errors in the tagging metadata, and should be investigated.
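A sketch of the conflict check on a toy forest: for each tagged node, compare its tag with the tags of all of its descendants. The tag name is hypothetical.

```python
import networkx as nx

forest = nx.DiGraph([("total", "a"), ("a", "a1"), ("total", "b")])
nx.set_node_attributes(forest, {"a": "yes", "a1": "no"}, name="in_rate_base")


def conflicting_tags(graph: nx.DiGraph, tag: str) -> list[tuple[str, str]]:
    """Return (ancestor, descendant) pairs whose tags disagree."""
    conflicts = []
    # nodes(data=tag) yields (node, value), with None for untagged nodes.
    for node, value in graph.nodes(data=tag):
        if value is None:
            continue
        for desc in nx.descendants(graph, node):
            other = graph.nodes[desc].get(tag)
            if other is not None and other != value:
                conflicts.append((node, desc))
    return conflicts
```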

full_digraph() → networkx.DiGraph

A digraph of all calculations described by the exploded metadata.

prune_unrooted(graph: networkx.DiGraph) → networkx.DiGraph

Prune those parts of the input graph that aren’t reachable from the roots.

Build a table of exploded calculations that includes only those nodes that are part of the input graph, and that are reachable from the roots of the calculation forest. Then use that set of exploded calculations to construct a new graph.

This is complicated by the fact that some nodes may have already been pruned from the input graph, and so when selecting both parent and child nodes from the calculations, we need to make sure that they are present in the input graph, as well as the complete set of calculation components.

seeded_digraph() → networkx.DiGraph

A digraph of all calculations that contribute to the seed values.

Prune the full digraph to contain only those nodes in the full_digraph() that are descendants of the seed nodes – i.e. that are reachable along the directed edges, and thus contribute to the values reported to the XBRL facts associated with the seed nodes.

We compile a list of all the NodeId values that should be included in the pruned graph, and then use that list to select a subset of the exploded metadata to pass to exploded_calcs_to_digraph(), so that all of the associated metadata is also added to the pruned graph.
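The select-then-rebuild approach can be sketched as follows, assuming the exploded calculations are a list of dicts with hypothetical "parent", "child", and "weight" keys. Selecting whole rows, rather than bare node IDs, is what keeps the associated metadata attached to the pruned graph.

```python
def seeded_calcs(calcs, seeds):
    """Select the calculation rows that contribute to the seed nodes.

    calcs: list of dicts with hypothetical "parent" and "child" keys, one per
        calculation component (a stand-in for the exploded metadata).
    seeds: the nodes whose reported values we want to explain.
    """
    children = {}
    for row in calcs:
        children.setdefault(row["parent"], []).append(row["child"])
    # Depth-first search collecting the seeds and all of their descendants.
    keep, stack = set(), list(seeds)
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(children.get(node, []))
    # Subset the rows so each node's metadata travels with the pruned graph.
    return [row for row in calcs if row["parent"] in keep and row["child"] in keep]
```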

forest() → networkx.DiGraph[source]#

A pruned version of the seeded digraph that should be one or more trees.

This method contains any special logic that’s required to convert the seeded_digraph() into a collection of trees. The main issue we currently have to deal with is passthrough calculations that we’ve added to avoid having duplicated calculations in the graph.

In practice this method will probably return a single tree rather than a forest, but a forest with several root nodes might also be appropriate, since the root table may or may not have a top level summary value that includes all underlying calculated values of interest.

static roots(graph: networkx.DiGraph) → list[NodeId][source]#: Identify all root nodes in a digraph.

full_digraph_roots() → list[NodeId][source]#: Find all roots in the full digraph described by the exploded metadata.

seeded_digraph_roots() → list[NodeId][source]#: Find all roots in the seeded digraph.

forest_roots() → list[NodeId][source]#: Find all roots in the pruned calculation forest.

static leaves(graph: networkx.DiGraph) → list[NodeId][source]#: Identify all leaf nodes in a digraph.

full_digraph_leaves() → list[NodeId][source]#: All leaf nodes in the full digraph.

seeded_digraph_leaves() → list[NodeId][source]#: All leaf nodes in the seeded digraph.

forest_leaves() → list[NodeId][source]#: All leaf nodes in the pruned forest.
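Roots are nodes with no parents, and leaves are nodes with no children. The sketch below assumes a plain list of (parent, child) edges. With a networkx.DiGraph you would more likely filter graph.in_degree() or graph.out_degree() for zero. Isolated nodes that appear in no edge are not represented in an edge list, so this sketch ignores them.

```python
def roots(edges):
    """Nodes that appear as a parent but never as a child."""
    parents = {p for p, _ in edges}
    children = {c for _, c in edges}
    return sorted(parents - children)


def leaves(edges):
    """Nodes that appear as a child but never as a parent."""
    parents = {p for p, _ in edges}
    children = {c for _, c in edges}
    return sorted(children - parents)
```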

orphans() → list[NodeId][source]#

Identify all nodes that appear in the exploded_calcs but not in the full digraph.

Because we removed the metadata and are now building the tree entirely based on the exploded_calcs, this should now never produce any orphans and is a bit redundant.

pruned() → list[NodeId][source]#: List of all nodes that appear in the DAG but not in the pruned forest.

stepchildren(graph: networkx.DiGraph) → list[NodeId][source]#: Find all nodes in the graph that have more than one parent.

stepparents(graph: networkx.DiGraph) → list[NodeId][source]#: Find all nodes in the graph with children having more than one parent.
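Stepchildren are what prevent a digraph from being a forest, because a node with two parents cannot sit in a single tree. The sketch below assumes a list of (parent, child) edges in place of a networkx.DiGraph and shows how both lists can be derived.

```python
from collections import defaultdict


def stepchildren(edges):
    """Nodes that have more than one parent."""
    parents = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
    return sorted(node for node, ps in parents.items() if len(ps) > 1)


def stepparents(edges):
    """Parents of any node that has more than one parent."""
    multi_parent = set(stepchildren(edges))
    return sorted({p for p, c in edges if c in multi_parent})
```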

_get_path_weight(path: list[NodeId], graph: networkx.DiGraph) → float[source]#: Multiply all weights along a path together.

leafy_meta() → pandas.DataFrame[source]#

Identify leaf facts and compile their metadata.

Identify the root and leaf nodes of the minimal calculation trees.
Adjust the weights associated with the leaf nodes to equal the product of the weights of all their ancestors.
Set leaf node tags to be the union of all the tags associated with all of their ancestors.

Leafy metadata in the output dataframe includes:

The ID of the leaf node itself (this is the index).
The ID of the root node the leaf is descended from.
What tags the leaf has inherited from its ancestors.
The leaf node’s xbrl_factoid_original.
The weight associated with the leaf, in relation to its root.
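The weight and tag rules above can be sketched for a single tree. This assumes dict-based inputs and models tags as plain sets of strings for simplicity; the real method works on a networkx.DiGraph, NodeId values, and {tag_name: value} tags. The node names and the function name leafy_meta_sketch are hypothetical.

```python
import math


def leafy_meta_sketch(children, weights, tags, root):
    """Compile metadata for every leaf below ``root`` in a calculation tree.

    children: dict parent -> list of child nodes.
    weights: dict (parent, child) -> calculation weight of that edge.
    tags: dict node -> set of tag strings assigned to that node.
    Assumes a tree, so each leaf has exactly one path back to the root.
    """
    rows = []
    stack = [(root, [root])]  # (current node, path from the root to it)
    while stack:
        node, path = stack.pop()
        kids = children.get(node, [])
        if not kids:
            # Leaf: multiply the edge weights along the root-to-leaf path...
            weight = math.prod(weights[(a, b)] for a, b in zip(path, path[1:]))
            # ...and take the union of tags from the leaf and all its ancestors.
            inherited = set().union(*(tags.get(n, set()) for n in path))
            rows.append({"leaf": node, "root": root, "weight": weight, "tags": inherited})
        for kid in kids:
            stack.append((kid, path + [kid]))
    return sorted(rows, key=lambda r: r["leaf"])


children = {"net_plant": ["gross_plant", "accum_depr"], "gross_plant": ["steam", "hydro"]}
weights = {
    ("net_plant", "gross_plant"): 1.0,
    ("net_plant", "accum_depr"): -1.0,  # subtracted from its parent
    ("gross_plant", "steam"): 1.0,
    ("gross_plant", "hydro"): 1.0,
}
tags = {"gross_plant": {"in_rate_base"}}
meta = leafy_meta_sketch(children, weights, tags, "net_plant")
```

Here accum_depr inherits a weight of -1.0 and no tags, while steam and hydro inherit the in_rate_base tag from gross_plant.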

root_calculations() → pandas.DataFrame[source]#

Produce a calculation components dataframe containing only roots and leaves.

This dataframe has a format similar to exploded_calcs and can be used with the exploded data to verify that the root values can still be correctly calculated from the leaf values.

table_names() → list[str][source]#: Produce the list of tables involved in this explosion.

plot_graph(graph: networkx.DiGraph) → None[source]#: Visualize a CalculationForest graph.

plot_nodes(nodes: list[NodeId]) → None[source]#: Plot a list of nodes based on edges found in exploded_calcs.

plot(graph: Literal[full_digraph, seeded_digraph, forest]) → None[source]#: Visualize various stages of the calculation forest.

leafy_data(exploded_data: pandas.DataFrame, value_col: str) → pandas.DataFrame[source]#

Use the calculation forest to prune the exploded dataframe.

Drop all rows that don’t correspond to either root or leaf facts.
Verify that the reported root values can still be generated by calculations that only refer to leaf values. (not yet implemented)
Merge the leafy metadata onto the exploded data, keeping only those rows which refer to the leaf facts.
Use the leaf weights to adjust the reported data values.

TODO: This method could either live here or in the Exploder class, which would also have access to exploded_meta, exploded_data, and the calculation forest.

Still need to validate the root node calculations.
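The steps listed above, including the root check that is not yet implemented, can be sketched with plain Python records. This assumes each exploded row is a dict with a hypothetical "factoid" key and a value column, and that the leaf weights come from the leafy metadata. It is an illustration, not the pandas-based implementation.

```python
import math


def leafy_data_sketch(exploded_rows, leaf_weights, value_col="value"):
    """Keep only leaf rows and scale their values by the leaf weights.

    exploded_rows: list of dicts, each with a "factoid" key and a value column.
    leaf_weights: dict mapping leaf factoid -> product of weights to its root.
    """
    out = []
    for row in exploded_rows:
        if row["factoid"] not in leaf_weights:
            continue  # drop root and intermediate facts
        weighted = dict(row)  # copy so the input rows are not mutated
        weighted[value_col] = row[value_col] * leaf_weights[row["factoid"]]
        out.append(weighted)
    return out


def root_is_reproduced(root_value, leafy_rows, value_col="value", rel_tol=1e-9):
    """Sketch of the root check: do the weighted leaf values sum to the root?"""
    total = math.fsum(r[value_col] for r in leafy_rows)
    return math.isclose(total, root_value, rel_tol=rel_tol)
```

Using math.isclose rather than exact equality leaves room for floating-point rounding in reported values.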

forest_as_table() → pandas.DataFrame[source]#

Construct a tabular representation of the calculation forest.

Each generation of nodes, starting with the root(s) of the calculation forest, makes up a set of columns in the table. Each set of columns is merged onto the table built from the preceding generations.

_add_layers_to_forest_as_table(df: pandas.DataFrame) → pandas.DataFrame[source]#

Recursively add additional layers of nodes from the forest to the table.

Given a dataframe with one or more sets of columns whose names correspond to the components of a NodeId with suffixes of the form _layerN, identify the children of the nodes in the set of columns with the largest N and merge them onto the table, recursively, until there are no more children to add. This creates a tabular representation of the calculation forest that can be inspected in Excel.

Include the inter-layer calculation weights and the tags associated with the nodes before propagation.
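The recursive layering can be sketched without pandas. This assumes a single column per generation named layerN (the real method uses one column per NodeId component with a _layerN suffix, plus weights and tags) and a dict of parent-to-children lists. A node without children keeps its row with an empty value in the new column, much like a left merge.

```python
def add_layers_to_table(rows, children, depth=0):
    """Recursively append one column per generation of the forest.

    rows: list of dicts; every row already has a "layer{depth}" key.
    children: dict mapping a node to a list of its child nodes.
    """
    col, next_col = f"layer{depth}", f"layer{depth + 1}"
    # Stop once no node in the deepest layer has any children.
    if not any(children.get(row[col]) for row in rows):
        return rows
    new_rows = []
    for row in rows:
        # Childless nodes (and empty cells) keep a single row with None.
        kids = children.get(row[col]) or [None]
        new_rows.extend({**row, next_col: kid} for kid in kids)
    return add_layers_to_table(new_rows, children, depth + 1)


children = {"net_plant": ["gross_plant", "accum_depr"], "gross_plant": ["steam", "hydro"]}
table = add_layers_to_table([{"layer0": "net_plant"}], children)
```

Each resulting row is one root-to-leaf path, which is the shape that is convenient to inspect in a spreadsheet.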

pudl.output.ferc1.nodes_to_df(calc_forest: networkx.DiGraph, nodes: list[NodeId]) → pandas.DataFrame[source]#

Construct a dataframe from a list of nodes, including their annotations.

NodeIds that are not present in the calculation forest will be ignored.

Parameters:

calc_forest – A calculation forest made of nodes with “weight” and “tags” data.
nodes – List of NodeId values to extract from the calculation forest.

Returns:

A tabular dataframe representation of the nodes, including their tags, extracted from the calculation forest.

pudl.output.ferc1._propagate_tags_leafward(annotated_forest: networkx.DiGraph, leafward_inherited_tags: list[str]) → networkx.DiGraph[source]#

Push a parent’s tags down to its descendants.

Only push the leafward_inherited_tags - others will be left alone.
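A minimal sketch of leafward propagation, assuming a dict of parent-to-children lists and per-node {tag_name: value} dicts instead of an annotated networkx.DiGraph. The traversal starts at the roots and lets ancestors win whenever a descendant already holds a different value; that precedence rule is a choice made for this sketch, and it is exactly the situation check_conflicting_tags() is meant to flag.

```python
def propagate_tags_leafward(children, tags, roots, inherited_tag_names):
    """Push selected tags from each node down to all of its descendants.

    tags: dict node -> dict of {tag_name: value}; a new dict is returned.
    Only tags named in inherited_tag_names are pushed; others stay put.
    """
    result = {node: dict(node_tags) for node, node_tags in tags.items()}
    # Stack of (node, tags inherited from its ancestors).
    stack = [(root, {}) for root in roots]
    while stack:
        node, inherited = stack.pop()
        node_tags = result.setdefault(node, {})
        node_tags.update(inherited)  # ancestors take precedence in this sketch
        passed_down = {k: v for k, v in node_tags.items() if k in inherited_tag_names}
        for child in children.get(node, []):
            stack.append((child, passed_down))
    return result
```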

pudl.output.ferc1._propagate_tag_rootward(annotated_forest: networkx.DiGraph, tag_name: Literal[in_rate_base]) → networkx.DiGraph[source]#

Set the tag on a node when all of its children have the same tag.

This function returns the value of a tag, but also sets node attributes down the tree when all children of a node share the same tag.
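Rootward propagation can be sketched as a post-order traversal: each child is resolved before its parent, so a tag assigned to one parent can in turn qualify that parent's own parent. This assumes dict-based inputs rather than an annotated networkx.DiGraph. The inner visit() returns each node's tag value, while the outer function returns the updated tags. Leaving nodes that already carry the tag unchanged is an assumption of this sketch.

```python
def propagate_tag_rootward(children, tags, root, tag_name="in_rate_base"):
    """Give untagged parents a tag when all of their children share one value.

    children: dict mapping a node to a list of its child nodes.
    tags: dict node -> dict of {tag_name: value}; a new dict is returned.
    """
    result = {node: dict(node_tags) for node, node_tags in tags.items()}

    def visit(node):
        # Resolve every child first (post-order), collecting their tag values.
        child_values = {visit(kid) for kid in children.get(node, [])}
        node_tags = result.setdefault(node, {})
        if tag_name not in node_tags and len(child_values) == 1 and None not in child_values:
            node_tags[tag_name] = child_values.pop()
        return node_tags.get(tag_name)

    visit(root)
    return result
```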

pudl.output.ferc1._propagate_tags_to_corrections(annotated_forest: networkx.DiGraph) → networkx.DiGraph[source]#

pudl.output.ferc1.check_tag_propagation_compared_to_compiled_tags(df: pandas.DataFrame, propagated_tag: Literal[in_rate_base], _out_ferc1__explosion_tags: pandas.DataFrame)[source]#

Check whether tags were propagated.

Parameters:

df – table to check. This should be one of out_ferc1__yearly_rate_base(), exploded_balance_sheet_assets_ferc1, or exploded_balance_sheet_liabilities_ferc1. The exploded_income_statement_ferc1 table does not currently have propagated tags.
propagated_tag – name of tag. Currently in_rate_base is the only propagated tag.
_out_ferc1__explosion_tags – manually compiled tags. This table includes tags from many of the explosion tables, so we filter it before checking whether the tag was propagated.

Raises:

AssertionError – If there are more manually compiled tags for the xbrl_factoids in df than found in _out_ferc1__explosion_tags.
AssertionError – If there are more manually compiled tags for the correction xbrl_factoids in df than found in _out_ferc1__explosion_tags.

pudl.output.ferc1.check_for_correction_xbrl_factoids_with_tag(df: pandas.DataFrame, propagated_tag: Literal[in_rate_base])[source]#

Check if any correction records have tags.

Parameters:

df – table to check. This should be one of out_ferc1__yearly_rate_base(), exploded_balance_sheet_assets_ferc1, or exploded_balance_sheet_liabilities_ferc1. The exploded_income_statement_ferc1 table does not currently have propagated tags.
propagated_tag – name of tag. Currently in_rate_base is the only propagated tag.

Raises:

AssertionError – If there are zero correction xbrl_factoids in df with tags.
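The check can be sketched over a list of row dicts. Two assumptions are made here: correction records are identified by an xbrl_factoid ending in "_correction", and the tag is stored in a column named after the tag (a hypothetical layout). The real function operates on a pandas DataFrame.

```python
def check_for_correction_factoids_with_tag(rows, tag_name="in_rate_base"):
    """Raise AssertionError if no correction records carry the tag.

    rows: list of dicts with an "xbrl_factoid" key and a tag column
        (hypothetical layout standing in for the output DataFrame).
    """
    tagged = [
        row
        for row in rows
        if row["xbrl_factoid"].endswith("_correction") and row.get(tag_name) is not None
    ]
    if not tagged:
        raise AssertionError(f"No correction xbrl_factoids have a {tag_name!r} tag.")
    return tagged
```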

pudl.output.ferc1.out_ferc1__yearly_rate_base(_out_ferc1__detailed_balance_sheet_assets: pandas.DataFrame, _out_ferc1__detailed_balance_sheet_liabilities: pandas.DataFrame, core_ferc1__yearly_operating_expenses_sched320: pandas.DataFrame, _out_ferc1__detailed_tags: pandas.DataFrame) → pandas.DataFrame[source]#

Make a table of granular utility rate-base data.

This table contains granular data describing what utilities can include in their rate bases. This information comes from two core inputs: exploded_balance_sheet_assets_ferc1 and exploded_balance_sheet_liabilities_ferc1. These tables include granular data from the nested calculations that are built into the accounting tables. See Exploder for more details.

This rate base table also contains one specific addition from core_ferc1__yearly_operating_expenses_sched320. In standard ratemaking processes, utilities are allowed to include working capital, sometimes referred to as cash on hand or cash reserves. A standard ratemaking practice is to consider the available rate-baseable working capital to be one eighth of the average operations and maintenance expense. This function grabs that expense and concatenates it with the rest of the assets and liabilities from the granular exploded data.
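The one-eighth rule is simple arithmetic. The sketch below assumes the average is taken over a list of reported O&M expense values. Which values are averaged (for example, across years or reports) is an assumption here, and the function name is hypothetical.

```python
def working_capital_allowance(om_expenses):
    """One eighth of the average operations and maintenance expense."""
    average = sum(om_expenses) / len(om_expenses)
    return average / 8


# Hypothetical O&M expenses of $78M and $82M average to $80M,
# giving a working capital allowance of $10M.
allowance = working_capital_allowance([78_000_000, 82_000_000])
```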
