pudl.output.ferc1
A collection of denormalized FERC assets and helper functions.
Module Contents#
Classes#
CalculationToleranceFerc1 – Data quality expectations related to FERC 1 calculations.
NodeId – The primary keys which identify a node in a calculation tree.
Exploder – Get unique, granular datapoints from a set of related, nested FERC1 tables.
XbrlCalculationForestFerc1 – A class for manipulating groups of hierarchically nested XBRL calculations.
Functions#

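denorm_plants_utilities_ferc1 – A denormalized table containing FERC plant and utility names and IDs.
denorm_plants_steam_ferc1 – Select and join some useful fields from the FERC Form 1 steam table.
denorm_plants_small_ferc1 – Pull a useful dataframe related to the FERC Form 1 small plants.
denorm_plants_hydro_ferc1 – Pull a useful dataframe related to the FERC Form 1 hydro plants.
denorm_plants_pumped_storage_ferc1 – Pull a dataframe of FERC Form 1 Pumped Storage plant data.
denorm_fuel_ferc1 – Pull a useful dataframe related to FERC Form 1 fuel information.
denorm_purchased_power_ferc1 – Pull a useful dataframe of FERC Form 1 Purchased Power data.
denorm_plant_in_service_ferc1 – Pull a dataframe of FERC Form 1 Electric Plant in Service data.
denorm_balance_sheet_assets_ferc1 – Pull a useful dataframe of FERC Form 1 balance sheet assets data.
denorm_balance_sheet_liabilities_ferc1 – Pull a useful dataframe of FERC Form 1 balance sheet liabilities data.
denorm_cash_flow_ferc1 – Pull a useful dataframe of FERC Form 1 cash flow data.
denorm_depreciation_amortization_summary_ferc1 – Pull a useful dataframe of FERC Form 1 depreciation amortization data.
denorm_electric_energy_dispositions_ferc1 – Pull a useful dataframe of FERC Form 1 energy dispositions data.
denorm_electric_energy_sources_ferc1 – Pull a useful dataframe of FERC Form 1 electric energy sources data.
denorm_electric_operating_expenses_ferc1 – Pull a useful dataframe of FERC Form 1 electric operating expenses data.
denorm_electric_operating_revenues_ferc1 – Pull a useful dataframe of FERC Form 1 electric operating revenues data.
denorm_electric_plant_depreciation_changes_ferc1 – Pull a useful dataframe of FERC Form 1 electric plant depreciation changes data.
denorm_electric_plant_depreciation_functional_ferc1 – Pull a useful dataframe of FERC Form 1 electric plant depreciation functional data.
denorm_electricity_sales_by_rate_schedule_ferc1 – Pull a useful dataframe of FERC Form 1 electricity sales by rate schedule data.
denorm_income_statement_ferc1 – Pull a useful dataframe of FERC Form 1 income statement data.
denorm_other_regulatory_liabilities_ferc1 – Pull a useful dataframe of FERC Form 1 other regulatory liabilities data.
denorm_retained_earnings_ferc1 – Pull a useful dataframe of FERC Form 1 retained earnings data.
denorm_transmission_statistics_ferc1 – Pull a useful dataframe of FERC Form 1 transmission statistics data.
denorm_utility_plant_summary_ferc1 – Pull a useful dataframe of FERC Form 1 utility plant summary data.
denorm_plants_all_ferc1 – Combine the steam, small generators, hydro, and pumped storage tables.
denorm_fuel_by_plant_ferc1 – Summarize FERC fuel data by plant for output.
calc_annual_capital_additions_ferc1 – Calculate annual capital additions for FERC1 steam records.
add_mean_cap_additions – Add mean capital additions over lifetime of plant.
_out_ferc1__explosion_tags – Grab the stored table of tags and add inferred dimension.
exploded_table_asset_factory – Create an exploded table based on a set of related input tables.
create_exploded_table_assets – Create a list of exploded FERC Form 1 assets.
in_explosion_tables – Determine if any of a list of table names is in the list of the explosion tables.
nodes_to_df – Construct a dataframe from a list of nodes, including their annotations.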
Attributes#
 class pudl.output.ferc1.CalculationToleranceFerc1[source]#
Bases:
pydantic.BaseModel
Data quality expectations related to FERC 1 calculations.
We are doing a lot of comparisons between calculated and reported values to identify reporting errors in the data, errors in FERC’s metadata, and bugs in our own code. This class provides a structure for encoding our expectations about the level of acceptable (or at least expected) errors, and allows us to pass them around.
In the future we might also want to specify much more granular expectations, pertaining to individual tables, years, utilities, or facts, to ensure that a low overall error rate does not mask a problem with the way the data or metadata is reported in a particular year. We could also define per-filing and per-table error tolerances to help us identify individual utilities that have, e.g., used an outdated version of Form 1 when filing.
 pudl.output.ferc1.EXPLOSION_CALCULATION_TOLERANCES: dict[str, CalculationToleranceFerc1][source]#
 pudl.output.ferc1.denorm_plants_utilities_ferc1(plants_ferc1: pandas.DataFrame, utilities_ferc1: pandas.DataFrame) pandas.DataFrame [source]#
A denormalized table containing FERC plant and utility names and IDs.
 pudl.output.ferc1.denorm_plants_steam_ferc1(denorm_plants_utilities_ferc1: pandas.DataFrame, plants_steam_ferc1: pandas.DataFrame) pandas.DataFrame [source]#
Select and join some useful fields from the FERC Form 1 steam table.
Select the FERC Form 1 steam plant table entries, add in the reporting utility’s name, and the PUDL ID for the plant and utility for readability and integration with other tables that have PUDL IDs. Also calculates capacity_factor (based on net_generation_mwh and capacity_mw).
 Parameters:
denorm_plants_utilities_ferc1 – Denormalized dataframe of FERC Form 1 plants and utilities data.
plants_steam_ferc1 – The normalized FERC Form 1 steam table.
 Returns:
A DataFrame containing useful fields from the FERC Form 1 steam table.
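The capacity_factor derivation mentioned above can be illustrated with a minimal, dependency-free sketch. It assumes the conventional definition (net generation divided by the generation that would result from running at full capacity for every hour of the report year); the exact formula used by PUDL is not stated on this page, and the function name and the report_year argument are hypothetical.

```python
import calendar
from typing import Optional


def capacity_factor(
    net_generation_mwh: float, capacity_mw: float, report_year: int
) -> Optional[float]:
    """Ratio of actual generation to the maximum possible generation in a year."""
    # Assumption: a full calendar year of potential operation.
    hours_in_year = 8784 if calendar.isleap(report_year) else 8760
    if not capacity_mw:
        # Missing or zero capacity: the ratio is undefined.
        return None
    return net_generation_mwh / (capacity_mw * hours_in_year)


# A 100 MW plant that generated 438,000 MWh in a non-leap year.
example = capacity_factor(438_000.0, 100.0, 2021)
```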
 pudl.output.ferc1.denorm_plants_small_ferc1(plants_small_ferc1: pandas.DataFrame, denorm_plants_utilities_ferc1: pandas.DataFrame) pandas.DataFrame [source]#
Pull a useful dataframe related to the FERC Form 1 small plants.
 pudl.output.ferc1.denorm_plants_hydro_ferc1(plants_hydro_ferc1: pandas.DataFrame, denorm_plants_utilities_ferc1: pandas.DataFrame) pandas.DataFrame [source]#
Pull a useful dataframe related to the FERC Form 1 hydro plants.
 pudl.output.ferc1.denorm_plants_pumped_storage_ferc1(plants_pumped_storage_ferc1: pandas.DataFrame, denorm_plants_utilities_ferc1: pandas.DataFrame) pandas.DataFrame [source]#
Pull a dataframe of FERC Form 1 Pumped Storage plant data.
 pudl.output.ferc1.denorm_fuel_ferc1(fuel_ferc1: pandas.DataFrame, denorm_plants_utilities_ferc1: pandas.DataFrame) pandas.DataFrame [source]#
Pull a useful dataframe related to FERC Form 1 fuel information.
This function pulls the FERC Form 1 fuel data, and joins in the name of the reporting utility, as well as the PUDL IDs for that utility and the plant, allowing integration with other PUDL tables. Useful derived values include:
fuel_consumed_mmbtu (total fuel heat content consumed)
fuel_consumed_total_cost (total cost of that fuel)
 Parameters:
fuel_ferc1 – Normalized FERC fuel table.
denorm_plants_utilities_ferc1 – Denormalized table of FERC1 plant & utility IDs.
 Returns:
A DataFrame containing useful FERC Form 1 fuel information.
 pudl.output.ferc1.denorm_purchased_power_ferc1(purchased_power_ferc1: pandas.DataFrame, utilities_ferc1: pandas.DataFrame) pandas.DataFrame [source]#
Pull a useful dataframe of FERC Form 1 Purchased Power data.
 pudl.output.ferc1.denorm_plant_in_service_ferc1(plant_in_service_ferc1: pandas.DataFrame, utilities_ferc1: pandas.DataFrame) pandas.DataFrame [source]#
Pull a dataframe of FERC Form 1 Electric Plant in Service data.
 pudl.output.ferc1.denorm_balance_sheet_assets_ferc1(balance_sheet_assets_ferc1: pandas.DataFrame, utilities_ferc1: pandas.DataFrame) pandas.DataFrame [source]#
Pull a useful dataframe of FERC Form 1 balance sheet assets data.
 pudl.output.ferc1.denorm_balance_sheet_liabilities_ferc1(balance_sheet_liabilities_ferc1: pandas.DataFrame, utilities_ferc1: pandas.DataFrame) pandas.DataFrame [source]#
Pull a useful dataframe of FERC Form 1 balance sheet liabilities data.
 pudl.output.ferc1.denorm_cash_flow_ferc1(cash_flow_ferc1: pandas.DataFrame, utilities_ferc1: pandas.DataFrame) pandas.DataFrame [source]#
Pull a useful dataframe of FERC Form 1 cash flow data.
 pudl.output.ferc1.denorm_depreciation_amortization_summary_ferc1(depreciation_amortization_summary_ferc1: pandas.DataFrame, utilities_ferc1: pandas.DataFrame) pandas.DataFrame [source]#
Pull a useful dataframe of FERC Form 1 depreciation amortization data.
 pudl.output.ferc1.denorm_electric_energy_dispositions_ferc1(electric_energy_dispositions_ferc1: pandas.DataFrame, utilities_ferc1: pandas.DataFrame) pandas.DataFrame [source]#
Pull a useful dataframe of FERC Form 1 energy dispositions data.
 pudl.output.ferc1.denorm_electric_energy_sources_ferc1(electric_energy_sources_ferc1: pandas.DataFrame, utilities_ferc1: pandas.DataFrame) pandas.DataFrame [source]#
Pull a useful dataframe of FERC Form 1 electric energy sources data.
 pudl.output.ferc1.denorm_electric_operating_expenses_ferc1(electric_operating_expenses_ferc1: pandas.DataFrame, utilities_ferc1: pandas.DataFrame) pandas.DataFrame [source]#
Pull a useful dataframe of FERC Form 1 electric operating expenses data.
 pudl.output.ferc1.denorm_electric_operating_revenues_ferc1(electric_operating_revenues_ferc1: pandas.DataFrame, utilities_ferc1: pandas.DataFrame) pandas.DataFrame [source]#
Pull a useful dataframe of FERC Form 1 electric operating revenues data.
 pudl.output.ferc1.denorm_electric_plant_depreciation_changes_ferc1(electric_plant_depreciation_changes_ferc1: pandas.DataFrame, utilities_ferc1: pandas.DataFrame) pandas.DataFrame [source]#
Pull a useful dataframe of FERC Form 1 electric plant depreciation changes data.
 pudl.output.ferc1.denorm_electric_plant_depreciation_functional_ferc1(electric_plant_depreciation_functional_ferc1: pandas.DataFrame, utilities_ferc1: pandas.DataFrame) pandas.DataFrame [source]#
Pull a useful dataframe of FERC Form 1 electric plant depreciation functional data.
 pudl.output.ferc1.denorm_electricity_sales_by_rate_schedule_ferc1(electricity_sales_by_rate_schedule_ferc1: pandas.DataFrame, utilities_ferc1: pandas.DataFrame) pandas.DataFrame [source]#
Pull a useful dataframe of FERC Form 1 electricity sales by rate schedule data.
 pudl.output.ferc1.denorm_income_statement_ferc1(income_statement_ferc1: pandas.DataFrame, utilities_ferc1: pandas.DataFrame) pandas.DataFrame [source]#
Pull a useful dataframe of FERC Form 1 income statement data.
 pudl.output.ferc1.denorm_other_regulatory_liabilities_ferc1(other_regulatory_liabilities_ferc1: pandas.DataFrame, utilities_ferc1: pandas.DataFrame) pandas.DataFrame [source]#
Pull a useful dataframe of FERC Form 1 other regulatory liabilities data.
 pudl.output.ferc1.denorm_retained_earnings_ferc1(retained_earnings_ferc1: pandas.DataFrame, utilities_ferc1: pandas.DataFrame) pandas.DataFrame [source]#
Pull a useful dataframe of FERC Form 1 retained earnings data.
 pudl.output.ferc1.denorm_transmission_statistics_ferc1(transmission_statistics_ferc1: pandas.DataFrame, utilities_ferc1: pandas.DataFrame) pandas.DataFrame [source]#
Pull a useful dataframe of FERC Form 1 transmission statistics data.
 pudl.output.ferc1.denorm_utility_plant_summary_ferc1(utility_plant_summary_ferc1: pandas.DataFrame, utilities_ferc1: pandas.DataFrame) pandas.DataFrame [source]#
Pull a useful dataframe of FERC Form 1 utility plant summary data.
 pudl.output.ferc1.denorm_plants_all_ferc1(denorm_plants_steam_ferc1: pandas.DataFrame, denorm_plants_small_ferc1: pandas.DataFrame, denorm_plants_hydro_ferc1: pandas.DataFrame, denorm_plants_pumped_storage_ferc1: pandas.DataFrame) pandas.DataFrame [source]#
Combine the steam, small generators, hydro, and pumped storage tables.
While this table may have many purposes, the main one is to prepare it for integration with the EIA Master Unit List (MUL). All subtables included in this output table must have PUDL IDs. Table preparation involves ensuring that the individual tables can merge correctly (e.g. that equivalent columns have the same name), both with each other and with the EIA MUL.
 pudl.output.ferc1.denorm_fuel_by_plant_ferc1(context, fuel_ferc1: pandas.DataFrame, denorm_plants_utilities_ferc1: pandas.DataFrame) pandas.DataFrame [source]#
Summarize FERC fuel data by plant for output.
This is mostly a wrapper around pudl.analysis.classify_plants_ferc1.fuel_by_plant_ferc1(), which calculates some summary values on a per-plant basis (as indicated by utility_id_ferc1 and plant_name_ferc1) related to fuel consumption.
 Parameters:
context – Dagster context object
fuel_ferc1 – Normalized FERC fuel table.
denorm_plants_utilities_ferc1 – Denormalized table of FERC1 plant & utility IDs.
 Returns:
A DataFrame with fuel use summarized by plant.
 pudl.output.ferc1.calc_annual_capital_additions_ferc1(steam_df: pandas.DataFrame, window: int = 3) pandas.DataFrame [source]#
Calculate annual capital additions for FERC1 steam records.
Convert the capex_total column into annual capital additions. The capex_total column is the cumulative capital poured into the plant over time. This function takes the annual difference, which should generate the annual capital additions. It also generates a rolling average to smooth out the big annual fluctuations.
 Parameters:
steam_df – result of prep_plants_ferc()
window – number of years for window to generate rolling average. Argument for pudl.helpers.generate_rolling_avg()
 Returns:
Augmented version of steam_df with two additional columns: capex_annual_addition and capex_annual_addition_rolling.
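The two derived columns can be sketched for a single plant without pandas. This is only an illustration under stated assumptions: the function name is hypothetical, the rolling average here is a trailing window that ignores missing values, and the windowing actually used by pudl.helpers.generate_rolling_avg() may differ (for example, it could be centered).

```python
def annual_capital_additions(capex_total, window=3):
    """Difference a cumulative capex series and smooth the result.

    capex_total is a list of cumulative totals, one per consecutive year.
    Returns (additions, rolling), both aligned with the input years.
    """
    if not capex_total:
        return [], []
    # Year-over-year difference; the first year has no predecessor.
    additions = [None] + [
        later - earlier for earlier, later in zip(capex_total, capex_total[1:])
    ]
    rolling = []
    for i in range(len(additions)):
        # Trailing window of up to `window` years, skipping missing values.
        recent = [x for x in additions[max(0, i - window + 1): i + 1] if x is not None]
        rolling.append(sum(recent) / len(recent) if recent else None)
    return additions, rolling


additions, rolling = annual_capital_additions([100.0, 130.0, 190.0, 200.0])
```

Here additions holds the analogue of capex_annual_addition and rolling the analogue of capex_annual_addition_rolling.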
 pudl.output.ferc1.add_mean_cap_additions(steam_df)[source]#
Add mean capital additions over lifetime of plant.
 class pudl.output.ferc1.NodeId[source]#
Bases:
NamedTuple
The primary keys which identify a node in a calculation tree.
Since NodeId is just a NamedTuple, a list of NodeId instances can also be used to index into a pandas.DataFrame that uses these fields as its index. This is convenient since many networkx functions and methods return iterable containers of graph nodes, which can be turned into lists and used directly to index into dataframes.
The additional dimensions (utility_type, plant_status, and plant_function) each have a small number of allowable values, which we could further impose as constraints on the values here using Pydantic if we wanted.
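The reason a NamedTuple works as a lookup key is that it compares and hashes like a plain tuple. The sketch below shows this with a dictionary standing in for an indexed dataframe. The field list is an assumption pieced together from the column names mentioned on this page (table_name, xbrl_factoid, and the three additional dimensions), and the factoid names and values are invented for illustration.

```python
from typing import NamedTuple, Optional


class NodeId(NamedTuple):
    """Stand-in for pudl.output.ferc1.NodeId (field list assumed)."""

    table_name: str
    xbrl_factoid: str
    utility_type: Optional[str]
    plant_status: Optional[str]
    plant_function: Optional[str]


# Data keyed by plain tuples, as a tuple-indexed table would be.
reported = {
    ("income_statement_ferc1", "net_income", "electric", None, None): 100.0,
    ("income_statement_ferc1", "operating_revenues", "electric", None, None): 140.0,
    ("income_statement_ferc1", "operating_expenses", "electric", None, None): 40.0,
}

# Nodes as they might come back from a graph traversal.
nodes = [
    NodeId("income_statement_ferc1", "net_income", "electric", None, None),
    NodeId("income_statement_ferc1", "operating_expenses", "electric", None, None),
]

# NodeId instances match the plain-tuple keys directly.
selected = [reported[node] for node in nodes]
```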
 pudl.output.ferc1._out_ferc1__explosion_tags(table_dimensions_ferc1) pandas.DataFrame [source]#
Grab the stored table of tags and add inferred dimension.
 pudl.output.ferc1.exploded_table_asset_factory(root_table: str, table_names_to_explode: list[str], seed_nodes: list[NodeId], calculation_tolerance: CalculationToleranceFerc1, io_manager_key: str | None = None) dagster.AssetsDefinition [source]#
Create an exploded table based on a set of related input tables.
 pudl.output.ferc1.create_exploded_table_assets() list[dagster.AssetsDefinition] [source]#
Create a list of exploded FERC Form 1 assets.
 Returns:
A list of AssetsDefinitions where each asset is an exploded FERC Form 1 table.
 class pudl.output.ferc1.Exploder(table_names: list[str], root_table: str, metadata_xbrl_ferc1: pandas.DataFrame, calculation_components_xbrl_ferc1: pandas.DataFrame, seed_nodes: list[NodeId], tags: pandas.DataFrame = pd.DataFrame(), calculation_tolerance: CalculationToleranceFerc1 = CalculationToleranceFerc1())[source]#
Get unique, granular datapoints from a set of related, nested FERC1 tables.
 exploded_calcs()[source]#
Remove any calculation components that aren’t relevant to the explosion.
At the end of this process several things should be true:
Only parents with table_name in the explosion tables should be retained.
All calculations in which any components were outside of the tables in the explosion should be turned into leaves – i.e. they should be replaced with a single calculation component filled with NA values.
Every remaining calculation component must also appear as a parent (if it is a leaf, then it will have a single null calculation component)
There should be no records where only one of table_name or xbrl_factoid is null. They should either both be null or both be non-null.
table_name_parent and xbrl_factoid_parent should be non-null.
 exploded_meta() pandas.DataFrame [source]#
Combine a set of interrelated tables’ metadata for use in Exploder.
Any calculations containing components that are part of tables outside the set of exploded tables will be converted to reported values with an empty calculation. Then we verify that all referenced calculation components actually appear as their own records within the concatenated metadata dataframe.
 calculation_forest() XbrlCalculationForestFerc1 [source]#
Construct a calculation forest based on class attributes.
 boom(tables_to_explode: dict[str, pandas.DataFrame]) pandas.DataFrame [source]#
Explode a set of nested tables.
There are five main stages of this process:
Prep all of the individual tables for explosion.
Concatenate all of the tables together.
Remove duplication in the concatenated exploded table.
Annotate the fine-grained data with additional metadata.
Validate that calculated top-level values are correct. (not implemented)
 Parameters:
tables_to_explode – dictionary of table name (key) to transformed table (value).
calculation_tolerance – What proportion (0-1) of calculated values are allowed to be incorrect without raising an AssertionError.
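The idea of a tolerated proportion of incorrect calculated values can be sketched as follows. This is an illustrative check only: the function names, the use of math.isclose, and the relative tolerance are assumptions, not the validation that PUDL performs.

```python
import math


def error_rate(calculated, reported, rel_tol=1e-6):
    """Share of comparable value pairs where calculated and reported disagree."""
    pairs = [
        (c, r) for c, r in zip(calculated, reported) if c is not None and r is not None
    ]
    if not pairs:
        return 0.0
    wrong = sum(not math.isclose(c, r, rel_tol=rel_tol) for c, r in pairs)
    return wrong / len(pairs)


def check_tolerance(calculated, reported, calculation_tolerance):
    """Raise AssertionError if too many calculated values are off."""
    rate = error_rate(calculated, reported)
    assert rate <= calculation_tolerance, (
        f"Error rate {rate:.2%} exceeds tolerance {calculation_tolerance:.2%}"
    )
    return rate


# One of four calculated values disagrees with what was reported.
rate = check_tolerance([10.0, 20.0, 30.0, 41.0], [10.0, 20.0, 30.0, 40.0], 0.3)
```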
 initial_explosion_concatenation(tables_to_explode: dict[str, pandas.DataFrame]) pandas.DataFrame [source]#
Concatenate all of the tables for the explosion.
Merge in some basic pieces of each table’s metadata and add table_name. At this point in the explosion, there will be a lot of duplication in the output.
 generate_intertable_calculations(exploded: pandas.DataFrame) pandas.DataFrame [source]#
Generate calculated values for inter-table calculated factoids.
This function sums components of calculations for a given factoid when the components originate entirely or partially outside of the table. It also accounts for components that only sum to a factoid within a particular dimension (e.g., for an electric utility or for plants whose plant_function is “in_service”). This returns a dataframe with a “calculated_amount” column.
 Parameters:
exploded – concatenated tables for table explosion.
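A weighted sum over components that live in other tables, restricted to a matching dimension value, might look like the sketch below. The key layout (table_name, xbrl_factoid, utility_type), the factoid names, and the numbers are hypothetical; only the general idea of summing weighted components into a calculated amount comes from the description above.

```python
def calculated_amount(components, reported):
    """Sum weight * value over the components of one calculated factoid.

    components: list of (key, weight) pairs for the parent factoid.
    reported: mapping from key to the reported value.
    """
    total = 0.0
    for key, weight in components:
        value = reported.get(key)
        if value is not None:  # skip components that were not reported
            total += weight * value
    return total


reported = {
    ("electric_operating_revenues_ferc1", "operating_revenues", "electric"): 500.0,
    ("electric_operating_expenses_ferc1", "operating_expenses", "electric"): 320.0,
    # Same factoid in a different dimension; it must not leak into the sum.
    ("electric_operating_expenses_ferc1", "operating_expenses", "gas"): 75.0,
}

# Components of a hypothetical income statement factoid for electric utilities.
components = [
    (("electric_operating_revenues_ferc1", "operating_revenues", "electric"), 1.0),
    (("electric_operating_expenses_ferc1", "operating_expenses", "electric"), -1.0),
]

amount = calculated_amount(components, reported)
```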
 pudl.output.ferc1.in_explosion_tables(table_name: str, in_explosion_table_names: list[str]) bool [source]#
Determine if any of a list of table names is in the list of the explosion tables.
 Parameters:
table_name – table name(s). Typically from the source_tables element from an XBRL calculation component.
in_explosion_table_names – list of tables involved in a particular set of exploded tables.
 class pudl.output.ferc1.XbrlCalculationForestFerc1[source]#
Bases:
pydantic.BaseModel
A class for manipulating groups of hierarchically nested XBRL calculations.
We expect that the facts reported in high-level FERC tables like income_statement_ferc1 and balance_sheet_assets_ferc1 should be calculable from many individually reported granular values, based on the calculations encoded in the XBRL Metadata, and that these relationships should have a hierarchical tree structure. Several individual values from the higher level tables will appear as root nodes at the top of each hierarchy, and the leaves in the underlying tree structure are the individually reported non-calculated values that make them up. Because the top-level tables have several distinct values in them, composed of disjunct sets of reported values, we have a forest (a group of several trees) rather than a single tree.
The information required to build a calculation forest is most readily found in the output of Exploder.exploded_calcs(). A list of seed nodes can also be supplied, indicating which nodes must be present in the resulting forest. This can be used to prune irrelevant portions of the overall forest out of the exploded metadata. If no seeds are provided, then all of the nodes referenced in the exploded_calcs input dataframe will be used as seeds.
This class makes heavy use of networkx to manage the graph that we build from calculation relationships.
 exploded_meta: pandas.DataFrame[source]#
 exploded_calcs: pandas.DataFrame[source]#
 calculation_tolerance: CalculationToleranceFerc1[source]#
 unique_associations(v: pandas.DataFrame, values) pandas.DataFrame [source]#
Ensure parent-child associations in exploded calculations are unique.
 calcs_have_required_cols(v: pandas.DataFrame, values) pandas.DataFrame [source]#
Ensure exploded calculations include all required columns.
 calc_parents_notna(v: pandas.DataFrame) pandas.DataFrame [source]#
Ensure that parent table_name and xbrl_factoid columns are non-null.
 tags_have_required_cols(v: pandas.DataFrame, values) pandas.DataFrame [source]#
Ensure tagging dataframe contains all required index columns.
 tags_cols_notnull(v: pandas.DataFrame) pandas.DataFrame [source]#
Ensure all tags have non-null table_name and xbrl_factoid.
 single_valued_tags(v: pandas.DataFrame, values) pandas.DataFrame [source]#
Ensure all tags have unique values.
 seeds_within_bounds(v: pandas.DataFrame, values) pandas.DataFrame [source]#
Ensure that all seeds are present within exploded_calcs index.
For some reason this validator is being run before exploded_calcs has been added to the values dictionary, which doesn’t make sense, since “seeds” is defined after exploded_calcs in the model.
 exploded_calcs_to_digraph(exploded_calcs: pandas.DataFrame) networkx.DiGraph [source]#
Construct networkx.DiGraph of all calculations in exploded_calcs.
First we construct a directed graph based on the calculation components. The “parent” or “source” nodes are the results of the calculations, and the “child” or “target” nodes are the individual calculation components. The structure of the directed graph is determined entirely by the primary key columns in the calculation components table.
Then we compile a dictionary of node attributes, based on the individual calculation components in the exploded calcs dataframe.
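The parent-to-child construction can be sketched with plain dictionaries in place of networkx.DiGraph. The column names follow those mentioned elsewhere on this page (table_name_parent, xbrl_factoid_parent, table_name, xbrl_factoid, and a weight); the reduction of node keys to two fields, the function name, and the sample records are simplifying assumptions.

```python
from collections import defaultdict


def calcs_to_digraph(calcs):
    """Build (nodes, edges) from calculation component records.

    edges maps each parent node to a list of (child node, weight) pairs.
    """
    nodes = set()
    edges = defaultdict(list)
    for rec in calcs:
        parent = (rec["table_name_parent"], rec["xbrl_factoid_parent"])
        nodes.add(parent)
        if rec["table_name"] is None:
            # A leaf is recorded as a parent with a single null component.
            continue
        child = (rec["table_name"], rec["xbrl_factoid"])
        nodes.add(child)
        edges[parent].append((child, rec["weight"]))
    return nodes, dict(edges)


T = "income_statement_ferc1"
calcs = [
    {"table_name_parent": T, "xbrl_factoid_parent": "net_income",
     "table_name": T, "xbrl_factoid": "revenues", "weight": 1.0},
    {"table_name_parent": T, "xbrl_factoid_parent": "net_income",
     "table_name": T, "xbrl_factoid": "expenses", "weight": -1.0},
    {"table_name_parent": T, "xbrl_factoid_parent": "revenues",
     "table_name": None, "xbrl_factoid": None, "weight": None},
    {"table_name_parent": T, "xbrl_factoid_parent": "expenses",
     "table_name": None, "xbrl_factoid": None, "weight": None},
]

nodes, edges = calcs_to_digraph(calcs)
```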
 node_attrs() dict[NodeId, dict[str, dict[str, str]]] [source]#
Construct a dictionary of node attributes for application to the forest.
Node attributes consist of the manually assigned tags.
 edge_attrs() dict[Any, Any] [source]#
Construct a dictionary of edge attributes for application to the forest.
The only edge attribute is the calculation component weight.
 annotated_forest() networkx.DiGraph [source]#
Annotate the calculation forest with node calculation weights and tags.
The annotated forest should have exactly the same structure as the forest, but with additional data associated with each of the nodes. This method also does some error checking to try and ensure that the weights and tags that are being associated with the forest are internally self-consistent.
We check whether there are multiple different weights associated with the same node in the calculation components. There are a few instances where this is expected, but if there are a lot of conflicting weights something is probably wrong.
We check whether any of the nodes that were orphaned (never connected to the graph) or that were pruned in the course of enforcing a forest structure had manually assigned tags (e.g. indicating whether they contribute to rate base). If they do, then the final exploded data table may not capture all of the manually assigned metadata, and we either need to edit the metadata, or figure out why those nodes aren’t being included in the final calculation forest.
 check_lost_tags(lost_nodes: list[NodeId]) None [source]#
Check whether any of the input lost nodes were also tagged nodes.
 static check_conflicting_tags(annotated_forest: networkx.DiGraph) None [source]#
Check for conflicts between ancestor and descendant tags.
At this point, we have just applied the manually compiled tags to the nodes in the forest, and haven’t yet propagated them down to the leaves. It’s possible that ancestor nodes (closer to the roots) might have tags associated with them that are in conflict with descendant nodes (closer to the leaves). If that’s the case then when we propagate the tags to the leaves, whichever tag is propagated last will end up taking precedence.
These kinds of conflicts are probably due to errors in the tagging metadata, and should be investigated.
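One way to detect such conflicts is to walk each tree from its root while carrying the most recent tag values seen on the way down. The sketch below is a hypothetical, dictionary-based illustration; the tag name in_rate_base and the node names are invented, and the real method operates on an annotated networkx graph.

```python
def find_tag_conflicts(children, tags):
    """Return (ancestor, descendant, tag) triples with differing tag values.

    children: node -> list of child nodes.
    tags: node -> dict of tag name to tag value.
    """
    conflicts = []

    def walk(node, inherited):
        own = tags.get(node, {})
        for key, value in own.items():
            if key in inherited and inherited[key][1] != value:
                conflicts.append((inherited[key][0], node, key))
        # Descendants are compared against the nearest tagged ancestor.
        merged = {**inherited, **{k: (node, v) for k, v in own.items()}}
        for child in children.get(node, []):
            walk(child, merged)

    all_children = {c for kids in children.values() for c in kids}
    for root in sorted(set(children) - all_children):
        walk(root, {})
    return conflicts


children = {"assets": ["plant", "cash"], "plant": ["steam_plant"]}
tags = {
    "assets": {"in_rate_base": "yes"},
    "steam_plant": {"in_rate_base": "no"},
}
conflicts = find_tag_conflicts(children, tags)
```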
 full_digraph() networkx.DiGraph [source]#
A digraph of all calculations described by the exploded metadata.
 prune_unrooted(graph: networkx.DiGraph) networkx.DiGraph [source]#
Prune those parts of the input graph that aren’t reachable from the roots.
Build a table of exploded calculations that includes only those nodes that are part of the input graph, and that are reachable from the roots of the calculation forest. Then use that set of exploded calculations to construct a new graph.
This is complicated by the fact that some nodes may have already been pruned from the input graph, and so when selecting both parent and child nodes from the calculations, we need to make sure that they are present in the input graph, as well as the complete set of calculation components.
 seeded_digraph() networkx.DiGraph [source]#
A digraph of all calculations that contribute to the seed values.
Prune the full digraph to contain only those nodes in the full_digraph() that are descendants of the seed nodes – i.e. that are reachable along the directed edges, and thus contribute to the values reported to the XBRL facts associated with the seed nodes.
We compile a list of all the NodeId values that should be included in the pruned graph, and then use that list to select a subset of the exploded metadata to pass to exploded_meta_to_digraph(), so that all of the associated metadata is also added to the pruned graph.
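Selecting the seeds and everything reachable from them is a standard graph traversal. A breadth-first sketch over a plain adjacency dictionary is shown below; the function name and sample graph are hypothetical, and with networkx the same node set could presumably be gathered from each seed's descendants.

```python
from collections import deque


def seeded_nodes(children, seeds):
    """Return the seeds plus every node reachable from them along directed edges."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# "x" and "y" do not contribute to the seed "a" and are pruned.
children = {"a": ["b", "c"], "b": ["d"], "x": ["y"]}
kept = seeded_nodes(children, ["a"])
```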
 forest() networkx.DiGraph [source]#
A pruned version of the seeded digraph that should be one or more trees.
This method contains any special logic that’s required to convert the seeded_digraph() into a collection of trees. The main issue we currently have to deal with is pass-through calculations that we’ve added to avoid having duplicated calculations in the graph.
In practice this method will probably return a single tree rather than a forest, but a forest with several root nodes might also be appropriate, since the root table may or may not have a top level summary value that includes all underlying calculated values of interest.
 static roots(graph: networkx.DiGraph) list[NodeId] [source]#
Identify all root nodes in a digraph.
 full_digraph_roots() list[NodeId] [source]#
Find all roots in the full digraph described by the exploded metadata.
 static leaves(graph: networkx.DiGraph) list[NodeId] [source]#
Identify all leaf nodes in a digraph.
 orphans() list[NodeId] [source]#
Identify all nodes that appear in metadata but not in the full digraph.
 pruned() list[NodeId] [source]#
List of all nodes that appear in the DAG but not in the pruned forest.
 stepchildren(graph: networkx.DiGraph) list[NodeId] [source]#
Find all nodes in the graph that have more than one parent.
 stepparents(graph: networkx.DiGraph) list[NodeId] [source]#
Find all nodes in the graph with children having more than one parent.
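The node classifications above (roots, leaves, stepchildren, stepparents) reduce to in-degree and out-degree checks. The following sketch uses an adjacency dictionary rather than networkx.DiGraph; the helper names mirror the methods but the implementations are illustrative assumptions.

```python
def _in_degree(children):
    """Count incoming edges for every node in the adjacency dictionary."""
    degree = {node: 0 for node in children}
    for kids in children.values():
        for kid in kids:
            degree[kid] = degree.get(kid, 0) + 1
    return degree


def roots(children):
    """Nodes with no parents."""
    return sorted(n for n, d in _in_degree(children).items() if d == 0)


def leaves(children):
    """Nodes with no children."""
    return sorted(n for n in _in_degree(children) if not children.get(n))


def stepchildren(children):
    """Nodes with more than one parent."""
    return sorted(n for n, d in _in_degree(children).items() if d > 1)


def stepparents(children):
    """Nodes with at least one child that has more than one parent."""
    steps = set(stepchildren(children))
    return sorted(p for p, kids in children.items() if steps & set(kids))


# "b" is shared by two parents, so this graph is not a forest.
graph = {"r1": ["a", "b"], "r2": ["b"], "a": ["c"], "b": [], "c": []}
```

A graph with any stepchildren is not a forest, which is why these helpers are useful when pruning.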
 leafy_meta() pandas.DataFrame [source]#
Identify leaf facts and compile their metadata.
Identify the root and leaf nodes of those minimal trees.
Adjust the weights associated with the leaf nodes to equal the product of the weights of all their ancestors.
Set leaf node tags to be the union of all the tags associated with all of their ancestors.
Leafy metadata in the output dataframe includes:
The ID of the leaf node itself (this is the index).
The ID of the root node the leaf is descended from.
What tags the leaf has inherited from its ancestors.
The leaf node’s xbrl_factoid_original
The weight associated with the leaf, in relation to its root.
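The propagation of weights and tags down to the leaves can be sketched as a recursive walk over one tree. In this hypothetical version, weights live on edges, tags are dictionaries merged from root to leaf (so a deeper tag overrides a shallower one with the same name), and the node names are invented; the real method works on the annotated forest and returns a dataframe.

```python
def leafy_meta(children, weights, tags, root):
    """Map each leaf under root to its root, cumulative weight, and inherited tags.

    children: node -> list of child nodes.
    weights: (parent, child) -> edge weight.
    tags: node -> dict of tag name to tag value.
    """
    out = {}

    def walk(node, weight, inherited):
        merged = {**inherited, **tags.get(node, {})}
        kids = children.get(node, [])
        if not kids:
            out[node] = {"root": root, "weight": weight, "tags": merged}
        for kid in kids:
            # The cumulative weight is the product of edge weights along the path.
            walk(kid, weight * weights[(node, kid)], merged)

    walk(root, 1.0, {})
    return out


children = {"net_income": ["revenue", "expenses"], "expenses": ["fuel", "labor"]}
weights = {
    ("net_income", "revenue"): 1.0,
    ("net_income", "expenses"): -1.0,
    ("expenses", "fuel"): 1.0,
    ("expenses", "labor"): 1.0,
}
tags = {"expenses": {"category": "opex"}}

meta = leafy_meta(children, weights, tags, "net_income")
```

With these inputs, fuel and labor inherit both the negative sign and the tag from expenses, so the root value can be rebuilt directly from leaf values.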
 root_calculations() pandas.DataFrame [source]#
Produce a calculation components dataframe containing only roots and leaves.
This dataframe has a format similar to exploded_calcs and can be used with the exploded data to verify that the root values can still be correctly calculated from the leaf values.
 plot_graph(graph: networkx.DiGraph) None [source]#
Visualize a CalculationForest graph.
 plot_nodes(nodes: list[NodeId]) None [source]#
Plot a list of nodes based on edges found in exploded_calcs.
 plot(graph: Literal['full_digraph', 'seeded_digraph', 'forest']) None [source]#
Visualize various stages of the calculation forest.
 leafy_data(exploded_data: pandas.DataFrame, value_col: str) pandas.DataFrame [source]#
Use the calculation forest to prune the exploded dataframe.
Drop all rows that don’t correspond to either root or leaf facts.
Verify that the reported root values can still be generated by calculations that only refer to leaf values. (not yet implemented)
Merge the leafy metadata onto the exploded data, keeping only those rows which refer to the leaf facts.
Use the leaf weights to adjust the reported data values.
TODO: This method could either live here, or in the Exploder class, which would also have access to exploded_meta, exploded_data, and the calculation forest.
There are a handful of NA values for report_year and utility_id_ferc1 because of missing correction records in data. Why are those correction records missing? Should we be doing an inner merge instead of a left merge?
Still need to validate the root node calculations.
 forest_as_table() pandas.DataFrame [source]#
Construct a tabular representation of the calculation forest.
Each generation of nodes, starting with the root(s) of the calculation forest, make up a set of columns in the table. Each set of columns is merged onto
 _add_layers_to_forest_as_table(df: pandas.DataFrame) pandas.DataFrame [source]#
Recursively add additional layers of nodes from the forest to the table.
Given a dataframe with one or more sets of columns with names corresponding to the components of a NodeId with suffixes of the form _layerN, identify the children of the nodes in the set of columns with the largest N, and merge them onto the table, recursively until there are no more children to add. This creates a tabular representation of the calculation forest that can be inspected in Excel.
Include inter-layer calculation weights and tags associated with the nodes pre-propagation.
 pudl.output.ferc1.nodes_to_df(calc_forest: networkx.DiGraph, nodes: list[NodeId]) pandas.DataFrame [source]#
Construct a dataframe from a list of nodes, including their annotations.
NodeIds that are not present in the calculation forest will be ignored.
 Parameters:
calc_forest – A calculation forest made of nodes with “weight” and “tags” data.
nodes – List of NodeId values to extract from the calculation forest.
 Returns:
A tabular dataframe representation of the nodes, including their tags, extracted from the calculation forest.