pudl.transform.ferc
Module for shared helpers for FERC Form transforms.
Attributes
Functions
- __apply_diffs: Take the latest reported non-null value for each group.
- __best_snapshot: Take the row that has the most non-null values out of each group.
- __compare_dedupe_methodologies: Compare deduplication methodologies.
- filter_for_freshest_data_xbrl: Get the most up-to-date values for each XBRL context.
- Get the primary key for a raw XBRL table from the XBRL datapackage.
Module Contents
- pudl.transform.ferc.__apply_diffs(duped_groups: pandas.core.groupby.DataFrameGroupBy) → pandas.DataFrame
Take the latest reported non-null value for each group.
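As a rough illustration of the "latest non-null value per group" idea, the sketch below uses toy data and pandas' `GroupBy.last()`, which skips nulls by default. The column names (`entity_id`, `report_year`, `publication_time`, `plant_cost`, `plant_capacity_mw`) and the helper `apply_diffs_sketch` are hypothetical, and this is not necessarily how the module implements it.

```python
import pandas as pd

# Hypothetical toy data: entity C001 appears in two filings for the same context.
filings = pd.DataFrame(
    {
        "entity_id": ["C001", "C001", "C002"],
        "report_year": [2021, 2021, 2021],
        "publication_time": pd.to_datetime(
            ["2022-04-01", "2022-09-15", "2022-04-10"]
        ),
        "plant_cost": [100.0, None, 50.0],
        "plant_capacity_mw": [None, 12.5, 7.0],
    }
)
context_cols = ["entity_id", "report_year"]


def apply_diffs_sketch(df, context_cols, time_col="publication_time"):
    """Keep, per column, the most recently reported non-null value."""
    # Sort so the newest filing is last within each group; GroupBy.last()
    # then returns the last non-null entry of every column.
    return (
        df.sort_values(time_col)
        .groupby(context_cols, as_index=False)
        .last()
    )


latest = apply_diffs_sketch(filings, context_cols)
```

In this toy case the C001 row combines `plant_cost` from the earlier filing with `plant_capacity_mw` from the later one, so a single output row can mix values from several filings.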
- pudl.transform.ferc.__best_snapshot(duped_groups: pandas.core.groupby.DataFrameGroupBy) → pandas.DataFrame
Take the row that has the most non-null values out of each group.
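A minimal sketch of the "fullest row per group" idea follows, assuming hypothetical column names and a hypothetical helper `best_snapshot_sketch`. It counts non-null cells per row and keeps the row with the highest count in each group; ties resolve to the first such row here, which may differ from the module's behavior.

```python
import pandas as pd

# Hypothetical toy data: entity C001 appears in two filings for the same context.
filings = pd.DataFrame(
    {
        "entity_id": ["C001", "C001", "C002"],
        "report_year": [2021, 2021, 2021],
        "plant_cost": [100.0, 110.0, 50.0],
        "plant_capacity_mw": [None, 12.5, 7.0],
        "plant_name": ["Alpha", "Alpha", None],
    }
)
context_cols = ["entity_id", "report_year"]


def best_snapshot_sketch(df, context_cols):
    """Keep, per group, the single row with the most non-null cells."""
    # Number of non-null cells in each row.
    non_null = df.notna().sum(axis="columns")
    # Index label of the fullest row within each context group.
    best_idx = non_null.groupby([df[col] for col in context_cols]).idxmax()
    return df.loc[best_idx.to_numpy()].reset_index(drop=True)


best = best_snapshot_sketch(filings, context_cols)
```

Because a whole row is kept, every value in an output row comes from the same filing, in contrast to the column-by-column approach of taking the latest non-null value.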
- pudl.transform.ferc.__compare_dedupe_methodologies(applied_diffs: pandas.DataFrame, best_snapshot: pandas.DataFrame, xbrl_context_cols: list[str])
Compare deduplication methodologies.
Cross-referencing the results of the two methodologies lets us check that the apply-diff methodology isn’t doing something unexpected.
The main things we want to keep tabs on are whether apply-diff introduces more differences relative to best-snapshot than expected, and whether apply-diff gives us more values than best-snapshot.
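The two checks described above could be sketched as follows. This is an illustrative assumption about what such a comparison might count, not the module's actual logic; `compare_dedupe_sketch`, the returned keys, and the toy columns are all hypothetical.

```python
import pandas as pd


def compare_dedupe_sketch(applied_diffs, best_snapshot, xbrl_context_cols):
    """Count where apply-diff disagrees with, or adds to, best-snapshot."""
    a = applied_diffs.set_index(xbrl_context_cols).sort_index()
    b = best_snapshot.set_index(xbrl_context_cols)
    # Align best-snapshot to the same rows and columns as apply-diff.
    b = b.reindex(index=a.index, columns=a.columns)
    both_present = a.notna() & b.notna()
    return {
        # Cells where both methods report a value but the values differ.
        "differing_values": int((both_present & (a != b)).sum().sum()),
        # Cells where only apply-diff reports a value.
        "extra_values": int((a.notna() & b.isna()).sum().sum()),
        "total_apply_diff_values": int(a.notna().sum().sum()),
    }


# Hypothetical outputs of the two deduplication methods.
applied = pd.DataFrame(
    {
        "entity_id": ["C001", "C002"],
        "report_year": [2021, 2021],
        "plant_cost": [100.0, 50.0],
        "plant_capacity_mw": [12.5, 7.0],
    }
)
snapshot = pd.DataFrame(
    {
        "entity_id": ["C001", "C002"],
        "report_year": [2021, 2021],
        "plant_cost": [110.0, 50.0],
        "plant_capacity_mw": [12.5, None],
    }
)

summary = compare_dedupe_sketch(applied, snapshot, ["entity_id", "report_year"])
```

Dividing either count by `total_apply_diff_values` gives a rate that can be compared against whatever threshold is considered acceptable.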
- pudl.transform.ferc.filter_for_freshest_data_xbrl(xbrl_table: pandas.DataFrame, primary_keys, compare_methods: bool = False) → pandas.DataFrame
Get the most up-to-date values for each XBRL context.
An XBRL context includes an entity ID, the time period the data applies to, and other dimensions such as utility type. Each context has its own ID, but contexts are frequently redefined with the same contents under different IDs, so we identify them by their actual content.
Each row in our SQLite database includes all the facts for one context/filing pair.
If one context is represented in multiple filings, we take the most recently reported non-null value.
This means that if a utility reports a non-null value, then later either reports a null value for it or simply omits it from the report, we keep the old non-null value, which may be erroneous. This appears to be fairly rare, affecting < 0.005% of reported values.
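The sketch below illustrates two points from this description under stated assumptions: contexts are grouped by their content (the `primary_keys` columns) rather than by context ID, and a value that a later filing nulls out is still kept. The column names, the `publication_time` ordering column, and the staleness measure are hypothetical and only meant to show how such a share could be estimated.

```python
import pandas as pd

# Hypothetical rows: the electric context is filed twice under different
# context IDs, and the later filing leaves ending_balance null.
raw = pd.DataFrame(
    {
        "context_id": ["c-1", "c-77", "c-2"],
        "entity_id": ["C001", "C001", "C001"],
        "date": ["2021-12-31"] * 3,
        "utility_type": ["electric", "electric", "gas"],
        "publication_time": pd.to_datetime(
            ["2022-04-01", "2022-09-15", "2022-04-10"]
        ),
        "ending_balance": [100.0, None, 30.0],
    }
)
# Identify a context by its content, not by context_id.
primary_keys = ["entity_id", "date", "utility_type"]

ordered = raw.sort_values("publication_time")

# Most recently reported non-null value per column and context.
freshest = ordered.groupby(primary_keys).last()

# The most recent filing's row as reported, nulls included.
latest_row = (
    ordered.drop_duplicates(subset=primary_keys, keep="last")
    .set_index(primary_keys)
    .reindex(freshest.index)[freshest.columns]
)

# Kept values that the most recent filing reported as null or omitted.
stale = freshest.notna() & latest_row.isna()
stale_share = stale.sum().sum() / freshest.notna().sum().sum()
```

Here the electric `ending_balance` of 100.0 survives even though the later filing reported it as null, which is exactly the case the caveat above describes.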