pudl.metadata.resources
A subpackage to define and organize PUDL database tables by data group.
Submodules
pudl.metadata.resources.allocate_gen_fuel
pudl.metadata.resources.eia
pudl.metadata.resources.eia860
pudl.metadata.resources.eia861
pudl.metadata.resources.eia923
pudl.metadata.resources.eia_bulk_elec
pudl.metadata.resources.epacems
pudl.metadata.resources.ferc1
pudl.metadata.resources.ferc714
pudl.metadata.resources.glue
pudl.metadata.resources.mcoe
pudl.metadata.resources.pudl
Package Contents
Functions
| build_foreign_keys | Build foreign keys for each resource. |
Attributes
| FOREIGN_KEYS | Generated foreign key constraints by resource name. |
| ENTITIES | Columns kept for either entity or annual EIA tables in the harvesting process. |
- pudl.metadata.resources.build_foreign_keys(resources: dict[str, dict], prune: bool = True) -> dict[str, list[dict]]
Build foreign keys for each resource.
A resource's foreign_key_rules (if present) determine which other resources are assigned a foreign key (foreign_keys) pointing to that resource's primary key. The rules have two entries:
- fields (list[list[str]]): Sets of field names for which to create a foreign key. Each set is assumed to match the order of the reference's primary key fields.
- exclude (Optional[list[str]]): Names of resources to exclude.
- Parameters:
resources – Resource descriptors by name.
prune – Whether to prune redundant foreign keys.
- Returns:
Foreign keys for each resource (if any), by resource name. Each foreign key is a dict with these entries:
- fields (list[str]): Field names.
- reference['resource'] (str): Reference resource name.
- reference['fields'] (list[str]): Reference resource field names.
Examples
>>> resources = {
...     'x': {
...         'schema': {
...             'fields': ['z'],
...             'primary_key': ['z'],
...             'foreign_key_rules': {'fields': [['z']]}
...         }
...     },
...     'y': {
...         'schema': {
...             'fields': ['z', 'yy'],
...             'primary_key': ['z', 'yy'],
...             'foreign_key_rules': {'fields': [['z', 'zz']]}
...         }
...     },
...     'z': {'schema': {'fields': ['z', 'zz']}}
... }
>>> keys = build_foreign_keys(resources)
>>> keys['z']
[{'fields': ['z', 'zz'], 'reference': {'resource': 'y', 'fields': ['z', 'yy']}}]
>>> keys['y']
[{'fields': ['z'], 'reference': {'resource': 'x', 'fields': ['z']}}]
>>> keys = build_foreign_keys(resources, prune=False)
>>> keys['z'][0]
{'fields': ['z'], 'reference': {'resource': 'x', 'fields': ['z']}}
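The rule matching described above can be illustrated with a minimal sketch. This is not PUDL's implementation: naive_foreign_keys is a hypothetical helper that assumes a rule applies to every other, non-excluded resource whose schema contains all of the rule's fields, and it performs no pruning (roughly the prune=False case).

```python
from collections import defaultdict


def naive_foreign_keys(resources):
    """Hypothetical, unpruned sketch of rule-based foreign key assignment."""
    keys = defaultdict(list)
    for ref_name, ref in resources.items():
        schema = ref["schema"]
        rules = schema.get("foreign_key_rules")
        if not rules:
            continue  # this resource is never used as a reference
        excluded = set(rules.get("exclude", []))
        for fields in rules["fields"]:
            for name, other in resources.items():
                # Assumption: a resource never references itself.
                if name == ref_name or name in excluded:
                    continue
                # Assumption: the rule applies when all fields are present.
                if set(fields) <= set(other["schema"]["fields"]):
                    keys[name].append({
                        "fields": list(fields),
                        "reference": {
                            "resource": ref_name,
                            "fields": list(schema["primary_key"]),
                        },
                    })
    return dict(keys)


# Same toy resources as in the doctest above.
resources = {
    "x": {"schema": {"fields": ["z"], "primary_key": ["z"],
                     "foreign_key_rules": {"fields": [["z"]]}}},
    "y": {"schema": {"fields": ["z", "yy"], "primary_key": ["z", "yy"],
                     "foreign_key_rules": {"fields": [["z", "zz"]]}}},
    "z": {"schema": {"fields": ["z", "zz"]}},
}
unpruned = naive_foreign_keys(resources)
```

On these toy resources the sketch gives 'z' two candidate keys (one to 'x', one to 'y'), which is the kind of redundancy the prune option exists to remove.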
- pudl.metadata.resources.FOREIGN_KEYS: dict[str, list[dict]]
Generated foreign key constraints by resource name.
- pudl.metadata.resources.ENTITIES: dict[str, dict[str, list[str] | dict[str, str]]]
Columns kept for either entity or annual EIA tables in the harvesting process.
For each entity type (key), the ID columns, static columns, annual columns, and mapped columns.
The order of the entities matters. Plants must be harvested before utilities, since plant location must be removed before the utility locations are harvested.
Mapped columns allow for harvesting the same entity ID / value relationship from multiple columns in the same input dataframe. This is useful if a table has multiple sets of entities that should be harvested, for example owner and operator utilities showing up in the same ownership table records.
map_cols_dict maps from column names of the 'other' group of entity ID / value columns to a column name in one of the id_cols, static_cols, or annual_cols lists. In the harvesting process, these columns are renamed so the relationship can be harvested and added to the normalized entity tables. Note that not all of the columns in map_cols_dict need to be present at once, i.e. if map_cols_dict has keys col_a and col_b, then col_a and col_b don't need to be present in the same table.
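As a rough illustration of the mapped-column idea, the sketch below uses plain dicts in place of dataframes. The entity entry, its column names, and the harvestable_views helper are all hypothetical and are not taken from the real ENTITIES definition or the PUDL harvesting code.

```python
# Hypothetical entity entry; the column names are illustrative only.
example_entity = {
    "id_cols": ["utility_id_eia"],
    "static_cols": ["utility_name_eia"],
    "annual_cols": [],
    "map_cols_dict": {
        "owner_utility_id_eia": "utility_id_eia",
        "owner_utility_name_eia": "utility_name_eia",
    },
}


def harvestable_views(row, entity):
    """Return the direct view of a record plus its renamed 'other' view."""
    wanted = entity["id_cols"] + entity["static_cols"] + entity["annual_cols"]
    # Columns that already carry the canonical entity column names.
    direct = {col: row[col] for col in wanted if col in row}
    # Mapped columns renamed to canonical names; absent ones are skipped,
    # since not every key of map_cols_dict has to appear in every table.
    mapped = {
        new: row[old]
        for old, new in entity["map_cols_dict"].items()
        if old in row
    }
    return [view for view in (direct, mapped) if view]


ownership_row = {
    "utility_id_eia": 1,
    "utility_name_eia": "Operator Co",
    "owner_utility_id_eia": 2,
    "owner_utility_name_eia": "Owner Co",
}
views = harvestable_views(ownership_row, example_entity)
```

Here a single ownership record yields two utility ID / name pairs, one for the operator and one for the owner, each under the same canonical column names.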