pudl.transform.eia_bulk_elec
Clean and normalize EIA bulk electricity data.
EIA’s bulk electricity data contains 680,000 timeseries. These timeseries contain a variety of measures (fuel amount and cost are just two) across multiple levels of aggregation, from individual plants to national averages.
The data is formatted as a single 1.1 GB text file of line-delimited JSON, with one line per timeseries. Each JSON structure has two nested levels: the top level contains metadata describing the series, and the second level (under the “data” key) contains an array of timestamp/value pairs. This structure leads to a natural normalization into two tables: one of metadata and one of timeseries. That is the format delivered by the extract module.
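A minimal sketch of that normalization, using only json and pandas and assuming each line carries a “series_id” field plus a “data” array of [timestamp, value] pairs as described above. The two sample lines, their IDs and values, and the split_series helper are hypothetical illustrations, not the extract module's code.

```python
import json

import pandas as pd

# Two hypothetical lines in the bulk-file layout: top-level metadata plus a
# "data" array of [timestamp, value] pairs. IDs and values are illustrative.
RAW_LINES = [
    '{"series_id": "ELEC.RECEIPTS_BTU.COW-AK-99.A", "name": "example a", '
    '"data": [["2021", 1.5], ["2020", 2.0]]}',
    '{"series_id": "ELEC.COST_BTU.COW-AK-99.A", "name": "example b", '
    '"data": [["2021", 3.25]]}',
]


def split_series(lines):
    """Normalize line-delimited JSON into (metadata, timeseries) tables."""
    meta_rows, ts_rows = [], []
    for line in lines:
        record = json.loads(line)
        # Pop the nested array so everything left is series-level metadata.
        points = record.pop("data", [])
        meta_rows.append(record)
        # One row per timestamp/value pair, linked back to its series_id.
        ts_rows.extend(
            {"series_id": record["series_id"], "date": date, "value": value}
            for date, value in points
        )
    return pd.DataFrame(meta_rows), pd.DataFrame(ts_rows)


metadata, timeseries = split_series(RAW_LINES)
```

The series_id column is the only link between the two tables, which is why parsing it into a readable key is the central transform step.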
The transform module parses a compound primary key out of the long string IDs (“series_id”). The remaining metadata is of little value, so it is neither transformed nor returned.
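A sketch of the key parsing, assuming series IDs shaped like ELEC.<metric>.<fuel>-<region>-<sector>.<frequency> (for example ELEC.RECEIPTS_BTU.COW-AK-99.A). That layout, the regular expression, and the extract_keys helper are assumptions for illustration; _extract_keys_from_series_id may parse the IDs differently.

```python
import pandas as pd

# Assumed layout: ELEC.<metric>.<fuel>-<region>-<sector>.<frequency>
SERIES_ID_PATTERN = (
    r"^ELEC\.(?P<metric>[^.]+)\.(?P<fuel>[^-]+)-(?P<region>[^-]+)-"
    r"(?P<sector>[^.]+)\.(?P<frequency>[^.]+)$"
)


def extract_keys(series_ids: pd.Series) -> pd.DataFrame:
    """Split each series_id into five key codes (hypothetical helper)."""
    # str.extract with named groups returns one column per group;
    # IDs that do not match the pattern yield rows of NaN.
    return series_ids.str.extract(SERIES_ID_PATTERN)


ids = pd.Series(["ELEC.RECEIPTS_BTU.COW-AK-99.A", "ELEC.COST_BTU.BIT-US-1.M"])
keys = extract_keys(ids)
```

Returning NaN for unmatched IDs, rather than raising, makes malformed IDs easy to find afterwards with keys.isna().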
The EIA aggregates are related to their component categories via a set of association tables defined in pudl.metadata.dfs. For example, the “all_coal” fuel aggregate is linked to all the coal-related energy_source_code values: BIT, SUB, LIG, and WC. Similar relationships are defined for aggregates over fuel, sector, geography, and time.
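To illustrate how an association table relates an aggregate to its components, the sketch below rolls hypothetical component-level receipts up to the “all_coal” aggregate with a pandas merge and groupby. The table layout, column names, and numbers are assumptions; the real tables live in pudl.metadata.dfs and may be structured differently.

```python
import pandas as pd

# Illustrative association table linking one fuel aggregate to its component
# codes, mirroring the "all_coal" example. Column names are assumptions.
fuel_assn = pd.DataFrame(
    {
        "fuel_agg": ["all_coal"] * 4,
        "energy_source_code": ["BIT", "SUB", "LIG", "WC"],
    }
)

# Hypothetical component-level receipts (made-up numbers). "NG" has no link
# to "all_coal", so the inner merge below drops it.
receipts = pd.DataFrame(
    {
        "energy_source_code": ["BIT", "SUB", "NG"],
        "fuel_received_mmbtu": [10.0, 5.0, 7.0],
    }
)

# Attach each component row to every aggregate that includes it, then roll up.
rolled_up = (
    receipts.merge(fuel_assn, on="energy_source_code", how="inner")
    .groupby("fuel_agg", as_index=False)["fuel_received_mmbtu"]
    .sum()
)
```

Because one code can belong to several aggregates, the merge may duplicate component rows; the groupby then sums each aggregate independently.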
Functions
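| Function | Description |
|---|---|
| _extract_keys_from_series_id(raw_df) | Parse primary key codes from EIA series_id. |
| _map_key_codes_to_readable_values(compound_keys) | |
| _transform_timeseries(raw_ts) | Transform raw timeseries. |
| transform(raw_dfs) | Transform raw EIA bulk electricity aggregates. |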
Module Contents
- pudl.transform.eia_bulk_elec._extract_keys_from_series_id(raw_df: pandas.DataFrame) → pandas.DataFrame
Parse primary key codes from EIA series_id.
These codes comprise the compound primary key that uniquely identifies a data series: (metric, fuel, region, sector, frequency).
- pudl.transform.eia_bulk_elec._map_key_codes_to_readable_values(compound_keys: pandas.DataFrame) → pandas.DataFrame
- pudl.transform.eia_bulk_elec._transform_timeseries(raw_ts: pandas.DataFrame) → pandas.DataFrame
Transform raw timeseries.
Transform to tidy format and replace the obscure series_id with a readable compound primary key.
- Returns:
A dataframe with compound key (“fuel_agg”, “geo_agg”, “sector_agg”, “temporal_agg”, “report_date”) and two value columns: “fuel_received_mmbtu”, “fuel_cost_per_mmbtu”
- pudl.transform.eia_bulk_elec.transform(raw_dfs: dict[str, pandas.DataFrame]) → pandas.DataFrame
Transform raw EIA bulk electricity aggregates.
- Parameters:
raw_dfs – dictionary of raw dataframes delivered by the extract module, including the raw timeseries dataframe
- Returns:
Transformed timeseries dataframe with compound key (“fuel_agg”, “geo_agg”, “sector_agg”, “temporal_agg”, “report_date”) and two value columns: “fuel_received_mmbtu” and “fuel_cost_per_mmbtu”
- Return type:
pandas.DataFrame
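The reshaping to tidy format described for _transform_timeseries can be sketched with DataFrame.pivot, assuming the parsed series arrive in long form with one row per (compound key, metric, date). The metric codes, the METRIC_NAMES mapping, and all key values are hypothetical; they stand in for whatever _map_key_codes_to_readable_values actually produces.

```python
import pandas as pd

# Long-format rows: one observation per (compound key, metric, date).
# Every key value, metric code, and number here is illustrative.
long_ts = pd.DataFrame(
    {
        "fuel_agg": ["all_coal"] * 4,
        "geo_agg": ["AK"] * 4,
        "sector_agg": ["all_sectors"] * 4,
        "temporal_agg": ["annual"] * 4,
        "report_date": pd.to_datetime(
            ["2020-01-01", "2020-01-01", "2021-01-01", "2021-01-01"]
        ),
        "metric": ["RECEIPTS_BTU", "COST_BTU"] * 2,
        "value": [2.0, 1.9, 1.5, 2.1],
    }
)

# Hypothetical mapping from metric codes to the output value-column names.
METRIC_NAMES = {
    "RECEIPTS_BTU": "fuel_received_mmbtu",
    "COST_BTU": "fuel_cost_per_mmbtu",
}

KEY_COLS = ["fuel_agg", "geo_agg", "sector_agg", "temporal_agg", "report_date"]

tidy = (
    long_ts.assign(metric=long_ts["metric"].map(METRIC_NAMES))
    # Spread each metric into its own column, one row per compound key.
    .pivot(index=KEY_COLS, columns="metric", values="value")
    .reset_index()
)
tidy.columns.name = None  # drop the leftover "metric" axis label
```

Using pivot rather than pivot_table is a deliberate choice here: pivot raises a ValueError when a key/metric pair appears twice, so it doubles as a uniqueness check on the compound primary key.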