pudl.extract.eia_bulk_elec
Module to extract aggregate data from the EIA bulk electricity download.
EIA’s bulk electricity data contains 680,000 objects, most of which are timeseries. These timeseries contain a variety of measures (fuel amount and cost are just two) across multiple levels of aggregation from individual plants to national averages.
The data is formatted as a single 1.1 GB text file of line-delimited JSON with one line per object. Each JSON structure has two nested levels: the top level contains metadata describing the series, and the second level (under the “data” heading) contains an array of timestamp/value pairs. This structure leads to a natural normalization into two tables: one of metadata and one of timeseries. This module delivers the data in that two-table format.
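The two-table normalization described above can be sketched roughly as follows. This is a minimal illustration, not the module's implementation: the records, the series IDs, the helper name `normalize`, and the output column names `date` and `value` are all hypothetical.

```python
import io
import json

import pandas as pd

# Two hypothetical records in the line-delimited layout described above:
# top-level metadata plus a "data" array of timestamp/value pairs.
RAW_LINES = "\n".join(
    json.dumps(record)
    for record in [
        {
            "series_id": "EXAMPLE.COST_BTU.A",
            "name": "Example cost series",
            "units": "dollars per million Btu",
            "data": [["202002", 2.5], ["202001", 3.0]],
        },
        {
            "series_id": "EXAMPLE.RECEIPTS_BTU.A",
            "name": "Example receipts series",
            "units": "billion Btu",
            "data": [["202002", 10.0], ["202001", 12.0]],
        },
    ]
)


def normalize(lines):
    """Split line-delimited JSON objects into metadata and timeseries tables."""
    metadata_rows = []
    timeseries_rows = []
    for line in lines:
        record = json.loads(line)
        points = record.pop("data")  # everything left over is metadata
        metadata_rows.append(record)
        for timestamp, value in points:
            timeseries_rows.append(
                {"series_id": record["series_id"], "date": timestamp, "value": value}
            )
    return {
        "metadata": pd.DataFrame(metadata_rows),
        "timeseries": pd.DataFrame(timeseries_rows),
    }


tables = normalize(io.StringIO(RAW_LINES))
```

In this sketch the `series_id` column acts as the key linking each timeseries row back to its metadata row.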
Functions
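| _filter_for_fuel_receipts_costs_series(df) | Pick out the desired data series. |
| _filter_and_read_to_dataframe(raw_zipfile) | Decompress and filter the 1100 MB file down to the 16 MB we actually want. |
| _parse_data_column(elec_df) | |
| _extract(raw_zipfile) | Extract metadata and timeseries from raw EIA bulk electricity data. |
| extract(ds) | Extract metadata and timeseries from raw EIA bulk electricity data. |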
Module Contents
- pudl.extract.eia_bulk_elec._filter_for_fuel_receipts_costs_series(df: pandas.DataFrame) → pandas.DataFrame
Pick out the desired data series.
Fuel receipts and costs are about 1% of the total lines. This function filters for series that contain the name “RECEIPTS_BTU” or “COST_BTU” in their series_id.
Of the approximately 680,000 objects in the dataset, about 19,000 represent things other than data series (such as category definitions or plot axes). Those non-series objects do not have a field called series_id. The except KeyError: clause handles that situation.
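The filtering rule above might look something like the sketch below. It assumes the input is a dataframe of parsed objects; the function name `filter_series`, the sample rows, and the series IDs are hypothetical, and the real function may differ in detail.

```python
import pandas as pd


def filter_series(df: pd.DataFrame) -> pd.DataFrame:
    """Keep rows whose series_id mentions RECEIPTS_BTU or COST_BTU."""
    try:
        series_ids = df["series_id"]
    except KeyError:
        # No object in this frame is a data series, so nothing is kept.
        return df.iloc[0:0]
    # na=False drops non-series rows, whose series_id is missing.
    mask = series_ids.str.contains("RECEIPTS_BTU|COST_BTU", regex=True, na=False)
    return df.loc[mask]


# Hypothetical sample: two wanted series, one other series, one category.
chunk = pd.DataFrame(
    [
        {"series_id": "EXAMPLE.RECEIPTS_BTU.A", "name": "a"},
        {"series_id": "EXAMPLE.GENERATION.A", "name": "b"},
        {"category_id": "0", "name": "c"},
        {"series_id": "EXAMPLE.COST_BTU.A", "name": "d"},
    ]
)
kept = filter_series(chunk)

# A frame made up entirely of non-series objects has no series_id column.
only_categories = pd.DataFrame([{"category_id": "1", "name": "e"}])
kept_none = filter_series(only_categories)
```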
- pudl.extract.eia_bulk_elec._filter_and_read_to_dataframe(raw_zipfile: pathlib.Path) → pandas.DataFrame
Decompress and filter the 1100 MB file down to the 16 MB we actually want.
This produces a dataframe with all text fields. The timeseries data is left as JSON strings in the ‘data’ column. The other columns are metadata.
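One way to get this kind of result without loading the whole file is to stream the archive line by line. The sketch below uses the standard-library `zipfile` and `json` modules rather than whatever the module itself does. It assumes the archive holds a single text member; the helper name `filter_zip_to_dataframe`, the member name, and the sample records are hypothetical.

```python
import io
import json
import zipfile

import pandas as pd

KEYWORDS = ("RECEIPTS_BTU", "COST_BTU")


def filter_zip_to_dataframe(raw_zipfile) -> pd.DataFrame:
    """Stream a zipped line-delimited JSON file, keeping only wanted series."""
    rows = []
    with zipfile.ZipFile(raw_zipfile) as archive:
        member = archive.namelist()[0]  # assumption: one text file per archive
        with archive.open(member) as binary:
            for line in io.TextIOWrapper(binary, encoding="utf-8"):
                if not line.strip():
                    continue  # tolerate blank lines
                record = json.loads(line)
                series_id = record.get("series_id", "")
                if not any(keyword in series_id for keyword in KEYWORDS):
                    continue
                # Leave the timeseries as a JSON string; store all fields as text.
                record["data"] = json.dumps(record["data"])
                rows.append({key: str(value) for key, value in record.items()})
    return pd.DataFrame(rows)


# Build a tiny in-memory archive to exercise the function.
records = [
    {"series_id": "EXAMPLE.COST_BTU.A", "name": "kept", "data": [["202001", 1.5]]},
    {"series_id": "EXAMPLE.GENERATION.A", "name": "dropped", "data": [["202001", 9.0]]},
    {"category_id": "0", "name": "not a series"},
]
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example.txt", "\n".join(json.dumps(r) for r in records))
buffer.seek(0)

filtered = filter_zip_to_dataframe(buffer)
```

Filtering before building the dataframe keeps memory use proportional to the retained rows rather than to the full file.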
- pudl.extract.eia_bulk_elec._parse_data_column(elec_df: pandas.DataFrame) → pandas.DataFrame
- pudl.extract.eia_bulk_elec._extract(raw_zipfile) → dict[str, pandas.DataFrame]
Extract metadata and timeseries from raw EIA bulk electricity data.
- Parameters:
raw_zipfile – Path or other file-like object that can be read by pd.read_json()
- Returns:
Dictionary of dataframes with keys ‘metadata’ and ‘timeseries’
- pudl.extract.eia_bulk_elec.extract(ds: pudl.workspace.datastore.Datastore) → dict[str, pandas.DataFrame]
Extract metadata and timeseries from raw EIA bulk electricity data.
- Parameters:
ds – Datastore object
- Returns:
Dictionary of dataframes with keys ‘metadata’ and ‘timeseries’