pudl.output.pudltabl
#
This module provides a class enabling tabular compilations from the PUDL DB.
Many of our potential users are comfortable using spreadsheets, not databases, so we are creating a collection of tabular outputs that contain the most useful core information from the PUDL data packages, including additional keys and human readable names for the objects (utilities, plants, generators) being described in the table.
These tabular outputs can be joined with each other using those keys, and used as a data source within Microsoft Excel, Access, R Studio, or other data analysis packages that folks may be familiar with. They aren’t meant to completely replicate all the data and relationships contained within the full PUDL database, but should serve as a generally usable set of PUDL data products.
The PudlTabl class can also provide access to complex derived values, like the generator and plant level marginal cost of electricity (MCOE), which are defined in the analysis module.
In the long run, this is a probably a kind of prototype for pre-packaged API outputs or data products that we might want to be able to provide to users a la carte.
Module Contents#
Classes#
A class for compiling common useful tabular outputs from the PUDL DB. |
Attributes#
- class pudl.output.pudltabl.PudlTabl(pudl_engine: sqlalchemy.engine.Engine, freq: Literal[AS, MS, None] = None, start_date: str | datetime.date | datetime.datetime | pandas.Timestamp = None, end_date: str | datetime.date | datetime.datetime | pandas.Timestamp = None, fill_fuel_cost: bool = False, roll_fuel_cost: bool = False, fill_net_gen: bool = False, fill_tech_desc: bool = True, unit_ids: bool = False)[source]#
A class for compiling common useful tabular outputs from the PUDL DB.
- _register_output_methods()[source]#
Load output assets and register a class method for retrieving each one.
- _get_table_from_db(table_name: str, allowed_freqs: list[str | None] = [None, 'AS', 'MS'], update: bool = False) pandas.DataFrame [source]#
Grab output table from PUDL DB.
- Parameters:
table_name – Name of table to get.
allowed_freqs – List of allowed aggregation frequencies for table.
update – Ignored. Retained for backwards compatibility only.
- _agg_table_name(table_name: str) str [source]#
Substitute appropriate frequency in aggregated table names.
If the table isn’t aggregated, return the original name.
- _select_between_dates(table: str) sqlalchemy.sql.expression.Select [source]#
For a given table, returns an SQL query that filters by date, if specified.
Method uses the PudlTabl
start_date
andend_date
attributes. For EIA and most other tables, it comparesreport_date
column against start and end dates. For FERC1report_year
is used. If neitherreport_date
norreport_year
are present, no date filtering is done.- Parameters:
table – name of table to be called in SQL query.
- Returns:
A SQLAlchemy select object restricting the date column (either
report_date
orreport_year
) to lie betweenself.start_date
andself.end_date
(inclusive).
- gen_eia923(update: bool = False) pandas.DataFrame [source]#
Pull EIA 923 net generation data by generator.
Net generation is reported in two seperate tables in EIA 923: in the generation_eia923 and generation_fuel_eia923 tables. While the generation_fuel_eia923 table is more complete (the generation_eia923 table includes only ~55% of the reported MWhs), the generation_eia923 table is more granular (it is reported at the generator level).
This method either grabs the generation_eia923 table that is reported by generator, or allocates net generation from the generation_fuel_eia923 table to the generator level.
- Parameters:
update – Ignored. Retained for backwards compatibility only.
- Returns:
A denormalized generation table for interactive use.
- ferc1_eia(update: bool = False, update_plant_parts_eia: bool = False, update_plants_all_ferc1: bool = False, update_fbp_ferc1: bool = False) pandas.DataFrame [source]#
Generate the connection between FERC1 and EIA.
- epacamd_eia() pandas.DataFrame [source]#
Read the EPACAMD-EIA Crosswalk from the PUDL DB.