pudl.output.pudltabl#

This module provides a class enabling tabular compilations from the PUDL DB.

Many of our potential users are comfortable using spreadsheets, not databases, so we are creating a collection of tabular outputs that contain the most useful core information from the PUDL data packages, including additional keys and human readable names for the objects (utilities, plants, generators) being described in the table.

These tabular outputs can be joined with each other using those keys, and used as a data source within Microsoft Excel, Access, R Studio, or other data analysis packages that folks may be familiar with. They aren’t meant to completely replicate all the data and relationships contained within the full PUDL database, but should serve as a generally usable set of PUDL data products.

The PudlTabl class can also provide access to complex derived values, like the generator and plant level marginal cost of electricity (MCOE), which are defined in the analysis module.

In the long run, this is a probably a kind of prototype for pre-packaged API outputs or data products that we might want to be able to provide to users a la carte.

Module Contents#

Classes#

PudlTabl

A class for compiling common useful tabular outputs from the PUDL DB.

Attributes#

pudl.output.pudltabl.logger[source]#
class pudl.output.pudltabl.PudlTabl(pudl_engine: sqlalchemy.Engine, freq: Literal[YS, MS, None] = None, start_date: str | datetime.date | datetime.datetime | pandas.Timestamp = None, end_date: str | datetime.date | datetime.datetime | pandas.Timestamp = None, fill_fuel_cost: bool = False, roll_fuel_cost: bool = False, fill_net_gen: bool = False, fill_tech_desc: bool = True, unit_ids: bool = False)[source]#

A class for compiling common useful tabular outputs from the PUDL DB.

_register_output_methods()[source]#

Load output assets and register a class method for retrieving each one.

_get_table_from_db(table_name: str, allowed_freqs: list[str | None] = [None, 'YS', 'MS'], update: bool = False) pandas.DataFrame[source]#

Grab output table from PUDL DB.

Parameters:
  • table_name – Name of table to get.

  • allowed_freqs – List of allowed aggregation frequencies for table.

  • update – Ignored. Retained for backwards compatibility only.

_agg_table_name(table_name: str) str[source]#

Substitute appropriate frequency in aggregated table names.

If the table isn’t aggregated, return the original name.

_select_between_dates(table: str) sqlalchemy.sql.expression.Select[source]#

For a given table, returns an SQL query that filters by date, if specified.

Method uses the PudlTabl start_date and end_date attributes. For EIA and most other tables, it compares report_date column against start and end dates. For FERC1 report_year is used. If neither report_date nor report_year are present, no date filtering is done.

Parameters:

table – name of table to be called in SQL query.

Returns:

A SQLAlchemy select object restricting the date column (either report_date or report_year) to lie between self.start_date and self.end_date (inclusive).

gen_eia923(update: bool = False) pandas.DataFrame[source]#

Pull EIA 923 net generation data by generator.

Net generation is reported in two seperate tables in EIA 923: in the core_eia923__monthly_generation and core_eia923__monthly_generation_fuel tables. While the core_eia923__monthly_generation_fuel table is more complete (the core_eia923__monthly_generation table includes only ~55% of the reported MWhs), the core_eia923__monthly_generation table is more granular (it is reported at the generator level).

This method either grabs the core_eia923__monthly_generation table that is reported by generator, or allocates net generation from the core_eia923__monthly_generation_fuel table to the generator level.

Parameters:

update – Ignored. Retained for backwards compatibility only.

Returns:

A denormalized generation table for interactive use.

epacamd_eia() pandas.DataFrame[source]#

Read the EPACAMD-EIA Crosswalk from the PUDL DB.