pudl.output.pudltabl

This module provides a class enabling tabular compilations from the PUDL DB.

Many of our potential users are comfortable using spreadsheets, not databases, so we are creating a collection of tabular outputs that contain the most useful core information from the PUDL data packages, including additional keys and human readable names for the objects (utilities, plants, generators) being described in the table.

These tabular outputs can be joined with each other using those keys, and used as a data source within Microsoft Excel, Access, R Studio, or other data analysis packages that folks may be familiar with. They aren’t meant to completely replicate all the data and relationships contained within the full PUDL database, but should serve as a generally usable set of PUDL data products.

The PudlTabl class can also provide access to complex derived values, like the generator and plant level marginal cost of electricity (MCOE), which are defined in the analysis module.

In the long run, this is a probably a kind of prototype for pre-packaged API outputs or data products that we might want to be able to provide to users a la carte.

Todo

Return to for update arg and returns values in functions below

Module Contents

Classes

PudlTabl

A class for compiling common useful tabular outputs from the PUDL DB.

Functions

get_table_meta(pudl_engine)

Grab the pudl sqlitie database table metadata.

Attributes

logger

pudl.output.pudltabl.logger[source]
class pudl.output.pudltabl.PudlTabl(pudl_engine: sqlalchemy.engine.Engine, ds: Union[pudl.workspace.datastore.Datastore, None] = None, freq: Literal[AS, MS, None] = None, start_date: Union[str, datetime.date, datetime.datetime, pandas.Timestamp] = None, end_date: Union[str, datetime.date, datetime.datetime, pandas.Timestamp] = None, fill_fuel_cost: bool = False, roll_fuel_cost: bool = False, fill_net_gen: bool = False, backfill_tech_desc: bool = True, unit_ids: bool = False)[source]

Bases: object

A class for compiling common useful tabular outputs from the PUDL DB.

pu_eia860(self, update=False)[source]

Pull a dataframe of EIA plant-utility associations.

Parameters

update (bool) – If true, re-calculate the output dataframe, even if a cached version exists.

Returns

a denormalized table for interactive use.

Return type

pandas.DataFrame

pu_ferc1(self, update=False)[source]

Pull a dataframe of FERC plant-utility associations.

Parameters

update (bool) – If true, re-calculate the output dataframe, even if a cached version exists.

Returns

a denormalized table for interactive use.

Return type

pandas.DataFrame

etl_eia861(self, update: bool = False)[source]

A single function that runs the temporary EIA 861 ETL and sets all DFs.

This is an interim solution that provides a (somewhat) standard way of accessing the EIA 861 data prior to its being fully integrated into the PUDL database. If any of the dataframes is attempted to be accessed, all of them are set. Only the tables that have actual transform functions are included, and as new transform functions are completed, they would need to be added to the list below. Surely there is a way to do this automatically / magically but that’s beyond my knowledge right now.

Parameters

update – Whether to overwrite the existing dataframes if they exist.

advanced_metering_infrastructure_eia861(self, update=False)[source]

An interim EIA 861 output function.

balancing_authority_eia861(self, update=False)[source]

An interim EIA 861 output function.

balancing_authority_assn_eia861(self, update=False)[source]

An interim EIA 861 output function.

demand_response_eia861(self, update=False)[source]

An interim EIA 861 output function.

demand_response_water_heater_eia861(self, update=False)[source]

An interim EIA 861 output function.

demand_side_management_sales_eia861(self, update=False)[source]

An interim EIA 861 output function.

demand_side_management_ee_dr_eia861(self, update=False)[source]

An interim EIA 861 output function.

demand_side_management_misc_eia861(self, update=False)[source]

An interim EIA 861 output function.

distributed_generation_tech_eia861(self, update=False)[source]

An interim EIA 861 output function.

distributed_generation_fuel_eia861(self, update=False)[source]

An interim EIA 861 output function.

distributed_generation_misc_eia861(self, update=False)[source]

An interim EIA 861 output function.

distribution_systems_eia861(self, update=False)[source]

An interim EIA 861 output function.

dynamic_pricing_eia861(self, update=False)[source]

An interim EIA 861 output function.

energy_efficiency_eia861(self, update=False)[source]

An interim EIA 861 output function.

green_pricing_eia861(self, update=False)[source]

An interim EIA 861 output function.

mergers_eia861(self, update=False)[source]

An interim EIA 861 output function.

net_metering_customer_fuel_class_eia861(self, update=False)[source]

An interim EIA 861 output function.

net_metering_misc_eia861(self, update=False)[source]

An interim EIA 861 output function.

non_net_metering_customer_fuel_class_eia861(self, update=False)[source]

An interim EIA 861 output function.

non_net_metering_misc_eia861(self, update=False)[source]

An interim EIA 861 output function.

operational_data_revenue_eia861(self, update=False)[source]

An interim EIA 861 output function.

operational_data_misc_eia861(self, update=False)[source]

An interim EIA 861 output function.

reliability_eia861(self, update=False)[source]

An interim EIA 861 output function.

sales_eia861(self, update=False)[source]

An interim EIA 861 output function.

service_territory_eia861(self, update=False)[source]

An interim EIA 861 output function.

utility_assn_eia861(self, update=False)[source]

An interim EIA 861 output function.

utility_data_nerc_eia861(self, update=False)[source]

An interim EIA 861 output function.

utility_data_rto_eia861(self, update=False)[source]

An interim EIA 861 output function.

utility_data_misc_eia861(self, update=False)[source]

An interim EIA 861 output function.

etl_ferc714(self, update: bool = False)[source]

A single function that runs the temporary FERC 714 ETL and sets all DFs.

This is an interim solution, so that we can have a (relatively) standard way of accessing the FERC 714 data prior to getting it integrated into the PUDL DB. Some of these are not yet cleaned up, but there are dummy transform functions which pass through the raw DFs with some minor alterations, so all the data is available as it exists right now.

An attempt to access any of the dataframes results in all of them being populated, since generating all of them is almost the same amount of work as generating one of them.

Parameters

update – Whether to overwrite the existing dataframes if they exist.

respondent_id_ferc714(self, update=False)[source]

An interim FERC 714 output function.

demand_hourly_pa_ferc714(self, update=False)[source]

An interim FERC 714 output function.

description_pa_ferc714(self, update=False)[source]

An interim FERC 714 output function.

id_certification_ferc714(self, update=False)[source]

An interim FERC 714 output function.

gen_plants_ba_ferc714(self, update=False)[source]

An interim FERC 714 output function.

demand_monthly_ba_ferc714(self, update=False)[source]

An interim FERC 714 output function.

net_energy_load_ba_ferc714(self, update=False)[source]

An interim FERC 714 output function.

adjacency_ba_ferc714(self, update=False)[source]

An interim FERC 714 output function.

interchange_ba_ferc714(self, update=False)[source]

An interim FERC 714 output function.

lambda_hourly_ba_ferc714(self, update=False)[source]

An interim FERC 714 output function.

lambda_description_ferc714(self, update=False)[source]

An interim FERC 714 output function.

demand_forecast_pa_ferc714(self, update=False)[source]

An interim FERC 714 output function.

utils_eia860(self, update=False)[source]

Pull a dataframe describing utilities reported in EIA 860.

Parameters

update (bool) – If true, re-calculate the output dataframe, even if a cached version exists.

Returns

a denormalized table for interactive use.

Return type

pandas.DataFrame

bga_eia860(self, update=False)[source]

Pull a dataframe of boiler-generator associations from EIA 860.

Parameters

update (bool) – If true, re-calculate the output dataframe, even if a cached version exists.

Returns

a denormalized table for interactive use.

Return type

pandas.DataFrame

plants_eia860(self, update=False)[source]

Pull a dataframe of plant level info reported in EIA 860.

Parameters

update (bool) – If true, re-calculate the output dataframe, even if a cached version exists.

Returns

a denormalized table for interactive use.

Return type

pandas.DataFrame

gens_eia860(self, update=False)[source]

Pull a dataframe describing generators, as reported in EIA 860.

If you want to backfill the technology_description field, recreate the pudl_out object with the parameter backfill_tech_desc = True.

Parameters

update (bool) – If true, re-calculate the output dataframe, even if a cached version exists.

Returns

a denormalized table for interactive use.

Return type

pandas.DataFrame

own_eia860(self, update=False)[source]

Pull a dataframe of generator level ownership data from EIA 860.

Parameters

update (bool) – If true, re-calculate the output dataframe, even if a cached version exists.

Returns

a denormalized table for interactive use.

Return type

pandas.DataFrame

gf_eia923(self, update=False)[source]

Pull EIA 923 generation and fuel consumption data.

Parameters

update (bool) – If true, re-calculate the output dataframe, even if a cached version exists.

Returns

a denormalized table for interactive use.

Return type

pandas.DataFrame

gfn_eia923(self, update=False)[source]

Pull EIA 923 generation and fuel consumption data for nuclear units.

Parameters

update (bool) – If true, re-calculate the output dataframe, even if a cached version exists.

Returns

a denormalized table for interactive use.

Return type

pandas.DataFrame

frc_eia923(self, update=False)[source]

Pull EIA 923 fuel receipts and costs data.

Parameters

update (bool) – If true, re-calculate the output dataframe, even if a cached version exists.

Returns

a denormalized table for interactive use.

Return type

pandas.DataFrame

bf_eia923(self, update=False)[source]

Pull EIA 923 boiler fuel consumption data.

Parameters

update (bool) – If true, re-calculate the output dataframe, even if a cached version exists.

Returns

a denormalized table for interactive use.

Return type

pandas.DataFrame

gen_eia923(self, update=False)[source]

Pull EIA 923 net generation data by generator.

Net generation is reported in two seperate tables in EIA 923: in the generation_eia923 and generation_fuel_eia923 tables. While the generation_fuel_eia923 table is more complete (the generation_eia923 table includes only ~55% of the reported MWhs), the generation_eia923 table is more granular (it is reported at the generator level).

This method either grabs the generation_eia923 table that is reported by generator, or allocates net generation from the generation_fuel_eia923 table to the generator level.

Parameters

update (bool) – If true, re-calculate the output dataframe, even if a cached version exists.

Returns

a denormalized table for interactive use.

Return type

pandas.DataFrame

gen_original_eia923(self, update=False)[source]

Pull the original EIA 923 net generation data by generator.

gen_allocated_eia923(self, update=False)[source]

Net generation from gen fuel table allocated to generators.

plants_steam_ferc1(self, update=False)[source]

Pull the FERC Form 1 steam plants data.

Parameters

update (bool) – If true, re-calculate the output dataframe, even if a cached version exists.

Returns

a denormalized table for interactive use.

Return type

pandas.DataFrame

fuel_ferc1(self, update=False)[source]

Pull the FERC Form 1 steam plants fuel consumption data.

Parameters

update (bool) – If true, re-calculate the output dataframe, even if a cached version exists.

Returns

a denormalized table for interactive use.

Return type

pandas.DataFrame

fbp_ferc1(self, update=False)[source]

Summarize FERC Form 1 fuel usage by plant.

Parameters

update (bool) – If true, re-calculate the output dataframe, even if a cached version exists.

Returns

a denormalized table for interactive use.

Return type

pandas.DataFrame

plants_small_ferc1(self, update=False)[source]

Pull the FERC Form 1 Small Plants Table.

Parameters

update (bool) – If true, re-calculate the output dataframe, even if a cached version exists.

Returns

a denormalized table for interactive use.

Return type

pandas.DataFrame

plants_hydro_ferc1(self, update=False)[source]

Pull the FERC Form 1 Hydro Plants Table.

Parameters

update (bool) – If true, re-calculate the output dataframe, even if a cached version exists.

Returns

a denormalized table for interactive use.

Return type

pandas.DataFrame

plants_pumped_storage_ferc1(self, update=False)[source]

Pull the FERC Form 1 Pumped Storage Table.

Parameters

update (bool) – If true, re-calculate the output dataframe, even if a cached version exists.

Returns

a denormalized table for interactive use.

Return type

pandas.DataFrame

purchased_power_ferc1(self, update=False)[source]

Pull the FERC Form 1 Purchased Power Table.

Parameters

update (bool) – If true, re-calculate the output dataframe, even if a cached version exists.

Returns

a denormalized table for interactive use.

Return type

pandas.DataFrame

plant_in_service_ferc1(self, update=False)[source]

Pull the FERC Form 1 Plant in Service Table.

Parameters

update (bool) – If true, re-calculate the output dataframe, even if a cached version exists.

Returns

a denormalized table for interactive use.

Return type

pandas.DataFrame

all_plants_ferc1(self, update=False)[source]

Pull the FERC Form 1 all plants table.

Parameters

update (bool) – If true, re-calculate the output dataframe, even if a cached version exists.

Returns

a denormalized table for interactive use.

Return type

pandas.DataFrame

hr_by_gen(self, update=False)[source]

Calculate and return generator level heat rates (mmBTU/MWh).

Parameters

update (bool) – If true, re-calculate the output dataframe, even if a cached version exists.

Returns

a denormalized table for interactive use.

Return type

pandas.DataFrame

hr_by_unit(self, update=False)[source]

Calculate and return generation unit level heat rates.

Parameters

update (bool) – If true, re-calculate the output dataframe, even if a cached version exists.

Returns

a denormalized table for interactive use.

Return type

pandas.DataFrame

fuel_cost(self, update=False)[source]

Calculate and return generator level fuel costs per MWh.

Parameters

update (bool) – If true, re-calculate the output dataframe, even if a cached version exists.

Returns

a denormalized table for interactive use.

Return type

pandas.DataFrame

capacity_factor(self, update=False, min_cap_fact=None, max_cap_fact=None)[source]

Calculate and return generator level capacity factors.

Parameters

update (bool) – If true, re-calculate the output dataframe, even if a cached version exists.

Returns

a denormalized table for interactive use.

Return type

pandas.DataFrame

mcoe(self, update=False, min_heat_rate=5.5, min_fuel_cost_per_mwh=0.0, min_cap_fact=0.0, max_cap_fact=1.5, all_gens=True)[source]

Calculate and return generator level MCOE based on EIA data.

Eventually this calculation will include non-fuel operating expenses as reported in FERC Form 1, but for now only the fuel costs reported to EIA are included. They are attibuted based on the unit-level heat rates and fuel costs.

Parameters
  • update (bool) – If true, re-calculate the output dataframe, even if a cached version exists.

  • min_heat_rate – lowest plausible heat rate, in mmBTU/MWh. Any MCOE records with lower heat rates are presumed to be invalid, and are discarded before returning.

  • min_cap_fact – minimum generator capacity factor. Generator records with a lower capacity factor will be filtered out before returning. This allows the user to exclude generators that aren’t being used enough to have valid.

  • min_fuel_cost_per_mwh – minimum fuel cost on a per MWh basis that is required for a generator record to be considered valid. For some reason there are now a large number of $0 fuel cost records, which previously would have been NaN.

  • max_cap_fact – maximum generator capacity factor. Generator records with a lower capacity factor will be filtered out before returning. This allows the user to exclude generators that aren’t being used enough to have valid.

  • all_gens (bool) – Controls whether the output contains records for all generators in the generators_eia860 table, or only those generators with associated MCOE data. True by default.

Returns

a compilation of generator attributes, including fuel costs per MWh.

Return type

pandas.DataFrame

pudl.output.pudltabl.get_table_meta(pudl_engine)[source]

Grab the pudl sqlitie database table metadata.