pudl.validate module
PUDL data validation functions and test case specifications.
- What defines a data validation?
What data are we checking?
* What table or output does it come from?
* What selection criteria do we apply to that table or output?
What are we checking it against?
* Itself (helps validate that the tests themselves are working)
* A processed version of itself (aggregation or derived values)
* A hard-coded external standard (e.g. heat rates, fuel heat content)
-
pudl.validate.
bf_eia923_agg
= [{'title': 'Coal ash content', 'query': "fuel_type_code_pudl=='coal'", 'low_q': 0.2, 'mid_q': 0.7, 'hi_q': 0.95, 'data_col': 'ash_content_pct', 'weight_col': 'fuel_consumed_units'}, {'title': 'Coal sulfur content', 'query': "fuel_type_code_pudl=='coal'", 'low_q': False, 'mid_q': False, 'hi_q': False, 'data_col': 'sulfur_content_pct', 'weight_col': 'fuel_consumed_units'}, {'title': 'Coal heat content', 'query': "fuel_type_code_pudl=='coal'", 'low_q': 0.05, 'mid_q': 0.5, 'hi_q': 0.95, 'data_col': 'fuel_mmbtu_per_unit', 'weight_col': 'fuel_consumed_units'}, {'title': 'Petroleum heat content', 'query': "fuel_type_code_pudl=='oil'", 'low_q': 0.1, 'mid_q': 0.5, 'hi_q': 0.95, 'data_col': 'fuel_mmbtu_per_unit', 'weight_col': 'fuel_consumed_units'}, {'title': 'Gas heat content', 'query': "fuel_type_code_pudl=='gas'", 'low_q': 0.1, 'mid_q': 0.5, 'hi_q': 0.95, 'data_col': 'fuel_mmbtu_per_unit', 'weight_col': 'fuel_consumed_units'}]¶ EIA923 Boiler Fuel data validation against aggregated data.
-
pudl.validate.
bf_eia923_coal_ash_content
= [{'title': 'Bituminous coal ash content (middle)', 'query': "fuel_type_code=='BIT'", 'low_q': 0.5, 'low_bound': 6.0, 'hi_q': 0.5, 'hi_bound': 15.0, 'data_col': 'ash_content_pct', 'weight_col': 'fuel_consumed_units'}, {'title': 'Sub-bituminous coal ash content (middle)', 'query': "fuel_type_code=='SUB'", 'low_q': 0.5, 'low_bound': 4.5, 'hi_q': 0.5, 'hi_bound': 7.0, 'data_col': 'ash_content_pct', 'weight_col': 'fuel_consumed_units'}, {'title': 'Lignite ash content (middle)', 'query': "fuel_type_code=='LIG'", 'low_q': 0.5, 'low_bound': 7.0, 'hi_q': 0.5, 'hi_bound': 30.0, 'data_col': 'ash_content_pct', 'weight_col': 'fuel_consumed_units'}, {'title': 'All coal ash content (middle)', 'query': "fuel_type_code_pudl=='coal'", 'low_q': 0.5, 'low_bound': 4.0, 'hi_q': 0.5, 'hi_bound': 20.0, 'data_col': 'ash_content_pct', 'weight_col': 'fuel_consumed_units'}]¶ Valid coal ash content (%). Based on historical reporting in EIA 923.
-
pudl.validate.
bf_eia923_coal_heat_content
= [{'title': 'Bituminous coal heat content (middle)', 'query': "fuel_type_code=='BIT'", 'low_q': 0.5, 'low_bound': 20.5, 'hi_q': 0.5, 'hi_bound': 26.5, 'data_col': 'fuel_mmbtu_per_unit', 'weight_col': 'fuel_consumed_units'}, {'title': 'Bituminous coal heat content (tails)', 'query': "fuel_type_code=='BIT'", 'low_q': 0.05, 'low_bound': 17.0, 'hi_q': 0.95, 'hi_bound': 30.0, 'data_col': 'fuel_mmbtu_per_unit', 'weight_col': 'fuel_consumed_units'}, {'title': 'Sub-bituminous coal heat content (middle)', 'query': "fuel_type_code=='SUB'", 'low_q': 0.5, 'low_bound': 16.5, 'hi_q': 0.5, 'hi_bound': 18.0, 'data_col': 'fuel_mmbtu_per_unit', 'weight_col': 'fuel_consumed_units'}, {'title': 'Sub-bituminous coal heat content (tails)', 'query': "fuel_type_code=='SUB'", 'low_q': 0.05, 'low_bound': 15.0, 'hi_q': 0.95, 'hi_bound': 20.5, 'data_col': 'fuel_mmbtu_per_unit', 'weight_col': 'fuel_consumed_units'}, {'title': 'Lignite heat content (middle)', 'query': "fuel_type_code=='LIG'", 'low_q': 0.5, 'low_bound': 12.0, 'hi_q': 0.5, 'hi_bound': 14.0, 'data_col': 'fuel_mmbtu_per_unit', 'weight_col': 'fuel_consumed_units'}, {'title': 'Lignite heat content (tails)', 'query': "fuel_type_code=='LIG'", 'low_q': 0.05, 'low_bound': 10.0, 'hi_q': 0.95, 'hi_bound': 15.0, 'data_col': 'fuel_mmbtu_per_unit', 'weight_col': 'fuel_consumed_units'}, {'title': 'All coal heat content (middle)', 'query': "fuel_type_code_pudl=='coal'", 'low_q': 0.5, 'low_bound': 10.0, 'hi_q': 0.5, 'hi_bound': 30.0, 'data_col': 'fuel_mmbtu_per_unit', 'weight_col': 'fuel_consumed_units'}]¶ Valid coal (bituminous, sub-bituminous, and lignite) heat content values.
Based on IEA coal grade definitions: https://www.iea.org/statistics/resources/balancedefinitions/
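A minimal sketch of how one of these "middle" or "tails" specifications could be evaluated, assuming each case asserts that the weighted `low_q` quantile of `data_col` is at least `low_bound` and that the weighted `hi_q` quantile is at most `hi_bound`. The helper names (`weighted_quantile`, `is_set`, `check_bounds`, `make_records`), the step-function quantile definition, and the sample records are illustrative assumptions rather than the module's implementation. The `query` selection is assumed to have been applied already.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share reaches q (simplified)."""
    pairs = sorted(zip(values, weights))
    total = sum(weight for _, weight in pairs)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= q * total:
            return value
    return pairs[-1][0]


def is_set(x):
    """Assumption: None or False means 'skip this side of the check'."""
    return x is not None and x is not False


def check_bounds(records, spec):
    """Raise ValueError if a weighted quantile falls outside the spec's bounds."""
    values = [r[spec["data_col"]] for r in records]
    weights = [r[spec["weight_col"]] for r in records]
    if is_set(spec.get("low_q")) and is_set(spec.get("low_bound")):
        low = weighted_quantile(values, weights, spec["low_q"])
        if low < spec["low_bound"]:
            raise ValueError(f"{spec['title']}: {low} < {spec['low_bound']}")
    if is_set(spec.get("hi_q")) and is_set(spec.get("hi_bound")):
        high = weighted_quantile(values, weights, spec["hi_q"])
        if high > spec["hi_bound"]:
            raise ValueError(f"{spec['title']}: {high} > {spec['hi_bound']}")


# Specification copied from bf_eia923_coal_heat_content above.
BIT_TAILS = {
    "title": "Bituminous coal heat content (tails)",
    "query": "fuel_type_code=='BIT'",
    "low_q": 0.05, "low_bound": 17.0,
    "hi_q": 0.95, "hi_bound": 30.0,
    "data_col": "fuel_mmbtu_per_unit",
    "weight_col": "fuel_consumed_units",
}


def make_records(pairs):
    """Build synthetic (heat content, fuel consumed) records for the demo."""
    return [{"fuel_mmbtu_per_unit": v, "fuel_consumed_units": w} for v, w in pairs]


# Synthetic data, invented purely for illustration.
good_records = make_records([(19.0, 10), (22.0, 50), (24.0, 30), (27.0, 10)])
bad_records = good_records + make_records([(3.0, 20)])

check_bounds(good_records, BIT_TAILS)  # passes silently
```

With `bad_records`, the heavily weighted 3.0 mmbtu/unit entry pulls the 5% quantile below 17.0, so the same call raises `ValueError`.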
-
pudl.validate.
bf_eia923_coal_sulfur_content
= [{'title': 'Coal sulfur content (tails)', 'query': "fuel_type_code_pudl=='coal'", 'hi_q': 0.95, 'hi_bound': 4.0, 'low_q': 0.05, 'low_bound': 0.15, 'data_col': 'sulfur_content_pct', 'weight_col': 'fuel_consumed_units'}]¶ Valid coal sulfur content values.
Based on historically reported values in EIA 923 Fuel Receipts and Costs.
-
pudl.validate.
bf_eia923_gas_heat_content
= [{'title': 'Natural Gas heat content (middle)', 'query': "fuel_type_code_pudl=='gas'", 'hi_q': 0.5, 'hi_bound': 1.036, 'low_q': 0.5, 'low_bound': 1.018, 'data_col': 'fuel_mmbtu_per_unit', 'weight_col': 'fuel_consumed_units'}, {'title': 'Natural Gas heat content (tails)', 'query': "fuel_type_code_pudl=='gas'", 'hi_q': 0.99, 'hi_bound': 1.15, 'low_q': 0.01, 'low_bound': 0.95, 'data_col': 'fuel_mmbtu_per_unit', 'weight_col': 'fuel_consumed_units'}]¶ Valid natural gas heat content values.
Based on historically reported values in EIA 923 Fuel Receipts and Costs. This check may fail because of a population of bad data around 0.1 mmbtu/unit, which appears to be off by a factor of 10, possibly because of a reporting error in the units used.
-
pudl.validate.
bf_eia923_oil_heat_content
= [{'title': 'Diesel Fuel Oil heat content (tails)', 'query': "fuel_type_code=='DFO'", 'low_q': 0.05, 'low_bound': 5.5, 'hi_q': 0.95, 'hi_bound': 6.0, 'data_col': 'fuel_mmbtu_per_unit', 'weight_col': 'fuel_consumed_units'}, {'title': 'Diesel Fuel Oil heat content (middle)', 'query': "fuel_type_code=='DFO'", 'low_q': 0.5, 'low_bound': 5.75, 'hi_q': 0.5, 'hi_bound': 5.85, 'data_col': 'fuel_mmbtu_per_unit', 'weight_col': 'fuel_consumed_units'}, {'title': 'All petroleum heat content (tails)', 'query': "fuel_type_code_pudl=='oil'", 'low_q': 0.05, 'low_bound': 5.0, 'hi_q': 0.95, 'hi_bound': 6.5, 'data_col': 'fuel_mmbtu_per_unit', 'weight_col': 'fuel_consumed_units'}]¶ Valid petroleum based fuel heat content values.
Based on historically reported values in EIA 923 Fuel Receipts and Costs.
-
pudl.validate.
bf_eia923_self
= [{'title': 'Bituminous coal ash content', 'query': "fuel_type_code=='BIT'", 'low_q': 0.05, 'mid_q': 0.25, 'hi_q': 0.95, 'data_col': 'ash_content_pct', 'weight_col': 'fuel_consumed_units'}, {'title': 'Subbituminous coal ash content', 'query': "fuel_type_code=='SUB'", 'low_q': 0.05, 'mid_q': 0.5, 'hi_q': 0.95, 'data_col': 'ash_content_pct', 'weight_col': 'fuel_consumed_units'}, {'title': 'Lignite coal ash content', 'query': "fuel_type_code=='LIG'", 'low_q': 0.05, 'mid_q': 0.5, 'hi_q': 0.95, 'data_col': 'ash_content_pct', 'weight_col': 'fuel_consumed_units'}, {'title': 'Bituminous coal heat content', 'query': "fuel_type_code=='BIT'", 'low_q': 0.07, 'mid_q': 0.5, 'hi_q': 0.98, 'data_col': 'fuel_mmbtu_per_unit', 'weight_col': 'fuel_consumed_units'}, {'title': 'Subbituminous coal heat content', 'query': "fuel_type_code=='SUB'", 'low_q': 0.05, 'mid_q': 0.5, 'hi_q': 0.9, 'data_col': 'fuel_mmbtu_per_unit', 'weight_col': 'fuel_consumed_units'}, {'title': 'Lignite heat content', 'query': "fuel_type_code=='LIG'", 'low_q': 0.1, 'mid_q': 0.5, 'hi_q': 0.95, 'data_col': 'fuel_mmbtu_per_unit', 'weight_col': 'fuel_consumed_units'}, {'title': 'Diesel Fuel Oil heat content', 'query': "fuel_type_code=='DFO'", 'low_q': 0.05, 'mid_q': 0.5, 'hi_q': 0.95, 'data_col': 'fuel_mmbtu_per_unit', 'weight_col': 'fuel_consumed_units'}]¶ EIA923 Boiler Fuel data validation against itself.
-
pudl.validate.
bounds_histogram
(df, data_col, weight_col, query, low_q, hi_q, low_bound, hi_bound, title='')[source]¶ Plot a weighted histogram showing acceptable bounds/actual values.
-
pudl.validate.
check_max_rows
(df, n_rows=inf, df_name='')[source]¶ Validate that a dataframe has fewer than a maximum number of rows.
-
pudl.validate.
check_min_rows
(df, n_rows=0, df_name='')[source]¶ Validate that a dataframe has a certain minimum number of rows.
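The two row-count checks above can be sketched as follows. This is an illustration under stated assumptions, not the module's code: the `_sketch` function names are hypothetical, the input only needs to support `len()`, a count exactly equal to the limit is assumed to pass, and returning the input unchanged (to allow chaining) is also an assumption.

```python
import math


def check_min_rows_sketch(rows, n_rows=0, name=""):
    """Raise ValueError if rows has fewer than n_rows entries."""
    if len(rows) < n_rows:
        raise ValueError(f"{name}: found {len(rows)} rows, expected at least {n_rows}")
    return rows


def check_max_rows_sketch(rows, n_rows=math.inf, name=""):
    """Raise ValueError if rows has more than n_rows entries."""
    if len(rows) > n_rows:
        raise ValueError(f"{name}: found {len(rows)} rows, expected at most {n_rows}")
    return rows


# Synthetic stand-in for a table; any sized container works here.
sample_rows = [{"plant_id": 1}, {"plant_id": 2}, {"plant_id": 3}]

# With the default limits (0 and infinity) both checks always pass.
checked = check_max_rows_sketch(check_min_rows_sketch(sample_rows))
```

Passing explicit limits turns the pair into a sanity window, for example `n_rows=2` for the minimum and `n_rows=5` for the maximum.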
-
pudl.validate.
check_unique_rows
(df, subset=None, df_name='')[source]¶ Test whether dataframe has unique records within a subset of columns.
- Parameters
df (pandas.DataFrame) – DataFrame to check for duplicate records.
subset (iterable or None) – Columns to consider in checking for dupes.
df_name (str) – Name of the dataframe, to aid in debugging/logging.
- Returns
The same DataFrame as was passed in, for use in DataFrame.pipe().
- Return type
pandas.DataFrame
- Raises
ValueError – If there are duplicate records in the subset of selected columns.
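A plain-Python sketch of the uniqueness check described above, using a list of dictionaries in place of a DataFrame so that it runs without pandas. The function name `check_unique_records`, the column names, and the sample rows are hypothetical. As in the description, the input is returned unchanged on success and a `ValueError` is raised when the selected columns contain duplicates.

```python
from collections import Counter


def check_unique_records(records, subset=None, name=""):
    """Return records unchanged, or raise ValueError on duplicate keys."""
    counts = Counter(
        # With subset=None, all columns of each record form the key.
        tuple(rec[col] for col in (subset if subset is not None else sorted(rec)))
        for rec in records
    )
    dupes = [key for key, n in counts.items() if n > 1]
    if dupes:
        raise ValueError(f"Found {len(dupes)} duplicated key(s) in {name}: {dupes}")
    return records


# Synthetic rows, invented purely for illustration.
rows = [
    {"plant_id": 1, "report_year": 2019, "capacity_mw": 5.0},
    {"plant_id": 1, "report_year": 2020, "capacity_mw": 5.0},
    {"plant_id": 2, "report_year": 2019, "capacity_mw": 7.0},
]

# Unique on (plant_id, report_year), so the same list comes back.
same_rows = check_unique_records(rows, subset=["plant_id", "report_year"], name="demo")
```

Checking `subset=["plant_id"]` on the same rows raises `ValueError`, because plant 1 appears twice.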
-
pudl.validate.
frc_eia923_ag_byproduct_heat_content
= [{'title': 'Agricultural byproduct heat content (tails)', 'query': "energy_source_code=='AB'", 'low_q': 0.05, 'low_bound': 7.0, 'hi_q': 0.95, 'hi_bound': 18.0, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ Check for reasonable agricultural byproduct heat contents.
Based on values given in the EIA 923 instructions: https://www.eia.gov/survey/form/eia_923/instructions.pdf
-
pudl.validate.
frc_eia923_agg
= [{'title': 'Coal ash content', 'query': "fuel_type_code_pudl=='coal'", 'low_q': 0.2, 'mid_q': 0.7, 'hi_q': 0.95, 'data_col': 'ash_content_pct', 'weight_col': 'fuel_qty_units'}, {'title': 'Coal chlorine content', 'query': "fuel_type_code_pudl=='coal'", 'low_q': False, 'mid_q': False, 'hi_q': False, 'data_col': 'chlorine_content_ppm', 'weight_col': 'fuel_qty_units'}, {'title': 'Coal fuel costs', 'query': "fuel_type_code_pudl=='coal'", 'low_q': 0.05, 'mid_q': 0.5, 'hi_q': 0.95, 'data_col': 'fuel_cost_per_mmbtu', 'weight_col': 'fuel_qty_units'}, {'title': 'Coal sulfur content', 'query': "fuel_type_code_pudl=='coal'", 'low_q': False, 'mid_q': False, 'hi_q': False, 'data_col': 'sulfur_content_pct', 'weight_col': 'fuel_qty_units'}, {'title': 'Gas heat content', 'query': "fuel_type_code_pudl=='gas'", 'low_q': 0.1, 'mid_q': 0.5, 'hi_q': 0.95, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}, {'title': 'Gas fuel costs', 'query': "fuel_type_code_pudl=='gas'", 'low_q': False, 'mid_q': 0.5, 'hi_q': False, 'data_col': 'fuel_cost_per_mmbtu', 'weight_col': 'fuel_qty_units'}, {'title': 'Petroleum fuel cost', 'query': "fuel_type_code_pudl=='oil'", 'low_q': False, 'mid_q': 0.5, 'hi_q': False, 'data_col': 'fuel_cost_per_mmbtu', 'weight_col': 'fuel_qty_units'}, {'title': 'Petroleum heat content', 'query': "fuel_type_code_pudl=='oil'", 'low_q': 0.1, 'mid_q': 0.5, 'hi_q': 0.95, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ EIA923 fuel receipts & costs data validation against aggregated data.
-
pudl.validate.
frc_eia923_biomass_gas_heat_content
= [{'title': 'Other biomass gas heat content (tails)', 'query': "energy_source_code=='OBG'", 'low_q': 0.05, 'low_bound': 0.36, 'hi_q': 0.95, 'hi_bound': 1.6, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ Check for reasonable other biomass gas heat contents.
Based on values given in the EIA 923 instructions: https://www.eia.gov/survey/form/eia_923/instructions.pdf
-
pudl.validate.
frc_eia923_biomass_liquids_heat_content
= [{'title': 'Other biomass liquids heat content (tails)', 'query': "energy_source_code=='OBL'", 'low_q': 0.05, 'low_bound': 3.5, 'hi_q': 0.95, 'hi_bound': 4.0, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ Check for reasonable other biomass liquids heat contents.
Based on values given in the EIA 923 instructions: https://www.eia.gov/survey/form/eia_923/instructions.pdf
-
pudl.validate.
frc_eia923_biomass_solids_heat_content
= [{'title': 'Other biomass solids heat content (tails)', 'query': "energy_source_code=='OBS'", 'low_q': 0.05, 'low_bound': 8.0, 'hi_q': 0.95, 'hi_bound': 25.0, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ Check for reasonable other biomass solids heat contents.
Based on values given in the EIA 923 instructions: https://www.eia.gov/survey/form/eia_923/instructions.pdf
-
pudl.validate.
frc_eia923_black_liquor_heat_content
= [{'title': 'Black liquor heat content (tails)', 'query': "energy_source_code=='BLQ'", 'low_q': 0.05, 'low_bound': 10.0, 'hi_q': 0.95, 'hi_bound': 14.0, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ Check for reasonable black liquor heat contents.
Based on values given in the EIA 923 instructions: https://www.eia.gov/survey/form/eia_923/instructions.pdf
-
pudl.validate.
frc_eia923_blast_furnace_gas_heat_content
= [{'title': 'Blast furnace gas heat content (tails)', 'query': "energy_source_code=='BFG'", 'low_q': 0.05, 'low_bound': 0.07, 'hi_q': 0.95, 'hi_bound': 0.12, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ Check for reasonable blast furnace gas heat contents.
Based on values given in the EIA 923 instructions: https://www.eia.gov/survey/form/eia_923/instructions.pdf
-
pudl.validate.
frc_eia923_coal_ant_heat_content
= [{'title': 'Anthracite coal heat content (middle)', 'query': "energy_source_code=='ANT'", 'low_q': 0.5, 'low_bound': 20.5, 'hi_q': 0.5, 'hi_bound': 26.5, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}, {'title': 'Anthracite coal heat content (tails)', 'query': "energy_source_code=='ANT'", 'low_q': 0.05, 'low_bound': 22.0, 'hi_q': 0.95, 'hi_bound': 29.0, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ Check for reasonable anthracite coal heat content.
Based on values given in the EIA 923 instructions: https://www.eia.gov/survey/form/eia_923/instructions.pdf
-
pudl.validate.
frc_eia923_coal_ash_content
= [{'title': 'Bituminous coal ash content (middle)', 'query': "energy_source_code=='BIT'", 'low_q': 0.5, 'low_bound': 6.0, 'hi_q': 0.5, 'hi_bound': 15.0, 'data_col': 'ash_content_pct', 'weight_col': 'fuel_qty_units'}, {'title': 'Sub-bituminous coal ash content (middle)', 'query': "energy_source_code=='SUB'", 'low_q': 0.5, 'low_bound': 4.5, 'hi_q': 0.5, 'hi_bound': 7.0, 'data_col': 'ash_content_pct', 'weight_col': 'fuel_qty_units'}, {'title': 'Lignite ash content (middle)', 'query': "energy_source_code=='LIG'", 'low_q': 0.5, 'low_bound': 7.0, 'hi_q': 0.5, 'hi_bound': 30.0, 'data_col': 'ash_content_pct', 'weight_col': 'fuel_qty_units'}, {'title': 'All coal ash content (middle)', 'query': "fuel_type_code_pudl=='coal'", 'low_q': 0.5, 'low_bound': 4.0, 'hi_q': 0.5, 'hi_bound': 20.0, 'data_col': 'ash_content_pct', 'weight_col': 'fuel_qty_units'}]¶ Valid coal ash content (%). Based on historical reporting in EIA 923.
-
pudl.validate.
frc_eia923_coal_bit_heat_content
= [{'title': 'Bituminous coal heat content (middle)', 'query': "energy_source_code=='BIT'", 'low_q': 0.5, 'low_bound': 20.5, 'hi_q': 0.5, 'hi_bound': 26.5, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}, {'title': 'Bituminous coal heat content (tails)', 'query': "energy_source_code=='BIT'", 'low_q': 0.05, 'low_bound': 18.0, 'hi_q': 0.95, 'hi_bound': 29.0, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ Check for reasonable bituminous coal heat content.
Based on values given in the EIA 923 instructions: https://www.eia.gov/survey/form/eia_923/instructions.pdf
-
pudl.validate.
frc_eia923_coal_cc_heat_content
= [{'title': 'Refined coal heat content (tails)', 'query': "energy_source_code=='RC'", 'low_q': 0.05, 'low_bound': 6.5, 'hi_q': 0.95, 'hi_bound': 16.0, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ Check for reasonable refined coal heat content.
Based on values given in the EIA 923 instructions: https://www.eia.gov/survey/form/eia_923/instructions.pdf
-
pudl.validate.
frc_eia923_coal_lig_heat_content
= [{'title': 'Lignite heat content (middle)', 'query': "energy_source_code=='LIG'", 'low_q': 0.5, 'low_bound': 12.0, 'hi_q': 0.5, 'hi_bound': 14.0, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}, {'title': 'Lignite heat content (tails)', 'query': "energy_source_code=='LIG'", 'low_q': 0.05, 'low_bound': 10.0, 'hi_q': 0.95, 'hi_bound': 15.0, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ Check for reasonable lignite coal heat content.
Based on values given in the EIA 923 instructions: https://www.eia.gov/survey/form/eia_923/instructions.pdf
-
pudl.validate.
frc_eia923_coal_mercury_content
= [{'title': 'Coal mercury content (upper tail)', 'query': "fuel_type_code_pudl=='coal'", 'low_q': False, 'low_bound': False, 'hi_q': 0.95, 'hi_bound': 0.125, 'data_col': 'mercury_content_ppm', 'weight_col': 'fuel_qty_units'}, {'title': 'Coal mercury content (middle)', 'query': "fuel_type_code_pudl=='coal'", 'low_q': 0.5, 'low_bound': 0.0, 'hi_q': 0.5, 'hi_bound': 0.1, 'data_col': 'mercury_content_ppm', 'weight_col': 'fuel_qty_units'}]¶ Valid coal mercury content limits.
Based on USGS FS095-01: https://pubs.usgs.gov/fs/fs095-01/fs095-01.html
The upper tail may fail because of a population of coal with extremely high reported mercury content (9.0 ppm), which is likely a reporting error.
-
pudl.validate.
frc_eia923_coal_moisture_content
= [{'title': 'Bituminous coal moisture content (middle)', 'query': "energy_source_code=='BIT'", 'low_q': 0.5, 'low_bound': 5.0, 'hi_q': 0.5, 'hi_bound': 16.5, 'data_col': 'moisture_content_pct', 'weight_col': 'fuel_qty_units'}, {'title': 'Sub-bituminous coal moisture content (middle)', 'query': "energy_source_code=='SUB'", 'low_q': 0.5, 'low_bound': 15.0, 'hi_q': 0.5, 'hi_bound': 32.5, 'data_col': 'moisture_content_pct', 'weight_col': 'fuel_qty_units'}, {'title': 'Lignite moisture content (middle)', 'query': "energy_source_code=='LIG'", 'low_q': 0.5, 'low_bound': 25.0, 'hi_q': 0.5, 'hi_bound': 45.0, 'data_col': 'moisture_content_pct', 'weight_col': 'fuel_qty_units'}, {'title': 'All coal moisture content (middle)', 'query': "fuel_type_code_pudl=='coal'", 'low_q': 0.5, 'low_bound': 5.0, 'hi_q': 0.5, 'hi_bound': 40.0, 'data_col': 'moisture_content_pct', 'weight_col': 'fuel_qty_units'}]¶ Valid coal moisture content, based on historical EIA 923 reporting.
-
pudl.validate.
frc_eia923_coal_sub_heat_content
= [{'title': 'Sub-bituminous coal heat content (middle)', 'query': "energy_source_code=='SUB'", 'low_q': 0.5, 'low_bound': 16.5, 'hi_q': 0.5, 'hi_bound': 18.0, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}, {'title': 'Sub-bituminous coal heat content (tails)', 'query': "energy_source_code=='SUB'", 'low_q': 0.05, 'low_bound': 15.0, 'hi_q': 0.95, 'hi_bound': 20.0, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ Check for reasonable Sub-bituminous coal heat content.
Based on values given in the EIA 923 instructions: https://www.eia.gov/survey/form/eia_923/instructions.pdf
-
pudl.validate.
frc_eia923_coal_sulfur_content
= [{'title': 'Coal sulfur content (tails)', 'query': "fuel_type_code_pudl=='coal'", 'hi_q': 0.95, 'hi_bound': 4.0, 'low_q': 0.05, 'low_bound': 0.15, 'data_col': 'sulfur_content_pct', 'weight_col': 'fuel_qty_units'}]¶ Valid coal sulfur content values.
Based on historically reported values in EIA 923 Fuel Receipts and Costs.
-
pudl.validate.
frc_eia923_coal_wc_heat_content
= [{'title': 'Waste coal heat content (tails)', 'query': "energy_source_code=='WC'", 'low_q': 0.05, 'low_bound': 6.5, 'hi_q': 0.95, 'hi_bound': 16.0, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ Check for reasonable waste coal heat content.
Based on values given in the EIA 923 instructions: https://www.eia.gov/survey/form/eia_923/instructions.pdf
-
pudl.validate.
frc_eia923_gas_sgc_heat_content
= [{'title': 'Coal syngas heat content (tails)', 'query': "energy_source_code=='SGC'", 'low_q': 0.05, 'low_bound': 0.2, 'hi_q': 0.95, 'hi_bound': 0.3, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ Check for reasonable coal syngas heat contents.
Based on values given in the EIA 923 instructions: https://www.eia.gov/survey/form/eia_923/instructions.pdf
-
pudl.validate.
frc_eia923_landfill_gas_heat_content
= [{'title': 'Landfill gas heat content (tails)', 'query': "energy_source_code=='LFG'", 'low_q': 0.05, 'low_bound': 0.3, 'hi_q': 0.95, 'hi_bound': 0.6, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ Check for reasonable landfill gas heat contents.
Based on values given in the EIA 923 instructions: https://www.eia.gov/survey/form/eia_923/instructions.pdf
-
pudl.validate.
frc_eia923_muni_solids_heat_content
= [{'title': 'Municipal solid waste heat content (tails)', 'query': "energy_source_code=='MSW'", 'low_q': 0.05, 'low_bound': 9.0, 'hi_q': 0.95, 'hi_bound': 12.0, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ Check for reasonable municipal solid waste heat contents.
Based on values given in the EIA 923 instructions: https://www.eia.gov/survey/form/eia_923/instructions.pdf
-
pudl.validate.
frc_eia923_natural_gas_heat_content
= [{'title': 'Natural gas heat content (tails)', 'query': "energy_source_code=='NG'", 'low_q': 0.05, 'low_bound': 0.8, 'hi_q': 0.95, 'hi_bound': 1.2, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ Check for reasonable natural gas heat contents.
Based on values given in the EIA 923 instructions: https://www.eia.gov/survey/form/eia_923/instructions.pdf
-
pudl.validate.
frc_eia923_oil_dfo_heat_content
= [{'title': 'Diesel Fuel Oil heat content (tails)', 'query': "energy_source_code=='DFO'", 'low_q': 0.05, 'low_bound': 5.5, 'hi_q': 0.95, 'hi_bound': 6.0, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}, {'title': 'Diesel Fuel Oil heat content (middle)', 'query': "energy_source_code=='DFO'", 'low_q': 0.5, 'low_bound': 5.75, 'hi_q': 0.5, 'hi_bound': 5.85, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ Check for reasonable diesel fuel oil heat contents.
Based on values given in the EIA 923 instructions: https://www.eia.gov/survey/form/eia_923/instructions.pdf
-
pudl.validate.
frc_eia923_oil_jf_heat_content
= [{'title': 'Jet fuel heat content (tails)', 'query': "energy_source_code=='JF'", 'low_q': 0.05, 'low_bound': 5.0, 'hi_q': 0.95, 'hi_bound': 6.0, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ Check for reasonable jet fuel heat contents.
Based on values given in the EIA 923 instructions: https://www.eia.gov/survey/form/eia_923/instructions.pdf
-
pudl.validate.
frc_eia923_oil_ker_heat_content
= [{'title': 'Kerosene heat content (tails)', 'query': "energy_source_code=='KER'", 'low_q': 0.05, 'low_bound': 5.6, 'hi_q': 0.95, 'hi_bound': 6.1, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ Check for reasonable kerosene heat contents.
Based on values given in the EIA 923 instructions: https://www.eia.gov/survey/form/eia_923/instructions.pdf
-
pudl.validate.
frc_eia923_other_gas_heat_content
= [{'title': 'Other gas heat content (tails)', 'query': "energy_source_code=='OG'", 'low_q': 0.05, 'low_bound': 0.07, 'hi_q': 0.95, 'hi_bound': 3.3, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ Check for reasonable other gas heat contents.
Based on values given in the EIA 923 instructions, but with the lower bound set to the expected lower bound of blast furnace gas heat content, since the data include “other” gases with heat contents below the expected 0.32: https://www.eia.gov/survey/form/eia_923/instructions.pdf
-
pudl.validate.
frc_eia923_petcoke_heat_content
= [{'title': 'Petroleum coke heat content (tails)', 'query': "energy_source_code=='PC'", 'low_q': 0.05, 'low_bound': 24.0, 'hi_q': 0.95, 'hi_bound': 30.0, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ Check for reasonable petroleum coke heat contents.
Based on values given in the EIA 923 instructions: https://www.eia.gov/survey/form/eia_923/instructions.pdf
-
pudl.validate.
frc_eia923_petcoke_syngas_heat_content
= [{'title': 'Petcoke syngas heat content (tails)', 'query': "energy_source_code=='SGP'", 'low_q': 0.05, 'low_bound': 0.2, 'hi_q': 0.95, 'hi_bound': 1.1, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ Check for reasonable petcoke syngas heat contents.
Based on values given in the EIA 923 instructions: https://www.eia.gov/survey/form/eia_923/instructions.pdf
-
pudl.validate.
frc_eia923_propane_heat_content
= [{'title': 'Propane heat content (tails)', 'query': "energy_source_code=='PG'", 'low_q': 0.05, 'low_bound': 2.5, 'hi_q': 0.95, 'hi_bound': 2.75, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ Check for reasonable propane heat contents.
Based on values given in the EIA 923 instructions: https://www.eia.gov/survey/form/eia_923/instructions.pdf
-
pudl.validate.
frc_eia923_rfo_heat_content
= [{'title': 'Residual fuel oil heat content (tails)', 'query': "energy_source_code=='RFO'", 'low_q': 0.05, 'low_bound': 5.7, 'hi_q': 0.95, 'hi_bound': 6.9, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ Check for reasonable residual fuel oil heat contents.
Based on values given in the EIA 923 instructions: https://www.eia.gov/survey/form/eia_923/instructions.pdf
-
pudl.validate.
frc_eia923_self
= [{'title': 'Bituminous coal ash content', 'query': "energy_source_code=='BIT'", 'low_q': 0.05, 'mid_q': 0.25, 'hi_q': 0.95, 'data_col': 'ash_content_pct', 'weight_col': 'fuel_qty_units'}, {'title': 'Subbituminous coal ash content', 'query': "energy_source_code=='SUB'", 'low_q': 0.05, 'mid_q': 0.5, 'hi_q': 0.95, 'data_col': 'ash_content_pct', 'weight_col': 'fuel_qty_units'}, {'title': 'Lignite coal ash content', 'query': "energy_source_code=='LIG'", 'low_q': 0.05, 'mid_q': 0.5, 'hi_q': 0.95, 'data_col': 'ash_content_pct', 'weight_col': 'fuel_qty_units'}, {'title': 'Bituminous coal heat content', 'query': "energy_source_code=='BIT'", 'low_q': 0.07, 'mid_q': 0.5, 'hi_q': 0.98, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}, {'title': 'Subbituminous coal heat content', 'query': "energy_source_code=='SUB'", 'low_q': 0.05, 'mid_q': 0.5, 'hi_q': 0.9, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}, {'title': 'Lignite heat content', 'query': "energy_source_code=='LIG'", 'low_q': 0.1, 'mid_q': 0.5, 'hi_q': 0.95, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}, {'title': 'Diesel Fuel Oil heat content', 'query': "energy_source_code=='DFO'", 'low_q': 0.05, 'mid_q': 0.5, 'hi_q': 0.95, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}, {'title': 'Bituminous coal moisture content', 'query': "energy_source_code=='BIT'", 'low_q': 0.05, 'mid_q': 0.5, 'hi_q': 0.95, 'data_col': 'moisture_content_pct', 'weight_col': 'fuel_qty_units'}, {'title': 'Subbituminous coal moisture content', 'query': "energy_source_code=='SUB'", 'low_q': 0.05, 'mid_q': 0.5, 'hi_q': 0.95, 'data_col': 'moisture_content_pct', 'weight_col': 'fuel_qty_units'}, {'title': 'Lignite moisture content', 'query': "energy_source_code=='LIG'", 'low_q': 0.05, 'mid_q': 0.5, 'hi_q': 1.0, 'data_col': 'moisture_content_pct', 'weight_col': 'fuel_qty_units'}]¶ EIA923 fuel receipts & costs data validation against itself.
-
pudl.validate.
frc_eia923_sludge_heat_content
= [{'title': 'Sludge waste heat content (tails)', 'query': "energy_source_code=='SLW'", 'low_q': 0.05, 'low_bound': 10.0, 'hi_q': 0.95, 'hi_bound': 16.0, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ Check for reasonable sludge waste heat contents.
Based on values given in the EIA 923 instructions: https://www.eia.gov/survey/form/eia_923/instructions.pdf
-
pudl.validate.
frc_eia923_waste_oil_heat_content
= [{'title': 'Waste oil heat content (tails)', 'query': "energy_source_code=='WO'", 'low_q': 0.05, 'low_bound': 3.0, 'hi_q': 0.95, 'hi_bound': 5.8, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ Check for reasonable waste oil heat contents.
Based on values given in the EIA 923 instructions: https://www.eia.gov/survey/form/eia_923/instructions.pdf
-
pudl.validate.
frc_eia923_wood_liquids_heat_content
= [{'title': 'Wood waste liquids heat content (tails)', 'query': "energy_source_code=='WDL'", 'low_q': 0.05, 'low_bound': 8.0, 'hi_q': 0.95, 'hi_bound': 14.0, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ Check for reasonable wood waste liquids heat contents.
Based on values given in the EIA 923 instructions: https://www.eia.gov/survey/form/eia_923/instructions.pdf
-
pudl.validate.
frc_eia923_wood_solids_heat_content
= [{'title': 'Wood solids heat content (tails)', 'query': "energy_source_code=='WDS'", 'low_q': 0.05, 'low_bound': 7.0, 'hi_q': 0.95, 'hi_bound': 18.0, 'data_col': 'heat_content_mmbtu_per_unit', 'weight_col': 'fuel_qty_units'}]¶ Check for reasonable wood solids heat contents.
Based on values given in the EIA 923 instructions: https://www.eia.gov/survey/form/eia_923/instructions.pdf
-
pudl.validate.
gf_eia923_agg
= [{'title': 'Coal heat content', 'query': "fuel_type_code_pudl=='coal'", 'low_q': 0.05, 'mid_q': 0.5, 'hi_q': 0.95, 'data_col': 'fuel_mmbtu_per_unit', 'weight_col': 'fuel_consumed_units'}, {'title': 'Petroleum heat content', 'query': "fuel_type_code_pudl=='oil'", 'low_q': 0.1, 'mid_q': 0.5, 'hi_q': 0.95, 'data_col': 'fuel_mmbtu_per_unit', 'weight_col': 'fuel_consumed_units'}, {'title': 'Gas heat content', 'query': "fuel_type_code_pudl=='gas'", 'low_q': 0.1, 'mid_q': 0.5, 'hi_q': 0.95, 'data_col': 'fuel_mmbtu_per_unit', 'weight_col': 'fuel_consumed_units'}]¶ EIA923 Generation Fuel data validation against aggregated data.
-
pudl.validate.
gf_eia923_coal_heat_content
= [{'title': 'All coal heat content (middle)', 'query': "fuel_type_code_pudl=='coal'", 'low_q': 0.5, 'low_bound': 10.0, 'hi_q': 0.5, 'hi_bound': 30.0, 'data_col': 'fuel_mmbtu_per_unit', 'weight_col': 'fuel_consumed_units'}]¶ Valid coal heat content values (all coal types).
The Generation Fuel table does not break different coal types out separately, so we can only test the validity of the entire suite of coal records.
Based on IEA coal grade definitions: https://www.iea.org/statistics/resources/balancedefinitions/
-
pudl.validate.
gf_eia923_gas_heat_content
= [{'title': 'All gas heat content (middle)', 'query': "fuel_type_code_pudl=='gas'", 'low_q': 0.5, 'low_bound': 0.975, 'hi_q': 0.5, 'hi_bound': 1.075, 'data_col': 'fuel_mmbtu_per_unit', 'weight_col': 'fuel_consumed_units'}, {'title': 'All gas heat content (middle)', 'query': "fuel_type_code_pudl=='gas'", 'low_q': 0.2, 'low_bound': 0.95, 'hi_q': 0.9, 'hi_bound': 1.1, 'data_col': 'fuel_mmbtu_per_unit', 'weight_col': 'fuel_consumed_units'}]¶ Valid natural gas heat content values.
Focuses on natural gas proper. Lower bound excludes other types of gaseous fuels intentionally.
-
pudl.validate.
gf_eia923_oil_heat_content
= [{'title': 'Diesel Fuel Oil heat content (tails)', 'query': "fuel_type_code_aer=='DFO'", 'low_q': 0.05, 'low_bound': 5.5, 'hi_q': 0.95, 'hi_bound': 6.0, 'data_col': 'fuel_mmbtu_per_unit', 'weight_col': 'fuel_consumed_units'}, {'title': 'Diesel Fuel Oil heat content (middle)', 'query': "fuel_type_code_aer=='DFO'", 'low_q': 0.5, 'low_bound': 5.75, 'hi_q': 0.5, 'hi_bound': 5.85, 'data_col': 'fuel_mmbtu_per_unit', 'weight_col': 'fuel_consumed_units'}, {'title': 'All petroleum heat content (tails)', 'query': "fuel_type_code_pudl=='oil'", 'low_q': 0.05, 'low_bound': 5.0, 'hi_q': 0.95, 'hi_bound': 6.5, 'data_col': 'fuel_mmbtu_per_unit', 'weight_col': 'fuel_consumed_units'}]¶ Valid petroleum based fuel heat content values.
Based on historically reported values in EIA 923 Fuel Receipts and Costs.
-
pudl.validate.
historical_distribution
(df, data_col, weight_col, quantile)[source]¶ Calculate a historical distribution of weighted values of a column.
To know what a “reasonable” value of a particular column is in the PUDL data, we can use this function to compute the value of that column at a given quantile in each year of data we have on hand. This population of values can then be used to set boundaries on acceptable data distributions in the aggregated and processed data.
- Parameters
df (pandas.DataFrame) – A dataframe containing historical data, with a column named either report_date or report_year.
data_col (str) – Label of the column containing the data of interest.
weight_col (str) – Label of the column containing the weights to be used in scaling the data.
- Returns
The weighted quantiles of data, for each of the years found in the historical data of df.
- Return type
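As an illustration of the per-year calculation described above, here is a minimal sketch. It is not the PUDL implementation: `historical_distribution_sketch` and `_weighted_quantile` are hypothetical names, returning a `pandas.Series` indexed by year is an assumption, and the toy column values are invented for the example.

```python
import numpy as np
import pandas as pd


def _weighted_quantile(data, weights, quantile):
    """Hypothetical helper: interpolate a quantile on cumulative weights."""
    df = pd.DataFrame({"data": data, "weights": weights}).dropna().sort_values("data")
    if df.empty:
        return np.nan
    cum = df["weights"].cumsum()
    # Midpoint of each observation's share of the total weight, scaled to [0, 1].
    positions = (cum - 0.5 * df["weights"]) / cum.iloc[-1]
    return float(np.interp(quantile, positions, df["data"]))


def historical_distribution_sketch(df, data_col, weight_col, quantile):
    """Return one weighted quantile per year found in df (illustrative only)."""
    # Use report_year if present, otherwise derive the year from report_date.
    if "report_year" in df.columns:
        years = df["report_year"]
    else:
        years = pd.to_datetime(df["report_date"]).dt.year
    per_year = {
        year: _weighted_quantile(grp[data_col], grp[weight_col], quantile)
        for year, grp in df.groupby(years)
    }
    return pd.Series(per_year, name=data_col)


# Invented toy data: two records in 2019, one in 2020.
toy = pd.DataFrame({
    "report_date": ["2019-01-01", "2019-06-01", "2020-01-01"],
    "fuel_mmbtu_per_unit": [1.0, 3.0, 10.0],
    "fuel_consumed_units": [1.0, 1.0, 5.0],
})
medians = historical_distribution_sketch(
    toy, "fuel_mmbtu_per_unit", "fuel_consumed_units", 0.5
)
```

The minimum and maximum of a result like `medians` could then serve as the acceptable range for the same quantile in a processed table.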
-
pudl.validate.
historical_histogram
(orig_df, test_df, data_col, weight_col, query='', low_q=0.05, mid_q=0.5, hi_q=0.95, low_bound=None, hi_bound=None, title='')[source]¶ Weighted histogram comparing distribution with historical subsamples.
-
pudl.validate.
mcoe_coal_capacity_factor
= [{'title': 'Coal Capacity Factor (middle)', 'query': "fuel_type_code_pudl=='coal' and capacity_factor!=0.0", 'low_q': 0.6, 'low_bound': 0.5, 'hi_q': 0.6, 'hi_bound': 0.9, 'data_col': 'capacity_factor', 'weight_col': 'capacity_mw'}, {'title': 'Coal Capacity Factor (tails)', 'query': "fuel_type_code_pudl=='coal' and capacity_factor!=0.0", 'low_q': 0.1, 'low_bound': 0.04, 'hi_q': 0.95, 'hi_bound': 0.95, 'data_col': 'capacity_factor', 'weight_col': 'capacity_mw'}]¶ Static constraints on coal fired generator capacity factors.
-
pudl.validate.
mcoe_coal_heat_rate
= [{'title': 'Coal Unit Heat Rates (middle)', 'query': "fuel_type_code_pudl=='coal'", 'low_q': 0.5, 'low_bound': 10.0, 'hi_q': 0.5, 'hi_bound': 11.0, 'data_col': 'heat_rate_mmbtu_mwh', 'weight_col': 'net_generation_mwh'}, {'title': 'Coal Unit Heat Rates (tails)', 'query': "fuel_type_code_pudl=='coal'", 'low_q': 0.05, 'low_bound': 9.0, 'hi_q': 0.95, 'hi_bound': 12.5, 'data_col': 'heat_rate_mmbtu_mwh', 'weight_col': 'net_generation_mwh'}]¶ Static constraints on coal fired generator heat rates.
-
pudl.validate.
mcoe_fuel_cost_per_mmbtu
= [{'title': 'Coal Fuel Costs (middle)', 'query': "fuel_type_code_pudl=='coal'", 'low_q': 0.5, 'low_bound': 1.5, 'hi_q': 0.5, 'hi_bound': 3.0, 'data_col': 'fuel_cost_per_mmbtu', 'weight_col': 'total_mmbtu'}, {'title': 'Coal Fuel Costs (tails)', 'query': "fuel_type_code_pudl=='coal'", 'low_q': 0.05, 'low_bound': 1.25, 'hi_q': 0.95, 'hi_bound': 4.5, 'data_col': 'fuel_cost_per_mmbtu', 'weight_col': 'total_mmbtu'}, {'title': 'Natural Gas Fuel Costs (middle, 2015+)', 'query': "fuel_type_code_pudl=='gas' and report_date>='2015-01-01'", 'low_q': 0.5, 'low_bound': 2.0, 'hi_q': 0.5, 'hi_bound': 4.0, 'data_col': 'fuel_cost_per_mmbtu', 'weight_col': 'total_mmbtu'}, {'title': 'Natural Gas Fuel Costs (tails, 2015+)', 'query': "fuel_type_code_pudl=='gas' and report_date>='2015-01-01'", 'low_q': 0.05, 'low_bound': 1.75, 'hi_q': 0.95, 'hi_bound': 6.0, 'data_col': 'fuel_cost_per_mmbtu', 'weight_col': 'total_mmbtu'}]¶ Static constraints on fuel costs per mmbtu of fuel consumed.
-
pudl.validate.
mcoe_fuel_cost_per_mwh
= [{'title': 'Coal Fuel Costs (middle)', 'query': "fuel_type_code_pudl=='coal'", 'low_q': 0.5, 'low_bound': 18.0, 'hi_q': 0.5, 'hi_bound': 27.0, 'data_col': 'fuel_cost_per_mwh', 'weight_col': 'net_generation_mwh'}, {'title': 'Coal Fuel Costs (tails)', 'query': "fuel_type_code_pudl=='coal'", 'low_q': 0.05, 'low_bound': 10.0, 'hi_q': 0.95, 'hi_bound': 50.0, 'data_col': 'fuel_cost_per_mwh', 'weight_col': 'net_generation_mwh'}, {'title': 'Natural Gas Fuel Costs (middle, 2015+)', 'query': "fuel_type_code_pudl=='gas' and report_date>='2015-01-01'", 'low_q': 0.5, 'low_bound': 20.0, 'hi_q': 0.5, 'hi_bound': 30.0, 'data_col': 'fuel_cost_per_mwh', 'weight_col': 'net_generation_mwh'}, {'title': 'Natural Gas Fuel Costs (tails, 2015+)', 'query': "fuel_type_code_pudl=='gas' and report_date>='2015-01-01'", 'low_q': 0.05, 'low_bound': 10.0, 'hi_q': 0.95, 'hi_bound': 50.0, 'data_col': 'fuel_cost_per_mwh', 'weight_col': 'net_generation_mwh'}]¶ Static constraints on fuel costs per MWh net generation.
-
pudl.validate.
mcoe_gas_capacity_factor
= [{'title': 'Natural Gas Capacity Factor (middle, 2015+)', 'query': "fuel_type_code_pudl=='gas' and report_date>='2015-01-01' and capacity_factor!=0.0", 'low_q': 0.65, 'low_bound': 0.4, 'hi_q': 0.65, 'hi_bound': 0.7, 'data_col': 'capacity_factor', 'weight_col': 'capacity_mw'}, {'title': 'Natural Gas Capacity Factor (tails, 2015+)', 'query': "fuel_type_code_pudl=='gas' and report_date>='2015-01-01' and capacity_factor!=0.0", 'low_q': 0.15, 'low_bound': 0.01, 'hi_q': 0.95, 'hi_bound': 0.95, 'data_col': 'capacity_factor', 'weight_col': 'capacity_mw'}]¶ Static constraints on natural gas generator capacity factors.
-
pudl.validate.
mcoe_gas_heat_rate
= [{'title': 'Natural Gas Unit Heat Rates (middle, 2015+)', 'query': "fuel_type_code_pudl=='gas' and report_date>='2015-01-01'", 'low_q': 0.5, 'low_bound': 7.0, 'hi_q': 0.5, 'hi_bound': 7.5, 'data_col': 'heat_rate_mmbtu_mwh', 'weight_col': 'net_generation_mwh'}, {'title': 'Natural Gas Unit Heat Rates (tails, 2015+)', 'query': "fuel_type_code_pudl=='gas' and report_date>='2015-01-01'", 'low_q': 0.05, 'low_bound': 6.5, 'hi_q': 0.95, 'hi_bound': 13.0, 'data_col': 'heat_rate_mmbtu_mwh', 'weight_col': 'net_generation_mwh'}]¶ Static constraints on gas fired generator heat rates.
-
pudl.validate.
no_null_cols
(df, cols='all', df_name='')[source]¶ Check that a dataframe has no all-NaN columns.
Occasionally, when concatenating or merging dataframes, a mislabeled column results in a column that is entirely NaN, which should never legitimately happen. This function is a quick check for that condition.
- Parameters
df (pandas.DataFrame) – DataFrame to check for null columns.
cols (iterable or "all") – The labels of columns to check for all-null values. If “all”, check all columns.
df_name (str) – Name of the dataframe, to aid in debugging/logging.
- Returns
The same DataFrame as was passed in, for use in DataFrame.pipe().
- Return type
- Raises
ValueError – If any completely NaN / Null valued columns are found.
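The check described above can be sketched as follows. This is a minimal illustration that follows the documented signature, not the PUDL source; `no_null_cols_sketch`, the toy column names, and the error message wording are assumptions.

```python
import numpy as np
import pandas as pd


def no_null_cols_sketch(df, cols="all", df_name=""):
    """Raise ValueError if any selected column is entirely null; else return df."""
    if isinstance(cols, str) and cols == "all":
        cols = df.columns
    null_cols = [col for col in cols if df[col].isna().all()]
    if null_cols:
        raise ValueError(f"Null columns found in {df_name}: {null_cols}")
    return df


# A partially null column is acceptable, so the frame passes through pipe().
good = pd.DataFrame({"plant_id": [1, 2], "heat_rate": [10.2, np.nan]})
checked = good.pipe(no_null_cols_sketch, df_name="good")

# A column that is entirely NaN, as a bad merge label might produce, fails.
bad = good.assign(mislabeled=np.nan)
try:
    no_null_cols_sketch(bad, df_name="bad")
    raised = False
except ValueError:
    raised = True
```

Returning the input unchanged is what lets the check sit in the middle of a method chain without altering the data.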
-
pudl.validate.
plot_vs_agg
(orig_df, agg_df, validation_cases)[source]¶ Validate a bunch of distributions against aggregated versions.
-
pudl.validate.
plot_vs_bounds
(df, validation_cases)[source]¶ Run through a data validation based on absolute bounds.
-
pudl.validate.
plot_vs_self
(df, validation_cases)[source]¶ Validate a bunch of distributions against themselves.
-
pudl.validate.
vs_bounds
(df, data_col, weight_col, query='', title='', low_q=False, low_bound=False, hi_q=False, hi_bound=False)[source]¶ Test a distribution against an upper bound, lower bound, or both.
-
pudl.validate.
vs_historical
(orig_df, test_df, data_col, weight_col, query='', low_q=0.05, mid_q=0.5, hi_q=0.95, title='')[source]¶ Validate aggregated distributions against original data.
-
pudl.validate.
vs_self
(df, data_col, weight_col, query='', title='', low_q=0.05, mid_q=0.5, hi_q=0.95)[source]¶ Test a distribution against its own historical range.
This is a special case of the pudl.validate.vs_historical() function, in which both the orig_df and test_df are the same. Mostly it helps ensure that the test itself is valid for the given distribution.
-
pudl.validate.
weighted_quantile
(data, weights, quantile)[source]¶ Calculate the weighted quantile of a Series or DataFrame column.
This function takes two columns from a pandas.DataFrame: one (data) containing an observed value, like heat content per unit of fuel, and another (weights) containing a quantity, like the amount of fuel delivered, which is used to scale the importance of each observed value in the overall distribution. It calculates the value that the scaled distribution has at the given quantile.
- Parameters
data (pandas.Series) – A series containing numeric data.
weights (pandas.Series) – Weights to use in scaling the data. Must have the same length as data.
quantile (float) – A number between 0 and 1, representing the quantile at which we want to find the value of the weighted data.
- Returns
The value in the weighted data corresponding to the given quantile. If there are no values in the data, return numpy.nan.
- Return type
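One common way to compute a weighted quantile is to sort the data, place each observation at the midpoint of its share of the cumulative weight, and interpolate. The sketch below uses that approach. Whether it matches the PUDL implementation is an assumption, and `weighted_quantile_sketch` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def weighted_quantile_sketch(data, weights, quantile):
    """Interpolate the value of weighted data at a quantile in [0, 1]."""
    if not 0 <= quantile <= 1:
        raise ValueError("quantile must be between 0 and 1")
    if len(data) != len(weights):
        raise ValueError("data and weights must have the same length")
    df = pd.DataFrame({
        "data": np.asarray(data, dtype=float),
        "weights": np.asarray(weights, dtype=float),
    })
    # Drop unusable records, then order by the observed value.
    df = df.replace([np.inf, -np.inf], np.nan).dropna().sort_values("data")
    if df.empty:
        return np.nan
    cum = df["weights"].cumsum()
    # Midpoint of each observation's share of the total weight, scaled to [0, 1].
    positions = (cum - 0.5 * df["weights"]) / cum.iloc[-1]
    return float(np.interp(quantile, positions, df["data"]))


# Equal weights give the ordinary median of 1, 2, 3.
equal = weighted_quantile_sketch([1, 2, 3], [1, 1, 1], 0.5)

# A heavy weight on the largest value pulls the median toward it.
heavy = weighted_quantile_sketch([1, 2, 3], [1, 1, 10], 0.5)

# No data means there is nothing to interpolate.
empty = weighted_quantile_sketch([], [], 0.5)
```

Weighting matters for fields like heat content: without it, a small delivery with an unusual value would count as much as a very large delivery with a typical one.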