pudl.load
Routines for loading PUDL data into various storage formats.
Module Contents
Functions
dfs_to_sqlite: Load a dictionary of dataframes into the PUDL SQLite DB.
df_to_parquet: Write a PUDL table out to a partitioned Parquet dataset.
- pudl.load.dfs_to_sqlite(dfs: Dict[str, pandas.DataFrame], engine: sqlalchemy.engine.Engine, check_foreign_keys: bool = True, check_types: bool = True, check_values: bool = True) None [source]
Load a dictionary of dataframes into the PUDL SQLite DB.
- Parameters
dfs – Dictionary mapping table names to dataframes.
engine – PUDL DB connection engine.
check_foreign_keys – If True, enforce foreign key constraints.
check_types – If True, enforce column data types.
check_values – If True, enforce value constraints.
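The loading pattern this function implements can be sketched with plain pandas and SQLAlchemy; the real function additionally applies PUDL's metadata-derived schema and the constraint checks controlled by the flags above. The helper name, table name, and data below are illustrative, not part of the PUDL API:

```python
# Minimal sketch of loading a dict of dataframes into a SQLite DB.
# The real dfs_to_sqlite also enforces types, values, and foreign keys.
from typing import Dict

import pandas as pd
import sqlalchemy as sa


def load_dfs(dfs: Dict[str, pd.DataFrame], engine: sa.engine.Engine) -> None:
    """Write each dataframe to a table of the same name, in one transaction."""
    with engine.begin() as conn:
        for table_name, df in dfs.items():
            df.to_sql(table_name, conn, if_exists="replace", index=False)


engine = sa.create_engine("sqlite:///:memory:")
dfs = {"plants": pd.DataFrame({"plant_id": [1, 2], "state": ["CO", "TX"]})}
load_dfs(dfs, engine)
```

Wrapping the loop in `engine.begin()` means either every table loads or none does, which matters when later tables hold foreign keys into earlier ones.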
- pudl.load.df_to_parquet(df: pandas.DataFrame, resource_id: str, root_path: Union[str, pathlib.Path], partition_cols: Union[List[str], Literal[None]] = None) None [source]
Write a PUDL table out to a partitioned Parquet dataset.
Uses the name of the table to look up appropriate metadata and construct a PyArrow schema.
- Parameters
df – The tabular data to be written to a Parquet dataset.
resource_id – Name of the table that’s being written to Parquet.
root_path – Top level directory for the partitioned dataset.
partition_cols – Columns to use to partition the Parquet dataset. For EPA CEMS we use ["year", "state"].