pudl.load

Routines for loading PUDL data into various storage formats.

Module Contents

Functions

dfs_to_sqlite(dfs: Dict[str, pandas.DataFrame], engine: sqlalchemy.engine.Engine, check_foreign_keys: bool = True, check_types: bool = True, check_values: bool = True) → None

Load a dictionary of dataframes into the PUDL SQLite DB.

df_to_parquet(df: pandas.DataFrame, resource_id: str, root_path: Union[str, pathlib.Path], partition_cols: Union[List[str], Literal[None]] = None) → None

Write a PUDL table out to a partitioned Parquet dataset.

Attributes

logger

MINIMUM_SQLITE_VERSION

pudl.load.logger[source]
pudl.load.MINIMUM_SQLITE_VERSION = '3.32.0'[source]
pudl.load.dfs_to_sqlite(dfs: Dict[str, pandas.DataFrame], engine: sqlalchemy.engine.Engine, check_foreign_keys: bool = True, check_types: bool = True, check_values: bool = True) → None[source]

Load a dictionary of dataframes into the PUDL SQLite DB.

Parameters
  • dfs – Dictionary mapping table names to dataframes.

  • engine – PUDL DB connection engine.

  • check_foreign_keys – If True, enforce foreign key constraints.

  • check_types – If True, enforce column data types.

  • check_values – If True, enforce value constraints.
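
A minimal usage sketch follows. The table name "plants_entity_eia", the database path, and the toy column values are illustrative assumptions, not part of this module; dictionary keys must name tables defined in the PUDL metadata for the load to succeed.

import pandas as pd
import sqlalchemy as sa

import pudl.load

# Hypothetical SQLite path; point this at your PUDL output database.
engine = sa.create_engine("sqlite:////tmp/pudl.sqlite")

# Illustrative table name and contents; keys must be real PUDL table names.
dfs = {
    "plants_entity_eia": pd.DataFrame({
        "plant_id_eia": [1, 2],
        "plant_name_eia": ["Plant A", "Plant B"],
    }),
}

# Relax foreign key enforcement for this toy data; leave the checks
# enabled (their defaults) when loading real PUDL tables.
pudl.load.dfs_to_sqlite(dfs, engine, check_foreign_keys=False)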

pudl.load.df_to_parquet(df: pandas.DataFrame, resource_id: str, root_path: Union[str, pathlib.Path], partition_cols: Union[List[str], Literal[None]] = None) → None[source]

Write a PUDL table out to a partitioned Parquet dataset.

Uses the name of the table to look up appropriate metadata and construct a PyArrow schema.

Parameters
  • df – The tabular data to be written to a Parquet dataset.

  • resource_id – Name of the table that’s being written to Parquet.

  • root_path – Top level directory for the partitioned dataset.

  • partition_cols – Columns to use to partition the Parquet dataset. For EPA CEMS we use ["year", "state"].
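
A minimal usage sketch follows. The resource_id "hourly_emissions_epacems" is assumed for illustration and the dataframe is a stand-in; the partition columns follow the EPA CEMS example above, and resource_id must name a table known to the PUDL metadata so the PyArrow schema lookup succeeds.

import pandas as pd

import pudl.load

# Stand-in data containing the partitioning columns from the EPA CEMS example.
df = pd.DataFrame({
    "year": [2020, 2020, 2021],
    "state": ["CO", "TX", "CO"],
    "gross_load_mw": [100.0, 250.0, 75.0],
})

pudl.load.df_to_parquet(
    df,
    resource_id="hourly_emissions_epacems",  # assumed table name
    root_path="/tmp/parquet/epacems",        # top-level output directory
    partition_cols=["year", "state"],
)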