pudl.etl.check_foreign_keys

Check that foreign key constraints in the PUDL database are respected.

Module Contents

Functions

pudl_check_fks(logfile, loglevel, db_path)

Check that foreign key constraints in the PUDL database are respected.

_get_fk_list(→ pandas.DataFrame)

Retrieve a dataframe of foreign keys for a table.

check_foreign_keys(engine)

Check foreign key relationships in the database.

Attributes

pudl.etl.check_foreign_keys.logger
pudl.etl.check_foreign_keys.pudl_check_fks(logfile: pathlib.Path, loglevel: str, db_path: pathlib.Path)

Check that foreign key constraints in the PUDL database are respected.

Dagster manages the dependencies between the assets in our ETL pipeline, attempting to materialize tables only after their upstream dependencies have been satisfied. However, because assets are executed in parallel, this order is nondeterministic and doesn’t necessarily correspond to the foreign key constraints within the database, so during the ETL we disable foreign key constraints within pudl.sqlite.

However, we still expect foreign key constraints to be satisfied once all of the tables have been loaded, so we check that they are valid after the ETL has completed. This script runs the same check.
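As a rough illustration of the load-then-check pattern described above, the following sketch uses only Python's standard `sqlite3` module. The `plants` and `generators` tables are hypothetical examples, not the PUDL schema, and this is not the PUDL implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Turn foreign key enforcement off explicitly so the intent is clear.
conn.execute("PRAGMA foreign_keys = OFF")
conn.executescript(
    """
    CREATE TABLE plants (plant_id INTEGER PRIMARY KEY);
    CREATE TABLE generators (
        generator_id INTEGER PRIMARY KEY,
        plant_id INTEGER REFERENCES plants (plant_id)
    );
    """
)

# Child rows arrive before their parent row, as can happen when tables
# are loaded in an order unrelated to the foreign key relationships.
conn.execute("INSERT INTO generators VALUES (1, 10)")
conn.execute("INSERT INTO generators VALUES (2, 99)")  # no plant 99 exists
conn.execute("INSERT INTO plants VALUES (10)")

# Once loading is done, ask SQLite which rows violate a constraint.
# Each result row is (child table, rowid, parent table, foreign key id).
violations = conn.execute("PRAGMA foreign_key_check").fetchall()
print(violations)
```

Generator 1 is accepted even though its parent row is inserted later, and only generator 2 is reported by the final check.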

exception pudl.etl.check_foreign_keys.ForeignKeyError(child_table: str, parent_table: str, foreign_key: str, rowids: list[int])

Bases: sqlalchemy.exc.SQLAlchemyError

Raised when data in a database violates a foreign key constraint.

__str__()

Create string representation of ForeignKeyError object.

__eq__(other)

Compare a ForeignKeyError with another object.

exception pudl.etl.check_foreign_keys.ForeignKeyErrors(fk_errors: list[ForeignKeyError])

Bases: sqlalchemy.exc.SQLAlchemyError

Raised when data in a database violate multiple foreign key constraints.

__str__()

Create string representation of ForeignKeyErrors object.

__iter__()

Iterate over the fk errors.

__getitem__(idx)

Index the fk errors.

pudl.etl.check_foreign_keys._get_fk_list(engine: sqlalchemy.Engine, table: str) → pandas.DataFrame

Retrieve a dataframe of foreign keys for a table.

Description of the foreign_key_list PRAGMA from the SQLite docs: ‘This pragma returns one row for each foreign key constraint created by a REFERENCES clause in the CREATE TABLE statement of table “table-name”.’

The PRAGMA returns one row for each field in a foreign key constraint. This function collapses foreign keys with multiple fields into one record for readability.
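The collapsing step can be sketched as follows. This is a minimal illustration and not the PUDL implementation: it uses plain Python dictionaries rather than a pandas DataFrame to stay dependency-free, and the `generators` and `generation` tables and the `fk_list` record layout are assumptions made for the example.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE generators (
        plant_id INTEGER,
        generator_id TEXT,
        PRIMARY KEY (plant_id, generator_id)
    );
    CREATE TABLE generation (
        plant_id INTEGER,
        generator_id TEXT,
        net_generation_mwh REAL,
        FOREIGN KEY (plant_id, generator_id)
            REFERENCES generators (plant_id, generator_id)
    );
    """
)

# Result columns: id, seq, table, from, to, on_update, on_delete, match.
# A two-column foreign key yields two rows that share the same id.
rows = conn.execute("PRAGMA foreign_key_list(generation)").fetchall()

# Group the per-column rows by (foreign key id, parent table).
grouped = defaultdict(list)
for fk_id, seq, parent_table, from_col, to_col, *_ in rows:
    grouped[(fk_id, parent_table)].append((seq, from_col, to_col))

# Collapse each group into one record, keeping columns in seq order.
fk_list = [
    {
        "fk_id": fk_id,
        "parent_table": parent_table,
        "from_columns": [from_col for _, from_col, _ in sorted(cols)],
        "to_columns": [to_col for _, _, to_col in sorted(cols)],
    }
    for (fk_id, parent_table), cols in sorted(grouped.items())
]
print(fk_list)
```

The composite key on `generation` appears as two PRAGMA rows but as a single record in `fk_list`.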

pudl.etl.check_foreign_keys.check_foreign_keys(engine: sqlalchemy.Engine)

Check foreign key relationships in the database.

The order in which assets are loaded into the database is not guaranteed to satisfy foreign key constraints, so we can’t enforce the constraints while loading. However, once all of the data has been loaded into the database, we can check for foreign key failures using the foreign_key_check and foreign_key_list PRAGMAs.

You can learn more about the PRAGMAs in the SQLite docs.

Raises:

ForeignKeyErrors – if data in the database violate foreign key constraints.
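One way to turn the row-level output of the foreign_key_check PRAGMA into one record per violated constraint is sketched below. `Violation` and `collect_violations` are hypothetical stand-ins written for this example; they are not PUDL's `ForeignKeyError` class or its actual checking logic, and the `utilities` and `plants` tables are likewise illustrative.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Violation:
    """Hypothetical stand-in for a per-constraint error record."""

    child_table: str
    parent_table: str
    fk_id: int
    rowids: list[int] = field(default_factory=list)


def collect_violations(conn: sqlite3.Connection) -> list[Violation]:
    """Group foreign_key_check rows by the constraint they violate."""
    grouped: dict[tuple[str, str, int], Violation] = {}
    # Each row is (child table, rowid, parent table, foreign key id).
    for table, rowid, parent, fk_id in conn.execute("PRAGMA foreign_key_check"):
        key = (table, parent, fk_id)
        grouped.setdefault(key, Violation(table, parent, fk_id)).rowids.append(rowid)
    for violation in grouped.values():
        violation.rowids.sort()
    return list(grouped.values())


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE utilities (utility_id INTEGER PRIMARY KEY);
    CREATE TABLE plants (
        plant_id INTEGER PRIMARY KEY,
        utility_id INTEGER REFERENCES utilities (utility_id)
    );
    INSERT INTO utilities VALUES (1);
    INSERT INTO plants VALUES (10, 1), (11, 7), (12, 8);
    """
)

violations = collect_violations(conn)
print(violations)
```

A caller could raise a single aggregate exception when the returned list is non-empty. The `fk_id` in each record can be matched against the `id` column of the foreign_key_list PRAGMA to recover the column names involved.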