pudl.extract.sec10k
Load pre-processed SEC 10-K assets from Google Cloud Storage.
These “raw” tables are generated by the SEC 10-K data extraction pipeline, which can be found in this repository: https://github.com/catalyst-cooperative/mozilla-sec-eia
Upstream data is not partitioned by year, but we want to be able to extract a subset of the data for testing. The Sec10kSettings therefore allow specification of which years to extract, and those years are used to filter the extracted data before it is returned.
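The year filter described above can be sketched with pandas. This is a minimal illustration under stated assumptions, not the PUDL implementation: it assumes, hypothetically, that each raw table carries a `report_date` column, and `filter_years` is a hypothetical helper name. The sample rows are made up.

```python
import pandas as pd


def filter_years(
    df: pd.DataFrame, years: list[int], date_col: str = "report_date"
) -> pd.DataFrame:
    """Keep only rows whose date falls in one of the requested years.

    Hypothetical helper: the column name actually used by PUDL may differ.
    """
    # Parse the date column (a no-op if it is already datetime) and take its year.
    report_years = pd.to_datetime(df[date_col]).dt.year
    # Boolean mask selecting rows in the requested years; reset the index afterwards.
    return df[report_years.isin(years)].reset_index(drop=True)


# Made-up sample data standing in for a raw SEC 10-K table.
raw = pd.DataFrame(
    {
        "report_date": ["2019-12-31", "2020-12-31", "2021-12-31"],
        "filename": ["a.txt", "b.txt", "c.txt"],
    }
)
subset = filter_years(raw, years=[2020, 2021])
print(subset["filename"].tolist())
```

Filtering after loading, rather than before, reflects the constraint stated above: since the upstream data has no per-year partitions, the whole table has to be read before any subset can be selected.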
Attributes
Functions
| Function | Description |
|---|---|
| extract(ds, table, years) | Extract SEC 10-K data from the datastore. |
| raw_sec10k_asset_factory(table) | An asset factory for extracting SEC 10-K data by table. |
Module Contents
- pudl.extract.sec10k.extract(ds: pudl.workspace.datastore.Datastore, table: str, years: list[int]) -> pandas.DataFrame
Extract SEC 10-K data from the datastore.
Allows filtering by year, as a pseudo-partition, so that the pipeline can be tested with a smaller amount of data. This is necessary because the SEC 10-K data is not partitioned upstream.
- Parameters:
ds – Initialized PUDL datastore.
table – Name of the table to extract; must be one of the valid SEC 10-K tables.
years – Years of data to include in the output.
- Returns:
A dataframe containing the SEC 10-K data.
- pudl.extract.sec10k.raw_sec10k_asset_factory(table) -> dagster.AssetsDefinition
An asset factory for extracting SEC 10-K data by table.
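The idea behind an asset factory, generating one loader per table from a single function, can be illustrated without Dagster. This is a dependency-free sketch of the factory pattern only: it does not build a `dagster.AssetsDefinition`, and `fake_extract`, `make_raw_asset`, the `raw_sec10k__` name prefix, and the table names are all hypothetical stand-ins.

```python
from typing import Callable

import pandas as pd


def fake_extract(table: str, years: list[int]) -> pd.DataFrame:
    """Hypothetical stand-in for extract(); the real one reads from the PUDL datastore."""
    return pd.DataFrame({"table": [table] * len(years), "year": years})


def make_raw_asset(table: str) -> Callable[[list[int]], pd.DataFrame]:
    """Return a loader bound to a single table name (the factory pattern)."""

    def raw_asset(years: list[int]) -> pd.DataFrame:
        # `table` is captured from the enclosing call, so each loader keeps its own table.
        return fake_extract(table, years)

    # Give each generated function a distinct name, as an asset framework would need.
    raw_asset.__name__ = f"raw_sec10k__{table}"
    return raw_asset


# Hypothetical table names, for illustration only.
loaders = {name: make_raw_asset(name) for name in ["table_a", "table_b"]}
print(loaders["table_b"]([2020, 2021]))
```

Because each loader is created inside its own call to `make_raw_asset`, every closure binds its own `table` value; defining the inner function directly in a loop body would instead leave all loaders pointing at the last table name.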