pudl.extract.parquet
Extractor for Parquet data.
Module Contents¶
Classes¶
ParquetExtractor | Class for extracting dataframes from parquet files.
Attributes¶
- class pudl.extract.parquet.ParquetExtractor(ds)[source]¶
Bases: pudl.extract.extractor.GenericExtractor
Class for extracting dataframes from parquet files.
The extraction logic is invoked by calling the extract() method of this class.
- source_filename(page: str, **partition: pudl.extract.extractor.PartitionSelection) → str [source]¶
Produce the source Parquet file name as it will appear in the archive.
- Parameters:
page – PUDL name for the dataset contents, e.g. “boiler_generator_assn” or “data”
partition – partition to load. Example: {‘year’: 2009}
- Returns:
The name of the Parquet file, as a string
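As a rough illustration of how a page name and partition keywords could be combined into a file name, here is a minimal sketch. The function `sketch_source_filename` and its naming scheme are hypothetical and are not taken from PUDL; the real archive layout may differ.

```python
def sketch_source_filename(page: str, **partition) -> str:
    """Build a file name from a page and its partition (hypothetical scheme)."""
    # Sort the partition keys so the same partition always yields the same name.
    parts = [page] + [f"{key}-{value}" for key, value in sorted(partition.items())]
    # Assumption: one Parquet file per (page, partition) pair.
    return "_".join(parts) + ".parquet"


yearly_name = sketch_source_filename("boiler_generator_assn", year=2009)
monthly_name = sketch_source_filename("data", year_month="2020-08")
```

Sorting the keys keeps the result deterministic regardless of the order in which the partition keywords are passed.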
- load_source(page: str, **partition: pudl.extract.extractor.PartitionSelection) → pandas.DataFrame [source]¶
Produce the dataframe object for the given partition.
This method assumes that the archive includes one unzipped file per partition.
- Parameters:
page – PUDL name for the dataset contents, e.g. “boiler_generator_assn” or “data”
partition – partition to load. Examples: {‘year’: 2009}, {‘year_month’: ‘2020-08’}
- Returns:
pd.DataFrame instance containing the Parquet data