Models Glossary
This document provides an overview of all available models in the ETL framework. Models are organized by their processing stage and chain specificity.
Models live under etl_lib/models/, with one subdirectory per processing stage (raw, clean, processed). Chain-specific models live in their respective chain directories.
Common Models
These models are available across all chains and represent the standard data entities in the ETL pipeline.
Raw Models
RawReservationModel
File: etl_lib/models/raw/RawReservationModel.py
Represents raw reservation data as ingested from source systems.
Database: GlueDatabases.RAW
Table: RawTables.RESERVATIONS
Cleaned Models
CleanGuestModel
File: etl_lib/models/clean/CleanGuestModel.py
Represents cleaned and standardized guest data.
Database: GlueDatabases.CLEAN
Table: CleanTables.GUESTS
CleanReservationModel
File: etl_lib/models/clean/CleanReservationModel.py
Represents cleaned and standardized reservation data.
Database: GlueDatabases.CLEAN
Table: CleanTables.RESERVATIONS
CleanRoomModel
Represents cleaned and standardized room data.
Database: GlueDatabases.CLEAN
Table: CleanTables.ROOMS
Processed Models
ProcessedGuestModel
File: etl_lib/models/processed/ProcessedGuestModel.py
Represents fully processed guest data ready for analytics.
Database: GlueDatabases.PROCESSED
Table: ProcessedTables.GUESTS
ProcessedReservationModel
File: etl_lib/models/processed/ProcessedReservationModel.py
Represents fully processed reservation data ready for analytics.
Database: GlueDatabases.PROCESSED
Table: ProcessedTables.RESERVATIONS
ProcessedRoomModel
File: etl_lib/models/processed/ProcessedRoomModel.py
Represents fully processed room data ready for analytics.
Database: GlueDatabases.PROCESSED
Table: ProcessedTables.ROOMS
Incremental / Added Models
ProcessedAddedGuestModel
File: etl_lib/models/processed/ProcessedAddedGuestModel.py
Represents newly processed guest records produced by incremental tasks. These are intended to be merged into ProcessedGuestModel.
Database: GlueDatabases.PROCESSED
Table: ProcessedTables.ADDED_GUESTS
ProcessedAddedReservationModel
File: etl_lib/models/processed/ProcessedAddedReservationModel.py
Represents newly processed reservation records produced by incremental tasks. These are intended to be merged into ProcessedReservationModel.
Database: GlueDatabases.PROCESSED
Table: ProcessedTables.ADDED_RESERVATIONS
ProcessedAddedRoomModel
File: etl_lib/models/processed/ProcessedAddedRoomModel.py
Represents newly processed room/stay records produced by incremental tasks. These are intended to be merged into ProcessedRoomModel.
Database: GlueDatabases.PROCESSED
Table: ProcessedTables.ADDED_ROOMS
Chain-Specific Models
These models are specific to individual hotel chains and handle chain-specific data structures.
Athenaeum Chain
RawRevenueModel
File: etl_lib/models/chains/athenaeum/RawRevenueModel.py
Represents raw revenue data specific to the Athenaeum hotel chain.
Database: GlueDatabases.RAW
Table: RawTables.REVENUE
Opera Chain
RawRateModel
File: etl_lib/models/raw/RawRateModel.py
Represents raw rate data specific to the Opera hotel chain, as produced by the OperaRawTask.
Database: GlueDatabases.RAW
Table: RawTables.RATES
RawProfileModel
File: etl_lib/models/raw/RawProfileModel.py
Represents raw profile data specific to the Opera hotel chain, as produced by the OperaRawTask.
Database: GlueDatabases.RAW
Table: RawTables.PROFILES
Guestline Chain
RawRoompicksModel
File: etl_lib/models/raw/RawRoompicksModel.py
Represents raw room picks data specific to the Guestline hotel chain, as produced by the GuestlineRawTask.
Database: GlueDatabases.RAW
Table: RawTables.ROOMPICKS
RawPersonprofilesModel
File: etl_lib/models/raw/RawPersonprofilesModel.py
Represents raw person profiles data specific to the Guestline hotel chain, as produced by the GuestlineRawTask.
Database: GlueDatabases.RAW
Table: RawTables.PERSONPROFILES
Model Base Class
All models inherit from the Model base class, which provides common functionality:
Methods
get() -> DataFrame: Returns the DataFrame if loaded, otherwise reads it from the catalog.
write(): Writes the DataFrame to the catalog using the configured database and table.
read() -> DataFrame: Reads the DataFrame from the catalog, persists, and repartitions it.
persist(storage_level=None, partitions=None): Persists and repartitions the DataFrame with optional storage level and partition count.
set(df, persist=True, storage_level=None): Sets the DataFrame, optionally persisting it.
unpersist(): Unpersists the DataFrame from memory and clears the reference.
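The interplay between get(), read(), set(), write(), and unpersist() can be illustrated with a small stand-in. This is a minimal sketch and not the etl_lib implementation: InMemoryCatalog and SketchModel are hypothetical names, plain Python lists stand in for Spark DataFrames, the "clean" and "guests" strings are placeholders for the real enums, and persistence and repartitioning are left out entirely.

```python
class InMemoryCatalog:
    """Hypothetical stand-in for the real catalog; stores rows per (database, table)."""

    def __init__(self):
        self.tables = {}
        self.reads = 0  # counts catalog reads so the caching behaviour is visible

    def read(self, database, table):
        self.reads += 1
        return list(self.tables.get((database, table), []))

    def write(self, database, table, rows):
        self.tables[(database, table)] = list(rows)


class SketchModel:
    """Illustrative only: mimics the method semantics described above."""

    def __init__(self, catalog, database, table, df=None):
        self.catalog = catalog
        self.database = database
        self.table = table
        self.df = df

    def read(self):
        # The real read() is described as also persisting and repartitioning.
        self.df = self.catalog.read(self.database, self.table)
        return self.df

    def get(self):
        # Return the loaded data if present, otherwise fall back to a catalog read.
        return self.df if self.df is not None else self.read()

    def set(self, df):
        self.df = df

    def write(self):
        self.catalog.write(self.database, self.table, self.get())

    def unpersist(self):
        # Clear the reference so the next get() reads from the catalog again.
        self.df = None


catalog = InMemoryCatalog()
catalog.write("clean", "guests", [{"guest_id": 1}])

model = SketchModel(catalog, "clean", "guests")
first = model.get()
second = model.get()
reads_after_two_gets = catalog.reads

model.unpersist()
model.get()
reads_after_unpersist = catalog.reads
```

In this sketch the second get() reuses the already loaded data, and only the get() that follows unpersist() triggers another catalog read.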
Constructor Parameters
job_context (required): The job context containing catalog and configuration.
catalog (optional): Catalog instance to use (defaults to job_context.catalog).
database (required): Database name or Enum.
table (required): Table name or Enum.
df (optional): Existing DataFrame to use instead of reading from catalog.
storage_level (optional): Storage level for DataFrame persistence (default: MEMORY_AND_DISK).
property_partition (optional): Whether to use property partitioning (default: False).
overwrite_partitions (optional): Whether to overwrite partitions on write (default: True).
partitions (optional): Number of partitions to use (default: from job context).
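The way the documented defaults resolve can be sketched as follows. This is an illustration under stated assumptions, not the real Model constructor: SketchModel is a hypothetical class, the keyword-only signature is a guess, the job context is faked with SimpleNamespace, the attribute name job_context.partitions is assumed, and the string "MEMORY_AND_DISK" stands in for the actual storage level object.

```python
from types import SimpleNamespace


class SketchModel:
    """Illustrative only: shows how the optional parameters fall back to defaults."""

    def __init__(self, job_context, *, catalog=None, database, table, df=None,
                 storage_level="MEMORY_AND_DISK", property_partition=False,
                 overwrite_partitions=True, partitions=None):
        self.job_context = job_context
        # catalog falls back to the one carried by the job context.
        self.catalog = catalog if catalog is not None else job_context.catalog
        self.database = database
        self.table = table
        self.df = df
        self.storage_level = storage_level
        self.property_partition = property_partition
        self.overwrite_partitions = overwrite_partitions
        # partitions falls back to the job context (attribute name is assumed).
        self.partitions = (
            partitions if partitions is not None else job_context.partitions
        )


# Hypothetical job context carrying a catalog and a default partition count.
ctx = SimpleNamespace(catalog="context-catalog", partitions=8)

defaults = SketchModel(ctx, database="processed", table="guests")
overridden = SketchModel(
    ctx,
    catalog="other-catalog",
    database="processed",
    table="guests",
    partitions=2,
    overwrite_partitions=False,
)
```

Here defaults picks up the catalog and partition count from the job context, while overridden keeps the explicitly passed values.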