Models Glossary

This document provides an overview of all available models in the ETL framework. Models are organized by their processing stage and chain specificity.

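Models live under etl_lib/models/. Each processing stage (raw, clean, processed) has its own subdirectory. Chain-specific models sit either in a chain directory (for example etl_lib/models/chains/athenaeum/) or, as the file paths listed below show, alongside the common models in the raw directory.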

Common Models

These models are available across all chains and represent the standard data entities in the ETL pipeline.

Raw Models

RawReservationModel

File: etl_lib/models/raw/RawReservationModel.py

Represents raw reservation data as ingested from source systems.

Database: GlueDatabases.RAW
Table: RawTables.RESERVATIONS


Cleaned Models

CleanGuestModel

File: etl_lib/models/clean/CleanGuestModel.py

Represents cleaned and standardized guest data.

Database: GlueDatabases.CLEAN
Table: CleanTables.GUESTS

CleanReservationModel

File: etl_lib/models/clean/CleanReservationModel.py

Represents cleaned and standardized reservation data.

Database: GlueDatabases.CLEAN
Table: CleanTables.RESERVATIONS

CleanRoomModel

Represents cleaned and standardized room data.

Database: GlueDatabases.CLEAN
Table: CleanTables.ROOMS


Processed Models

ProcessedGuestModel

File: etl_lib/models/processed/ProcessedGuestModel.py

Represents fully processed guest data ready for analytics.

Database: GlueDatabases.PROCESSED
Table: ProcessedTables.GUESTS

ProcessedReservationModel

File: etl_lib/models/processed/ProcessedReservationModel.py

Represents fully processed reservation data ready for analytics.

Database: GlueDatabases.PROCESSED
Table: ProcessedTables.RESERVATIONS

ProcessedRoomModel

File: etl_lib/models/processed/ProcessedRoomModel.py

Represents fully processed room data ready for analytics.

Database: GlueDatabases.PROCESSED
Table: ProcessedTables.ROOMS

Incremental / Added Models

ProcessedAddedGuestModel

File: etl_lib/models/processed/ProcessedAddedGuestModel.py

Represents newly processed guest records produced by incremental tasks. These are intended to be merged into ProcessedGuestModel.

Database: GlueDatabases.PROCESSED
Table: ProcessedTables.ADDED_GUESTS

ProcessedAddedReservationModel

File: etl_lib/models/processed/ProcessedAddedReservationModel.py

Represents newly processed reservation records produced by incremental tasks. These are intended to be merged into ProcessedReservationModel.

Database: GlueDatabases.PROCESSED
Table: ProcessedTables.ADDED_RESERVATIONS

ProcessedAddedRoomModel

File: etl_lib/models/processed/ProcessedAddedRoomModel.py

Represents newly processed room/stay records produced by incremental tasks. These are intended to be merged into ProcessedRoomModel.

Database: GlueDatabases.PROCESSED
Table: ProcessedTables.ADDED_ROOMS
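
The merge step itself is not described in this glossary. As a rough illustration only, the sketch below upserts "added" records into an existing processed set using plain Python dictionaries in place of DataFrames. The function name merge_added, the guest_id key, and the "added record wins" rule are all assumptions made for the example, not the framework's actual behaviour.

```python
from typing import Dict, Iterable, List

Record = Dict[str, object]


def merge_added(existing: Iterable[Record], added: Iterable[Record], key: str) -> List[Record]:
    """Upsert `added` records into `existing`, matching rows on `key`.

    Hypothetical helper: rows whose key already exists are replaced,
    rows with a new key are appended.
    """
    merged: Dict[object, Record] = {row[key]: row for row in existing}
    for row in added:
        merged[row[key]] = row  # assumption: the added record wins
    return list(merged.values())


# Made-up rows standing in for ProcessedGuestModel and ProcessedAddedGuestModel data.
processed_guests = [
    {"guest_id": 1, "email": "a@example.com"},
    {"guest_id": 2, "email": "b@example.com"},
]
added_guests = [
    {"guest_id": 2, "email": "b.new@example.com"},
    {"guest_id": 3, "email": "c@example.com"},
]

merged_guests = merge_added(processed_guests, added_guests, key="guest_id")
```

In a Spark job the same idea would normally be expressed as a join or union on the DataFrames; the real merge key and conflict rule should be taken from the incremental task that produces these tables.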


Chain-Specific Models

These models are specific to individual hotel chains and handle chain-specific data structures.

Athenaeum Chain

RawRevenueModel

File: etl_lib/models/chains/athenaeum/RawRevenueModel.py

Represents raw revenue data specific to the Athenaeum hotel chain.

Database: GlueDatabases.RAW
Table: RawTables.REVENUE


Opera Chain

RawRateModel

File: etl_lib/models/raw/RawRateModel.py

Represents raw rate data specific to the Opera hotel chain, as produced by the OperaRawTask.

Database: GlueDatabases.RAW
Table: RawTables.RATES


RawProfileModel

File: etl_lib/models/raw/RawProfileModel.py

Represents raw profile data specific to the Opera hotel chain, as produced by the OperaRawTask.

Database: GlueDatabases.RAW
Table: RawTables.PROFILES


Guestline Chain

RawRoompicksModel

File: etl_lib/models/raw/RawRoompicksModel.py

Represents raw room picks data specific to the Guestline hotel chain, as produced by the GuestlineRawTask.

Database: GlueDatabases.RAW
Table: RawTables.ROOMPICKS


RawPersonprofilesModel

File: etl_lib/models/raw/RawPersonprofilesModel.py

Represents raw person profiles data specific to the Guestline hotel chain, as produced by the GuestlineRawTask.

Database: GlueDatabases.RAW
Table: RawTables.PERSONPROFILES


Model Base Class

All models inherit from the Model base class, which provides common functionality:

Methods

  • get() -> DataFrame: Returns the DataFrame if loaded, otherwise reads it from the catalog.
  • write(): Writes the DataFrame to the catalog using the configured database and table.
  • read() -> DataFrame: Reads the DataFrame from the catalog, persists, and repartitions it.
  • persist(storage_level=None, partitions=None): Persists and repartitions the DataFrame with optional storage level and partition count.
  • set(df, persist=True, storage_level=None): Sets the DataFrame, optionally persisting it.
  • unpersist(): Unpersists the DataFrame from memory and clears the reference.
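
To make the caching behaviour of these methods concrete, here is a minimal sketch that imitates it without Spark. SketchModel and InMemoryCatalog are hypothetical stand-ins written for this example, and a plain list plays the role of the DataFrame. Persisting and repartitioning are Spark-specific and are left out, and having write() fall back to get() when nothing is loaded is an assumption.

```python
from typing import Dict, List, Optional, Tuple

Frame = List[dict]  # stand-in for a Spark DataFrame


class InMemoryCatalog:
    """Hypothetical catalog that stores frames keyed by (database, table)."""

    def __init__(self) -> None:
        self.tables: Dict[Tuple[str, str], Frame] = {}
        self.reads = 0  # counts catalog reads so caching is observable

    def read(self, database: str, table: str) -> Frame:
        self.reads += 1
        return list(self.tables.get((database, table), []))

    def write(self, database: str, table: str, frame: Frame) -> None:
        self.tables[(database, table)] = list(frame)


class SketchModel:
    """Imitates the documented get/set/read/write/unpersist behaviour."""

    def __init__(self, catalog: InMemoryCatalog, database: str, table: str,
                 df: Optional[Frame] = None) -> None:
        self.catalog = catalog
        self.database = database
        self.table = table
        self._df = df

    def read(self) -> Frame:
        # Always goes to the catalog and keeps a reference to the result.
        self._df = self.catalog.read(self.database, self.table)
        return self._df

    def get(self) -> Frame:
        # Returns the loaded frame if there is one, otherwise reads it.
        return self._df if self._df is not None else self.read()

    def set(self, df: Frame) -> None:
        self._df = df

    def write(self) -> None:
        # Assumption: writes whatever get() returns.
        self.catalog.write(self.database, self.table, self.get())

    def unpersist(self) -> None:
        # Clears the reference so the next get() reads from the catalog again.
        self._df = None


catalog = InMemoryCatalog()
model = SketchModel(catalog, "clean", "guests")
model.set([{"guest_id": 1}])
model.write()
model.unpersist()
first = model.get()   # not loaded, so this reads from the catalog
second = model.get()  # already loaded, so no second read
```

After unpersist(), the first get() triggers exactly one catalog read and the second get() reuses the loaded frame.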

Constructor Parameters

  • job_context (required): The job context containing catalog and configuration.
  • catalog (optional): Catalog instance to use (defaults to job_context.catalog).
  • database (required): Database name or Enum.
  • table (required): Table name or Enum.
  • df (optional): Existing DataFrame to use instead of reading from catalog.
  • storage_level (optional): Storage level for DataFrame persistence (default: MEMORY_AND_DISK).
  • property_partition (optional): Whether to use property partitioning (default: False).
  • overwrite_partitions (optional): Whether to overwrite partitions on write (default: True).
  • partitions (optional): Number of partitions to use (default: from job context).
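
The defaults above can be sketched as a small dataclass. This is an illustration of how the parameters relate, not the real Model constructor: ModelConfig and JobContext are hypothetical, the enum values are invented, the name job_context.partitions is an assumption, and the storage level is shown as a plain string rather than a Spark StorageLevel.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional, Union


class GlueDatabases(Enum):
    CLEAN = "clean"  # hypothetical value


class CleanTables(Enum):
    GUESTS = "guests"  # hypothetical value


@dataclass
class JobContext:
    """Hypothetical job context exposing a catalog and a default partition count."""
    catalog: Any
    partitions: int = 200


@dataclass
class ModelConfig:
    """Mirrors the documented constructor parameters and their defaults."""
    job_context: JobContext                 # required
    database: Union[str, Enum]              # required
    table: Union[str, Enum]                 # required
    catalog: Optional[Any] = None           # falls back to job_context.catalog
    df: Optional[Any] = None                # existing DataFrame, if any
    storage_level: str = "MEMORY_AND_DISK"
    property_partition: bool = False
    overwrite_partitions: bool = True
    partitions: Optional[int] = None        # falls back to the job context

    def __post_init__(self) -> None:
        if self.catalog is None:
            self.catalog = self.job_context.catalog
        if self.partitions is None:
            self.partitions = self.job_context.partitions


ctx = JobContext(catalog="glue-catalog", partitions=8)
config = ModelConfig(
    job_context=ctx,
    database=GlueDatabases.CLEAN,
    table=CleanTables.GUESTS,
)
```

Passing arguments by keyword, as here, avoids depending on parameter order, which this glossary does not specify.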