Models Glossary

This document provides an overview of all available models in the ETL framework. Models are organized by their processing stage and chain specificity.

Models live under etl_lib/models/, with one subdirectory per processing stage (raw, clean, processed). Chain-specific models sit in their respective chain directories.
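The stage-based layout can be sketched as a small path helper. This is only an illustration of the naming convention visible in the File entries below; `model_file`, `MODELS_ROOT`, and `STAGES` are hypothetical names and are not part of the framework, and the helper deliberately covers only the stage directories, not chain directories.

```python
from pathlib import PurePosixPath

# Hypothetical constants mirroring the directory layout described above.
MODELS_ROOT = PurePosixPath("etl_lib/models")
STAGES = ("raw", "clean", "processed")


def model_file(stage: str, model_name: str) -> str:
    """Build the expected file path for a common model in a stage directory."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    return str(MODELS_ROOT / stage / f"{model_name}.py")


path = model_file("clean", "CleanGuestModel")
```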

Common Models

These models are available across all chains and represent the standard data entities in the ETL pipeline.

Raw Models

`RawReservationModel`

File: etl_lib/models/raw/RawReservationModel.py

Represents raw reservation data as ingested from source systems.

Database: GlueDatabases.RAW
Table: RawTables.RESERVATIONS

Cleaned Models

`CleanGuestModel`

File: etl_lib/models/clean/CleanGuestModel.py

Represents cleaned and standardized guest data.

Database: GlueDatabases.CLEAN
Table: CleanTables.GUESTS

`CleanReservationModel`

File: etl_lib/models/clean/CleanReservationModel.py

Represents cleaned and standardized reservation data.

Database: GlueDatabases.CLEAN
Table: CleanTables.RESERVATIONS

`CleanRoomModel`

Represents cleaned and standardized room data.

Database: GlueDatabases.CLEAN
Table: CleanTables.ROOMS

Processed Models

`ProcessedGuestModel`

File: etl_lib/models/processed/ProcessedGuestModel.py

Represents fully processed guest data ready for analytics.

Database: GlueDatabases.PROCESSED
Table: ProcessedTables.GUESTS

`ProcessedReservationModel`

File: etl_lib/models/processed/ProcessedReservationModel.py

Represents fully processed reservation data ready for analytics.

Database: GlueDatabases.PROCESSED
Table: ProcessedTables.RESERVATIONS

`ProcessedRoomModel`

File: etl_lib/models/processed/ProcessedRoomModel.py

Represents fully processed room data ready for analytics.

Database: GlueDatabases.PROCESSED
Table: ProcessedTables.ROOMS

Incremental / Added Models

`ProcessedAddedGuestModel`

File: etl_lib/models/processed/ProcessedAddedGuestModel.py

Represents newly processed guest records produced by incremental tasks. These are intended to be merged into ProcessedGuestModel.

Database: GlueDatabases.PROCESSED
Table: ProcessedTables.ADDED_GUESTS

`ProcessedAddedReservationModel`

File: etl_lib/models/processed/ProcessedAddedReservationModel.py

Represents newly processed reservation records produced by incremental tasks. These are intended to be merged into ProcessedReservationModel.

Database: GlueDatabases.PROCESSED
Table: ProcessedTables.ADDED_RESERVATIONS

`ProcessedAddedRoomModel`

File: etl_lib/models/processed/ProcessedAddedRoomModel.py

Represents newly processed room/stay records produced by incremental tasks. These are intended to be merged into ProcessedRoomModel.

Database: GlueDatabases.PROCESSED
Table: ProcessedTables.ADDED_ROOMS
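
This document does not specify how the added records are merged into their processed counterparts. As a rough sketch, assuming an upsert keyed on a single identifier column, the merge could look like the following. Plain lists of dicts stand in for Spark DataFrames, and `merge_added` and the `guest_id` key are hypothetical names chosen for illustration.

```python
def merge_added(processed, added, key):
    """Upsert rows from `added` into `processed`, matching rows on `key`."""
    merged = {row[key]: row for row in processed}
    for row in added:
        # Assumption: an added row replaces any existing row with the same key.
        merged[row[key]] = row
    return list(merged.values())


processed_guests = [
    {"guest_id": 1, "name": "Ada"},
    {"guest_id": 2, "name": "Grace"},
]
added_guests = [
    {"guest_id": 2, "name": "Grace H."},  # updates an existing guest
    {"guest_id": 3, "name": "Edsger"},    # brand-new guest
]

result = merge_added(processed_guests, added_guests, key="guest_id")
```

Whether the real incremental tasks replace, append, or deduplicate rows is a design decision this sketch does not settle.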

Chain-Specific Models

These models are specific to individual hotel chains and handle chain-specific data structures.

Athenaeum Chain

`RawRevenueModel`

File: etl_lib/models/chains/athenaeum/RawRevenueModel.py

Represents raw revenue data specific to the Athenaeum hotel chain.

Database: GlueDatabases.RAW
Table: RawTables.REVENUE

Opera Chain

`RawRateModel`

File: etl_lib/models/raw/RawRateModel.py

Represents raw rate data specific to the Opera hotel chain, as produced by the OperaRawTask.

Database: GlueDatabases.RAW
Table: RawTables.RATES

Dev Reference Links

RawRateModel

`RawProfileModel`

File: etl_lib/models/raw/RawProfileModel.py

Represents raw profile data specific to the Opera hotel chain, as produced by the OperaRawTask.

Database: GlueDatabases.RAW
Table: RawTables.PROFILES

Dev Reference Links

RawProfileModel

Guestline Chain

`RawRoompicksModel`

File: etl_lib/models/raw/RawRoompicksModel.py

Represents raw room picks data specific to the Guestline hotel chain, as produced by the GuestlineRawTask.

Database: GlueDatabases.RAW
Table: RawTables.ROOMPICKS

`RawPersonprofilesModel`

File: etl_lib/models/raw/RawPersonprofilesModel.py

Represents raw person profiles data specific to the Guestline hotel chain, as produced by the GuestlineRawTask.

Database: GlueDatabases.RAW
Table: RawTables.PERSONPROFILES

Model Base Class

All models inherit from the Model base class, which provides common functionality:

Methods

get() -> DataFrame: Returns the DataFrame if loaded, otherwise reads it from the catalog.
write(): Writes the DataFrame to the catalog using the configured database and table.
read() -> DataFrame: Reads the DataFrame from the catalog, then persists and repartitions it.
persist(storage_level=None, partitions=None): Persists and repartitions the DataFrame with optional storage level and partition count.
set(df, persist=True, storage_level=None): Sets the DataFrame, optionally persisting it.
unpersist(): Unpersists the DataFrame from memory and clears the reference.
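
The interplay between get(), read(), set(), write(), and unpersist() can be illustrated with a minimal stand-in. This is not the real Model implementation: `SketchModel`, `fake_read`, and `fake_write` are hypothetical, the catalog is replaced by plain callables, and Spark persistence and repartitioning are left out entirely.

```python
from typing import Any, Callable, Optional


class SketchModel:
    """Illustrative stand-in for the Model base class (not the real code)."""

    def __init__(self, reader: Callable[[], Any], writer: Callable[[Any], None]):
        self._reader = reader  # hypothetical: stands in for a catalog read
        self._writer = writer  # hypothetical: stands in for a catalog write
        self._df: Optional[Any] = None

    def read(self) -> Any:
        # Always goes back to the source and keeps a reference to the result.
        self._df = self._reader()
        return self._df

    def get(self) -> Any:
        # Returns the held data if loaded, otherwise reads it.
        if self._df is None:
            return self.read()
        return self._df

    def set(self, df: Any) -> None:
        self._df = df

    def write(self) -> None:
        self._writer(self.get())

    def unpersist(self) -> None:
        # Clears the reference so the next get() triggers a fresh read.
        self._df = None


reads = []
store = {"rows": [{"guest_id": 1}]}


def fake_read():
    reads.append(1)
    return list(store["rows"])


def fake_write(rows):
    store["rows"] = list(rows)


model = SketchModel(fake_read, fake_write)
model.get()
model.get()        # served from the held reference, no second read
model.unpersist()
model.get()        # reference was cleared, so this reads again
read_count = len(reads)
```

In this sketch only the first get() and the get() after unpersist() reach the reader; the real class additionally persists and repartitions the DataFrame as described above.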

Constructor Parameters

job_context (required): The job context containing catalog and configuration.
catalog (optional): Catalog instance to use (defaults to job_context.catalog).
database (required): Database name or Enum.
table (required): Table name or Enum.
df (optional): Existing DataFrame to use instead of reading from catalog.
storage_level (optional): Storage level for DataFrame persistence (default: MEMORY_AND_DISK).
property_partition (optional): Whether to use property partitioning (default: False).
overwrite_partitions (optional): Whether to overwrite partitions on write (default: True).
partitions (optional): Number of partitions to use (default: from job context).
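
The defaults listed above can be sketched as a constructor signature. Everything here is an assumption made for illustration: `FakeJobContext`, its `default_partitions` attribute, `SketchModel`, and `CleanGuestSketch` are hypothetical, strings stand in for the GlueDatabases and CleanTables enums and for the Spark storage level, and the real concrete models may bind their database and table differently.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeJobContext:
    # Hypothetical stand-in; the real job context is not shown in this document.
    catalog: Any
    default_partitions: int


class SketchModel:
    """Illustrative constructor only; mirrors the documented parameters."""

    def __init__(self, job_context, database, table, catalog=None, df=None,
                 storage_level=None, property_partition=False,
                 overwrite_partitions=True, partitions=None):
        self.job_context = job_context
        # catalog falls back to the job context's catalog
        self.catalog = catalog if catalog is not None else job_context.catalog
        self.database = database
        self.table = table
        self.df = df
        # placeholder string for the documented MEMORY_AND_DISK default
        self.storage_level = storage_level or "MEMORY_AND_DISK"
        self.property_partition = property_partition
        self.overwrite_partitions = overwrite_partitions
        # partition count falls back to a value taken from the job context
        self.partitions = (
            partitions if partitions is not None else job_context.default_partitions
        )


class CleanGuestSketch(SketchModel):
    """Assumed pattern: a concrete model fixes its own database and table."""

    def __init__(self, job_context, **kwargs):
        # "clean" / "guests" stand in for GlueDatabases.CLEAN / CleanTables.GUESTS
        super().__init__(job_context, database="clean", table="guests", **kwargs)


ctx = FakeJobContext(catalog="shared-catalog", default_partitions=8)
defaults = CleanGuestSketch(ctx)
custom = CleanGuestSketch(ctx, partitions=32, overwrite_partitions=False)
```

Here `defaults` picks up the catalog and partition count from the context, while `custom` overrides two of the optional parameters.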