Tasks Glossary
Tasks are the fundamental building blocks of the ETL pipeline. Each task encapsulates a specific data processing operation, and tasks can be composed to build complex workflows.
Core Task Classes
Task
The base abstract class that all tasks inherit from. Defines the task lifecycle, input/output models, and execution flow.
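As a rough illustration of what an abstract base with a fixed lifecycle can look like, here is a minimal sketch. The method names `setup`, `run`, and `execute`, the dict-based payload, and the `UppercaseNamesTask` subclass are all hypothetical; the real Task class may define its lifecycle and input/output models differently.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class Task(ABC):
    """Hypothetical base class: every task follows the same setup -> run flow."""

    def setup(self, payload: Dict[str, Any]) -> None:
        # Optional hook called before run(); the default does nothing.
        pass

    @abstractmethod
    def run(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        """Subclasses implement the actual processing here."""

    def execute(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        # The lifecycle lives in the base class so subclasses only supply run().
        self.setup(payload)
        return self.run(payload)


class UppercaseNamesTask(Task):
    """Illustrative subclass, not part of the pipeline."""

    def run(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        return {"names": [name.upper() for name in payload["names"]]}


result = UppercaseNamesTask().execute({"names": ["ada", "grace"]})
```

Because `run` is abstract, instantiating `Task` directly raises a `TypeError`, which forces every concrete task to provide its own processing step.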
TaskGroup
A container for grouping multiple tasks together for organized execution.
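One plausible shape for such a container is sketched below. It assumes that grouped tasks run sequentially and that each receives the previous task's output; plain functions stand in for tasks, and the names `steps`, `add_nights`, and `add_revenue` are invented for the example.

```python
from typing import Any, Callable, Dict, List

Payload = Dict[str, Any]
Step = Callable[[Payload], Payload]


class TaskGroup:
    """Hypothetical container that runs its steps in order."""

    def __init__(self, name: str, steps: List[Step]) -> None:
        self.name = name
        self.steps = list(steps)

    def execute(self, payload: Payload) -> Payload:
        # Each step receives the output of the previous one.
        for step in self.steps:
            payload = step(payload)
        return payload


def add_nights(payload: Payload) -> Payload:
    return {**payload, "nights": 3}


def add_revenue(payload: Payload) -> Payload:
    return {**payload, "revenue": payload["nights"] * 100}


group = TaskGroup("processing", [add_nights, add_revenue])
result = group.execute({"booking_id": "B1"})
```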
Common Processing Tasks
Tasks used across all hotel chains for standard data processing operations.
GuestLoyaltyTask
Computes guest loyalty metrics including stays, lifetime value, booking patterns, and loyalty categories.
GuestMatchingCheckTask
Ensures all guests have a cluster ID for downstream processing by copying guest_id if cluster_id is missing.
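The fallback rule can be sketched as follows. To stay dependency-free, the example uses a list of dicts in place of a DataFrame, and it treats both an absent key and a `None` value as "missing"; both choices, and the function name `ensure_cluster_ids`, are assumptions.

```python
from typing import Any, Dict, List


def ensure_cluster_ids(guests: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the guest rows where cluster_id falls back to guest_id."""
    fixed = []
    for guest in guests:
        cluster_id = guest.get("cluster_id")
        if cluster_id is None:
            # No cluster assigned: treat the guest as a cluster of one.
            cluster_id = guest["guest_id"]
        fixed.append({**guest, "cluster_id": cluster_id})
    return fixed


guests = [
    {"guest_id": "G1", "cluster_id": "C9"},
    {"guest_id": "G2", "cluster_id": None},
    {"guest_id": "G3"},
]
fixed = ensure_cluster_ids(guests)
```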
GuestMatchingTask
Matches and deduplicates guests using the Splink library for record linkage.
ProcessingTask
Orchestrator task that runs all common processing subtasks in sequence.
ReplaceValuesTask
Replaces values in DataFrames based on configured mappings from job configuration.
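A minimal sketch of mapping-driven replacement is shown below. The shape of the configuration (column name mapped to an old-value/new-value dict), the `channel` column, and its codes are assumptions, and rows are plain dicts rather than a DataFrame.

```python
from typing import Any, Dict, List


def replace_values(
    rows: List[Dict[str, Any]],
    mappings: Dict[str, Dict[Any, Any]],
) -> List[Dict[str, Any]]:
    """Apply per-column value mappings; values without a mapping are kept."""
    replaced = []
    for row in rows:
        new_row = dict(row)
        for column, mapping in mappings.items():
            if column in new_row:
                new_row[column] = mapping.get(new_row[column], new_row[column])
        replaced.append(new_row)
    return replaced


# Hypothetical job-configuration fragment.
mappings = {"channel": {"OTA-BKG": "Booking.com", "DIR": "Direct"}}
rows = [{"channel": "OTA-BKG"}, {"channel": "DIR"}, {"channel": "GDS"}]
cleaned = replace_values(rows, mappings)
```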
ReservationMetricsTask
Calculates key metrics for reservations including cancellation window, booking window, and stay nights.
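The three metrics might be computed along these lines. The definitions used here are assumptions, since the glossary does not spell them out: booking window as days from booking to arrival, cancellation window as days from cancellation to arrival, and stay nights as days from arrival to departure.

```python
from datetime import date
from typing import Dict, Optional


def reservation_metrics(
    booked_on: date,
    arrival: date,
    departure: date,
    cancelled_on: Optional[date] = None,
) -> Dict[str, Optional[int]]:
    """Compute day-based metrics for one reservation (assumed definitions)."""
    return {
        "booking_window": (arrival - booked_on).days,
        "stay_nights": (departure - arrival).days,
        # Only defined for cancelled reservations.
        "cancellation_window": (
            None if cancelled_on is None else (arrival - cancelled_on).days
        ),
    }


metrics = reservation_metrics(
    booked_on=date(2025, 1, 10),
    arrival=date(2025, 3, 1),
    departure=date(2025, 3, 4),
    cancelled_on=date(2025, 2, 24),
)
```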
RoomMetricsTask
Calculates daily stay metrics per room including stay-day/night flags, length of stay, and revenue aggregations.
Reporting Tasks
Tasks that generate final reports for consumption by end users.
BookingRoomsReportTask
Aggregates daily room data to the reservation level, summing all charges per booking.
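The roll-up from daily rows to one total per booking can be sketched as a simple group-and-sum. The column names `booking_id`, `room_charge`, and `extras_charge` are hypothetical, and the rows are plain dicts rather than a DataFrame.

```python
from collections import defaultdict
from typing import Any, Dict, List


def aggregate_by_booking(daily_rows: List[Dict[str, Any]]) -> Dict[str, float]:
    """Sum all charges across a booking's daily rows."""
    totals: Dict[str, float] = defaultdict(float)
    for row in daily_rows:
        totals[row["booking_id"]] += row["room_charge"] + row["extras_charge"]
    return dict(totals)


daily_rows = [
    {"booking_id": "B1", "room_charge": 100.0, "extras_charge": 20.0},
    {"booking_id": "B1", "room_charge": 100.0, "extras_charge": 0.0},
    {"booking_id": "B2", "room_charge": 150.0, "extras_charge": 10.0},
]
totals = aggregate_by_booking(daily_rows)
```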
DailyReportTask
Produces one row per stay date (per booking), showing detailed daily room activity.
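Expanding a booking into one row per stay date could look like the sketch below. It assumes a night-based convention in which the departure date is not a stay date; the actual report may count days differently, and the field names are invented.

```python
from datetime import date, timedelta
from typing import Any, Dict, List


def expand_to_stay_dates(
    booking_id: str, arrival: date, departure: date
) -> List[Dict[str, Any]]:
    """Emit one row per night stayed, from arrival up to the day before departure."""
    nights = (departure - arrival).days
    return [
        {"booking_id": booking_id, "stay_date": arrival + timedelta(days=offset)}
        for offset in range(nights)
    ]


rows = expand_to_stay_dates("B1", date(2025, 3, 1), date(2025, 3, 4))
```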
GuestLoyaltyReportTask
Generates a guest loyalty report with KPIs and loyalty flags based on booking and stay history.
PickupReportTask
Generates a pickup and pace report that compares current bookings with historical snapshots.
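In its simplest form, the comparison against a snapshot is a per-stay-date difference. The sketch below assumes pickup means rooms on the books now minus rooms on the books at snapshot time; the data layout (stay date mapped to a room count) is hypothetical.

```python
from typing import Dict


def pickup(current: Dict[str, int], snapshot: Dict[str, int]) -> Dict[str, int]:
    """Rooms on the books now minus rooms on the books at snapshot time."""
    stay_dates = sorted(set(current) | set(snapshot))
    # A stay date missing from either side counts as zero rooms.
    return {d: current.get(d, 0) - snapshot.get(d, 0) for d in stay_dates}


current = {"2025-03-01": 42, "2025-03-02": 38, "2025-03-03": 5}
snapshot = {"2025-03-01": 35, "2025-03-02": 40}
delta = pickup(current, snapshot)
```

A negative value means net cancellations since the snapshot was taken.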
ReportsTask
Orchestrator task that runs all report generation subtasks.
Chain-Specific Tasks
Tasks that are specific to individual hotel chains.
Athenaeum
AthenaeumLegacyMergeTask
Merges legacy Athenaeum data (pre-2025) into processed DataFrames.
AthenaeumRawTask
Orchestrates both ingestion and crawling for Athenaeum data.
CleanAthenaeumTask
Cleans and transforms raw Athenaeum data into standardized guest, reservation, and room models.
Opera (Cheval Collection)
OperaRawTask
Orchestrates both ingestion and crawling for Opera data (Cheval Collection).
OperaIngesterTask
Downloads raw files for Opera.
OperaCrawlerTask
Parses raw Opera payloads and produces raw models used by the cleaner.
CleanOperaTask
Cleans and transforms Opera raw data into standardized guest, reservation, and room models.
OperaLegacyMergeTask
Merges legacy Opera data from S3 into processed DataFrames.
CMTPFixTask
Applies a targeted workaround for CMTP cancellations at Cheval Maison The Palm Dubai.