ProcessingTask
The ProcessingTask is an orchestrator task that runs all common data processing operations in sequence. It coordinates subtasks that calculate metrics, match guests, and prepare data for reporting.
Overview
ProcessingTask serves as the main entry point for the processing stage of the ETL pipeline. It:
- Orchestrates 6 core processing subtasks
- Transforms cleaned data into processed models
- Calculates reservation and room metrics
- Performs guest matching and deduplication
- Computes guest loyalty metrics
- Applies value replacements from configuration
Flow Diagram
Data Flow
Subtasks
ProcessingTask runs the following subtasks in order:
- ReservationMetricsTask - Calculates booking window, cancellation window, and stay nights
- RoomMetricsTask - Computes daily room metrics and stay flags
- GuestMatchingTask - Performs guest deduplication using Splink
- GuestMatchingCheckTask - Ensures all guests have cluster IDs
- GuestLoyaltyTask - Calculates loyalty metrics and flags
- ReplaceValuesTask - Applies configured value replacements
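The fixed ordering above can be illustrated with a minimal sketch. It is not the actual implementation: `make_subtask`, `run_in_order`, and the shared `state` dictionary are hypothetical stand-ins, and each subtask is reduced to a function that records its own name.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins: a subtask is modeled as a function that takes
# a shared state dict and returns it after adding its own results.
State = Dict[str, List[str]]
Subtask = Callable[[State], State]

# Default order as listed in this document.
DEFAULT_ORDER: List[str] = [
    "ReservationMetricsTask",
    "RoomMetricsTask",
    "GuestMatchingTask",
    "GuestMatchingCheckTask",
    "GuestLoyaltyTask",
    "ReplaceValuesTask",
]


def make_subtask(name: str) -> Subtask:
    """Build a placeholder subtask that only records that it ran."""
    def run(state: State) -> State:
        state.setdefault("ran", []).append(name)
        return state
    return run


def run_in_order(subtasks: List[Subtask], state: State) -> State:
    """Run each subtask in sequence, handing the state to the next one."""
    for subtask in subtasks:
        state = subtask(state)
    return state


result = run_in_order([make_subtask(n) for n in DEFAULT_ORDER], {})
print(result["ran"])
```

The order matters because later subtasks depend on earlier ones: GuestMatchingCheckTask can only verify cluster IDs after GuestMatchingTask has assigned them.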
Constructor Options
- subtasks - Optional custom list of Task objects. If none provided, the default sequence (ReservationMetrics -> RoomMetrics -> GuestMatching -> GuestMatchingCheck -> GuestLoyalty -> ReplaceValues) is used.
- skip_subtasks - List of names of subtasks to skip. Useful for testing and incremental workflows.
- is_incremental - Can be set explicitly, otherwise taken from job_context.is_incremental.
- write_to_catalog - Controls whether the final outputs are written to catalog (default True).
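How these options might combine is sketched below. The option names come from this page, but `ProcessingOptions`, `JobContext`, and `resolve_plan` are hypothetical. The sketch represents subtasks by name rather than as Task objects, and it assumes that "none provided" means `None`.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Default sequence as listed in this document, represented by name only.
DEFAULT_SUBTASKS: List[str] = [
    "ReservationMetricsTask",
    "RoomMetricsTask",
    "GuestMatchingTask",
    "GuestMatchingCheckTask",
    "GuestLoyaltyTask",
    "ReplaceValuesTask",
]


@dataclass
class JobContext:
    """Hypothetical stand-in for the pipeline's job context."""
    is_incremental: bool = False


@dataclass
class ProcessingOptions:
    """Hypothetical container mirroring the documented options."""
    subtasks: Optional[List[str]] = None
    skip_subtasks: List[str] = field(default_factory=list)
    is_incremental: Optional[bool] = None
    write_to_catalog: bool = True


def resolve_plan(
    options: ProcessingOptions, job_context: JobContext
) -> Tuple[List[str], bool]:
    """Return the subtasks to run and the effective incremental flag."""
    # Assumption: an unset `subtasks` falls back to the default sequence.
    names = DEFAULT_SUBTASKS if options.subtasks is None else options.subtasks
    planned = [n for n in names if n not in options.skip_subtasks]
    # An explicit is_incremental wins; otherwise defer to the job context.
    if options.is_incremental is None:
        incremental = job_context.is_incremental
    else:
        incremental = options.is_incremental
    return planned, incremental


options = ProcessingOptions(
    skip_subtasks=["GuestMatchingTask", "GuestMatchingCheckTask"]
)
planned, incremental = resolve_plan(options, JobContext(is_incremental=True))
print(planned)
print(incremental)
```

Skipping both guest matching subtasks, as in this usage example, is the kind of configuration that could speed up a test run, since guest matching is described as the most complex subtask.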
Incremental Behavior
- When is_incremental = True, subtasks process added/changed records where supported and produce ProcessedAdded* outputs. The parent ProcessingTask gathers outputs and will only write ProcessedAdded* models if they contain data.
- The orchestrator will also only write the main Processed* models if those DataFrames contain rows.
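The "write only if non-empty" rule can be sketched as follows. This is an illustration under stated assumptions rather than the orchestrator's code: plain lists of dicts stand in for DataFrames, `select_outputs_to_write` is a hypothetical helper, and the sketch assumes ProcessedAdded* outputs are ignored entirely on non-incremental runs.

```python
from typing import Dict, List

# Lists of dicts stand in for DataFrames in this sketch.
Rows = List[dict]


def select_outputs_to_write(
    outputs: Dict[str, Rows], is_incremental: bool
) -> List[str]:
    """Return the names of the models that would be written."""
    to_write: List[str] = []
    for model_name, rows in outputs.items():
        is_added_model = model_name.startswith("ProcessedAdded")
        # Assumption: ProcessedAdded* models only apply to incremental runs.
        if is_added_model and not is_incremental:
            continue
        # Both Processed* and ProcessedAdded* models are skipped when empty.
        if len(rows) > 0:
            to_write.append(model_name)
    return to_write


outputs = {
    "ProcessedGuestModel": [{"guest_id": 1}],
    "ProcessedReservationModel": [],
    "ProcessedAddedGuestModel": [{"guest_id": 1}],
    "ProcessedAddedRoomModel": [],
}
print(select_outputs_to_write(outputs, is_incremental=True))
print(select_outputs_to_write(outputs, is_incremental=False))
```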
Models
Requires
- CleanGuestModel - Cleaned guest data
- CleanReservationModel - Cleaned reservation data
- CleanRoomModel - Cleaned room stay data
Provides
- ProcessedGuestModel - Processed guest data with loyalty metrics
- ProcessedReservationModel - Processed reservations with calculated metrics
- ProcessedRoomModel - Processed room data with daily metrics
Provides (incremental)
- ProcessedAddedGuestModel - Newly added guest records (incremental)
- ProcessedAddedReservationModel - Newly added reservations (incremental)
- ProcessedAddedRoomModel - Newly added rooms (incremental)
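The requires/provides contract above can be written down as data and checked before a run. The model names are taken from this page; the tuples and the helpers `missing_inputs` and `expected_outputs` are hypothetical and only illustrate the idea.

```python
from typing import List, Set, Tuple

# Model names as listed in this document.
REQUIRES: Tuple[str, ...] = (
    "CleanGuestModel",
    "CleanReservationModel",
    "CleanRoomModel",
)
PROVIDES: Tuple[str, ...] = (
    "ProcessedGuestModel",
    "ProcessedReservationModel",
    "ProcessedRoomModel",
)
PROVIDES_INCREMENTAL: Tuple[str, ...] = (
    "ProcessedAddedGuestModel",
    "ProcessedAddedReservationModel",
    "ProcessedAddedRoomModel",
)


def missing_inputs(available: Set[str]) -> List[str]:
    """List required models that are not yet available."""
    return [name for name in REQUIRES if name not in available]


def expected_outputs(is_incremental: bool) -> Tuple[str, ...]:
    """List the models a run may provide for the given mode."""
    if is_incremental:
        return PROVIDES + PROVIDES_INCREMENTAL
    return PROVIDES


print(missing_inputs({"CleanGuestModel", "CleanRoomModel"}))
print(len(expected_outputs(is_incremental=True)))
```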
Related Tasks
- ReservationMetricsTask - First subtask
- GuestMatchingTask - Most complex subtask
- GuestLoyaltyTask - Computes final metrics
- Task - Base task class