AthenaeumLegacyMergeTask
The AthenaeumLegacyMergeTask merges legacy Athenaeum data (before 2025-04-01) into the processed DataFrames, giving a complete historical view of guest and reservation data.
Overview
This task bridges the gap between old and new data systems by:
- Loading legacy table data (pre-2025-04-01)
- Aligning legacy schema to processed schema
- Applying column mappings for renamed fields
- Merging legacy data with current processed data
- Handling stay flags for legacy rooms
- Preserving data lineage with source_system markers
Flow Diagram
Data Merge Strategy
Column Mappings
The task applies these column mappings to align legacy data:
| Legacy Column | Current Column |
|---|---|
| athenaeum_room_stay_date_occupied | room_stay_date_is_stay_day |
| athenaeum_agent_name | travel_agent |
| athenaeum_company_name | company |
| channel | source |
| source | secondary_source |
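The mapping above can be sketched with pandas as follows. This is a minimal illustration, not the task's actual code: the names `LEGACY_COLUMN_MAP` and `apply_column_mappings` are hypothetical. Because `channel` becomes `source` while the old `source` becomes `secondary_source`, both renames have to happen in a single step; `DataFrame.rename` with a dict does this, since each original label is looked up only once.

```python
import pandas as pd

# Mapping copied from the table above (legacy name -> current name).
LEGACY_COLUMN_MAP = {
    "athenaeum_room_stay_date_occupied": "room_stay_date_is_stay_day",
    "athenaeum_agent_name": "travel_agent",
    "athenaeum_company_name": "company",
    "channel": "source",
    "source": "secondary_source",
}


def apply_column_mappings(legacy_df: pd.DataFrame) -> pd.DataFrame:
    """Rename legacy columns in one pass (hypothetical helper).

    Keys that are not present in the frame are silently ignored.
    """
    return legacy_df.rename(columns=LEGACY_COLUMN_MAP)


# Made-up sample row, for illustration only.
legacy = pd.DataFrame({"res_id": [1], "channel": ["web"], "source": ["partner"]})
renamed = apply_column_mappings(legacy)
print(list(renamed.columns))
```

Renaming in two separate steps (first `channel`, then `source`) would rename the freshly created `source` column a second time, so the single-dict form is the safer choice here.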
Models
Requires
- ProcessedGuestModel - Current processed guest data
- ProcessedReservationModel - Current processed reservation data
- ProcessedRoomModel - Current processed room data
Provides
- ProcessedGuestModel - Merged guest data (current + legacy)
- ProcessedReservationModel - Merged reservation data (current + legacy)
- ProcessedRoomModel - Merged room data (current + legacy)
Date Cutoff
Stay-day and stay-night flag logic.
Legacy Date Filter: room_stay_date < 2025-04-01
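A filter of this kind might look like the sketch below. The constant `LEGACY_CUTOFF` and the helper `filter_legacy_rows` are assumed names for illustration; only the column `room_stay_date` and the cutoff value come from this page.

```python
import pandas as pd

# Cutoff from the filter above; rows on or after this date are not legacy.
LEGACY_CUTOFF = pd.Timestamp("2025-04-01")


def filter_legacy_rows(legacy_df: pd.DataFrame,
                       date_col: str = "room_stay_date") -> pd.DataFrame:
    """Keep only rows strictly before the cutoff (hypothetical helper)."""
    dates = pd.to_datetime(legacy_df[date_col])
    return legacy_df[dates < LEGACY_CUTOFF]


# Made-up sample data, for illustration only.
rooms = pd.DataFrame({
    "res_id": [1, 2, 3],
    "room_stay_date": ["2025-03-31", "2025-04-01", "2024-12-24"],
})
legacy_rooms = filter_legacy_rows(rooms)
print(legacy_rooms["res_id"].tolist())
```

The comparison is strict, so a row dated exactly 2025-04-01 is left to the current data.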
Implementation Details
Schema Alignment
The task aligns legacy data to match processed schemas.
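One way to do this alignment is to reindex the legacy frame on the processed frame's columns. The sketch below assumes that legacy columns absent from the processed schema are dropped and that processed columns absent from legacy are filled with missing values; this page does not confirm either behavior, and `align_to_schema` is a hypothetical name.

```python
import pandas as pd


def align_to_schema(legacy_df: pd.DataFrame,
                    processed_df: pd.DataFrame) -> pd.DataFrame:
    """Give legacy_df the same columns, in the same order, as processed_df.

    Assumption: extra legacy columns are dropped and missing ones become NaN.
    """
    return legacy_df.reindex(columns=processed_df.columns)


# Made-up frames, for illustration only.
processed = pd.DataFrame({"guest_id": [10], "email": ["a@example.com"],
                          "loyalty_tier": ["gold"]})
legacy = pd.DataFrame({"guest_id": [1], "email": ["b@example.com"],
                       "old_code": ["X"]})
aligned = align_to_schema(legacy, processed)
print(list(aligned.columns))
```

With identical column sets and order, the two frames can be concatenated without pandas introducing unexpected extra columns.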
Stay Flag Handling
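Legacy rooms need special handling for their stay-day and stay-night flags. The stay-day flag comes from athenaeum_room_stay_date_occupied, which is renamed to room_stay_date_is_stay_day.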
Source System Marking
All DataFrames are marked with their source in a source_system column.
This enables downstream tracking of data origin.
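A minimal sketch of the marking step is shown below. The marker values `"athenaeum_legacy"` and `"current"` and the helper `mark_source` are assumptions for illustration; the actual values written by the task are not given on this page.

```python
import pandas as pd


def mark_source(df: pd.DataFrame, source_system: str) -> pd.DataFrame:
    """Return a copy of df with a constant source_system column."""
    return df.assign(source_system=source_system)


# Marker values are hypothetical; sample data is made up.
legacy = mark_source(pd.DataFrame({"res_id": [1]}), "athenaeum_legacy")
current = mark_source(pd.DataFrame({"res_id": [10]}), "current")

merged = pd.concat([legacy, current], ignore_index=True)
print(merged)
```

After the union, a query such as `merged[merged["source_system"] == "athenaeum_legacy"]` recovers the legacy rows.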
Data Deduplication
The task ensures no duplicates across the cutoff date:
- Legacy data: Filtered to < 2025-04-01
- Current data: Naturally >= 2025-04-01 from CleanAthenaeumTask
- Result: No overlap, no duplicates
Each model type is deduplicated on its natural key:
- Guests: guest_id
- Reservations: res_id
- Rooms: res_id + room_stay_date
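The union and deduplication can be sketched as below. The keys come from the list above; the names `NATURAL_KEYS` and `merge_and_dedupe` are hypothetical, and keeping the current row when a key appears on both sides (`keep="last"`) is an assumption, since this page does not say which side wins.

```python
import pandas as pd

# Natural keys per model type, as listed above.
NATURAL_KEYS = {
    "guests": ["guest_id"],
    "reservations": ["res_id"],
    "rooms": ["res_id", "room_stay_date"],
}


def merge_and_dedupe(legacy_df: pd.DataFrame, current_df: pd.DataFrame,
                     model: str) -> pd.DataFrame:
    """Union legacy and current rows, then drop duplicate natural keys.

    Legacy rows come first, so keep="last" prefers the current row
    (an assumed tie-break, not confirmed by the source).
    """
    combined = pd.concat([legacy_df, current_df], ignore_index=True)
    deduped = combined.drop_duplicates(subset=NATURAL_KEYS[model], keep="last")
    return deduped.reset_index(drop=True)


# Made-up sample data: reservation 2 appears on both sides.
legacy = pd.DataFrame({"res_id": [1, 2], "status": ["done", "open"]})
current = pd.DataFrame({"res_id": [2, 3], "status": ["done", "open"]})
merged = merge_and_dedupe(legacy, current, "reservations")
print(merged)
```

Comparing `len(merged)` with `len(legacy) + len(current)` gives a quick count of how many rows were dropped as duplicates.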
Related Tasks
- CleanAthenaeumTask - Produces current data
- ProcessingTask - Produces the processed models that this task merges legacy data into
- Task - Base class
Best Practices
- Run after ProcessingTask to merge into processed models
- Verify cutoff date matches system migration date
- Monitor union sizes to ensure no unexpected data volume
- Test column mappings when legacy schema changes
- Track source_system for data lineage
- Document legacy business rules for stay flags and other logic