AthenaeumLegacyMergeTask

The AthenaeumLegacyMergeTask merges historical legacy data (pre-April 2025) from Athenaeum into the processed DataFrames, ensuring a complete historical view of guest and reservation data.

Overview

This task bridges the gap between old and new data systems by:

  • Loading legacy table data (pre-2025-04-01)
  • Aligning legacy schema to processed schema
  • Applying column mappings for renamed fields
  • Merging legacy data with current processed data
  • Handling stay flags for legacy rooms
  • Preserving data lineage with source_system markers

Flow Diagram

Data Merge Strategy

Column Mappings

The task applies these column mappings to align legacy data:

| Legacy Column | Current Column |
| --- | --- |
| athenaeum_room_stay_date_occupied | room_stay_date_is_stay_day |
| athenaeum_agent_name | travel_agent |
| athenaeum_company_name | company |
| channel | source |
| source | secondary_source |
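
Note that `channel` becomes `source` while the legacy `source` becomes `secondary_source`, so the renames need to be applied in a single pass; applying them one at a time in table order would collide on `source`. The following is a minimal sketch using plain dictionaries as rows. The source does not show the task's actual DataFrame code, and `rename_legacy_columns` is a hypothetical helper.

```python
# Legacy-to-current column mapping, taken from the Column Mappings section.
COLUMN_MAPPING = {
    "athenaeum_room_stay_date_occupied": "room_stay_date_is_stay_day",
    "athenaeum_agent_name": "travel_agent",
    "athenaeum_company_name": "company",
    "channel": "source",
    "source": "secondary_source",
}


def rename_legacy_columns(row: dict) -> dict:
    """Return a copy of `row` with legacy column names replaced.

    Hypothetical helper: every key is looked up by its original name,
    so `channel` -> `source` and `source` -> `secondary_source` are
    applied simultaneously rather than chained.
    """
    return {COLUMN_MAPPING.get(name, name): value for name, value in row.items()}


legacy_row = {
    "res_id": "R-1",
    "channel": "web",
    "source": "partner",
    "athenaeum_agent_name": "Acme Travel",
}
renamed = rename_legacy_columns(legacy_row)
print(renamed)
```

Columns that are not in the mapping, such as `res_id`, pass through unchanged.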

Models

Requires

  • ProcessedGuestModel - Current processed guest data
  • ProcessedReservationModel - Current processed reservation data
  • ProcessedRoomModel - Current processed room data

Provides

  • ProcessedGuestModel - Merged guest data (current + legacy)
  • ProcessedReservationModel - Merged reservation data (current + legacy)
  • ProcessedRoomModel - Merged room data (current + legacy)

Date Cutoff

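The cutoff separates legacy rows from current rows. Legacy rows are loaded with the filter room_stay_date < 2025-04-01 and receive the legacy stay-day and stay-night flag logic; rows on or after 2025-04-01 come from the current pipeline.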

Implementation Details

Schema Alignment

The task aligns legacy data to match processed schemas.
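
One way to picture the alignment step is to keep only the columns the processed schema defines and to fill any column the legacy data lacks with a null. The sketch below is an assumption about what aligning involves, not the task's actual code. `PROCESSED_RESERVATION_COLUMNS` and `align_to_schema` are hypothetical names, and the column list is illustrative only.

```python
# Illustrative subset of a processed schema (hypothetical column list).
PROCESSED_RESERVATION_COLUMNS = [
    "res_id",
    "source",
    "secondary_source",
    "travel_agent",
    "company",
]


def align_to_schema(rows: list[dict], columns: list[str]) -> list[dict]:
    """Project each row onto `columns`, in schema order.

    Columns missing from a legacy row become None; columns that are not
    part of the target schema are dropped.
    """
    return [{col: row.get(col) for col in columns} for row in rows]


legacy_rows = [{"res_id": "R-1", "source": "web", "legacy_only_field": 42}]
aligned = align_to_schema(legacy_rows, PROCESSED_RESERVATION_COLUMNS)
print(aligned)
```

Aligning both sides to the same ordered column list before the union keeps the merged data consistent regardless of which side a row came from.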

Stay Flag Handling

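Legacy rooms need special stay-day/stay-night flag logic. Per the column mappings, the stay-day flag room_stay_date_is_stay_day comes from the legacy column athenaeum_room_stay_date_occupied.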

Source System Marking

All DataFrames are marked with their source in a source_system column.

This enables downstream tracking of data origin.
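
A minimal sketch of the marking step, using plain dictionaries as rows. The `source_system` column name comes from the overview; the marker values `"legacy"` and `"current"` and the helper `mark_source` are assumptions, since the source does not list the actual values.

```python
def mark_source(rows: list[dict], source_system: str) -> list[dict]:
    """Return copies of `rows` with a `source_system` column set."""
    return [{**row, "source_system": source_system} for row in rows]


# Marker values below are assumed for illustration.
legacy_guests = mark_source([{"guest_id": "G-1"}], "legacy")
current_guests = mark_source([{"guest_id": "G-2"}], "current")

# Mark each side before the union so origin survives the merge.
merged_guests = legacy_guests + current_guests
print([(g["guest_id"], g["source_system"]) for g in merged_guests])
```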

Data Deduplication

The task ensures no duplicates across the cutoff date:

  1. Legacy data: Filtered to < 2025-04-01
  2. Current data: Naturally >= 2025-04-01 from CleanAthenaeumTask
  3. Result: No overlap, no duplicates

Each model type is deduplicated on its natural key:

  • Guests: guest_id
  • Reservations: res_id
  • Rooms: res_id + room_stay_date
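
The cutoff filter, union, and natural-key deduplication described above can be sketched for rooms as follows. This is an illustration under stated assumptions: rows are plain dictionaries, `room_stay_date` is a `datetime.date`, `dedupe` and `merge_rooms` are hypothetical helpers, and keeping the first row seen for a duplicate key is an assumed tie-break that the source does not specify.

```python
from datetime import date

LEGACY_CUTOFF = date(2025, 4, 1)

# Natural keys per model type, as listed in this section.
NATURAL_KEYS = {
    "guests": ("guest_id",),
    "reservations": ("res_id",),
    "rooms": ("res_id", "room_stay_date"),
}


def dedupe(rows: list[dict], key_columns: tuple[str, ...]) -> list[dict]:
    """Keep the first row seen for each natural key (assumed tie-break)."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def merge_rooms(legacy: list[dict], current: list[dict]) -> list[dict]:
    """Filter legacy rooms to before the cutoff, union, then dedupe."""
    legacy_only = [r for r in legacy if r["room_stay_date"] < LEGACY_CUTOFF]
    return dedupe(legacy_only + current, NATURAL_KEYS["rooms"])


legacy_rooms = [
    {"res_id": "R-1", "room_stay_date": date(2025, 3, 30)},
    {"res_id": "R-1", "room_stay_date": date(2025, 3, 31)},
    {"res_id": "R-1", "room_stay_date": date(2025, 4, 1)},  # on the cutoff: filtered out
]
current_rooms = [{"res_id": "R-1", "room_stay_date": date(2025, 4, 1)}]

merged_rooms = merge_rooms(legacy_rooms, current_rooms)
print(len(merged_rooms))
```

Because the legacy filter is strict (`<`) and current data starts at the cutoff, the deduplication pass acts as a safety net rather than the primary guard against overlap.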

Best Practices

  1. Run after ProcessingTask to merge into processed models
  2. Verify cutoff date matches system migration date
  3. Monitor union sizes to catch unexpected data volume
  4. Test column mappings when legacy schema changes
  5. Track source_system for data lineage
  6. Document legacy business rules for stay flags and other logic
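
For practice 3, one simple check is to compare the merged row count against the sum of its inputs. The sketch below is only an illustration; `check_union_size` is a hypothetical helper and the counts are made-up example values.

```python
def check_union_size(legacy_count: int, current_count: int, merged_count: int) -> int:
    """Return how many rows deduplication removed; raise if the union grew.

    A non-zero result means legacy and current data overlapped on the
    natural key, which the cutoff filter is meant to prevent.
    """
    expected_max = legacy_count + current_count
    if merged_count > expected_max:
        raise ValueError(
            f"merged has {merged_count} rows, more than the {expected_max} input rows"
        )
    return expected_max - merged_count


# Example counts are invented for illustration.
dropped = check_union_size(legacy_count=1200, current_count=300, merged_count=1500)
print(dropped)
```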