
Opera Legacy Merge Task

The OperaLegacyMergeTask is a Cheval Collection-specific processing task that merges legacy Opera data stored in S3 into the current Processed* models. It avoids reprocessing that data and preserves historical information originating from the legacy system.

Purpose

  • Load legacy Opera tables from S3 and split them into legacy, overlap, and future partitions using a per-property cutoff_date.
  • Align legacy schema to the current processed schema before unioning.
  • Deduplicate and merge ProcessedReservationModel, ProcessedRoomModel, and ProcessedGuestModel.
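
The three-way split can be illustrated with a minimal plain-Python sketch. It is not the task's actual Spark code: the field names (property_id, arrival_date, departure_date), the sample cutoff value, and the exact boundary rules are assumptions made for illustration only.

```python
from datetime import date

# Hypothetical per-property cutoffs; the real task reads these from the job configuration.
CUTOFF_DATES = {"prop_a": date(2024, 1, 1)}


def classify(row, cutoffs):
    """Return 'legacy', 'overlap', or 'future' for one legacy reservation row.

    Assumed rules (not confirmed by the source):
      - legacy:  the stay ends on or before the cutoff
      - future:  the stay starts on or after the cutoff
      - overlap: the stay starts before and ends after the cutoff
    """
    cutoff = cutoffs[row["property_id"]]
    if row["departure_date"] <= cutoff:
        return "legacy"
    if row["arrival_date"] >= cutoff:
        return "future"
    return "overlap"


def split_by_cutoff(rows, cutoffs):
    """Group rows into the three partitions, keeping input order within each."""
    partitions = {"legacy": [], "overlap": [], "future": []}
    for row in rows:
        partitions[classify(row, cutoffs)].append(row)
    return partitions
```

Keeping the classification in one function means each row lands in exactly one partition, so the later union cannot count the same legacy row twice.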

How it works

  1. Load the reservation ID mapping from S3 (CSV at s3://etl-gp-processed/legacy_cutoff/chevalcollection/), then join it to the processed reservations to map res_id_og values.
  2. Load the legacy dataset from S3 and join with a generated cutoff_date per property (from the job configuration).
  3. Split the legacy data into:
    • legacy — older-than-cutoff rows (completely historical)
    • overlap — historical rows that overlap the cutoff_date (must be merged carefully)
    • future — future-facing rows (forward-dated stays that may conflict with current reservations)
  4. Align the legacy DataFrame to the current schema via a mapping and cast/select strategy.
  5. Merge ProcessedReservationModel by unioning the current, legacy, overlap, and future partitions, applying custom cutoff merge logic to avoid duplication.
  6. Merge ProcessedRoomModel by aligning room_stay_date, deduplicating by (res_id, room_stay_date), and computing stay day/night flags.
  7. Merge ProcessedGuestModel by projecting guest fields and unioning legacy guest rows while dropping duplicates.
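
The deduplication in step 6 can be sketched in plain Python as below. This is an illustration under stated assumptions, not the task's implementation: the source does not say which row wins when the same (res_id, room_stay_date) pair appears in both datasets, so the sketch assumes the current row takes precedence. The function name merge_rooms and the source field in the usage example are hypothetical.

```python
from itertools import chain


def merge_rooms(current_rows, legacy_rows):
    """Union room rows and keep one row per (res_id, room_stay_date).

    Assumption: when both datasets contain the same key, the current row wins.
    Legacy rows are visited first, so a current row with the same key
    overwrites the legacy one.
    """
    merged = {}
    for row in chain(legacy_rows, current_rows):
        key = (row["res_id"], row["room_stay_date"])
        merged[key] = row
    return list(merged.values())
```

For example, merging one current row for ("R1", "2024-01-02") with legacy rows for ("R1", "2024-01-01") and ("R1", "2024-01-02") yields two rows, and the row for 2024-01-02 is the current one.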

Implementation Details

  • Uses a per-property cutoff_date config to classify legacy records.
  • Uses a CSV mapping with res_id_og to map OC→RMS reservation IDs.
  • Applies room_stay_date_is_stay_day and room_stay_date_is_stay_night expressions to preserve historical check-in/out semantics.
  • Uses Spark unionByName(..., allowMissingColumns=True) to align columns and preserve backward compatibility.
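
To show why allowMissingColumns=True matters for backward compatibility, here is a plain-Python analogue of the union semantics: columns are matched by name, and a column missing on one side is filled with null (None here). It is a sketch for illustration, not Spark code; the helper union_by_name and the sample column names are hypothetical. In PySpark the corresponding call has the form current_df.unionByName(legacy_df, allowMissingColumns=True).

```python
def union_by_name(left_rows, right_rows):
    """Plain-Python analogue of unionByName(..., allowMissingColumns=True).

    Rows are dicts keyed by column name. The result keeps the left side's
    columns first, appends any columns that exist only on the right, and
    fills values a row does not have with None.
    """
    all_rows = list(left_rows) + list(right_rows)
    columns = []
    for row in all_rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    return [{name: row.get(name) for name in columns} for row in all_rows]
```

With a current row that has res_id, status, and channel and a legacy row that has only res_id and status, the legacy row comes out with channel set to None instead of the union failing on mismatched schemas.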

Models Required

  • CleanGuestModel
  • ProcessedReservationModel
  • ProcessedRoomModel

Models Provided

  • ProcessedGuestModel
  • ProcessedReservationModel
  • ProcessedRoomModel