Opera Legacy Merge Task
The `OperaLegacyMergeTask` is a Cheval Collection-specific processing task that merges legacy Opera data stored in S3 into the current `Processed*` models. It lets the pipeline preserve historical information from the legacy system without having to reprocess that data.
Purpose
- Load legacy Opera tables from S3 and split them into `legacy`, `overlap`, and `future` partitions using a per-property `cutoff_date`.
- Align the legacy schema to the current processed schema before unioning.
- Deduplicate and merge `ProcessedReservationModel`, `ProcessedRoomModel`, and `ProcessedGuestModel`.
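The cutoff split can be pictured with a minimal plain-Python sketch. The task itself works on Spark DataFrames and its exact classification rule is not documented here, so the property codes, the `arrival_date`/`departure_date` column names, and the rule below (legacy if the stay ends before the cutoff, future if it starts on or after it, overlap otherwise) are all assumptions for illustration.

```python
from datetime import date

# Hypothetical per-property cutoff dates; in the real job these come
# from the job configuration.
CUTOFF_DATES = {
    "PROP_A": date(2023, 1, 1),
    "PROP_B": date(2023, 6, 1),
}


def classify(row, cutoffs=CUTOFF_DATES):
    """Return 'legacy', 'overlap' or 'future' for one legacy row.

    Assumed rule, not taken from the task source:
      legacy  -> stay ends before the property's cutoff
      future  -> stay starts on or after the cutoff
      overlap -> stay starts before the cutoff and ends on or after it
    """
    cutoff = cutoffs[row["property_id"]]
    if row["departure_date"] < cutoff:
        return "legacy"
    if row["arrival_date"] >= cutoff:
        return "future"
    return "overlap"


def split_by_cutoff(rows, cutoffs=CUTOFF_DATES):
    """Group rows into the three partitions, preserving input order."""
    parts = {"legacy": [], "overlap": [], "future": []}
    for row in rows:
        parts[classify(row, cutoffs)].append(row)
    return parts


rows = [
    {"res_id": "1", "property_id": "PROP_A",
     "arrival_date": date(2022, 12, 1), "departure_date": date(2022, 12, 5)},
    {"res_id": "2", "property_id": "PROP_A",
     "arrival_date": date(2022, 12, 30), "departure_date": date(2023, 1, 3)},
    {"res_id": "3", "property_id": "PROP_B",
     "arrival_date": date(2023, 7, 1), "departure_date": date(2023, 7, 4)},
]
parts = split_by_cutoff(rows)
print({name: [r["res_id"] for r in part] for name, part in parts.items()})
# {'legacy': ['1'], 'overlap': ['2'], 'future': ['3']}
```

In Spark the same rule would typically become a `when(...).otherwise(...)` column expression evaluated after each row has been joined to its property's cutoff.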
How it works
- Load the reservation ID mapping from S3 (a CSV at `s3://etl-gp-processed/legacy_cutoff/chevalcollection/`) and join it to the processed reservations to map `res_id_og` values.
- Load the legacy dataset from S3 and join it with a generated `cutoff_date` per property (taken from the job configuration).
- Split the legacy data into:
  - `legacy`: rows older than the cutoff (completely historical).
  - `overlap`: historical rows that overlap the `cutoff_date`; these must be merged carefully.
  - `future`: future-facing rows (forward-dated stays that may conflict with current reservations).
- Align the legacy DataFrame to the current schema using a column mapping and a cast/select strategy.
- Merge `ProcessedReservationModel` by unioning the `current`, `legacy`, `overlap`, and `future` partitions and applying custom cutoff merge logic to avoid duplicates.
- Merge `ProcessedRoomModel` by aligning `room_stay_date`, deduplicating on `(res_id, room_stay_date)`, and computing stay day/night flags.
- Merge `ProcessedGuestModel` by projecting guest fields, unioning the legacy guest rows, and dropping duplicates.
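A minimal plain-Python sketch of the ID-mapping step follows. It assumes the CSV has the documented `res_id_og` column plus a target column called `res_id` (a hypothetical name), and shows the join as a left join that attaches `res_id_og` to each processed reservation, leaving it empty when no mapping exists. The real task performs this as a Spark join and may use a different join type.

```python
import csv
import io

# Stand-in for the mapping CSV stored under
# s3://etl-gp-processed/legacy_cutoff/chevalcollection/.
# "res_id_og" is documented; "res_id" is a hypothetical column name.
MAPPING_CSV = """res_id_og,res_id
OC-1001,RMS-9001
OC-1002,RMS-9002
"""


def load_mapping(text):
    """Parse mapping CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def attach_res_id_og(reservations, mapping_rows):
    """Left-join the mapping onto processed reservations by res_id."""
    og_by_res_id = {m["res_id"]: m["res_id_og"] for m in mapping_rows}
    # Reservations without a mapping keep res_id_og = None (a null in Spark).
    return [{**r, "res_id_og": og_by_res_id.get(r["res_id"])} for r in reservations]


processed = [{"res_id": "RMS-9001"}, {"res_id": "RMS-7777"}]
mapped = attach_res_id_og(processed, load_mapping(MAPPING_CSV))
print(mapped)
# [{'res_id': 'RMS-9001', 'res_id_og': 'OC-1001'}, {'res_id': 'RMS-7777', 'res_id_og': None}]
```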
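The cast/select schema alignment can be sketched in plain Python as below. `TARGET_SCHEMA`, `COLUMN_MAP`, and every column name in them are hypothetical. The sketch shows the general idea: legacy columns are renamed through a mapping, each target column is cast to its expected type, target columns with no legacy source become nulls, and legacy-only columns are dropped.

```python
from datetime import date

# Hypothetical target schema: column name -> cast applied to non-null values.
TARGET_SCHEMA = {
    "res_id": str,
    "arrival_date": date.fromisoformat,
    "nights": int,
    "rate_code": str,
}

# Hypothetical legacy -> current column mapping.
COLUMN_MAP = {
    "RESV_NAME_ID": "res_id",
    "ARRIVAL": "arrival_date",
    "NIGHTS": "nights",
}


def align_row(legacy_row):
    """Rename, cast and select one legacy row into the target schema."""
    # Rename mapped columns; columns without a mapping are dropped.
    renamed = {COLUMN_MAP[k]: v for k, v in legacy_row.items() if k in COLUMN_MAP}
    aligned = {}
    for column, cast in TARGET_SCHEMA.items():
        value = renamed.get(column)
        # Target columns with no source value stay None (a null in Spark).
        aligned[column] = cast(value) if value is not None else None
    return aligned


row = align_row({"RESV_NAME_ID": 12345, "ARRIVAL": "2022-12-30",
                 "NIGHTS": "4", "LEGACY_ONLY_FLAG": "Y"})
print(row)
# {'res_id': '12345', 'arrival_date': datetime.date(2022, 12, 30), 'nights': 4, 'rate_code': None}
```

Emitting every target column in schema order is what allows the aligned legacy rows to be unioned with the current processed data afterwards.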
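The custom cutoff merge logic is not spelled out on this page. The sketch below shows one plausible shape for it under an explicitly assumed rule: the partitions are unioned and, when the same `res_id` appears in several of them, the row from the highest-precedence partition wins, with `current` first. Both the precedence order and the use of `res_id` as the key are assumptions.

```python
# Assumed precedence: earlier partitions win when a res_id is repeated.
# The real cutoff merge logic may use different keys or rules.
PRECEDENCE = ("current", "overlap", "future", "legacy")


def merge_reservations(partitions):
    """Union {partition_name: [rows]} and keep one row per res_id."""
    merged = {}
    for source in PRECEDENCE:
        for row in partitions.get(source, []):
            # setdefault keeps the first row seen, i.e. the highest-precedence one.
            merged.setdefault(row["res_id"], {**row, "source": source})
    return list(merged.values())


partitions = {
    "current": [{"res_id": "R1", "status": "CHECKED_OUT"}],
    "overlap": [{"res_id": "R1", "status": "RESERVED"},
                {"res_id": "R2", "status": "CHECKED_OUT"}],
    "future": [{"res_id": "R3", "status": "RESERVED"}],
    "legacy": [{"res_id": "R4", "status": "CHECKED_OUT"}],
}
for r in merge_reservations(partitions):
    print(r["res_id"], r["source"], r["status"])
# R1 current CHECKED_OUT
# R2 overlap CHECKED_OUT
# R3 future RESERVED
# R4 legacy CHECKED_OUT
```

A common Spark equivalent of this pattern is a precedence column combined with `row_number()` over a window partitioned by the key, keeping only rank 1.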
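The room merge can be sketched as normalizing `room_stay_date` to a plain date and then keeping the first row per `(res_id, room_stay_date)`. Two choices here are assumptions made for illustration: current rows win over legacy ones, and the hypothetical `align_stay_date` helper handles only the input formats shown. The stay day/night flag computation is left out of this sketch.

```python
from datetime import date, datetime


def align_stay_date(value):
    """Normalize a stay date to datetime.date (hypothetical helper)."""
    # datetime is a subclass of date, so test for it first.
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    # Assume strings start with an ISO date, e.g. "2023-01-01T00:00:00".
    return date.fromisoformat(value[:10])


def merge_rooms(current_rooms, legacy_rooms):
    """Union rooms and drop duplicates on (res_id, room_stay_date)."""
    seen = set()
    merged = []
    # Current rows come first, so they win on key conflicts (assumption).
    for row in list(current_rooms) + list(legacy_rooms):
        stay_date = align_stay_date(row["room_stay_date"])
        key = (row["res_id"], stay_date)
        if key in seen:
            continue
        seen.add(key)
        merged.append({**row, "room_stay_date": stay_date})
    return merged


current = [{"res_id": "R1", "room_stay_date": date(2023, 1, 1), "room": "101"}]
legacy = [
    {"res_id": "R1", "room_stay_date": "2023-01-01T00:00:00", "room": "999"},
    {"res_id": "R1", "room_stay_date": "2023-01-02", "room": "101"},
]
for r in merge_rooms(current, legacy):
    print(r["res_id"], r["room_stay_date"], r["room"])
# R1 2023-01-01 101
# R1 2023-01-02 101
```

Aligning the date before building the key matters: without it, the timestamp string and the `date` object for 2023-01-01 would not be recognized as duplicates.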
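Projecting guest fields and dropping duplicates can be sketched as below. `GUEST_FIELDS` is a hypothetical projection, not the actual `ProcessedGuestModel` schema. Duplicates are detected on the whole projected row, which mirrors what Spark's `dropDuplicates()` does when called without a column subset.

```python
# Hypothetical projection; the real ProcessedGuestModel fields may differ.
GUEST_FIELDS = ("res_id", "guest_id", "first_name", "last_name", "email")


def project(row):
    """Keep only guest fields; absent fields become None."""
    return {field: row.get(field) for field in GUEST_FIELDS}


def merge_guests(current_rows, legacy_rows):
    """Union projected guest rows and drop exact duplicates, keeping order."""
    seen = set()
    result = []
    for row in map(project, list(current_rows) + list(legacy_rows)):
        key = tuple(row[field] for field in GUEST_FIELDS)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


current_guests = [{"res_id": "R1", "guest_id": "G1", "first_name": "Ana",
                   "last_name": "Lopez", "email": "ana@example.com",
                   "loyalty_tier": "gold"}]
legacy_guests = [
    {"res_id": "R1", "guest_id": "G1", "first_name": "Ana",
     "last_name": "Lopez", "email": "ana@example.com"},
    {"res_id": "R4", "guest_id": "G7", "first_name": "Ben", "last_name": "Ng"},
]
guests = merge_guests(current_guests, legacy_guests)
print([g["guest_id"] for g in guests])
# ['G1', 'G7']
```

Projecting before deduplicating is what makes the two `G1` rows identical: the extra `loyalty_tier` column on the current row is dropped first.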
Implementation Details
- Uses a per-property `cutoff_date` config to classify legacy records.
- Uses a CSV mapping with `res_id_og` to map OC→RMS reservation IDs.
- Applies `room_stay_date_is_stay_day` and `room_stay_date_is_stay_night` expressions to preserve historical check-in/check-out semantics.
- Uses Spark `unionByName(..., allowMissingColumns=True)` to align columns and preserve backward compatibility.
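The exact definitions of `room_stay_date_is_stay_day` and `room_stay_date_is_stay_night` are not given here. The sketch below assumes a stay night is counted for every date from arrival up to but not including departure (the usual room-night convention), and a stay day for every date from arrival through departure inclusive, so a same-day stay has a stay day but no stay night. `is_stay_day`, `is_stay_night`, and `stay_flags` are hypothetical stand-ins, not the task's expressions.

```python
from datetime import date, timedelta


def is_stay_night(stay_date, arrival, departure):
    """Assumed rule: nights run from arrival up to, not including, departure."""
    return arrival <= stay_date < departure


def is_stay_day(stay_date, arrival, departure):
    """Assumed rule: days run from arrival through departure inclusive."""
    return arrival <= stay_date <= departure


def stay_flags(arrival, departure):
    """Return (date, is_stay_day, is_stay_night) for each date of the stay."""
    flags = []
    day = arrival
    while day <= departure:
        flags.append((day, is_stay_day(day, arrival, departure),
                      is_stay_night(day, arrival, departure)))
        day += timedelta(days=1)
    return flags


for d, stay_day, stay_night in stay_flags(date(2023, 1, 1), date(2023, 1, 3)):
    print(d, stay_day, stay_night)
# 2023-01-01 True True
# 2023-01-02 True True
# 2023-01-03 True False
```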
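In PySpark the union step looks like `current_df.unionByName(legacy_df, allowMissingColumns=True)`. The `allowMissingColumns` parameter is available from Spark 3.1, and the DataFrame names here are hypothetical. The plain-Python analogue below illustrates the semantics: columns are matched by name rather than position, columns missing on either side are filled with nulls, and columns that only the second input has are appended after the first input's columns.

```python
def column_names(rows):
    """Ordered union of the keys used across a list of dict rows."""
    names = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)
    return names


def union_by_name(left, right):
    """Plain-Python analogue of unionByName(..., allowMissingColumns=True)."""
    left_cols = column_names(left)
    # Columns only present on the right are appended at the end.
    all_cols = left_cols + [c for c in column_names(right) if c not in left_cols]
    # Match values by column name; missing columns become None (null).
    return [{c: row.get(c) for c in all_cols} for row in list(left) + list(right)]


current = [{"res_id": "R1", "status": "IN_HOUSE", "channel": "WEB"}]
legacy = [{"status": "CHECKED_OUT", "res_id": "R9", "legacy_source": "OPERA"}]
for row in union_by_name(current, legacy):
    print(row)
# {'res_id': 'R1', 'status': 'IN_HOUSE', 'channel': 'WEB', 'legacy_source': None}
# {'res_id': 'R9', 'status': 'CHECKED_OUT', 'channel': None, 'legacy_source': 'OPERA'}
```

Matching by name means an older legacy extract that lacks newer columns, or lists its columns in a different order, can still be unioned without failing. This is the backward-compatibility property the task relies on.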
Models Required
- `CleanGuestModel`
- `ProcessedReservationModel`
- `ProcessedRoomModel`
Models Provided
- `ProcessedGuestModel`
- `ProcessedReservationModel`
- `ProcessedRoomModel`
Related Documentation
- ProcessingTask
- CleanOperaTask
- CMTPFixTask
- `jobs/chevalcollection_task.py` (example usage)