OperaLegacyMergeTask — Developer Reference
Developer reference for OperaLegacyMergeTask, the Opera chain-specific task that merges legacy data into the processed models. This page lists the constructor, core methods, and merging strategy.
Constructor Parameters
| Parameter | Type | Description |
|---|---|---|
| `name` | `Optional[str]` | Optional task name |
| `job_context` | `JobContext` | Job context with Spark, catalog and config |
Provides / Requires
- `requires()`: `[CleanGuestModel, ProcessedReservationModel, ProcessedRoomModel]`
- `provides()`: `[ProcessedGuestModel, ProcessedReservationModel, ProcessedRoomModel]`
Key Methods & Flow
- `_get_mapping(res_df)`: Reads the CSV mapping from `s3://etl-gp-processed/legacy_cutoff/chevalcollection/` and joins it to the processed reservations to map the `res_id_og` columns.
- `_load_legacy_table()`: Loads legacy data from an S3 path and splits it into `legacy`, `overlap`, and `future` partitions based on each property's `cutoff_date`. Uses the job's `properties` config.
- `_align_legacy_to_schema(legacy_df, target_schema, passthrough=None)`: Renames and casts legacy columns to the target `processed` schema and fills in missing fields.
- `_filter_non_overlaps(df, *exclude_dfs)`: Excludes records that already exist in the current dataset, to avoid duplication.
- `_merge_res_df(res_df)`: Merges reservations with the `legacy`, `overlap`, and `future` partitions using union-by-name, cutoff merging, and deduplication on `res_id`.
- `_merge_rooms_df(rooms_df, res_df)`: Merges room rows by `res_id`/`room_stay_date`, computes stay day/night flags, and handles overlapping rows.
- `_merge_guest_df(guest_df)`: Merges guest rows by `guest_id`, unioning in legacy values and deduplicating.
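The cutoff split performed by `_load_legacy_table()` can be pictured with the plain-Python sketch below. It is not the task's code: lists of dicts stand in for Spark DataFrames, and the column names (`property_id`, `arrival_date`, `departure_date`), the `split_by_cutoff` helper, and the classification rule itself (a stay ending on or before the cutoff is `legacy`, one starting on or after it is `future`, one spanning it is `overlap`) are all assumptions made for illustration.

```python
from datetime import date
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def split_by_cutoff(
    rows: List[Row], cutoffs: Dict[str, date]
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Partition legacy reservation rows into (legacy, overlap, future).

    Hypothetical rule, applied against each row's property cutoff:
      * stay ends on or before the cutoff   -> legacy
      * stay starts on or after the cutoff  -> future
      * stay spans the cutoff               -> overlap
    """
    legacy: List[Row] = []
    overlap: List[Row] = []
    future: List[Row] = []
    for row in rows:
        # A property with no configured cutoff raises KeyError here.
        cutoff = cutoffs[row["property_id"]]
        if row["departure_date"] <= cutoff:
            legacy.append(row)
        elif row["arrival_date"] >= cutoff:
            future.append(row)
        else:
            overlap.append(row)
    return legacy, overlap, future


# Hypothetical sample data for a single property.
cutoffs = {"P1": date(2023, 6, 1)}
rows = [
    {"res_id": "A", "property_id": "P1", "arrival_date": date(2023, 5, 1), "departure_date": date(2023, 5, 3)},
    {"res_id": "B", "property_id": "P1", "arrival_date": date(2023, 5, 30), "departure_date": date(2023, 6, 2)},
    {"res_id": "C", "property_id": "P1", "arrival_date": date(2023, 6, 10), "departure_date": date(2023, 6, 12)},
]
legacy, overlap, future = split_by_cutoff(rows, cutoffs)
# legacy holds A, overlap holds B, future holds C
```

In a Spark job the same rule would more naturally be three `filter` calls on a DataFrame joined to the per-property cutoffs; the sketch only illustrates the partitioning logic.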
Implementation Notes
- Uses `map_property` to map per-property cutoffs from the job config onto the loaded mapping dataframe.
- Performs `left_anti` joins to exclude legacy records that have current equivalents, where necessary.
- Uses `unionByName(..., allowMissingColumns=True)` to stay backward compatible when merging legacy frames with older column sets: columns missing from one side are filled with nulls instead of failing the union.
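The merge primitives named in these notes can be approximated in plain Python to show their semantics. This is a hypothetical model, not the task's implementation: `union_by_name` mimics `unionByName(..., allowMissingColumns=True)` by filling absent columns with `None`, `left_anti` keeps rows whose key has no match (null keys never match, as in a Spark equi-join), and `dedupe` keeps the first row per key. The "first row wins" rule and the sample columns (`res_id`, `status`, `channel`) are assumptions.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def union_by_name(*frames: List[Row]) -> List[Row]:
    """Mimic unionByName(..., allowMissingColumns=True) on lists of dicts.

    Columns are collected in order of first appearance; any column a row
    lacks is filled with None (Spark fills it with null).
    """
    columns: Dict[str, None] = {}
    for frame in frames:
        for row in frame:
            columns.update(dict.fromkeys(row))
    return [{col: row.get(col) for col in columns} for frame in frames for row in frame]


def left_anti(rows: List[Row], exclude: Iterable[Row], key: str) -> List[Row]:
    """Keep rows whose `key` value does not appear in `exclude`.

    None keys are never treated as matches, mirroring SQL null semantics
    in an equi-join.
    """
    excluded = {r[key] for r in exclude if r[key] is not None}
    return [r for r in rows if r[key] not in excluded]


def dedupe(rows: List[Row], key: str) -> List[Row]:
    """Keep the first row seen for each `key` value (assumed: earlier frames win)."""
    seen = set()
    kept = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            kept.append(row)
    return kept


# Hypothetical sample: one current reservation, two legacy ones.
current = [{"res_id": "R1", "status": "CHECKED_OUT", "channel": "WEB"}]
legacy = [{"res_id": "R1", "status": "OUT"}, {"res_id": "R2", "status": "OUT"}]

merged = dedupe(union_by_name(current, left_anti(legacy, current, "res_id")), "res_id")
# [{'res_id': 'R1', 'status': 'CHECKED_OUT', 'channel': 'WEB'},
#  {'res_id': 'R2', 'status': 'OUT', 'channel': None}]
```

Spark's `dropDuplicates` does not guarantee which duplicate survives, so a deterministic "current beats legacy" rule usually needs an explicit ordering, for example a window with `row_number()`; the order-based `dedupe` above only holds for ordered Python lists.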
Back to process documentation: [/processes/tasks/chains/opera/opera-legacy-merge-task](/processes/tasks/chains/opera/opera-legacy-merge-task)