
OperaLegacyMergeTask — Developer Reference

Developer reference for OperaLegacyMergeTask, the Opera chain-specific task that merges legacy data into the processed models. This page lists the constructor, core methods, and merging strategy.

Constructor Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| name | Optional[str] | Optional task name |
| job_context | JobContext | Job context with Spark, catalog and config |

Provides / Requires

  • requires(): [CleanGuestModel, ProcessedReservationModel, ProcessedRoomModel]
  • provides(): [ProcessedGuestModel, ProcessedReservationModel, ProcessedRoomModel]
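To make the dependency declaration concrete, here is a minimal sketch of a task exposing requires() and provides(). The Model base class, the empty model classes, and LegacyMergeTaskSketch are hypothetical stand-ins written for illustration; only the model names and the two method names come from this page, and the real base class and JobContext API may differ.

```python
from typing import List, Optional, Type


class Model:
    """Hypothetical base class standing in for the project's model types."""


class CleanGuestModel(Model): ...
class ProcessedGuestModel(Model): ...
class ProcessedReservationModel(Model): ...
class ProcessedRoomModel(Model): ...


class LegacyMergeTaskSketch:
    """Illustrative stand-in, not the real OperaLegacyMergeTask."""

    def __init__(self, job_context: object, name: Optional[str] = None) -> None:
        # job_context would carry Spark, catalog and config in the real task.
        self.job_context = job_context
        self.name = name

    def requires(self) -> List[Type[Model]]:
        # Models that must exist before the task runs.
        return [CleanGuestModel, ProcessedReservationModel, ProcessedRoomModel]

    def provides(self) -> List[Type[Model]]:
        # Models the task writes once the merge is done.
        return [ProcessedGuestModel, ProcessedReservationModel, ProcessedRoomModel]


task = LegacyMergeTaskSketch(job_context=None)
consumed_only = set(task.requires()) - set(task.provides())
produced_only = set(task.provides()) - set(task.requires())
```

Comparing the two lists shows that the reservation and room models appear on both sides, while the guest model moves from its clean form to its processed form.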

Key Methods & Flow

  • _get_mapping(res_df): Reads the CSV mapping from s3://etl-gp-processed/legacy_cutoff/chevalcollection/ and joins it to the processed reservations to map the res_id_og columns.
  • _load_legacy_table(): Loads legacy data from an S3 path and splits it into legacy, overlap, and future partitions based on each property's cutoff_date, taken from the job's properties config.
  • _align_legacy_to_schema(legacy_df, target_schema, passthrough=None): Renames and casts legacy columns to the target processed schema and fills missing fields.
  • _filter_non_overlaps(df, *exclude_dfs): Excludes records that already exist in the current dataset, to avoid duplicates.
  • _merge_res_df(res_df): Merges reservations with the legacy, overlap, and future partitions using union-by-name, cutoff merging, and deduplication on res_id.
  • _merge_rooms_df(rooms_df, res_df): Merges room rows by res_id/room_stay_date, computes stay day/night flags, and handles overlapping rows.
  • _merge_guest_df(guest_df): Merges guest rows by guest_id, unioning legacy values and deduplicating.
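The cutoff split and the schema alignment described above can be sketched in plain Python, with lists of dicts standing in for Spark DataFrames. This is an illustration under stated assumptions, not the task's code: the field names property, arrival and departure are hypothetical, and the rule used here (a stay ending before the cutoff is legacy, a stay starting on or after it is future, anything spanning it is overlap) is one plausible reading of "based on cutoff_date per property".

```python
from datetime import date
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


def split_by_cutoff(
    rows: List[Row], cutoffs: Dict[str, date]
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Split rows into (legacy, overlap, future) per property cutoff."""
    legacy, overlap, future = [], [], []
    for row in rows:
        cutoff = cutoffs.get(row["property"])
        if cutoff is None:
            continue  # assumption: properties without a cutoff are skipped
        if row["departure"] < cutoff:
            legacy.append(row)    # stay ended before the cutoff
        elif row["arrival"] >= cutoff:
            future.append(row)    # stay starts on or after the cutoff
        else:
            overlap.append(row)   # stay spans the cutoff
    return legacy, overlap, future


def align_to_schema(
    row: Row, renames: Dict[str, str], target_schema: Dict[str, Callable[[Any], Any]]
) -> Row:
    """Rename legacy columns, cast to target types, fill missing fields with None."""
    renamed = {renames.get(col, col): value for col, value in row.items()}
    aligned = {}
    for column, caster in target_schema.items():
        value = renamed.get(column)
        aligned[column] = caster(value) if value is not None else None
    return aligned


cutoffs = {"A": date(2023, 6, 1)}
rows = [
    {"property": "A", "res_id": 1, "arrival": date(2023, 1, 1), "departure": date(2023, 1, 5)},
    {"property": "A", "res_id": 2, "arrival": date(2023, 5, 30), "departure": date(2023, 6, 3)},
    {"property": "A", "res_id": 3, "arrival": date(2023, 7, 1), "departure": date(2023, 7, 2)},
]
legacy, overlap, future = split_by_cutoff(rows, cutoffs)

aligned = align_to_schema(
    {"RESV_ID": "42", "PROP": "A"},                    # hypothetical legacy column names
    renames={"RESV_ID": "res_id", "PROP": "property"},
    target_schema={"res_id": int, "property": str, "guest_id": str},
)
```

In this sample, reservation 1 lands in legacy, reservation 2 in overlap, and reservation 3 in future. The aligned row gains a guest_id field set to None because the legacy record has no such column.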

Implementation Notes

  • Uses map_property to map per-property cutoffs from the job config onto the loaded mapping dataframe.
  • Performs left_anti joins to drop legacy records that already have a current equivalent.
  • Uses unionByName(..., allowMissingColumns=True) so that legacy data lacking some columns can still be merged; Spark fills the absent columns with nulls.
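The anti-join and union-by-name behavior noted above can be emulated in plain Python to show what each step does to the rows. The helper names left_anti and union_by_name and the sample columns are hypothetical; in Spark the corresponding calls would be along the lines of legacy_df.join(current_df, on="res_id", how="left_anti") followed by current_df.unionByName(legacy_only_df, allowMissingColumns=True).

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def left_anti(rows: List[Row], others: List[Row], key: str) -> List[Row]:
    """Keep only rows whose key does not appear in `others`."""
    seen = {other[key] for other in others}
    return [row for row in rows if row[key] not in seen]


def union_by_name(left: List[Row], right: List[Row]) -> List[Row]:
    """Stack rows by column name, filling columns a row lacks with None."""
    combined = left + right
    # Preserve first-seen column order across both inputs.
    columns = list(dict.fromkeys(col for row in combined for col in row))
    return [{col: row.get(col) for col in columns} for row in combined]


current = [{"res_id": 2, "status": "CHECKED_IN", "channel": "WEB"}]
legacy = [
    {"res_id": 1, "status": "CHECKED_OUT"},
    {"res_id": 2, "status": "RESERVED"},  # also present in current data
]

legacy_only = left_anti(legacy, current, key="res_id")
merged = union_by_name(current, legacy_only)
```

Here the legacy copy of reservation 2 is dropped because a current row with the same res_id exists, and the surviving legacy row gets channel set to None since the older data has no such column.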

Back to process documentation: [/processes/tasks/chains/opera/opera-legacy-merge-task](/processes/tasks/chains/opera/opera-legacy-merge-task)
