Skip to Content

Opera Cleaning

The OperaCleaner transforms raw Oracle Opera PMS data into our standardized format. Opera is used by the Cheval Collection properties and is one of the most widely used property management systems in the hospitality industry.

Overview

The Opera cleaning process has several unique characteristics:

  1. Nested JSON structure with complex field paths (e.g., roomstay.arrivaldate)
  2. Multiple ID lists requiring custom extraction functions
  3. Attached profiles for companies and travel agents
  4. Separate daily rates table with detailed revenue breakdowns
  5. Pseudo rooms (PM rooms) for forecasting and overbooking management

Data Sources

The cleaner reads from three tables in the etl_gp_raw_chevalcollection database:

  • reservations - Main reservation data with nested structures
  • guest_profiles - Additional guest information (nationality)
  • daily_rates - Daily revenue breakdown by date

Cleaning Workflow

Helper Functions

Opera uses custom utility functions to extract data from complex structures:

Extract Reservation ID

def get_res_id(col_array, type_substring): """Finds the reservation id from list""" return F.when( F.size( F.expr(f"filter({col_array}, x -> lower(x.type) like '%{type_substring.lower()}%')") ) > 0, F.expr(f"filter({col_array}, x -> lower(x.type) like '%{type_substring.lower()}%')")[0]["id"], ).otherwise(None)

Usage: Extracts the reservation ID from reservationidlist array where type contains “reservation”:

df = df.withColumn("reservation_id_og", get_res_id("reservationidlist", "reservation"))

Extract Profile Information

def get_profile(col_array, type_substring: str): """Finds the Travel Agent and Company""" filter_expr = f""" filter({col_array}, x -> lower(x.reservationProfileType) like '%{type_substring}%') """ return F.when( F.size(F.expr(filter_expr)) > 0, F.expr(f"element_at({filter_expr}, 1).name") ).otherwise(None)

Usage: Extracts company and travel agent names from attachedprofiles array:

df = df.withColumn("company", get_profile("attachedprofiles", "Company")) df = df.withColumn("travel_agent", get_profile("attachedprofiles", "TravelAgent"))

Deduplication Strategy

Opera uses a priority-based deduplication similar to Athenaeum:

status_priority = ( F.when(F.col("reservationstatus") == "CheckedOut", 3) .when(F.col("reservationstatus") == "CheckedIn", 2) .when(F.col("reservationstatus") == "Reserved", 1) .otherwise(0) ) window = Window.partitionBy("reservation_id_og").orderBy( F.to_date(F.col("lastmodifydatetime")).desc(), status_priority.desc(), )

Key: For duplicate reservation IDs, the most recent update with the highest status priority wins.

Field Mapping

Reservation Fields

Opera FieldStandard Schema FieldNotes
reservationidlist → extract by typereservation_id_ogExtracted from array using get_res_id()
reservationguest.idguest_id_ogPrimary guest ID
createdatetimebooking_date, created_timestampWhen reservation was created
lastmodifydatetimelast_modified_timestampLast update time
roomstay.arrivaldatecheck_in_dateCheck-in date
roomstay.departuredatecheck_out_dateCheck-out date
reservationstatusstatusReservation status
roomstay.bookingchannelcodechannelBooking channel
roomstay.sourcecodesourceSource code
roomstay.marketcodemarket_segmentMarket segment
roomstay.adultcountguests_adultsNumber of adults
roomstay.childcountguests_childrenNumber of children
roomstay.rateamount.currencycodecurrencyCurrency code
attachedprofiles → extract by typecompany, travel_agentParsed using get_profile()

Guest Fields (from nested reservationguest)

Opera FieldStandard Schema Field
reservationguest.givennamefirst_name
reservationguest.surnamelast_name
reservationguest.emailemail
reservationguest.birthdatebirth_date
reservationguest.phonenumberphone_number
reservationguest.address.streetaddressaddress_street
reservationguest.address.citynameaddress_city
reservationguest.address.postalcodeaddress_postcode
reservationguest.address.stateaddress_state
reservationguest.address.country.codeaddress_country
reservationguest.vip.vipCodemembership_type

Room Fields (from roomstay)

Opera FieldStandard Schema Field
roomstay.roomidroom_id
roomstay.roomtyperoom_type
roomstay.rateplancoderoom_rate_code
roomstay.roomclassroom_type_category
roomstay.roomtypechargedroom_type_charged
roomstay.pseudoRoomroom_pm (pseudo/PM room flag)

Status Mapping

Opera StatusStandard Status
Cancelledcancelled
CheckedOutchecked_out
NoShowno_show
CheckedInchecked_in
Reservedconfirmed
DueInconfirmed
DueOutconfirmed
InHousein_house

Daily Rates Processing

Opera provides detailed daily revenue in the daily_rates table:

Source Structure

{ "reservationid": "12345", "dailyrates": { "details": [ {"summarydate": "2025-01-15", "revenue": 250.00}, {"summarydate": "2025-01-16", "revenue": 250.00} ] } }

Processing Logic

# Explode nested daily rates df = df.withColumn("detail", F.explode("dailyrates.details")) df = df.select( F.col("reservationid").alias("reservation_id_og"), F.to_date("detail.summarydate").alias("room_stay_date"), F.col("detail.revenue").cast("double").alias("room_stay_date_rate_net"), F.col("synced_date"), )

Deduplication

For each reservation and date, keep only the most recent synced data:

window = Window.partitionBy("reservation_id_og", "room_stay_date").orderBy( F.col("synced_date").desc() ) df = ( df.withColumn("rn", F.row_number().over(window)) .filter("rn = 1") .drop("rn", "synced_date") )

Guest Profile Enrichment

Additional guest information (nationality) is loaded from guest_profiles:

df = self._job_context.catalog.read_table( database="etl_gp_raw_chevalcollection", table="guest_profiles", ) df = ( df.withColumn("guest_id_og", get_res_id("guestIdList", "Profile")) .withColumn("nationality", F.col("guestDetails.customer.nationality")) ) # Join with reservation guests guest_df = guest_df.join(profiles_df, on="guest_id_og", how="left")

Rooms Processing

Stay Date Expansion

Daily room records are generated by expanding the stay into individual dates:

df = df.withColumn( "room_stay_date", F.explode( F.when( # Same-day checkout F.to_date("check_in_date") == F.col("check_out_date"), F.array(F.to_date("check_in_date")) ).otherwise( # Multi-night stay F.sequence( F.to_date("check_in_date"), F.date_sub(F.to_date("check_out_date"), 1), ) ) ), )

Revenue Join

Daily rates are joined by reservation ID and stay date:

df = df.join( rates, on=["reservation_id_og", "room_stay_date"], how="left", )

F&B and Other Revenue

Opera provides aggregate F&B and other revenue at the reservation level:

df = df.withColumn( "room_stay_date_fnb_net", F.coalesce( F.col("revenuesandbalances.foodAndBevRevenue.amount").cast("double"), F.lit(0.0), ), ).withColumn( "room_stay_date_other_net", F.coalesce( F.col("revenuesandbalances.otherRevenue.amount").cast("double"), F.lit(0.0), ), )

Important: These totals are only assigned to the check-in date to avoid double-counting:

df = df.withColumn( "room_stay_date_fnb_net", F.when( F.col("room_stay_date") == F.to_date("room_check_in_date"), F.col("room_stay_date_fnb_net") ).otherwise(F.lit(0.0)) )

Pseudo Rooms (PM Rooms)

Opera supports pseudo rooms (PM rooms) for inventory management:

df = df.withColumn("room_pm", F.col("roomstay.pseudoRoom"))

Purpose: PM rooms are used for:

  • Forecasting and planning
  • Overbooking management
  • Group blocks without specific room assignments

Records with room_pm = true don’t have actual room assignments yet.

Data Validation

Standard cleaning utilities are applied:

df = clean_country(df, "address_country") df = clean_country(df, "nationality") df = clean_email(df, "email") df = clean_phone_number(df, "phone_number")

Output Tables

Clean Reservations

  • Database: etl_gp_clean_chevalcollection
  • Table: reservations
  • Key: reservation_id_og (extracted from Opera ID list)
  • Row Count: One per reservation

Clean Guests

  • Database: etl_gp_clean_chevalcollection
  • Table: guests
  • Key: guest_id (stable UUID)
  • Source: Combined from reservation guest data + guest profiles
  • Enrichment: Nationality from guest_profiles table

Clean Rooms

  • Database: etl_gp_clean_chevalcollection
  • Table: rooms
  • Key: Composite (res_id, room_stay_date)
  • Granularity: One row per reservation per day
  • Revenue: Room rates from daily_rates, F&B/Other from reservation totals

Configuration

Property mapping is applied via configuration:

df = self._map_property(df, "property_id")

Challenges & Solutions

ChallengeSolution
Complex nested JSON structuresUse dot notation to access nested fields
Multiple ID listsCustom get_res_id() function to extract by type
Attached profiles arrayCustom get_profile() function to parse by profile type
Daily rates in separate tableJoin by reservation ID and stay date
F&B/Other totals not dailyAssign entire total to check-in date only
Pseudo roomsPreserve room_pm flag for identification

Opera-Specific Features

ID Lists

Opera stores multiple IDs (confirmation numbers, external IDs, etc.) in arrays. The cleaner extracts the primary reservation ID using type filters.

Attached Profiles

Companies and travel agents are stored in an attachedprofiles array rather than dedicated fields. The cleaner parses these using profile type filters.

Revenue Structure

Opera provides both:

  • Daily room rates (in separate table) - Detailed daily breakdown
  • Aggregate F&B/Other (in reservation) - Totals only

Room Type vs Charged Type

Opera tracks both the original room_type and the room_type_charged when a guest is assigned a different room category.

VIP Status

Opera’s VIP codes are preserved in the membership_type field.

See Also

Last updated on