Skip to Content

Guestline Cleaning

The GuestlineCleaner transforms raw Guestline PMS data into our standardized format. Guestline is used by the Queensway properties and provides a comprehensive web-based property management system.

Overview

The Guestline cleaning process has several distinctive features:

  1. Nested data structure with deep field paths (e.g., data.roompick.ratelines)
  2. Ratelines - Complex rate structures with multiple rate periods per reservation
  3. Multi-level revenue - Revenue items nested within ratelines
  4. Aggregate-based date calculation - Check-in/out dates computed from ratelines
  5. Separate person profiles table for guest enrichment

Data Sources

The cleaner reads from two tables in the etl_gp_raw_queensway database:

  • roompicks - Main reservation data with nested ratelines and revenue
  • personprofiles - Guest profiles with contact information

Cleaning Workflow

Deduplication Strategy

Guestline uses a priority-based deduplication similar to other systems:

status_priority = ( F.when(F.col("data.roompick.statuscode") == "CHKO", 3) .when(F.col("data.roompick.statuscode") == "CheckedIn", 2) .when(F.col("data.roompick.statuscode") == "RES", 1) .when(F.col("data.roompick.statuscode") == "Confirmed", 1) .otherwise(0) ) window = Window.partitionBy("data.entityid").orderBy( F.to_timestamp("data.roompick.lasteditedtimestamp").desc(), status_priority.desc(), )

Key: For duplicate entity IDs, the most recently edited record with the highest status priority wins.

Field Mapping

Reservation Fields

Guestline FieldStandard Schema FieldNotes
data.entityidreservation_id_ogUnique booking identifier
data.roompick.statuscodestatusReservation status
data.roompick.mediasourcesourceBooking source
data.roompick.channeldescriptionchannelBooking channel
data.roompick.ratelines.marketsegment[0]market_segmentMarket segment
data.roompick.ratelines.guests.adults[0]guests_adultsNumber of adults
data.roompick.ratelines.guests.children[0]guests_childrenNumber of children
data.roompick.creationdatebooking_date, created_timestampWhen booking was created
data.roompick.lasteditedtimestamplast_modified_timestampLast update time
data.roompick.cancellationdatecancellation_dateCancellation date
data.roompick.guestprofiles[0].profilerefguest_id_ogGuest profile reference
data.roompick.agentnametravel_agentTravel agent name
data.roompick.companynamecompanyCompany name
data.currencycodecurrencyCurrency code

Guest Fields (from nested data.contact)

Guestline FieldStandard Schema Field
data.contact.contactgivennamefirst_name
data.contact.contactfamilynamelast_name
data.contact.contactaddress.townaddress_city
data.contact.contactaddress.stateaddress_state
data.contact.contactaddress.postcodeaddress_postcode
data.contact.contactaddress.countrycodeaddress_country

Person Profile Fields

Additional guest information from personprofiles table:

Profile FieldStandard Schema Field
forenamefirst_name
surnamelast_name
emailemail
gendergender
telephonenumberphone_number
streetaddress_street
townaddress_city
postcodeaddress_postcode
countryaddress_country
nationalitynationality
vipmembership_type

Status Mapping

Guestline StatusStandard Status
CANcancelled
CHKOchecked_out
NoShno_show
CheckedInchecked_in
RESconfirmed
Confirmedconfirmed
InHousechecked_in

Computed Check-in/Check-out Dates

Unlike other PMS systems, Guestline doesn’t have top-level arrival/departure dates. Instead, they must be computed from ratelines:

Check-in Date (Earliest)

df = df.withColumn( "check_in_date", F.expr(""" aggregate( data.roompick.ratelines, date('9999-12-31'), (acc, x) -> least(acc, to_date(x.from)) ) """) )

Logic: Start with a far-future date, then take the minimum of all rateline.from dates.

Check-out Date (Latest)

df = df.withColumn( "check_out_date", F.expr(""" aggregate( data.roompick.ratelines, date('0001-01-01'), (acc, x) -> greatest(acc, to_date(x.to)) ) """) )

Logic: Start with a far-past date, then take the maximum of all rateline.to dates.

Ratelines Structure

Guestline uses ratelines to represent different rate periods within a single reservation:

{ "data": { "roompick": { "ratelines": [ { "from": "2025-01-15", "to": "2025-01-17", "roomid": "101", "roomtypecode": "DBL", "rateplancode": "BAR", "guests": {"adults": 2, "children": 0}, "marketsegment": "LEISURE", "revenue": [ { "revenuetypegroup": "Accommodation", "amountbeforetax": 250.00, "amountaftertax": 287.50 }, { "revenuetypegroup": "FoodAndDrink", "amountbeforetax": 50.00, "amountaftertax": 57.50 } ] } ] } } }

Key features:

  • Multiple ratelines per reservation (e.g., room changes, rate changes)
  • Each rateline has a date range (from - to)
  • Revenue items are nested within each rateline
  • Guest counts and market segment are rateline-specific

Guest Profile Enrichment

Person profiles are loaded separately and joined with reservation data:

profiles_df = self._job_context.catalog.read_table( database="etl_gp_raw_queensway", table="personprofiles", dedup_keys=["entityid"], sort_by=["synced_date"], ) # Join on guest profile reference df = profiles_df.join( res_df.select("guest_id", "guest_id_og", "address_state", "created_timestamp") .drop_duplicates(subset=["guest_id_og"]), on=res_df["guest_id_og"] == profiles_df["profileref"], how="left" )

Note: The join preserves address_state from reservations (not available in profiles).

Rooms Processing

The rooms processing is more complex for Guestline due to the ratelines structure:

Step 1: Explode Ratelines

df = df.withColumn("detail", F.explode("ratelines"))

Each rateline becomes a separate row.

Step 2: Generate Stay Dates per Rateline

df = df.withColumn( "room_stay_date", F.explode( F.sequence( F.to_date(F.col("detail.from")), F.date_sub(F.to_date(F.col("detail.to")), 1), ) ), )

Expands each rateline into daily records.

Step 3: Explode Revenue within Rateline

df = df.withColumn("revenue", F.explode("detail.revenue"))

Each revenue item within a rateline becomes a separate row.

Step 4: Extract Room and Revenue Fields

df = ( df.withColumn("room_id", F.col("detail.roomid")) .withColumn("room_rate_code", F.col("detail.rateplancode")) .withColumn("room_type_code", F.col("detail.roomtypecode")) .withColumn("amount_before_tax", F.col("revenue.amountbeforetax").cast("double")) .withColumn("amount_after_tax", F.col("revenue.amountaftertax").cast("double")) .withColumn("revenue_type_group", F.col("revenue.revenuetypegroup")) )

Step 5: Map Revenue Types

df = ( df.withColumn( "room_stay_date_rate_net", F.when( F.col("revenue_type_group") == "Accommodation", F.col("amount_before_tax") ).otherwise(0) ) .withColumn( "room_stay_date_fnb_net", F.when( F.col("revenue_type_group") == "FoodAndDrink", F.col("amount_before_tax") ).otherwise(0) ) .withColumn( "room_stay_date_other_net", F.when( F.col("revenue_type_group") == "Other", F.col("amount_before_tax") ).otherwise(0) ) )

Step 6: Aggregate by Reservation and Date

df = df.groupBy( "chain_id", "property_id", "res_id", "room_id", "room_type_code", "room_rate_code", "room_check_in_date", "room_check_out_date", "room_guests_adults", "room_guests_children", "room_stay_date", "booking_type" ).agg( F.sum("room_stay_date_rate_net").alias("room_stay_date_rate_net"), F.sum("room_stay_date_fnb_net").alias("room_stay_date_fnb_net"), F.sum("room_stay_date_other_net").alias("room_stay_date_other_net"), F.sum("room_stay_date_listing_rate_net").alias("room_stay_date_listing_rate_net"), F.sum("room_stay_date_listing_rate_gross").alias("room_stay_date_listing_rate_gross"), )

This aggregation combines multiple revenue items for the same day into totals.

Revenue Type Groups

Guestline categorizes revenue into three groups:

  • Accommodationroom_stay_date_rate_net
  • FoodAndDrinkroom_stay_date_fnb_net
  • Otherroom_stay_date_other_net

Both net (before tax) and gross (after tax) amounts are available for accommodation.

Data Validation

Standard cleaning utilities are applied:

df = clean_country(df, "address_country") df = clean_country(df, "nationality") df = clean_email(df, "email") df = clean_phone_number(df, "phone_number")

Output Tables

Clean Reservations

  • Database: etl_gp_clean_queensway
  • Table: reservations
  • Key: reservation_id_og (entity ID)
  • Row Count: One per reservation

Clean Guests

  • Database: etl_gp_clean_queensway
  • Table: guests
  • Key: guest_id (stable UUID)
  • Source: Combined from roompicks contact data + person profiles
  • Deduplication: Latest by created_timestamp

Clean Rooms

  • Database: etl_gp_clean_queensway
  • Table: rooms
  • Key: Composite (property_id, res_id, room_id, room_stay_date)
  • Granularity: One row per reservation per room per day
  • Revenue: Aggregated from exploded ratelines and revenue items

Challenges & Solutions

ChallengeSolution
No top-level check-in/out datesCompute from ratelines using aggregate functions
Complex nested ratelinesMulti-level explosion: ratelines → dates → revenue
Revenue at rateline levelExplode and aggregate by date
Multiple rate periodsEach rateline generates separate room records
Heavy nested JSONDrop after extraction to reduce memory
Guest data in two placesJoin roompicks contact with person profiles

Guestline-Specific Features

Ratelines Architecture

Guestline’s rateline model allows for:

  • Multiple rate periods within one reservation
  • Room changes mid-stay
  • Rate plan changes mid-stay
  • Per-period guest counts and market segments

Revenue Nesting

Revenue items are nested within ratelines rather than at the reservation level, providing more granular revenue tracking.

Profile Reference System

Guests are linked via profileref in the roompick, which maps to entityid in person profiles.

VIP Status

Guestline tracks VIP status in person profiles, which is preserved in membership_type.

Example: Multi-Rateline Reservation

Consider a reservation with two ratelines:

Rateline 1: Jan 15-17, Room 101, Rate BAR, 2 adults Revenue: Accommodation $250, F&B $50 Rateline 2: Jan 17-18, Room 202, Rate PROMO, 2 adults Revenue: Accommodation $150, F&B $30

After processing, this produces 3 daily room records:

  • Jan 15: Room 101, Accommodation $250, F&B $50
  • Jan 16: Room 101, Accommodation $250, F&B $50
  • Jan 17: Room 202, Accommodation $150, F&B $30

See Also

Last updated on