Guest Loyalty

Guest loyalty summarizes how often, how recently, and how valuable a guest is. It counts stays and nights, measures booking lead time, tracks first/last and upcoming stays, lists properties visited, and totals room revenue when available. It flags returning guests, buckets stay frequency, records the last booked property/channel/source/agent/segment/rate, and tags each reservation with the guest’s booking order.

Logic for Aggregating per guest cluster

Inputs

Expects res_df (reservations) and guest_df to exist on the workflow context. res_df should contain at least: res_id, booking_date, check_in_date, check_out_date, status, and guest_id (or guest_cluster_id after matching). Optional but commonly used columns: total_room_net, total_room_rate_net, stay_nights, property_id, channel, source, travel_agent, market_segment, room_rate_code, and created_timestamp.

Outputs

Writes back an enriched guest_df aggregated by guest_cluster_id with loyalty metrics such as stays, total_stay_nights, avg_stay_nights, lifetime_total_value, avg_total_net_revenue_per_stay, first_booking_date, last_booking_date, last_stayed_property_id, properties_visited (comma-separated), returning_guest, stays_category, and a set of last-booked fields (channel, source, travel agent, market segment, rate code).
Updates res_df by joining a guest_cluster_id (from guest_df) and adding nth_booking (1-based by booking_date) for reservations with status in confirmed, checked_in, or checked_out.

Key behaviours and notes

Only reservations with status in [“confirmed”, “checked_in”, “checked_out”] are considered when computing loyalty aggregates and when calculating time_between_bookings (the code filters res_df before joining guests).
time_between_bookings is computed using a windowed lag(booking_date) partitioned by guest_cluster_id and ordered by booking_date.
Revenue metrics are only aggregated if total_room_net / total_room_rate_net columns exist; the implementation conditionally extends aggregation expressions when these columns are present.
properties_visited is built by collecting a set of property_id values for past stays and concatenating them with commas.
stays_category maps counts into buckets: 1, 2, 3, 4-7, 8-10, 11-20, 20+.

Edge cases and failure modes

If guest_cluster_id is missing on guest_df (for example when matching is skipped), Guest Matching Check should be used to copy guest_id into guest_cluster_id to avoid null joins. Without a guest_cluster_id the loyalty aggregation will not group guests correctly.
The code casts several aggregated longs into integers (e.g., stays, total_stay_nights, future_checkins) to ensure stable types for downstream consumers.
The nth_booking numbering only includes reservations with an eligible status and non-null guest_cluster_id.
Null or missing date columns will propagate nulls in booking window and stay-night related metrics; the code guards some computations with conditional when clauses but relies on valid dates for meaningful numeric values.

Example (high level)

Given a workflow context with res_df and guest_df:

Running the Guest Loyalty runnable produces an updated guest_df keyed by guest_cluster_id with aggregated loyalty metrics and an updated res_df containing guest_cluster_id and nth_booking for eligible reservations.