
Room Metrics

  • Joins room-stay rows with reservation status.
  • Computes length of stay (days) where applicable.
  • Sets per-stay-date flags for check-in day, check-out day, stay-day and stay-night.
  • Normalizes revenue components (nulls become 0.0) and aggregates them per reservation.
  • Adds reservation-level check-in/out dates and guest totals when missing.
  • Cleans up temporary fields and updates the rooms dataset.

Inputs

  • Expects rooms_df and res_df on the workflow context. res_df supplies the reservation status for each res_id. rooms_df should contain fields such as res_id, room_stay_date, room_check_in_date, room_check_out_date, room_stay_date_rate_net, room_stay_date_fnb_net, room_stay_date_other_net and room_pm, plus guest counts (room_guests_adults, room_guests_children) when available.
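
As a rough illustration, the column expectations above can be checked up front, before any Spark work starts. This is a plain-Python sketch that operates on lists of column names (for a Spark DataFrame, `rooms_df.columns` returns such a list). The helper name `check_inputs` is hypothetical, and the split between strictly required columns (res_id, room_stay_date, and res_id/status on res_df) and optional ones is an assumption, not something the runnable is documented to enforce.

```python
from typing import Iterable, List

# Assumption: these columns are treated as strictly required for this sketch.
REQUIRED_ROOM_COLUMNS = {"res_id", "room_stay_date"}
REQUIRED_RES_COLUMNS = {"res_id", "status"}  # the runnable selects these from res_df

# Columns the Inputs description lists as expected or "when available".
OPTIONAL_ROOM_COLUMNS = {
    "room_check_in_date", "room_check_out_date",
    "room_stay_date_rate_net", "room_stay_date_fnb_net", "room_stay_date_other_net",
    "room_pm", "room_guests_adults", "room_guests_children",
}


def check_inputs(rooms_columns: Iterable[str], res_columns: Iterable[str]) -> List[str]:
    """Hypothetical helper: raise if a required column is missing,
    otherwise return the optional rooms_df columns that are absent."""
    rooms, res = set(rooms_columns), set(res_columns)
    missing = (REQUIRED_ROOM_COLUMNS - rooms) | (REQUIRED_RES_COLUMNS - res)
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    # Absent optional columns mean the dependent derived columns will be skipped.
    return sorted(OPTIONAL_ROOM_COLUMNS - rooms)
```

The returned list indicates which derived columns are likely to be left out, since several outputs are only produced when their day-level inputs exist.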

Outputs

  • Writes back an enriched rooms_df with computed day-level columns: room_length_of_stay, room_stay_date_total_net (the sum of the rate, F&B and other net components), room_stay_date_is_check_in_day, room_stay_date_is_check_out_day, room_stay_date_is_stay_day and room_stay_date_is_stay_night.
  • Adds reservation-level aggregated columns when the corresponding day-level fields are present: check_in_date, check_out_date, guests_adults, guests_children, total_room_net, total_room_rate_net, total_room_fnb_net and total_room_other_net.
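
To make the revenue and reservation-level outputs concrete, the sketch below models them in plain Python on a list of row dicts instead of a Spark DataFrame. It assumes one dict per room-day row keyed by the rooms_df column names; `enrich_rooms` is a hypothetical helper, not part of the runnable. Revenue nulls are treated as 0.0, as the business rules describe, while null dates and guest counts are ignored the same way Spark's min, max and sum ignore nulls.

```python
from collections import defaultdict
from typing import Any, Dict, List

DAY_COMPONENTS = ("room_stay_date_rate_net", "room_stay_date_fnb_net", "room_stay_date_other_net")

# Day-level revenue column -> reservation-level total (names from the Outputs list).
REVENUE_TOTALS = {
    "room_stay_date_rate_net": "total_room_rate_net",
    "room_stay_date_fnb_net": "total_room_fnb_net",
    "room_stay_date_other_net": "total_room_other_net",
    "room_stay_date_total_net": "total_room_net",
}


def _non_null(values) -> list:
    return [v for v in values if v is not None]


def enrich_rooms(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: add room_stay_date_total_net and per-res_id aggregates."""
    out = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are not mutated
        # Coalesce each revenue component to 0.0, then derive the day total.
        for col in DAY_COMPONENTS:
            row[col] = 0.0 if row.get(col) is None else float(row[col])
        row["room_stay_date_total_net"] = sum(row[col] for col in DAY_COMPONENTS)
        out.append(row)

    # Group by reservation (conceptually groupBy("res_id").agg(...) in Spark).
    groups: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for row in out:
        groups[row["res_id"]].append(row)

    for res_rows in groups.values():
        check_ins = _non_null(r.get("room_check_in_date") for r in res_rows)
        check_outs = _non_null(r.get("room_check_out_date") for r in res_rows)
        adults = _non_null(r.get("room_guests_adults") for r in res_rows)
        children = _non_null(r.get("room_guests_children") for r in res_rows)
        agg: Dict[str, Any] = {
            # Nulls are skipped; an all-null group yields None, as in Spark.
            "check_in_date": min(check_ins) if check_ins else None,
            "check_out_date": max(check_outs) if check_outs else None,
            "guests_adults": sum(adults) if adults else None,
            "guests_children": sum(children) if children else None,
        }
        for day_col, total_col in REVENUE_TOTALS.items():
            agg[total_col] = sum(r[day_col] for r in res_rows)
        # "Join back": every row of the reservation receives the same aggregates.
        for r in res_rows:
            r.update(agg)
    return out
```

Because every revenue component is coalesced before summing, a single null F&B or other value does not turn the day total or the reservation total into null.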

Behaviour and business rules

  • The runnable joins rooms_df with a broadcast of res_df.select("res_id", "status"), then removes duplicate rows and repartitions the result by res_id.
  • Length of stay (room_length_of_stay) is the absolute datediff, in days, between room_check_out_date and room_check_in_date, computed for current and future rows. If the check-in/out fields are missing, room_length_of_stay is left unchanged.
  • Day-level revenue components (rate, F&B and other net) are coalesced to 0.0 so that nulls do not propagate into the aggregated sums.
  • Stay-day and stay-night flags are computed using several exclusion rules:
    • Non-room bookings (based on booking type) are excluded.
    • Cancelled, no_show and waitlist statuses are excluded.
    • PM rooms (room_pm) are excluded.
    • Day-use stays (same check-in and check-out date) are handled specially: they can count as a stay-day but not a stay-night.
    • The check-out day of an overnight stay is excluded from both stay-day and stay-night.
    • Past dates may be excluded unless the reservation is in checked_in, checked_out or in_house statuses.
  • Per-res_id aggregations compute the earliest check-in date (min), the latest check-out date (max), the sum of guest counts and the sums of the revenue components; the results are then joined back to the room rows.
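
The stay-date rules above can be modelled per row as in the following plain-Python sketch (not the Spark implementation). Several details are assumptions: the booking-type field is called `booking_type` here because the real column name is not given, any status outside the excluded and checked-in/out/in-house sets (for example a hypothetical "confirmed") is treated as eligible, and "past dates may be excluded" is read as "stay dates before today are excluded unless the status is checked_in, checked_out or in_house". The restriction of room_length_of_stay to current/future rows is not modelled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional

# Statuses that never count towards stay-day/stay-night.
EXCLUDED_STATUSES = {"cancelled", "no_show", "waitlist"}
# Statuses that keep past dates eligible.
ACTUALISED_STATUSES = {"checked_in", "checked_out", "in_house"}


@dataclass
class RoomDay:
    room_stay_date: date
    room_check_in_date: Optional[date]
    room_check_out_date: Optional[date]
    status: Optional[str]
    booking_type: str = "ROOM"  # hypothetical column name
    room_pm: bool = False


def length_of_stay(row: RoomDay) -> Optional[int]:
    """Absolute day difference between check-out and check-in, or None if either is missing."""
    if row.room_check_in_date is None or row.room_check_out_date is None:
        return None
    return abs((row.room_check_out_date - row.room_check_in_date).days)


def stay_flags(row: RoomDay, today: date) -> Dict[str, bool]:
    """Return the four per-stay-date flags for one room-day row."""
    ci, co, d = row.room_check_in_date, row.room_check_out_date, row.room_stay_date
    flags = {
        "room_stay_date_is_check_in_day": ci is not None and d == ci,
        "room_stay_date_is_check_out_day": co is not None and d == co,
        "room_stay_date_is_stay_day": False,
        "room_stay_date_is_stay_night": False,
    }
    if ci is None or co is None:
        return flags  # the date cannot be placed inside a stay
    excluded = (
        row.booking_type != "ROOM"
        or row.status in EXCLUDED_STATUSES
        or row.room_pm
        # Assumption: past dates only count for checked_in/checked_out/in_house.
        or (d < today and row.status not in ACTUALISED_STATUSES)
    )
    if excluded:
        return flags
    if ci == co:
        # Day use: a stay-day on its single date, never a stay-night.
        flags["room_stay_date_is_stay_day"] = d == ci
    elif ci <= d < co:
        # Overnight stay: every date except the check-out date counts.
        flags["room_stay_date_is_stay_day"] = True
        flags["room_stay_date_is_stay_night"] = True
    return flags
```

For example, with today set to 2024-05-10 and a stay from 2024-05-10 to 2024-05-12, the rows for the 10th and 11th get both stay-day and stay-night, while the row for the 12th is flagged only as the check-out day.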

Edge cases and notes

  • If status is present in the joined DataFrame, it is dropped after use so that temporary columns do not leak into the output.
  • The code calls checkpoint(eager=True) on intermediate DataFrames to truncate lineage, which keeps query plans small and improves stability in long pipelines.
  • Booking type is inspected; rows that are not ROOM bookings are excluded from stay-day/night flags.
  • Null or missing fields will result in missing derived values; the runnable is defensive but expects typical reservation and room-day fields to be present for meaningful results.