
CleanAthenaeumTask

The CleanAthenaeumTask transforms raw Athenaeum data into standardized cleaned models that conform to the common schema used across all hotel chains.

Overview

This task performs the following data cleaning and standardization steps:

  • Deduplicates reservation records
  • Maps Athenaeum-specific fields to the standard schema
  • Normalizes status codes and booking types
  • Extracts guest, reservation, and room data
  • Generates stable UUIDs for guests and reservations
  • Enriches reservations with revenue information

Models

Requires

  • RawReservationModel - Raw Athenaeum reservation data
  • RawRevenueModel - Raw Athenaeum revenue/transaction data

Provides

  • CleanGuestModel - Standardized guest data
  • CleanReservationModel - Standardized reservation data
  • CleanRoomModel - Standardized room stay data

Key Transformations

1. Deduplication

Removes duplicate reservation records that share a confirmation number, using tie-breaking logic to keep the most recent or most relevant record.
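A minimal sketch of this step in plain Python, assuming each raw record is a dict keyed by Athenaeum field names. The `confirmationnumber` key appears in the field mapping table; the `updated_at` timestamp used as the tie-breaker is a hypothetical column, since the task's actual tie-breaking criteria are not documented here.

```python
from datetime import datetime
from typing import Any


def dedupe_reservations(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep one record per confirmation number.

    Tie-breaker (assumption): the record with the latest `updated_at`
    wins; on equal timestamps, the record seen last wins.
    """
    latest: dict[str, dict[str, Any]] = {}
    for record in records:
        key = record["confirmationnumber"]
        current = latest.get(key)
        # >= lets a later row with the same timestamp replace the earlier one
        if current is None or record["updated_at"] >= current["updated_at"]:
            latest[key] = record
    return list(latest.values())


rows = [
    {"confirmationnumber": "A1", "updated_at": datetime(2024, 1, 1), "status": "CONFIRMED"},
    {"confirmationnumber": "A1", "updated_at": datetime(2024, 1, 5), "status": "CANCELED"},
    {"confirmationnumber": "B2", "updated_at": datetime(2024, 1, 2), "status": "CHKIN"},
]
deduped = dedupe_reservations(rows)
```

Here `deduped` keeps the later, cancelled version of `A1` plus `B2`. A dict keyed on the confirmation number keeps the pass linear in the number of rows.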

2. Status Code Mapping

Athenaeum-specific codes → Standard codes:

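| Athenaeum code | Standard code |
|----------------|---------------|
| CANCELED | cancelled |
| CHKOUT | checked_out |
| NOSHOW | no_show |
| CHKIN | checked_in |
| CONFIRMED | confirmed |
| INHOUSE | checked_in |
| WAITLIST | waitlist |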
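The mapping can be expressed as a lookup table. This sketch copies the pairs from the table; the trimming, upper-casing, and returning `None` for unknown or missing codes are assumptions, not documented behavior of the task.

```python
from typing import Optional

# Athenaeum status code -> standard status (pairs taken from the mapping table)
STATUS_MAP: dict[str, str] = {
    "CANCELED": "cancelled",
    "CHKOUT": "checked_out",
    "NOSHOW": "no_show",
    "CHKIN": "checked_in",
    "CONFIRMED": "confirmed",
    "INHOUSE": "checked_in",
    "WAITLIST": "waitlist",
}


def normalize_status(raw: Optional[str]) -> Optional[str]:
    """Map a raw Athenaeum status to the standard code.

    Assumption: input is trimmed and upper-cased first, and unknown or
    missing codes return None instead of raising.
    """
    if raw is None:
        return None
    return STATUS_MAP.get(raw.strip().upper())
```

Note that both `CHKIN` and `INHOUSE` collapse to `checked_in`, so the mapping is not invertible: the original Athenaeum code cannot be recovered from the standard one.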

3. Field Mapping

Sample field mappings:

| Athenaeum field | Standard field |
|-----------------|----------------|
| profileid | guest_id_og |
| confirmationnumber | res_id_og |
| resvdate | booking_date |
| arrivaldate | check_in_date |
| departuredate | check_out_date |
| companyname | company |
| taname1 | travel_agent |
| marketsegment | market_segment |
| emailadd | email |
| mobilenumber | phone_number |
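A field mapping like this is typically applied as a rename. The sketch below uses only the sample pairs from the table. Dropping unmapped fields and filling missing ones with `None` are assumptions chosen so every output record has the same keys.

```python
from typing import Any

# Athenaeum field -> standard field (the sample pairs from the mapping table)
FIELD_MAP: dict[str, str] = {
    "profileid": "guest_id_og",
    "confirmationnumber": "res_id_og",
    "resvdate": "booking_date",
    "arrivaldate": "check_in_date",
    "departuredate": "check_out_date",
    "companyname": "company",
    "taname1": "travel_agent",
    "marketsegment": "market_segment",
    "emailadd": "email",
    "mobilenumber": "phone_number",
}


def map_fields(raw: dict[str, Any]) -> dict[str, Any]:
    """Rename known Athenaeum fields to standard names.

    Assumption: source fields without a mapping are dropped, and missing
    source fields become None so every standard field is present.
    """
    return {std: raw.get(src) for src, std in FIELD_MAP.items()}


example = map_fields({"profileid": "P9", "confirmationnumber": "C1", "spa_notes": "late"})
```

In `example`, `guest_id_og` is `"P9"`, `res_id_og` is `"C1"`, unmapped `spa_notes` is gone, and absent fields such as `email` are `None`.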

4. UUID Generation

Generates stable UUIDs for guests and reservations: the same source record receives the same UUID on every run, so IDs stay consistent across reloads.
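One common way to get stable IDs is a name-based UUID (version 5), which hashes a fixed namespace plus a name string, so identical input always yields the identical UUID. This is a sketch, not the task's confirmed method: the namespace derived from `athenaeum.example` and the `athenaeum:guest:` / `athenaeum:reservation:` key prefixes are hypothetical.

```python
import uuid

# Hypothetical project namespace. Any fixed UUID works, but it must never
# change, or every previously generated ID changes with it.
ATHENAEUM_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "athenaeum.example")


def guest_uuid(profile_id: str) -> uuid.UUID:
    # Name-based (SHA-1) UUID: deterministic for a given namespace + name
    return uuid.uuid5(ATHENAEUM_NAMESPACE, f"athenaeum:guest:{profile_id}")


def reservation_uuid(confirmation_number: str) -> uuid.UUID:
    # Distinct prefix, so a guest and a reservation sharing a raw ID get different UUIDs
    return uuid.uuid5(ATHENAEUM_NAMESPACE, f"athenaeum:reservation:{confirmation_number}")
```

Unlike `uuid.uuid4()`, which is random, `uuid5` needs no lookup table of previously issued IDs to stay consistent across runs.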

5. Revenue Enrichment

Joins the transaction/revenue data (RawRevenueModel) onto reservations to obtain:

  • Source codes
  • Channel information
  • Daily revenue breakdowns
  • Rate details
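A simplified sketch of the join as a left join keyed on the reservation ID, summing transactions into a per-night breakdown. The revenue row shape (`res_id_og`, `stay_date`, `amount`, `source_code`, `channel`) is hypothetical, since the columns of RawRevenueModel are not documented here, and rate details are omitted.

```python
from collections import defaultdict
from datetime import date
from typing import Any


def enrich_with_revenue(
    reservations: list[dict[str, Any]],
    revenue_rows: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Left-join revenue onto reservations by res_id_og (assumed shared key)."""
    daily: dict[str, dict[date, float]] = defaultdict(lambda: defaultdict(float))
    meta: dict[str, dict[str, Any]] = {}
    for row in revenue_rows:
        res_id = row["res_id_og"]
        # Sum all transactions for the same reservation and night
        daily[res_id][row["stay_date"]] += row["amount"]
        # Keep source/channel from the first revenue row seen for the reservation
        meta.setdefault(res_id, {"source_code": row.get("source_code"),
                                 "channel": row.get("channel")})

    enriched = []
    for res in reservations:
        res_id = res["res_id_og"]
        extra = meta.get(res_id, {"source_code": None, "channel": None})
        # Reservations without revenue rows keep an empty breakdown (left join)
        enriched.append({**res, **extra, "daily_revenue": dict(daily.get(res_id, {}))})
    return enriched


reservations = [{"res_id_og": "C1"}, {"res_id_og": "C2"}]
revenue = [
    {"res_id_og": "C1", "stay_date": date(2024, 3, 1), "amount": 120.0,
     "source_code": "WEB", "channel": "direct"},
    {"res_id_og": "C1", "stay_date": date(2024, 3, 1), "amount": 15.5,
     "source_code": "WEB", "channel": "direct"},
    {"res_id_og": "C1", "stay_date": date(2024, 3, 2), "amount": 120.0,
     "source_code": "WEB", "channel": "direct"},
]
result = enrich_with_revenue(reservations, revenue)
```

A left join keeps reservations that have no revenue yet, which is what makes the completeness check in the best practices meaningful: an inner join would silently drop them.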

Entity Extraction

Separate methods extract each entity type:

  • get_clean_guests() - Extracts and cleans guest records
  • get_clean_reservations() - Extracts and cleans reservation records
  • get_clean_rooms() - Extracts and cleans daily room stays
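`get_clean_rooms()` produces daily room stays. The core of that step can be sketched as expanding each reservation into one row per night, assuming nights run from check-in up to but not including check-out (the usual hotel convention, but an assumption here). The output keys `stay_date` and `room_type` are hypothetical.

```python
from datetime import date, timedelta
from typing import Any


def expand_room_nights(reservation: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn one reservation into one row per occupied night.

    Assumption: the check-out date itself is not a stay night.
    """
    check_in: date = reservation["check_in_date"]
    check_out: date = reservation["check_out_date"]
    nights = (check_out - check_in).days
    return [
        {
            "res_id_og": reservation["res_id_og"],
            "stay_date": check_in + timedelta(days=offset),
            "room_type": reservation.get("room_type"),  # hypothetical field
        }
        for offset in range(nights)  # empty for same-day or inverted dates
    ]


stays = expand_room_nights({
    "res_id_og": "C1",
    "check_in_date": date(2024, 3, 1),
    "check_out_date": date(2024, 3, 4),
})
```

A three-night stay yields rows for March 1, 2, and 3. Records with null dates must be filtered out before this step, since subtracting `None` raises a `TypeError`.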

Data Quality Checks

The task performs several quality checks:

  1. Null Check-in/Check-out: Filters out records with a null check-in or check-out date
  2. Duplicate Removal: Deduplicates by confirmation number
  3. Type Validation: Converts fields to their expected types and validates them
  4. Referential Integrity: Ensures that guest-reservation relationships are valid
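The four checks can be sketched as a single filtering pass over mapped records. Several details are assumptions: dates arrive as ISO strings, deduplication here simply keeps the first valid occurrence (not the task's tie-breaking logic), and reservations whose `guest_id_og` is unknown are dropped, where the real task might flag or repair them instead.

```python
from datetime import date
from typing import Any


def quality_filter(
    reservations: list[dict[str, Any]],
    guest_ids: set[str],
) -> list[dict[str, Any]]:
    """Apply the four documented checks in order (simplified sketch)."""
    seen: set[str] = set()
    clean = []
    for res in reservations:
        # 1. Null check-in/check-out
        if not res.get("check_in_date") or not res.get("check_out_date"):
            continue
        # 2. Duplicate removal by confirmation number (first valid occurrence kept)
        if res["res_id_og"] in seen:
            continue
        # 3. Type validation: parse ISO date strings, dropping unparseable ones
        try:
            check_in = date.fromisoformat(res["check_in_date"])
            check_out = date.fromisoformat(res["check_out_date"])
        except (TypeError, ValueError):
            continue
        # 4. Referential integrity: the reservation's guest must exist
        if res.get("guest_id_og") not in guest_ids:
            continue
        seen.add(res["res_id_og"])
        clean.append({**res, "check_in_date": check_in, "check_out_date": check_out})
    return clean


rows = [
    {"res_id_og": "C1", "guest_id_og": "P1", "check_in_date": "2024-03-01", "check_out_date": "2024-03-03"},
    {"res_id_og": "C1", "guest_id_og": "P1", "check_in_date": "2024-03-01", "check_out_date": "2024-03-03"},
    {"res_id_og": "C2", "guest_id_og": "P1", "check_in_date": None, "check_out_date": "2024-03-03"},
    {"res_id_og": "C3", "guest_id_og": "P1", "check_in_date": "2024-13-01", "check_out_date": "2024-03-03"},
    {"res_id_og": "C4", "guest_id_og": "P404", "check_in_date": "2024-03-01", "check_out_date": "2024-03-02"},
]
valid = quality_filter(rows, {"P1"})
```

Only `C1` survives: the second `C1` is a duplicate, `C2` has a null date, `C3` has an invalid month, and `C4` points at an unknown guest.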

Best Practices

  1. Review field mappings when the Athenaeum schema changes
  2. Monitor deduplication for unexpected behavior
  3. Validate that status mappings match business requirements
  4. Test that UUID generation is consistent across runs
  5. Check revenue joins for data completeness
  6. Handle nulls appropriately in downstream processing
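Practices 2 and 5 are easier to follow with a couple of numbers tracked per run. A hypothetical sketch: the metric names are illustrative, and it assumes revenue rows can be matched to reservations by confirmation number.

```python
from collections import Counter
from typing import Any


def completeness_report(
    raw_reservations: list[dict[str, Any]],
    revenue_res_ids: set[str],
) -> dict[str, float]:
    """Simple health metrics (hypothetical names).

    duplicate_rate: share of raw rows that deduplication would drop.
    revenue_coverage: share of unique reservations with at least one revenue row.
    """
    counts = Counter(r["confirmationnumber"] for r in raw_reservations)
    total_rows = sum(counts.values())
    unique = len(counts)
    if total_rows == 0:
        return {"duplicate_rate": 0.0, "revenue_coverage": 0.0}
    matched = sum(1 for res_id in counts if res_id in revenue_res_ids)
    return {
        "duplicate_rate": (total_rows - unique) / total_rows,
        "revenue_coverage": matched / unique,
    }


report = completeness_report(
    [{"confirmationnumber": c} for c in ["A1", "A1", "B2", "C3"]],
    {"A1", "B2"},
)
```

Here one of four rows is a duplicate (`duplicate_rate` 0.25) and two of three reservations have revenue. Comparing these values against previous runs makes a sudden shift in deduplication or join coverage easy to spot.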