CleanAthenaeumTask
The CleanAthenaeumTask transforms raw Athenaeum data into standardized cleaned models that conform to the common schema used across all hotel chains.
Overview
This task performs comprehensive data cleaning and standardization:
- Deduplicates reservation records
- Maps Athenaeum-specific fields to standard schema
- Normalizes status codes and booking types
- Extracts guest, reservation, and room data
- Generates stable UUIDs for entities
- Enriches data with revenue information
Data Transformations
Models
Requires
- RawReservationModel - Raw Athenaeum reservation data
- RawRevenueModel - Raw Athenaeum revenue/transaction data
Provides
- CleanGuestModel - Standardized guest data
- CleanReservationModel - Standardized reservation data
- CleanRoomModel - Standardized room stay data
Key Transformations
1. Deduplication
Deduplicates reservation records by confirmation number, using tie-breaking logic to keep the most recent or most relevant record.
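The tie-breaking rules themselves are not spelled out here, so the following is only a minimal sketch of the idea. It assumes records are plain dicts keyed by `confirmationnumber` and that "most recent" is decided by a hypothetical `updated_at` column; the real task may use different columns or additional criteria.

```python
from typing import Iterable


def deduplicate(
    records: Iterable[dict],
    key: str = "confirmationnumber",
    order_by: str = "updated_at",  # hypothetical tie-break column (assumption)
) -> list[dict]:
    """Keep one record per `key`, preferring the largest `order_by` value."""
    best: dict[str, dict] = {}
    for record in records:
        current = best.get(record[key])
        # `>=` means a later row wins when two rows share the same timestamp.
        if current is None or record[order_by] >= current[order_by]:
            best[record[key]] = record
    return list(best.values())


rows = [
    {"confirmationnumber": "A100", "updated_at": "2024-01-01", "status": "CONFIRMED"},
    {"confirmationnumber": "A100", "updated_at": "2024-01-03", "status": "CHKIN"},
    {"confirmationnumber": "B200", "updated_at": "2024-01-02", "status": "NOSHOW"},
]
deduped = deduplicate(rows)
```

ISO-formatted date strings compare correctly as text, which is why the sketch can order on them without parsing.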
2. Status Code Mapping
Athenaeum-specific codes → Standard codes:
| Athenaeum | Standard |
|---|---|
| CANCELED | cancelled |
| CHKOUT | checked_out |
| NOSHOW | no_show |
| CHKIN | checked_in |
| CONFIRMED | confirmed |
| INHOUSE | checked_in |
| WAITLIST | waitlist |
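The table above translates directly into a lookup. This is a sketch rather than the task's actual code: the name `normalize_status`, the whitespace/case handling, and returning `None` for unknown codes are all assumptions made for illustration.

```python
from typing import Optional

# Mapping copied from the table above.
STATUS_MAP = {
    "CANCELED": "cancelled",
    "CHKOUT": "checked_out",
    "NOSHOW": "no_show",
    "CHKIN": "checked_in",
    "CONFIRMED": "confirmed",
    "INHOUSE": "checked_in",
    "WAITLIST": "waitlist",
}


def normalize_status(raw: Optional[str]) -> Optional[str]:
    """Map an Athenaeum status code to the standard code.

    Returns None for missing or unrecognized codes (an assumption; the
    real task may raise or apply a default instead).
    """
    if raw is None:
        return None
    return STATUS_MAP.get(raw.strip().upper())
```

Note that both `CHKIN` and `INHOUSE` collapse to `checked_in`, so the mapping is not reversible.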
3. Field Mapping
Sample field mappings:
| Athenaeum Field | Standard Field |
|---|---|
| profileid | guest_id_og |
| confirmationnumber | res_id_og |
| resvdate | booking_date |
| arrivaldate | check_in_date |
| departuredate | check_out_date |
| companyname | company |
| taname1 | travel_agent |
| marketsegment | market_segment |
| emailadd | |
| mobilenumber | phone_number |
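A rename step for these sample mappings might look like the sketch below. It assumes records are dicts; `rename_fields` is a hypothetical helper, and passing unmapped fields through unchanged is an assumption. `emailadd` is left out of the dict because its target is not shown in the table.

```python
# Sample mappings copied from the table above; `emailadd` is omitted
# because the table does not show its standard field name.
FIELD_MAP = {
    "profileid": "guest_id_og",
    "confirmationnumber": "res_id_og",
    "resvdate": "booking_date",
    "arrivaldate": "check_in_date",
    "departuredate": "check_out_date",
    "companyname": "company",
    "taname1": "travel_agent",
    "marketsegment": "market_segment",
    "mobilenumber": "phone_number",
}


def rename_fields(record: dict) -> dict:
    """Rename mapped keys; keys without a mapping keep their original name."""
    return {FIELD_MAP.get(name, name): value for name, value in record.items()}


raw = {"profileid": "P-1", "arrivaldate": "2024-03-01", "roomtype": "DLX"}
clean = rename_fields(raw)
```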
4. UUID Generation
Generates stable UUIDs for guests and reservations.
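One common way to get IDs that stay the same across runs is a name-based UUID (version 5) derived from the source system's own ID. The sketch below illustrates that technique; whether the task uses `uuid5`, and the namespace and key format shown here, are assumptions.

```python
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
ATHENAEUM_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "athenaeum.example")


def stable_uuid(entity: str, source_id: str) -> str:
    """Derive a deterministic UUID from the entity type and original ID."""
    # Prefixing with the entity type keeps a guest and a reservation that
    # happen to share the same source ID from colliding.
    return str(uuid.uuid5(ATHENAEUM_NAMESPACE, f"{entity}:{source_id}"))


guest_uuid = stable_uuid("guest", "12345")
reservation_uuid = stable_uuid("reservation", "12345")
```

Because the output depends only on the inputs, re-running the task over the same raw data yields the same UUIDs, which is what makes downstream joins repeatable.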
5. Revenue Enrichment
Joins transaction/revenue data to get:
- Source codes
- Channel information
- Daily revenue breakdowns
- Rate details
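As a rough illustration of the join, the sketch below attaches source code, channel, and a per-day revenue breakdown to each reservation. The transaction field names (`date`, `amount`, `source_code`, `channel`), the join key, and the choice to take source and channel from the first transaction are assumptions; rate details are left out for brevity.

```python
from collections import defaultdict


def enrich_with_revenue(reservations: list[dict], transactions: list[dict]) -> list[dict]:
    """Left-join transactions onto reservations by confirmation number."""
    by_reservation = defaultdict(list)
    for txn in transactions:
        by_reservation[txn["confirmationnumber"]].append(txn)

    enriched = []
    for res in reservations:
        txns = by_reservation.get(res["confirmationnumber"], [])
        daily = defaultdict(float)
        for txn in txns:
            daily[txn["date"]] += txn["amount"]  # sum revenue per day
        enriched.append({
            **res,
            # Reservations without transactions keep None / empty values.
            "source_code": txns[0]["source_code"] if txns else None,
            "channel": txns[0]["channel"] if txns else None,
            "daily_revenue": dict(daily),
        })
    return enriched


reservations = [{"confirmationnumber": "A100"}, {"confirmationnumber": "B200"}]
transactions = [
    {"confirmationnumber": "A100", "date": "2024-03-01", "amount": 100.0,
     "source_code": "WEB", "channel": "direct"},
    {"confirmationnumber": "A100", "date": "2024-03-01", "amount": 20.0,
     "source_code": "WEB", "channel": "direct"},
    {"confirmationnumber": "A100", "date": "2024-03-02", "amount": 100.0,
     "source_code": "WEB", "channel": "direct"},
]
result = enrich_with_revenue(reservations, transactions)
```

Keeping reservations that have no matching transactions (a left join) makes missing revenue visible downstream instead of silently dropping rows.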
Entity Extraction
Separate methods extract each entity type:
- get_clean_guests() - Extracts and cleans guest records
- get_clean_reservations() - Extracts and cleans reservation records
- get_clean_rooms() - Extracts and cleans daily room stays
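To make "daily room stays" concrete, the sketch below expands one reservation into one row per night. It is not the body of `get_clean_rooms()`; the helper name `expand_room_nights`, the `stay_date` field, and treating the check-out date as exclusive are assumptions.

```python
from datetime import date, timedelta


def expand_room_nights(res_id: str, check_in: date, check_out: date) -> list[dict]:
    """Return one row per night stayed; the check-out date is not a night."""
    nights = (check_out - check_in).days
    # range() is empty when nights <= 0, so inverted dates yield no rows.
    return [
        {"res_id_og": res_id, "stay_date": check_in + timedelta(days=offset)}
        for offset in range(nights)
    ]


stays = expand_room_nights("A100", date(2024, 3, 1), date(2024, 3, 4))
```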
Data Quality Checks
The task performs several quality checks:
- Null Check-in/Check-out: Filters out records with a null check-in or check-out date
- Duplicate Removal: Deduplicates by confirmation number
- Type Validation: Converts and validates data types
- Referential Integrity: Ensures guest-reservation relationships are consistent
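The checks above could be combined roughly as follows. This is a sketch under stated assumptions: dates arrive as ISO strings, a reservation whose guest ID is unknown is dropped rather than flagged, and the helper names are hypothetical.

```python
from datetime import date
from typing import Optional


def parse_date(value: Optional[str]) -> Optional[date]:
    """Return a date for ISO strings, or None when missing or unparseable."""
    if not value:
        return None
    try:
        return date.fromisoformat(value)
    except ValueError:
        return None


def filter_valid_reservations(reservations: list[dict], guest_ids: set) -> list[dict]:
    """Drop rows with bad dates or unknown guests; convert dates to `date`."""
    valid = []
    for res in reservations:
        check_in = parse_date(res.get("check_in_date"))
        check_out = parse_date(res.get("check_out_date"))
        if check_in is None or check_out is None:
            continue  # null or invalid check-in/check-out
        if res.get("guest_id_og") not in guest_ids:
            continue  # reservation points at a guest we do not have
        valid.append({**res, "check_in_date": check_in, "check_out_date": check_out})
    return valid


rows = [
    {"guest_id_og": "G1", "check_in_date": "2024-03-01", "check_out_date": "2024-03-04"},
    {"guest_id_og": "G1", "check_in_date": None, "check_out_date": "2024-03-04"},
    {"guest_id_og": "G9", "check_in_date": "2024-03-01", "check_out_date": "2024-03-02"},
    {"guest_id_og": "G1", "check_in_date": "not-a-date", "check_out_date": "2024-03-02"},
]
valid_rows = filter_valid_reservations(rows, guest_ids={"G1"})
```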
Related Tasks
- AthenaeumRawTask - Previous step
- AthenaeumLegacyMergeTask - Optional legacy data merge
- ProcessingTask - Next step
- Task - Base class
Best Practices
- Review field mappings when Athenaeum schema changes
- Monitor deduplication for unexpected behavior
- Validate status mappings match business requirements
- Test that UUID generation is consistent across runs
- Check revenue joins for data completeness
- Handle nulls appropriately in downstream processing