Athenaeum

The Athenaeum ingest process retrieves daily booking and revenue data from a secure SFTP server and loads it into the raw data catalog.

Overview

The AthenaeumIngester class handles the extraction of two main data types from the Athenaeum property management system:

  • Reservations: Daily booking information including guest details, room assignments, and booking status
  • Revenues: Transaction-level financial data including charges, payments, and revenue classifications

Data Source Configuration

The ingester connects to an SFTP server using credentials configured in the job context:

AthenaeumIngester(job_context).run()

Connection Details

  • Protocol: SFTP (SSH File Transfer Protocol)
  • Property ID: ATH
  • Data Format: Pipe-delimited CSV files (|)
  • File Naming Convention:
    • Bookings: ATH/ATHdailyBookings/ATH_dailyBookings_YYYYMMDD.csv
    • Revenues: ATH/ATHdailyRevenue/ATH_dailyRevenue_YYYYMMDD.csv
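
For illustration, the daily file paths can be derived directly from this naming convention. The helpers below are a hypothetical sketch of that mapping, not the ingester's actual code:

# Hypothetical helpers mirroring the file naming convention above.
def booking_path(day: str) -> str:
    # day is formatted as YYYYMMDD, e.g. "20240115"
    return f"ATH/ATHdailyBookings/ATH_dailyBookings_{day}.csv"

def revenue_path(day: str) -> str:
    return f"ATH/ATHdailyRevenue/ATH_dailyRevenue_{day}.csv"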

Ingest Process Flow

Incremental Loading Strategy

The ingester uses an incremental sync approach to avoid reprocessing data:

  1. Check Last Sync Date: Query the raw catalog for the most recent synced_date for each table
  2. Calculate Missing Dates: Determine all dates between last sync and today
  3. Fetch Daily Files: Download and process one file per day per table
  4. Record Sync Date: Each batch is tagged with its synced_date for future incremental runs

If no previous sync exists (first run), the ingester can accept a list of specific dates to process via the sync_dates parameter.
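
A minimal sketch of the date-gap calculation (step 2), assuming the last synced_date is available as a datetime.date; the function name is illustrative:

from datetime import date, timedelta

def missing_dates(last_sync: date, today: date) -> list[str]:
    """Return every date after last_sync up to and including today, as YYYYMMDD strings."""
    days = (today - last_sync).days
    return [(last_sync + timedelta(days=i)).strftime("%Y%m%d") for i in range(1, days + 1)]

# Example: last sync on 2024-01-10, run on 2024-01-13
# -> ["20240111", "20240112", "20240113"]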

Data Schemas

Reservations Schema (48 fields)

The booking data includes comprehensive guest and reservation information:

| Field Category | Fields | Description |
| --- | --- | --- |
| Guest Identity | PROFILEID, NATIONALITY, VIPLEVEL | Guest profile and demographic data |
| Contact Info | ADDRESS, CITY, STATE, POSTALCODE, COUNTRY, PHONETYPE, PHONENUMBER, MOBILECOUNTRYCODE, MOBILENUMBER, EMAILTYPE, EMAILADD | Complete contact details |
| Booking Details | CONFIRMATIONNUMBER, THIRDPARTYCONF, STATUSCODE, RESVDATE, CREATEDON, UPDATEDON | Reservation identification and status |
| Stay Information | ARRIVALDATE, ARRIVALTIME, DEPARTUREDATE, DEPARTURETIME, ADULTS, CHILDREN | Check-in/out details and guest count |
| Room Assignment | ROOM, ROOMTYPE, RATEPLAN, ROOMTYPCHG, ROOMTYPCHGREASON, DONOTMOVE, DONOTMOVEREASON, ISROOMUPSOLD | Room allocation and upgrades |
| Business Data | COMPANYNAME, COMPANYID, IATANUMBER, TANAME1, TAID1, MARKETSEGMENT | Corporate and travel agent information |
| Group Bookings | GROUPNAME, GROUPNUMBER | Group reservation tracking |
| Cancellations | CANCELDATE, CANCELREASON | Cancellation details if applicable |
| Preferences | HASPREFERENCES | Guest preference indicators |
| Audit Trail | CREATEDBY, UPDATEDBY, RSL_CODE | System tracking fields |
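
Every column in the raw schema is declared as a string (see Type Safety below). A truncated sketch of what the RAW_BOOKING_SCHEMA referenced later might look like in PySpark; the exact definition lives in the ingester module:

from pyspark.sql.types import StructType, StructField, StringType

# Illustrative fragment only; the real RAW_BOOKING_SCHEMA covers all 48 fields.
RAW_BOOKING_SCHEMA = StructType([
    StructField("PROFILEID", StringType(), True),
    StructField("CONFIRMATIONNUMBER", StringType(), True),
    StructField("ARRIVALDATE", StringType(), True),
    StructField("DEPARTUREDATE", StringType(), True),
    # ... remaining fields, all StringType
])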

Revenues Schema (42 fields)

The revenue data captures detailed transaction information:

| Field Category | Fields | Description |
| --- | --- | --- |
| Transaction Core | RVL_DATE, RVL_RECORDTYPE, RVL_LINE, RVL_PAYTYPE, RVL_TRANSACTIONTYPE | Transaction identification and type |
| Financial Amounts | RVL_QUANTITY, RVL_NETAMOUNT, RVL_TAXAMOUNT, RVL_AMOUNT | Monetary values and quantities |
| Revenue Classification | RVL_REVENUETYPE, RVL_TRANSACTIONCODE, RVL_TRANSACTIONDESC, RVL_TYPEOFREVENUE | Revenue categorization |
| Account Info | RVL_ACCOUNT, RVL_ACCOUNTID, RVL_ACCOUNTTYPE, RVL_ACCOUNTNAME | Account and folio details |
| Reservation Link | RVL_RESERVATIONID, RVL_RESERVATIONSTAYID, RVL_CONFIRMATIONNUMBER, RVL_REFNUM | Links to booking records |
| Stay Context | RVL_STATUSCODE, RVL_ARRIVAL, RVL_DEPARTURE, RVL_ROOMNUMBER, RVL_ROOMTYPE | Associated stay information |
| Market Data | RVL_MARKETSEGMENT, RVL_REASONFORSTAY, RVL_SOURCECODE, RVL_SECONDARYSOURCECODE, RVL_SOURCE, RVL_RATECODE | Booking channel and market segment |
| Business Partners | RVL_TAID, RVL_TANAME, RVL_ORGID, RVL_ORGNAME, RVL_CRSNUMBER | Travel agents and companies |
| Audit Timestamps | RVL_CREATEDON, RVL_UPDATEDON, RVL_CANCELLEDON | Transaction lifecycle dates |
| Cancellations | RVL_CANCELLATIONCODE | Cancellation tracking |
| Property | RVL_PROPERTY | Property identifier |

Implementation Details

Core Methods

run()

Main execution method that orchestrates the ingest process:

  1. Determines sync dates for reservations table
  2. Ingests booking data for each date
  3. Determines sync dates for revenues table
  4. Ingests revenue data for each date
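
In outline, run() is a thin loop over the two tables. The sketch below is illustrative, and the _sync_dates helper name is an assumption:

# Illustrative outline of run(); _sync_dates is a hypothetical helper that
# returns the YYYYMMDD dates still missing for a given table.
def run(self) -> None:
    for day in self._sync_dates("reservations"):  # steps 1-2
        self._get_raw_bookings(day)
    for day in self._sync_dates("revenues"):      # steps 3-4
        self._get_raw_revenues(day)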

_get_raw_bookings(date: str)

Ingests booking data for a specific date:

  • Constructs the SFTP file path using date format YYYYMMDD
  • Downloads and parses the CSV with pipe delimiter
  • Applies the RAW_BOOKING_SCHEMA for type safety
  • Writes to etl_gp_raw_athenaeum.reservations catalog table with synced_date partition

_get_raw_revenues(date: str)

Ingests revenue data for a specific date:

  • Constructs the SFTP file path using date format YYYYMMDD
  • Downloads and parses the CSV with pipe delimiter
  • Applies the RAW_REVENUE_SCHEMA for type safety
  • Writes to etl_gp_raw_athenaeum.revenues catalog table with synced_date partition
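
Both methods follow the same read-then-write pattern. Below is a hedged PySpark sketch of that pattern, assuming the daily file has already been downloaded to local_path, that day is a YYYYMMDD string, and that spark is an active SparkSession; the helper name and exact mechanics are illustrative:

from pyspark.sql.functions import lit

def _write_raw(spark, local_path, schema, table, day):
    # Parse the pipe-delimited CSV with the declared all-string schema.
    df = (
        spark.read
        .schema(schema)
        .option("header", "true")
        .option("delimiter", "|")
        .csv(local_path)
    )
    # Tag the batch for incremental sync and partition pruning, then append.
    (
        df.withColumn("synced_date", lit(day))
          .withColumn("property_id", lit("ATH"))
          .write.mode("append")
          .partitionBy("synced_date", "property_id")
          .saveAsTable(table)  # e.g. "etl_gp_raw_athenaeum.reservations"
    )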

Error Handling

The ingester relies on the underlying SecureFTP utility for connection management and file retrieval. If a daily file is missing or malformed:

  • The SFTP client will raise an exception
  • The ingester will stop processing subsequent dates
  • Manual intervention may be required to investigate missing files
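
Because a missing file aborts the run, callers may want to log the failure before letting it propagate. A minimal sketch, assuming standard library logging:

import logging

logger = logging.getLogger(__name__)

try:
    AthenaeumIngester(job_context).run()
except Exception:
    # The ingester stops at the first failed date; surface it for investigation.
    logger.exception("Athenaeum ingest failed; check for missing or malformed daily files")
    raise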

Type Safety

All fields are initially loaded as StringType to preserve raw data integrity. Type casting and validation occur in the cleaning stage of the workflow.
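
As an illustration of what that downstream casting might look like (the cleaner's actual logic may differ), assuming an active SparkSession named spark and YYYYMMDD date strings:

from pyspark.sql.functions import col, to_date

# Illustrative only: cast raw string columns to proper types in the cleaning stage.
raw = spark.table("etl_gp_raw_athenaeum.reservations")
typed = (
    raw.withColumn("ADULTS", col("ADULTS").cast("int"))
       # assumes arrival dates arrive as YYYYMMDD strings
       .withColumn("ARRIVALDATE", to_date(col("ARRIVALDATE"), "yyyyMMdd"))
)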

Catalog Output

After successful ingestion, data is available in the raw catalog:

Database: etl_gp_raw_athenaeum

Tables:

  • reservations - Daily booking snapshots, partitioned by synced_date and property_id
  • revenues - Daily transaction records, partitioned by synced_date and property_id

Both tables are partitioned to enable efficient incremental processing and data retention management.
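
For example, filtering on the partition columns lets the query engine skip unrelated files entirely. A sketch assuming an active SparkSession named spark and a YYYYMMDD synced_date format:

daily_revenue = spark.sql("""
    SELECT *
    FROM etl_gp_raw_athenaeum.revenues
    WHERE synced_date = '20240115'  -- partition filter: only one day is scanned
      AND property_id = 'ATH'
""")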

Usage in Workflow

The Athenaeum ingest process is invoked as the first step in the ETL workflow:

from etl_lib.job.JobContext import JobContext
from etl_lib.pipeline.ingest.athenaeum.AthenaeumIngester import AthenaeumIngester

job_context = JobContext(chain_id="athenaeum", partitions=16)
AthenaeumIngester(job_context).run()

After ingestion completes, the raw data flows to:

  1. Crawling - Consolidates S3 files into Iceberg tables in the Glue Data Catalog
  2. Cleaning - Data validation and transformation
  3. Processing - Business logic and analytics

Next Steps

After the ingest stage completes:

  • Crawler - Consolidates ingested S3 partitions into queryable Iceberg tables
  • Athenaeum Cleaner - Validates and transforms raw data into standardized format