AthenaeumRawTask

The AthenaeumRawTask orchestrates raw data acquisition for the Athenaeum hotel chain. It runs an ingestion subtask and a crawling subtask to fetch reservation and revenue data from the source system.

Overview

This task handles the initial data acquisition stage by:

Ingesting data from the Athenaeum API/system
Crawling and parsing reservation data
Producing raw models for downstream processing
Supporting optional date-based syncs through sync_dates
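The acquisition flow above can be sketched as a small orchestration function. This is a minimal Python sketch under stated assumptions, not the actual AthenaeumRawTask implementation: the function name `run_raw_task`, the callable signatures, the dict-based records, and the fake subtasks are hypothetical. Only the `sync_dates` and `skip_ingestion` options are taken from this page (see Best Practices).

```python
from __future__ import annotations

from datetime import date
from typing import Callable, Optional

# Hypothetical stand-in for one raw payload or model; the real types are not documented here.
Record = dict


def run_raw_task(
    ingest: Callable[[Optional[list[date]]], list[Record]],
    crawl: Callable[[list[Record]], list[Record]],
    cached: list[Record],
    sync_dates: Optional[list[date]] = None,
    skip_ingestion: bool = False,
) -> list[Record]:
    """Run ingestion (unless skipped), then crawling, and return the raw models."""
    if skip_ingestion:
        # Reuse previously ingested payloads, e.g. while developing crawler changes.
        payloads = cached
    else:
        # Assumption: sync_dates=None means a full pull; a list limits ingestion to those dates.
        payloads = ingest(sync_dates)
    return crawl(payloads)


# Fake subtasks standing in (hypothetically) for AthenaeumIngesterTask and AthenaeumCrawlerTask.
def fake_ingest(dates: Optional[list[date]]) -> list[Record]:
    days = dates or [date(2024, 1, 1), date(2024, 1, 2)]
    return [{"stay_date": d.isoformat(), "rooms": 3} for d in days]


def fake_crawl(payloads: list[Record]) -> list[Record]:
    return [{"model": "RawReservationModel", **p} for p in payloads]


models = run_raw_task(fake_ingest, fake_crawl, cached=[], sync_dates=[date(2024, 1, 2)])
print(models)
# [{'model': 'RawReservationModel', 'stay_date': '2024-01-02', 'rooms': 3}]
```

Passing the subtasks in as callables keeps the ordering logic (ingest, then crawl) testable without a live connection to the Athenaeum system.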

Subtasks

AthenaeumIngesterTask

Connects to the Athenaeum system to fetch and ingest raw booking data.

AthenaeumCrawlerTask

Parses and transforms ingested data into standardized raw models.

Models

Requires

None (initial data acquisition)

Provides

RawReservationModel - Raw reservation data from Athenaeum
RawRevenueModel - Raw revenue data from Athenaeum
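To make the Provides contract more concrete, here is one way the two raw models and the crawler's mapping step could look. Everything in this sketch is hypothetical: the field names, the payload shape (`id`, `arrival`, `departure`, `nights`), and the `crawl_payload` helper are illustrative assumptions, because this page does not document the models' schemas.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class RawReservationModel:
    # Hypothetical fields; the real schema is not documented here.
    reservation_id: str
    arrival: date
    departure: date


@dataclass(frozen=True)
class RawRevenueModel:
    # Hypothetical fields: one revenue row per booked night.
    reservation_id: str
    stay_date: date
    amount: Decimal


def crawl_payload(payload: dict) -> tuple[RawReservationModel, list[RawRevenueModel]]:
    """Map one (hypothetical) booking payload to a reservation plus its nightly revenue."""
    reservation = RawReservationModel(
        reservation_id=str(payload["id"]),
        arrival=date.fromisoformat(payload["arrival"]),
        departure=date.fromisoformat(payload["departure"]),
    )
    revenue = [
        RawRevenueModel(
            reservation_id=reservation.reservation_id,
            stay_date=date.fromisoformat(night["date"]),
            # Decimal(str(...)) avoids binary float artifacts in money amounts.
            amount=Decimal(str(night["amount"])),
        )
        for night in payload.get("nights", [])
    ]
    return reservation, revenue


res, rev = crawl_payload({
    "id": 42,
    "arrival": "2024-03-01",
    "departure": "2024-03-03",
    "nights": [
        {"date": "2024-03-01", "amount": 189.5},
        {"date": "2024-03-02", "amount": 189.5},
    ],
})
print(res.reservation_id, len(rev), sum(r.amount for r in rev))  # 42 2 379.0
```

Frozen dataclasses are used here so that raw models cannot be mutated after crawling; downstream cleaning would produce new objects rather than edit these in place.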

Error Handling

The task handles these common failure scenarios:

API connection failures
Missing data for specific dates
Parsing errors in the crawler
Invalid date formats
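This page does not say how each scenario is handled internally. As a hedged illustration only, the helpers below show one common way to cover the four cases: reject malformed dates before any API call, retry transient connection failures with exponential backoff, report requested dates that came back empty, and collect per-record parse failures instead of aborting the batch. The names `parse_sync_dates`, `with_retries`, `missing_dates`, and `crawl_all` are hypothetical, and the built-in `ConnectionError`, `KeyError`, `TypeError`, and `ValueError` stand in for whatever error types the real task uses.

```python
from __future__ import annotations

import time
from datetime import date
from typing import Any, Callable, Iterable, TypeVar

T = TypeVar("T")


def parse_sync_dates(values: Iterable[str]) -> list[date]:
    """Parse ISO dates such as "2024-01-02", rejecting invalid formats up front."""
    parsed: set[date] = set()
    bad: list[str] = []
    for value in values:
        try:
            parsed.add(date.fromisoformat(value))
        except ValueError:
            bad.append(value)
    if bad:
        raise ValueError(f"invalid sync date(s): {bad}")
    return sorted(parsed)  # de-duplicated, oldest first


def with_retries(call: Callable[[], T], attempts: int = 3, base_delay: float = 1.0) -> T:
    """Retry on ConnectionError with exponential backoff (1s, 2s, ... by default)."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts - 1):
        try:
            return call()
        except ConnectionError:
            time.sleep(base_delay * 2 ** attempt)
    return call()  # last attempt: let any error propagate to the caller


def missing_dates(requested: list[date], received: dict[date, list[Any]]) -> list[date]:
    """Requested dates for which the source returned no records."""
    return [d for d in requested if not received.get(d)]


def crawl_all(payloads: list[dict], parse: Callable[[dict], T]) -> tuple[list[T], list[tuple[int, str]]]:
    """Parse every payload; collect failures instead of aborting the whole batch."""
    ok: list[T] = []
    failed: list[tuple[int, str]] = []
    for index, payload in enumerate(payloads):
        try:
            ok.append(parse(payload))
        except (KeyError, TypeError, ValueError) as exc:
            failed.append((index, repr(exc)))
    return ok, failed


print(parse_sync_dates(["2024-01-02", "2024-01-01", "2024-01-02"]))
# [datetime.date(2024, 1, 1), datetime.date(2024, 1, 2)]
ok, failed = crawl_all([{"id": 1}, {}], lambda p: p["id"])
print(ok, failed)  # [1] [(1, "KeyError('id')")]
```

Validating dates before ingestion means a typo in sync_dates fails fast instead of surfacing later as "missing data" for a date that was never requested correctly.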

Downstream Tasks

The output of this task feeds into:

CleanAthenaeumTask - The next step, which cleans the raw models

Related

Task - Base task class

Best Practices

Use sync_dates for incremental updates to avoid re-ingesting all data
Monitor ingestion logs for API errors or missing data
Use skip_ingestion when developing crawler changes, so the crawler can be tested without re-running ingestion against the API
Schedule runs to match how often Athenaeum updates its data
Handle API credentials securely through JobContext configuration
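For the first practice, a scheduled run needs some rule for choosing sync_dates. One common approach, assumed here rather than taken from the task itself, is a short rolling lookback window so that late changes to recent reservations are picked up on the next run. The helper `incremental_sync_dates` and its default window of 3 days are hypothetical; the window would be tuned to the Athenaeum data update frequency.

```python
from datetime import date, timedelta


def incremental_sync_dates(today: date, lookback_days: int = 3) -> list[date]:
    """Dates to re-sync on a scheduled run: the last `lookback_days` days, ending today."""
    if lookback_days < 1:
        raise ValueError("lookback_days must be >= 1")
    start = today - timedelta(days=lookback_days - 1)
    return [start + timedelta(days=i) for i in range(lookback_days)]


print(incremental_sync_dates(date(2024, 3, 1), lookback_days=3))
# [datetime.date(2024, 2, 28), datetime.date(2024, 2, 29), datetime.date(2024, 3, 1)]
```

Using `timedelta` arithmetic rather than manipulating day numbers directly handles month boundaries and leap days, as the February 29 in the example output shows.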