ReportsTask
The ReportsTask orchestrates the generation of all final reports for consumption by end users and analytics systems. It coordinates three report subtasks that produce different views of the processed data.
Overview
- Orchestrates 3 report generation subtasks
- Writes results to database tables for consumption
- Provides flexibility to skip or select specific reports
- Ensures consistent reporting across all chains
Flow Diagram
Report Types
Constructor Parameters
| Parameter | Type | Description |
|---|---|---|
| name | Optional[str] | Custom name for the task |
| job_context | JobContext | Context with Spark session and configuration |
| database_sink | DatabaseSink | Database connection for writing reports |
| skip_reports | Optional[List[str]] | Names of reports to skip (without “Task” suffix) |
| only_run_reports | Optional[List[str]] | Names of the reports to run, excluding all others (without “Task” suffix) |
| write_to_catalog | Optional[bool] | If true, write output tables using DatabaseSink (default: False) |
| is_incremental | Optional[bool] | If true, tasks that support incremental logic will run in incremental mode (default: None) |
Note: ReportsTask forwards write_to_catalog and is_incremental to all subtasks when they are created, so you can control persistence and incremental mode centrally.
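The forwarding behaviour described in the note can be pictured with a minimal, self-contained sketch. `StubReportTask` and `StubReportsTask` are hypothetical stand-ins written for this illustration; they are not the real classes and model only the two forwarded flags, not `job_context` or `database_sink`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StubReportTask:
    """Hypothetical stand-in for a report subtask (not the real class)."""
    name: str
    write_to_catalog: bool = False
    is_incremental: Optional[bool] = None


@dataclass
class StubReportsTask:
    """Hypothetical stand-in for the orchestrator; models flag forwarding only."""
    write_to_catalog: bool = False
    is_incremental: Optional[bool] = None
    subtasks: List[StubReportTask] = field(init=False)

    def __post_init__(self) -> None:
        # Each subtask is created with the same two flags as the parent task.
        self.subtasks = [
            StubReportTask(name, self.write_to_catalog, self.is_incremental)
            for name in (
                "DailyReportTask",
                "BookingRoomsReportTask",
                "GuestLoyaltyReportTask",
            )
        ]


orchestrator = StubReportsTask(write_to_catalog=True, is_incremental=True)
for task in orchestrator.subtasks:
    print(task.name, task.write_to_catalog, task.is_incremental)
```

Setting the flags once on the orchestrator is what makes central control possible: no subtask needs to be configured individually.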
Subtasks
ReportsTask runs the first three subtasks below in order; PickupReportTask is listed for completeness but is not part of the default orchestrator:
- DailyReportTask - One row per stay date per booking
- BookingRoomsReportTask - Aggregated data per reservation
- GuestLoyaltyReportTask - Guest KPIs and loyalty flags
- PickupReportTask - Pickup and pace analysis (run separately if required)
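As a rough illustration of how `skip_reports` and `only_run_reports` might narrow this list, the sketch below filters the default subtask names by their short form (the class name without the “Task” suffix). The function `select_reports` is hypothetical, and applying `only_run_reports` before `skip_reports` when both are given is an assumption, not documented behaviour.

```python
from typing import List, Optional

# Default subtasks in execution order, as listed above.
DEFAULT_REPORTS = [
    "DailyReportTask",
    "BookingRoomsReportTask",
    "GuestLoyaltyReportTask",
]


def short_name(task_name: str) -> str:
    """Strip the trailing 'Task' so names match the parameter convention."""
    suffix = "Task"
    return task_name[: -len(suffix)] if task_name.endswith(suffix) else task_name


def select_reports(
    skip_reports: Optional[List[str]] = None,
    only_run_reports: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: return the subtasks that would run, in order."""
    selected = list(DEFAULT_REPORTS)
    if only_run_reports is not None:
        wanted = set(only_run_reports)
        selected = [t for t in selected if short_name(t) in wanted]
    if skip_reports:
        skipped = set(skip_reports)
        selected = [t for t in selected if short_name(t) not in skipped]
    return selected


print(select_reports(skip_reports=["GuestLoyaltyReport"]))
print(select_reports(only_run_reports=["DailyReport"]))
```

Filtering the ordered default list, rather than building a new list from the caller's names, keeps the execution order stable regardless of the order in which names are passed.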
Models
Requires
- ProcessedGuestModel - Processed guest data
- ProcessedReservationModel - Processed reservation data
- ProcessedRoomModel - Processed room data
Provides
- None (writes directly to database tables)
Incremental
The ReportsTask and a subset of its subtasks support incremental execution, which avoids regenerating the output tables in full. At the task level, requires_incremental is [ProcessedAddedRoomModel]. In the current implementation, DailyReportTask and BookingRoomsReportTask both allow incremental updates (_ALLOWS_INCREMENTAL = True) and upsert only the affected rows when is_incremental=True and write_to_catalog=True.
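The conditions for an incremental upsert can be summarised as a small decision function. This is a sketch only: `resolve_write_mode` and the mode labels `"upsert"`, `"full_refresh"`, and `"no_write"` are hypothetical names chosen for this illustration, and treating every non-incremental catalog write as a full regeneration is an assumption.

```python
from typing import Optional


def resolve_write_mode(
    allows_incremental: bool,
    is_incremental: Optional[bool],
    write_to_catalog: bool,
) -> str:
    """Hypothetical helper mirroring the three conditions described above."""
    if not write_to_catalog:
        # Nothing is persisted, so incremental mode has no table to upsert into.
        return "no_write"
    if allows_incremental and is_incremental:
        # Subtask opts in (_ALLOWS_INCREMENTAL = True) and the caller asked for it.
        return "upsert"
    # Assumed fallback: regenerate the whole output table.
    return "full_refresh"


print(resolve_write_mode(True, True, True))
print(resolve_write_mode(False, True, True))
print(resolve_write_mode(True, None, True))
print(resolve_write_mode(True, True, False))
```

Note that `is_incremental=None` (the default) is falsy, so in this sketch a subtask falls back to the non-incremental path unless incremental mode is requested explicitly.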
Report Output Tables
| Report | Table Name | Granularity | Primary Use |
|---|---|---|---|
| Daily | daily | One row per room-stay-date | Detailed daily analysis, occupancy tracking |
| Booking Rooms | booking_rooms | One row per reservation | Revenue analysis, booking patterns |
| Guest loyalty | guest_loyalty (curated) | One row per guest | Guest segmentation and CRM use |
Catalog / Database writing
Reports write to chain-scoped database tables via DatabaseSink. Whether a task writes to the DB depends on write_to_catalog. Use write_to_catalog=True to persist reports and False to run local-only transformations.
Database Sink
Reports use DatabaseSink to write results.
Tables are namespaced by chain ID:
- athenaeum_daily
- athenaeum_booking_rooms
- mews_daily
- mews_booking_rooms
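The example names above suggest a `<chain_id>_<table_name>` pattern. The sketch below reproduces that pattern; `chain_table_name` is a hypothetical helper, and the pattern itself is inferred from the four examples rather than taken from the DatabaseSink implementation.

```python
def chain_table_name(chain_id: str, report_table: str) -> str:
    """Hypothetical helper: prefix a report table name with its chain ID."""
    return f"{chain_id}_{report_table}"


# Rebuild the four example table names listed above.
for chain_id in ("athenaeum", "mews"):
    for report_table in ("daily", "booking_rooms"):
        print(chain_table_name(chain_id, report_table))
```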
Related Tasks
- DailyReportTask - Most detailed report
- BookingRoomsReportTask - Reservation-level report
- GuestLoyaltyReportTask - Guest-focused report
- PickupReportTask - Snapshot-based analysis
- Task - Base task class
Best Practices
- Use only_run_reports for selective regeneration to save time
- Monitor database sizes as reports can grow large
- Consider partitioning large report tables by date
- Test with skip_reports to identify slow reports
- Document report dependencies for downstream consumers