ReportsTask

The ReportsTask orchestrates the generation of all final reports for consumption by end users and analytics systems. It coordinates three report subtasks that produce different views of the processed data.

Overview

  • Orchestrates three report generation subtasks
  • Writes results to database tables for downstream consumption
  • Lets you skip or select specific reports via skip_reports and only_run_reports
  • Ensures consistent reporting across all chains

Constructor Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| name | Optional[str] | Custom name for the task |
| job_context | JobContext | Context with Spark session and configuration |
| database_sink | DatabaseSink | Database connection for writing reports |
| skip_reports | Optional[List[str]] | Names of reports to skip (without “Task” suffix) |
| only_run_reports | Optional[List[str]] | Names of the only reports to run (without “Task” suffix) |
| write_to_catalog | Optional[bool] | If true, write output tables using DatabaseSink (default: False) |
| is_incremental | Optional[bool] | If true, tasks that support incremental logic will run in incremental mode (default: None) |

Note: ReportsTask forwards write_to_catalog and is_incremental to all subtasks when they are created, so you can control persistence and incremental mode centrally.
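The selection and forwarding behaviour described above can be sketched in isolation. This is a minimal illustration, not the actual ReportsTask code: SubtaskConfig and plan_subtasks are hypothetical names, and the rule that only_run_reports is applied before skip_reports is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

# Default report names as listed in this document, without the "Task" suffix.
DEFAULT_REPORTS = ["DailyReport", "BookingRoomsReport", "GuestLoyaltyReport"]


@dataclass
class SubtaskConfig:
    """Hypothetical stand-in for the arguments a subtask would receive."""
    name: str
    write_to_catalog: bool = False
    is_incremental: Optional[bool] = None


def plan_subtasks(
    skip_reports: Optional[List[str]] = None,
    only_run_reports: Optional[List[str]] = None,
    write_to_catalog: bool = False,
    is_incremental: Optional[bool] = None,
) -> List[SubtaskConfig]:
    """Pick the reports to run and forward the shared flags to each one."""
    selected = DEFAULT_REPORTS
    # Assumption: only_run_reports narrows the list first, then skip_reports removes entries.
    if only_run_reports is not None:
        selected = [r for r in selected if r in only_run_reports]
    if skip_reports:
        selected = [r for r in selected if r not in skip_reports]
    # Every selected subtask gets the same write_to_catalog and is_incremental values.
    return [
        SubtaskConfig(f"{r}Task", write_to_catalog, is_incremental)
        for r in selected
    ]


if __name__ == "__main__":
    plan = plan_subtasks(skip_reports=["GuestLoyaltyReport"], write_to_catalog=True)
    for cfg in plan:
        print(cfg)
```

Keeping the flags on the orchestrator means a caller sets persistence and incremental mode once instead of per report.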

Subtasks

ReportsTask runs the first three subtasks below, in order. The fourth is not part of the default orchestrator and is listed for reference:

  1. DailyReportTask - One row per stay date per booking
  2. BookingRoomsReportTask - Aggregated data per reservation
  3. GuestLoyaltyReportTask - Guest KPIs and loyalty flags
  4. PickupReportTask - Pickup and pace analysis (not included in the default orchestrator; run separately if required)

Models

Requires

  • ProcessedGuestModel - Processed guest data
  • ProcessedReservationModel - Processed reservation data
  • ProcessedRoomModel - Processed room data

Provides

  • None (writes directly to database tables)

Incremental

ReportsTask and a subset of its subtasks support incremental execution, which avoids regenerating the output tables in full. At the task level, requires_incremental is [ProcessedAddedRoomModel]. In the current implementation, DailyReportTask and BookingRoomsReportTask allow incremental updates (_ALLOWS_INCREMENTAL = True); they upsert only the affected rows when both is_incremental=True and write_to_catalog=True.
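The conditions for an incremental upsert can be written as a small decision function. This is a sketch under assumptions: resolve_write_mode and the mode labels "skip", "upsert" and "overwrite" are hypothetical, and treating the non-incremental path as a full overwrite is an assumption rather than documented behaviour.

```python
from typing import Optional


def resolve_write_mode(
    allows_incremental: bool,
    is_incremental: Optional[bool],
    write_to_catalog: bool,
) -> str:
    """Return "skip", "upsert", or "overwrite" for one report subtask."""
    if not write_to_catalog:
        return "skip"  # nothing is persisted
    if allows_incremental and is_incremental:
        return "upsert"  # only affected rows are written
    return "overwrite"  # assumption: the table is regenerated in full


if __name__ == "__main__":
    # A subtask with _ALLOWS_INCREMENTAL = True, run incrementally with persistence on.
    print(resolve_write_mode(True, True, True))
```

Note that is_incremental=None (the default) is falsy here, so a subtask falls back to the non-incremental path unless the flag is set explicitly.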

Report Output Tables

| Report | Table Name | Granularity | Primary Use |
| --- | --- | --- | --- |
| Daily | daily | One row per room-stay-date | Detailed daily analysis, occupancy tracking |
| Booking Rooms | booking_rooms | One row per reservation | Revenue analysis, booking patterns |
| Guest loyalty | guest_loyalty (curated) | One row per guest | Guest segmentation and CRM use |

Catalog / Database writing

Reports write to chain-scoped database tables via DatabaseSink, but only when write_to_catalog is enabled. Set write_to_catalog=True to persist reports; with False (the default), the task runs local-only transformations and writes no output tables.

Database Sink

Reports use DatabaseSink to write results.

Table names are prefixed with the chain ID, for example:

  • athenaeum_daily
  • athenaeum_booking_rooms
  • mews_daily
  • mews_booking_rooms
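The naming pattern in the examples above amounts to joining the chain ID and the report table name with an underscore. The helper below is a hypothetical illustration of that pattern, not a DatabaseSink function.

```python
def report_table_name(chain_id: str, report_table: str) -> str:
    """Build a chain-scoped table name such as "mews_daily" (hypothetical helper)."""
    return f"{chain_id}_{report_table}"


if __name__ == "__main__":
    for chain in ("athenaeum", "mews"):
        for table in ("daily", "booking_rooms"):
            print(report_table_name(chain, table))
```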

Best Practices

  1. Use only_run_reports for selective regeneration to save time
  2. Monitor database sizes as reports can grow large
  3. Consider partitioning large report tables by date
  4. Test with skip_reports to identify slow reports
  5. Document report dependencies for downstream consumers