Mews

The Mews ingest process retrieves reservation, customer, order item (revenue), and master data from Mews PMS via the Connector API and loads it into the ingest catalog.

Overview

The MewsIngester class handles the extraction of multiple data types from the Mews Property Management System:

Reservations: Complete booking information including guest details, room assignments, dates, and status
Customers: Guest profile data with contact information and addresses
Order Items: Detailed revenue/charge data for each reservation
Master Data: Services, rates, resource categories (room types), business segments, age categories, companies, and products

Data Source Configuration

The ingester connects to the Mews Connector API using ClientToken and AccessToken authentication configured in the job context:


MewsIngester(job_context, sync_dates=None).run()

Connection Details

Protocol: HTTPS REST API
Authentication: ClientToken + AccessToken (stored in AWS Secrets Manager)
Data Format: JSON with paginated responses
Base URL: Configured via source["base_url"]
API Documentation: Mews Connector API
Required Headers:
- Content-Type: application/json
Authentication Method: Tokens passed in request body payload

Authentication Flow

The ingester retrieves credentials from AWS Secrets Manager:

Secret Lookup: Retrieves secret using source["auth"]["secret_name"]
Credential Extraction: Extracts client_token and access_token from secret
Client Initialization: Creates MewsAPIClient with credentials
Token Injection: Tokens are automatically added to each API request payload
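
The flow above can be sketched roughly as follows. This is a minimal illustration, not the actual MewsIngester or MewsAPIClient code: the helper names parse_credentials and with_tokens are hypothetical, the secret is assumed to be a JSON string with client_token and access_token keys, and the ClientToken and AccessToken payload keys are taken from the names listed under Connection Details. The Secrets Manager lookup itself is left out so the sketch runs without AWS access.

```python
import json


def parse_credentials(secret_string: str) -> dict:
    """Parse a JSON secret and check that both tokens are present (hypothetical helper)."""
    secret = json.loads(secret_string)
    missing = [key for key in ("client_token", "access_token") if not secret.get(key)]
    if missing:
        raise ValueError(f"Secret is missing required keys: {missing}")
    return {"client_token": secret["client_token"], "access_token": secret["access_token"]}


def with_tokens(payload: dict, credentials: dict) -> dict:
    """Return a copy of the request payload with both tokens added (assumed key names)."""
    return {
        **payload,
        "ClientToken": credentials["client_token"],
        "AccessToken": credentials["access_token"],
    }


creds = parse_credentials('{"client_token": "c-123", "access_token": "a-456"}')
body = with_tokens({"EnterpriseIds": ["prop-1"]}, creds)
```

Returning a new dictionary rather than mutating the caller's payload keeps the tokens out of any payload object that might later be logged.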

Ingest Process Flow

Reservation Retrieval Strategy

The ingester uses an update-based fetch approach with a buffer window:

Single Endpoint with Buffer

Retrieves all reservations updated within a time window that extends 1 day before and 1 day after the sync date, to catch late updates.


POST /api/connector/v1/reservations/getAll/2023-06-06
{
  "EnterpriseIds": ["{propertyId}"],
  "UpdatedUtc": {
    "StartUtc": "{sync_date - 1 day}",
    "EndUtc": "{sync_date + 1 day}"
  }
}

Buffer Strategy: The ingester adds ±1 day to the sync date to ensure no reservations are missed due to timezone differences or late updates.
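
As an illustration of the buffer arithmetic, the sketch below builds the UpdatedUtc filter for one sync date. It assumes the sync date is a YYYY-MM-DD string anchored at midnight UTC and that timestamps are sent in ISO 8601 form with a Z suffix; the function name buffer_window is hypothetical, and the real ingester may anchor the window differently.

```python
from datetime import datetime, timedelta, timezone


def buffer_window(sync_date: str, buffer_days: int = 1) -> dict:
    """Build an UpdatedUtc filter with a symmetric buffer around the sync date."""
    # Assumption: the sync date is interpreted as midnight UTC.
    day = datetime.strptime(sync_date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    start = day - timedelta(days=buffer_days)
    end = day + timedelta(days=buffer_days)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {"StartUtc": start.strftime(fmt), "EndUtc": end.strftime(fmt)}


window = buffer_window("2025-01-02")
# window == {"StartUtc": "2025-01-01T00:00:00Z", "EndUtc": "2025-01-03T00:00:00Z"}
```

Because consecutive sync dates produce overlapping windows, the same reservation can appear in more than one batch, so downstream stages should expect duplicates across synced_date partitions.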

Pagination Handling

The Mews API uses cursor-based pagination:

Page Size: 1000 records per request
Cursor: Automatically included in subsequent requests
Termination: Pagination stops when no cursor is returned
Rate Limiting: 100ms delay between pagination requests
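
The pagination loop can be sketched as below. This is an assumption-laden outline rather than the MewsAPIClient implementation: fetch_all_pages and fake_fetch are hypothetical names, and the sketch assumes the cursor is sent inside the Limitation object next to Count and returned in a top-level Cursor field of the response.

```python
import time
from typing import Callable


def fetch_all_pages(
    fetch_page: Callable[[dict], dict],
    payload: dict,
    result_key: str,
    page_size: int = 1000,
    delay_seconds: float = 0.1,
) -> list:
    """Collect every page of a cursor-paginated endpoint into one list."""
    records: list = []
    cursor = None
    while True:
        limitation = {"Count": page_size}
        if cursor:
            limitation["Cursor"] = cursor  # assumed placement of the cursor
        response = fetch_page({**payload, "Limitation": limitation})
        records.extend(response.get(result_key, []))
        cursor = response.get("Cursor")
        if not cursor:
            return records  # no cursor returned: last page reached
        time.sleep(delay_seconds)  # pause between pages


# Stand-in for the HTTP call: serves two pages, then stops returning a cursor.
def fake_fetch(request: dict) -> dict:
    if "Cursor" not in request["Limitation"]:
        return {"Reservations": [{"Id": "r1"}, {"Id": "r2"}], "Cursor": "r2"}
    return {"Reservations": [{"Id": "r3"}], "Cursor": None}


reservations = fetch_all_pages(
    fake_fetch, {"EnterpriseIds": ["prop-1"]}, "Reservations", delay_seconds=0
)
```

Passing the page fetcher in as a callable keeps the loop testable without network access.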

Incremental Loading Strategy

The ingester uses an incremental sync approach:

Check Last Sync Date: Query the ingest catalog for the most recent synced_date for the reservations table per property
Calculate Missing Dates: Determine all dates between last sync and today
Fallback to Constructor Dates: If no previous sync exists, use sync_dates parameter
Fetch Daily Data: Process one date at a time per property
Record Sync Date: Each batch is tagged with its synced_date for future incremental runs
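
The date calculation in the steps above might look like the following sketch. The function name dates_to_sync is hypothetical, and the boundary handling (start the day after the last sync, include today) is an assumption; the base-class get_next_sync_dates() may treat the boundaries differently.

```python
from datetime import date, timedelta
from typing import Optional


def dates_to_sync(last_synced: Optional[str], today: str, fallback: Optional[list] = None) -> list:
    """List the dates after the last sync up to and including today."""
    if last_synced is None:
        # No previous sync: use the dates passed to the constructor, if any.
        return list(fallback or [])
    current = date.fromisoformat(last_synced) + timedelta(days=1)
    end = date.fromisoformat(today)
    missing = []
    while current <= end:
        missing.append(current.isoformat())
        current += timedelta(days=1)
    return missing


print(dates_to_sync("2025-01-01", "2025-01-04"))
```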

Master Data Synchronization

For each batch of reservations, the ingester extracts related IDs and fetches master data entities:

ID Extraction

From each reservation batch, the ingester extracts:

Service IDs: For fetching property services
Rate IDs: For fetching rate plans
Business Segment IDs: For fetching market segments
Company IDs: For fetching corporate accounts
Account IDs: For fetching customer profiles
Age Category IDs: For fetching guest type classifications
Reservation IDs: For fetching order items
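
A plain-Python sketch of this extraction step is shown below. It assumes reservations arrive as dictionaries with the field names from the Reservations Schema, and that company IDs come from both PartnerCompanyId and TravelAgencyId; the function name and the keys of the returned dictionary are illustrative, not the actual _extract_related_ids() output.

```python
def extract_related_ids(reservations: list) -> dict:
    """Collect the unique related IDs referenced by a batch of reservations."""
    fields = {
        "service_ids": "ServiceId",
        "rate_ids": "RateId",
        "business_segment_ids": "BusinessSegmentId",
        "account_ids": "AccountId",
        "reservation_ids": "Id",
    }
    ids = {name: set() for name in fields}
    ids["company_ids"] = set()
    ids["age_category_ids"] = set()

    for reservation in reservations:
        for name, field in fields.items():
            if reservation.get(field):
                ids[name].add(reservation[field])
        # Companies can be referenced as a partner company or a travel agency.
        for field in ("PartnerCompanyId", "TravelAgencyId"):
            if reservation.get(field):
                ids["company_ids"].add(reservation[field])
        # Age categories are nested inside the PersonCounts array.
        for person_count in reservation.get("PersonCounts") or []:
            if person_count.get("AgeCategoryId"):
                ids["age_category_ids"].add(person_count["AgeCategoryId"])

    # Sorted lists keep the output deterministic.
    return {name: sorted(values) for name, values in ids.items()}
```

Collecting into sets first means an ID shared by many reservations is requested only once.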

Chunked Fetching

Master data is fetched in chunks to respect API limits:

Chunk Size: 1000 IDs per request
Processing: Chunks are processed sequentially (not in parallel), with progress logging
Error Handling: Exceptions are logged; empty lists returned on failure
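
The chunking and error handling described above can be sketched like this. The names chunk_ids and fetch_entities mirror the methods documented later but are simplified stand-ins, and the per-chunk try/except is one possible reading of "exceptions are logged; empty lists returned on failure": a failing chunk contributes nothing and the remaining chunks still run.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def chunk_ids(ids: list, chunk_size: int = 1000) -> list:
    """Split a list of IDs into consecutive chunks of at most chunk_size."""
    return [ids[i:i + chunk_size] for i in range(0, len(ids), chunk_size)]


def fetch_entities(
    ids: list,
    fetch_chunk: Callable[[list], list],
    name: str,
    chunk_size: int = 1000,
) -> list:
    """Fetch entities chunk by chunk, skipping chunks whose request fails."""
    if not ids:
        return []
    chunks = chunk_ids(ids, chunk_size)
    entities: list = []
    for index, chunk in enumerate(chunks, start=1):
        logger.info("Fetching %s chunk %d/%d (%d IDs)", name, index, len(chunks), len(chunk))
        try:
            entities.extend(fetch_chunk(chunk))
        except Exception:
            # Log the failure and carry on with the next chunk.
            logger.exception("Chunk %d of %s failed", index, name)
    return entities
```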

Master Data Entities

Services

Purpose: Property services (typically one per property)
Endpoint: /api/connector/v1/services/getAll
Key Fields: Id, EnterpriseId, Name, IsActive
Usage: Maps ServiceId to EnterpriseId (property_id)

Resource Categories

Purpose: Room types (Standard, Deluxe, Suite, etc.)
Endpoint: /api/connector/v1/resourceCategories/getAll
Key Fields: Id, EnterpriseId, ServiceId, Name, ShortName, Capacity
Usage: Defines available room types per property

Rates

Purpose: Rate plans (BAR, Corporate, Government, etc.)
Endpoint: /api/connector/v1/rates/getAll
Key Fields: Id, ServiceId, Name, ShortName, IsActive, IsPublic
Usage: Identifies rate plan details for reservations

Business Segments

Purpose: Market segments (Transient, Group, Corporate, etc.)
Endpoint: /api/connector/v1/businessSegments/getAll
Key Fields: Id, ServiceId, Name, IsActive
Usage: Categorizes booking sources and market types

Age Categories

Purpose: Guest type classifications (Adult, Child, Infant)
Endpoint: /api/connector/v1/ageCategories/getAll
Key Fields: Id, ServiceId, Name, Classification (Adult/Child)
Usage: Defines person count categories in PersonCounts array

Transactional Data Synchronization

After master data is loaded, transactional data is fetched, followed by the product master data referenced by the order items:

Companies

Purpose: Corporate accounts and travel agencies
Endpoint: /api/connector/v1/companies/getAll
Key Fields: Id, Name, TaxIdentifier, ContactPerson, Address
Linked via: Reservation.PartnerCompanyId, TravelAgencyId

Customers

Purpose: Guest profiles with contact information
Endpoint: /api/connector/v1/customers/getAll
Key Fields: Id, FirstName, LastName, Email, Phone, Address, BirthDate, Nationality
Extent: Includes Customers and Addresses, excludes Documents
Linked via: Reservation.AccountId

Order Items

Purpose: Revenue/charge line items
Endpoint: /api/connector/v1/orderItems/getAll
Filters: By ServiceOrderIds (reservation IDs)
Key Fields: Id, Type, AccountingCategoryId, Amount, ConsumedUtc, State
Types: Product, Service, Payment, CancellationFee
Return Value: Raw list used for product ID extraction

Products

Purpose: F&B items and orderable products
Endpoint: /api/connector/v1/products/getAll
Key Fields: Id, ServiceId, Name, ShortName, Classifications (Food, Beverage, Wellness, CityTax)
Extraction: Product IDs extracted from order items with Type=Product
Linked via: OrderItem.Data.Product.ProductId

Reservation Processing

Service-to-Enterprise Mapping

The ingester creates a dictionary mapping ServiceId to EnterpriseId from the services data:


# services: list of service dicts returned by services/getAll
service_map = {service["Id"]: service["EnterpriseId"] for service in services}

This mapping is used to populate the property_id column in reservations by:

Creating a PySpark map expression from the dictionary
Mapping Reservation.ServiceId to property_id
Filling nulls with the property_id parameter as fallback
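
The row-level effect of these three steps can be shown in plain Python. This sketch only illustrates the lookup-with-fallback semantics on dictionaries; the ingester itself applies the mapping as a PySpark column expression, and the helper names here are hypothetical.

```python
def build_service_map(services: list) -> dict:
    """Map each service Id to the EnterpriseId (property) that owns it."""
    return {service["Id"]: service["EnterpriseId"] for service in services}


def resolve_property_id(reservation: dict, service_map: dict, fallback_property_id: str) -> str:
    """Look up the property via ServiceId, falling back when there is no match."""
    return service_map.get(reservation.get("ServiceId")) or fallback_property_id


service_map = build_service_map([{"Id": "svc-1", "EnterpriseId": "ent-1"}])
known = resolve_property_id({"Id": "res-1", "ServiceId": "svc-1"}, service_map, "prop-fallback")
unknown = resolve_property_id({"Id": "res-2", "ServiceId": "svc-9"}, service_map, "prop-fallback")
# known == "ent-1", unknown == "prop-fallback"
```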

VoucherId Renaming

The field VoucherId is renamed to VoucherCode before saving to avoid conflicts with the BaseCrawler architecture.

Data Schemas

Reservations Schema

The reservation data captures comprehensive booking information:

Field Category	Key Fields	Description
Identifiers	`Id`, `ServiceId`, `GroupId`, `Number`	Unique identifiers for reservation and service
Accounts	`AccountId`, `AccountType`, `BookerId`	Customer account references
Audit	`CreatorProfileId`, `UpdaterProfileId`, `CreatedUtc`, `UpdatedUtc`, `CancelledUtc`	Creation and modification tracking
State	`State`, `Origin`, `CommanderOrigin`, `OriginDetails`	Reservation status and booking channel
Dates	`StartUtc`, `EndUtc`, `ScheduledStartUtc`, `ScheduledEndUtc`, `ActualStartUtc`, `ActualEndUtc`	Stay dates (scheduled and actual)
Release	`ReleasedUtc`	When reservation was released
Channel	`ChannelNumber`, `ChannelManagerNumber`	Channel manager booking reference
Room Assignment	`RequestedResourceCategoryId`, `AssignedResourceId`, `AssignedResourceLocked`	Room type request and assignment
Rate & Segment	`BusinessSegmentId`, `RateId`	Market segment and rate plan
Voucher	`VoucherId` (renamed to `VoucherCode`)	Voucher/promo code
Payment	`CreditCardId`	Payment method reference
Blocks	`AvailabilityBlockId`	Group block reference
Partners	`PartnerCompanyId`, `TravelAgencyId`	Company and travel agent references
Cancellation	`CancellationReason`	Reason for cancellation
Purpose	`Purpose`	Trip purpose
QR Code	`QrCodeData`	Mobile key data
Person Counts	`PersonCounts[]`	Array of guest counts by age category
Person Counts Fields	`PersonCounts[].AgeCategoryId`, `PersonCounts[].Count`	Age category ID and count
Options	`Options`	Check-in options struct
Options Fields	`Options.OwnerCheckedIn`, `AllCompanionsCheckedIn`, `AnyCompanionCheckedIn`, `ConnectorCheckIn`	Check-in status flags

Schema Type: StructType with nested arrays and objects (see RESERVATIONS_SCHEMA)

Customers Schema

The customer data contains guest profile information:

Field Category	Key Fields	Description
Identifiers	`Id`	Unique customer ID
Name	`FirstName`, `LastName`, `Title`	Guest name details
Contact	`Email`, `Phone`, `SecondaryPhone`	Contact information
Demographics	`BirthDate`, `BirthPlace`, `Nationality`, `Sex`, `Language`	Demographic data
Address	`Address.Line1`, `City`, `PostalCode`, `CountryCode`	Physical address
Identification	`IdentityCard.Type`, `Number`, `Expiration`	ID document details
Passport	`Passport.Number`, `Expiration`, `Issuance`	Passport information
Visa	`Visa.Number`, `Expiration`, `Issuance`	Visa details
Company	`CompanyId`	Linked company reference
Categories	`CategoryId`	Customer category classification
Notes	`Notes`	Additional notes
Options	`Options.SendMarketingEmails`, `SendMarketingPostalMail`	Marketing preferences

Schema Type: StructType with nested address and document structures

Order Items Schema

The order item data provides detailed revenue breakdown:

Field Category	Key Fields	Description
Identifiers	`Id`, `EnterpriseId`, `ServiceId`, `OrderId`	Unique identifiers
Accounting	`AccountId`, `AccountingCategoryId`	Account and category references
Type	`Type`, `SubType`	Item type (Product, Service, Payment, etc.)
Name	`Name`	Item description
Dates	`CreatedUtc`, `UpdatedUtc`, `ConsumedUtc`, `ClosedUtc`	Lifecycle timestamps
Billing	`BillId`, `InvoiceId`	Billing document references
State	`State`, `Origin`	Accounting state (Open, Closed, etc.)
Amount	`Amount.Currency`, `NetValue`, `GrossValue`	Monetary amounts
Tax	`Amount.TaxValues[]`, `TaxValues[].Code`, `TaxValues[].Value`	Tax breakdown
Tax Breakdown	`Amount.Breakdown.Items[]`	Detailed tax calculation
Breakdown Fields	`Items[].TaxRateCode`, `NetValue`, `TaxValue`	Per-rate tax details
Data	`Data`	Type-specific nested data (varies by Type)
Product Data	`Data.Product.ProductId`	Product reference when Type=Product
Options	`Options.CanceledWithReservation`	Whether item was auto-canceled
Notes	`Notes`	Additional notes

Schema Type: Complex StructType with nested amounts, taxes, and variable data field

Services Schema

Service data defines property-level services:

Field	Type	Description
`Id`	String	Unique service ID
`EnterpriseId`	String	Property/enterprise ID
`Name`	String	Service name
`Description`	String	Service description
`IsActive`	Boolean	Whether service is active
`Options`	Struct	Service options and settings

Schema Type: StructType with minimal nesting

Resource Categories Schema

Resource category data defines room types:

Field	Type	Description
`Id`	String	Unique category ID
`EnterpriseId`	String	Property/enterprise ID
`ServiceId`	String	Associated service
`Name`	String	Category name (e.g., “Deluxe King”)
`ShortName`	String	Abbreviated name
`Description`	String	Category description
`IsActive`	Boolean	Whether category is active
`Capacity`	Integer	Maximum occupancy
`ExtraCapacity`	Integer	Extra bed capacity
`Ordering`	Integer	Display order

Schema Type: StructType with basic fields

Rates Schema

Rate data defines rate plans:

Field	Type	Description
`Id`	String	Unique rate ID
`ServiceId`	String	Associated service
`GroupId`	String	Rate group reference
`Name`	String	Rate plan name
`ShortName`	String	Abbreviated name
`IsActive`	Boolean	Whether rate is active
`IsPublic`	Boolean	Whether rate is publicly bookable
`IsEnabled`	Boolean	Whether rate is enabled

Schema Type: StructType with basic fields

Business Segments Schema

Business segment data defines market segments:

Field	Type	Description
`Id`	String	Unique segment ID
`ServiceId`	String	Associated service
`Name`	String	Segment name
`IsActive`	Boolean	Whether segment is active

Schema Type: StructType with basic fields

Age Categories Schema

Age category data defines guest types:

Field	Type	Description
`Id`	String	Unique category ID
`ServiceId`	String	Associated service
`Name`	String	Category name (e.g., “Adult”, “Child”)
`Classification`	String	Either “Adult” or “Child”
`Ordering`	Integer	Display order

Schema Type: StructType with basic fields

Companies Schema

Company data contains corporate account information:

Field	Type	Description
`Id`	String	Unique company ID
`Name`	String	Company name
`Number`	String	Company number/code
`Identifier`	String	Business identifier
`TaxIdentifier`	String	Tax ID number
`AdditionalTaxIdentifier`	String	Secondary tax ID
`BillingCode`	String	Billing reference code
`AccountingCode`	String	Accounting system code
`Address`	Struct	Company address
`InvoiceDueInterval`	String	Payment terms
`Contact`	Struct	Contact person details
`Options`	Struct	Company options

Schema Type: StructType with nested address and contact

Products Schema

Product data defines orderable items:

Field	Type	Description
`Id`	String	Unique product ID
`ServiceId`	String	Associated service
`Name`	String	Product name
`ShortName`	String	Abbreviated name
`Description`	String	Product description
`ExternalName`	String	External system name
`Charging`	String	Charging mode
`Posting`	String	Posting mode
`Promotions`	Struct	Promotion settings
`Classifications`	Struct	Product classifications
`Classifications.Food`	Boolean	Whether classified as food
`Classifications.Beverage`	Boolean	Whether classified as beverage
`Classifications.Wellness`	Boolean	Whether classified as wellness
`Classifications.CityTax`	Boolean	Whether classified as city tax
`AccountingCategoryId`	String	Accounting category reference
`Options`	Struct	Product options

Schema Type: StructType with nested classifications

Implementation Details

Core Methods

`run()`

Main execution method that orchestrates the ingest process:

Iterates through each property configured in the job context
Calls _process_property() for each property

`_process_property(property_data: dict)`

Processes a single property:

Determines sync dates using _get_sync_dates()
Falls back to sync_dates constructor parameter if no dates found
Calls _process_sync_date() for each date

`_get_sync_dates(property_id: str, table: str = "reservations")`

Determines dates to sync:

Calls get_next_sync_dates() from base class to get missing dates
Falls back to self.sync_dates if no dates found
Returns empty list if neither source has dates

`_process_sync_date(property_id: str, sync_date: str)`

Comprehensive data ingestion for a single property and date:

Fetch Reservations: Calls _fetch_reservations() with 1-day buffer
Extract IDs: Calls _extract_related_ids() to collect master data IDs
Sync Master Data: Sequentially syncs services, resource categories, rates, business segments, age categories
Save Reservations: Maps ServiceId to property_id and writes to catalog
Sync Transactional Data: Syncs companies, customers, order items
Extract Products: If order items contain products, syncs product master data

`_fetch_reservations(property_id: str, sync_date: datetime)`

Fetches reservations for a specific date:

Calculates buffer window: sync_date - 1 day to sync_date + 1 day
Calls MewsAPIClient.get_all_reservations() with UpdatedUtc filter
Returns empty list if no reservations found

`_extract_related_ids(reservations: list[Row | dict])`

Extracts related entity IDs from reservations:

Iterates through reservations
Collects unique ServiceIds, RateIds, BusinessSegmentIds, CompanyIds, AccountIds
Extracts AgeCategoryIds from PersonCounts array
Returns dictionary of ID lists

`_chunk_ids(ids: list, chunk_size: int = 1000)`

Splits ID lists into chunks:

Chunk Size: 1000 IDs (Mews API limit)
Empty Handling: Returns empty list if no IDs
Usage: Used by all fetch methods to respect API limits

`_fetch_entities(...)`

Generic entity fetching with chunking and logging:

Validates input IDs (returns empty list if none)
Chunks IDs using _chunk_ids()
Logs progress for each chunk
Calls provided fetch function for each chunk
Aggregates results across all chunks
Returns combined entity list

`_write_entities(...)`

Generic entity writing to catalog:

Creates Spark DataFrame from entity list
Adds synced_timestamp and synced_date columns
Handles property_id based on partition settings:
- If property_column specified: Maps from entity column with fallback
- If property_partition=True: Uses literal property_id
Calls catalog.write_to_ingest() with appropriate partitioning
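
The metadata and property_id handling can be illustrated on plain dictionaries. This is a simplified sketch of the branching only: the real method works on a Spark DataFrame and writes through catalog.write_to_ingest(), and the function name add_metadata, its parameters, and the ISO timestamp format are assumptions.

```python
from datetime import datetime, timezone
from typing import Optional


def add_metadata(
    entities: list,
    sync_date: str,
    property_id: str,
    property_column: Optional[str] = None,
    property_partition: bool = False,
    now: Optional[datetime] = None,
) -> list:
    """Return copies of the entities with sync metadata columns added."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    tagged = []
    for entity in entities:
        row = {**entity, "synced_timestamp": timestamp, "synced_date": sync_date}
        if property_column:
            # Take the property from the entity itself, with a fallback.
            row["property_id"] = entity.get(property_column) or property_id
        elif property_partition:
            # Every row gets the same literal property_id.
            row["property_id"] = property_id
        tagged.append(row)
    return tagged
```

Tables written without a property partition (rates, for example) would go through neither branch and keep only the two sync columns.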

`_sync_services(service_ids: list, property_id: str, sync_date: str)`

Syncs service master data:

Fetches services using chunked requests
Writes to services table with property partition
Creates and returns ServiceId → EnterpriseId mapping dictionary

`_save_reservations(...)`

Saves reservation data:

Creates DataFrame from reservation list
Renames VoucherId to VoucherCode
Adds metadata columns
Maps ServiceId to property_id using service_to_enterprise_map
Writes to reservations table with property partition

`_sync_resource_categories(service_ids: list, property_id: str, sync_date: str)`

Syncs resource category (room type) master data:

Fetches by service_ids and enterprise_ids
Writes to resource_categories table with property partition

`_sync_rates(rate_ids: list, property_id: str, sync_date: str)`

Syncs rate plan master data:

Fetches by rate_ids and enterprise_ids
Writes to rates table without property partition

`_sync_business_segments(business_segment_ids: list, property_id: str, sync_date: str)`

Syncs business segment master data:

Fetches by business_segment_ids and enterprise_ids
Writes to business_segments table without property partition

`_sync_age_categories(age_category_ids: list, property_id: str, sync_date: str)`

Syncs age category master data:

Fetches by age_category_ids and enterprise_ids
Writes to age_categories table without property partition

`_sync_companies(company_ids: list, property_id: str, sync_date: str)`

Syncs company transactional data:

Fetches by company_ids and enterprise_ids
Writes to companies table without property partition

`_sync_customers(account_ids: list, property_id: str, sync_date: str)`

Syncs customer transactional data:

Fetches by customer_ids (account_ids) and enterprise_ids
Includes Customers and Addresses extent
Writes to customers table without property partition

`_sync_order_items(reservation_ids: list, property_id: str, sync_date: str)`

Syncs order item revenue data:

Fetches by service_order_ids (reservation_ids)
Writes to order_items table with property partition
Returns raw order items list for product extraction

`_extract_product_ids(order_items: list[dict])`

Extracts product IDs from order items:

Filters for items with Type=Product
Extracts Data.Product.ProductId from each
Returns set of unique product IDs
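
A minimal sketch of this extraction, assuming order items are dictionaries shaped as in the Order Items Schema (the function name is illustrative and the null handling is an assumption):

```python
def extract_product_ids(order_items: list) -> set:
    """Collect unique product IDs from order items of Type 'Product'."""
    product_ids = set()
    for item in order_items:
        if item.get("Type") != "Product":
            continue
        # Data and Data.Product may be missing or null on some items.
        product = (item.get("Data") or {}).get("Product") or {}
        if product.get("ProductId"):
            product_ids.add(product["ProductId"])
    return product_ids


items = [
    {"Id": "oi-1", "Type": "Product", "Data": {"Product": {"ProductId": "prod-1"}}},
    {"Id": "oi-2", "Type": "Payment", "Data": None},
    {"Id": "oi-3", "Type": "Product", "Data": {"Product": {"ProductId": "prod-1"}}},
]
product_ids = extract_product_ids(items)
# product_ids == {"prod-1"}
```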

`_sync_products(product_ids: list, property_id: str, sync_date: str)`

Syncs product master data:

Fetches by product_ids and enterprise_ids
Writes to products table with property partition

`_get_credentials()`

Retrieves API credentials from AWS Secrets Manager:

Creates boto3 session for eu-central-1 region
Fetches secret using self._secret_name
Parses JSON secret string
Validates presence of client_token and access_token
Returns credentials dictionary

Error Handling

AWS Secrets Manager Failures: If secret retrieval fails, exception propagates and stops ingestion
Missing Configuration: Raises ValueError if secret_name or base_url missing
API Request Failures: Handled by MewsAPIClient with exponential backoff retry logic
Empty Responses: Handled gracefully; an empty response yields an empty list and raises no error
Chunking: Each chunk is processed independently; failures in one chunk don’t block others
Entity Writes: Skip writing if entity list is empty (no DataFrame creation)

Type Safety

All data structures are validated against PySpark schemas during DataFrame creation. This ensures:

Type consistency for downstream processing
Early detection of schema changes in the API
Null-safe handling of optional fields
Preserved nested structure for complex objects (PersonCounts, Options, Amount, etc.)

Fields are primarily StringType to preserve raw API values. Type casting and business logic occur in the cleaning stage of the workflow.

Catalog Output

After successful ingestion, data is available in the ingest catalog:

Database: etl_gp_ingest_mews

Tables:

reservations - Complete reservation details, partitioned by synced_date and property_id
customers - Guest profile data, partitioned by synced_date
order_items - Revenue line items, partitioned by synced_date and property_id
services - Property services, partitioned by synced_date and property_id
resource_categories - Room types, partitioned by synced_date and property_id
rates - Rate plans, partitioned by synced_date
business_segments - Market segments, partitioned by synced_date
age_categories - Guest type classifications, partitioned by synced_date
companies - Corporate accounts, partitioned by synced_date
products - F&B and orderable items, partitioned by synced_date and property_id

API Rate Limiting Considerations

The Mews Connector API enforces rate limits:

General Endpoints: Subject to enterprise-level rate limiting
Pagination: Limited to 1000 records per request (controlled by Limitation.Count)
Retry Logic: Exponential backoff with 2-30 second delays, up to 5 attempts
Delay Between Pages: 100ms to avoid aggressive pagination

If rate limits are encountered:

The MewsAPIClient will automatically retry with exponential backoff
Consider reducing chunk size from 1000 to 500
Distribute sync dates across multiple job runs
Contact Mews support to increase rate limits if needed
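
For reference, retry behavior with the documented parameters (2-30 second delays, up to 5 attempts) might look like the sketch below. It is not the MewsAPIClient code: the function name, the doubling schedule, and the choice to retry on any exception are assumptions made for illustration.

```python
import time
from typing import Callable


def call_with_backoff(
    request: Callable[[], object],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
):
    """Call request(), retrying failures with capped exponential delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delays double each time (2s, 4s, 8s, 16s) and never exceed max_delay.
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
```

The sleep function is injectable so the schedule can be checked in tests without waiting; in production use the default time.sleep applies.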

Usage in Workflow

The Mews ingester is typically invoked as the first stage in the ETL workflow:


from etl_lib.job.JobContext import JobContext
from etl_lib.pipeline.ingest.mews.MewsIngester import MewsIngester
 
# Standard incremental ingest
job_context = JobContext(source="mews", property_ids=["PROPERTY_ID"])
MewsIngester(job_context).run()
 
# Backfill specific dates
job_context = JobContext(source="mews", property_ids=["PROPERTY_ID"])
MewsIngester(job_context, sync_dates=["2025-01-01", "2025-01-02"]).run()

The ingested data is then processed by the cleaning pipeline, which transforms the nested structures into flat tables suitable for analytics.