Mews
The Mews ingest process retrieves reservation, customer, order item (revenue), and master data from Mews PMS via the Connector API and loads it into the ingest catalog.
Overview
The MewsIngester class handles the extraction of multiple data types from the Mews Property Management System:
- Reservations: Complete booking information including guest details, room assignments, dates, and status
- Customers: Guest profile data with contact information and addresses
- Order Items: Detailed revenue/charge data for each reservation
- Master Data: Services, rates, resource categories (room types), business segments, age categories, companies, and products
Data Source Configuration
The ingester connects to the Mews Connector API using access-token authentication configured in the job context:
MewsIngester(job_context, sync_dates=None).run()
Connection Details
- Protocol: HTTPS REST API
- Authentication: ClientToken + AccessToken (stored in AWS Secrets Manager)
- Data Format: JSON with paginated responses
- Base URL: Configured via source["base_url"]
- API Documentation: Mews Connector API
- Required Headers:
Content-Type: application/json
- Authentication Method: Tokens passed in request body payload
Authentication Flow
The ingester retrieves credentials from AWS Secrets Manager:
- Secret Lookup: Retrieves the secret using source["auth"]["secret_name"]
- Credential Extraction: Extracts client_token and access_token from the secret
- Client Initialization: Creates a MewsAPIClient with the credentials
- Token Injection: Tokens are automatically added to each API request payload
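The credential extraction and token injection steps can be sketched as follows. This is a minimal illustration, assuming the secret is a JSON string with client_token and access_token keys and that the tokens travel in the request body as ClientToken and AccessToken. The helper names parse_credentials and with_tokens are hypothetical and are not taken from MewsAPIClient.

```python
import json


def parse_credentials(secret_string: str) -> dict:
    """Parse the secret JSON and check that both tokens are present."""
    secret = json.loads(secret_string)
    missing = [key for key in ("client_token", "access_token") if not secret.get(key)]
    if missing:
        raise ValueError(f"Secret is missing required keys: {missing}")
    return {
        "client_token": secret["client_token"],
        "access_token": secret["access_token"],
    }


def with_tokens(payload: dict, credentials: dict) -> dict:
    """Return a copy of the payload with both tokens added (input is not mutated)."""
    return {
        **payload,
        # Assumed body keys, matching the ClientToken + AccessToken pair named above.
        "ClientToken": credentials["client_token"],
        "AccessToken": credentials["access_token"],
    }
```

Returning a new dictionary rather than mutating the caller's payload keeps the tokens out of any payload objects that might later be logged.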
Ingest Process Flow
Reservation Retrieval Strategy
The ingester uses an update-based fetch approach with a buffer window:
Single Endpoint with Buffer
Retrieves all reservations updated within a time window that includes a 1-day buffer to catch late updates.
POST /api/connector/v1/reservations/getAll/2023-06-06
{
"EnterpriseIds": ["{propertyId}"],
"UpdatedUtc": {
"StartUtc": "{sync_date - 1 day}",
"EndUtc": "{sync_date + 1 day}"
}
}
Buffer Strategy: The ingester adds ±1 day to the sync date to ensure no reservations are missed due to timezone differences or late updates.
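As a rough sketch of how the buffered window could be computed, the function below builds the request filter for one sync date. It assumes the sync date is a YYYY-MM-DD string interpreted as midnight UTC and that timestamps are sent as ISO 8601 with a Z suffix. The function name build_reservation_filter is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def build_reservation_filter(property_id: str, sync_date: str) -> dict:
    """Build the UpdatedUtc filter for one sync date with a 1-day buffer on each side."""
    # Assumption: the sync date is anchored at midnight UTC.
    day = datetime.strptime(sync_date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    start = day - timedelta(days=1)
    end = day + timedelta(days=1)
    return {
        "EnterpriseIds": [property_id],
        "UpdatedUtc": {
            "StartUtc": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "EndUtc": end.strftime("%Y-%m-%dT%H:%M:%SZ"),
        },
    }
```

Because consecutive sync dates produce overlapping windows, the same reservation can appear in more than one daily batch; downstream stages would need to deduplicate on Id and UpdatedUtc.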
Pagination Handling
The Mews API uses cursor-based pagination:
- Page Size: 1000 records per request
- Cursor: Automatically included in subsequent requests
- Termination: Pagination stops when no cursor is returned
- Rate Limiting: 100ms delay between pagination requests
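The pagination loop described above might look roughly like this. It is a sketch, not the MewsAPIClient implementation: fetch_page stands in for whatever function performs the HTTP POST, and placing the cursor inside a Limitation object next to Count is an assumption.

```python
import time
from typing import Callable, Optional


def fetch_all_pages(
    fetch_page: Callable[[dict], dict],
    payload: dict,
    result_key: str,
    page_size: int = 1000,
    delay_seconds: float = 0.1,
) -> list:
    """Follow the cursor until the API stops returning one."""
    records: list = []
    cursor: Optional[str] = None
    while True:
        limitation: dict = {"Count": page_size}
        if cursor:
            limitation["Cursor"] = cursor  # assumed location of the cursor
        response = fetch_page({**payload, "Limitation": limitation})
        records.extend(response.get(result_key, []))
        cursor = response.get("Cursor")
        if not cursor:
            break  # no cursor returned: last page reached
        time.sleep(delay_seconds)  # 100 ms pause between pages by default
    return records
```

Passing fetch_page in as a parameter keeps the loop testable without network access.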
Incremental Loading Strategy
The ingester uses an incremental sync approach:
- Check Last Sync Date: Query the ingest catalog for the most recent synced_date for the reservations table per property
- Calculate Missing Dates: Determine all dates between last sync and today
- Fallback to Constructor Dates: If no previous sync exists, use the sync_dates parameter
- Fetch Daily Data: Process one date at a time per property
- Record Sync Date: Each batch is tagged with its synced_date for future incremental runs
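The date calculation can be sketched as below. Whether the real get_next_sync_dates() includes the last synced date or today is not stated here, so this version assumes the range starts the day after the last sync and includes today. The function name dates_to_sync is hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def dates_to_sync(
    last_synced: Optional[date],
    today: date,
    fallback: Optional[list] = None,
) -> list:
    """Return the ISO dates still to be synced, oldest first."""
    if last_synced is None:
        # No previous sync recorded: use the dates passed to the constructor.
        return list(fallback or [])
    missing_days = (today - last_synced).days
    # Assumption: start the day after the last sync and include today.
    return [
        (last_synced + timedelta(days=offset)).isoformat()
        for offset in range(1, missing_days + 1)
    ]
```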
Master Data Synchronization
For each batch of reservations, the ingester extracts related IDs and fetches master data entities:
ID Extraction
From each reservation batch, the ingester extracts:
- Service IDs: For fetching property services
- Rate IDs: For fetching rate plans
- Business Segment IDs: For fetching market segments
- Company IDs: For fetching corporate accounts
- Account IDs: For fetching customer profiles
- Age Category IDs: For fetching guest type classifications
- Reservation IDs: For fetching order items
Chunked Fetching
Master data is fetched in chunks to respect API limits:
- Chunk Size: 1000 IDs per request
- Processing: Chunks are processed sequentially (not in parallel), with progress logging
- Error Handling: Exceptions are logged; empty lists returned on failure
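A minimal sketch of chunked fetching with per-chunk error handling follows. The names chunk_ids and fetch_in_chunks are illustrative, and fetch_chunk stands in for any API call that takes a list of IDs. In this sketch a failing chunk is logged and contributes no entities, which is one reading of the error-handling rule above.

```python
import logging
from typing import Callable, Iterator

logger = logging.getLogger(__name__)


def chunk_ids(ids: list, chunk_size: int = 1000) -> Iterator[list]:
    """Yield consecutive slices of at most chunk_size IDs."""
    for start in range(0, len(ids), chunk_size):
        yield ids[start:start + chunk_size]


def fetch_in_chunks(
    ids: list,
    fetch_chunk: Callable[[list], list],
    chunk_size: int = 1000,
) -> list:
    """Fetch entities chunk by chunk, in order, skipping chunks that fail."""
    entities: list = []
    chunks = list(chunk_ids(ids, chunk_size))
    for index, chunk in enumerate(chunks, start=1):
        logger.info("Fetching chunk %d/%d (%d IDs)", index, len(chunks), len(chunk))
        try:
            entities.extend(fetch_chunk(chunk))
        except Exception:
            # Log and continue so one bad chunk does not block the rest.
            logger.exception("Chunk %d/%d failed", index, len(chunks))
    return entities
```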
Master Data Entities
Services
- Purpose: Property services (typically one per property)
- Endpoint: /api/connector/v1/services/getAll
- Key Fields: Id, EnterpriseId, Name, IsActive
- Usage: Maps ServiceId to EnterpriseId (property_id)
Resource Categories
- Purpose: Room types (Standard, Deluxe, Suite, etc.)
- Endpoint: /api/connector/v1/resourceCategories/getAll
- Key Fields: Id, EnterpriseId, ServiceId, Name, ShortName, Capacity
- Usage: Defines available room types per property
Rates
- Purpose: Rate plans (BAR, Corporate, Government, etc.)
- Endpoint: /api/connector/v1/rates/getAll
- Key Fields: Id, ServiceId, Name, ShortName, IsActive, IsPublic
- Usage: Identifies rate plan details for reservations
Business Segments
- Purpose: Market segments (Transient, Group, Corporate, etc.)
- Endpoint: /api/connector/v1/businessSegments/getAll
- Key Fields: Id, ServiceId, Name, IsActive
- Usage: Categorizes booking sources and market types
Age Categories
- Purpose: Guest type classifications (Adult, Child, Infant)
- Endpoint: /api/connector/v1/ageCategories/getAll
- Key Fields: Id, ServiceId, Name, Classification (Adult/Child)
- Usage: Defines person count categories in the PersonCounts array
Transactional Data Synchronization
After master data is loaded, transactional data is fetched:
Companies
- Purpose: Corporate accounts and travel agencies
- Endpoint: /api/connector/v1/companies/getAll
- Key Fields: Id, Name, TaxIdentifier, ContactPerson, Address
- Linked via: Reservation.PartnerCompanyId, TravelAgencyId
Customers
- Purpose: Guest profiles with contact information
- Endpoint: /api/connector/v1/customers/getAll
- Key Fields: Id, FirstName, LastName, Email, Phone, Address, BirthDate, Nationality
- Extent: Includes Customers and Addresses, excludes Documents
- Linked via: Reservation.AccountId
Order Items
- Purpose: Revenue/charge line items
- Endpoint: /api/connector/v1/orderItems/getAll
- Filters: By ServiceOrderIds (reservation IDs)
- Key Fields: Id, Type, AccountingCategoryId, Amount, ConsumedUtc, State
- Types: Product, Service, Payment, CancellationFee
- Return Value: Raw list used for product ID extraction
Products
- Purpose: F&B items and orderable products
- Endpoint: /api/connector/v1/products/getAll
- Key Fields: Id, ServiceId, Name, ShortName, Classifications (Food, Beverage, Wellness, CityTax)
- Extraction: Product IDs extracted from order items with Type=Product
- Linked via: OrderItem.Data.Product.ProductId
Reservation Processing
Service-to-Enterprise Mapping
The ingester creates a dictionary mapping ServiceId to EnterpriseId from the services data:
service_map = {service["Id"]: service["EnterpriseId"] for service in services}
This mapping is used to populate the property_id column in reservations by:
- Creating a PySpark map expression from the dictionary
- Mapping Reservation.ServiceId to property_id
- Filling nulls with the property_id parameter as fallback
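The ingester performs this lookup with a PySpark map expression. The plain-Python sketch below only illustrates the same lookup-with-fallback logic on individual values. The function names build_service_map and resolve_property_id are hypothetical.

```python
from typing import Optional


def build_service_map(services: list) -> dict:
    """Map each service Id to its EnterpriseId, skipping incomplete records."""
    return {
        service["Id"]: service["EnterpriseId"]
        for service in services
        if service.get("Id") and service.get("EnterpriseId")
    }


def resolve_property_id(
    service_id: Optional[str],
    service_map: dict,
    fallback_property_id: str,
) -> str:
    """Look up the enterprise for a service, falling back to the job's property_id."""
    return service_map.get(service_id) or fallback_property_id
```

The fallback matters when a reservation references a service that was not returned by the services fetch, for example after a failed chunk.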
VoucherId Renaming
The field VoucherId is renamed to VoucherCode before saving to avoid conflicts with the BaseCrawler architecture.
Data Schemas
Reservations Schema
The reservation data captures comprehensive booking information:
| Field Category | Key Fields | Description |
|---|---|---|
| Identifiers | Id, ServiceId, GroupId, Number | Unique identifiers for reservation and service |
| Accounts | AccountId, AccountType, BookerId | Customer account references |
| Audit | CreatorProfileId, UpdaterProfileId, CreatedUtc, UpdatedUtc, CancelledUtc | Creation and modification tracking |
| State | State, Origin, CommanderOrigin, OriginDetails | Reservation status and booking channel |
| Dates | StartUtc, EndUtc, ScheduledStartUtc, ScheduledEndUtc, ActualStartUtc, ActualEndUtc | Stay dates (scheduled and actual) |
| Release | ReleasedUtc | When reservation was released |
| Channel | ChannelNumber, ChannelManagerNumber | Channel manager booking reference |
| Room Assignment | RequestedResourceCategoryId, AssignedResourceId, AssignedResourceLocked | Room type request and assignment |
| Rate & Segment | BusinessSegmentId, RateId | Market segment and rate plan |
| Voucher | VoucherId (renamed to VoucherCode) | Voucher/promo code |
| Payment | CreditCardId | Payment method reference |
| Blocks | AvailabilityBlockId | Group block reference |
| Partners | PartnerCompanyId, TravelAgencyId | Company and travel agent references |
| Cancellation | CancellationReason | Reason for cancellation |
| Purpose | Purpose | Trip purpose |
| QR Code | QrCodeData | Mobile key data |
| Person Counts | PersonCounts[] | Array of guest counts by age category |
| Person Counts Fields | PersonCounts[].AgeCategoryId, PersonCounts[].Count | Age category ID and count |
| Options | Options | Check-in options struct |
| Options Fields | Options.OwnerCheckedIn, AllCompanionsCheckedIn, AnyCompanionCheckedIn, ConnectorCheckIn | Check-in status flags |
Schema Type: StructType with nested arrays and objects (see RESERVATIONS_SCHEMA)
Customers Schema
The customer data contains guest profile information:
| Field Category | Key Fields | Description |
|---|---|---|
| Identifiers | Id | Unique customer ID |
| Name | FirstName, LastName, Title | Guest name details |
| Contact | Email, Phone, SecondaryPhone | Contact information |
| Demographics | BirthDate, BirthPlace, Nationality, Sex, Language | Demographic data |
| Address | Address.Line1, City, PostalCode, CountryCode | Physical address |
| Identification | IdentityCard.Type, Number, Expiration | ID document details |
| Passport | Passport.Number, Expiration, Issuance | Passport information |
| Visa | Visa.Number, Expiration, Issuance | Visa details |
| Company | CompanyId | Linked company reference |
| Categories | CategoryId | Customer category classification |
| Notes | Notes | Additional notes |
| Options | Options.SendMarketingEmails, SendMarketingPostalMail | Marketing preferences |
Schema Type: StructType with nested address and document structures
Order Items Schema
The order item data provides detailed revenue breakdown:
| Field Category | Key Fields | Description |
|---|---|---|
| Identifiers | Id, EnterpriseId, ServiceId, OrderId | Unique identifiers |
| Accounting | AccountId, AccountingCategoryId | Account and category references |
| Type | Type, SubType | Item type (Product, Service, Payment, etc.) |
| Name | Name | Item description |
| Dates | CreatedUtc, UpdatedUtc, ConsumedUtc, ClosedUtc | Lifecycle timestamps |
| Billing | BillId, InvoiceId | Billing document references |
| State | State, Origin | Accounting state (Open, Closed, etc.) |
| Amount | Amount.Currency, NetValue, GrossValue | Monetary amounts |
| Tax | Amount.TaxValues[], TaxValues[].Code, TaxValues[].Value | Tax breakdown |
| Tax Breakdown | Amount.Breakdown.Items[] | Detailed tax calculation |
| Breakdown Fields | Items[].TaxRateCode, NetValue, TaxValue | Per-rate tax details |
| Data | Data | Type-specific nested data (varies by Type) |
| Product Data | Data.Product.ProductId | Product reference when Type=Product |
| Options | Options.CanceledWithReservation | Whether item was auto-canceled |
| Notes | Notes | Additional notes |
Schema Type: Complex StructType with nested amounts, taxes, and variable data field
Services Schema
Service data defines property-level services:
| Field | Type | Description |
|---|---|---|
| Id | String | Unique service ID |
| EnterpriseId | String | Property/enterprise ID |
| Name | String | Service name |
| Description | String | Service description |
| IsActive | Boolean | Whether service is active |
| Options | Struct | Service options and settings |
Schema Type: StructType with minimal nesting
Resource Categories Schema
Resource category data defines room types:
| Field | Type | Description |
|---|---|---|
| Id | String | Unique category ID |
| EnterpriseId | String | Property/enterprise ID |
| ServiceId | String | Associated service |
| Name | String | Category name (e.g., “Deluxe King”) |
| ShortName | String | Abbreviated name |
| Description | String | Category description |
| IsActive | Boolean | Whether category is active |
| Capacity | Integer | Maximum occupancy |
| ExtraCapacity | Integer | Extra bed capacity |
| Ordering | Integer | Display order |
Schema Type: StructType with basic fields
Rates Schema
Rate data defines rate plans:
| Field | Type | Description |
|---|---|---|
| Id | String | Unique rate ID |
| ServiceId | String | Associated service |
| GroupId | String | Rate group reference |
| Name | String | Rate plan name |
| ShortName | String | Abbreviated name |
| IsActive | Boolean | Whether rate is active |
| IsPublic | Boolean | Whether rate is publicly bookable |
| IsEnabled | Boolean | Whether rate is enabled |
Schema Type: StructType with basic fields
Business Segments Schema
Business segment data defines market segments:
| Field | Type | Description |
|---|---|---|
| Id | String | Unique segment ID |
| ServiceId | String | Associated service |
| Name | String | Segment name |
| IsActive | Boolean | Whether segment is active |
Schema Type: StructType with basic fields
Age Categories Schema
Age category data defines guest types:
| Field | Type | Description |
|---|---|---|
| Id | String | Unique category ID |
| ServiceId | String | Associated service |
| Name | String | Category name (e.g., “Adult”, “Child”) |
| Classification | String | Either “Adult” or “Child” |
| Ordering | Integer | Display order |
Schema Type: StructType with basic fields
Companies Schema
Company data contains corporate account information:
| Field | Type | Description |
|---|---|---|
| Id | String | Unique company ID |
| Name | String | Company name |
| Number | String | Company number/code |
| Identifier | String | Business identifier |
| TaxIdentifier | String | Tax ID number |
| AdditionalTaxIdentifier | String | Secondary tax ID |
| BillingCode | String | Billing reference code |
| AccountingCode | String | Accounting system code |
| Address | Struct | Company address |
| InvoiceDueInterval | String | Payment terms |
| Contact | Struct | Contact person details |
| Options | Struct | Company options |
Schema Type: StructType with nested address and contact
Products Schema
Product data defines orderable items:
| Field | Type | Description |
|---|---|---|
| Id | String | Unique product ID |
| ServiceId | String | Associated service |
| Name | String | Product name |
| ShortName | String | Abbreviated name |
| Description | String | Product description |
| ExternalName | String | External system name |
| Charging | String | Charging mode |
| Posting | String | Posting mode |
| Promotions | Struct | Promotion settings |
| Classifications | Struct | Product classifications |
| Classifications.Food | Boolean | Whether classified as food |
| Classifications.Beverage | Boolean | Whether classified as beverage |
| Classifications.Wellness | Boolean | Whether classified as wellness |
| Classifications.CityTax | Boolean | Whether classified as city tax |
| AccountingCategoryId | String | Accounting category reference |
| Options | Struct | Product options |
Schema Type: StructType with nested classifications
Implementation Details
Core Methods
run()
Main execution method that orchestrates the ingest process:
- Iterates through each property configured in the job context
- Calls _process_property() for each property
_process_property(property_data: dict)
Processes a single property:
- Determines sync dates using _get_sync_dates()
- Falls back to the sync_dates constructor parameter if no dates are found
- Calls _process_sync_date() for each date
_get_sync_dates(property_id: str, table: str = "reservations")
Determines dates to sync:
- Calls get_next_sync_dates() from the base class to get missing dates
- Falls back to self.sync_dates if no dates are found
- Returns an empty list if neither source has dates
_process_sync_date(property_id: str, sync_date: str)
Comprehensive data ingestion for a single property and date:
- Fetch Reservations: Calls _fetch_reservations() with 1-day buffer
- Extract IDs: Calls _extract_related_ids() to collect master data IDs
- Sync Master Data: Sequentially syncs services, resource categories, rates, business segments, age categories
- Save Reservations: Maps ServiceId to property_id and writes to catalog
- Sync Transactional Data: Syncs companies, customers, order items
- Extract Products: If order items contain products, syncs product master data
_fetch_reservations(property_id: str, sync_date: datetime)
Fetches reservations for a specific date:
- Calculates buffer window: sync_date - 1 day to sync_date + 1 day
- Calls MewsAPIClient.get_all_reservations() with UpdatedUtc filter
- Returns empty list if no reservations found
_extract_related_ids(reservations: list[Row | dict])
Extracts related entity IDs from reservations:
- Iterates through reservations
- Collects unique ServiceIds, RateIds, BusinessSegmentIds, CompanyIds, AccountIds
- Extracts AgeCategoryIds from PersonCounts array
- Returns dictionary of ID lists
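The extraction could be sketched as follows for dictionary inputs (the real method also accepts Spark Row objects, which this sketch does not handle). The output key names are hypothetical, and collecting company IDs from both PartnerCompanyId and TravelAgencyId is an assumption based on how companies are linked to reservations.

```python
def extract_related_ids(reservations: list) -> dict:
    """Collect the unique related IDs referenced by a batch of reservations."""
    # Output key -> reservation field holding a single ID.
    single_fields = {
        "service_ids": "ServiceId",
        "rate_ids": "RateId",
        "business_segment_ids": "BusinessSegmentId",
        "account_ids": "AccountId",
        "reservation_ids": "Id",
    }
    ids = {name: set() for name in single_fields}
    ids["company_ids"] = set()
    ids["age_category_ids"] = set()

    for reservation in reservations:
        for name, field in single_fields.items():
            if reservation.get(field):
                ids[name].add(reservation[field])
        for field in ("PartnerCompanyId", "TravelAgencyId"):
            if reservation.get(field):
                ids["company_ids"].add(reservation[field])
        # PersonCounts may be missing or null on some reservations.
        for person_count in reservation.get("PersonCounts") or []:
            if person_count.get("AgeCategoryId"):
                ids["age_category_ids"].add(person_count["AgeCategoryId"])

    # Sorted lists give deterministic chunking downstream.
    return {name: sorted(values) for name, values in ids.items()}
```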
_chunk_ids(ids: list, chunk_size: int = 1000)
Splits ID lists into chunks:
- Chunk Size: 1000 IDs (Mews API limit)
- Empty Handling: Returns empty list if no IDs
- Usage: Used by all fetch methods to respect API limits
_fetch_entities(...)
Generic entity fetching with chunking and logging:
- Validates input IDs (returns empty list if none)
- Chunks IDs using _chunk_ids()
- Logs progress for each chunk
- Calls provided fetch function for each chunk
- Aggregates results across all chunks
- Returns combined entity list
_write_entities(...)
Generic entity writing to catalog:
- Creates Spark DataFrame from entity list
- Adds synced_timestamp and synced_date columns
- Handles property_id based on partition settings:
  - If property_column specified: Maps from entity column with fallback
  - If property_partition=True: Uses literal property_id
- Calls catalog.write_to_ingest() with appropriate partitioning
_sync_services(service_ids: list, property_id: str, sync_date: str)
Syncs service master data:
- Fetches services using chunked requests
- Writes to services table with property partition
- Creates and returns ServiceId → EnterpriseId mapping dictionary
_save_reservations(...)
Saves reservation data:
- Creates DataFrame from reservation list
- Renames VoucherId to VoucherCode
- Adds metadata columns
- Maps ServiceId to property_id using service_to_enterprise_map
- Writes to reservations table with property partition
_sync_resource_categories(service_ids: list, property_id: str, sync_date: str)
Syncs resource category (room type) master data:
- Fetches by service_ids and enterprise_ids
- Writes to resource_categories table with property partition
_sync_rates(rate_ids: list, property_id: str, sync_date: str)
Syncs rate plan master data:
- Fetches by rate_ids and enterprise_ids
- Writes to rates table without property partition
_sync_business_segments(business_segment_ids: list, property_id: str, sync_date: str)
Syncs business segment master data:
- Fetches by business_segment_ids and enterprise_ids
- Writes to business_segments table without property partition
_sync_age_categories(age_category_ids: list, property_id: str, sync_date: str)
Syncs age category master data:
- Fetches by age_category_ids and enterprise_ids
- Writes to age_categories table without property partition
_sync_companies(company_ids: list, property_id: str, sync_date: str)
Syncs company transactional data:
- Fetches by company_ids and enterprise_ids
- Writes to companies table without property partition
_sync_customers(account_ids: list, property_id: str, sync_date: str)
Syncs customer transactional data:
- Fetches by customer_ids (account_ids) and enterprise_ids
- Includes Customers and Addresses extent
- Writes to customers table without property partition
_sync_order_items(reservation_ids: list, property_id: str, sync_date: str)
Syncs order item revenue data:
- Fetches by service_order_ids (reservation_ids)
- Writes to order_items table with property partition
- Returns raw order items list for product extraction
_extract_product_ids(order_items: list[dict])
Extracts product IDs from order items:
- Filters for items with Type=Product
- Extracts Data.Product.ProductId from each
- Returns set of unique product IDs
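A minimal sketch of this extraction, assuming order items are plain dictionaries shaped like the API response. It tolerates a missing or null Data field, which the actual method may or may not do.

```python
def extract_product_ids(order_items: list) -> set:
    """Return the unique ProductIds referenced by order items of Type=Product."""
    product_ids = set()
    for item in order_items:
        if item.get("Type") != "Product":
            continue
        # Data varies by Type; guard against it being missing or null.
        product = (item.get("Data") or {}).get("Product") or {}
        if product.get("ProductId"):
            product_ids.add(product["ProductId"])
    return product_ids
```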
_sync_products(product_ids: list, property_id: str, sync_date: str)
Syncs product master data:
- Fetches by product_ids and enterprise_ids
- Writes to products table with property partition
_get_credentials()
Retrieves API credentials from AWS Secrets Manager:
- Creates boto3 session for eu-central-1 region
- Fetches secret using self._secret_name
- Parses JSON secret string
- Validates presence of client_token and access_token
- Returns credentials dictionary
Error Handling
- AWS Secrets Manager Failures: If secret retrieval fails, exception propagates and stops ingestion
- Missing Configuration: Raises ValueError if secret_name or base_url is missing
- API Request Failures: Handled by MewsAPIClient with exponential backoff retry logic
- Empty Responses: Handled gracefully; empty DataFrames created without errors
- Chunking: Each chunk is processed independently; failures in one chunk don’t block others
- Entity Writes: Skip writing if entity list is empty (no DataFrame creation)
Type Safety
All data structures are validated against PySpark schemas during DataFrame creation. This ensures:
- Type consistency for downstream processing
- Early detection of schema changes in the API
- Null-safe handling of optional fields
- Preserved nested structure for complex objects (PersonCounts, Options, Amount, etc.)
Fields are primarily StringType to preserve raw API values. Type casting and business logic occur in the cleaning stage of the workflow.
Catalog Output
After successful ingestion, data is available in the ingest catalog:
Database: etl_gp_ingest_mews
Tables:
- reservations: Complete reservation details, partitioned by synced_date and property_id
- customers: Guest profile data, partitioned by synced_date
- order_items: Revenue line items, partitioned by synced_date and property_id
- services: Property services, partitioned by synced_date and property_id
- resource_categories: Room types, partitioned by synced_date and property_id
- rates: Rate plans, partitioned by synced_date
- business_segments: Market segments, partitioned by synced_date
- age_categories: Guest type classifications, partitioned by synced_date
- companies: Corporate accounts, partitioned by synced_date
- products: F&B and orderable items, partitioned by synced_date and property_id
API Rate Limiting Considerations
The Mews Connector API enforces rate limits:
- General Endpoints: Subject to enterprise-level rate limiting
- Pagination: Limited to 1000 records per request (controlled by Limitation.Count)
- Retry Logic: Exponential backoff with 2-30 second delays, up to 5 attempts
- Delay Between Pages: 100ms to avoid aggressive pagination
If rate limits are encountered:
- The MewsAPIClient will automatically retry with exponential backoff
- Consider reducing chunk size from 1000 to 500
- Distribute sync dates across multiple job runs
- Contact Mews support to increase rate limits if needed
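The retry behavior could be approximated as in the sketch below. This is not the MewsAPIClient code: the doubling schedule (2, 4, 8, 16 seconds, capped at 30) is one plausible reading of "2-30 second delays, up to 5 attempts", and retrying on every exception is a simplification. The function name call_with_backoff is hypothetical.

```python
import time
from typing import Callable


def call_with_backoff(
    request: Callable[[], object],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> object:
    """Call request(), retrying with exponentially growing delays on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Assumed schedule: 2s, 4s, 8s, 16s, never more than max_delay.
            sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
```

The sleep parameter is injectable so the schedule can be checked in tests without waiting.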
Usage in Workflow
The Mews ingester is typically invoked as the first stage in the ETL workflow:
from etl_lib.job.JobContext import JobContext
from etl_lib.pipeline.ingest.mews.MewsIngester import MewsIngester
# Standard incremental ingest
job_context = JobContext(source="mews", property_ids=["PROPERTY_ID"])
MewsIngester(job_context).run()
# Backfill specific dates
job_context = JobContext(source="mews", property_ids=["PROPERTY_ID"])
MewsIngester(job_context, sync_dates=["2025-01-01", "2025-01-02"]).run()
The ingested data is then processed by the cleaning pipeline to transform nested structures into flat tables suitable for analytics.