ReplaceValuesTask

The ReplaceValuesTask replaces values in DataFrames based on configured mappings. It allows for data standardization and correction based on business rules defined in the job configuration.

Overview

This task enables flexible data transformation by:

  • Reading value replacement mappings from job configuration
  • Applying replacements to any column across all models
  • Preserving original values in _og (original) columns
  • Supporting guest, reservation, and room data

Incremental Behavior

  • Supports incremental mode: if job_context.is_incremental is set, the task operates on the ProcessedAdded* models and writes the replaced values back to those same models.
  • During merge, the task merges the added models into the base processed models, deduplicating by guest_id for guests, res_id for reservations, and res_id + room_stay_date for rooms.
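
The merge step can be sketched in plain Python, with lists of dicts standing in for DataFrames. This is a minimal illustration, not the task's actual code: merge_added, DEDUP_KEYS, and the sample column room_type are hypothetical names, and the rule that an added row wins over a base row with the same key is an assumption the text above does not confirm.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

# Dedup keys per model, taken from the description above.
DEDUP_KEYS: Dict[str, Tuple[str, ...]] = {
    "guest": ("guest_id",),
    "reservation": ("res_id",),
    "room": ("res_id", "room_stay_date"),
}


def merge_added(base: List[Row], added: List[Row], keys: Tuple[str, ...]) -> List[Row]:
    """Merge added rows into base rows, keeping one row per key."""
    merged: Dict[Tuple[Any, ...], Row] = {}
    # Added rows are visited last, so they overwrite base rows that share
    # a key (assumption: the newest row wins).
    for row in base + added:
        merged[tuple(row[k] for k in keys)] = row
    return list(merged.values())


base_rooms = [
    {"res_id": 1, "room_stay_date": "2024-01-01", "room_type": "King"},
    {"res_id": 1, "room_stay_date": "2024-01-02", "room_type": "King"},
]
added_rooms = [
    {"res_id": 1, "room_stay_date": "2024-01-02", "room_type": "Queen"},
    {"res_id": 2, "room_stay_date": "2024-01-05", "room_type": "Twin"},
]
merged_rooms = merge_added(base_rooms, added_rooms, DEDUP_KEYS["room"])
```

Here the second base row and the first added row share the key (1, "2024-01-02"), so only one of them survives the merge.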

Implementation Notes

  • Replacements use an equality join between each column and a small mapping DataFrame built from the (old_val -> new_val) pairs in the config.
  • The original value is preserved in {column}_og, and the replaced value is stored under the original column name.
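
The two notes above can be sketched as follows. A dict lookup plays the role of the equality join against the mapping DataFrame, and lists of dicts stand in for DataFrames. The function replace_values and the sample column room_type are hypothetical names chosen for illustration; the real task's API may differ.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def replace_values(rows: List[Row], column: str, mapping: Dict[str, str]) -> List[Row]:
    """Replace exact matches in `column`, keeping the original in `{column}_og`."""
    out: List[Row] = []
    for row in rows:
        original = row.get(column)
        # Equality lookup: unmatched values pass through unchanged,
        # and nulls never match, so they stay null.
        if original is None:
            replaced = None
        else:
            replaced = mapping.get(original, original)
        new_row = dict(row)
        new_row[f"{column}_og"] = original  # preserve the original value
        new_row[column] = replaced          # replaced value keeps the column name
        out.append(new_row)
    return out


rows = [
    {"res_id": 1, "room_type": "K"},
    {"res_id": 2, "room_type": "Q"},
    {"res_id": 3, "room_type": None},
]
result = replace_values(rows, "room_type", {"K": "King"})
```

In this example only the first row changes: "K" becomes "King" while room_type_og keeps "K". The second row has no mapping entry and the third is null, so both keep their values.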

Models

Requires

  • ProcessedGuestModel - Processed guest data
  • ProcessedReservationModel - Processed reservation data
  • ProcessedRoomModel - Processed room data

Provides

  • ProcessedGuestModel - Guest data with replaced values
  • ProcessedReservationModel - Reservation data with replaced values
  • ProcessedRoomModel - Room data with replaced values

Expanding Codes

Correcting Data Entry Errors

Configuration Best Practices

  1. Case Sensitivity: Mappings are case-sensitive. Include all variations in the mapping config.

  2. Null Handling: Null values are not replaced. Original nulls remain null.

  3. Partial Matches: Only exact matches are replaced. Use cleaning tasks for fuzzy matching.

  4. Common Patterns: Define replacements at the config level for reuse across chains.
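
Because matching is exact and case-sensitive, one way to follow the first practice is to generate the case variants before writing them into the config. The helper below is a hypothetical sketch, not part of the task: with_case_variants and the sample values are invented for illustration.

```python
from typing import Dict


def with_case_variants(mapping: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `mapping` with lower, upper, and title-case keys added."""
    # Start from the explicit entries so they always take precedence.
    expanded: Dict[str, str] = dict(mapping)
    for old, new in mapping.items():
        for variant in (old.lower(), old.upper(), old.title()):
            expanded.setdefault(variant, new)
    return expanded


country_mapping = with_case_variants({"Usa": "United States"})
```

The expanded mapping covers "Usa", "usa", and "USA". Values such as "U.S.A." or "USA " (with a trailing space) still do not match, which is where a cleaning task is the better tool.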
