ReplaceValuesTask
The ReplaceValuesTask replaces values in DataFrames based on configured mappings. It allows for data standardization and correction based on business rules defined in the job configuration.
Overview
This task enables flexible data transformation by:
- Reading value replacement mappings from job configuration
- Applying replacements to any column across all models
- Preserving original values in _og (original) columns
- Supporting guest, reservation, and room data
Incremental Behavior
- Supports incremental mode: if job_context.is_incremental is set, the task operates on the ProcessedAdded* models and writes the replaced values back to the same Added models.
- During merge, the task merges the Added models into the base processed models, deduplicating by guest_id, res_id, or res_id + room_stay_date for rooms.
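The merge step can be sketched in plain Python, with rows as dicts standing in for DataFrame records. The helper name merge_added and the rule that a row from the Added model overrides a base row with the same key are assumptions made for illustration; the source states only the deduplication keys.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def merge_added(base: List[Row], added: List[Row], keys: Sequence[str]) -> List[Row]:
    """Merge `added` into `base`, keeping one row per key tuple.

    Assumption: when both sides share a key, the added row wins.
    """
    merged: Dict[Tuple[Any, ...], Row] = {}
    # Base rows go in first, so a later added row with the same key overwrites them
    for row in base + added:
        merged[tuple(row[k] for k in keys)] = row
    return list(merged.values())


# Room rows are keyed by res_id + room_stay_date
base_rooms = [
    {"res_id": "R1", "room_stay_date": "2024-01-01", "room_type": "KNG"},
    {"res_id": "R1", "room_stay_date": "2024-01-02", "room_type": "KNG"},
]
added_rooms = [
    {"res_id": "R1", "room_stay_date": "2024-01-02", "room_type": "King"},
    {"res_id": "R2", "room_stay_date": "2024-01-05", "room_type": "Queen"},
]
merged_rooms = merge_added(base_rooms, added_rooms, keys=("res_id", "room_stay_date"))
```

For guest or reservation rows, the same helper would be called with keys=("guest_id",) or keys=("res_id",).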
Implementation Notes
- Replacements are done using an equality join between each column and a small mapping DataFrame constructed from the config's (old_val -> new_val) pairs.
- The original column is preserved as {column}_og, and the replaced value is stored under the original column name.
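The two notes above can be illustrated with a minimal plain-Python sketch. It treats rows as dicts and stands in for the DataFrame equality join with a dictionary lookup, so it shows the intended semantics rather than the task's actual code. The function name replace_values and the sample column market_code are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def replace_values(rows: List[Row], column: str, mapping: Dict[str, str]) -> List[Row]:
    """Return new rows with `column` replaced via `mapping`.

    The original value is kept in `{column}_og` on every row.
    """
    out: List[Row] = []
    for row in rows:
        original = row.get(column)
        new_row = dict(row)  # copy so the input rows are not mutated
        new_row[f"{column}_og"] = original
        # Exact equality match only; nulls and unmatched values pass through unchanged
        if original is not None and original in mapping:
            new_row[column] = mapping[original]
        out.append(new_row)
    return out


rows = [
    {"res_id": "R1", "market_code": "CORP"},
    {"res_id": "R2", "market_code": "corp"},  # different case, so no match
    {"res_id": "R3", "market_code": None},    # null stays null
]
replaced = replace_values(rows, "market_code", {"CORP": "Corporate"})
```

Here only the first row changes: market_code becomes "Corporate" while market_code_og keeps "CORP".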
Flow Diagram
Models
Requires
- ProcessedGuestModel - Processed guest data
- ProcessedReservationModel - Processed reservation data
- ProcessedRoomModel - Processed room data
Provides
- ProcessedGuestModel - Guest data with replaced values
- ProcessedReservationModel - Reservation data with replaced values
- ProcessedRoomModel - Room data with replaced values
Expanding Codes
Correcting Data Entry Errors
Configuration Best Practices
- Case Sensitivity: Mappings are case-sensitive. Include all variations in the mapping config.
- Null Handling: Null values are not replaced. Original nulls remain null.
- Partial Matches: Only exact matches are replaced. Use cleaning tasks for fuzzy matching.
- Common Patterns: Define replacements at the config level for reuse across chains.
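Because matching is exact and case-sensitive, one way to follow the first practice is to generate the case variants before writing them into the config. The helper below is a hypothetical sketch and not part of the task; it assumes the mapping is a flat dictionary of old value to new value.

```python
from typing import Dict


def expand_case_variants(mapping: Dict[str, str]) -> Dict[str, str]:
    """Add lower, upper, and title-case variants of each key to the mapping."""
    # Start from the explicit entries so they are never overridden by a generated variant
    expanded: Dict[str, str] = dict(mapping)
    for old_val, new_val in mapping.items():
        for variant in (old_val.lower(), old_val.upper(), old_val.title()):
            # setdefault leaves any existing entry for this variant untouched
            expanded.setdefault(variant, new_val)
    return expanded


expanded = expand_case_variants({"Corp": "Corporate"})
```

Null values need no entry in the mapping, since they are never replaced.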
Related Tasks
- ProcessingTask - Parent orchestrator
- Task - Base task class
- Configuration - Value replacement configuration