# ETL Jobs
Extract, Transform, Load (ETL) jobs for hotel data, used to generate insights.
## Getting Started
It is recommended to use the provided Docker environment for a consistent development experience.
### Prerequisites
- Docker
- AWS CLI configured with the necessary profiles.
## Running ETL Jobs
To run an ETL job, use the `make run` command with the following environment variables:
- `AWS_PROFILE`: The AWS CLI profile to use (default: `gp-etl`).
- `GLUE_JOB_NAME`: The name of the Glue job (default: `dev-athenaeum-LOCAL`).
- `GLUE_JOB_SCRIPT`: The entry-point Python file for the job, located in the `jobs/` directory (default: `athenaeum.py`).
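For context, an entry point under `jobs/` typically starts by reading the arguments Glue passes on the command line. The sketch below is hypothetical (it is not the real `athenaeum.py`); inside Glue you would normally use `awsglue.utils.getResolvedOptions`, so the simplified parser here is only a stand-in that works outside a Glue environment:

```python
# Hypothetical sketch of a job entry point under jobs/ -- not the real
# athenaeum.py. In an actual Glue job you would call
# awsglue.utils.getResolvedOptions; this stand-in just parses
# "--KEY value" pairs so the sketch runs anywhere.
import sys


def parse_job_args(argv):
    """Collect --KEY value pairs from a Glue-style argument list."""
    args = {}
    for i, token in enumerate(argv):
        if token.startswith("--") and i + 1 < len(argv):
            args[token[2:]] = argv[i + 1]
    return args


def main(argv):
    args = parse_job_args(argv)
    # Extract, transform, and load steps would go here.
    return args


if __name__ == "__main__":
    main(sys.argv[1:])
```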
Example:
```shell
make run AWS_PROFILE=gp-etl GLUE_JOB_NAME=dev-athenaeum-LOCAL GLUE_JOB_SCRIPT=athenaeum.py
```

## Interactive Development (Jupyter)
To start an interactive Jupyter Lab environment:
```shell
make dev
```

This will start a Jupyter Lab server accessible at http://localhost:8889.
- Token: `guestpulse`
- Workspace: The `jobs/` and `etl_lib/` directories are mounted, allowing you to edit code locally and see changes immediately.
## Running Tests
To run the tests, use the `make tests` command:

```shell
make tests
```

You can also run a specific test by setting the `TARGET_TEST` environment variable:
```shell
make tests TARGET_TEST=etl_lib/tests/test_orchestrator.py
```
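Tests under `etl_lib/tests/` are plain Python test modules runnable this way. As a hypothetical illustration (the file name and function below are invented, not part of the repo), a pytest-style test might look like:

```python
# etl_lib/tests/test_example.py -- hypothetical example; it could be run with
# `make tests TARGET_TEST=etl_lib/tests/test_example.py`.


def normalize_hotel_name(name):
    """Collapse whitespace and title-case a raw hotel name."""
    return " ".join(name.split()).title()


def test_normalize_hotel_name():
    # Leading/trailing and repeated whitespace is collapsed, casing fixed.
    assert normalize_hotel_name("  grand   PALACE hotel ") == "Grand Palace Hotel"
```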
## Other Commands

- `make help`: Show available targets and environment variables.
- `make shell`: Run a one-off interactive shell.
- `make logs`: Follow the logs of the running container.
- `make down`: Stop and remove the container.