
ETL Jobs

Extract, Transform, Load (ETL) jobs that process hotel data to generate insights.

Getting Started

It is recommended to use the provided Docker environment for a consistent development experience.

Prerequisites

  • Docker
  • AWS CLI configured with the necessary profiles.

Running ETL Jobs

To run an ETL job, use the make run command with the following environment variables:

  • AWS_PROFILE: The AWS CLI profile to use (default: gp-etl).
  • GLUE_JOB_NAME: The name of the Glue job (default: dev-athenaeum-LOCAL).
  • GLUE_JOB_SCRIPT: The entry-point Python file for the job, located in the jobs/ directory (default: athenaeum.py).

Example:

make run AWS_PROFILE=gp-etl GLUE_JOB_NAME=dev-athenaeum-LOCAL GLUE_JOB_SCRIPT=athenaeum.py
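Inside the job script, the Glue runtime passes parameters such as the job name as `--KEY value` command-line arguments. As a rough sketch of what an entry point like `jobs/athenaeum.py` does with them (a real Glue job would use `awsglue.utils.getResolvedOptions`; the parser below is a self-contained stand-in, and the argument values are illustrative):

```python
# Stand-in for awsglue.utils.getResolvedOptions: pull "--KEY value" pairs
# out of argv for the requested keys. For illustration only.
def resolve_options(argv, keys):
    args = {}
    for key in keys:
        flag = f"--{key}"
        if flag in argv:
            args[key] = argv[argv.index(flag) + 1]
    return args

# Glue invokes the script roughly like: athenaeum.py --JOB_NAME <name> ...
args = resolve_options(["athenaeum.py", "--JOB_NAME", "dev-athenaeum-LOCAL"],
                       ["JOB_NAME"])
print(args["JOB_NAME"])  # dev-athenaeum-LOCAL
```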

Interactive Development (Jupyter)

To start an interactive Jupyter Lab environment:

make dev

This starts a Jupyter Lab server accessible at http://localhost:8889.

  • Token: guestpulse
  • Workspace: The jobs/ and etl_lib/ directories are mounted, allowing you to edit code locally and see changes immediately.

Running Tests

To run the tests, use the make tests command:

make tests

You can also run a specific test by setting the TARGET_TEST environment variable:

make tests TARGET_TEST=etl_lib/tests/test_orchestrator.py
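Tests under `etl_lib/tests/` are plain pytest modules. A minimal sketch of what one might look like (the file name, helper, and its behavior are assumptions for illustration, not actual repo code):

```python
# e.g. etl_lib/tests/test_normalize.py (hypothetical)

def normalize_hotel_name(name: str) -> str:
    """Hypothetical helper: collapse whitespace and title-case a hotel name."""
    return " ".join(name.split()).title()

def test_normalize_hotel_name():
    # pytest discovers test_* functions and reports failed asserts.
    assert normalize_hotel_name("  grand   plaza HOTEL ") == "Grand Plaza Hotel"
```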

Other Commands

  • make help: Show available targets and environment variables.
  • make shell: Run a one-off interactive shell.
  • make logs: Follow the logs of the running container.
  • make down: Stop and remove the container.
