Launch the project code

Authors:

Cao Tri DO <caotri.do88@gmail.com>

Version:

2025-09

Objectives

This article is intended to guide you through the steps to launch the project code.

Load the data in the Databricks Unity Catalog

uv run scripts/run_upload_data.py --env all --env-file .env --config project_config.yml

Note

If you need to delete all the uploaded data:

uv run scripts/run_cleanup_data.py --env all --env-file .env --config project_config.yml
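Both data scripts share the same flags (`--env`, `--env-file`, `--config`). A minimal sketch of that CLI surface with `argparse`; the option names come from the commands above, while the defaults and the environment choices list are assumptions:

```python
import argparse

# Environments assumed from project_config.yml (dev / acc / prd), plus "all"
ENV_CHOICES = ["dev", "acc", "prd", "all"]

def build_parser() -> argparse.ArgumentParser:
    """CLI surface shared by run_upload_data.py / run_cleanup_data.py (sketch)."""
    parser = argparse.ArgumentParser(description="Upload or clean project data")
    parser.add_argument("--env", choices=ENV_CHOICES, default="all",
                        help="target environment, or 'all' for every one")
    parser.add_argument("--env-file", default=".env",
                        help="dotenv file holding the Databricks profile")
    parser.add_argument("--config", default="project_config.yml",
                        help="project configuration file")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args(["--env", "all"])
    print(args.env, args.config)
```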

Preprocess the data

  1. Go to the script scripts/process_data.py and run it

You will need at least these elements in your project_config.yml file:

prd:
    catalog_name: mlops_prd
    schema_name: caotrido
    volume_name: data
    raw_data_file: "Hotel Reservations.csv"
    train_table: hotel_reservations_train_set
    test_table: hotel_reservations_test_set
acc:
    catalog_name: mlops_acc
    schema_name: caotrido
    volume_name: data
    raw_data_file: "Hotel Reservations.csv"
    train_table: hotel_reservations_train_set
    test_table: hotel_reservations_test_set
dev:
    catalog_name: mlops_dev
    schema_name: caotrido
    volume_name: data
    raw_data_file: "Hotel Reservations.csv"
    train_table: hotel_reservations_train_set
    test_table: hotel_reservations_test_set
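The three environment blocks differ only by `catalog_name`, so per-environment selection reduces to a lookup. The project's real loader is `ProjectConfig.from_yaml`; this standalone sketch hardcodes the same structure as a plain dict (values copied from the YAML above) to stay self-contained, and the `volume_path` helper is an illustrative assumption:

```python
from dataclasses import dataclass

# Mirrors project_config.yml above; values copied from the document.
CONFIG = {
    env: {
        "catalog_name": f"mlops_{env}",
        "schema_name": "caotrido",
        "volume_name": "data",
        "raw_data_file": "Hotel Reservations.csv",
        "train_table": "hotel_reservations_train_set",
        "test_table": "hotel_reservations_test_set",
    }
    for env in ("dev", "acc", "prd")
}

@dataclass
class ProjectConfig:
    catalog_name: str
    schema_name: str
    volume_name: str
    raw_data_file: str
    train_table: str
    test_table: str

    @property
    def volume_path(self) -> str:
        # Unity Catalog volume path: /Volumes/<catalog>/<schema>/<volume>/<file>
        return (f"/Volumes/{self.catalog_name}/{self.schema_name}/"
                f"{self.volume_name}/{self.raw_data_file}")

def load_config(env: str) -> ProjectConfig:
    """Select the block for one environment (dev / acc / prd)."""
    return ProjectConfig(**CONFIG[env])
```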

Running an experiment

  1. Create the workspace folder /Shared/experiments in Databricks (manually or via script)

uv run scripts/run_create_mlflow_workspace.py --env-file ./.env --config-file ./project_config.yml --environment dev

This will:

  • Read the path from the config → ProjectConfig.from_yaml : config.experiment_name_basic = /Shared/experiments/hotel_reservations

  • Secure the parent directory → extract exp_dir = /Shared/experiments and create it if needed, so the script can run on a fresh workspace without crashing.

  • Activate the experiment → mlflow.set_experiment(experiment_path), aligned with what you defined in your YAML.
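The "secure the parent directory" step boils down to path arithmetic. A sketch of that logic; the actual workspace and MLflow calls (`WorkspaceClient().workspace.mkdirs`, `mlflow.set_experiment`) are left as comments since they need a live Databricks connection:

```python
import posixpath

def experiment_paths(experiment_name: str) -> tuple[str, str]:
    """Split the experiment path into (parent_dir, full_path).

    The parent directory must exist before mlflow.set_experiment is
    called on a fresh workspace.
    """
    exp_dir = posixpath.dirname(experiment_name)
    return exp_dir, experiment_name

exp_dir, exp_path = experiment_paths("/Shared/experiments/hotel_reservations")
# On a real run (sketch, assuming databricks-sdk and mlflow are configured):
#   WorkspaceClient().workspace.mkdirs(exp_dir)  # create /Shared/experiments if missing
#   mlflow.set_experiment(exp_path)              # activate the experiment from the YAML
```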

This will need at least:

  • In your YAML file:

experiment_name_basic: "/Shared/experiments/hotel_reservations" # /Users/caotri.do881@gmail.com/hotel-reservation-basic
experiment_name_custom: "/Shared/experiments/hotel_reservations_custom" # /Users/caotri.do881@gmail.com/hotel-reservation-custom

  1. Go into scripts/train_register_model.py and run it

This will need at least:

  • In your YAML file:

experiment_name_basic: "/Shared/experiments/hotel_reservations" # /Users/caotri.do881@gmail.com/hotel-reservation-basic
experiment_name_custom: "/Shared/experiments/hotel_reservations_custom" # /Users/caotri.do881@gmail.com/hotel-reservation-custom
model_name: hotel_reservation_LR
model_type: logistic-regression

parameters:
    C: 1.0
    max_iter: 1000
    solver: lbfgs

and in your .env file:

PROFILE="dev-free"

This corresponds to the Databricks profile defined in your ~/.databrickscfg file.
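~/.databrickscfg is a plain INI file, so the profile lookup can be sketched with `configparser`. The host URL below is a placeholder, and reading PROFILE from .env is assumed to be handled by the scripts' --env-file flag:

```python
import configparser

# Example ~/.databrickscfg content; the host URL is a placeholder.
DATABRICKSCFG = """\
[dev-free]
host = https://example.cloud.databricks.com
"""

def lookup_profile(cfg_text: str, profile: str) -> dict:
    """Return the ~/.databrickscfg section matching PROFILE from .env."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return dict(parser[profile])

print(lookup_profile(DATABRICKSCFG, "dev-free"))
```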

  1. Go into Databricks >> Workspace >> Shared >> experiments >> hotel_reservations to see your experiment

Note

You can delete experiments and models in the UI, but you can also use the script scripts/run_cleanup_mlflow_experiments.py to delete all the experiments and models created by the project.

uv run scripts/run_cleanup_mlflow_experiments.py --env-file ./.env --config-file ./project_config.yml --environment dev