Launch the project code
- Authors: Cao Tri DO <caotri.do88@gmail.com>
- Version: 2025-09
Objectives
This article guides you through the steps needed to launch the project code.
Load the data into the Databricks Unity Catalog
uv run scripts/run_upload_data.py --env all --env-file .env --config project_config.yml
Note
If you need to delete all the data, run:
uv run scripts/run_cleanup_data.py --env all --env-file .env --config project_config.yml
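Under the hood, the upload targets a Unity Catalog volume. As a minimal sketch (the `build_volume_path` helper below is hypothetical, not part of the project scripts), the destination follows the standard Databricks `/Volumes/<catalog>/<schema>/<volume>/<file>` path convention:

```python
# Sketch: build the Unity Catalog volume path the raw file is uploaded to.
# build_volume_path is a hypothetical helper, not part of the project scripts.

def build_volume_path(catalog: str, schema: str, volume: str, file_name: str) -> str:
    """Return the standard Databricks volume path /Volumes/<catalog>/<schema>/<volume>/<file>."""
    return f"/Volumes/{catalog}/{schema}/{volume}/{file_name}"

path = build_volume_path("mlops_dev", "caotrido", "data", "Hotel Reservations.csv")
print(path)  # /Volumes/mlops_dev/caotrido/data/Hotel Reservations.csv
```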
Preprocess the data
Open the script scripts/process_data.py and run it.
You will need at least these elements in your project_config.yml file:
prd:
  catalog_name: mlops_prd
  schema_name: caotrido
  volume_name: data
  raw_data_file: "Hotel Reservations.csv"
  train_table: hotel_reservations_train_set
  test_table: hotel_reservations_test_set
acc:
  catalog_name: mlops_acc
  schema_name: caotrido
  volume_name: data
  raw_data_file: "Hotel Reservations.csv"
  train_table: hotel_reservations_train_set
  test_table: hotel_reservations_test_set
dev:
  catalog_name: mlops_dev
  schema_name: caotrido
  volume_name: data
  raw_data_file: "Hotel Reservations.csv"
  train_table: hotel_reservations_train_set
  test_table: hotel_reservations_test_set
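Each per-environment block resolves to fully qualified Unity Catalog table names of the form `<catalog>.<schema>.<table>`. A minimal sketch (the config dict is inlined here for illustration; the project loads it from project_config.yml, and `full_table_name` is a hypothetical helper):

```python
# Sketch: resolve fully qualified table names from the per-environment config.
# The dict mirrors the YAML above; the project itself loads project_config.yml.
config = {
    "dev": {
        "catalog_name": "mlops_dev",
        "schema_name": "caotrido",
        "train_table": "hotel_reservations_train_set",
        "test_table": "hotel_reservations_test_set",
    },
}

def full_table_name(cfg: dict, env: str, table_key: str) -> str:
    """Return <catalog>.<schema>.<table> for the given environment."""
    c = cfg[env]
    return f"{c['catalog_name']}.{c['schema_name']}.{c[table_key]}"

print(full_table_name(config, "dev", "train_table"))
# mlops_dev.caotrido.hotel_reservations_train_set
```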
Running an experiment
Create the workspace folder /Shared/experiments in Databricks (manually or via the script):
uv run scripts/run_create_mlflow_workspace.py --env-file ./.env --config-file ./project_config.yml --environment dev
This will:
- Read the path from the config via ProjectConfig.from_yaml: config.experiment_name_basic = /Shared/experiments/hotel_reservations
- Secure the parent directory: the script extracts exp_dir = /Shared/experiments and creates it if needed, so you can run the script on a fresh workspace without crashing.
- Activate the experiment: mlflow.set_experiment(experiment_path) uses the path you defined in your YAML.
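The parent-directory step above can be sketched as follows; `posixpath.dirname` extracts the workspace folder from the experiment path (the mlflow call is shown commented out so the snippet stays self-contained):

```python
import posixpath

# Experiment path as defined in project_config.yml
experiment_path = "/Shared/experiments/hotel_reservations"

# Extract the parent workspace folder so it can be created if missing
exp_dir = posixpath.dirname(experiment_path)
print(exp_dir)  # /Shared/experiments

# In the actual script, the workspace folder is created if needed and then:
# import mlflow
# mlflow.set_experiment(experiment_path)
```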
This requires at least the following in your project_config.yml file:
experiment_name_basic: "/Shared/experiments/hotel_reservations" # /Users/caotri.do881@gmail.com/hotel-reservation-basic
experiment_name_custom: "/Shared/experiments/hotel_reservations_custom" # /Users/caotri.do881@gmail.com/hotel-reservation-custom
Open scripts/train_register_model.py and run it.
This requires at least the following in your .yaml file:
experiment_name_basic: "/Shared/experiments/hotel_reservations" # /Users/caotri.do881@gmail.com/hotel-reservation-basic
experiment_name_custom: "/Shared/experiments/hotel_reservations_custom" # /Users/caotri.do881@gmail.com/hotel-reservation-custom
model_name: hotel_reservation_LR
model_type: logistic-regression
parameters:
  C: 1.0
  max_iter: 1000
  solver: lbfgs
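A common pattern (an assumption about how the training script consumes this block, not confirmed by the source) is to unpack the `parameters` mapping directly into the estimator's keyword arguments:

```python
# Sketch: the `parameters` block maps onto estimator keyword arguments.
# The dict mirrors the YAML above.
params = {"C": 1.0, "max_iter": 1000, "solver": "lbfgs"}

# In the training script this could be unpacked into the model, e.g.:
# from sklearn.linear_model import LogisticRegression
# model = LogisticRegression(**params)
print(sorted(params))  # ['C', 'max_iter', 'solver']
```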
and in your .env file:
PROFILE="dev-free"
This corresponds to your Databricks profile in ~/.databrickscfg.
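The PROFILE value selects a named section of ~/.databrickscfg, which is a standard INI-style file. A minimal sketch (the sample content below is a placeholder; your real host and token live in your own file):

```python
import configparser

# Sample ~/.databrickscfg content; placeholder values only.
sample_cfg = """
[dev-free]
host = https://example.cloud.databricks.com
token = <your-token>
"""

parser = configparser.ConfigParser()
parser.read_string(sample_cfg)

profile = "dev-free"  # value of PROFILE in .env
print(parser[profile]["host"])
```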
Go into Databricks >> Workspace >> Shared >> experiments >> hotel_reservations to see your experiment
Note
You can delete all the experiments and models in the UI, or use the cleanup script to delete everything created by the project:
uv run scripts/run_cleanup_mlflow_experiments.py --env-file ./.env --config-file ./project_config.yml --environment dev