=======================
Launch the project code
=======================

:Authors: Cao Tri DO
:Version: 2025-09

.. admonition:: Objectives
   :class: important

   This article guides you through the steps to launch the project code.

Load the data into the Databricks Unity Catalog
-----------------------------------------------

.. code-block:: bash

   uv run scripts/run_upload_data.py --env all --env-file .env --config project_config.yml

.. note::

   If you need to delete all the data:

   .. code-block:: bash

      uv run scripts/run_cleanup_data.py --env all --env-file .env --config project_config.yml

Preprocess the data
-------------------

1. Go to the script ``scripts/process_data.py`` and run it.

You will need at least these elements in your ``project_config.yml`` file:

.. code-block:: yaml

   prd:
     catalog_name: mlops_prd
     schema_name: caotrido
     volume_name: data
     raw_data_file: "Hotel Reservations.csv"
     train_table: hotel_reservations_train_set
     test_table: hotel_reservations_test_set
   acc:
     catalog_name: mlops_acc
     schema_name: caotrido
     volume_name: data
     raw_data_file: "Hotel Reservations.csv"
     train_table: hotel_reservations_train_set
     test_table: hotel_reservations_test_set
   dev:
     catalog_name: mlops_dev
     schema_name: caotrido
     volume_name: data
     raw_data_file: "Hotel Reservations.csv"
     train_table: hotel_reservations_train_set
     test_table: hotel_reservations_test_set

Running an experiment
---------------------

1. Create the workspace folder ``/Shared/experiments`` in Databricks (manually or via the script):

   .. code-block:: bash

      uv run scripts/run_create_mlflow_workspace.py --env-file ./.env --config-file ./project_config.yml --environment dev

   This will:

   - Read the path from the config → ``ProjectConfig.from_yaml`` sets ``config.experiment_name_basic = /Shared/experiments/hotel_reservations``.
   - Secure the parent directory → the script extracts ``exp_dir = /Shared/experiments`` and creates it if needed, so it can run on a fresh workspace without crashing.
   - Activate the experiment → ``mlflow.set_experiment(experiment_path)`` uses the path you defined in your YAML.

   A minimal sketch of this logic is given at the end of this article.

   This step needs at least the following in your YAML file:

   .. code-block:: yaml

      experiment_name_basic: "/Shared/experiments/hotel_reservations"  # /Users/caotri.do881@gmail.com/hotel-reservation-basic
      experiment_name_custom: "/Shared/experiments/hotel_reservations_custom"  # /Users/caotri.do881@gmail.com/hotel-reservation-custom

2. Go into ``scripts/train_register_model.py`` and run it.

   This needs at least the following in your YAML file:

   .. code-block:: yaml

      experiment_name_basic: "/Shared/experiments/hotel_reservations"  # /Users/caotri.do881@gmail.com/hotel-reservation-basic
      experiment_name_custom: "/Shared/experiments/hotel_reservations_custom"  # /Users/caotri.do881@gmail.com/hotel-reservation-custom
      model_name: hotel_reservation_LR
      model_type: logistic-regression
      parameters:
        C: 1.0
        max_iter: 1000
        solver: lbfgs

   and in your ``.env`` file:

   .. code-block:: bash

      PROFILE="dev-free"

   This corresponds to your Databricks profile in ``~/.databrickscfg``. A sketch showing how the ``parameters`` block maps onto the model is also given at the end of this article.

3. Go into **Databricks** >> **Workspace** >> **Shared** >> **experiments** >> **hotel_reservations** to see your experiment.

.. note::

   You can delete all the experiments and models in the UI, but you can also use the script ``scripts/run_cleanup_mlflow_experiments.py`` to delete all the experiments and models created by the project:

   .. code-block:: bash

      uv run scripts/run_cleanup_mlflow_experiments.py --env-file ./.env --config-file ./project_config.yml --environment dev
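
For orientation, the workspace-creation logic described in step 1 of *Running an experiment* fits in a few lines. The sketch below is an illustration under assumptions, not the actual contents of ``scripts/run_create_mlflow_workspace.py``: the import path and the ``env`` argument of ``ProjectConfig.from_yaml`` are hypothetical, and the parent folder is created here with the Databricks SDK's ``WorkspaceClient.workspace.mkdirs``.

.. code-block:: python

   # Sketch of the workspace-creation step (see assumptions above).
   import posixpath

   import mlflow
   from databricks.sdk import WorkspaceClient

   # Hypothetical import path and signature for this project's config helper.
   from hotel_reservations.config import ProjectConfig

   config = ProjectConfig.from_yaml("project_config.yml", env="dev")
   experiment_path = config.experiment_name_basic  # "/Shared/experiments/hotel_reservations"

   # Secure the parent directory: create /Shared/experiments if it is missing,
   # so the script also works on a fresh workspace.
   exp_dir = posixpath.dirname(experiment_path)
   WorkspaceClient(profile="dev-free").workspace.mkdirs(exp_dir)

   # Point MLflow at Databricks and activate the experiment; MLflow creates
   # the experiment on first use if it does not exist yet.
   mlflow.set_tracking_uri("databricks")
   mlflow.set_experiment(experiment_path)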
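
The ``parameters`` block shown in step 2 maps one-to-one onto scikit-learn's ``LogisticRegression`` keyword arguments (``model_type: logistic-regression``). Below is a minimal sketch, assuming the training script simply unpacks that mapping; the real ``scripts/train_register_model.py`` likely does more, such as building a pipeline and logging to MLflow.

.. code-block:: python

   # Sketch: turning the `parameters` block of project_config.yml into a model.
   import yaml
   from sklearn.linear_model import LogisticRegression

   with open("project_config.yml") as f:
       cfg = yaml.safe_load(f)

   # parameters: {C: 1.0, max_iter: 1000, solver: lbfgs}
   params = cfg["parameters"]

   # Assumes the keys match LogisticRegression's keyword arguments exactly.
   model = LogisticRegression(**params)  # C=1.0, max_iter=1000, solver="lbfgs"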