=======================
Launch the project code
=======================

:Authors: Cao Tri DO
:Version: 2025-09

.. admonition:: Objectives
   :class: important

   This article guides you through the steps to launch the project code.

Load the data into the Databricks Unity Catalog
-----------------------------------------------

.. code-block:: bash

   uv run scripts/run_upload_data.py --env all --env-file .env --config project_config.yml

.. note::

   If you need to delete all the data:

   .. code-block:: bash

      uv run scripts/run_cleanup_data.py --env all --env-file .env --config project_config.yml

Preprocess the data
-------------------

1. Go to the script ``scripts/process_data.py`` and run it.

You will need at least these elements in your ``project_config.yml`` file:

.. code-block:: yaml

   prd:
     catalog_name: mlops_prd
     schema_name: caotrido
     volume_name: data
     raw_data_file: "Hotel Reservations.csv"
     train_table: hotel_reservations_train_set
     test_table: hotel_reservations_test_set
   acc:
     catalog_name: mlops_acc
     schema_name: caotrido
     volume_name: data
     raw_data_file: "Hotel Reservations.csv"
     train_table: hotel_reservations_train_set
     test_table: hotel_reservations_test_set
   dev:
     catalog_name: mlops_dev
     schema_name: caotrido
     volume_name: data
     raw_data_file: "Hotel Reservations.csv"
     train_table: hotel_reservations_train_set
     test_table: hotel_reservations_test_set

Running an experiment
---------------------

1. Create the workspace folder ``/Shared/experiments`` in Databricks (manually or via the script):

   .. code-block:: bash

      uv run scripts/run_create_mlflow_workspace.py --env-file ./.env --config-file ./project_config.yml --environment dev

   This will:

   - Read the path from the config → ``ProjectConfig.from_yaml`` sets ``config.experiment_name_basic = /Shared/experiments/hotel_reservations``.
   - Secure the parent directory → the script extracts ``exp_dir = /Shared/experiments`` and creates it if needed, so it can run on a fresh workspace without crashing.
   - Activate the experiment → ``mlflow.set_experiment(experiment_path)`` uses the path you defined in your YAML.

   A minimal sketch of this logic is given at the end of this article.

   This step needs at least the following in your YAML file:

   .. code-block:: yaml

      experiment_name_basic: "/Shared/experiments/hotel_reservations"  # /Users/caotri.do881@gmail.com/hotel-reservation-basic
      experiment_name_custom: "/Shared/experiments/hotel_reservations_custom"  # /Users/caotri.do881@gmail.com/hotel-reservation-custom

2. Go into ``scripts/train_register_model.py`` and run it.

   This needs at least the following in your YAML file:

   .. code-block:: yaml

      experiment_name_basic: "/Shared/experiments/hotel_reservations"  # /Users/caotri.do881@gmail.com/hotel-reservation-basic
      experiment_name_custom: "/Shared/experiments/hotel_reservations_custom"  # /Users/caotri.do881@gmail.com/hotel-reservation-custom
      model_name: hotel_reservation_LR
      model_type: logistic-regression
      parameters:
        C: 1.0
        max_iter: 1000
        solver: lbfgs

   and in your ``.env`` file:

   .. code-block:: bash

      PROFILE="dev-free"

   This corresponds to your Databricks profile in ``~/.databrickscfg``. A sketch showing how the ``parameters`` block maps onto the model is also given at the end of this article.

3. Go into **Databricks** >> **Workspace** >> **Shared** >> **experiments** >> **hotel_reservations** to see your experiment.

.. note::

   You can delete all the experiments and models in the UI, but you can also use the script ``scripts/run_cleanup_mlflow_experiments.py`` to delete all the experiments and models created by the project:

   .. code-block:: bash

      uv run scripts/run_cleanup_mlflow_experiments.py --env-file ./.env --config-file ./project_config.yml --environment dev
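
For orientation, the workspace-creation logic described in step 1 of *Running an experiment* fits in a few lines. The sketch below is an illustration under assumptions, not the actual contents of ``scripts/run_create_mlflow_workspace.py``: the import path and the ``env`` argument of ``ProjectConfig.from_yaml`` are hypothetical, and the parent folder is created here with the Databricks SDK's ``WorkspaceClient.workspace.mkdirs``.

.. code-block:: python

   # Sketch of the workspace-creation step (see assumptions above).
   import posixpath

   import mlflow
   from databricks.sdk import WorkspaceClient

   # Hypothetical import path and signature for this project's config helper.
   from hotel_reservations.config import ProjectConfig

   config = ProjectConfig.from_yaml("project_config.yml", env="dev")
   experiment_path = config.experiment_name_basic  # "/Shared/experiments/hotel_reservations"

   # Secure the parent directory: create /Shared/experiments if it is missing,
   # so the script also works on a fresh workspace.
   exp_dir = posixpath.dirname(experiment_path)
   WorkspaceClient(profile="dev-free").workspace.mkdirs(exp_dir)

   # Point MLflow at Databricks and activate the experiment; MLflow creates
   # the experiment on first use if it does not exist yet.
   mlflow.set_tracking_uri("databricks")
   mlflow.set_experiment(experiment_path)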
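
The ``parameters`` block shown in step 2 maps one-to-one onto scikit-learn's ``LogisticRegression`` keyword arguments (``model_type: logistic-regression``). Below is a minimal sketch, assuming the training script simply unpacks that mapping; the real ``scripts/train_register_model.py`` likely does more, such as building a pipeline and logging to MLflow.

.. code-block:: python

   # Sketch: turning the `parameters` block of project_config.yml into a model.
   import yaml
   from sklearn.linear_model import LogisticRegression

   with open("project_config.yml") as f:
       cfg = yaml.safe_load(f)

   # parameters: {C: 1.0, max_iter: 1000, solver: lbfgs}
   params = cfg["parameters"]

   # Assumes the keys match LogisticRegression's keyword arguments exactly.
   model = LogisticRegression(**params)  # C=1.0, max_iter=1000, solver="lbfgs"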