===================
Unity Catalog Setup
===================

:Authors:
    Cao Tri DO
:Version: 2025-09

.. admonition:: Objectives
    :class: important

    This article guides you through the setup of Unity Catalog for the project.

Manual Installation
===================

1. Download the resources at https://www.kaggle.com/datasets/ahsan81/hotel-reservations-classification-dataset?resource=download and unzip the archive.
2. Go to Databricks >> Catalog.
3. Create 3 new catalogs:

   - mlops_dev
   - mlops_acc
   - mlops_prd

4. In each catalog, create a schema named **caotrido**.
5. In each schema, create a volume named **data**.
6. Upload the **Hotel Reservations.csv** file into the **data** volume.

Automated Installation
======================

We provide a module and scripts to automate the setup of Unity Catalog:

- to load the data automatically into Unity Catalog in the right schema
- to delete the created schemas and volumes

Pre-requisites
--------------

Creating the catalogs cannot be automated, because it requires admin rights on Unity Catalog. Create them manually first:

1. Go to Databricks >> Catalog.
2. Create 3 new catalogs:

   - mlops_dev
   - mlops_acc
   - mlops_prd

You will need at least these elements in your ``project_config.yml`` file:

.. code-block:: yaml

    prd:
      catalog_name: mlops_prd
      schema_name: caotrido
      volume_name: data
      raw_data_file: "Hotel Reservations.csv"
    acc:
      catalog_name: mlops_acc
      schema_name: caotrido
      volume_name: data
      raw_data_file: "Hotel Reservations.csv"
    dev:
      catalog_name: mlops_dev
      schema_name: caotrido
      volume_name: data
      raw_data_file: "Hotel Reservations.csv"

    # Data Sources
    data_source:
      source_type: kaggle  # "kaggle" or "local"
      kaggle_dataset: ahsan81/hotel-reservations-classification-dataset
      local_path: ./data/raw  # if source_type = local

Load the data into the right schema
-----------------------------------

- On one environment (dev):

.. code-block:: bash

    uv run scripts/run_upload.py --env dev --env-file .env --config project_config.yml

- On all environments:

.. code-block:: bash

    uv run scripts/run_upload.py --env all --env-file .env --config project_config.yml

Clean up the resources
----------------------

- On one environment (dev):

.. code-block:: bash

    uv run scripts/run_cleanup.py --env dev --env-file .env --config project_config.yml

- On all environments:

.. code-block:: bash

    uv run scripts/run_cleanup.py --env all --env-file .env --config project_config.yml
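
As a cross-check for either installation path, the sketch below shows how the four keys of one environment block in ``project_config.yml`` could map onto Unity Catalog objects. It is only an illustration, not the code of the project's upload module: the ``EnvConfig`` class and its method names are hypothetical, and the sketch assumes the standard ``/Volumes/<catalog>/<schema>/<volume>/<file>`` path layout that Unity Catalog uses for files in volumes.

.. code-block:: python

    from dataclasses import dataclass


    @dataclass(frozen=True)
    class EnvConfig:
        """One environment block of project_config.yml (hypothetical helper)."""

        catalog_name: str
        schema_name: str
        volume_name: str
        raw_data_file: str

        def setup_statements(self) -> list[str]:
            """SQL that would create the schema and volume; the catalog must already exist."""
            schema = f"{self.catalog_name}.{self.schema_name}"
            return [
                f"CREATE SCHEMA IF NOT EXISTS {schema}",
                f"CREATE VOLUME IF NOT EXISTS {schema}.{self.volume_name}",
            ]

        def volume_file_path(self) -> str:
            """Path under which the uploaded raw data file is expected."""
            return (
                f"/Volumes/{self.catalog_name}/{self.schema_name}/"
                f"{self.volume_name}/{self.raw_data_file}"
            )


    # Values copied from the "dev" block of the configuration shown earlier.
    dev = EnvConfig("mlops_dev", "caotrido", "data", "Hotel Reservations.csv")
    for statement in dev.setup_statements():
        print(statement)
    print(dev.volume_file_path())

After an upload, the printed path is where the CSV file should be visible in the Catalog explorer for that environment; if it is not there, the schema or volume name in the configuration most likely differs from what was created.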