Unity Catalog Setup

Authors:

Cao Tri DO <caotri.do88@gmail.com>

Version:

2025-09

Objectives

This article guides you through the setup of Unity Catalog for the project.

Manual Installation

  1. Download the dataset from https://www.kaggle.com/datasets/ahsan81/hotel-reservations-classification-dataset?resource=download and unzip the archive.

  2. Go to Databricks >> Catalog.

  3. Create 3 new catalogs:
    • mlops_dev

    • mlops_acc

    • mlops_prd

  4. In each catalog, create a schema named caotrido.

  5. In each schema, create a volume named data.

  6. Upload the Hotel Reservations.csv file into the data volume of each schema.
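If you prefer SQL to the Catalog UI, steps 4 and 5 correspond to the Unity Catalog statements generated below, which can be pasted into the Databricks SQL editor or a notebook. This is a minimal sketch: the helper schema_and_volume_sql is hypothetical and not part of the project's module, and it assumes the three catalogs from step 3 already exist. After step 6, the file is reachable at /Volumes/<catalog>/caotrido/data/Hotel Reservations.csv.

```python
"""SQL equivalent of manual steps 4 and 5 (schema and volume creation).

Illustrative sketch: the helper name is hypothetical and not part of the
project's module; the three catalogs from step 3 must already exist.
"""

CATALOGS = ("mlops_dev", "mlops_acc", "mlops_prd")


def schema_and_volume_sql(catalog: str, schema: str = "caotrido", volume: str = "data") -> list[str]:
    """Return the CREATE statements for one catalog, schema before volume."""
    return [
        f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema}",
        # Without the EXTERNAL keyword, Unity Catalog creates a managed volume.
        f"CREATE VOLUME IF NOT EXISTS {catalog}.{schema}.{volume}",
    ]


if __name__ == "__main__":
    # Print a script that can be pasted into the Databricks SQL editor.
    for catalog in CATALOGS:
        for statement in schema_and_volume_sql(catalog):
            print(statement + ";")
```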

Automated Installation

We provide a module and scripts to automate the setup of Unity Catalog:

  • to automatically load the data into the right schema in Unity Catalog

  • to delete the created schemas and volumes

Prerequisites

Creating catalogs requires admin rights on Unity Catalog, so this step is not automated and must be done manually:

  1. Go to Databricks >> Catalog.

  2. Create 3 new catalogs:
    • mlops_dev

    • mlops_acc

    • mlops_prd
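Before running the scripts, you can check that the three catalogs exist. The sketch below assumes the databricks-sdk Python package is installed and that authentication is configured (for example through the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables); the helper missing_catalogs is hypothetical and not part of the project's module.

```python
"""Check that the three catalogs required by this guide exist.

Sketch assuming databricks-sdk is installed and authentication is configured.
The helper name is hypothetical and not part of the project's module.
"""

REQUIRED_CATALOGS = ("mlops_dev", "mlops_acc", "mlops_prd")


def missing_catalogs(existing, required=REQUIRED_CATALOGS) -> list[str]:
    """Return the required catalog names absent from `existing`, in order."""
    existing_set = set(existing)
    return [name for name in required if name not in existing_set]


if __name__ == "__main__":
    # Imported here so missing_catalogs can be used without the SDK installed.
    from databricks.sdk import WorkspaceClient

    client = WorkspaceClient()
    names = [catalog.name for catalog in client.catalogs.list()]
    for name in missing_catalogs(names):
        # Creating a catalog needs admin rights; the project scripts do not do it.
        print(f"Missing catalog, ask an admin to run: CREATE CATALOG IF NOT EXISTS {name};")
```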

You will need at least these elements in your project_config.yml file:

prd:
    catalog_name: mlops_prd
    schema_name: caotrido
    volume_name: data
    raw_data_file: "Hotel Reservations.csv"
acc:
    catalog_name: mlops_acc
    schema_name: caotrido
    volume_name: data
    raw_data_file: "Hotel Reservations.csv"
dev:
    catalog_name: mlops_dev
    schema_name: caotrido
    volume_name: data
    raw_data_file: "Hotel Reservations.csv"

# Data Sources
data_source:
    source_type: kaggle            # "kaggle" or "local"
    kaggle_dataset: ahsan81/hotel-reservations-classification-dataset
    local_path: ./data/raw         # if source_type = local
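To illustrate how these entries can be consumed, the sketch below resolves one environment's settings into the target path of the raw file inside its volume. It is a hypothetical helper, not the project's actual module; it assumes source_type, kaggle_dataset and local_path are nested under data_source, that the CSV sits directly under local_path when source_type is local, and that PyYAML is installed for load_config.

```python
"""Resolve one environment's settings from project_config.yml.

Hypothetical sketch assuming the layout above (data_source keys nested);
not the project's actual module. PyYAML is only needed by load_config.
"""

from pathlib import Path

ENVIRONMENTS = ("dev", "acc", "prd")
REQUIRED_KEYS = ("catalog_name", "schema_name", "volume_name", "raw_data_file")
SOURCE_TYPES = ("kaggle", "local")


def load_config(path: str) -> dict:
    """Read the YAML file into a plain dictionary."""
    import yaml  # PyYAML, imported lazily so the other helpers work without it

    with open(path, encoding="utf-8") as handle:
        return yaml.safe_load(handle)


def resolve_env(config: dict, env: str) -> dict:
    """Return the env block plus the target path of the raw file in its volume."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    settings = config[env]
    missing = [key for key in REQUIRED_KEYS if key not in settings]
    if missing:
        raise KeyError(f"{env}: missing keys {missing}")
    volume_path = "/Volumes/{catalog_name}/{schema_name}/{volume_name}".format(**settings)
    return {**settings, "target_path": f"{volume_path}/{settings['raw_data_file']}"}


def local_source_file(config: dict, env: str) -> Path:
    """Return the local CSV path; assumes the file sits directly in local_path."""
    source = config["data_source"]
    if source["source_type"] not in SOURCE_TYPES:
        raise ValueError(f"unsupported source_type: {source['source_type']!r}")
    if source["source_type"] != "local":
        raise ValueError("source_type is kaggle: the file is downloaded, not read locally")
    return Path(source["local_path"]) / config[env]["raw_data_file"]
```

For example, resolve_env(load_config("project_config.yml"), "dev")["target_path"] would give /Volumes/mlops_dev/caotrido/data/Hotel Reservations.csv with the values above.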

Load the data into the right schema

  • On one environment (dev):

uv run scripts/run_upload.py --env dev --env-file .env --config project_config.yml

  • On all environments:

uv run scripts/run_upload.py --env all --env-file .env --config project_config.yml
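The flags used above can be mirrored with argparse as follows. This hypothetical sketch only reproduces the options shown in this guide and one plausible way to expand --env all into every environment; it is not the actual scripts/run_upload.py, and the role given to --env-file (holding Databricks credentials) is an assumption.

```python
"""Mirror of the command-line options shown for scripts/run_upload.py.

Hypothetical sketch: only --env, --env-file and --config are reproduced,
and "all" is assumed to expand to dev, acc and prd.
"""

import argparse

ENVIRONMENTS = ("dev", "acc", "prd")


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Upload the raw data to Unity Catalog.")
    parser.add_argument("--env", required=True, choices=[*ENVIRONMENTS, "all"])
    # Assumption: the .env file holds the Databricks credentials.
    parser.add_argument("--env-file", default=".env", help="file assumed to hold Databricks credentials")
    parser.add_argument("--config", default="project_config.yml")
    return parser.parse_args(argv)


def target_envs(env: str) -> list[str]:
    """Expand 'all' into every environment; keep a single env as-is."""
    return list(ENVIRONMENTS) if env == "all" else [env]


if __name__ == "__main__":
    args = parse_args()
    for env in target_envs(args.env):
        print(f"[{env}] would upload using {args.config} and {args.env_file}")
```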

Clean up the resources

  • On one environment (dev):

uv run scripts/run_cleanup.py --env dev --env-file .env --config project_config.yml

  • On all environments:

uv run scripts/run_cleanup.py --env all --env-file .env --config project_config.yml
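In SQL terms, cleaning up one environment amounts to dropping its volume, then its schema. The sketch below only builds those statements; the helper cleanup_statements is hypothetical, and run_cleanup.py may work differently (for example through the Databricks SDK). Catalogs are left in place since recreating them requires admin rights.

```python
"""SQL equivalent of cleaning up the schema and volume of one environment.

Illustrative sketch: the helper name is hypothetical and the project's
run_cleanup.py may proceed differently. Catalogs are deliberately kept.
"""

import sys

SETTINGS = {
    "dev": ("mlops_dev", "caotrido", "data"),
    "acc": ("mlops_acc", "caotrido", "data"),
    "prd": ("mlops_prd", "caotrido", "data"),
}


def cleanup_statements(env: str) -> list[str]:
    """Drop the volume first, then the schema that contained it."""
    catalog, schema, volume = SETTINGS[env]
    return [
        f"DROP VOLUME IF EXISTS {catalog}.{schema}.{volume}",
        # No CASCADE: the default RESTRICT makes this fail if other objects remain.
        f"DROP SCHEMA IF EXISTS {catalog}.{schema}",
    ]


if __name__ == "__main__":
    env = sys.argv[1] if len(sys.argv) > 1 else "dev"
    envs = list(SETTINGS) if env == "all" else [env]
    for name in envs:
        for statement in cleanup_statements(name):
            print(statement + ";")
```

Because DROP SCHEMA defaults to RESTRICT, the second statement fails instead of deleting anything if the schema still contains objects other than the dropped volume.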