Unity Catalog Setup¶
- Authors:
Cao Tri DO <caotri.do88@gmail.com>
- Version:
2025-09
Objectives
This article guides you through setting up Unity Catalog for the project.
Manual Installation¶
Download the resources at https://www.kaggle.com/datasets/ahsan81/hotel-reservations-classification-dataset?resource=download and unzip the archive
Go into Databricks >> Catalog
- Create 3 new catalogs:
mlops_dev
mlops_acc
mlops_prd
In each catalog, create a schema named caotrido
In each schema, create a volume named data
Upload the Hotel Reservations.csv file into the data volume
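The schema and volume steps above can also be expressed as SQL rather than clicks in the UI. The following is a minimal sketch that only generates the statements. It assumes the three catalogs already exist, and the helper name `setup_statements` is hypothetical (it is not part of the project). `CREATE SCHEMA IF NOT EXISTS` and `CREATE VOLUME IF NOT EXISTS` are standard Databricks SQL, and you would still need to run the output in a SQL editor or notebook.

```python
# Names taken from the manual steps above.
CATALOGS = ["mlops_dev", "mlops_acc", "mlops_prd"]
SCHEMA = "caotrido"
VOLUME = "data"


def setup_statements(catalogs=CATALOGS, schema=SCHEMA, volume=VOLUME):
    """Return SQL statements creating the schema and volume in each catalog.

    Hypothetical helper: it only builds strings and does not connect to Databricks.
    """
    statements = []
    for catalog in catalogs:
        # The schema must exist before a volume can be created inside it.
        statements.append(f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema}")
        statements.append(f"CREATE VOLUME IF NOT EXISTS {catalog}.{schema}.{volume}")
    return statements


for statement in setup_statements():
    print(statement + ";")
```

Uploading Hotel Reservations.csv into the volume remains a separate step.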
Automated Installation¶
We provide a module and scripts to automate the setup of Unity Catalog:
to automatically load the data into Unity Catalog in the right schema
to delete the created schemas and volumes
Pre-requisites¶
Catalog creation is not automated because it requires admin rights on Unity Catalog, so create the catalogs manually:
Go into Databricks >> Catalog
- Create 3 new catalogs:
mlops_dev
mlops_acc
mlops_prd
You will need at least these elements in your project_config.yml file:
prd:
  catalog_name: mlops_prd
  schema_name: caotrido
  volume_name: data
  raw_data_file: "Hotel Reservations.csv"
acc:
  catalog_name: mlops_acc
  schema_name: caotrido
  volume_name: data
  raw_data_file: "Hotel Reservations.csv"
dev:
  catalog_name: mlops_dev
  schema_name: caotrido
  volume_name: data
  raw_data_file: "Hotel Reservations.csv"
# Data Sources
data_source:
  source_type: kaggle # "kaggle" or "local"
  kaggle_dataset: ahsan81/hotel-reservations-classification-dataset
  local_path: ./data/raw # if source_type = local
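To illustrate how these per-environment keys fit together, here is a minimal sketch that validates one environment section and derives the path of the raw file inside the volume. It assumes the YAML has already been parsed into a plain dictionary (for example with a YAML loader). `EnvConfig` and `parse_env` are hypothetical names, not the project's actual module. The `/Volumes/<catalog>/<schema>/<volume>/<file>` layout is the standard Unity Catalog volume path convention.

```python
from dataclasses import dataclass

REQUIRED_KEYS = ("catalog_name", "schema_name", "volume_name", "raw_data_file")


@dataclass(frozen=True)
class EnvConfig:
    """Hypothetical container for one environment section of project_config.yml."""

    catalog_name: str
    schema_name: str
    volume_name: str
    raw_data_file: str

    @property
    def volume_file_path(self) -> str:
        # Unity Catalog volumes are addressed as /Volumes/<catalog>/<schema>/<volume>/...
        return (
            f"/Volumes/{self.catalog_name}/{self.schema_name}/"
            f"{self.volume_name}/{self.raw_data_file}"
        )


def parse_env(config: dict, env: str) -> EnvConfig:
    """Validate the section for `env` and return it as an EnvConfig."""
    if env not in config:
        raise KeyError(f"environment '{env}' not found in config")
    section = config[env]
    missing = [key for key in REQUIRED_KEYS if key not in section]
    if missing:
        raise ValueError(f"environment '{env}' is missing keys: {missing}")
    return EnvConfig(**{key: section[key] for key in REQUIRED_KEYS})


# Example input mirroring the dev section shown above.
example_config = {
    "dev": {
        "catalog_name": "mlops_dev",
        "schema_name": "caotrido",
        "volume_name": "data",
        "raw_data_file": "Hotel Reservations.csv",
    }
}

print(parse_env(example_config, "dev").volume_file_path)
```

Failing early on a missing key keeps a misconfigured environment from being discovered only halfway through an upload.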
Load the data into the right schema¶
On one environment (dev):
uv run scripts/run_upload.py --env dev --env-file .env --config project_config.yml
On all environments:
uv run scripts/run_upload.py --env all --env-file .env --config project_config.yml
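The command-line interface used above can be pictured with a small sketch. This is not the actual content of scripts/run_upload.py. It is a hypothetical reconstruction that assumes `--env` accepts dev, acc, prd, or all, and that `all` simply expands to the three environments. The default values shown are also assumptions.

```python
import argparse

# Environment names from project_config.yml.
ENVIRONMENTS = ("dev", "acc", "prd")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags used in the commands above (assumed behaviour)."""
    parser = argparse.ArgumentParser(description="Upload raw data to Unity Catalog")
    parser.add_argument("--env", choices=(*ENVIRONMENTS, "all"), required=True)
    parser.add_argument("--env-file", default=".env")
    parser.add_argument("--config", default="project_config.yml")
    return parser


def resolve_envs(env: str) -> list:
    """Expand 'all' into every environment; otherwise return the single one requested."""
    return list(ENVIRONMENTS) if env == "all" else [env]


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["--env", "all", "--config", "project_config.yml"])
for env in resolve_envs(args.env):
    print(f"would upload data for environment: {env}")
```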
Clean up the resources¶
On one environment (dev):
uv run scripts/run_cleanup.py --env dev --env-file .env --config project_config.yml
On all environments:
uv run scripts/run_cleanup.py --env all --env-file .env --config project_config.yml
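For reference, the cleanup amounts to dropping the volume and then the schema in each targeted catalog. The sketch below only generates the corresponding SQL and is an assumption about what the cleanup does, not the code of scripts/run_cleanup.py. The helper name `cleanup_statements` is hypothetical. The catalogs themselves are left in place, since the scripts only manage schemas and volumes.

```python
def cleanup_statements(catalogs, schema="caotrido", volume="data"):
    """Return SQL statements that remove the volume and schema from each catalog.

    Hypothetical helper: it only builds strings and does not connect to Databricks.
    """
    statements = []
    for catalog in catalogs:
        # Drop the volume first: DROP SCHEMA without CASCADE fails on a non-empty schema.
        statements.append(f"DROP VOLUME IF EXISTS {catalog}.{schema}.{volume}")
        statements.append(f"DROP SCHEMA IF EXISTS {catalog}.{schema}")
    return statements


for statement in cleanup_statements(["mlops_dev"]):
    print(statement + ";")
```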