Databricks CLI, DB Connect and VS Code Extension Setup¶
- Authors:
Cao Tri DO <caotri.do88@gmail.com>
- Version:
2025-09
Objectives
This article gives you an overview of how to set up your local environment to connect to Databricks using the Databricks CLI, Databricks Connect (DB Connect) and the Databricks VS Code extension.
Introduction¶
Installing the tools¶
Databricks CLI¶
The Databricks CLI is a command-line interface that allows users to interact with Databricks workspaces programmatically.
Use cases
Automating repetitive tasks
Scripting workspace operations
Integrating Databricks operations into CI/CD pipelines
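For example, a CI/CD step can call the CLI from a small Python script. Here is a minimal sketch (assuming the CLI is installed and a profile named my-dbc-profile is already configured, as described below):
# Minimal sketch: scripting a Databricks CLI call, e.g. from a CI/CD job.
# Assumes the databricks CLI is on the PATH and that the profile
# "my-dbc-profile" (an example name, configured later in this article) exists.
import subprocess

result = subprocess.run(
    ["databricks", "clusters", "list", "--profile", "my-dbc-profile"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)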
Installation
Install the Databricks CLI using uv:
uv add databricks-cli
Check that everything works
uv run databricks --version
Initiate authentication to configure the Databricks CLI:
Method 1: With an access token
uv run databricks configure --token
Note
You will need to give:
Host → URL of your Databricks workspace (e.g. https://adb-1234567890.12.azuredatabricks.net)
Token → A personal access token from your Databricks user profile (User Settings > Access tokens).
Method 2: With OAuth authentication (using a web browser)
uv run databricks auth login --configure-cluster --host <YOUR HOST>
After entering your information, the CLI will prompt you to save it under a Databricks configuration profile. You can accept the suggested name or enter a new one. This profile can be overwritten if it already exists.
Manage Profiles: To list or view settings of existing profiles, use:
uv run databricks auth profiles
Your configuration is stored in ~/.databrickscfg and can be inspected with:
nano ~/.databrickscfg
Example of a .databrickscfg profile:
[DEFAULT]
host = https://dbc-c2e8445d-159d.cloud.databricks.com/
auth_type = databricks-cli
; This profile is autogenerated by the Databricks Extension for VS Code
[caotrido]
host = https://adb-2886740019606493.13.azuredatabricks.net/
token = xxxxxxxxxxxxxxxxxxx
; This profile is autogenerated by the Databricks Extension for VS Code
[dev]
host = https://dbc-c2e8445d-159d.cloud.databricks.com/
auth_type = databricks-cli
[my-dbc-profile]
host = https://dbc-c36d09ec-dbbe.cloud.databricks.com/
token = xxxxxxxxxxxxxxxxxxx
View Token Information: To view the current OAuth token and its expiration:
uv run databricks auth token --host <YOUR HOST>
If you need to reconfigure a profile in the future, just run:
uv run databricks configure --token --profile my-dbc-profile
To list all your available clusters:
uv run databricks clusters list
Or, if you need a specific profile:
uv run databricks clusters list --profile my-dbc-profile
Databricks Connect (DB Connect)¶
Databricks Connect (DB Connect) is a tool that lets you run Spark code from your local machine (or IDE like VS Code, PyCharm, IntelliJ…) while using a Databricks cluster as the execution engine.
In simple terms:
You write PySpark, Scala, Java, or R code locally.
Databricks Connect forwards your code → execution happens on the Databricks cluster, not on your laptop.
You get the power of Databricks + cluster resources, while coding in your favorite environment.
Use cases
Develop and test Spark code in your IDE instead of only in Databricks notebooks.
Reuse existing Spark code without rewriting it in notebooks.
Run distributed Spark jobs without needing a powerful local machine.
Debug more easily with local dev tools.
Seamlessly transition code from local development to production environments.
How it works
Install Databricks Connect (Python package databricks-connect or Maven/Scala dependency).
Configure it with your workspace details (URL + token + cluster ID).
Your local SparkSession connects to the cluster instead of running Spark locally.
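As a minimal sketch (assuming DB Connect is installed and configured as described below), the familiar SparkSession API then drives the remote cluster, and small samples can be pulled back locally for debugging; the table name used here is purely illustrative:
from pyspark.sql import SparkSession

# With Databricks Connect configured, this session is backed by the remote
# cluster rather than a local Spark installation.
spark = SparkSession.builder.getOrCreate()

# Hypothetical table name, for illustration only.
df = spark.table("samples.nyctaxi.trips").limit(10)

# Pull the small sample back to the local machine (requires pandas locally)
# so it can be inspected with a debugger or any other local tool.
print(df.toPandas())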
Installation and Usage
Prerequisites
Databricks workspace (Azure, AWS, or GCP).
Cluster running (with Databricks Runtime supported by DB Connect — usually the latest LTS runtime).
uv (your package manager).
Add DB Connect:
uv add databricks-connect
Configure DB Connect
uv run databricks configure --token --profile my-dbc-profile
Note
It will ask for:
Databricks Host → the URL of your workspace (example: https://dbc-f122dc18-1b68.cloud.databricks.com/)
Databricks Token → generate a personal access token in User Settings > Access tokens.
Now you will need to add the cluster ID to your Databricks configuration:
nano ~/.databrickscfg
and add this line to your desired profile:
cluster_id = faea85fdea5744e5
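The profile should then look something like this (reusing the example profile shown earlier; the cluster ID is a placeholder):
[my-dbc-profile]
host = https://dbc-c36d09ec-dbbe.cloud.databricks.com/
token = xxxxxxxxxxxxxxxxxxx
cluster_id = faea85fdea5744e5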
Verify the connection
DATABRICKS_CONFIG_PROFILE=my-dbc-profile uv run databricks-connect test
If everything is set up, you’ll see checks like:
✅ SparkSession created
✅ Cluster reachable
Warning
Note that in the new Databricks Connect (v13+), the profile must be selected via the DATABRICKS_CONFIG_PROFILE environment variable.
Use DB Connect in Python: create a script myscript.py:
from pyspark.sql import SparkSession
# Create Spark session (automatically points to Databricks cluster)
spark = SparkSession.builder.getOrCreate()
# Example: read a CSV file (with DB Connect, the path is resolved on the cluster)
df = spark.read.csv("./data/myfile.csv", header=True)
print("Row count:", df.count())
df.show(5)
Run the script with uv (since DB Connect is installed in the project environment):
uv run python myscript.py
Note
Even though this runs locally, the computation is actually performed on your Databricks cluster.
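If you prefer to select the profile from inside the script rather than on the command line, one possible sketch (assuming, as in the warning above, that DB Connect reads DATABRICKS_CONFIG_PROFILE when the session is created) is:
import os

# Select the Databricks CLI profile before the Spark session is created.
# "my-dbc-profile" is the example profile name used throughout this article.
os.environ["DATABRICKS_CONFIG_PROFILE"] = "my-dbc-profile"

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()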
Local vs Remote Execution
If you want to switch between local execution and remote execution on a Databricks cluster:
Adapt the code in myscript.py:
import os
from pyspark.sql import SparkSession

if os.getenv("USE_DB_CONNECT", "false").lower() == "true":
    # Databricks Connect → executes on your Databricks cluster
    spark = SparkSession.builder.getOrCreate()
    print("➡ Running on Databricks cluster")
else:
    # Local Spark
    spark = SparkSession.builder \
        .master("local[*]") \
        .appName("LocalSpark") \
        .getOrCreate()
    print("➡ Running locally")

df = spark.range(1000)
print("Row count:", df.count())
If you call SparkSession.builder.getOrCreate() with no master, and DB Connect is configured → it runs on the Databricks cluster.
If you call .master("local[*]") → it runs locally.
To run on Databricks
USE_DB_CONNECT=true DATABRICKS_CONFIG_PROFILE=my-dbc-profile uv run python myscript.py
To run locally (no Databricks profile needed):
uv run python myscript.py
Note:
Databricks CLI → command-line tool to manage Databricks resources (clusters, jobs, secrets, DBFS, etc.).
Databricks Connect → a bridge to run local Spark code on a Databricks cluster.
Databricks VS Code extension¶
The official Databricks VS Code extension lets you:
Connect your VS Code to a Databricks workspace (via URL + PAT token).
Browse and edit Databricks files (notebooks, repos, DBFS files).
Sync code between your local machine and Databricks (so you can edit locally, run remotely).
Run notebooks and Python files directly on your Databricks cluster, without needing DB Connect.
Manage clusters, jobs, repos right from VS Code.
It’s more of a workspace integration tool, whereas DB Connect is a remote Spark execution bridge.
Key difference from Databricks Connect¶
| Feature | Databricks Connect | Databricks VS Code Extension |
|---|---|---|
| Execution model | Redirects your local Spark code to a Databricks cluster | Runs scripts or notebooks inside the Databricks workspace |
| Setup | Needs runtime version match (DBR ↔ Connect) | Just configure workspace URL + token |
| Best for | Developers wanting the PySpark API locally in an IDE | Developers managing Databricks repos, jobs, notebooks |
| Limitations | Tightly coupled to Spark runtime versions | Doesn't expose a SparkSession locally |
How to install & use the VS Code extension¶
Open VS Code → go to Extensions (Ctrl+Shift+X).
Search for Databricks → install the official one.
In VS Code, press Ctrl+Shift+P → type Databricks: Configure Workspace.
Enter:
Workspace URL (e.g. https://adb-1234567890.11.azuredatabricks.net)
PAT Token (from User Settings → Access Tokens).
Once connected, you can:
Browse clusters, repos, jobs from the sidebar.
Right-click a .py or .dbc notebook → Run on Databricks.
Sync a local repo to a Databricks repo.
When to use which¶
Use Databricks Connect if you want to develop PySpark code locally and still leverage Databricks clusters.
Use the Databricks VS Code extension if you want to edit notebooks / manage jobs in VS Code but let execution happen fully inside Databricks.
In fact, some teams use both:
Databricks VS Code extension → for repo sync, notebook editing, job control.
Databricks Connect → for running Spark code locally against the cluster.