Databricks CLI, DB Connect and VS Code Extension Setup¶
- Authors:
Cao Tri DO <caotri.do88@gmail.com>
- Version:
2025-09
Objectives
This article gives you an overview of how to set up your local environment to connect to Databricks using the Databricks CLI, Databricks Connect (DB Connect) and the Databricks VS Code extension.
Introduction¶
Installing the tools¶
Databricks CLI¶
The Databricks CLI is a command-line interface that allows users to interact with Databricks workspaces programmatically.
Use cases
Automating repetitive tasks
Scripting workspace operations
Integrating Databricks operations into CI/CD pipelines
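For example, a CI/CD step can call the CLI from a small Python script. Here is a minimal sketch (assuming the CLI is installed and a profile named my-dbc-profile is already configured, as described below):
# Minimal sketch: scripting a Databricks CLI call, e.g. from a CI/CD job.
# Assumes the databricks CLI is on the PATH and that the profile
# "my-dbc-profile" (an example name, configured later in this article) exists.
import subprocess

result = subprocess.run(
    ["databricks", "clusters", "list", "--profile", "my-dbc-profile"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)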
Installation
Install the Databricks CLI using uv:
uv add databricks-cli
Check that everything works
uv run databricks --version
Initiate authentication to configure the Databricks CLI:
Method 1: With an access token
uv run databricks configure --token
Note
You will need to give:
Host → URL of your Databricks workspace (e.g. https://adb-1234567890.12.azuredatabricks.net)
Token → A personal access token from your Databricks user profile (User Settings > Access tokens).
Method 2: With OAuth authentication (using a web browser)
uv run databricks auth login --configure-cluster --host <YOUR HOST>
After entering your information, the CLI will prompt you to save it under a Databricks configuration profile. You can accept the suggested name or enter a new one. This profile can be overwritten if it already exists.
Manage Profiles: To list or view settings of existing profiles, use:
uv run databricks auth profiles
Your configuration is stored in ~/.databrickscfg and can be inspected with:
nano ~/.databrickscfg
Example of a .databrickscfg profile:
[DEFAULT]
host = https://dbc-c2e8445d-159d.cloud.databricks.com/
auth_type = databricks-cli
; This profile is autogenerated by the Databricks Extension for VS Code
[caotrido]
host = https://adb-2886740019606493.13.azuredatabricks.net/
token = xxxxxxxxxxxxxxxxxxx
; This profile is autogenerated by the Databricks Extension for VS Code
[dev]
host = https://dbc-c2e8445d-159d.cloud.databricks.com/
auth_type = databricks-cli
[my-dbc-profile]
host = https://dbc-c36d09ec-dbbe.cloud.databricks.com/
token = xxxxxxxxxxxxxxxxxxx
View Token Information: To view the current OAuth token and its expiration:
uv run databricks auth token --host <YOUR HOST>
If you need to reconfigure a profile in the future, just run:
uv run databricks configure --token --profile my-dbc-profile
To list all your available clusters:
uv run databricks clusters list
Or, if you need a specific profile:
uv run databricks clusters list --profile my-dbc-profile
Databricks Connect (DB Connect)¶
Databricks Connect (DB Connect) is a tool that lets you run Spark code from your local machine (or IDE like VS Code, PyCharm, IntelliJ…) while using a Databricks cluster as the execution engine.
In simple terms:
You write PySpark, Scala, Java, or R code locally.
Databricks Connect forwards your code → execution happens on the Databricks cluster, not on your laptop.
You get the power of Databricks + cluster resources, while coding in your favorite environment.
Use cases
Develop and test Spark code in your IDE instead of only in Databricks notebooks.
Reuse existing Spark code without rewriting it in notebooks.
Run distributed Spark jobs without needing a powerful local machine.
Debug more easily with local dev tools.
Seamlessly transition code from local development to production environments.
How it works
Install Databricks Connect (Python package databricks-connect or Maven/Scala dependency).
Configure it with your workspace details (URL + token + cluster ID).
Your local SparkSession connects to the cluster instead of running Spark locally.
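As a minimal sketch (assuming DB Connect is installed and configured as described below), the familiar SparkSession API then drives the remote cluster, and small samples can be pulled back locally for debugging; the table name used here is purely illustrative:
from pyspark.sql import SparkSession

# With Databricks Connect configured, this session is backed by the remote
# cluster rather than a local Spark installation.
spark = SparkSession.builder.getOrCreate()

# Hypothetical table name, for illustration only.
df = spark.table("samples.nyctaxi.trips").limit(10)

# Pull the small sample back to the local machine (requires pandas locally)
# so it can be inspected with a debugger or any other local tool.
print(df.toPandas())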
Installation and Usage
Prerequisites
Databricks workspace (Azure, AWS, or GCP).
Cluster running (with Databricks Runtime supported by DB Connect — usually the latest LTS runtime).
uv (your package manager).
Add DB Connect:
uv add databricks-connect
Configure DB Connect
uv run databricks configure --token --profile my-dbc-profile
Note
It will ask for:
Databricks Host → the URL of your workspace (example: https://dbc-f122dc18-1b68.cloud.databricks.com/)
Databricks Token → generate a personal access token in User Settings > Access tokens.
Now you will need to add the cluster ID to your Databricks configuration:
nano ~/.databrickscfg
and add this line to your desired profile:
cluster_id = faea85fdea5744e5
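The profile should then look something like this (reusing the example profile shown earlier; the cluster ID is a placeholder):
[my-dbc-profile]
host = https://dbc-c36d09ec-dbbe.cloud.databricks.com/
token = xxxxxxxxxxxxxxxxxxx
cluster_id = faea85fdea5744e5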
Verify the connection
DATABRICKS_CONFIG_PROFILE=my-dbc-profile uv run databricks-connect test
If everything is set up, you’ll see checks like:
✅ SparkSession created
✅ Cluster reachable
Warning
Note that in the new Databricks Connect (v13+), the profile must be selected via the DATABRICKS_CONFIG_PROFILE environment variable.
Use DB Connect in Python: create a script myscript.py:
from pyspark.sql import SparkSession
# Create Spark session (automatically points to Databricks cluster)
spark = SparkSession.builder.getOrCreate()
# Example: read a CSV file (with DB Connect, the path is resolved on the cluster)
df = spark.read.csv("./data/myfile.csv", header=True)
print("Row count:", df.count())
df.show(5)
Run the script with uv (since DB Connect is installed in the project environment):
uv run python myscript.py
Note
Even though this runs locally, the computation is actually performed on your Databricks cluster.
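If you prefer to select the profile from inside the script rather than on the command line, one possible sketch (assuming, as in the warning above, that DB Connect reads DATABRICKS_CONFIG_PROFILE when the session is created) is:
import os

# Select the Databricks CLI profile before the Spark session is created.
# "my-dbc-profile" is the example profile name used throughout this article.
os.environ["DATABRICKS_CONFIG_PROFILE"] = "my-dbc-profile"

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()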
Local vs Remote Execution
If you want to switch between local execution and remote execution on a Databricks cluster:
Adapt the code in myscript.py:
import os
from pyspark.sql import SparkSession

if os.getenv("USE_DB_CONNECT", "false").lower() == "true":
    # Databricks Connect → executes on your Databricks cluster
    spark = SparkSession.builder.getOrCreate()
    print("➡ Running on Databricks cluster")
else:
    # Local Spark
    spark = SparkSession.builder \
        .master("local[*]") \
        .appName("LocalSpark") \
        .getOrCreate()
    print("➡ Running locally")

df = spark.range(1000)
print("Row count:", df.count())
If you call SparkSession.builder.getOrCreate() with no master, and DB Connect is configured → it runs on the Databricks cluster.
If you call .master("local[*]") → it runs locally.
To run on Databricks
USE_DB_CONNECT=true DATABRICKS_CONFIG_PROFILE=my-dbc-profile uv run python myscript.py
To run locally (no Databricks profile needed):
uv run python myscript.py
Note:
Databricks CLI → command-line tool to manage Databricks resources (clusters, jobs, secrets, DBFS, etc.).
Databricks Connect → a bridge to run local Spark code on a Databricks cluster.
Databricks VS Code extension¶
The official Databricks VS Code extension lets you:
Connect your VS Code to a Databricks workspace (via URL + PAT token).
Browse and edit Databricks files (notebooks, repos, DBFS files).
Sync code between your local machine and Databricks (so you can edit locally, run remotely).
Run notebooks and Python files directly on your Databricks cluster, without needing DB Connect.
Manage clusters, jobs, repos right from VS Code.
It’s more of a workspace integration tool, whereas DB Connect is a remote Spark execution bridge.
Key difference from Databricks Connect¶
| Feature | Databricks Connect | Databricks VS Code Extension |
|---|---|---|
| Execution model | Redirects your local Spark code to a Databricks cluster | Runs scripts or notebooks inside the Databricks workspace |
| Setup | Needs runtime version match (DBR ↔ Connect) | Just configure workspace URL + token |
| Best for | Developers wanting the PySpark API locally in an IDE | Developers managing Databricks repos, jobs, notebooks |
| Limitations | Tightly coupled to Spark runtime versions | Doesn't expose a SparkSession locally |
How to install & use the VS Code extension¶
Open VS Code → go to Extensions (Ctrl+Shift+X).
Search for Databricks → install the official one.
In VS Code, press Ctrl+Shift+P → type Databricks: Configure Workspace.
Enter:
Workspace URL (e.g. https://adb-1234567890.11.azuredatabricks.net)
PAT Token (from User Settings → Access Tokens).
Once connected, you can:
Browse clusters, repos, jobs from the sidebar.
Right-click a .py or .dbc notebook → Run on Databricks.
Sync a local repo to a Databricks repo.
When to use which¶
Use Databricks Connect if you want to develop PySpark code locally and still leverage Databricks clusters.
Use the Databricks VS Code extension if you want to edit notebooks / manage jobs in VS Code but let execution happen fully inside Databricks.
In fact, some teams use both:
Databricks VS Code extension → for repo sync, notebook editing, job control.
Databricks Connect → for running Spark code locally against the cluster.