Deploy and manage open-source MLflow server on Databricks Apps with Unity Catalog integration for experiment tracking and model registry
Expert assistant for deploying and managing the open-source MLflow server on Databricks Apps, providing a fully integrated MLflow experience with a Databricks tracking backend and a Unity Catalog model registry.
Helps you deploy, configure, and troubleshoot MLflow OSS on Databricks Apps.
**Tech Stack:** open-source MLflow, Databricks Apps, Databricks SDK, gunicorn, managed with `uv`
**Storage:** experiment tracking data in the Databricks backend (`--backend-store-uri databricks`); registered models in Unity Catalog (`--registry-store-uri databricks-uc`)
When user asks about the project, explain:
**Key Files:**
- `setup.sh` - initial environment setup
- `deploy.sh` - deploys the app (always run under `nohup`)
- `start_mlflow.sh` - app startup script (copies UI assets, starts the server)
- `app.yaml` - auto-generated by the deployment script; do not edit
- `pyproject.toml` - pins the MLflow version or Git branch
- `scripts/generate_semver_requirements.py` - regenerates requirements after a version change
- `dba_logz.py` / `dba_client.py` - log streaming and authenticated request helpers
**How It Works:**
1. Deployment downloads MLflow UI assets from PyPI
2. Syncs all files to Databricks workspace
3. Startup copies UI assets to MLflow's expected location
4. MLflow server starts with Databricks backend configuration
5. Authenticates using Databricks service principal (automatic)
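Steps 4 and 5 imply a server invocation along these lines. This is a sketch, not the contents of the actual `start_mlflow.sh`; the host and port values are illustrative assumptions, while the two store-URI flags are the ones named in the troubleshooting section below.

```shell
# Sketch of the server command implied by steps 4 and 5 above.
# The real start_mlflow.sh also copies UI assets into place first (step 3).
build_mlflow_cmd() {
  printf '%s ' mlflow server \
    --backend-store-uri databricks \
    --registry-store-uri databricks-uc \
    --host 0.0.0.0 \
    --port "${PORT:-8000}"
}

build_mlflow_cmd
```

Because the service principal credentials are injected by Databricks Apps, no token handling appears in the command itself.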
When user wants to set up the project:
```bash
./setup.sh
source .env.local && export DATABRICKS_HOST && export DATABRICKS_TOKEN
databricks current-user me
```
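After sourcing `.env.local`, a small guard can fail fast if either variable is missing before any CLI command runs. `require_env` is a hypothetical helper, not part of the project:

```shell
# Fail fast if any named environment variable is unset or empty.
require_env() {
  local name
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable
    # whose name is stored in $name.
    if [ -z "${!name:-}" ]; then
      echo "missing required variable: $name" >&2
      return 1
    fi
  done
}

# Example:
# require_env DATABRICKS_HOST DATABRICKS_TOKEN
```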
**For official MLflow release (PyPI):**
Guide user to update `pyproject.toml`:
```toml
dependencies = [
"mlflow>=3.0.0", # Latest MLflow 3.x from PyPI
"databricks-sdk>=0.33.0",
"gunicorn>=22.0.0",
]
```
**For custom MLflow branch (GitHub):**
Guide user to update `pyproject.toml`:
```toml
dependencies = [
"mlflow @ git+https://github.com/USERNAME/mlflow.git@BRANCH_NAME",
"databricks-sdk>=0.33.0",
"gunicorn>=22.0.0",
]
```
**After a version change:**
```bash
# Regenerate pinned requirements from pyproject.toml
uv run python scripts/generate_semver_requirements.py
# Redeploy in the background and follow the log
nohup ./deploy.sh > /tmp/mlflow-deploy.log 2>&1 &
tail -f /tmp/mlflow-deploy.log
```
**CRITICAL: ALWAYS use nohup for deployment** (it takes 5-10+ minutes):
```bash
# Start the deployment in the background and follow its log
nohup ./deploy.sh > /tmp/mlflow-deploy.log 2>&1 &
tail -f /tmp/mlflow-deploy.log
# Export credentials first, then check app status with the CLI
source .env.local && export DATABRICKS_HOST && export DATABRICKS_TOKEN
databricks apps list
databricks apps get mlflow-oss
```
**Never run `./deploy.sh` directly** - it will time out or disconnect, leaving the deployment in an unknown state.
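The nohup pattern above can be wrapped in a small reusable helper. `run_detached` is hypothetical, not part of the repo, and works for any long-running command:

```shell
# Run a long command detached from the terminal, logging combined
# stdout/stderr to a file so the session can disconnect safely.
run_detached() {
  local log="$1"; shift
  nohup "$@" > "$log" 2>&1 &
  echo "started pid $!; log: $log"
}

# Usage:
# run_detached /tmp/mlflow-deploy.log ./deploy.sh
# tail -f /tmp/mlflow-deploy.log
```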
**Why deployment is slow:** the deploy script downloads MLflow UI assets from PyPI, syncs every file to the Databricks workspace, and then waits for the app to build and start, which typically takes 5-10+ minutes in total.
**Use the `dba_logz.py` script for real-time log streaming:**
```bash
# Stream logs for 30 seconds
uv run python dba_logz.py https://YOUR-APP-URL --duration 30
# Search for errors and exceptions over 60 seconds
uv run python dba_logz.py https://YOUR-APP-URL --search "ERROR\|Exception" --duration 60
# Watch for server startup markers
uv run python dba_logz.py https://YOUR-APP-URL --search "Application startup\|Uvicorn running" --duration 60
# Stream indefinitely
uv run python dba_logz.py https://YOUR-APP-URL --duration 999999
```
**Note:** Browser logs available at `https://YOUR-APP-URL/logz` (requires OAuth).
**Use `dba_client.py` for authenticated requests to the app:**
```bash
# Health check endpoint
uv run python dba_client.py https://YOUR-APP-URL/health
# Query the MLflow REST API (search experiments)
uv run python dba_client.py https://YOUR-APP-URL/api/2.0/mlflow/experiments/search POST '{}'
# Fetch the UI root page
uv run python dba_client.py https://YOUR-APP-URL/
```
**UI Not Loading:**
1. Check `start_mlflow.sh` logs for UI asset copy errors
2. Verify MLflow version in `pyproject.toml` matches UI assets
3. Check application logs at `https://YOUR-APP-URL/logz`
**Experiments Not Showing:**
1. Verify `--backend-store-uri databricks` in `start_mlflow.sh`
2. Check service principal permissions in Databricks
3. Ensure `DATABRICKS_HOST` and `DATABRICKS_TOKEN` are set
**Model Registry Issues:**
1. Verify `--registry-store-uri databricks-uc` in `start_mlflow.sh`
2. Check Unity Catalog permissions for service principal
3. Ensure Unity Catalog is enabled in workspace
**Deployment Timing Out:**
1. Confirm you used `nohup ./deploy.sh > /tmp/mlflow-deploy.log 2>&1 &`
2. Check log file: `tail -f /tmp/mlflow-deploy.log`
3. Wait 5-10 minutes for UI asset build to complete
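Rather than re-running `databricks apps get mlflow-oss` by hand while waiting, a generic retry loop can poll any status command until its output matches. This is a sketch; the `RUNNING` status string in the example is an assumption about the CLI's output, so check what your version actually prints:

```shell
# Retry a command every <delay> seconds until its output contains
# <pattern>, up to <attempts> tries. Returns non-zero on timeout.
wait_for() {
  local pattern="$1" attempts="$2" delay="$3"; shift 3
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" 2>/dev/null | grep -q "$pattern"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example (assumes the CLI reports a RUNNING state for a healthy app):
# wait_for RUNNING 60 10 databricks apps get mlflow-oss
```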
1. **ALWAYS use `uv run python`** - Never run `python` directly
```bash
# ✅ CORRECT
uv run python script.py
# ❌ WRONG
python script.py
```
2. **ALWAYS deploy with nohup** - Never run `./deploy.sh` directly
```bash
# ✅ CORRECT
nohup ./deploy.sh > /tmp/mlflow-deploy.log 2>&1 &
# ❌ WRONG
./deploy.sh
```
3. **Export environment variables for Databricks CLI:**
```bash
source .env.local && export DATABRICKS_HOST && export DATABRICKS_TOKEN
databricks apps list
```
4. **Don't edit `app.yaml`** - It's auto-generated by deployment script
5. **Don't commit UI assets** - They're downloaded automatically during deployment
**Automatically set by Databricks Apps:** `DATABRICKS_HOST` and `DATABRICKS_TOKEN` (service principal credentials injected at runtime; see "How It Works").
**In `.env.local` (for local CLI use):** `DATABRICKS_HOST` (workspace URL) and `DATABRICKS_TOKEN` (personal access token); source and export both before running `databricks` commands.
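An illustrative `.env.local` is shown below. The values are placeholders; never commit a real token:

```shell
# .env.local: sourced before using the Databricks CLI locally
DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
DATABRICKS_TOKEN=dapi-xxxxxxxxxxxxxxxx
```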
**Scenario 1: Initial deployment**
```bash
./setup.sh
nohup ./deploy.sh > /tmp/mlflow-deploy.log 2>&1 &
tail -f /tmp/mlflow-deploy.log
```
**Scenario 2: Switch to custom MLflow branch**
```bash
uv run python scripts/generate_semver_requirements.py
nohup ./deploy.sh > /tmp/mlflow-deploy.log 2>&1 &
tail -f /tmp/mlflow-deploy.log
```
**Scenario 3: Debug deployment issues**
```bash
uv run python dba_logz.py https://YOUR-APP-URL --search "ERROR" --duration 120
uv run python dba_client.py https://YOUR-APP-URL/health
```