A Kubernetes operator for Databricks
A kube-rs operator to enable GitOps style management of Databricks resources. It supports the following APIs:
| API | CRD |
| --- | --- |
| Jobs 2.1 | DatabricksJob |
| Git Credentials 2.0 | GitCredential |
| Repos 2.0 | Repo |
| Secrets 2.0 | DatabricksSecretScope, DatabricksSecret |
The operator is experimental and headed towards stable. See the GitHub project board for the roadmap. Contributions and feedback are welcome!
Looking for a more in-depth example? Read the tutorial.
Add the Helm repository and install the chart:
helm repo add mach
helm install databricks-kube-operator mach/databricks-kube-operator
Create a config map in the same namespace as the operator. To override the config map name, pass --set configMapName=my-custom-name to helm install:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: databricks-kube-operator
data:
  api_secret_name: databricks-api-secret
EOF
Create a secret with your API URL and credentials:
cat <<EOF | kubectl apply -f -
apiVersion: v1
data:
  access_token: $(echo -n 'shhhh' | base64)
  databricks_url: $(echo -n '' | base64)
kind: Secret
metadata:
  name: databricks-api-secret
type: Opaque
EOF
See the examples directory for samples of Databricks CRDs. Resources that are created via Kubernetes are owned by the operator: your checked-in manifests are the source of truth.
apiVersion: com.dstancu.databricks/v1
kind: DatabricksJob
metadata:
  name: my-word-count
  namespace: default
spec:
  job:
    settings:
      email_notifications:
        no_alert_for_skipped_runs: false
      format: MULTI_TASK
      job_clusters:
      - job_cluster_key: word-count-cluster
      max_concurrent_runs: 1
      name: my-word-count
      git_source:
        git_branch: misc-and-docs
        git_provider: gitHub
      tasks:
      - email_notifications: {}
        job_cluster_key: word-count-cluster
        notebook_task:
          notebook_path: examples/
          source: GIT
        task_key: my-word-count
        timeout_seconds: 0
      timeout_seconds: 0
Changes made by users in the Databricks webapp will be overwritten by the operator if drift is detected:
[2024-01-11T14:20:40Z INFO databricks_kube::traits::remote_api_resource] Resource DatabricksJob my-word-count drifted!
Diff (remote, kube):
json atoms at path ".settings.tasks[0].notebook_task.notebook_path" are not equal:
[2024-01-11T14:20:40Z INFO databricks_kube::traits::remote_api_resource] Resource DatabricksJob my-word-count reconciling drift...
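To make the drift check concrete, here is a minimal sketch in plain Rust. It is not the operator's implementation: it assumes both the remote resource and the Kubernetes manifest have already been flattened into maps from JSON path to value, and the names `drifted_paths` and `reconcile` are hypothetical.

```rust
use std::collections::BTreeMap;

/// Returns the paths whose values differ between the remote (Databricks)
/// view and the Kubernetes manifest. Both inputs are hypothetical
/// "flattened" documents: JSON path -> scalar value rendered as a string.
fn drifted_paths(
    remote: &BTreeMap<String, String>,
    kube: &BTreeMap<String, String>,
) -> Vec<String> {
    let mut paths = Vec::new();
    // Paths in the manifest drift if the remote value is missing or different.
    for (path, desired) in kube {
        if remote.get(path) != Some(desired) {
            paths.push(path.clone());
        }
    }
    // Paths that exist only remotely also count as drift.
    for path in remote.keys() {
        if !kube.contains_key(path) {
            paths.push(path.clone());
        }
    }
    paths.sort();
    paths
}

/// The manifest is the source of truth, so the reconciled state is
/// simply a copy of the desired state.
fn reconcile(kube: &BTreeMap<String, String>) -> BTreeMap<String, String> {
    kube.clone()
}
```

In this sketch an edit made in the webapp shows up as one entry in `drifted_paths`, and `reconcile` discards it in favour of the checked-in value.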
List the jobs that the operator's access token is allowed to view:
$ kubectl get databricksjobs
contoso-ingest-qa RUNNING
contoso-ingest-staging INTERNAL_ERROR
contoso-stats-qa TERMINATED
contoso-stats-staging NO_RUNS
$ kubectl describe databricksjob contoso-ingest-qa
A job's status key surfaces API information about the latest run. The status is polled every 60s:
$ kubectl get databricksjob contoso-ingest-staging -ojson | jq .status
{
  "latest_run_state": {
    "life_cycle_state": "INTERNAL_ERROR",
    "result_state": "FAILED",
    "state_message": "Task contoso-ingest-staging failed. This caused all downstream tasks to get skipped.",
    "user_cancelled_or_timedout": false
  }
}
Begin by creating the configmap as per the Helm instructions.
Generate and install the CRDs by running the crd_gen bin target:
cargo run --bin crd_gen | kubectl apply -f -
The quickest way to test the operator is with a working minikube cluster:
minikube start
minikube tunnel &
export RUST_LOG=databricks_kube
cargo run
[2022-11-02T18:56:25Z INFO databricks_kube] boot! (build: df7e26b-modified)
[2022-11-02T18:56:25Z INFO databricks_kube::context] Waiting for CRD:
[2022-11-02T18:56:25Z INFO databricks_kube::context] Waiting for CRD:
[2022-11-02T18:56:25Z INFO databricks_kube::context] Waiting for settings in config map: databricks-kube-operator
[2022-11-02T18:56:25Z INFO databricks_kube::context] Found config map
[2022-11-02T18:56:25Z INFO databricks_kube::traits::synced_api_resource] Looking for uningested GitCredential(s)
[2022-11-02T18:56:25Z INFO databricks_kube::traits::synced_api_resource] Looking for uningested DatabricksJob(s)
The client is generated by openapi-generator and then lightly postprocessed so that the models derive JsonSchema and some bugs are fixed.
TODO: Manual client 'fixes'
# Hey!! This uses GNU sed
# brew install gnu-sed
# Jobs API
openapi-generator generate -g rust -i openapi/jobs-2.1-aws.yaml -c openapi/config-jobs.yaml -o dbr_jobs
# Derive JsonSchema for all models and add schemars as dep
gsed -i -e 's/derive(Clone/derive(JsonSchema, Clone/' dbr_jobs/src/models/*
gsed -i -e 's/\/\*/use schemars::JsonSchema;\n\/\*/' dbr_jobs/src/models/*
gsed -r -i -e 's/(\[dependencies\])/\1\nschemars = "0.8.11"/' dbr_jobs/Cargo.toml
# Missing import?
gsed -r -i -e 's/(use reqwest;)/\1\nuse crate::models::ViewsToExport;/' dbr_jobs/src/apis/
# Git Credentials API
openapi-generator generate -g rust -i openapi/gitcredentials-2.0-aws.yaml -c openapi/config-git.yaml -o dbr_git_creds
# Derive JsonSchema for all models and add schemars as dep
gsed -i -e 's/derive(Clone/derive(JsonSchema, Clone/' dbr_git_creds/src/models/*
gsed -i -e 's/\/\*/use schemars::JsonSchema;\n\/\*/' dbr_git_creds/src/models/*
gsed -r -i -e 's/(\[dependencies\])/\1\nschemars = "0.8.11"/' dbr_git_creds/Cargo.toml
# Repos API
openapi-generator generate -g rust -i openapi/repos-2.0-aws.yaml -c openapi/config-repos.yaml -o dbr_repo
# Derive JsonSchema for all models and add schemars as dep
gsed -i -e 's/derive(Clone/derive(JsonSchema, Clone/' dbr_repo/src/models/*
gsed -i -e 's/\/\*/use schemars::JsonSchema;\n\/\*/' dbr_repo/src/models/*
gsed -r -i -e 's/(\[dependencies\])/\1\nschemars = "0.8.11"/' dbr_repo/Cargo.toml
# Secrets API
openapi-generator generate -g rust -i openapi/secrets-aws.yaml -c openapi/config-secrets.yaml -o dbr_secrets
# Derive JsonSchema for all models and add schemars as dep
gsed -i -e 's/derive(Clone/derive(JsonSchema, Clone/' dbr_secrets/src/models/*
gsed -i -e 's/\/\*/use schemars::JsonSchema;\n\/\*/' dbr_secrets/src/models/*
gsed -r -i -e 's/(\[dependencies\])/\1\nschemars = "0.8.11"/' dbr_secrets/Cargo.toml
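For readers less familiar with sed, the two model rewrites can be expressed as a small Rust function. This is only an illustrative sketch of what the substitutions do to one file's text; the function name `add_json_schema` is hypothetical and the project itself uses the script above.

```rust
/// Applies the two model rewrites from the script to one source file's text:
/// 1. add `JsonSchema` to derives that start with `Clone`
/// 2. put `use schemars::JsonSchema;` in front of the opening `/*` comment
fn add_json_schema(source: &str) -> String {
    let mut out = Vec::new();
    for line in source.lines() {
        // sed's `s///` without the `g` flag replaces only the first
        // match on each line, which `replacen(.., 1)` mirrors.
        let line = line.replacen("derive(Clone", "derive(JsonSchema, Clone", 1);
        let line = line.replacen("/*", "use schemars::JsonSchema;\n/*", 1);
        out.push(line);
    }
    let mut result = out.join("\n");
    // `lines()` drops the final newline, so restore it if it was there.
    if source.ends_with('\n') {
        result.push('\n');
    }
    result
}
```

As with the sed version, running the rewrite twice would insert the `use` line a second time, so it should be applied only to freshly generated output.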
Deriving CustomResource uses macros to generate another struct. For this example, the output struct name would be DatabricksJob:
#[derive(Clone, CustomResource, Debug, Default, Deserialize, PartialEq, Serialize, JsonSchema)]
#[kube(
    group = "com.dstancu.databricks",
    version = "v1",
    kind = "DatabricksJob",
    derive = "Default",
)]
pub struct DatabricksJobSpec {
    pub job: Job,
}
Your editor may show squiggles when you use crds::databricks_job::DatabricksJob, because the struct only exists after macro expansion. To see what is generated, use cargo-expand:
rustup default nightly
cargo expand --bin databricks_kube
Want to add support for a new API? Provided it has an OpenAPI definition, these are the steps. Look for existing examples in the codebase:
- Download the API definition and make a Rust generator configuration (feel free to copy one of the others and change the name)
- Generate the SDK and add it to the Cargo workspace and dependencies
- Implement
for your new client
- Define the new CRD Spec type (follow the kube-rs tutorial)
- impl RemoteAPIResource<TAPIResource> for MyNewCRD
- impl StatusAPIResource<TStatusType> for MyNewCRD and specify TStatusType in your CRD
- Add the new resource to the context ensure CRDs condition
- Add the new resource to
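The shape of the "impl a trait for your CRD" steps can be sketched as follows. Everything here is hypothetical: `RemoteResourceSketch` is a simplified stand-in and not the operator's `RemoteAPIResource` trait, whose methods are not shown in this document, and `WidgetApiModel` stands for whatever model type the generated SDK provides.

```rust
/// Hypothetical, simplified stand-in for a trait that links a CRD to the
/// API model it manages. The operator's real trait is not reproduced here.
trait RemoteResourceSketch<TApiResource> {
    /// Build the payload that would be sent to the remote API.
    fn to_remote(&self) -> TApiResource;

    /// True when the remote copy already matches this resource.
    fn in_sync_with(&self, remote: &TApiResource) -> bool;
}

/// Hypothetical API model, as an SDK generator might emit it.
#[derive(Debug, Clone, PartialEq)]
struct WidgetApiModel {
    name: String,
}

/// Hypothetical CRD type for the new resource.
struct MyNewCRD {
    widget_name: String,
}

impl RemoteResourceSketch<WidgetApiModel> for MyNewCRD {
    fn to_remote(&self) -> WidgetApiModel {
        WidgetApiModel {
            name: self.widget_name.clone(),
        }
    }

    fn in_sync_with(&self, remote: &WidgetApiModel) -> bool {
        &self.to_remote() == remote
    }
}
```

The generic parameter is what lets one trait serve several CRDs, each paired with a different generated model type. The existing implementations in the codebase remain the authoritative reference for the real method set.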
Tests must be run with a single thread since we use a stateful singleton to 'mock' the state of a remote API. Eventually it would be nice to have integration tests targeting Databricks.
$ cargo test -- --test-threads=1
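The reason for the single-thread requirement is easier to see with a toy version of such a singleton. The sketch below is an assumption about the general pattern, not the project's actual mock; the function names and the id scheme are hypothetical.

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

// Process-wide fake of a remote API: job id -> job name.
// Every test in the same binary shares this one map.
static MOCK_JOBS: Mutex<BTreeMap<i64, String>> = Mutex::new(BTreeMap::new());

/// Clears the shared state; a test would call this first.
fn mock_reset() {
    MOCK_JOBS.lock().unwrap().clear();
}

/// "Creates" a job remotely and returns its id.
fn mock_create_job(name: &str) -> i64 {
    let mut jobs = MOCK_JOBS.lock().unwrap();
    let id = jobs.len() as i64 + 1;
    jobs.insert(id, name.to_string());
    id
}

/// Lists the names of all jobs currently held by the mock, ordered by id.
fn mock_list_jobs() -> Vec<String> {
    MOCK_JOBS.lock().unwrap().values().cloned().collect()
}
```

The mutex keeps each call memory-safe, but it does not isolate tests from each other: if two tests ran in parallel, one could call `mock_reset` between another test's create and list calls. Running with `--test-threads=1` avoids that interleaving.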