
# Databricks

### Overview

Birdie connects to your Databricks workspace and runs SQL queries through a **Databricks SQL Warehouse**.

Typical queries executed by Birdie look like:

```sql
SELECT *
FROM <catalog>.<schema>.<table>
WHERE <partition_column> BETWEEN :start AND :end;
```

Your team controls:

* Which tables or views Birdie can read
* Which datasets are exposed
* How data is partitioned for incremental ingestion
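
As a rough illustration of the incremental query pattern above, the following Python sketch builds a request body for the Databricks SQL Statement Execution API, binding the window through the `:start` and `:end` named parameters. The helper name `build_incremental_statement`, the sample identifiers (`main.feedback.nps`, `event_date`), and the assumption that the partition column is a `DATE` are all hypothetical; how Birdie actually binds these values is not specified here.

```python
import json
import re
from datetime import date

# Hypothetical helper: not part of Birdie or of any Databricks SDK.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_incremental_statement(catalog, schema, table, partition_column,
                                start, end, warehouse_id):
    """Build a Statement Execution API request body for one ingestion window."""
    # Identifiers cannot be bound as parameters, so validate them strictly
    # before interpolating them into the SQL text.
    for name in (catalog, schema, table, partition_column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")

    statement = (
        f"SELECT * FROM {catalog}.{schema}.{table} "
        f"WHERE {partition_column} BETWEEN :start AND :end"
    )
    return {
        "warehouse_id": warehouse_id,
        "statement": statement,
        # Named parameters map onto the :start and :end markers.
        # Assumption: the partition column is a DATE.
        "parameters": [
            {"name": "start", "value": start.isoformat(), "type": "DATE"},
            {"name": "end", "value": end.isoformat(), "type": "DATE"},
        ],
    }


body = build_incremental_statement(
    "main", "feedback", "nps", "event_date",
    date(2024, 1, 1), date(2024, 1, 31), "abc123",
)
print(json.dumps(body, indent=2))
```

Because each window is bounded by the partition column, a well-partitioned table lets the warehouse prune data outside the requested range.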

***

### Integration models

Birdie supports two Databricks authentication mechanisms: OAuth with a service principal, and Personal Access Tokens.

#### OAuth (Service Principal) — recommended

Birdie authenticates using a **Databricks service principal** and short-lived [OAuth tokens](https://docs.databricks.com/aws/en/dev-tools/auth) issued by the workspace OIDC endpoint.

This model is recommended when:

* You run production or automated ingestion pipelines
* You want OAuth-based, non-interactive access
* You want to avoid long-lived credentials

#### Personal Access Token (PAT)

Birdie authenticates using a technical user and a [Personal Access Token](https://docs.databricks.com/aws/en/dev-tools/auth/pat).

This model is typically used when:

* OAuth is not enabled in the workspace
* You are running a proof-of-concept or non-production setup

***

### Schema requirements

All Birdie database connectors follow the same schema model. Each dataset type must be exposed as **one table or one view**.

Examples of dataset types:

* `nps`
* `csat`
* `review`
* `survey`
* `support_ticket`
* `conversation_message`
* Operational or reference tables (accounts, users, metadata)

Birdie uses a separate detailed schema definition, similar to the [S3-based ingestion](https://ask.birdie.ai/integrations-and-data-ingestion/how-to-integrate-with.../s3-azure-gcs) model.

***

### Requirements

Before starting, make sure you have:

* A Databricks workspace with **Databricks SQL enabled**
* A SQL Warehouse available for querying
* Admin or equivalent privileges to:
  * Create users or service principals
  * Grant SQL permissions
  * Grant Warehouse access
* The tables or views Birdie will ingest already created

***

### Setup

#### 1. Create the identity used by Birdie

Birdie can authenticate as either a **service principal** (recommended) or a **technical user**.

**Option A — Service Principal (recommended)**

1. Go to **Settings > Workspace admin > Identity and access > Service principals**
2. Click **Add service principal**
3. Name it: `birdie`
4. Enable:
   * Workspace Access
   * Databricks SQL Access
5. Generate a **client secret**
6. Save:
   * Client ID
   * Client Secret

Birdie will use the following OAuth token endpoint:

```
https://<workspace-host>/oidc/v1/token
```
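
As a minimal Python sketch of a client-credentials request against this endpoint, the snippet below builds the HTTP request without sending it. The function name `build_token_request` and the sample host and credentials are hypothetical; the form fields mirror the ones used elsewhere on this page (`grant_type`, `client_id`, `client_secret`, `scope=all-apis`).

```python
import urllib.parse
import urllib.request


def build_token_request(workspace_host, client_id, client_secret):
    """Build (but do not send) the client-credentials token request."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "all-apis",
    }).encode("ascii")
    return urllib.request.Request(
        url=f"https://{workspace_host}/oidc/v1/token",
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder values for illustration only.
request = build_token_request(
    "example.cloud.databricks.com", "my-client-id", "my-client-secret"
)
print(request.get_method(), request.full_url)

# To actually fetch a token, send the request and read "access_token":
#   import json
#   with urllib.request.urlopen(request, timeout=30) as response:
#       token = json.load(response)["access_token"]
```

Keeping request construction separate from sending makes it easy to inspect exactly what would be posted before any secret leaves your machine.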

**Option B — Technical User + PAT**

1. Create a user named: `birdie`
2. Enable:
   * Workspace Access
   * Databricks SQL Access
3. Generate a **Personal Access Token (PAT)**
4. Store the PAT securely

#### 2. Grant read-only SQL permissions

Birdie requires **SELECT-only** access to the tables or views it will ingest.

**Unity Catalog environments**

```sql
GRANT USE CATALOG ON CATALOG <catalog> TO `birdie`;
GRANT USE SCHEMA ON SCHEMA <catalog>.<schema> TO `birdie`;
GRANT SELECT ON TABLE <catalog>.<schema>.<table> TO `birdie`;
```

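Repeat the `SELECT` grant for each table or view Birdie should ingest. If Birdie authenticates as a service principal, use the principal's application ID (the Client ID) as the grantee in place of `birdie`.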
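
When many tables or views are involved, the repeated `SELECT` grants can be generated rather than typed by hand. The sketch below is one way to do that in Python; `select_grants` and the sample catalog, schema, and table names are hypothetical, and the output is meant to be reviewed before it is run in a SQL editor.

```python
def select_grants(catalog, schema, tables, principal="birdie"):
    """Return one GRANT SELECT statement per table or view."""
    return [
        f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO `{principal}`;"
        for table in tables
    ]


# Illustrative names only; substitute your own catalog, schema, and tables.
grants = select_grants("main", "feedback", ["nps", "csat", "support_ticket"])
for statement in grants:
    print(statement)
```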

***

**Hive Metastore (legacy workspaces)**

```sql
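-- Legacy table access control also requires USAGE on the schema.
GRANT USAGE ON SCHEMA <schema> TO `birdie`;
GRANT SELECT ON TABLE <schema>.<table> TO `birdie`;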
```

Repeat for each required table or view.

#### 3. Grant SQL Warehouse access

Birdie must be able to execute queries in a Databricks SQL Warehouse.

Grant **CAN USE** permission on the warehouse to:

* The service principal, or
* The technical user

This step is mandatory for SQL execution.

***

### Connection details to share with Birdie

Provide the following information securely to the Birdie team:

* Databricks workspace URL / host
* Authentication method (OAuth or PAT)
* Identity used (service principal or user)
* OAuth Client ID and Client Secret **or** PAT
* SQL Warehouse ID
* Catalog, schema, and table or view names
* Partition column used for incremental ingestion
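
Before sharing these details, it can help to collect them in one place and check that exactly one authentication method is filled in. The Python sketch below is a hypothetical container for that purpose; the class name `DatabricksConnection` and its field names are assumptions for illustration, not a format Birdie requires.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatabricksConnection:
    """Hypothetical container for the connection details listed above."""
    host: str
    warehouse_id: str
    partition_column: str
    tables: List[str] = field(default_factory=list)  # fully qualified names
    client_id: Optional[str] = None
    client_secret: Optional[str] = None
    pat: Optional[str] = None

    def auth_method(self):
        """Return "oauth" or "pat"; reject ambiguous or missing credentials."""
        has_oauth = bool(self.client_id and self.client_secret)
        has_pat = bool(self.pat)
        if has_oauth == has_pat:
            raise ValueError(
                "provide either OAuth client credentials or a PAT, but not both"
            )
        return "oauth" if has_oauth else "pat"


# Placeholder values for illustration only.
connection = DatabricksConnection(
    host="example.cloud.databricks.com",
    warehouse_id="abc123",
    partition_column="event_date",
    tables=["main.feedback.nps"],
    client_id="my-client-id",
    client_secret="my-client-secret",
)
print(connection.auth_method())
```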

Birdie validates connectivity using queries such as:

```sql
SELECT current_user(), current_catalog(), current_schema();
```

***

### Validating the integration

{% stepper %}
{% step %}

#### Generate an OAuth token (OAuth example)

```bash
curl --request POST "https://<workspace-host>/oidc/v1/token" \
  --header "Content-Type: application/x-www-form-urlencoded" \
  --data "grant_type=client_credentials" \
  --data "client_id=<client_id>" \
  --data "client_secret=<client_secret>" \
  --data "scope=all-apis"
```

You should receive an `access_token`.
{% endstep %}

{% step %}

#### Validate Databricks REST API access

```bash
curl -H "Authorization: Bearer <access_token>" \
  "https://<workspace-host>/api/2.0/workspace/get-status?path=/"
```

Expected response (the workspace may return additional fields, such as `object_id`):

```json
{"object_type":"DIRECTORY","path":"/"}
```

{% endstep %}

{% step %}

#### Validate SQL execution

```bash
curl --request POST \
  "https://<workspace-host>/api/2.0/sql/statements" \
  --header "Authorization: Bearer <access_token>" \
  --header "Content-Type: application/json" \
  --data '{
    "statement": "SELECT 1",
    "warehouse_id": "<warehouse_id>"
  }'
```

If the statement succeeds and the result contains `1`, Birdie can execute SQL queries on the warehouse.
{% endstep %}
{% endstepper %}
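
When scripting the SQL validation step, the JSON response needs to be interpreted rather than eyeballed. The Python sketch below shows one way to do that, assuming the response follows the Statement Execution API shape (`status.state`, `result.data_array`, `status.error.message`). The function name `statement_succeeded` and the sample response are illustrative, not captured output.

```python
def statement_succeeded(response_body):
    """Return (ok, detail) for a Statement Execution API response."""
    status = response_body.get("status", {})
    state = status.get("state")
    if state == "SUCCEEDED":
        # Inline results arrive as rows of string values.
        return True, response_body.get("result", {}).get("data_array", [])
    if state in ("PENDING", "RUNNING"):
        # The warehouse may still be starting; poll
        # GET /api/2.0/sql/statements/<statement_id> until it finishes.
        statement_id = response_body.get("statement_id")
        return False, f"still {state}; poll statement {statement_id}"
    message = status.get("error", {}).get("message")
    return False, message or f"unexpected state: {state}"


# Illustrative response for "SELECT 1" (shape assumed, values made up).
sample = {
    "statement_id": "stmt-0001",
    "status": {"state": "SUCCEEDED"},
    "result": {"data_array": [["1"]]},
}
ok, detail = statement_succeeded(sample)
print(ok, detail)
```

A `PENDING` or `RUNNING` state does not indicate a permissions problem; it usually means the warehouse is still starting and the statement should be polled.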

***

### References

* Databricks REST & SQL API\
  <https://docs.databricks.com/api/index.html>
* Authentication\
  <https://docs.databricks.com/en/dev-tools/auth/index.html>\
  <https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html>
* Unity Catalog privileges\
  <https://docs.databricks.com/en/data-governance/unity-catalog/privileges/index.html>


