# Databricks

### Overview

Birdie connects to your Databricks workspace and runs SQL queries through a [SQL Warehouse](https://docs.databricks.com/en/compute/sql-warehouse/index.html) — a SQL-optimized compute resource that runs queries without requiring you to manage a dedicated cluster.

Birdie can import data from your Databricks workspace for analytics, enrichment and reporting purposes, including:

* Customers and accounts
* Operational and reference tables
* Feedback-related datasets exposed by the client

Because each Databricks deployment is unique, Birdie's team will work with your administrators to finalize details. This article explains how your Databricks admin can configure access and provide the credentials needed to enable the integration.

Two authentication methods are supported:

* **OAuth (Service Principal)** — recommended for production
* **Personal Access Token (PAT)** — simpler alternative

***

### Schema requirements

All Birdie database connectors (Snowflake, Databricks, BigQuery, etc.) require one table or view per feedback type, following the Birdie schema definition.

Examples of feedback types:

* nps
* csat
* review
* social\_media\_post
* support\_ticket

Each feedback type must exist as a table or a view.
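
As an illustration, a feedback type can be exposed as a view over an existing table. All catalog, schema, table and column names below are hypothetical; the exact columns come from the Birdie schema reference:

```sql
-- Hypothetical example: expose NPS responses under the expected feedback-type name
CREATE OR REPLACE VIEW analytics.birdie.nps AS
SELECT
  response_id,
  account_id,
  score,
  comment,
  posted_at,
  loaded_at
FROM raw.surveys.nps_responses;
```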

Birdie provides the detailed schema reference separately (same model as the [S3 schema documentation](https://ask.birdie.ai/integrations-and-data-ingestion/how-to-integrate-with.../s3-azure-gcs)).

***

#### Incremental Partition Column

Birdie uses a partition column to determine which rows to ingest on each run. Typical queries look like:

```sql
SELECT *
FROM <catalog>.<schema>.<table>
WHERE <partition_column> BETWEEN :start AND :end;
```

#### Requirements

The partition column **must reflect when a row was added or last modified in the table**, not when the original event occurred. For example:

* `loaded_at` or `updated_at` (ETL/pipeline timestamp) — **recommended**
* `inserted_at` (row creation timestamp) — acceptable if the table is append-only

#### Why `posted_at` is not sufficient

Using a business event timestamp like `posted_at` can cause **data gaps**:

1. **Historical updates missed**: If a past row is updated (e.g., a ticket status changes), its `posted_at` remains in the past and Birdie will not re-ingest it.
2. **Pipeline timing gaps**: If your ETL runs at specific times (e.g., every 6 hours), rows loaded between Birdie's query window and the ETL run may not appear until the next cycle. An `updated_at` or `loaded_at` column tied to the ETL timestamp ensures these rows are captured.

#### Recommendation

Add a `loaded_at` or `updated_at` column to your table that is set to `current_timestamp()` whenever a row is inserted or modified. Use this column as the `partition_column` when configuring the Birdie integration.

If your table is append-only and never updated, `posted_at` is acceptable as the partition column.
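
As a sketch of the recommended setup (assuming a Delta table and a MERGE-based ETL; all table and column names are illustrative):

```sql
-- Add a load timestamp to an existing table (illustrative names)
ALTER TABLE main.default.birdie_export_table
  ADD COLUMNS (loaded_at TIMESTAMP);

-- Stamp loaded_at on both inserts and updates in the ETL merge
MERGE INTO main.default.birdie_export_table AS t
USING staging_updates AS s
  ON t.id = s.id
WHEN MATCHED THEN UPDATE SET
  status = s.status,
  loaded_at = current_timestamp()
WHEN NOT MATCHED THEN INSERT (id, status, posted_at, loaded_at)
  VALUES (s.id, s.status, s.posted_at, current_timestamp());
```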

***

### OAuth (Service Principal) Method

#### Overview

Birdie authenticates through Databricks' [OAuth M2M (machine-to-machine)](https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html) flow using client credentials. This method is recommended for production and automated environments.

#### Requirements

Before starting:

* Your workspace uses Unity Catalog (recommended) or the legacy Hive Metastore
* You have workspace admin access
* A SQL Warehouse is available

#### Setup in Databricks

{% stepper %}
{% step %}

#### Create a Service Principal

1. Go to **Settings > Identity and access > Service principals**
2. Add a service principal named `birdie`
3. Enable **Workspace Access** and **Databricks SQL Access**
   {% endstep %}

{% step %}

#### Generate OAuth Credentials

1. Generate a client secret for the service principal
2. Save the **Client ID** and **Client Secret**
   {% endstep %}

{% step %}

#### Grant SQL Warehouse Access

Grant **CAN USE** on the SQL Warehouse to `birdie`
{% endstep %}

{% step %}

#### Grant Read-Only Table Permissions

Follow the principle of least privilege: grant access only to the specific catalogs, schemas and tables required by Birdie.

For Unity Catalog:

```sql
GRANT USE CATALOG ON CATALOG <catalog> TO `birdie`;
GRANT USE SCHEMA ON SCHEMA <catalog>.<schema> TO `birdie`;
GRANT SELECT ON TABLE <catalog>.<schema>.<table> TO `birdie`;
```

For Hive Metastore (legacy):

```sql
GRANT SELECT ON TABLE <schema>.<table> TO `birdie`;
```
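
To double-check the result, you can inspect what the principal was granted (Unity Catalog syntax; a quick verification sketch):

```sql
-- List the privileges the birdie principal holds on the table
SHOW GRANTS `birdie` ON TABLE <catalog>.<schema>.<table>;
```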

{% endstep %}
{% endstepper %}

***

### Personal Access Token (PAT) Method

#### Overview

A simpler alternative when OAuth is not configured. Uses a long-lived token tied to a user account. Token lifecycle and rotation are the responsibility of the user who created it.

#### Setup in Databricks

{% stepper %}
{% step %}

#### Create or Use an Existing User

The user needs **Workspace Access** and **Databricks SQL Access**
{% endstep %}

{% step %}

#### Generate a PAT

1. Go to **User Settings > Developer > Access Tokens**
2. Generate a new token
3. Save the token (shown only once)
   {% endstep %}

{% step %}

#### Grant Permissions

Apply the same SQL `GRANT` statements as in the OAuth method above (Step 4), granting to the user account instead of the `birdie` service principal.
{% endstep %}
{% endstepper %}

***

### Share Connection Details with Birdie

To configure the integration, securely provide Birdie with the following information.

**For OAuth (Service Principal):**

```json
{
  "connection_details": {
    "host": "https://dbc-xxxx.cloud.databricks.com",
    "warehouse_id": "a1b2c3d4e5f6g7h8",
    "auth": {
      "method": "oauth",
      "client_id": "your-service-principal-client-id",
      "client_secret": "your-service-principal-client-secret"
    }
  },
  "data_source": {
    "catalog": "main",
    "schema": "default",
    "table": "birdie_export_table",
    "partition_column": "loaded_at"
  },
  "metadata": {
    "data_kind": "Support Ticket, NPS, Account"
  }
}
```

**For PAT:**

```json
{
  "connection_details": {
    "host": "https://dbc-xxxx.cloud.databricks.com",
    "warehouse_id": "a1b2c3d4e5f6g7h8",
    "auth": {
      "method": "pat",
      "personal_access_token": "your-personal-access-token"
    }
  },
  "data_source": {
    "catalog": "main",
    "schema": "default",
    "table": "birdie_export_table",
    "partition_column": "loaded_at"
  },
  "metadata": {
    "data_kind": "Support Ticket, NPS, Account"
  }
}
```

<table><thead><tr><th width="215">Section</th><th>Field</th><th>Required</th><th>Description</th></tr></thead><tbody><tr><td>connection_details</td><td>host</td><td>Yes</td><td>Workspace URL, e.g. <code>https://dbc-xxxx.cloud.databricks.com</code></td></tr><tr><td>connection_details</td><td>warehouse_id</td><td>Yes</td><td>Databricks SQL Warehouse ID.</td></tr><tr><td>connection_details.auth</td><td>method</td><td>Yes</td><td>Authentication mode: <code>oauth</code> (default) or <code>pat</code>.</td></tr><tr><td>connection_details.auth</td><td>client_id</td><td>OAuth</td><td>Service principal client ID.</td></tr><tr><td>connection_details.auth</td><td>client_secret</td><td>OAuth</td><td>Service principal client secret.</td></tr><tr><td>connection_details.auth</td><td>personal_access_token</td><td>PAT</td><td>Personal access token.</td></tr><tr><td>data_source</td><td>catalog</td><td>Unity Catalog</td><td>Unity Catalog name (omit for Hive Metastore).</td></tr><tr><td>data_source</td><td>schema</td><td>Yes</td><td>Schema or database name.</td></tr><tr><td>data_source</td><td>table</td><td>Yes</td><td>Table or view name.</td></tr><tr><td>data_source</td><td>partition_column</td><td>Yes</td><td>Column for incremental ingestion. Defaults to <code>created_at</code>.</td></tr><tr><td>metadata</td><td>data_kind</td><td>Yes</td><td>The type(s) of data in the table, e.g. <code>Support Ticket</code>, <code>NPS</code>, <code>CSAT</code>, <code>Review</code>, <code>Account</code>.</td></tr></tbody></table>

[Share credentials securely](https://ask.birdie.ai/integrations-and-data-ingestion/securely-sharing-credentials-for-data-integration) with the Birdie team. Never send credentials via email or unencrypted channels.

***

### Validating the integration

{% stepper %}
{% step %}

#### Generate an OAuth token (OAuth example)

```bash
curl --request POST "https://<workspace-host>/oidc/v1/token" \
  --header "Content-Type: application/x-www-form-urlencoded" \
  --data "grant_type=client_credentials" \
  --data "client_id=<client_id>" \
  --data "client_secret=<client_secret>" \
  --data "scope=all-apis"
```

You should receive an `access_token`.
{% endstep %}

{% step %}

#### Validate Databricks REST API access

```bash
curl -H "Authorization: Bearer <access_token>" \
  "https://<workspace-host>/api/2.0/workspace/get-status?path=/"
```

Expected response:

```json
{"object_type":"DIRECTORY","path":"/"}
```

{% endstep %}

{% step %}

#### Validate SQL execution

```bash
curl --request POST \
  "https://<workspace-host>/api/2.0/sql/statements" \
  --header "Authorization: Bearer <access_token>" \
  --header "Content-Type: application/json" \
  --data '{
    "statement": "SELECT 1",
    "warehouse_id": "<warehouse_id>"
  }'
```

If the response contains a successful result with the value `1`, Birdie can execute SQL queries through the warehouse.
{% endstep %}
{% endstepper %}

***

### References

* [Databricks REST & SQL API](https://docs.databricks.com/api/index.html)
* [Authentication](https://docs.databricks.com/en/dev-tools/auth/index.html)
* [Unity Catalog Privileges](https://docs.databricks.com/en/data-governance/unity-catalog/privileges/index.html)
* [Service Principals](https://docs.databricks.com/en/admin/users-groups/service-principals.html)
* [Personal Access Tokens](https://docs.databricks.com/en/dev-tools/auth/pat.html)
* [Securely Sharing Credentials](https://ask.birdie.ai/integrations-and-data-ingestion/securely-sharing-credentials-for-data-integration)
