# Databricks

### Overview

Birdie connects to your Databricks workspace and runs SQL queries through a [SQL Warehouse](https://docs.databricks.com/en/compute/sql-warehouse/index.html) — a SQL-optimized compute resource that runs queries without requiring you to manage a dedicated cluster.

Birdie can import data from your Databricks workspace for analytics, enrichment and reporting purposes, including:

* Customers and accounts
* Operational and reference tables
* Feedback-related datasets exposed by the client

Because each Databricks deployment is unique, Birdie's team will work with your administrators to finalize details. This article explains how your Databricks admin can configure access and provide the credentials needed to enable the integration.

Two authentication methods are supported:

* **OAuth (Service Principal)** — recommended for production
* **Personal Access Token (PAT)** — simpler alternative

***

### Schema requirements

All Birdie database connectors (Snowflake, Databricks, BigQuery, etc.) require one table or view per feedback type, following the Birdie schema definition.

Examples of feedback types:

* nps
* csat
* review
* social\_media\_post
* support\_ticket

Each feedback type must exist as a table or a view.
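
As an illustration, a feedback type can be exposed as a view over an existing table. All catalog, schema, table and column names below are hypothetical; the exact columns come from the Birdie schema reference:

```sql
-- Hypothetical example: expose NPS responses under the expected feedback-type name
CREATE OR REPLACE VIEW analytics.birdie.nps AS
SELECT
  response_id,
  account_id,
  score,
  comment,
  posted_at,
  loaded_at
FROM raw.surveys.nps_responses;
```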

Birdie provides the detailed schema reference separately (same model as the [S3 schema documentation](https://ask.birdie.ai/integrations-and-data-ingestion/how-to-integrate-with.../s3-azure-gcs)).

***

#### Incremental Partition Column

Birdie uses a partition column to determine which rows to ingest on each run. Typical queries look like:

```sql
SELECT *
FROM <catalog>.<schema>.<table>
WHERE <partition_column> BETWEEN :start AND :end;
```

#### Requirements

The partition column **must reflect when a row was added or last modified in the table**, not when the original event occurred. For example:

* `loaded_at` or `updated_at` (ETL/pipeline timestamp) — **recommended**
* `inserted_at` (row creation timestamp) — acceptable if the table is append-only

#### Why `posted_at` is not sufficient

Using a business event timestamp like `posted_at` can cause **data gaps**:

1. **Historical updates missed**: If a past row is updated (e.g., a ticket status changes), its `posted_at` remains in the past and Birdie will not re-ingest it.
2. **Pipeline timing gaps**: If your ETL runs at specific times (e.g., every 6 hours), rows loaded between Birdie's query window and the ETL run may not appear until the next cycle. An `updated_at` or `loaded_at` column tied to the ETL timestamp ensures these rows are captured.

#### Recommendation

Add a `loaded_at` or `updated_at` column to your table that is set to `current_timestamp()` whenever a row is inserted or modified. Use this column as the `partition_column` when configuring the Birdie integration.

If your table is append-only and never updated, `posted_at` is acceptable as the partition column.
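
As a sketch of the recommended setup (assuming a Delta table and a MERGE-based ETL; all table and column names are illustrative):

```sql
-- Add a load timestamp to an existing table (illustrative names)
ALTER TABLE main.default.birdie_export_table
  ADD COLUMNS (loaded_at TIMESTAMP);

-- Stamp loaded_at on both inserts and updates in the ETL merge
MERGE INTO main.default.birdie_export_table AS t
USING staging_updates AS s
  ON t.id = s.id
WHEN MATCHED THEN UPDATE SET
  status = s.status,
  loaded_at = current_timestamp()
WHEN NOT MATCHED THEN INSERT (id, status, posted_at, loaded_at)
  VALUES (s.id, s.status, s.posted_at, current_timestamp());
```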

***

### OAuth (Service Principal) Method

#### Overview

Birdie authenticates through Databricks' [OAuth M2M (machine-to-machine)](https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html) flow using client credentials. This method is recommended for production and automated environments.

#### Requirements

Before starting:

* Your workspace uses Unity Catalog (recommended) or the legacy Hive Metastore
* You have workspace admin access
* A SQL Warehouse is available

#### Setup in Databricks

{% stepper %}
{% step %}

#### Create a Service Principal

1. Go to **Settings > Identity and access > Service principals**
2. Add a service principal named `birdie`
3. Enable **Workspace Access** and **Databricks SQL Access**
   {% endstep %}

{% step %}

#### Generate OAuth Credentials

1. Generate a client secret for the service principal
2. Save the **Client ID** and **Client Secret**
   {% endstep %}

{% step %}

#### Grant SQL Warehouse Access

Grant **CAN USE** on the SQL Warehouse to `birdie`
{% endstep %}

{% step %}

#### Grant Read-Only Table Permissions

Follow the principle of least privilege: grant access only to the specific catalogs, schemas and tables required by Birdie.

For Unity Catalog:

```sql
GRANT USE CATALOG ON CATALOG <catalog> TO `birdie`;
GRANT USE SCHEMA ON SCHEMA <catalog>.<schema> TO `birdie`;
GRANT SELECT ON TABLE <catalog>.<schema>.<table> TO `birdie`;
```

For Hive Metastore (legacy):

```sql
GRANT SELECT ON TABLE <schema>.<table> TO `birdie`;
```
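
To double-check the result, you can inspect what the principal was granted (Unity Catalog syntax; a quick verification sketch):

```sql
-- List the privileges the birdie principal holds on the table
SHOW GRANTS `birdie` ON TABLE <catalog>.<schema>.<table>;
```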

{% endstep %}
{% endstepper %}

***

### Personal Access Token (PAT) Method

#### Overview

A simpler alternative when OAuth is not configured. Uses a long-lived token tied to a user account. Token lifecycle and rotation are the responsibility of the user who created it.

#### Setup in Databricks

{% stepper %}
{% step %}

#### Create or Use an Existing User

The user needs **Workspace Access** and **Databricks SQL Access**
{% endstep %}

{% step %}

#### Generate a PAT

1. Go to **User Settings > Developer > Access Tokens**
2. Generate a new token
3. Save the token (shown only once)
   {% endstep %}

{% step %}

#### Grant Permissions

Apply the same SQL `GRANT` statements as in the OAuth method above (Step 4), granting to the user account instead of the `birdie` service principal.
{% endstep %}
{% endstepper %}

***

### Share Connection Details with Birdie

To configure the integration, securely provide Birdie with the following information.

**For OAuth (Service Principal):**

```json
{
  "connection_details": {
    "host": "https://dbc-xxxx.cloud.databricks.com",
    "warehouse_id": "a1b2c3d4e5f6g7h8",
    "auth": {
      "method": "oauth",
      "client_id": "your-service-principal-client-id",
      "client_secret": "your-service-principal-client-secret"
    }
  },
  "data_source": {
    "catalog": "main",
    "schema": "default",
    "table": "birdie_export_table",
    "partition_column": "loaded_at"
  },
  "metadata": {
    "data_kind": "Support Ticket, NPS, Account"
  }
}
```

**For PAT:**

```json
{
  "connection_details": {
    "host": "https://dbc-xxxx.cloud.databricks.com",
    "warehouse_id": "a1b2c3d4e5f6g7h8",
    "auth": {
      "method": "pat",
      "personal_access_token": "your-personal-access-token"
    }
  },
  "data_source": {
    "catalog": "main",
    "schema": "default",
    "table": "birdie_export_table",
    "partition_column": "loaded_at"
  },
  "metadata": {
    "data_kind": "Support Ticket, NPS, Account"
  }
}
```

<table><thead><tr><th width="215">Section</th><th>Field</th><th>Required</th><th>Description</th></tr></thead><tbody><tr><td>connection_details</td><td>host</td><td>Yes</td><td>Workspace URL, e.g. <code>https://dbc-xxxx.cloud.databricks.com</code></td></tr><tr><td>connection_details</td><td>warehouse_id</td><td>Yes</td><td>Databricks SQL Warehouse ID.</td></tr><tr><td>connection_details.auth</td><td>method</td><td>Yes</td><td>Authentication mode: <code>oauth</code> (default) or <code>pat</code>.</td></tr><tr><td>connection_details.auth</td><td>client_id</td><td>OAuth</td><td>Service principal client ID.</td></tr><tr><td>connection_details.auth</td><td>client_secret</td><td>OAuth</td><td>Service principal client secret.</td></tr><tr><td>connection_details.auth</td><td>personal_access_token</td><td>PAT</td><td>Personal access token.</td></tr><tr><td>data_source</td><td>catalog</td><td>Unity Catalog</td><td>Unity Catalog name (omit for Hive Metastore).</td></tr><tr><td>data_source</td><td>schema</td><td>Yes</td><td>Schema or database name.</td></tr><tr><td>data_source</td><td>table</td><td>Yes</td><td>Table or view name.</td></tr><tr><td>data_source</td><td>partition_column</td><td>Yes</td><td>Column for incremental ingestion. Defaults to <code>created_at</code>.</td></tr><tr><td>metadata</td><td>data_kind</td><td>Yes</td><td>The type(s) of data in the table, e.g. <code>Support Ticket</code>, <code>NPS</code>, <code>CSAT</code>, <code>Review</code>, <code>Account</code>.</td></tr></tbody></table>

[Share credentials securely](https://ask.birdie.ai/integrations-and-data-ingestion/securely-sharing-credentials-for-data-integration) with the Birdie team. Never send credentials via email or unencrypted channels.

***

### Validating the integration

{% stepper %}
{% step %}

#### Generate an OAuth token (OAuth example)

```bash
curl --request POST "https://<workspace-host>/oidc/v1/token" \
  --header "Content-Type: application/x-www-form-urlencoded" \
  --data "grant_type=client_credentials" \
  --data "client_id=<client_id>" \
  --data "client_secret=<client_secret>" \
  --data "scope=all-apis"
```

You should receive an `access_token`.
{% endstep %}

{% step %}

#### Validate Databricks REST API access

```bash
curl -H "Authorization: Bearer <access_token>" \
  "https://<workspace-host>/api/2.0/workspace/get-status?path=/"
```

Expected response:

```json
{"object_type":"DIRECTORY","path":"/"}
```

{% endstep %}

{% step %}

#### Validate SQL execution

```bash
curl --request POST \
  "https://<workspace-host>/api/2.0/sql/statements" \
  --header "Authorization: Bearer <access_token>" \
  --header "Content-Type: application/json" \
  --data '{
    "statement": "SELECT 1",
    "warehouse_id": "<warehouse_id>"
  }'
```

If the response contains a successful result with the value `1`, Birdie can execute SQL queries through the warehouse.
{% endstep %}
{% endstepper %}

***

### References

* [Databricks REST & SQL API](https://docs.databricks.com/api/index.html)
* [Authentication](https://docs.databricks.com/en/dev-tools/auth/index.html)
* [Unity Catalog Privileges](https://docs.databricks.com/en/data-governance/unity-catalog/privileges/index.html)
* [Service Principals](https://docs.databricks.com/en/admin/users-groups/service-principals.html)
* [Personal Access Tokens](https://docs.databricks.com/en/dev-tools/auth/pat.html)
* [Securely Sharing Credentials](https://ask.birdie.ai/integrations-and-data-ingestion/securely-sharing-credentials-for-data-integration)
