Databricks

Overview

Birdie connects to your Databricks workspace and runs SQL queries through a SQL Warehouse — the serverless compute layer that runs queries without requiring a dedicated cluster.

Birdie can import data from Databricks into Birdie for analytics, enrichment and reporting purposes, including:

  • Customers and accounts

  • Operational and reference tables

  • Feedback-related datasets exposed by the client

Because each Databricks deployment is unique, Birdie's team will work with your administrators to finalize details. This article explains how your Databricks admin can configure access and provide the credentials needed to enable the integration.

Two authentication methods are supported:

  • OAuth (Service Principal) — recommended for production

  • Personal Access Token (PAT) — simpler alternative


Schema requirements

All Birdie database connectors (Snowflake, Databricks, BigQuery, etc.) require one table or view per feedback type, following the Birdie schema definition.

Examples of feedback types:

  • nps

  • csat

  • review

  • social_media_post

  • support_ticket

Each feedback type must exist as a table or a view.

Birdie provides the detailed schema reference separately (same model as the S3 schema documentation).


Incremental Partition Column

Birdie uses a partition column to determine which rows to ingest on each run. Typical queries look like:

Requirements

The partition column must reflect when a row was added or last modified in the table, not when the original event occurred. For example:

  • loaded_at or updated_at (ETL/pipeline timestamp) — recommended

  • inserted_at (row creation timestamp) — acceptable if the table is append-only

Why posted_at is not sufficient

Using a business event timestamp like posted_at can cause data gaps:

  1. Historical updates missed: If a past row is updated (e.g., a ticket status changes), its posted_at remains in the past and Birdie will not re-ingest it.

  2. Pipeline timing gaps: If your ETL runs at specific times (e.g., every 6 hours), rows loaded between Birdie's query window and the ETL run may not appear until the next cycle. An updated_at or loaded_at column tied to the ETL timestamp ensures these rows are captured.

Recommendation

Add a loaded_at or updated_at column to your table that is set to current_timestamp() whenever a row is inserted or modified. Use this column as the partition_column when configuring the Birdie integration.

If your table is append-only and never updated, posted_at is acceptable as the partition column.


OAuth (Service Principal) Method

Overview

Authenticates through Databricks' OAuth M2M (machine-to-machine) flow using client credentials. Recommended for production and automated environments.

Requirements

Before starting:

  • Your workspace has Unity Catalog enabled (recommended) or Hive Metastore

  • You have workspace admin access

  • A SQL Warehouse is available

Setup in Databricks

1

Create a Service Principal

  1. Go to Settings > Identity and access > Service principals

  2. Add a service principal named birdie

  3. Enable Workspace Access and Databricks SQL Access

2

Generate OAuth Credentials

  1. Generate a client secret for the service principal

  2. Save the Client ID and Client Secret

3

Grant SQL Warehouse Access

Grant CAN USE on the SQL Warehouse to birdie

4

Grant Read-Only Table Permissions

Follow the principle of least privilege: grant access only to the specific catalogs, schemas and tables required by Birdie.

For Unity Catalog:

For Hive Metastore (legacy):


Personal Access Token (PAT) Method

Overview

A simpler alternative when OAuth is not configured. Uses a long-lived token tied to a user account. Token lifecycle and rotation are the responsibility of the user who created it.

Setup in Databricks

1

Create or Use an Existing User

The user needs Workspace Access and Databricks SQL Access

2

Generate a PAT

  1. Go to User Settings > Developer > Access Tokens

  2. Generate a new token

  3. Save the token (shown only once)

3

Grant Permissions

Apply the same SQL GRANT statements as the OAuth method above (see Step 4 under OAuth).


Share Connection Details with Birdie

To configure the integration, securely provide Birdie with the following information.

For OAuth (Service Principal):

For PAT:

Section
Field
Required
Description

connection_details

host

Yes

Workspace URL, e.g. https://dbc-xxxx.cloud.databricks.com

connection_details

warehouse_id

Yes

Databricks SQL Warehouse ID.

connection_details.auth

method

Yes

Authentication mode: oauth (default) or pat.

connection_details.auth

client_id

OAuth

Service principal client ID.

connection_details.auth

client_secret

OAuth

Service principal client secret.

connection_details.auth

personal_access_token

PAT

Personal access token.

data_source

catalog

Yes

Unity Catalog name (optional for Hive Metastore).

data_source

schema

Yes

Schema or database name.

data_source

table

Yes

Table or view name.

data_source

partition_column

Yes

Column for incremental ingestion. Defaults to created_at

metadata

data_kind

Yes

The type(s) of data in the table, e.g. Support Ticket, NPS, CSAT, Review, Account.

Share credentials securely with the Birdie team. Never send credentials via email or unencrypted channels.


Validating the integration

1

Generate an OAuth token (OAuth example)

You should receive an access_token.

2

Validate Databricks REST API access

Expected response:

3

Validate SQL execution

If this returns 1, Birdie can successfully execute SQL queries.Validating the integration


References

Last updated