Databricks
Overview
Birdie connects to your Databricks workspace and runs SQL queries through a SQL Warehouse — the serverless compute layer that runs queries without requiring a dedicated cluster.
Birdie can import data from Databricks into Birdie for analytics, enrichment and reporting purposes, including:
Customers and accounts
Operational and reference tables
Feedback-related datasets exposed by the client
Because each Databricks deployment is unique, Birdie's team will work with your administrators to finalize details. This article explains how your Databricks admin can configure access and provide the credentials needed to enable the integration.
Two authentication methods are supported:
OAuth (Service Principal) — recommended for production
Personal Access Token (PAT) — simpler alternative
Schema requirements
All Birdie database connectors (Snowflake, Databricks, BigQuery, etc.) require one table or view per feedback type, following the Birdie schema definition.
Examples of feedback types:
nps
csat
review
social_media_post
support_ticket
Each feedback type must exist as a table or a view.
Birdie provides the detailed schema reference separately (same model as the S3 schema documentation).
Incremental Partition Column
Birdie uses a partition column to determine which rows to ingest on each run. Typical queries look like:
Requirements
The partition column must reflect when a row was added or last modified in the table, not when the original event occurred. For example:
loaded_atorupdated_at(ETL/pipeline timestamp) — recommendedinserted_at(row creation timestamp) — acceptable if the table is append-only
Why posted_at is not sufficient
posted_at is not sufficientUsing a business event timestamp like posted_at can cause data gaps:
Historical updates missed: If a past row is updated (e.g., a ticket status changes), its
posted_atremains in the past and Birdie will not re-ingest it.Pipeline timing gaps: If your ETL runs at specific times (e.g., every 6 hours), rows loaded between Birdie's query window and the ETL run may not appear until the next cycle. An
updated_atorloaded_atcolumn tied to the ETL timestamp ensures these rows are captured.
Recommendation
Add a loaded_at or updated_at column to your table that is set to current_timestamp() whenever a row is inserted or modified. Use this column as the partition_column when configuring the Birdie integration.
If your table is append-only and never updated, posted_at is acceptable as the partition column.
OAuth (Service Principal) Method
Overview
Authenticates through Databricks' OAuth M2M (machine-to-machine) flow using client credentials. Recommended for production and automated environments.
Requirements
Before starting:
Your workspace has Unity Catalog enabled (recommended) or Hive Metastore
You have workspace admin access
A SQL Warehouse is available
Setup in Databricks
Personal Access Token (PAT) Method
Overview
A simpler alternative when OAuth is not configured. Uses a long-lived token tied to a user account. Token lifecycle and rotation are the responsibility of the user who created it.
Setup in Databricks
Share Connection Details with Birdie
To configure the integration, securely provide Birdie with the following information.
For OAuth (Service Principal):
For PAT:
connection_details
host
Yes
Workspace URL, e.g. https://dbc-xxxx.cloud.databricks.com
connection_details
warehouse_id
Yes
Databricks SQL Warehouse ID.
connection_details.auth
method
Yes
Authentication mode: oauth (default) or pat.
connection_details.auth
client_id
OAuth
Service principal client ID.
connection_details.auth
client_secret
OAuth
Service principal client secret.
connection_details.auth
personal_access_token
PAT
Personal access token.
data_source
catalog
Yes
Unity Catalog name (optional for Hive Metastore).
data_source
schema
Yes
Schema or database name.
data_source
table
Yes
Table or view name.
data_source
partition_column
Yes
Column for incremental ingestion. Defaults to created_at
metadata
data_kind
Yes
The type(s) of data in the table, e.g. Support Ticket, NPS, CSAT, Review, Account.
Share credentials securely with the Birdie team. Never send credentials via email or unencrypted channels.
Validating the integration
References
Last updated