Databricks

This article describes how to configure your Databricks workspace to allow Birdie to export data to a dedicated catalog and schema.

Prerequisite: Before configuring the export integration, you should already have completed the setup described in Data Ingestion with Databricks. The steps below assume a Birdie service principal (or user) already exists and has access to a SQL Warehouse.

Overview

Birdie can load processed data (e.g., enriched feedback, areas, opportunities) into your Databricks workspace. We recommend creating:

  • A dedicated write role or grant set for Birdie

  • A separate catalog and schema for Birdie exports

This isolates write access from your existing data.

Step 1 – Create a dedicated catalog and schema for exports

Create a catalog and schema for Birdie to write to:

CREATE CATALOG IF NOT EXISTS birdie_export;
CREATE SCHEMA IF NOT EXISTS birdie_export.birdie_export_schema;

Step 2 – Grant required privileges to the Birdie principal

Grant the necessary privileges for Birdie to create and manage tables within the export schema.

For OAuth (Service Principal)

-- Catalog and schema usage
GRANT USE CATALOG ON CATALOG birdie_export TO `birdie`;
GRANT USE SCHEMA ON SCHEMA birdie_export.birdie_export_schema TO `birdie`;

-- Table privileges (create, read, write)
GRANT CREATE TABLE ON SCHEMA birdie_export.birdie_export_schema TO `birdie`;
GRANT MODIFY ON SCHEMA birdie_export.birdie_export_schema TO `birdie`;
GRANT SELECT ON SCHEMA birdie_export.birdie_export_schema TO `birdie`;

For Personal Access Token (PAT)

Apply the same GRANT statements above, replacing `birdie` with the user account that owns the PAT.

Step 3 – Ensure SQL Warehouse access

The Birdie principal must have CAN USE permission on the SQL Warehouse that will execute the export queries. If this was already granted during the data ingestion setup, no additional action is needed.

Step 4 – Share export configuration with Birdie

Share the following details securely with Birdie:

For OAuth (Service Principal):

For PAT:

Section
Field
Required
Description

connection_details

host

Yes

Workspace URL, e.g. https://dbc-xxxx.cloud.databricks.com

connection_details

warehouse_id

Yes

Databricks SQL Warehouse ID.

connection_details.auth

method

Yes

Authentication mode: oauth (recommended) or pat.

connection_details.auth

client_id

OAuth

Service principal client ID.

connection_details.auth

client_secret

OAuth

Service principal client secret.

connection_details.auth

personal_access_token

PAT

Personal access token.

export_destination

catalog

Yes

Unity Catalog name for the export target.

export_destination

schema

Yes

Schema name within the catalog.

If not already shared during the data ingestion setup, also include the warehouse ID and account host.

Share credentials securely with the Birdie team. Never send credentials via email or unencrypted channels.

How the Export Works

Birdie uses Databricks SQL Warehouse to load data through the following pipeline:

  1. Staging: A temporary table or volume is created to hold the incoming data.

  2. Schema alignment: Table schema is derived from the CSV export headers and matched to the target table. If the table does not yet exist, it is created automatically.

  3. Data loading: Data is loaded into the target table using COPY INTO or direct INSERT operations.

  4. Merge: For incremental loads, rows with matching primary keys are deleted and re-inserted within a single transaction.

  5. Commit: Birdie uses MERGE INTO so the upsert is atomic on the target Delta table.

Exported Data Format

Birdie creates one table per CSV file. Table and column naming follows our standard CSV export conventions.

Data is de-duplicated on each load, so it will be ready to use. For details on joining tables, refer to our data exports documentation.

Some data (e.g., custom fields) may require additional handling due to their object-like nature.

References

Last updated