BigQuery

Overview

Birdie can ingest structured datasets directly from BigQuery using read-only SQL queries. Most customers run Birdie ingestion daily, using a partition column (DATE / TIMESTAMP / DATETIME) to import only new rows and avoid full table scans.

Create the Service Account used by Birdie

Birdie connects to BigQuery using a Google Cloud Service Account. This service account is a technical identity used only for read-only access to your data. Steps in Google Cloud Console:

  1. Go to IAM & Admin - Service Accounts

  2. Click “Create service account”

  3. Service account name (suggested): birdie

  4. Description (optional)

  5. Click “Create and continue” (roles will be added in the next section)

  6. Finish creating the service account

Note: After creation, wait up to 1 minute before using the service account. IAM propagation can be slightly delayed.


Grant Read-Only BigQuery Permissions

Permission to run query jobs (Project level) Role: BigQuery Job User (roles/bigquery.jobUser)

This permission allows Birdie to execute SQL queries. It does NOT grant data access by itself.

How to grant:

  • IAM & Admin - IAM

  • Select your Project

  • Add the service account

  • Assign role: BigQuery Job User


Permission to read data (Dataset level – recommended) Role: BigQuery Data Viewer (roles/bigquery.dataViewer)

This permission allows Birdie to read tables or views.

How to grant:

  • Go to BigQuery

  • Select the Dataset that Birdie should ingest

  • Click “Share dataset”

  • Add the service account

Note: Grant Data Viewer at the dataset (or table) level instead of the entire project.


Incremental Partition Column

Birdie uses an incremental partition column to determine which rows to ingest on each run. Choosing the right column is critical for complete data ingestion.

Requirements

The partition column must reflect when a row was added or last modified in the table, not when the original event occurred. For example:

  • loaded_at or updated_at (ETL/pipeline timestamp) - recommended

  • inserted_at (row creation timestamp) - acceptable if the table is append-only

Why posted_at is not sufficient

Using a business event timestamp like posted_at can cause data gaps:

  1. Historical updates missed: If a past row is updated (e.g., a ticket status changes), its posted_at remains in the past and Birdie will not re-ingest it.

  2. Pipeline timing gaps: If your ETL runs at specific times (e.g., every 6 hours), rows loaded between Birdie's query window and the ETL run may not appear until the next cycle. An updated_at or loaded_at column tied to the ETL timestamp ensures these rows are captured.

Recommendation

Add a loaded_at or updated_at column to your table that is set to CURRENT_TIMESTAMP() whenever a row is inserted or modified. Use this column as the partition column when configuring the Birdie integration.

If your table is append-only and never updated, posted_at is acceptable as the partition column.


Prepare Your BigQuery Table

Birdie works best when the source table is partitioned by a column used for incremental ingestion (for example: posted_at).

Supported partition column types:

  • DATE

  • TIMESTAMP

  • DATETIME

Steps in BigQuery Console:

  1. Select your Dataset

  2. Click “Create table”

  3. Source: Empty table

  4. Define the schema:

    1. Make sure the schema includes your partition column as DATE, TIMESTAMP, or DATETIME.

  5. Partition and cluster settings:

    1. Partition by field

    2. Select the partition column

    3. Recommended: enable “Require partition filter". This reduces cost and prevents accidental full table scans.

  6. Create the table


Create a Service Account Key (JSON)

Birdie authenticates using a JSON service account key. Steps:

  1. Go to IAM & Admin - Service Accounts

  2. Click the service account (birdie)

  3. Open the “Keys” tab

  4. Click “Add key” - “Create new key”

  5. Select JSON

  6. Click “Create” The JSON file will be downloaded automatically.

Important:

  • This JSON file is what Birdie uses to authenticate


Validate Access

You can validate access using Cloud Shell or your local machine.

  • Activate the service account using the JSON key

  • Run a simple query

  • Validate reading your table using a partition filter

If this works, Birdie can query your data successfully.


Share Connection Details with Birdie

To configure the integration, securely provide Birdie with:

Section
Field
Required
Description

connection_details

project_id

No

GCP project ID. Defaults to the project the service account belongs to.

connection_details

service_account_key

Yes

The full JSON service account key downloaded in the previous step.

data_source

dataset

Yes

The BigQuery dataset name (without the project ID), e.g. gold_layer.

data_source

table

Yes

The table or view to query, e.g. customer_feedback_v1.

data_source

partition_column

No

The date/datetime column used for incremental ingestion. Defaults to posted_at.

data_source

partition_type

No

Type of the partition column: DATE, TIMESTAMP, or DATETIME.

metadata

data_kind

Yes

The type(s) of data in the table, e.g. Support Ticket, NPS, CSAT, Review, Social Media Post, Account.

Share it securely with the Birdie team


Data Types Supported

Birdie can ingest structured datasets exposed as BigQuery tables or views, including:

  • Conversations and messages (Support Tickets, Issues, Social Media Posts)

  • Feedback datasets (review, nps, csat)

  • Operational or reference tables (accounts, users, metadata)

Expose one table or view per dataset type.


References

Last updated