# BigQuery

### Overview

Birdie can ingest structured datasets directly from [BigQuery](https://cloud.google.com/bigquery/docs/introduction) using read-only SQL queries. Most customers run Birdie ingestion daily, using a partition column (DATE / TIMESTAMP / DATETIME) to import only new rows and avoid full table scans.

#### Create the Service Account used by Birdie

Birdie connects to BigQuery using a [Google Cloud Service Account](https://cloud.google.com/iam/docs/service-account-overview). This service account is a technical identity used only for read-only access to your data.\
Steps in Google Cloud Console:

1. [Open Google Cloud Console](https://console.cloud.google.com/)
2. Go to IAM & Admin - [Service Accounts](https://console.cloud.google.com/iam-admin/serviceaccounts)
3. Click “Create service account”
4. Service account name (suggested): birdie
5. Description (optional)
6. Click “Create and continue” (roles will be added in the next section)
7. Finish creating the service account

Note: After creation, wait up to 1 minute before using the service account. IAM propagation can be slightly delayed.
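
The console steps above can also be scripted with the gcloud CLI (a sketch; assumes the Cloud SDK is installed and authenticated, and `<PROJECT_ID>` is a placeholder for your project):

```
gcloud iam service-accounts create birdie \
  --project=<PROJECT_ID> \
  --display-name="birdie" \
  --description="Read-only access for Birdie ingestion"
```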

***

#### Grant Read-Only BigQuery Permissions

**Permission to run query jobs (Project level)**\
Role: BigQuery Job User (roles/bigquery.jobUser)

This permission allows Birdie to execute SQL queries. It does NOT grant data access by itself.

How to grant:

* IAM & Admin - IAM
* Select your Project
* Add the service account
* Assign role: [BigQuery Job User](https://cloud.google.com/bigquery/docs/access-control#bigquery.jobUser)

***

**Permission to read data (Dataset level – recommended)**\
Role: BigQuery Data Viewer (roles/bigquery.dataViewer)

This permission allows Birdie to read tables or views.

How to grant:

* Go to BigQuery
* Select the Dataset that Birdie should ingest
* Click “Share dataset”
* Add the service account
* Assign role: [BigQuery Data Viewer](https://cloud.google.com/bigquery/docs/access-control#bigquery.dataViewer)

Note: Grant Data Viewer at the dataset (or table) level instead of the entire project.
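
The project-level grant has a gcloud equivalent (a sketch; `<PROJECT_ID>` is a placeholder and the member email must match the service account you created). The dataset-level Data Viewer grant is easiest through the “Share dataset” dialog described above:

```
gcloud projects add-iam-policy-binding <PROJECT_ID> \
  --member="serviceAccount:birdie@<PROJECT_ID>.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"
```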

***

#### Incremental Partition Column

Birdie uses an incremental partition column to determine which rows to ingest on each run. Choosing the right column is critical for complete data ingestion.

#### Requirements

The partition column **must reflect when a row was added or last modified in the table**, not when the original event occurred. For example:

* `loaded_at` or `updated_at` (ETL/pipeline timestamp) - **recommended**
* `inserted_at` (row creation timestamp) - acceptable if the table is append-only

#### Why `posted_at` is not sufficient

Using a business event timestamp like `posted_at` can cause **data gaps**:

1. **Historical updates missed**: If a past row is updated (e.g., a ticket status changes), its `posted_at` remains in the past and Birdie will not re-ingest it.
2. **Pipeline timing gaps**: If your ETL runs at specific times (e.g., every 6 hours), rows loaded between Birdie's query window and the ETL run may not appear until the next cycle. An `updated_at` or `loaded_at` column tied to the ETL timestamp ensures these rows are captured.

#### Recommendation

Add a `loaded_at` or `updated_at` column to your table that is set to `CURRENT_TIMESTAMP()` whenever a row is inserted or modified. Use this column as the partition column when configuring the Birdie integration.

If your table is append-only and never updated, `posted_at` is acceptable as the partition column.
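
As a sketch of this pattern (table and column names here are illustrative, not required by Birdie), a `loaded_at` column can be added to an existing table and stamped by your pipeline on every insert or update:

```
-- One-time schema change
ALTER TABLE `my-project.my_dataset.customer_feedback`
ADD COLUMN loaded_at TIMESTAMP;

-- In each ETL run, stamp the rows that run touched, e.g. by a batch id
UPDATE `my-project.my_dataset.customer_feedback`
SET loaded_at = CURRENT_TIMESTAMP()
WHERE batch_id = @current_batch_id;
```

`batch_id` and `@current_batch_id` are hypothetical; use whatever mechanism your pipeline already has to identify the rows it just wrote.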

***

#### Prepare Your BigQuery Table

Birdie works best when the source table is [partitioned](https://cloud.google.com/bigquery/docs/creating-partitioned-tables) by the column used for incremental ingestion (for example: loaded\_at).

Supported partition column types:

* DATE
* TIMESTAMP
* DATETIME

Steps in BigQuery Console:

1. Open [BigQuery](https://console.cloud.google.com/bigquery)
2. Select your Dataset
3. Click “Create table”
4. Source: Empty table
5. Define the schema:
   1. Make sure the schema includes your partition column as DATE, TIMESTAMP, or DATETIME.
6. Partition and cluster settings:
   1. Partition by field
   2. Select the partition column
   3. Recommended: enable “Require partition filter”. This reduces cost and prevents accidental full table scans.
7. Create the table
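
Equivalently, the table can be created with DDL (a sketch; the schema and names are illustrative):

```
CREATE TABLE `my-project.my_dataset.customer_feedback` (
  id        STRING,
  content   STRING,
  posted_at TIMESTAMP,
  loaded_at TIMESTAMP
)
PARTITION BY DATE(loaded_at)
OPTIONS (require_partition_filter = TRUE);
```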

***

#### Create a Service Account Key (JSON)

Birdie authenticates using a JSON [service account key](https://cloud.google.com/iam/docs/keys-create-delete).\
Steps:

1. Go to IAM & Admin - [Service Accounts](https://console.cloud.google.com/iam-admin/serviceaccounts)
2. Click the service account (birdie)
3. Open the “Keys” tab
4. Click “Add key” - “Create new key”
5. Select JSON
6. Click “Create”\
   The JSON file will be downloaded automatically.

Important:

* This JSON file is the credential Birdie uses to authenticate. Treat it as a secret: anyone who holds the key can read the datasets you granted.

***

#### Validate Access

You can validate access using Cloud Shell or your local machine.

* Activate the service account using the JSON key

```
gcloud auth activate-service-account --key-file /path/to/birdie.json
```

* Run a simple query

```
bq query --use_legacy_sql=false "SELECT 1 AS ok"
```

* Validate reading your table using a partition filter

```
bq query --use_legacy_sql=false \
"SELECT * FROM `<PROJECT_ID>.<DATASET>.<TABLE>`
WHERE <PARTITION_COLUMN> >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY) LIMIT 10"
```

If this works, Birdie can query your data successfully.

***

#### Share Connection Details with Birdie

To configure the integration, securely provide Birdie with:

```
{
  "connection_details": {
    "project_id": "your-project-id-123",
    "service_account_key": {
      "type": "service_account",
      "project_id": "your-project-id-123",
      "private_key_id": "...",
      "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
      "client_email": "birdie-integration@project.iam.gserviceaccount.com",
      "client_id": "...",
      "auth_uri": "https://accounts.google.com/o/oauth2/auth",
      "token_uri": "https://oauth2.googleapis.com/token",
      "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
      "client_x509_cert_url": "..."
    }
  },
  "data_source": {
    "dataset_id": "marketing_data",
    "table_or_view": "customer_feedback_v1",
    "incremental_partition_column": "created_at",
    "partition_type": "TIMESTAMP"
  },
  "metadata": {
    "data_kind": "Support Ticket, NPS, Account"
  }
}
```

| Section             | Field                             | Required | Description                                                                                                       |
| ------------------- | --------------------------------- | -------- | ----------------------------------------------------------------------------------------------------------------- |
| connection\_details | project\_id                       | No       | GCP project ID. Defaults to the project the service account belongs to.                                           |
| connection\_details | service\_account\_key             | Yes      | The full JSON service account key downloaded in the previous step.                                                |
| data\_source        | dataset\_id                       | Yes      | The BigQuery dataset name (without the project ID), e.g. `gold_layer`.                                            |
| data\_source        | table\_or\_view                   | Yes      | The table or view to query, e.g. `customer_feedback_v1`.                                                          |
| data\_source        | incremental\_partition\_column    | No       | The DATE/TIMESTAMP/DATETIME column used for incremental ingestion. Defaults to `posted_at`.                       |
| data\_source        | partition\_type                   | No       | Type of the partition column: `DATE`, `TIMESTAMP`, or `DATETIME`.                                                 |
| metadata            | data\_kind                        | Yes      | The type(s) of data in the table, e.g. `Support Ticket`, `NPS`, `CSAT`, `Review`, `Social Media Post`, `Account`. |

[Share it securely](https://ask.birdie.ai/integrations-and-data-ingestion/securely-sharing-credentials-for-data-integration) with the Birdie team.

***

#### Data Types Supported

Birdie can ingest [structured datasets](https://ask.birdie.ai/integrations-and-data-ingestion/how-to-integrate-with/s3-azure-gcs#data-in-scope) exposed as BigQuery tables or views, including:

* Conversations and messages (Support Tickets, Issues, Social Media Posts)
* Feedback datasets (Review, NPS, CSAT)
* Operational or reference tables (accounts, users, metadata)

Expose one table or view per dataset type.
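
A read-only view is often the simplest way to do this, and it also restricts Birdie to exactly the columns it needs. A sketch (the source table and column names are illustrative):

```
CREATE VIEW `my-project.my_dataset.birdie_support_tickets` AS
SELECT ticket_id, subject, body, status, loaded_at
FROM `my-project.my_dataset.raw_tickets`;
```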

***

### References

* [BigQuery Introduction](https://cloud.google.com/bigquery/docs/introduction)
* [Creating Partitioned Tables](https://cloud.google.com/bigquery/docs/creating-partitioned-tables)
* [BigQuery IAM Roles](https://cloud.google.com/bigquery/docs/access-control#bigquery)
* [Service Account Keys](https://cloud.google.com/iam/docs/keys-create-delete)
* [Securely Sharing Credentials](https://ask.birdie.ai/integrations-and-data-ingestion/securely-sharing-credentials-for-data-integration)
