# Data Schema Definitions for Birdie Export

## Data Export Schema Definitions

All data exported from Birdie—whether through manual CSV downloads or automated Data Forwarding—follows a standardized set of schemas. These definitions describe every file included in the export, its columns, data types, and expected values.

Use this page as a reference when loading Birdie exports into your data warehouse, building dashboards, or integrating insights into downstream systems.

***

### 1. Overview

Every Birdie export produces **seven files**, organized into two groups based on their export behavior and loading strategy:

**Full Export Files** (Truncate & Load)

| File                     | Description                                                  |
| ------------------------ | ------------------------------------------------------------ |
| `areas.csv`              | The complete list of Areas and their associated feedbacks.   |
| `collections.csv`        | The complete list of Collections and their related entities. |
| `area_opportunities.csv` | The many-to-many mapping between Areas and Opportunities.    |

**Incremental Export Files** (Delete & Insert by key)

| File                | Delete Key    | Description                                                              |
| ------------------- | ------------- | ------------------------------------------------------------------------ |
| `feedbacks.csv`     | `ID`          | The core feedback records (reviews, surveys, conversations).             |
| `opportunities.csv` | `Feedback ID` | Opportunities (Opps) detected within each feedback.                      |
| `sentences.csv`     | `Feedback ID` | Individual sentences extracted from feedback text, with NLP annotations. |
| `messages.csv`      | `Feedback ID` | Individual messages within conversation-type feedbacks.                  |

> **Note:** For incremental files, Birdie re-exports all child entities (opportunities, sentences, messages) whenever their parent feedback is updated. Before inserting, always delete existing rows by each file's Delete Key (`ID` for `feedbacks.csv`, `Feedback ID` for the child files) to avoid duplicates.

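The two loading strategies above can be sketched as follows. This is a minimal illustration that assumes a SQLite target and rows already parsed into dictionaries; the helper names (`truncate_and_load`, `delete_and_insert`) and the sample rows are hypothetical and not part of Birdie's tooling. A real warehouse loader would use its own bulk-load and transaction mechanisms.

```python
import sqlite3


def _quote(name):
    """Quote an identifier so column names with spaces (e.g. "Feedback ID") work."""
    return '"' + name.replace('"', '""') + '"'


def _insert_rows(conn, table, rows):
    """Insert a list of dict rows; all rows are assumed to share the same keys."""
    columns = list(rows[0].keys())
    column_list = ", ".join(_quote(c) for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {_quote(table)} ({column_list}) VALUES ({placeholders})",
        [tuple(row[c] for c in columns) for row in rows],
    )


def truncate_and_load(conn, table, rows):
    """Full export files: remove every existing row, then load the new snapshot."""
    with conn:  # one transaction: both steps commit or roll back together
        conn.execute(f"DELETE FROM {_quote(table)}")
        if rows:
            _insert_rows(conn, table, rows)


def delete_and_insert(conn, table, key_column, rows):
    """Incremental files: delete rows sharing a key with the batch, then insert it."""
    if not rows:
        return
    keys = {row[key_column] for row in rows}
    with conn:
        conn.executemany(
            f"DELETE FROM {_quote(table)} WHERE {_quote(key_column)} = ?",
            [(key,) for key in keys],
        )
        _insert_rows(conn, table, rows)


# Hypothetical data: a feedback is exported, then re-exported after an update.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE sentences ("Feedback ID" TEXT, "ID" TEXT, "Sentence" TEXT)')
first_export = [
    {"Feedback ID": "abc123", "ID": "abc123#0", "Sentence": "Hello."},
    {"Feedback ID": "abc123", "ID": "abc123#1", "Sentence": "The app crashes."},
]
second_export = [
    {"Feedback ID": "abc123", "ID": "abc123#0", "Sentence": "The app crashes on login."},
]
delete_and_insert(conn, "sentences", "Feedback ID", first_export)
delete_and_insert(conn, "sentences", "Feedback ID", second_export)
remaining = conn.execute("SELECT COUNT(*) FROM sentences").fetchone()[0]
```

Deleting by `Feedback ID` rather than by the sentence `ID` matters here: the second export contains fewer sentences than the first, and only a delete on the parent key removes the stale `abc123#1` row.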
***

### 2. Data Schemas

#### feedbacks.csv

The primary entity in Birdie's data model. Each row represents a single feedback record — a review, survey response, support ticket, complaint, or social media post. This file contains metadata, timestamps, ratings, and custom fields for every feedback processed by Birdie.

| Column Name              | Type        | Description                                                                                                                                           |
| ------------------------ | ----------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
| ID                       | STRING      | Unique Birdie-generated identifier for the feedback (SHA-256 hash).                                                                                   |
| Ingested ID              | STRING      | The original identifier from the source system, preserved as ingested.                                                                                |
| Source                   | STRING      | The integration or connector that produced the record (e.g., `s3`, `api`).                                                                            |
| Source Alias             | STRING      | A user-customizable label for grouping feedbacks by origin (e.g., `nps`, `support_ticket`).                                                           |
| Kind Name                | STRING      | The record type classification (e.g., `nps`, `csat`, `support_ticket`, `review`, `complaint`, `social_media`).                                        |
| Language                 | STRING      | Language of the feedback text expressed as a BCP 47 code (e.g., `pt-BR`, `en`).                                                                       |
| Text                     | STRING      | The primary text content of the feedback. For conversation-type records, this is typically the opening message or a consolidated view.                |
| Posted At                | STRING      | When the feedback was originally posted or submitted (RFC 3339 timestamp, e.g., `2025-05-04T00:24:27Z`).                                              |
| Batch ID                 | STRING      | Identifier for the ingestion batch that brought this record into Birdie.                                                                              |
| Ingested At              | STRING      | When the record was first ingested into Birdie (RFC 3339 timestamp).                                                                                  |
| Updated At               | STRING      | When the record was last updated within Birdie (RFC 3339 timestamp).                                                                                  |
| Accounts                 | JSON ARRAY  | A JSON array of account identifiers associated with this feedback (e.g., `["acc_hash_1", "acc_hash_2"]`).                                             |
| Total Messages           | INTEGER     | For conversation-type feedbacks, the total number of messages in the thread. Empty for non-conversation records.                                      |
| Messages First Posted At | STRING      | Timestamp of the first message in the conversation thread (RFC 3339). Empty for non-conversation records.                                             |
| Messages Last Posted At  | STRING      | Timestamp of the last message in the conversation thread (RFC 3339). Empty for non-conversation records.                                              |
| Messages Users           | STRING      | Identifier or count of distinct users who participated in the conversation thread.                                                                    |
| Custom Fields            | JSON OBJECT | A JSON object containing all custom fields configured for this record. Each key maps to an object with `description`, `type`, and `value` properties. |
| Category                 | STRING      | A classification label for segmenting feedbacks (e.g., product area, complaint type).                                                                 |
| Status                   | STRING      | Current status of the feedback (e.g., `open`, `pending`, `closed`, `solved`). Primarily used for conversation-type records.                           |
| URL                      | STRING      | Direct link to the original feedback source, if available.                                                                                            |
| Rating                   | FLOAT       | A numerical score associated with the feedback (e.g., NPS 0–10, CSAT 1–5, or a review score).                                                         |
| Author ID                | STRING      | Unique identifier for the author of the feedback.                                                                                                     |
| Author Name              | STRING      | Display name of the feedback author, if available.                                                                                                    |
| Title                    | STRING      | The title or subject line of the feedback, as provided by the author or source system.                                                                |
| Owner                    | STRING      | Indicates the entity owner for competitive analysis (e.g., `Owner` or `Competitor`).                                                                  |
| Subject                  | STRING      | The subject line of the support ticket or conversation, if applicable.                                                                                |
| Priority                 | STRING      | Priority level assigned to the feedback (e.g., `urgent`, `high`, `medium`, `low`). Primarily used for support tickets.                                |
| Channel                  | STRING      | Source channel of the feedback (e.g., `web`, `email`, `chat`, `phone`).                                                                               |

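As an illustration of the `Custom Fields` structure described above, the following sketch flattens the JSON object into a simple name-to-value mapping. The function name and the sample field (`plan`) are hypothetical, and treating an empty cell as "no custom fields" is an assumption about how missing values appear in the export.

```python
import json


def flatten_custom_fields(raw):
    """Return {field_name: value} from a Custom Fields cell.

    Each key is expected to map to an object with `description`, `type`,
    and `value` properties. An empty cell yields an empty dict (assumption).
    """
    if not raw or not raw.strip():
        return {}
    fields = json.loads(raw)
    return {
        name: spec.get("value")
        for name, spec in fields.items()
        if isinstance(spec, dict)  # ignore entries that do not follow the structure
    }


# Hypothetical cell content, shaped like the documented structure.
sample = '{"plan": {"description": "Subscription plan", "type": "enum", "value": "premium"}}'
flat = flatten_custom_fields(sample)
```

Flattening like this is convenient for dashboards, but it discards `description` and `type`; keep the raw JSON column as well if downstream consumers need them.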
***

#### messages.csv

Contains the individual messages within conversation-type feedbacks (support tickets, complaints, social media threads). Each row represents a single message, linked to its parent feedback via `Feedback ID`.

| Column Name         | Type        | Description                                                                                                              |
| ------------------- | ----------- | ------------------------------------------------------------------------------------------------------------------------ |
| Feedback ID         | STRING      | The parent feedback identifier. References `ID` in `feedbacks.csv`.                                                      |
| ID                  | STRING      | Unique Birdie-generated identifier for this specific message.                                                            |
| Ingested ID         | STRING      | The original message identifier from the source system.                                                                  |
| Source              | STRING      | The integration or connector that produced the record.                                                                   |
| Source Alias        | STRING      | A user-customizable label for grouping by origin.                                                                        |
| Kind Name           | STRING      | The record type classification (e.g., `support_ticket`).                                                                 |
| Language            | STRING      | Language of the message text as a BCP 47 code.                                                                           |
| Text                | STRING      | The actual content of the message.                                                                                       |
| Posted At           | STRING      | When the message was posted (RFC 3339 timestamp).                                                                        |
| Batch ID            | STRING      | Identifier for the ingestion batch.                                                                                      |
| Author ID           | STRING      | Unique identifier for the author of the message.                                                                         |
| Author Name         | STRING      | Display name of the message author, if available.                                                                        |
| Author Type         | STRING      | The role of the author within the conversation: `customer`, `agent`, or `bot`.                                           |
| Agent Supervisor ID | STRING      | Identifier for the support agent's supervisor. Populated only when `Author Type` is `agent`.                             |
| Agent Company       | STRING      | The company the support agent belongs to. Populated only when `Author Type` is `agent`.                                  |
| Agent Team          | STRING      | The team the support agent belongs to. Populated only when `Author Type` is `agent`.                                     |
| Agent Experience    | STRING      | The experience or seniority level of the agent (e.g., `junior`, `senior`). Populated only when `Author Type` is `agent`. |
| Custom Fields       | JSON OBJECT | A JSON object containing message-level custom fields, following the same structure as in `feedbacks.csv`.                |
| Ingested At         | STRING      | When the message was first ingested into Birdie (RFC 3339 timestamp).                                                    |
| Created At          | STRING      | When the message record was created in Birdie's internal store (RFC 3339 timestamp).                                     |
| Updated At          | STRING      | When the message record was last updated within Birdie (RFC 3339 timestamp).                                             |

***

#### sentences.csv

Contains the sentence-level NLP analysis produced by Birdie's processing pipeline. Each feedback text is broken into individual sentences, and each sentence is annotated with signal detection, sentiment, specificity, intentions, and product/service aspect tagging.

| Column Name         | Type       | Description                                                                                                                                     |
| ------------------- | ---------- | ----------------------------------------------------------------------------------------------------------------------------------------------- |
| Feedback ID         | STRING     | The parent feedback identifier. References `ID` in `feedbacks.csv`.                                                                             |
| ID                  | STRING     | Unique identifier for the sentence, formatted as `{Feedback ID}#{sequence_number}` (e.g., `abc123#0`, `abc123#1`).                              |
| Language            | STRING     | Detected language of the sentence as a BCP 47 code.                                                                                             |
| Sentence            | STRING     | The extracted sentence text.                                                                                                                    |
| Signal              | BOOLEAN    | Whether the sentence contains a meaningful signal (`true` or `false`). Non-signal sentences (e.g., greetings, pleasantries) are marked `false`. |
| Sentiment Value     | STRING     | The sentiment classification of the sentence. Possible values: `POSITIVE`, `NEGATIVE`, `MIXED`, `NEUTRAL`.                                      |
| Sentiment Intensity | STRING     | The strength of the detected sentiment. Possible values: `LOW`, `MEDIUM`, `HIGH`.                                                               |
| Specificity         | STRING     | How specific or actionable the sentence content is. Possible values: `LOW`, `MEDIUM`, `HIGH`.                                                   |
| Intentions          | JSON ARRAY | A JSON array of detected user intentions (e.g., `["REQUEST"]`, `["PROBLEM"]`, `["PRAISE"]`).                                                    |
| Aspects Product     | JSON ARRAY | A JSON array of product-related aspects or features mentioned in the sentence (e.g., `["máquina"]`, `["app"]`).                                 |
| Aspects Service     | JSON ARRAY | A JSON array of service-related aspects mentioned in the sentence (e.g., `["atendimento"]`, `["suporte"]`).                                     |

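The sentence `ID` format and the JSON-encoded columns can be handled along these lines. This is a sketch using Python's standard `csv` and `json` modules on a small hypothetical sample with only a subset of the columns; the function names are illustrative, and handling an empty `Intentions` cell as an empty list is an assumption.

```python
import csv
import io
import json

# Hypothetical excerpt of sentences.csv (subset of columns, made-up rows).
SAMPLE = '''Feedback ID,ID,Sentence,Signal,Intentions
abc123,abc123#1,The app crashes on login.,true,"[""PROBLEM""]"
abc123,abc123#0,Hello team.,false,[]
'''


def split_sentence_id(sentence_id):
    """Split `{Feedback ID}#{sequence_number}` into (feedback_id, sequence_number)."""
    # rpartition splits on the last '#', so the sequence number is always the tail.
    feedback_id, _, sequence = sentence_id.rpartition("#")
    return feedback_id, int(sequence)


def signal_sentences(csv_text):
    """Return (sentence, intentions) pairs for signal sentences, in original order."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: split_sentence_id(row["ID"]))
    return [
        (row["Sentence"], json.loads(row["Intentions"] or "[]"))
        for row in rows
        if row["Signal"] == "true"  # BOOLEAN values are lowercase strings
    ]


result = signal_sentences(SAMPLE)
```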
***

#### opportunities.csv

Maps feedbacks to Opportunities (Opps) — the insight clusters identified by Birdie's analysis. Each row represents a single feedback-to-opportunity association. One feedback may appear in multiple opportunities, and one opportunity groups many feedbacks.

| Column Name          | Type   | Description                                                                                               |
| -------------------- | ------ | --------------------------------------------------------------------------------------------------------- |
| Opportunity ID       | STRING | Unique identifier for the Opportunity (UUID).                                                             |
| Opportunity Name     | STRING | Human-readable name or description of the Opportunity.                                                    |
| Feedback ID          | STRING | The feedback identifier associated with this Opportunity. References `ID` in `feedbacks.csv`.             |
| Feedback Ingested ID | STRING | The original ingested identifier of the associated feedback. References `Ingested ID` in `feedbacks.csv`. |

***

#### areas.csv

Maps feedbacks to Areas — Birdie's thematic classification buckets. Each row represents a single feedback-to-area association. One feedback may belong to multiple areas.

| Column Name          | Type   | Description                                                                                               |
| -------------------- | ------ | --------------------------------------------------------------------------------------------------------- |
| Area ID              | STRING | Unique identifier for the Area (UUID).                                                                    |
| Area Name            | STRING | Human-readable name of the Area (e.g., `Crédito \| Limite Extra`).                                        |
| Feedback ID          | STRING | The feedback identifier associated with this Area. References `ID` in `feedbacks.csv`.                    |
| Feedback Ingested ID | STRING | The original ingested identifier of the associated feedback. References `Ingested ID` in `feedbacks.csv`. |

***

#### collections.csv

Contains the definitions of Collections and their related entities. Collections are user-curated groupings used to organize Areas or other Birdie entities for monitoring and reporting.

| Column Name      | Type   | Description                                                                              |
| ---------------- | ------ | ---------------------------------------------------------------------------------------- |
| collection\_id   | STRING | Unique identifier for the Collection (UUID).                                             |
| collection\_name | STRING | Human-readable name of the Collection (e.g., `Squads \| Beatriz & João`).                |
| collection\_type | STRING | The type of entity this collection groups. Possible values: `area_interest`.             |
| related\_id      | STRING | The identifier of the related entity (e.g., an Area ID) that belongs to this Collection. |

***

#### area\_opportunities.csv

A junction table that maps the many-to-many relationship between Areas and Opportunities. Use this file to understand which Opportunities belong to which Areas.

| Column Name     | Type   | Description                                                                            |
| --------------- | ------ | -------------------------------------------------------------------------------------- |
| opportunity\_id | STRING | The Opportunity identifier (UUID). References `Opportunity ID` in `opportunities.csv`. |
| area\_id        | STRING | The Area identifier (UUID). References `Area ID` in `areas.csv`.                       |

***

### 3. Data Types Reference

| Type               | Format              | Example                                   |
| ------------------ | ------------------- | ----------------------------------------- |
| STRING             | UTF-8 text          | `abc123`, `support_ticket`                |
| INTEGER            | Whole number        | `12`, `0`                                 |
| FLOAT              | Decimal number      | `5.0`, `8.5`                              |
| BOOLEAN            | Lowercase string    | `true`, `false`                           |
| JSON ARRAY         | JSON-encoded array  | `["value1", "value2"]`                    |
| JSON OBJECT        | JSON-encoded object | `{"key": {"type": "enum", "value": "X"}}` |
| Timestamp (STRING) | RFC 3339            | `2025-05-04T00:24:27Z`                    |

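When loading the CSV files, every cell arrives as text and has to be converted according to the table above. The sketch below shows one possible converter. Note that the schema tables type timestamp columns as `STRING`, so the `TIMESTAMP` label used here is a hypothetical one that the caller assigns to known timestamp columns. Mapping empty cells to `None` is likewise an assumption.

```python
import json
from datetime import datetime


def parse_cell(value, type_name):
    """Convert one CSV cell to a Python value based on its documented type."""
    if value == "":
        return None  # assumption: empty cells represent missing values
    if type_name == "INTEGER":
        return int(value)
    if type_name == "FLOAT":
        return float(value)
    if type_name == "BOOLEAN":
        if value not in ("true", "false"):
            raise ValueError(f"unexpected boolean value: {value!r}")
        return value == "true"
    if type_name in ("JSON ARRAY", "JSON OBJECT"):
        return json.loads(value)
    if type_name == "TIMESTAMP":
        # Older Python versions do not accept a trailing "Z" in fromisoformat,
        # so it is rewritten as an explicit UTC offset first.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    return value  # STRING and anything unrecognized stay as text


posted_at = parse_cell("2025-05-04T00:24:27Z", "TIMESTAMP")
```

`datetime.fromisoformat` covers the timestamp shape shown in the examples on this page; if an export contains other valid RFC 3339 variants, a dedicated parser may be needed.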
***

### 4. Entity Relationships

```
feedbacks.csv (ID)
 ├── messages.csv (Feedback ID → feedbacks.ID)
 ├── sentences.csv (Feedback ID → feedbacks.ID)
 ├── opportunities.csv (Feedback ID → feedbacks.ID)
 └── areas.csv (Feedback ID → feedbacks.ID)

areas.csv (Area ID)
 └── area_opportunities.csv (area_id → areas.Area ID)

opportunities.csv (Opportunity ID)
 └── area_opportunities.csv (opportunity_id → opportunities.Opportunity ID)

collections.csv (collection_id)
 └── related_id → areas.Area ID (when collection_type = area_interest)
```
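To show how the junction file ties the diagram together, here is a small in-memory join from Areas to Opportunities through `area_opportunities.csv`. The function name and sample rows are hypothetical (the opportunity name and the UUID-like placeholders are made up); in a warehouse the same logic would normally be a SQL join over the loaded tables.

```python
from collections import defaultdict


def opportunities_by_area(area_rows, opportunity_rows, junction_rows):
    """Map each Area Name to the sorted Opportunity Names linked to it."""
    area_names = {row["Area ID"]: row["Area Name"] for row in area_rows}
    opp_names = {row["Opportunity ID"]: row["Opportunity Name"] for row in opportunity_rows}
    grouped = defaultdict(set)  # sets absorb the repeated IDs in feedback-level rows
    for link in junction_rows:
        area = area_names.get(link["area_id"])
        opp = opp_names.get(link["opportunity_id"])
        if area is not None and opp is not None:  # skip links with no matching row
            grouped[area].add(opp)
    return {area: sorted(opps) for area, opps in grouped.items()}


# Hypothetical rows using the documented column names.
areas = [
    {"Area ID": "area-1", "Area Name": "Crédito | Limite Extra", "Feedback ID": "abc123"},
    {"Area ID": "area-1", "Area Name": "Crédito | Limite Extra", "Feedback ID": "def456"},
]
opportunities = [
    {"Opportunity ID": "opp-1", "Opportunity Name": "Raise limit faster", "Feedback ID": "abc123"},
]
junction = [
    {"opportunity_id": "opp-1", "area_id": "area-1"},
    {"opportunity_id": "opp-9", "area_id": "area-1"},  # opportunity not in this batch
]
linked = opportunities_by_area(areas, opportunities, junction)
```

Because `opportunities.csv` is an incremental file, a single export may not contain every Opportunity referenced by the junction file; joining against the accumulated warehouse table rather than one batch avoids dropping those links.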

