
# Data Schema Definitions for Birdie Ingestion

To ensure seamless data ingestion and high-quality insights, all data imported into Birdie—regardless of the source—must adhere to our standardized data schemas.

Whether you are connecting via Cloud Storage (AWS S3, Azure Blob Storage, or Google Cloud Storage) or Cloud Data Warehouses (BigQuery, Snowflake, or Databricks), your source data must be structured to match these definitions. While Birdie offers robust ingestion capabilities, this process typically requires your data or engineering team to transform and map your internal records to these schemas to ensure compatibility.

These schemas act as a blueprint, telling Birdie exactly how to interpret each record—whether it is a single customer review, a complex support thread, or a detailed user profile.

***

## 1. Requirements

* Data Preparation: Your data team must ensure that source tables or files match the field names and types outlined in the tables below.
* Storage/Database Access: Provide credentials (IAM roles, Service Accounts, or API keys) with read access to the specific datasets.
* File & Table Formats: Deliver data as Parquet or CSV files, or expose database tables for direct syncing.
* Timestamp Precision: All date fields must follow the RFC 3339 format (e.g., `2023-10-25T14:30:00Z`) for accurate chronological analysis.
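
As a minimal sketch of the timestamp requirement, the helper below renders a Python `datetime` as an RFC 3339 UTC string. The function name `to_rfc3339` is illustrative, and treating naive datetimes as UTC is an assumption; adjust it if your source system stores local time.

```python
from datetime import datetime, timedelta, timezone


def to_rfc3339(dt: datetime) -> str:
    """Render a datetime as an RFC 3339 UTC timestamp with a trailing 'Z'."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are already expressed in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# A local time at UTC-03:00 is converted to UTC before formatting.
local = datetime(2023, 10, 25, 11, 30, 0, tzinfo=timezone(timedelta(hours=-3)))
print(to_rfc3339(local))  # 2023-10-25T14:30:00Z
```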

***

## 2. Understanding Record Types

Birdie categorizes data into three primary record types ("Kinds"). Choosing the correct Kind during setup is crucial, as it determines how Birdie processes, analyzes, and presents your data.

**Feedback:** Represents a single-point interaction — one person, one moment, one piece of input. A Feedback record typically contains a text comment, a numerical rating, and metadata. This Kind has two sub-types: `Review`, for general product or service reviews, and `NPS/CSAT`, optimized for structured survey responses where the rating is the primary signal and the text comment may be absent.

> Examples: App Store reviews, Google Play reviews, NPS surveys, CSAT responses, star ratings.

**Conversation:** Represents a multi-turn interaction. This Kind of record is denormalized — it groups multiple messages or events under a single `conversation_id`, capturing the full lifecycle of an exchange. A Conversation has four sub-types: `Support Ticket`, for customer service threads; `Complaint`, for grievance tracking and resolution workflows; `Social Media Post`, for public threads from platforms like Facebook, X, or Reddit; and `Issue`, for bug reports and development tasks from tools like Jira or GitHub.

> Examples: Support tickets with several agent replies, complaint negotiations, social media threads, Jira issues.

**Account:** Defines the "Who" behind the data. Account records contain customer profiles, behavioral attributes, or subscription data used for advanced segmentation and cross-referencing. Every Feedback and every Conversation record must be associated with exactly one Account, establishing the link between what was said and who said it.

<figure><img src="/files/sVNFvS5EvGT9eEAiwltm" alt="Diagram showing the hierarchy of record types: Account connects to Feedback and Conversation, which branch into their respective sub-types."><figcaption><p>Record Types Hierarchy</p></figcaption></figure>

***

## 3. Data Schemas

#### Feedbacks // Review

Used for public or private reviews of products or services. Each row represents a single review entry.

<table data-header-hidden><thead><tr><th width="122.046875"></th><th width="96.56640625"></th><th width="102.47265625"></th><th></th><th></th><th></th></tr></thead><tbody><tr><td><strong>Column Name</strong></td><td><strong>Type</strong></td><td><strong>Required</strong></td><td><strong>Description</strong></td><td><strong>Example</strong></td><td><strong>Impact</strong></td></tr><tr><td><code>feedback_id</code></td><td>STRING</td><td>Yes</td><td>Unique identifier for each review.</td><td>"rev-2024-abc123"</td><td>❌ Ingestion fails. Duplicates overwrite silently.</td></tr><tr><td><code>text</code></td><td>STRING</td><td>No</td><td>The actual text or comment posted by the user. May be null for rating-only feedbacks (e.g., star ratings without comments).</td><td>"Great app, deposits are instant!"</td><td>⚠️ No Signals, Sentiment, Intentions, Keyword Search, Skye, or Opportunities. Only <code>rating</code> feeds Dashboard metrics.</td></tr><tr><td><code>posted_at</code></td><td>STRING</td><td>Yes</td><td>When the feedback was posted (RFC 3339 timestamp format).</td><td>"2024-06-15T14:30:00Z"</td><td>❌ Ingestion fails. Drives all time-series charts, date filters, and Initiative impact tracking.</td></tr><tr><td><code>rating</code></td><td>FLOAT</td><td>Yes</td><td>A numerical rating or score associated with the feedback.</td><td>4</td><td>❌ Ingestion fails. Drives Review AVG Rating and Satisfied/Unsatisfied counts in Dashboards.</td></tr><tr><td><code>author_id</code></td><td>STRING</td><td>No</td><td>The author's email address, used as a unique identifier for cross-referencing across record types.</td><td>"author@example.com"</td><td>⚠️ No per-author cross-reference with other record types.</td></tr><tr><td><code>account_id</code></td><td>STRING</td><td>No</td><td>Unique identifier for the account the record belongs to.</td><td>"acct-abc123"</td><td>⚠️ No cross-source analysis. 
Account-level Segments and Opportunity Prevalence Rate exclude this record.</td></tr><tr><td><code>language</code></td><td>STRING</td><td>No</td><td>Language of the record expressed as a BCP 47 code.</td><td>"pt-BR"</td><td>⚠️ Defaults to auto-detection. Sentiment, Intentions, and Signal accuracy degrade. Use BCP 47 codes only.</td></tr><tr><td><code>title</code></td><td>STRING</td><td>No</td><td>The title of the feedback as provided by the author.</td><td>"Best banking app!"</td><td>⚠️ No headline in Exploration UI. Not processed by NLP — no impact on Sentiment/Signals.</td></tr><tr><td><code>category</code></td><td>STRING</td><td>No</td><td>The specific category the review belongs to.</td><td>"Finance"</td><td>⚠️ Category-based filtering unavailable in Areas, Segments, Reasons, and Dashboards.</td></tr><tr><td><code>owner</code></td><td>STRING</td><td>No</td><td>Indicates the entity owner, e.g., "Owner" or "Competitor".</td><td>"Nubank"</td><td>⚠️ Cannot filter by product/brand or compare Owner vs. Competitor in Exploration and Dashboards.</td></tr><tr><td><code>source</code></td><td>STRING</td><td>No</td><td>A user-customizable label used for grouping feedbacks (e.g., "G2", "App Store").</td><td>"App Store BR"</td><td>⚠️ Defaults to <code>"api"</code>. Source filter, header metrics, and Dashboard/Area/Segment breakdowns by source lose meaning.</td></tr></tbody></table>
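
Before uploading, a data team could run a pre-flight check along the following lines. This is a sketch, not Birdie's own validation: `check_review_rows` is a hypothetical helper, rows are assumed to be plain dictionaries, and `datetime.fromisoformat` is used as an approximate stand-in for a strict RFC 3339 parser.

```python
from datetime import datetime

# Fields marked "Yes" in the Required column of the Review schema.
REQUIRED = ("feedback_id", "posted_at", "rating")


def check_review_rows(rows):
    """Return (row_index, problem) tuples; an empty list means no problems found."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        for field in REQUIRED:
            if row.get(field) in (None, ""):
                problems.append((i, f"missing {field}"))
        fid = row.get("feedback_id")
        if fid:
            # Duplicate IDs overwrite silently on ingestion, so flag them here.
            if fid in seen_ids:
                problems.append((i, f"duplicate feedback_id {fid}"))
            seen_ids.add(fid)
        ts = row.get("posted_at")
        if ts:
            try:
                # fromisoformat() rejects a trailing "Z" before Python 3.11.
                datetime.fromisoformat(ts.replace("Z", "+00:00"))
            except ValueError:
                problems.append((i, f"posted_at is not a valid timestamp: {ts}"))
    return problems


rows = [
    {"feedback_id": "rev-2024-abc123", "text": "Great app, deposits are instant!",
     "posted_at": "2024-06-15T14:30:00Z", "rating": 4.0},
    {"feedback_id": "rev-2024-abc123", "posted_at": "15/06/2024", "rating": 5.0},
    {"feedback_id": "rev-2024-abc124", "posted_at": "2024-06-16T09:00:00Z"},
]
for index, problem in check_review_rows(rows):
    print(index, problem)
```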

#### Feedbacks // NPS and CSAT

Optimized for survey responses where the score is the primary signal. Each row represents a single survey answer, and the text comment is often absent.

<table data-header-hidden><thead><tr><th></th><th width="98.7578125"></th><th width="102.359375"></th><th></th><th></th><th></th></tr></thead><tbody><tr><td><strong>Column Name</strong></td><td><strong>Type</strong></td><td><strong>Required</strong></td><td><strong>Description</strong></td><td><strong>Example</strong></td><td><strong>Impact</strong></td></tr><tr><td><code>feedback_id</code></td><td>STRING</td><td>Yes</td><td>Unique identifier for each survey answer.</td><td>"nps-2024-q1-001"</td><td>❌ Ingestion fails. Duplicates overwrite silently.</td></tr><tr><td><code>posted_at</code></td><td>STRING</td><td>Yes</td><td>When the feedback was posted (RFC 3339 timestamp format).</td><td>"2024-03-10T09:00:00Z"</td><td>❌ Ingestion fails.</td></tr><tr><td><code>rating</code></td><td>FLOAT</td><td>Yes</td><td>The numerical score (e.g., NPS 0-10 or CSAT 1-5).</td><td><code>9</code> (NPS) / <code>4</code> (CSAT)</td><td>❌ Ingestion fails. Drives NPS Score, CSAT Score, Promoter/Detractor counts, and all survey Dashboard metrics. ⚠️ Wrong scale (e.g., NPS as 1–5): accepted, but all NPS metrics become meaningless.</td></tr><tr><td><code>text</code></td><td>STRING</td><td>No</td><td>Qualitative comment or open-ended text posted by the user.</td><td>"I always recommend this service!"</td><td>⚠️ Score metrics work, but no Signals, Sentiment, Intentions, Keyword Search, Skye, or Opportunities — you lose the "why" behind the score.</td></tr><tr><td><code>author_id</code></td><td>STRING</td><td>No</td><td>The author's email address, used as a unique identifier for linking the response to a specific respondent.</td><td>"respondent@example.com"</td><td>⚠️ Cannot link response to a specific respondent.</td></tr><tr><td><code>account_id</code></td><td>STRING</td><td>Yes</td><td>Unique identifier for the account the record belongs to.</td><td>"acc-xyz"</td><td>⚠️ No cross-source analysis (e.g., "tickets from NPS detractors"). 
Account-level Segments exclude this record.</td></tr><tr><td><code>language</code></td><td>STRING</td><td>No</td><td>Language of the record expressed as a BCP 47 code.</td><td>"pt-BR"</td><td>⚠️ Defaults to auto-detection. Sentiment and Signal accuracy degrade.</td></tr><tr><td><code>title</code></td><td>STRING</td><td>No</td><td>The name or title of the survey.</td><td>"[NPS 2024-Q1] Recommendation Survey"</td><td>⚠️ Cannot filter by survey wave in Exploration or Dashboards.</td></tr><tr><td><code>source</code></td><td>STRING</td><td>No</td><td>A user-customizable label for grouping feedbacks.</td><td>"NPS Q1 2024"</td><td>⚠️ Defaults to <code>"api"</code>. Source filter and Dashboard breakdowns by source lose meaning.</td></tr></tbody></table>

#### Conversations // Support tickets

This schema is designed to capture the full lifecycle of a support interaction. Because support tickets usually consist of multiple replies, Birdie uses a "long format" where each row represents a single message, but all messages in the same thread share a common `conversation_id`.

<table data-header-hidden><thead><tr><th></th><th width="100.8046875"></th><th width="96.74609375"></th><th></th><th></th><th></th></tr></thead><tbody><tr><td><strong>Column Name</strong></td><td><strong>Type</strong></td><td><strong>Required</strong></td><td><strong>Description</strong></td><td><strong>Example</strong></td><td><strong>Impact</strong></td></tr><tr><td><code>conversation_id</code></td><td>STRING</td><td>Yes</td><td>Unique identifier for the entire conversation thread.</td><td>"ticket-2024-001"</td><td>❌ Ingestion fails.</td></tr><tr><td><code>message_id</code></td><td>STRING</td><td>Yes</td><td>Unique identifier for each specific message (must be unique at the account level).</td><td>"msg-001"</td><td>❌ Ingestion fails.</td></tr><tr><td><code>text</code></td><td>STRING</td><td>Yes</td><td>The actual content/text of the message.</td><td>"Hi, I can't log in since yesterday."</td><td>⚠️ No Signals, Sentiment, Intentions, Keyword Search, Skye, Opportunities, or QA Criteria evaluation. HTML in text degrades NLP — strip before sending.</td></tr><tr><td><code>posted_at</code></td><td>STRING</td><td>Yes</td><td>When the message was posted (RFC 3339 timestamp).</td><td>"2024-01-15T10:05:00Z"</td><td>❌ Ingestion fails. Drives Conversation Features (Response Times, Duration), message ordering, and time-series charts. Identical timestamps = broken ordering.</td></tr><tr><td><code>author_id</code></td><td>STRING</td><td>Yes</td><td>The author's email address. Must be an email to ensure proper agent recognition, QA evaluation, and cross-referencing within Birdie.</td><td>"agent@company.com"</td><td>⚠️ QA Agent page empty — no per-agent Quality Score, Manual Evaluation, or Agent Feedback. Agent Handoffs unavailable.</td></tr><tr><td><code>account_id</code></td><td>STRING</td><td>Yes</td><td>Identifier for the account the message belongs to.</td><td>"acc-xyz"</td><td>⚠️ No cross-source analysis. 
Account-level Segments exclude this conversation.</td></tr><tr><td><code>language</code></td><td>STRING</td><td>No</td><td>Language of the message as a BCP 47 code.</td><td>"pt-BR"</td><td>⚠️ Defaults to auto-detection. Sentiment, Signal, and QA Criteria accuracy degrade.</td></tr><tr><td><code>subject</code></td><td>STRING</td><td>No</td><td>The subject line of the support ticket.</td><td>"Can't reset my password"</td><td>⚠️ No conversation title in Exploration. Not processed by NLP.</td></tr><tr><td><code>status</code></td><td>STRING</td><td>No</td><td>Current status of the ticket (e.g., "open", "pending", "closed").</td><td>"solved"</td><td>⚠️ No status filter/breakdown in Dashboards, Areas, Segments, or Reasons.</td></tr><tr><td><code>priority</code></td><td>STRING</td><td>No</td><td>Priority assigned to the ticket (e.g., "urgent", "low").</td><td>"high"</td><td>⚠️ No priority filter/breakdown in Dashboards, Areas, Segments, or Reasons.</td></tr><tr><td><code>channel</code></td><td>STRING</td><td>No</td><td>Source channel of the ticket (e.g., "web", "email", "chat").</td><td>"email"</td><td>⚠️ No channel filter/breakdown in Dashboards, Areas, Segments, or Reasons.</td></tr><tr><td><code>tags</code></td><td>REPEATED STRING</td><td>No</td><td>An array/list of tags applied to the ticket.</td><td>["billing", "urgent"]</td><td>⚠️ No tag-based filtering in Areas, Segments, or Reasons.</td></tr><tr><td><code>author_type</code></td><td>STRING</td><td>Yes</td><td>The role of the author, specifically one of {"Bot", "Agent", "User"}.</td><td>"Agent"</td><td>⚠️ <strong>Highest-impact optional field.</strong> All Conversation Features (50+ fields) = zero. QA Criteria produce false positives. QA Agent page broken. 
Values: <code>Customer</code>/<code>User</code>, <code>Agent</code>/<code>Internal Person</code>, <code>Bot</code>.</td></tr><tr><td><code>survey_title</code></td><td>STRING</td><td>No</td><td>Title of the survey presented upon ticket closure.</td><td>"Post-call CSAT"</td><td>⚠️ Survey name not displayed. No feature impact.</td></tr><tr><td><code>survey_type</code></td><td>STRING</td><td>Yes</td><td>Type of the closing survey, specifically one of {"csat", "nps"}.</td><td>"csat"</td><td>⚠️ TCSAT/TNPS Dashboard metrics cannot be calculated. Upload on one message per conversation only.</td></tr><tr><td><code>rating</code></td><td>FLOAT</td><td>Yes</td><td>The numerical rating provided by the client for the support experience.</td><td>8</td><td>⚠️ TCSAT/TNPS Dashboard metrics cannot be calculated. Upload on one message per conversation only.</td></tr><tr><td><code>solved</code></td><td>STRING</td><td>Yes</td><td>Flag indicating if the ticket was resolved, specifically one of {"true", "false"}.</td><td>"true"</td><td>⚠️ No resolution filter/breakdown in Dashboards. No native Resolution Rate metric — filter dimension only.</td></tr><tr><td><code>source</code></td><td>STRING</td><td>Yes</td><td>A user-customizable label for grouping (e.g., "Zendesk", "Intercom").</td><td>"Zendesk"</td><td>⚠️ Defaults to <code>"api"</code>. 
Source filter, header metrics, and all breakdowns by source lose meaning.</td></tr><tr><td><code>agent_team</code></td><td>STRING</td><td>Yes</td><td>The name of the support agent's team.</td><td>"Billing Team"</td><td>⚠️ No team-level filtering in QA or Dashboards.</td></tr><tr><td><code>agent_company</code></td><td>STRING</td><td>Yes</td><td>The name of the support agent's company.</td><td>"Atento BPO"</td><td>⚠️ No BPO vendor comparison in QA or Dashboards.</td></tr><tr><td><code>agent_supervisor_id</code></td><td>STRING</td><td>Yes</td><td>The email address of the support agent's supervisor, used for supervisor-level rollups and QA reporting.</td><td>"supervisor@company.com"</td><td>⚠️ Supervisor-level rollups unavailable in QA and Dashboards.</td></tr><tr><td><code>agent_experience</code></td><td>STRING</td><td>Yes</td><td>The maturity or experience level of the agent (Enum).</td><td>"senior"</td><td>⚠️ No seniority vs. quality correlation in QA or Dashboards.</td></tr></tbody></table>
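
Two notes in the table above (HTML in `text` degrades NLP, and identical timestamps break message ordering) can be checked before upload. The sketch below uses only the Python standard library; `strip_html` and `timestamp_ties` are hypothetical helper names, and the simple extractor keeps any `<script>` or `<style>` contents as text, so heavier markup may need a dedicated sanitizer.

```python
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and discards tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities like &#39;
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text):
    """Return the text with tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()
    return " ".join("".join(parser.parts).split())


def timestamp_ties(rows):
    """Return (conversation_id, posted_at) pairs shared by more than one message."""
    counts = Counter((r["conversation_id"], r["posted_at"]) for r in rows)
    return sorted(key for key, n in counts.items() if n > 1)


messages = [
    {"conversation_id": "ticket-2024-001", "message_id": "msg-001",
     "posted_at": "2024-01-15T10:05:00Z"},
    {"conversation_id": "ticket-2024-001", "message_id": "msg-002",
     "posted_at": "2024-01-15T10:05:00Z"},
    {"conversation_id": "ticket-2024-002", "message_id": "msg-003",
     "posted_at": "2024-01-15T10:05:00Z"},
]
print(strip_html("<p>Hi, I can&#39;t log in since <b>yesterday</b>.</p>"))
print(timestamp_ties(messages))
```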

> Pro-Tip for Data Teams:
>
> * To ensure data consistency, upload only one row per conversation containing the survey fields (`survey_type`, `rating`, etc.). This should ideally be the final message of the ticket.
> * For all other messages in the same thread, ensure the ticket-level fields (like `subject`, `status`, and `priority`) remain consistent across rows.
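
The long format and the survey-row guidance above can be sketched as a small flattening step. The nested `ticket` shape (keys such as `messages`, `body`, `sent_at`, and `survey`) is a hypothetical export format chosen for illustration; only the output column names come from the Support Ticket schema.

```python
def ticket_to_rows(ticket):
    """Flatten one nested ticket into one row per message (long format)."""
    rows = []
    last = len(ticket["messages"]) - 1
    for i, msg in enumerate(ticket["messages"]):
        row = {
            "conversation_id": ticket["id"],  # shared by every row in the thread
            "message_id": msg["id"],
            "text": msg["body"],
            "posted_at": msg["sent_at"],
            "author_id": msg["email"],
            "author_type": msg["role"],
            # Ticket-level fields repeat unchanged on every row.
            "subject": ticket["subject"],
            "status": ticket["status"],
            "priority": ticket["priority"],
        }
        # Survey fields go on a single row only: the final message.
        if i == last and ticket.get("survey"):
            row["survey_type"] = ticket["survey"]["type"]
            row["rating"] = ticket["survey"]["score"]
        rows.append(row)
    return rows


ticket = {
    "id": "ticket-2024-001",
    "subject": "Can't reset my password",
    "status": "solved",
    "priority": "high",
    "survey": {"type": "csat", "score": 4.0},
    "messages": [
        {"id": "msg-001", "body": "Hi, I can't log in since yesterday.",
         "sent_at": "2024-01-15T10:05:00Z", "email": "customer@example.com",
         "role": "User"},
        {"id": "msg-002", "body": "I've sent you a reset link.",
         "sent_at": "2024-01-15T10:12:00Z", "email": "agent@company.com",
         "role": "Agent"},
    ],
}
long_rows = ticket_to_rows(ticket)
```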

#### Conversations // Complaints

The Complaints schema is specialized for tracking and analyzing customer grievances. Similar to support tickets, it supports multiple interactions grouped by a single conversation ID to capture the negotiation or resolution process.

<table data-header-hidden><thead><tr><th></th><th width="99.421875"></th><th width="102.18359375"></th><th></th><th></th><th></th></tr></thead><tbody><tr><td><strong>Column Name</strong></td><td><strong>Type</strong></td><td><strong>Required</strong></td><td><strong>Description</strong></td><td><strong>Example</strong></td><td><strong>Impact</strong></td></tr><tr><td><code>conversation_id</code></td><td>STRING</td><td>Yes</td><td>Unique identifier for the specific complaint thread.</td><td>"complaint-2024-001"</td><td>❌ Ingestion fails.</td></tr><tr><td><code>message_id</code></td><td>STRING</td><td>Yes</td><td>Unique identifier for each message (must be unique at the account level).</td><td>"cmsg-001"</td><td>❌ Ingestion fails.</td></tr><tr><td><code>text</code></td><td>STRING</td><td>Yes</td><td>The actual content/text of the complaint message.</td><td>"My order hasn't arrived and it's been 15 days already."</td><td>⚠️ No Signals, Sentiment, Intentions, Keyword Search, Skye, or Opportunities.</td></tr><tr><td><code>posted_at</code></td><td>STRING</td><td>Yes</td><td>When the message was posted (RFC 3339 timestamp).</td><td>"2024-02-10T08:00:00Z"</td><td>❌ Ingestion fails. Drives Conversation Features and time-series charts.</td></tr><tr><td><code>author_id</code></td><td>STRING</td><td>No</td><td>The author's email address, used as a unique identifier for the complainant.</td><td>"customer@example.com"</td><td>⚠️ Cannot identify complainant. No cross-reference.</td></tr><tr><td><code>account_id</code></td><td>STRING</td><td>Yes</td><td>Identifier for the account the message belongs to.</td><td>"acct-456"</td><td>⚠️ No cross-source analysis. Account-level Segments exclude this complaint.</td></tr><tr><td><code>language</code></td><td>STRING</td><td>No</td><td>Language of the message expressed as a BCP 47 code.</td><td>"pt-BR"</td><td>⚠️ Defaults to auto-detection. 
Sentiment and Signal accuracy degrade.</td></tr><tr><td><code>category</code></td><td>STRING</td><td>No</td><td>A classification for segmenting complaints (e.g., "Support", "Shipping", "Billing").</td><td>"Delivery"</td><td>⚠️ No category filter in Areas, Segments, Reasons, or Dashboards. NLP topics still work from <code>text</code>.</td></tr><tr><td><code>status</code></td><td>STRING</td><td>No</td><td>Current state of negotiations (e.g., pending, initiated, ongoing, or solved).</td><td>"resolved"</td><td>⚠️ No resolution status filter/breakdown in Dashboards, Areas, or Segments.</td></tr><tr><td><code>url</code></td><td>STRING</td><td>No</td><td>Direct link to the complaint if it originates from a public source.</td><td>"https://reclameaqui.com.br/complaint/12345"</td><td>⚠️ No click-through to source. No analysis impact.</td></tr><tr><td><code>rating</code></td><td>FLOAT</td><td>Yes</td><td>The score the client gave specifically to the complaint negotiation experience.</td><td>3.0</td><td>⚠️ Complaint satisfaction score unavailable in Dashboards.</td></tr><tr><td><code>author_type</code></td><td>STRING</td><td>Yes</td><td>The role of the author, specifically one of {"Internal Person", "User", "Bot"}.</td><td>"User"</td><td>⚠️ All Conversation Features (50+ fields) empty. Cannot separate customer vs. company messages. QA broken.</td></tr><tr><td><code>source</code></td><td>STRING</td><td>Yes</td><td>A user-customizable label for grouping (e.g., "Public Forum", "Direct Email").</td><td>"Reclame Aqui"</td><td>⚠️ Defaults to <code>"api"</code>. Source filter and all breakdowns by source lose meaning.</td></tr></tbody></table>

#### Conversations // Social Media Post

Used for threads and interactions from social platforms like Facebook, X (Twitter), or Reddit.

<table data-header-hidden><thead><tr><th></th><th width="101.34375"></th><th width="97.94140625"></th><th width="128.42578125"></th><th></th><th></th></tr></thead><tbody><tr><td><strong>Column Name</strong></td><td><strong>Type</strong></td><td><strong>Required</strong></td><td><strong>Description</strong></td><td><strong>Example</strong></td><td><strong>Impact</strong></td></tr><tr><td><code>conversation_id</code></td><td>STRING</td><td>Yes</td><td>Unique identifier for the post or thread.</td><td>"reddit-post-001"</td><td>❌ Ingestion fails.</td></tr><tr><td><code>message_id</code></td><td>STRING</td><td>Yes</td><td>Unique identifier for the specific post or comment.</td><td>"rmsg-001"</td><td>❌ Ingestion fails.</td></tr><tr><td><code>text</code></td><td>STRING</td><td>Yes</td><td>Content of the message.</td><td>"The new update broke the search feature."</td><td>⚠️ No Signals, Sentiment, Intentions, Keyword Search, Skye, Opportunities, or Social Media Unsatisfied Count.</td></tr><tr><td><code>posted_at</code></td><td>STRING</td><td>Yes</td><td>When the message was posted (RFC 3339 timestamp).</td><td>"2024-04-01T12:00:00Z"</td><td>❌ Ingestion fails.</td></tr><tr><td><code>author_id</code></td><td>STRING</td><td>No</td><td>The author's email address, used as a unique identifier for cross-referencing.</td><td>"user@example.com"</td><td>⚠️ Cannot identify author. No cross-reference.</td></tr><tr><td><code>account_id</code></td><td>STRING</td><td>No</td><td>Identifier for the account the message belongs to.</td><td>"acct-social-001"</td><td>⚠️ No cross-source analysis. Account-level Segments exclude this post.</td></tr><tr><td><code>language</code></td><td>STRING</td><td>No</td><td>Language of the message as a BCP 47 code.</td><td>"en"</td><td>⚠️ Defaults to auto-detection. 
Sentiment and Signal accuracy degrade.</td></tr><tr><td><code>title</code></td><td>STRING</td><td>No</td><td>Title of the original social media post.</td><td>"Search broken after update"</td><td>⚠️ No headline in Exploration. Not processed by NLP.</td></tr><tr><td><code>owner</code></td><td>STRING</td><td>No</td><td>Indicates the entity owner: Owner or Competitor.</td><td>"Owner"</td><td>⚠️ Cannot distinguish owned vs. competitor mentions in Exploration or Dashboards.</td></tr><tr><td><code>category</code></td><td>STRING</td><td>No</td><td>Segment or sub-grouping (e.g., a specific Subreddit name).</td><td>"r/BirdieApp"</td><td>⚠️ No sub-group filter in Areas, Segments, Reasons, or Dashboards.</td></tr><tr><td><code>url</code></td><td>STRING</td><td>No</td><td>URL of the social post.</td><td>"https://reddit.com/r/BirdieApp/post/001"</td><td>⚠️ No click-through to source. No analysis impact.</td></tr><tr><td><code>channel</code></td><td>STRING</td><td>No</td><td>The source platform (e.g., "facebook", "reddit").</td><td>"reddit"</td><td>⚠️ No platform filter/breakdown in Dashboards, Areas, Segments, or Reasons.</td></tr><tr><td><code>tags</code></td><td>REPEATED STRING</td><td>No</td><td>Array of tags applied to the post.</td><td>["bug", "feature-request"]</td><td>⚠️ No tag-based filtering in Areas, Segments, or Reasons.</td></tr><tr><td><code>author_type</code></td><td>STRING</td><td>No</td><td>The role of the author, specifically one of {"Internal Person", "User", "Bot"}.</td><td>"User"</td><td>⚠️ All Conversation Features (50+ fields) empty. Cannot distinguish brand vs. user posts.</td></tr><tr><td><code>upvotes</code></td><td>INTEGER</td><td>No</td><td>The number of likes or upvotes the message has received.</td><td>22</td><td>⚠️ Engagement metrics unavailable. Cannot prioritize by popularity.</td></tr><tr><td><code>source</code></td><td>STRING</td><td>Yes</td><td>A user-customizable label for grouping.</td><td>"Reddit"</td><td>⚠️ Defaults to <code>"api"</code>. 
Source filter and all breakdowns by source lose meaning.</td></tr></tbody></table>

#### Conversations // Issue

Used for tracking bug reports, development tasks, or tickets from platforms like Jira or GitHub.

<table data-header-hidden><thead><tr><th></th><th width="101.4140625"></th><th width="103.4296875"></th><th></th><th></th><th></th></tr></thead><tbody><tr><td><strong>Column Name</strong></td><td><strong>Type</strong></td><td><strong>Required</strong></td><td><strong>Description</strong></td><td><strong>Example</strong></td><td><strong>Impact</strong></td></tr><tr><td><code>conversation_id</code></td><td>STRING</td><td>Yes</td><td>Unique identifier for the issue thread.</td><td>"issue-001"</td><td>❌ Ingestion fails.</td></tr><tr><td><code>message_id</code></td><td>STRING</td><td>Yes</td><td>Unique identifier for each individual update or comment.</td><td>"imsg-001"</td><td>❌ Ingestion fails.</td></tr><tr><td><code>text</code></td><td>STRING</td><td>Yes</td><td>The content of the issue description or comment.</td><td>"Login button unresponsive on mobile Safari."</td><td>⚠️ No Signals, Sentiment, Intentions, Keyword Search, Skye, or Opportunities.</td></tr><tr><td><code>posted_at</code></td><td>STRING</td><td>Yes</td><td>When the record was created/posted (RFC 3339 timestamp).</td><td>"2024-03-15T09:00:00Z"</td><td>❌ Ingestion fails.</td></tr><tr><td><code>author_id</code></td><td>STRING</td><td>No</td><td>The author's email address, used as a unique identifier for cross-referencing.</td><td>"developer@company.com"</td><td>⚠️ Cannot identify reporter. No cross-reference.</td></tr><tr><td><code>account_id</code></td><td>STRING</td><td>No</td><td>Identifier for the account associated with the issue.</td><td>"acct-789"</td><td>⚠️ No cross-source analysis. Account-level Segments exclude this issue.</td></tr><tr><td><code>language</code></td><td>STRING</td><td>No</td><td>Language of the message as a BCP 47 code.</td><td>"en"</td><td>⚠️ Defaults to auto-detection. 
Sentiment and Signal accuracy degrade.</td></tr><tr><td><code>project_id</code></td><td>STRING</td><td>No</td><td>Unique identifier for the project or repository.</td><td>"birdie/platform"</td><td>⚠️ Cannot segment by project in Areas, Segments, or Dashboards.</td></tr><tr><td><code>project_name</code></td><td>STRING</td><td>No</td><td>Human-readable name of the project.</td><td>"Platform"</td><td>⚠️ Shows raw ID only in Exploration — not user-friendly.</td></tr><tr><td><code>title</code></td><td>STRING</td><td>No</td><td>The title or headline of the issue.</td><td>"Login button unresponsive on mobile"</td><td>⚠️ No headline in Exploration. Not processed by NLP.</td></tr><tr><td><code>status</code></td><td>STRING</td><td>No</td><td>Current status (e.g., "To Do", "In Progress", "Done").</td><td>"in_progress"</td><td>⚠️ No status filter/breakdown in Dashboards, Areas, or Segments.</td></tr><tr><td><code>source</code></td><td>STRING</td><td>Yes</td><td>A user-customizable label for grouping.</td><td>"GitHub"</td><td>⚠️ Defaults to <code>"api"</code>. Source filter and all breakdowns by source lose meaning.</td></tr></tbody></table>

#### Accounts

The foundation for customer profile data and behavior-based segmentation.

| **Column Name** | **Type** | **Required** | **Description**                                                 | **Example**            | **Impact**                                                                                                                                                                 |
| --------------- | -------- | ------------ | --------------------------------------------------------------- | ---------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `account_id`    | STRING   | Yes          | Unique identifier for the account.                              | "acct-gurgel-001"      | ❌ Ingestion fails. If value doesn't match `account_id` in Feedback/Messages: cross-source analysis, Account-level Segments, and Opportunity Prevalence Rate silently fail. |
| `update_id`     | STRING   | Yes          | Unique timestamp used to split chat posts (RFC 3339 timestamp). | "2024-02-10T08:00:00Z" | ❌ This prevents ingestion in the case of very large files.                                                                                                                 |
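
Because a mismatched `account_id` fails silently, it can help to compare identifiers across files before upload. The sketch below assumes both datasets are already loaded as lists of dictionaries; `unmatched_account_ids` is a hypothetical helper, and the comparison is an exact string match, so differences in case or whitespace count as mismatches.

```python
def unmatched_account_ids(accounts, records):
    """Return account_id values used in records but absent from the Account file."""
    known = {a["account_id"] for a in accounts}
    return sorted(
        {
            r["account_id"]
            for r in records
            if r.get("account_id") and r["account_id"] not in known
        }
    )


accounts = [{"account_id": "acct-abc123"}, {"account_id": "acct-456"}]
feedback = [
    {"feedback_id": "rev-1", "account_id": "acct-abc123"},
    {"feedback_id": "rev-2", "account_id": "ACCT-456"},  # case differs from the Account file
    {"feedback_id": "rev-3"},                            # no account_id supplied
]
print(unmatched_account_ids(accounts, feedback))  # ['ACCT-456']
```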

***

## 4. Custom Fields

Birdie allows for unlimited and flexible custom fields. If your source data contains information not covered by the standard schemas above (e.g., "Plan Type," "User Region," or "Churn Risk"), you can ingest them by providing the following mapping:

* Source/Destination Mapping: Link your source column to a Birdie field.
* Data Type: Set as `String`, `Bool`, `Number`, `Date`, `Datetime`, `Enum`, or `Unique`.
* Friendly Label: The name displayed to your team in the Birdie UI.
* List Support: A toggle to indicate if the field should accept a single value or a list of values.
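
One way to keep these four settings together while preparing a mapping is a small record type. This is an illustrative sketch only: the `CustomFieldMapping` class and its attribute names are assumptions for planning purposes and do not represent a Birdie API or file format. Only the list of data types comes from the text above.

```python
from dataclasses import dataclass

# Data types listed in the Custom Fields section.
ALLOWED_TYPES = {"String", "Bool", "Number", "Date", "Datetime", "Enum", "Unique"}


@dataclass(frozen=True)
class CustomFieldMapping:
    source_column: str       # column name in your table or file
    destination_field: str   # field name on the Birdie side
    data_type: str           # one of ALLOWED_TYPES
    label: str               # friendly label shown in the UI
    is_list: bool = False    # True if the field holds a list of values

    def __post_init__(self):
        if self.data_type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported data type: {self.data_type}")


# Hypothetical mappings for the examples mentioned above.
mappings = [
    CustomFieldMapping("plan_type", "plan_type", "Enum", "Plan Type"),
    CustomFieldMapping("user_regions", "user_region", "String", "User Region", is_list=True),
]
```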

## 5. Feature Enablement

This table maps Birdie platform capabilities to the minimum ingested fields required to enable them. Use it to prioritize which fields your data team should focus on based on the features your organization needs.

| **Capability**                                | **Minimum Required Fields**                                                                      | **Schema**               | **What It Enables**                                                                                             |
| --------------------------------------------- | ------------------------------------------------------------------------------------------------ | ------------------------ | --------------------------------------------------------------------------------------------------------------- |
| NPS Score calculation                         | `kind=nps` + `rating` (0–10 scale)                                                               | Feedback                 | NPS Score, Promoter/Passive/Detractor counts, NPS Potential Improvement in Dashboards.                          |
| CSAT Score calculation                        | `kind=csat` + `rating` (1–5 scale)                                                               | Feedback                 | CSAT Score (% with 4–5), CSAT AVG Rating, Satisfied/Unsatisfied counts in Dashboards.                           |
| Review metrics                                | `kind=review` + `rating` + `owner`                                                               | Feedback                 | Review AVG Rating, Satisfied/Unsatisfied counts, Owner vs. Competitor filtering.                                |
| Ticket satisfaction (TCSAT/TNPS)              | `survey_type` (`csat` or `nps`) + `rating` on one message per conversation                       | Conv. Message            | TCSAT Score, TNPS Score in Dashboards.                                                                          |
| NLP analysis (Signals, Sentiment, Intentions) | `text` + `language`                                                                              | Feedback / Conv. Message | Sentiment Score, Intention classification, Signal filtering, Keyword Search, Skye AI analysis.                  |
| Opportunity discovery                         | `text` + `posted_at`                                                                             | Feedback / Conv. Message | AI-powered Opportunity identification via Skye and Areas. Trend analysis over time.                             |
| Conversation Features (50+ metrics)           | `posted_at` + `author_type` on every message                                                     | Conv. Message            | First Response Time, Avg Response Time, Conversation Duration, Turn Count, Agent Message Count, Agent Handoffs. |
| QA Criteria evaluation                        | `text` + `author_type` + `author_id`                                                             | Conv. Message            | AI and manual evaluation of agent behavior. Per-agent Quality Score.                                            |
| QA team/vendor analysis                       | `agent_team`, `agent_company`, `agent_supervisor_id`, `agent_experience`                         | Conv. Message            | Team-level QA, BPO vendor comparison, supervisor rollups, seniority correlation.                                |
| Cross-source analysis                         | `account_id` consistent across all record types                                                  | All                      | Correlate NPS detractors with their support tickets, complaints, reviews, etc.                                  |
| Account-level Segments                        | Account record + `account_id` linked in Feedback/Messages                                        | Account                  | Segmentation by industry, plan, lifecycle stage, company size, revenue.                                         |
| Areas and Segments                            | At least one filterable field (`source`, `channel`, `category`, `tags`, `status`, custom fields) | All                      | Topic-based and metadata-based grouping for analysis.                                                           |
| Initiative impact tracking                    | `posted_at` on underlying records + Opportunity defined                                          | Feedback / Conv. Message | Before/after release date comparison on Opportunity timeline charts.                                            |
| Custom field filtering                        | `additional_fields.*` mapped via Ingester config                                                 | All                      | Custom filters in Exploration, Dashboard breakdowns, Segment/Area/Reason conditions.                            |
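
As a rough illustration of the first two rows, the sketch below computes an NPS score and a CSAT score from raw `rating` values. The CSAT rule (share of ratings that are 4 or 5) comes from the table above. The NPS bands (promoters 9–10, passives 7–8, detractors 0–6) are the conventional NPS definition and are assumed here, not confirmed as Birdie's exact implementation. The function names `nps_summary` and `csat_score` are hypothetical.

```python
def nps_summary(ratings):
    """Classify 0-10 ratings and compute NPS as % promoters minus % detractors.

    Assumption: conventional NPS bands (9-10 promoter, 7-8 passive, 0-6 detractor).
    """
    promoters = sum(1 for r in ratings if 9 <= r <= 10)
    passives = sum(1 for r in ratings if 7 <= r <= 8)
    detractors = sum(1 for r in ratings if 0 <= r <= 6)
    total = promoters + passives + detractors
    if total != len(ratings):
        # Some value fell outside every band, so the scale is wrong.
        raise ValueError("NPS ratings must be whole numbers on the 0-10 scale")
    score = 100 * (promoters - detractors) / total if total else None
    return {
        "promoters": promoters,
        "passives": passives,
        "detractors": detractors,
        "score": score,
    }


def csat_score(ratings):
    """Return the percentage of 1-5 ratings that are 4 or 5."""
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("CSAT ratings must be on the 1-5 scale")
    if not ratings:
        return None
    satisfied = sum(1 for r in ratings if r >= 4)
    return 100 * satisfied / len(ratings)


if __name__ == "__main__":
    print(nps_summary([10, 9, 8, 6, 3]))
    print(csat_score([5, 4, 3, 1]))
```

Running a check like this on a sample of source data before ingestion is one way to confirm that each `kind` uses the scale Birdie expects.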

## 6. Common Data Issues

Before uploading data to Birdie, review this checklist to avoid the most frequent integration problems.

| **Issue**                                                                                                                | **Symptom in Birdie**                                                                                                                                | **How to Fix**                                                                                                                                                                   |
| ------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `rating`, `survey_type`, and `solved` are present on every message instead of only the last message of the conversation. | Duplicate survey data skews TCSAT/TNPS metrics: scores appear artificially higher or lower.                                                         | Populate `survey_type`, `rating`, and `solved` on **only one message per conversation** (ideally the last). Leave these fields empty on all other messages.                      |
| `posted_at` is not in RFC 3339 / ISO 8601 format (e.g., `15/06/2024`, `2024-06-15`, `1718451000`).                       | ❌ Ingestion fails (400 error). Record is rejected.                                                                                                   | Always use full ISO 8601 with timezone: `2024-06-15T14:30:00Z`. Convert Brazilian `DD/MM/YYYY` and Unix timestamps before sending.                                               |
| Multiple messages in a single row instead of one message per row.                                                        | Birdie cannot parse individual messages. Conversation structure is lost — no turn-taking, no Conversation Features, no per-message Sentiment.        | Split into one row per message. Each row must have a unique `message_id` and share the same `conversation_id`.                                                                   |
| `author_type` uses values outside the accepted set (e.g., `"System"`, `"Supervisor"`, `"Manager"`).                      | ❌ `"System"` is rejected by validation. Other unrecognized values are also rejected.                                                                 | Use only: `Customer` (or `User`), `Agent` (or `Internal Person`), `Bot`. These are case-insensitive. For automated/system messages, use `Bot` or filter them out before sending. |
| `posted_at` is identical for all messages in a conversation.                                                             | Message ordering is wrong. Conversation summary is incoherent. Conversation Features (Response Times, Duration) = 0.                                 | Use the actual timestamp from the source system for each message. Ensure chronological ordering.                                                                                 |
| `text` field contains raw HTML tags (e.g., `<p>Hello</p><br/>`).                                                         | NLP processes HTML as text — Sentiment, Signal, and topic extraction accuracy degrade.                                                               | Strip all HTML tags before sending. Send plain text only.                                                                                                                        |
| Text encoding is `latin-1` or `Windows-1252` instead of UTF-8.                                                           | Characters appear broken (`nÃ£o` instead of `não`). NLP analysis produces garbage results.                                                           | Convert all files to UTF-8 before uploading. Common with Brazilian data exports.                                                                                                 |
| `account_id` values are inconsistent across record types (e.g., `"123"` in NPS but `"acct-123"` in tickets).             | Cross-source analysis silently fails. Records exist as isolated silos — cannot correlate NPS detractors with their tickets.                          | Use the exact same `account_id` string across all Feedback, Conversation Message, and Account records.                                                                           |
| IDs are sequential integers reused across systems (e.g., Zendesk ticket `1` and Intercom ticket `1`).                    | Records overwrite each other (PUT semantics). Data loss.                                                                                             | Prefix IDs with the source system name: `"zendesk-1"`, `"intercom-1"`.                                                                                                           |
| Boolean fields use `"Sim"/"Não"`, `"Yes"/"No"`, or `"1"/"0"` instead of `true`/`false`.                                  | Custom fields may be rejected or mistyped. `solved` field won't be recognized.                                                                       | Convert to `true` / `false` before sending.                                                                                                                                      |
| `rating` uses the wrong scale (e.g., NPS sent as 1–5 instead of 0–10, or CSAT as 0–100).                                 | Record is accepted with only a logged warning, but Promoter/Detractor classification is wrong and all NPS/CSAT Dashboard metrics become meaningless. | Confirm the expected scale: NPS = 0–10, CSAT = 1–5. Document the scale used in your source system.                                                                               |
| Conversation uploaded without any messages (or messages uploaded without a parent conversation).                         | Conversation is invisible in Birdie. Messages are orphaned. Neither appears in Exploration.                                                          | Always upload both Conversation and Message records together. Validate completeness before sending.                                                                              |
| `language` field uses full words (`"portuguese"`, `"english"`) instead of BCP 47 codes.                                  | BCP 47 parsing fails. Falls back to auto-detection — NLP may use the wrong language model.                                                           | Use BCP 47 codes: `pt-BR`, `en`, `es`, `fr`, etc.                                                                                                                                |
| `additional_fields` contain nested objects (e.g., `{"address": {"city": "SP"}}`).                                        | ❌ Record is rejected. Only flat key-value pairs are accepted.                                                                                        | Flatten nested structures: `{"address_city": "SP"}`.                                                                                                                             |
| `author_type` is missing on all messages.                                                                                | All Conversation Features (50+ fields) = zero. QA Criteria produce false positives. QA Agent page is empty.                                          | Set `author_type` on every message. This is the single most impactful optional field.                                                                                            |
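
Several of the fixes above are simple per-record transformations. The following is a minimal sketch of how a data team might apply them before upload, using only the Python standard library. All function names are hypothetical. The sketch assumes that source timestamps without a timezone are in UTC, which you should verify against your own systems, and the tag-stripping regex is a crude approximation rather than a full HTML parser.

```python
import html
import re
from datetime import datetime, timezone

TRUE_VALUES = {"true", "yes", "sim", "1"}
FALSE_VALUES = {"false", "no", "não", "nao", "0"}


def to_rfc3339(value, source_format=None):
    """Convert Unix seconds, or a string in source_format, to RFC 3339 (UTC)."""
    if isinstance(value, (int, float)):
        dt = datetime.fromtimestamp(value, tz=timezone.utc)
    else:
        # Assumption: naive source timestamps are already in UTC.
        dt = datetime.strptime(value, source_format).replace(tzinfo=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


def strip_html(text):
    """Remove tags, decode entities, and collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", text)
    return " ".join(html.unescape(no_tags).split())


def to_bool(value):
    """Map common yes/no spellings to a real boolean."""
    if isinstance(value, bool):
        return value
    key = str(value).strip().lower()
    if key in TRUE_VALUES:
        return True
    if key in FALSE_VALUES:
        return False
    raise ValueError(f"Unrecognized boolean value: {value!r}")


def flatten(fields, prefix=""):
    """Flatten nested dicts into flat key-value pairs joined by underscores."""
    flat = {}
    for key, value in fields.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def prefixed_id(source, raw_id):
    """Prefix an ID with its source system to avoid cross-system collisions."""
    return f"{source}-{raw_id}"


if __name__ == "__main__":
    print(to_rfc3339("15/06/2024", "%d/%m/%Y"))
    print(strip_html("<p>Hello</p><br/>"))
    print(to_bool("Sim"), to_bool("0"))
    print(flatten({"address": {"city": "SP"}}))
    print(prefixed_id("zendesk", 1))
```

Encoding problems are easier to handle at read time: opening a legacy export with `open(path, encoding="cp1252")` and writing it back with `encoding="utf-8"` converts the file, provided the source encoding is known.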

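Other issues in the checklist only show up when messages are examined per conversation. The sketch below is one possible pre-upload check over message rows represented as dictionaries. The field names and the accepted `author_type` values are taken from the table above; the function name `check_conversations` and the wording of the reported problems are hypothetical, and the check does not replicate Birdie's own validation.

```python
from collections import defaultdict

ACCEPTED_AUTHOR_TYPES = {"customer", "user", "agent", "internal person", "bot"}
SURVEY_FIELDS = ("survey_type", "rating", "solved")


def check_conversations(messages):
    """Return a list of human-readable problems found in message rows."""
    problems = []
    seen_ids = set()
    by_conversation = defaultdict(list)

    for msg in messages:
        message_id = msg.get("message_id")
        by_conversation[msg.get("conversation_id")].append(msg)

        # Each row must carry a unique message_id.
        if message_id in seen_ids:
            problems.append(f"{message_id}: duplicate message_id")
        seen_ids.add(message_id)

        # author_type is compared case-insensitively against the accepted set.
        author = msg.get("author_type")
        if author is None or str(author).strip().lower() not in ACCEPTED_AUTHOR_TYPES:
            problems.append(f"{message_id}: author_type {author!r} is missing or not accepted")

    for conversation_id, msgs in by_conversation.items():
        # Survey fields belong on at most one message per conversation.
        with_survey = [
            m for m in msgs
            if any(m.get(field) not in (None, "") for field in SURVEY_FIELDS)
        ]
        if len(with_survey) > 1:
            problems.append(
                f"{conversation_id}: survey fields set on {len(with_survey)} messages"
            )

        # Identical timestamps break ordering and response-time metrics.
        if len(msgs) > 1 and len({m.get("posted_at") for m in msgs}) == 1:
            problems.append(f"{conversation_id}: all messages share the same posted_at")

    return problems


if __name__ == "__main__":
    rows = [
        {"conversation_id": "zendesk-1", "message_id": "m1", "author_type": "Customer",
         "posted_at": "2024-06-15T14:30:00Z", "survey_type": "csat", "rating": 5},
        {"conversation_id": "zendesk-1", "message_id": "m2", "author_type": "System",
         "posted_at": "2024-06-15T14:30:00Z", "survey_type": "csat", "rating": 5},
    ]
    for problem in check_conversations(rows):
        print(problem)
```

An empty result means none of these particular checks fired; it does not guarantee that ingestion will succeed.
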
