# Data Schema Definitions for Birdie Ingestion

To ensure seamless data ingestion and high-quality insights, all data imported into Birdie, regardless of the source, must adhere to our standardized data schemas.

Whether you connect via cloud storage (AWS S3, Azure Blob Storage, or Google Cloud Storage) or a cloud data warehouse (BigQuery, Snowflake, or Databricks), your source data must be structured to match these definitions. Although Birdie offers robust ingestion capabilities, your data or engineering team will typically need to transform and map your internal records to these schemas to ensure compatibility.

These schemas act as a blueprint that tells Birdie exactly how to interpret each record, whether it is a single customer review, a complex support thread, or a detailed user profile.

***

#### 1. Requirements

* Data Preparation: Your data team must ensure that source tables or files match the field names and types outlined in the tables below.
* Storage/Database Access: Proper credentials (IAM roles, service accounts, or API keys) with read access to the specific datasets.
* File & Table Formats: Data must be provided as Parquet or CSV files, or synced directly from database tables.
* Timestamp Precision: All date fields must follow the RFC 3339 format (e.g., `2023-10-25T14:30:00Z`) for accurate chronological analysis.
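
If your source systems store timestamps in other formats, convert them before export. The following Python sketch (the function names are illustrative, not part of any Birdie tooling) serializes timezone-aware datetimes as RFC 3339 UTC strings and performs a strict check that a string matches the RFC 3339 `date-time` shape with an explicit offset. For simplicity it requires the uppercase `T` and `Z` used in the example above and rejects leap seconds.

```python
import re
from datetime import datetime, timedelta, timezone

# Strict RFC 3339 date-time: YYYY-MM-DDTHH:MM:SS[.fraction](Z|+HH:MM|-HH:MM)
_RFC3339 = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(?:\.(\d+))?(Z|[+-]\d{2}:\d{2})$",
    re.ASCII,
)


def parse_rfc3339(value: str) -> datetime:
    """Parse an RFC 3339 timestamp into a timezone-aware datetime, or raise ValueError."""
    m = _RFC3339.match(value)
    if m is None:
        raise ValueError(f"not an RFC 3339 timestamp: {value!r}")
    year, month, day, hour, minute, second = (int(g) for g in m.group(1, 2, 3, 4, 5, 6))
    frac = m.group(7) or ""
    # Python datetimes hold microseconds, so longer fractions are truncated to 6 digits.
    micro = int(frac[:6].ljust(6, "0")) if frac else 0
    offset = m.group(8)
    if offset == "Z":
        tz = timezone.utc
    else:
        sign = 1 if offset[0] == "+" else -1
        off_h, off_m = int(offset[1:3]), int(offset[4:6])
        if off_h > 23 or off_m > 59:
            raise ValueError(f"invalid UTC offset: {offset}")
        tz = timezone(sign * timedelta(hours=off_h, minutes=off_m))
    # datetime() rejects impossible values such as month 13, February 30, or second 60.
    return datetime(year, month, day, hour, minute, second, micro, tzinfo=tz)


def is_rfc3339(value: str) -> bool:
    try:
        parse_rfc3339(value)
        return True
    except ValueError:
        return False


def to_rfc3339(dt: datetime) -> str:
    """Serialize an aware datetime as an RFC 3339 UTC string ending in 'Z'."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone before exporting")
    return dt.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")
```

For example, `to_rfc3339(parse_rfc3339("2023-10-25T11:30:00-03:00"))` normalizes a local timestamp to `2023-10-25T14:30:00Z`.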

***

#### 2. Understanding Record Types

Birdie categorizes data into three primary record types. Choosing the correct "Kind" during setup is crucial, because it determines which schema Birdie uses to interpret your records:

* Feedback Schemas: Used for single-point interactions. They typically contain one text field (comment), a numerical rating, and metadata.
  * *Examples:* App Store reviews, NPS surveys, CSAT responses.
* Conversation Schemas: Used for multi-turn interactions. These group multiple messages or events under a single `conversation_id`.
  * *Examples:* Support tickets with several replies, social media threads, or ongoing complaint negotiations.
* Account Schema: Used to define the "Who." These contain customer profiles, behavioral attributes, or subscription data used for advanced segmentation.

***

#### 3. Data Schemas

#### Feedbacks // Review

Used for public or private reviews of products or services. Each row represents a single review entry.

| **Column Name** | **Type** | **Required** | **Description**                                                                  |
| --------------- | -------- | ------------ | -------------------------------------------------------------------------------- |
| `feedback_id`   | STRING   | Yes          | Unique identifier for each review.                                               |
| `text`          | STRING   | Yes          | The actual text or comment posted by the user.                                   |
| `posted_at`     | STRING   | Yes          | When the feedback was posted (RFC 3339 timestamp format).                        |
| `rating`        | FLOAT    | Yes          | A numerical rating or score associated with the feedback.                        |
| `author_id`     | STRING   | No           | Unique identifier for the author of the record.                                  |
| `account_id`    | STRING   | No           | Unique identifier for the account the record belongs to.                         |
| `language`      | STRING   | No           | Language of the record expressed as a BCP 47 code.                               |
| `title`         | STRING   | No           | The title of the feedback as provided by the author.                             |
| `category`      | STRING   | No           | The specific category the review belongs to.                                     |
| `owner`         | STRING   | No           | Indicates the entity owner, e.g., "Owner" or "Competitor".                       |
| `source`        | STRING   | No           | A user-customizable label used for grouping feedbacks (e.g., "G2", "App Store"). |
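
To catch mapping mistakes before an upload, a data team might encode the table above and check each row against it. The sketch below is a hypothetical pre-flight validator, not a Birdie API: `REVIEW_SCHEMA` and `validate_row` are illustrative names, and the timestamp and language checks only test the general shape of RFC 3339 and BCP 47 values rather than fully validating them.

```python
import re
from typing import Any

# Hypothetical encoding of the Review table: column -> (accepted Python types, required?).
# FLOAT columns accept int or float; bool is rejected separately because it subclasses int.
REVIEW_SCHEMA: dict[str, tuple[tuple[type, ...], bool]] = {
    "feedback_id": ((str,), True),
    "text": ((str,), True),
    "posted_at": ((str,), True),
    "rating": ((int, float), True),
    "author_id": ((str,), False),
    "account_id": ((str,), False),
    "language": ((str,), False),
    "title": ((str,), False),
    "category": ((str,), False),
    "owner": ((str,), False),
    "source": ((str,), False),
}

# Shape checks only: these are not complete RFC 3339 or BCP 47 validators.
_TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$")
_LANGUAGE = re.compile(r"^[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})*$")


def validate_row(row: dict[str, Any], schema: dict[str, tuple[tuple[type, ...], bool]]) -> list[str]:
    """Return human-readable problems; an empty list means the row looks valid."""
    problems = []
    for column, (types, required) in schema.items():
        value = row.get(column)
        if value is None:
            if required:
                problems.append(f"missing required column: {column}")
            continue
        if isinstance(value, bool) or not isinstance(value, types):
            problems.append(f"{column}: unexpected type {type(value).__name__}")
    # Columns outside the standard schema would need a custom field mapping.
    for column in sorted(set(row) - set(schema)):
        problems.append(f"unknown column (map it as a custom field?): {column}")
    posted_at = row.get("posted_at")
    if isinstance(posted_at, str) and not _TIMESTAMP.match(posted_at):
        problems.append(f"posted_at is not RFC 3339: {posted_at!r}")
    language = row.get("language")
    if isinstance(language, str) and not _LANGUAGE.match(language):
        problems.append(f"language does not look like a BCP 47 tag: {language!r}")
    return problems
```

The same `validate_row` function could be reused for the other schemas by writing one dictionary per table.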

#### Feedbacks // NPS and CSAT

Optimized for survey responses focusing on sentiment metrics. Unlike reviews, the text comment is often optional in these surveys.

| **Column Name** | **Type** | **Required** | **Description**                                            |
| --------------- | -------- | ------------ | ---------------------------------------------------------- |
| `feedback_id`   | STRING   | Yes          | Unique identifier for each survey answer.                  |
| `posted_at`     | STRING   | Yes          | When the feedback was posted (RFC 3339 timestamp format).  |
| `rating`        | FLOAT    | Yes          | The numerical score (e.g., NPS 0-10 or CSAT 1-5).          |
| `text`          | STRING   | No           | Qualitative comment or open-ended text posted by the user. |
| `author_id`     | STRING   | No           | Unique identifier for the author of the record.            |
| `account_id`    | STRING   | Yes          | Unique identifier for the account the record belongs to.   |
| `language`      | STRING   | No           | Language of the record expressed as a BCP 47 code.         |
| `title`         | STRING   | No           | The name or title of the survey.                           |
| `source`        | STRING   | No           | A user-customizable label for grouping feedbacks.          |
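
Because `rating` must be present while `text` is optional, a mapping step for survey exports might normalize blank comments to null and sanity-check scores. This sketch assumes the ranges given as examples in the table (0 to 10 for NPS, 1 to 5 for CSAT); `build_survey_record` and `SCORE_RANGES` are hypothetical names, `survey_type` is used only to pick a range (this schema has no survey-type column), and your surveys may use different scales.

```python
from typing import Optional

# Assumed score ranges, based on the examples in the table above; adjust to your surveys.
SCORE_RANGES = {"nps": (0.0, 10.0), "csat": (1.0, 5.0)}


def build_survey_record(
    feedback_id: str,
    posted_at: str,
    account_id: str,
    survey_type: str,
    score: float,
    comment: Optional[str] = None,
    title: Optional[str] = None,
) -> dict:
    """Build one NPS/CSAT feedback row from a raw survey answer (hypothetical helper)."""
    if survey_type not in SCORE_RANGES:
        raise ValueError(f"unknown survey type: {survey_type!r}")
    low, high = SCORE_RANGES[survey_type]
    if not low <= score <= high:
        raise ValueError(f"{survey_type} score {score} is outside {low}-{high}")
    # `text` is optional in this schema, so blank or whitespace-only comments become None.
    text = comment.strip() if comment and comment.strip() else None
    return {
        "feedback_id": feedback_id,
        "posted_at": posted_at,
        "rating": float(score),
        "account_id": account_id,
        "text": text,
        "title": title,
    }
```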

#### Conversations // Support tickets

This schema is designed to capture the full lifecycle of a support interaction. Because support tickets usually consist of multiple replies, Birdie uses a "long format" where each row represents a single message, but all messages in the same thread share a common `conversation_id`.
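
Many helpdesk exports nest replies inside each ticket. The sketch below flattens one such ticket into long-format rows that all share the ticket's `conversation_id`. The input shape (`id`, `messages`, `body`, `created_at`) is a hypothetical export format rather than any specific vendor's API, and the lowercase `author_type` values follow the `author_type = agent` example in the table that follows.

```python
from typing import Any

# Ticket-level columns that should repeat, unchanged, on every message row.
TICKET_FIELDS = ("subject", "status", "priority", "channel", "tags", "source")


def ticket_to_rows(ticket: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten one nested ticket (hypothetical export shape) into long-format rows."""
    shared = {field: ticket[field] for field in TICKET_FIELDS if field in ticket}
    rows = []
    for message in ticket["messages"]:
        rows.append({
            "conversation_id": ticket["id"],   # same value on every row of the thread
            "message_id": message["id"],
            "text": message["body"],
            "posted_at": message["created_at"],
            "author_id": message["author_id"],
            "author_type": message["author_type"],
            "account_id": ticket["account_id"],
            **shared,
        })
    return rows
```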

| **Column Name**       | **Type**        | **Required** | **Description**                                                                                          |
| --------------------- | --------------- | ------------ | -------------------------------------------------------------------------------------------------------- |
| `conversation_id`     | STRING          | Yes          | Unique identifier for the entire conversation thread.                                                    |
| `message_id`          | STRING          | Yes          | Unique identifier for each specific message (must be unique at the account level).                       |
| `text`                | STRING          | Yes          | The actual content/text of the message.                                                                  |
| `posted_at`           | STRING          | Yes          | When the message was posted (RFC 3339 timestamp).                                                        |
| `author_id`           | STRING          | Yes          | Identifier for the author. For `author_type = agent`, use a user-friendly string (e.g., email or login). |
| `account_id`          | STRING          | Yes          | Identifier for the account the message belongs to.                                                       |
| `language`            | STRING          | No           | Language of the message as a BCP 47 code.                                                                |
| `subject`             | STRING          | No           | The subject line of the support ticket.                                                                  |
| `status`              | STRING          | No           | Current status of the ticket (e.g., "open", "pending", "closed").                                        |
| `priority`            | STRING          | No           | Priority assigned to the ticket (e.g., "urgent", "low").                                                 |
| `channel`             | STRING          | No           | Source channel of the ticket (e.g., "web", "email", "chat").                                             |
| `tags`                | REPEATED STRING | No           | An array/list of tags applied to the ticket.                                                             |
| `author_type`         | STRING          | Yes          | The role of the author: Bot, Agent, or User.                                                             |
| `survey_title`        | STRING          | No           | Title of the survey presented upon ticket closure.                                                       |
| `survey_type`         | STRING          | Yes          | Type of the closing survey: `csat` or `nps`.                                                             |
| `rating`              | FLOAT           | Yes          | The numerical rating provided by the client for the support experience.                                  |
| `solved`              | STRING          | Yes          | Flag indicating whether the ticket was resolved: `true` or `false`.                                      |
| `source`              | STRING          | Yes          | A user-customizable label for grouping (e.g., "Zendesk", "Intercom").                                    |
| `agent_team`          | STRING          | Yes          | The name of the support agent's team.                                                                    |
| `agent_company`       | STRING          | Yes          | The name of the support agent's company.                                                                 |
| `agent_supervisor_id` | STRING          | Yes          | Identifier for the support agent's supervisor.                                                           |
| `agent_experience`    | STRING          | Yes          | The maturity or experience level of the agent (Enum).                                                    |

> Pro-Tip for Data Teams:
>
> * To ensure data consistency, upload only one row per conversation containing the survey fields (`survey_type`, `rating`, etc.). Ideally, this should be the final message of the ticket.
> * For all other messages in the same thread, ensure the ticket-level fields (like `subject`, `status`, and `priority`) remain consistent across rows.
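
The two recommendations above can be automated per conversation. In this sketch, `attach_survey` strips survey columns from every row and sets them only on the most recent message, while `inconsistent_ticket_fields` reports ticket-level columns whose values differ across rows. Both function names and the field lists are illustrative assumptions, and the timestamps are assumed to be parseable by `datetime.fromisoformat`.

```python
from datetime import datetime
from typing import Any

# Illustrative field lists; extend them to match the columns you export.
SURVEY_FIELDS = ("survey_type", "survey_title", "rating")
TICKET_FIELDS = ("subject", "status", "priority")


def _parse_ts(value: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def attach_survey(rows: list[dict[str, Any]], survey: dict[str, Any]) -> list[dict[str, Any]]:
    """Return copies of one conversation's rows, with survey fields only on the latest message."""
    if not rows:
        return []
    ordered = sorted(rows, key=lambda row: _parse_ts(row["posted_at"]))
    result = [{k: v for k, v in row.items() if k not in SURVEY_FIELDS} for row in ordered]
    result[-1].update({k: survey[k] for k in SURVEY_FIELDS if k in survey})
    return result


def inconsistent_ticket_fields(rows: list[dict[str, Any]]) -> dict[str, set]:
    """Map each ticket-level field to its distinct values when the rows disagree."""
    conflicts = {}
    for field in TICKET_FIELDS:
        values = {row.get(field) for row in rows}
        if len(values) > 1:
            conflicts[field] = values
    return conflicts
```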

#### Conversations // Complaints

The Complaints schema is specialized for tracking and analyzing customer grievances. Similar to support tickets, it supports multiple interactions grouped by a single conversation ID to capture the negotiation or resolution process.

| **Column Name**   | **Type** | **Required** | **Description**                                                                      |
| ----------------- | -------- | ------------ | ------------------------------------------------------------------------------------ |
| `conversation_id` | STRING   | Yes          | Unique identifier for the specific complaint thread.                                 |
| `message_id`      | STRING   | Yes          | Unique identifier for each message (must be unique at the account level).            |
| `text`            | STRING   | Yes          | The actual content/text of the complaint message.                                    |
| `posted_at`       | STRING   | Yes          | When the message was posted (RFC 3339 timestamp).                                    |
| `author_id`       | STRING   | No           | Unique identifier for the author of the message.                                     |
| `account_id`      | STRING   | Yes          | Identifier for the account the message belongs to.                                   |
| `language`        | STRING   | No           | Language of the message expressed as a BCP 47 code.                                  |
| `category`        | STRING   | No           | A classification for segmenting complaints (e.g., "Support", "Shipping", "Billing"). |
| `status`          | STRING   | No           | Current state of negotiations (e.g., pending, initiated, ongoing, or solved).        |
| `url`             | STRING   | No           | Direct link to the complaint if it originates from a public source.                  |
| `rating`          | FLOAT    | Yes          | The score the client gave specifically to the complaint negotiation experience.      |
| `author_type`     | STRING   | Yes          | The role of the author: Internal Person, User, or Bot.                               |
| `source`          | STRING   | Yes          | A user-customizable label for grouping (e.g., "Public Forum", "Direct Email").       |
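
Every complaint message needs a `message_id` that is unique at the account level. Since "account level" could be read as your Birdie account as a whole rather than a single `account_id`, this hypothetical check takes the stricter reading and flags any `message_id` that appears more than once in an upload, listing the conversations it appears in. The same check applies to the support ticket schema.

```python
from collections import defaultdict
from typing import Any, Iterable


def duplicate_message_ids(rows: Iterable[dict[str, Any]]) -> dict[str, list[str]]:
    """Map each message_id that occurs more than once to the conversation_ids it appears in."""
    seen: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        seen[row["message_id"]].append(row["conversation_id"])
    return {mid: convs for mid, convs in seen.items() if len(convs) > 1}
```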

#### Conversations // Social Media Post

Used for threads and interactions from social platforms like Facebook, X (Twitter), or Reddit.

| **Column Name**   | **Type**        | **Required** | **Description**                                            |
| ----------------- | --------------- | ------------ | ---------------------------------------------------------- |
| `conversation_id` | STRING          | Yes          | Unique identifier for the post or thread.                  |
| `message_id`      | STRING          | Yes          | Unique identifier for the specific post or comment.        |
| `text`            | STRING          | Yes          | Content of the message.                                    |
| `posted_at`       | STRING          | Yes          | When the message was posted (RFC 3339 timestamp).          |
| `author_id`       | STRING          | No           | Identifier for the author of the message.                  |
| `account_id`      | STRING          | No           | Identifier for the account the message belongs to.         |
| `language`        | STRING          | No           | Language of the message as a BCP 47 code.                  |
| `title`           | STRING          | No           | Title of the original social media post.                   |
| `owner`           | STRING          | No           | Indicates the entity owner: Owner or Competitor.           |
| `category`        | STRING          | No           | Segment or sub-grouping (e.g., a specific Subreddit name). |
| `url`             | STRING          | No           | URL of the social post.                                    |
| `channel`         | STRING          | No           | The source platform (e.g., "facebook", "reddit").          |
| `tags`            | REPEATED STRING | No           | Array of tags applied to the post.                         |
| `author_type`     | STRING          | No           | The role of the author: Internal Person, User, or Bot.     |
| `upvotes`         | INTEGER         | No           | The number of likes or upvotes the message has received.   |
| `source`          | STRING          | Yes          | A user-customizable label for grouping.                    |
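
Social platforms usually return a post with nested replies. This sketch flattens such a tree into one row per message, giving every row the root post's identifier as `conversation_id` and its title as `title`. The input keys (`id`, `text`, `created_at`, `score`, `replies`) are a hypothetical shape, not a specific platform's API; an explicit stack is used instead of recursion so that very deep threads cannot hit Python's recursion limit.

```python
from typing import Any


def flatten_thread(post: dict[str, Any], channel: str, source: str) -> list[dict[str, Any]]:
    """Flatten a post and its nested replies into one row per message (pre-order)."""
    rows: list[dict[str, Any]] = []
    stack = [post]
    while stack:
        node = stack.pop()
        rows.append({
            "conversation_id": post["id"],   # every row shares the root post's id
            "message_id": node["id"],
            "text": node["text"],
            "posted_at": node["created_at"],
            "title": post.get("title"),      # title of the original post
            "channel": channel,
            "upvotes": int(node.get("score", 0)),
            "source": source,
        })
        # Push replies in reverse so they are emitted in their original order.
        stack.extend(reversed(node.get("replies", [])))
    return rows
```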

#### Conversations // Issue

Used for tracking bug reports, development tasks, or tickets from platforms like Jira or GitHub.

| **Column Name**   | **Type** | **Required** | **Description**                                          |
| ----------------- | -------- | ------------ | -------------------------------------------------------- |
| `conversation_id` | STRING   | Yes          | Unique identifier for the issue thread.                  |
| `message_id`      | STRING   | Yes          | Unique identifier for each individual update or comment. |
| `text`            | STRING   | Yes          | The content of the issue description or comment.         |
| `posted_at`       | STRING   | Yes          | When the record was created/posted (RFC 3339 timestamp). |
| `author_id`       | STRING   | No           | Identifier for the author of the message.                |
| `account_id`      | STRING   | No           | Identifier for the account associated with the issue.    |
| `language`        | STRING   | No           | Language of the message as a BCP 47 code.                |
| `project_id`      | STRING   | No           | Unique identifier for the project or repository.         |
| `project_name`    | STRING   | No           | Human-readable name of the project.                      |
| `title`           | STRING   | No           | The title or headline of the issue.                      |
| `status`          | STRING   | No           | Current status (e.g., "To Do", "In Progress", "Done").   |
| `source`          | STRING   | Yes          | A user-customizable label for grouping.                  |

#### Accounts

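The foundation for customer profile data and behavior-based segmentation. Only `account_id` is required; profile and behavioral attributes (e.g., plan type or region) are typically added as custom fields.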

| **Column Name** | **Type** | **Required** | **Description**                    |
| --------------- | -------- | ------------ | ---------------------------------- |
| `account_id`    | STRING   | Yes          | Unique identifier for the account. |
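
Since `account_id` must uniquely identify each account and the other schemas reference it, a quick hygiene check before upload can catch duplicate account rows and records that point to accounts missing from the Accounts table. This document does not state whether Birdie rejects such records, so treat this as an optional data-quality sketch; `check_accounts` is an illustrative name.

```python
from collections import Counter
from typing import Any, Iterable


def check_accounts(
    accounts: Iterable[dict[str, Any]], records: Iterable[dict[str, Any]]
) -> dict[str, set[str]]:
    """Report duplicate account_ids and record account_ids with no matching account row."""
    counts = Counter(account["account_id"] for account in accounts)
    duplicates = {aid for aid, n in counts.items() if n > 1}
    # Records without an account_id are skipped, since the column is optional in some schemas.
    orphans = {
        record["account_id"]
        for record in records
        if record.get("account_id") and record["account_id"] not in counts
    }
    return {"duplicates": duplicates, "orphans": orphans}
```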

***

#### 4. Custom Fields

Birdie supports an unlimited number of flexible custom fields. If your source data contains information not covered by the standard schemas above (e.g., "Plan Type," "User Region," or "Churn Risk"), you can ingest it by providing the following mapping:

* Source/Destination Mapping: Link your source column to a Birdie field.
* Data Type: Set as `String`, `Bool`, `Number`, `Date`, `Datetime`, `Enum`, or `Unique`.
* Friendly Label: The name displayed to your team in the Birdie UI.
* List Support: A toggle to indicate if the field should accept a single value or a list of values.
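
Before configuring custom fields in Birdie, it can help to write the planned mappings down and test them against sample values. The sketch below models the four mapping properties listed above as a Python dataclass and coerces raw string values by type. The class names, the `allowed_values` attribute for enums, and the coercion rules (for example, accepting only `true`/`false` for `Bool`) are assumptions for illustration, not Birdie's documented behavior.

```python
from dataclasses import dataclass
from datetime import date, datetime
from enum import Enum
from typing import Any, Optional


class FieldType(Enum):
    STRING = "String"
    BOOL = "Bool"
    NUMBER = "Number"
    DATE = "Date"
    DATETIME = "Datetime"
    ENUM = "Enum"
    UNIQUE = "Unique"


@dataclass(frozen=True)
class CustomField:
    source_column: str          # column in your table or file
    destination: str            # Birdie field it maps to (hypothetical naming)
    field_type: FieldType
    label: str                  # friendly label shown in the Birdie UI
    is_list: bool = False       # the "List Support" toggle
    allowed_values: Optional[tuple[str, ...]] = None  # assumption: only used for ENUM


def _coerce_one(field: CustomField, raw: str) -> Any:
    t = field.field_type
    if t is FieldType.BOOL:
        lowered = raw.strip().lower()
        if lowered not in ("true", "false"):
            raise ValueError(f"{field.label}: expected true/false, got {raw!r}")
        return lowered == "true"
    if t is FieldType.NUMBER:
        return float(raw)
    if t is FieldType.DATE:
        return date.fromisoformat(raw)
    if t is FieldType.DATETIME:
        return datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if t is FieldType.ENUM and field.allowed_values and raw not in field.allowed_values:
        raise ValueError(f"{field.label}: {raw!r} not in {field.allowed_values}")
    return raw  # STRING, ENUM, and UNIQUE values stay as text


def coerce(field: CustomField, raw: Any) -> Any:
    """Coerce a raw value (a string, or a list of strings when is_list) for one custom field."""
    if field.is_list:
        return [_coerce_one(field, item) for item in raw]
    return _coerce_one(field, raw)
```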
