Data Schema Definitions for Birdie Ingestion
To ensure seamless data ingestion and high-quality insights, all data imported into Birdie—regardless of the source—must adhere to our standardized data schemas.
Whether you are connecting via Cloud Storage (AWS S3, Azure Blob Storage, or Google Cloud Storage) or Cloud Data Warehouses (BigQuery, Snowflake, or Databricks), your source data must be structured to match these definitions. While Birdie offers robust ingestion capabilities, this process typically requires your data or engineering team to transform and map your internal records to these schemas to ensure compatibility.
These schemas act as a blueprint, telling Birdie exactly how to interpret each record—whether it is a single customer review, a complex support thread, or a detailed user profile.
1. Requirements
Data Preparation: Your data team must ensure that source tables or files match the field names and types outlined in the tables below.
Storage/Database Access: Provide credentials (IAM roles, Service Accounts, or API keys) with read access to the specific datasets.
File & Table Formats: Birdie accepts Parquet files, CSV files, or direct database table syncing.
Timestamp Precision: All date fields must follow the RFC 3339 format (e.g., 2023-10-25T14:30:00Z) for accurate chronological analysis.
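As an illustration of the timestamp requirement, the sketch below shows one way a data team might normalize timestamps before export. The helper names `to_rfc3339` and `is_rfc3339` are hypothetical, and the code assumes that naive datetimes in your source system are already in UTC; adjust that assumption to your own data.

```python
from datetime import datetime, timezone


def to_rfc3339(dt: datetime) -> str:
    """Format a datetime as an RFC 3339 UTC timestamp ending in 'Z'."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes in the source system are already UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def is_rfc3339(value: str) -> bool:
    """Loosely check that a string is a date-time carrying a UTC offset."""
    if value.endswith("Z"):
        # Older Python versions do not accept a trailing 'Z' in fromisoformat.
        value = value[:-1] + "+00:00"
    try:
        parsed = datetime.fromisoformat(value)
    except ValueError:
        return False
    return parsed.tzinfo is not None


print(to_rfc3339(datetime(2023, 10, 25, 14, 30)))
```

The check is intentionally loose: it rejects date-only values and local formats such as `25/10/2023`, but it is not a full RFC 3339 validator.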
2. Understanding Record Types
Birdie categorizes data into three primary record types. Choosing the correct "Kind" during setup is crucial for data processing:
Feedback Schemas: Used for single-point interactions. They typically contain one text field (comment), a numerical rating, and metadata.
Examples: App Store reviews, NPS surveys, CSAT responses.
Conversation Schemas: Used for multi-turn interactions. These group multiple messages or events under a single conversation_id.
Examples: Support tickets with several replies, social media threads, or ongoing complaint negotiations.
Account Schema: Used to define the "Who." It contains customer profiles, behavioral attributes, or subscription data used for advanced segmentation.
3. Data Schemas
Feedbacks // Review
Used for public or private reviews of products or services. Each row represents a single review entry.
| Column Name | Type | Required | Description |
| --- | --- | --- | --- |
| feedback_id | STRING | Yes | Unique identifier for each review. |
| text | STRING | Yes | The actual text or comment posted by the user. |
| posted_at | STRING | Yes | When the feedback was posted (RFC 3339 timestamp format). |
| rating | FLOAT | Yes | A numerical rating or score associated with the feedback. |
| author_id | STRING | No | Unique identifier for the author of the record. |
| account_id | STRING | No | Unique identifier for the account the record belongs to. |
| language | STRING | No | Language of the record expressed as a BCP 47 code. |
| title | STRING | No | The title of the feedback as provided by the author. |
| category | STRING | No | The specific category the review belongs to. |
| owner | STRING | No | Indicates the entity owner, e.g., "Owner" or "Competitor". |
| source | STRING | No | A user-customizable label used for grouping feedbacks (e.g., "G2", "App Store"). |
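To make the Review schema concrete, here is a minimal sketch of how a data team might check rows and write them to CSV before upload. The column names come from the schema above; the helper names (`missing_required`, `rows_to_csv`) and the sample record are hypothetical, and this is not a Birdie-provided tool.

```python
import csv
import io

# Column names taken from the Review schema.
REQUIRED = ["feedback_id", "text", "posted_at", "rating"]
OPTIONAL = ["author_id", "account_id", "language", "title",
            "category", "owner", "source"]


def missing_required(row: dict) -> list:
    """Return the required Review columns that are absent or empty in a row."""
    return [name for name in REQUIRED if row.get(name) in (None, "")]


def rows_to_csv(rows: list) -> str:
    """Serialize rows to CSV text, rejecting rows that miss required columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=REQUIRED + OPTIONAL,
                            extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        missing = missing_required(row)
        if missing:
            raise ValueError(f"row {row.get('feedback_id')!r} is missing {missing}")
        writer.writerow(row)  # optional columns left out are written as empty
    return buffer.getvalue()


# Hypothetical sample record, for illustration only.
sample = {
    "feedback_id": "rev-001",
    "text": "Great app, but sync is slow.",
    "posted_at": "2023-10-25T14:30:00Z",
    "rating": 4.0,
    "source": "App Store",
}
csv_text = rows_to_csv([sample])
print(csv_text)
```

Failing early on a missing required column keeps incomplete rows out of the exported file rather than leaving them to surface during ingestion.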
Feedbacks // NPS and CSAT
Optimized for survey responses focusing on sentiment metrics. Unlike reviews, the text comment is often optional in these surveys.
| Column Name | Type | Required | Description |
| --- | --- | --- | --- |
| feedback_id | STRING | Yes | Unique identifier for each survey answer. |
| posted_at | STRING | Yes | When the feedback was posted (RFC 3339 timestamp format). |
| rating | FLOAT | Yes | The numerical score (e.g., NPS 0-10 or CSAT 1-5). |
| text | STRING | No | Qualitative comment or open-ended text posted by the user. |
| author_id | STRING | No | Unique identifier for the author of the record. |
| account_id | STRING | No | Unique identifier for the account the record belongs to. |
| language | STRING | No | Language of the record expressed as a BCP 47 code. |
| title | STRING | No | The name or title of the survey. |
| source | STRING | No | A user-customizable label for grouping feedbacks. |
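The sketch below illustrates mapping an internal survey answer onto the NPS/CSAT columns, including the optional comment. The input keys (`answer_id`, `answered_at`, `score`, `comment`) are hypothetical names for an internal export, not part of Birdie's schema, and `answered_at` is assumed to be an RFC 3339 string already.

```python
def to_survey_row(raw: dict, source: str) -> dict:
    """Map one hypothetical internal survey answer onto the NPS/CSAT columns."""
    row = {
        "feedback_id": str(raw["answer_id"]),
        "posted_at": raw["answered_at"],  # assumed to be RFC 3339 already
        "rating": float(raw["score"]),    # the schema types rating as FLOAT
        "source": source,
    }
    # text is optional in this schema, so blank comments are simply left out.
    comment = (raw.get("comment") or "").strip()
    if comment:
        row["text"] = comment
    return row


# Hypothetical answers: one with a comment, one without.
answers = [
    {"answer_id": 41, "answered_at": "2023-10-25T14:30:00Z", "score": "9",
     "comment": "Setup was quick."},
    {"answer_id": 42, "answered_at": "2023-10-25T15:00:00Z", "score": 3,
     "comment": "   "},
]
survey_rows = [to_survey_row(answer, source="NPS Q4") for answer in answers]
print(survey_rows)
```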
Conversations // Support tickets
This schema is designed to capture the full lifecycle of a support interaction. Because support tickets usually consist of multiple replies, Birdie uses a "long format" where each row represents a single message, but all messages in the same thread share a common conversation_id.
| Column Name | Type | Required | Description |
| --- | --- | --- | --- |
| conversation_id | STRING | Yes | Unique identifier for the entire conversation thread. |
| message_id | STRING | Yes | Unique identifier for each specific message (must be unique at the account level). |
| text | STRING | Yes | The actual content/text of the message. |
| posted_at | STRING | Yes | When the message was posted (RFC 3339 timestamp). |
| author_id | STRING | No | Identifier for the author. For author_type = agent, use a user-friendly string (e.g., email or login). |
| account_id | STRING | No | Identifier for the account the message belongs to. |
| language | STRING | No | Language of the message as a BCP 47 code. |
| subject | STRING | No | The subject line of the support ticket. |
| status | STRING | No | Current status of the ticket (e.g., "open", "pending", "closed"). |
| priority | STRING | No | Priority assigned to the ticket (e.g., "urgent", "low"). |
| channel | STRING | No | Source channel of the ticket (e.g., "web", "email", "chat"). |
| tags | REPEATED STRING | No | An array/list of tags applied to the ticket. |
| author_type | STRING | No | The role of the author: Bot, Agent, or User. |
| survey_title | STRING | No | Title of the survey presented upon ticket closure. |
| survey_type | STRING | No | Type of the closing survey. Must be csat or nps. |
| rating | FLOAT | No | The numerical rating provided by the client for the support experience. |
| solved | STRING | No | Flag indicating if the ticket was resolved. One of: true or false. |
| source | STRING | No | A user-customizable label for grouping (e.g., "Zendesk", "Intercom"). |
| agent_team | STRING | No | The name of the support agent's team. |
| agent_company | STRING | No | The name of the support agent's company. |
| agent_supervisor_id | STRING | No | Identifier for the support agent's supervisor. |
| agent_experience | STRING | No | The maturity or experience level of the agent (Enum). |
Pro-Tip for Data Teams:
To ensure data consistency, upload only one row per conversation containing the survey fields (survey_type, rating, etc.). This should ideally be the final message of the ticket.
For all other messages in the same thread, ensure the ticket-level fields (like subject, status, and priority) remain consistent across rows.
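The long format and the survey-row tip can be sketched as follows. The nested input shape (a ticket with `id`, `messages`, and `survey` keys) is a hypothetical export format, not something Birdie prescribes, and the sort assumes every posted_at value uses the same UTC "Z" form so that string order matches chronological order.

```python
def ticket_to_rows(ticket: dict) -> list:
    """Flatten one nested ticket into one row per message (long format)."""
    # Ticket-level fields are copied onto every row so they stay consistent.
    ticket_fields = {
        "conversation_id": ticket["id"],
        "subject": ticket.get("subject"),
        "status": ticket.get("status"),
        "priority": ticket.get("priority"),
    }
    # Assumption: all timestamps share the same UTC 'Z' format, so sorting
    # the strings also sorts the messages chronologically.
    messages = sorted(ticket["messages"], key=lambda m: m["posted_at"])
    rows = []
    for message in messages:
        rows.append({
            **ticket_fields,
            "message_id": message["id"],
            "text": message["text"],
            "posted_at": message["posted_at"],
            "author_type": message.get("author_type"),
        })
    # Survey fields go on exactly one row: the final message of the ticket.
    survey = ticket.get("survey")
    if survey and rows:
        rows[-1].update(survey_type=survey["type"],
                        rating=float(survey["rating"]))
    return rows
```

Copying the ticket-level fields from a single dictionary is one simple way to guarantee that subject, status, and priority cannot drift between rows of the same conversation.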
Conversations // Complaints
The Complaints schema is specialized for tracking and analyzing customer grievances. Similar to support tickets, it supports multiple interactions grouped by a single conversation ID to capture the negotiation or resolution process.
| Column Name | Type | Required | Description |
| --- | --- | --- | --- |
| conversation_id | STRING | Yes | Unique identifier for the specific complaint thread. |
| message_id | STRING | Yes | Unique identifier for each message (must be unique at the account level). |
| text | STRING | Yes | The actual content/text of the complaint message. |
| posted_at | STRING | Yes | When the message was posted (RFC 3339 timestamp). |
| author_id | STRING | No | Unique identifier for the author of the message. |
| account_id | STRING | No | Identifier for the account the message belongs to. |
| language | STRING | No | Language of the message expressed as a BCP 47 code. |
| category | STRING | No | A classification for segmenting complaints (e.g., "Support", "Shipping", "Billing"). |
| status | STRING | No | Current state of negotiations (e.g., pending, initiated, ongoing, or solved). |
| url | STRING | No | Direct link to the complaint if it originates from a public source. |
| rating | FLOAT | No | The score the client gave specifically to the complaint negotiation experience. |
| author_type | STRING | No | The role of the author: Internal Person, User, or Bot. |
| source | STRING | No | A user-customizable label for grouping (e.g., "Public Forum", "Direct Email"). |
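Because message_id must be unique, a quick pre-upload check can catch collisions. The sketch below is one possible approach, assuming that "unique at the account level" means unique across every conversation row you upload to a single Birdie account; the function name `duplicate_message_ids` is hypothetical.

```python
from collections import Counter


def duplicate_message_ids(rows: list) -> list:
    """Return message_id values that appear more than once, sorted."""
    counts = Counter(row["message_id"] for row in rows)
    return sorted(mid for mid, seen in counts.items() if seen > 1)


# Hypothetical rows from two complaint threads that reuse the same message_id.
complaint_rows = [
    {"conversation_id": "c-1", "message_id": "1", "text": "Order never arrived."},
    {"conversation_id": "c-1", "message_id": "2", "text": "We are looking into it."},
    {"conversation_id": "c-2", "message_id": "1", "text": "I was charged twice."},
]
print(duplicate_message_ids(complaint_rows))
```

A common cause of such collisions is a message counter that restarts for every thread; prefixing the counter with the conversation_id is one way to avoid them.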
Conversations // Social Media Post
Used for threads and interactions from social platforms like Facebook, X (Twitter), or Reddit.
| Column Name | Type | Required | Description |
| --- | --- | --- | --- |
| conversation_id | STRING | Yes | Unique identifier for the post or thread. |
| message_id | STRING | Yes | Unique identifier for the specific post or comment. |
| text | STRING | Yes | Content of the message. |
| posted_at | STRING | Yes | When the message was posted (RFC 3339 timestamp). |
| author_id | STRING | No | Identifier for the author of the message. |
| account_id | STRING | No | Identifier for the account the message belongs to. |
| language | STRING | No | Language of the message as a BCP 47 code. |
| title | STRING | No | Title of the original social media post. |
| owner | STRING | No | Indicates the entity owner: Owner or Competitor. |
| category | STRING | No | Segment or sub-grouping (e.g., a specific Subreddit name). |
| url | STRING | No | URL of the social post. |
| channel | STRING | No | The source platform (e.g., "facebook", "reddit"). |
| tags | REPEATED STRING | No | Array of tags applied to the post. |
| author_type | STRING | No | Internal Person, User, or Bot. |
| upvotes | INTEGER | No | The number of likes or upvotes the message has received. |
| source | STRING | No | A user-customizable label for grouping. |
Conversations // Issue
Used for tracking bug reports, development tasks, or tickets from platforms like Jira or GitHub.
| Column Name | Type | Required | Description |
| --- | --- | --- | --- |
| conversation_id | STRING | Yes | Unique identifier for the issue thread. |
| message_id | STRING | Yes | Unique identifier for each individual update or comment. |
| text | STRING | Yes | The content of the issue description or comment. |
| posted_at | STRING | Yes | When the record was created/posted (RFC 3339 timestamp). |
| author_id | STRING | No | Identifier for the author of the message. |
| account_id | STRING | No | Identifier for the account associated with the issue. |
| language | STRING | No | Language of the message as a BCP 47 code. |
| project_id | STRING | No | Unique identifier for the project or repository. |
| project_name | STRING | No | Human-readable name of the project. |
| title | STRING | No | The title or headline of the issue. |
| status | STRING | No | Current status (e.g., "To Do", "In Progress", "Done"). |
| source | STRING | No | A user-customizable label for grouping. |
Accounts
The foundation for customer profile data and behavior-based segmentation.
| Column Name | Type | Required | Description |
| --- | --- | --- | --- |
| account_id | STRING | Yes | Unique identifier for the account. |
4. Custom Fields
Birdie allows for unlimited and flexible custom fields. If your source data contains information not covered by the standard schemas above (e.g., "Plan Type," "User Region," or "Churn Risk"), you can ingest them by providing the following mapping:
Source/Destination Mapping: Link your source column to a Birdie field.
Data Type: Set as String, Bool, Number, Date, Datetime, Enum, or Unique.
Friendly Label: The name displayed to your team in the Birdie UI.
List Support: A toggle to indicate if the field should accept a single value or a list of values.
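The four mapping attributes above can be captured in a small structure when planning custom fields with your team. The sketch below is only a planning aid under assumed names: the `CustomFieldMapping` class and its attribute names are hypothetical and do not represent Birdie's configuration format or API. Only the list of data types comes from this page.

```python
from dataclasses import dataclass

# Data types listed in the Custom Fields section.
ALLOWED_TYPES = {"String", "Bool", "Number", "Date", "Datetime", "Enum", "Unique"}


@dataclass(frozen=True)
class CustomFieldMapping:
    """One planned custom field (hypothetical structure, not a Birdie API)."""
    source_column: str        # column name in your table or file
    destination_field: str    # field name it should map to in Birdie
    data_type: str            # one of ALLOWED_TYPES
    friendly_label: str       # name shown to your team in the Birdie UI
    is_list: bool = False     # True if the field holds a list of values

    def __post_init__(self):
        if self.data_type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported data type: {self.data_type!r}")


# Example plan using the custom fields mentioned above.
planned_fields = [
    CustomFieldMapping("plan_type", "plan_type", "Enum", "Plan Type"),
    CustomFieldMapping("user_region", "user_region", "String", "User Region"),
    CustomFieldMapping("churn_risk", "churn_risk", "Number", "Churn Risk"),
]
print([field.friendly_label for field in planned_fields])
```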