Data Schema Definitions for Birdie Ingestion

All data imported into Birdie, regardless of source, must adhere to our standardized data schemas. Following them ensures that your data is ingested reliably and produces high-quality insights.

Whether you connect through cloud storage (AWS S3, Azure Blob Storage, or Google Cloud Storage) or a cloud data warehouse (BigQuery, Snowflake, or Databricks), your source data must be structured to match these definitions. In most cases, this means your data or engineering team will need to transform your internal records and map them to these schemas before ingestion.

These schemas act as a blueprint that tells Birdie how to interpret each record, whether it is a single customer review, a multi-message support thread, or a detailed user profile.


1. Requirements

  • Data Preparation: Your data team must ensure that source tables or files match the field names and types outlined in the tables below.

  • Storage/Database Access: Credentials (IAM roles, service accounts, or API keys) with read access to the specific datasets Birdie will ingest.

  • File & Table Formats: Source data must be provided as Parquet or CSV files, or as database tables synced directly from your warehouse.

  • Timestamp Format: All date fields must follow the RFC 3339 format (e.g., 2023-10-25T14:30:00Z) so that records can be ordered and analyzed chronologically.
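Source systems often store timestamps as naive local datetimes or in formats such as 10/25/2023. The sketch below shows one way to normalize values to RFC 3339 in UTC and to pre-check strings before export. It assumes naive datetimes are already in UTC and drops sub-second precision; the helper names to_rfc3339 and looks_like_rfc3339 are hypothetical, and the check is a rough local test, not Birdie's own validation.

```python
from datetime import datetime, timedelta, timezone


def to_rfc3339(dt: datetime) -> str:
    """Format a datetime as an RFC 3339 string in UTC, e.g. 2023-10-25T14:30:00Z.

    Assumption: naive datetimes (no tzinfo) are already in UTC.
    Sub-second precision is dropped.
    """
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def looks_like_rfc3339(value: str) -> bool:
    """Rough pre-flight check: a date AND time with an explicit UTC offset or 'Z'."""
    if "T" not in value.upper():
        return False  # date-only values such as 2023-10-25 are not date-times
    try:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return False
    return parsed.tzinfo is not None  # RFC 3339 forbids zone-less timestamps


if __name__ == "__main__":
    local = datetime(2023, 10, 25, 11, 30, tzinfo=timezone(timedelta(hours=-3)))
    print(to_rfc3339(local))  # 2023-10-25T14:30:00Z
    print(looks_like_rfc3339("2023-10-25T14:30:00"))  # False: no offset
```

Converting everything to UTC with a trailing Z keeps the strings uniform, which also makes them sort correctly as plain text.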


2. Understanding Record Types

Birdie categorizes data into three primary record types. Choosing the correct "Kind" during setup is crucial, because it determines how Birdie processes your records:

  • Feedback Schemas: Used for single, standalone interactions. They typically contain one text field (the comment), a numerical rating, and metadata.

    • Examples: App Store reviews, NPS surveys, CSAT responses.

  • Conversation Schemas: Used for multi-turn interactions. These group multiple messages or events under a single conversation_id.

    • Examples: Support tickets with several replies, social media threads, or ongoing complaint negotiations.

  • Account Schema: Used to define the "who." It contains customer profiles, behavioral attributes, or subscription data used for advanced segmentation.


3. Data Schemas

Feedbacks // Review

Used for public or private reviews of products or services. Each row represents a single review entry.

Column Name | Type | Required | Description
--- | --- | --- | ---
feedback_id | STRING | Yes | Unique identifier for each review.
text | STRING | Yes | The actual text or comment posted by the user.
posted_at | STRING | Yes | When the feedback was posted (RFC 3339 timestamp format).
rating | FLOAT | Yes | A numerical rating or score associated with the feedback.
author_id | STRING | No | Unique identifier for the author of the record.
account_id | STRING | No | Unique identifier for the account the record belongs to.
language | STRING | No | Language of the record expressed as a BCP 47 code.
title | STRING | No | The title of the feedback as provided by the author.
category | STRING | No | The specific category the review belongs to.
owner | STRING | No | Indicates the entity owner, e.g., "Owner" or "Competitor".
source | STRING | No | A user-customizable label used for grouping feedbacks (e.g., "G2", "App Store").
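Before exporting, a data team may want to check each row against the table above. The following is a minimal sketch, assuming rows arrive as Python dicts; REVIEW_SCHEMA and validate_row are hypothetical names reflecting our reading of the table, not an official Birdie artifact. It checks presence and basic types only; RFC 3339 and BCP 47 formats are not validated here.

```python
from typing import Any

# Column -> (accepted Python types, required?) for the Feedbacks // Review schema.
# This mapping is our own reading of the table above, not an official Birdie artifact.
# FLOAT columns accept ints as well as floats.
Schema = dict[str, tuple[tuple[type, ...], bool]]

REVIEW_SCHEMA: Schema = {
    "feedback_id": ((str,), True),
    "text": ((str,), True),
    "posted_at": ((str,), True),
    "rating": ((int, float), True),
    "author_id": ((str,), False),
    "account_id": ((str,), False),
    "language": ((str,), False),
    "title": ((str,), False),
    "category": ((str,), False),
    "owner": ((str,), False),
    "source": ((str,), False),
}


def validate_row(row: dict[str, Any], schema: Schema) -> list[str]:
    """Return human-readable problems; an empty list means the row looks valid."""
    problems: list[str] = []
    for column, (types, required) in schema.items():
        value = row.get(column)
        if value is None or value == "":
            if required:
                problems.append(f"missing required column '{column}'")
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, types):
            expected = " or ".join(t.__name__ for t in types)
            problems.append(f"'{column}' should be {expected}, got {type(value).__name__}")
    # Columns outside the standard schema need a custom field mapping (section 4).
    for column in sorted(row.keys() - schema.keys()):
        problems.append(f"'{column}' is not a standard column; map it as a custom field")
    return problems
```

For example, a row with rating "5" (a string) and an extra plan_type column would report a type problem for rating and flag plan_type as a candidate custom field.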

Feedbacks // NPS and CSAT

Optimized for survey responses focused on sentiment metrics. Unlike reviews, the text comment is optional in this schema, since many respondents submit only a score.

Column Name | Type | Required | Description
--- | --- | --- | ---
feedback_id | STRING | Yes | Unique identifier for each survey answer.
posted_at | STRING | Yes | When the feedback was posted (RFC 3339 timestamp format).
rating | FLOAT | Yes | The numerical score (e.g., NPS 0-10 or CSAT 1-5).
text | STRING | No | Qualitative comment or open-ended text posted by the user.
author_id | STRING | No | Unique identifier for the author of the record.
account_id | STRING | No | Unique identifier for the account the record belongs to.
language | STRING | No | Language of the record expressed as a BCP 47 code.
title | STRING | No | The name or title of the survey.
source | STRING | No | A user-customizable label for grouping feedbacks.
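Survey exports frequently contain blank comments and out-of-range scores. The sketch below builds one NPS/CSAT row per response, rejects scores outside the example ranges quoted in the table (NPS 0-10, CSAT 1-5), and omits text when the respondent wrote nothing. The function survey_to_row, its survey_kind argument, and the range check are hypothetical conventions of this example; adjust SCORE_RANGES if your surveys use another scale.

```python
from typing import Optional

# Score ranges taken from the examples in the table above (NPS 0-10, CSAT 1-5).
# Adjust them if your surveys use a different scale.
SCORE_RANGES = {"nps": (0.0, 10.0), "csat": (1.0, 5.0)}


def survey_to_row(
    response_id: str,
    posted_at: str,
    score: float,
    survey_kind: str,
    comment: Optional[str] = None,
    survey_name: Optional[str] = None,
) -> dict:
    """Build one NPS/CSAT feedback row; survey_kind is a hypothetical helper argument."""
    low, high = SCORE_RANGES[survey_kind]
    if not low <= score <= high:
        raise ValueError(f"{survey_kind} score {score} is outside {low}-{high}")
    row = {"feedback_id": response_id, "posted_at": posted_at, "rating": float(score)}
    # text is optional in this schema: leave it out when the respondent wrote nothing.
    if comment and comment.strip():
        row["text"] = comment.strip()
    if survey_name:
        row["title"] = survey_name  # title holds the survey's name
    return row
```

Note that an NPS score of 0 is valid, so the check compares against the range rather than testing the score for truthiness.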

Conversations // Support tickets

This schema is designed to capture the full lifecycle of a support interaction. Because support tickets usually consist of multiple replies, Birdie uses a "long format" where each row represents a single message, but all messages in the same thread share a common conversation_id.

Column Name | Type | Required | Description
--- | --- | --- | ---
conversation_id | STRING | Yes | Unique identifier for the entire conversation thread.
message_id | STRING | Yes | Unique identifier for each specific message (must be unique at the account level).
text | STRING | Yes | The actual content/text of the message.
posted_at | STRING | Yes | When the message was posted (RFC 3339 timestamp).
author_id | STRING | No | Identifier for the author. For author_type = agent, use a user-friendly string (e.g., email or login).
account_id | STRING | No | Identifier for the account the message belongs to.
language | STRING | No | Language of the message as a BCP 47 code.
subject | STRING | No | The subject line of the support ticket.
status | STRING | No | Current status of the ticket (e.g., "open", "pending", "closed").
priority | STRING | No | Priority assigned to the ticket (e.g., "urgent", "low").
channel | STRING | No | Source channel of the ticket (e.g., "web", "email", "chat").
tags | REPEATED STRING | No | An array/list of tags applied to the ticket.
author_type | STRING | No | The role of the author: Bot, Agent, or User.
survey_title | STRING | No | Title of the survey presented upon ticket closure.
survey_type | STRING | No | Type of the closing survey. Must be csat or nps.
rating | FLOAT | No | The numerical rating provided by the client for the support experience.
solved | STRING | No | Flag indicating if the ticket was resolved. One of: true or false.
source | STRING | No | A user-customizable label for grouping (e.g., "Zendesk", "Intercom").
agent_team | STRING | No | The name of the support agent's team.
agent_company | STRING | No | The name of the support agent's company.
agent_supervisor_id | STRING | No | Identifier for the support agent's supervisor.
agent_experience | STRING | No | The maturity or experience level of the agent (Enum).

Pro-Tip for Data Teams:

  • To keep data consistent, populate the survey fields (survey_type, rating, etc.) on only one row per conversation, ideally the final message of the ticket.

  • Keep ticket-level fields (such as subject, status, and priority) identical across every row in the same thread.
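Many helpdesk exports nest replies inside a ticket object, whereas this schema expects the long format: one row per message sharing a conversation_id. The sketch below flattens one ticket, repeats ticket-level fields on every row, and places the survey fields only on the last message, following the tips above. The input shape (id, messages, survey keys) and the function name ticket_to_rows are hypothetical; treating solved as a ticket-level field is also an assumption.

```python
from typing import Any

# Fields that describe the whole ticket and should be identical on every row.
TICKET_LEVEL_FIELDS = ("subject", "status", "priority", "channel", "tags", "source")
# Closing-survey fields that belong on a single row per conversation.
SURVEY_FIELDS = ("survey_title", "survey_type", "rating")


def ticket_to_rows(ticket: dict[str, Any]) -> list[dict[str, Any]]:
    """Explode one nested ticket (hypothetical input shape) into long-format rows."""
    # Assumes posted_at values are RFC 3339 strings in UTC with identical precision,
    # so lexicographic order equals chronological order.
    messages = sorted(ticket["messages"], key=lambda m: m["posted_at"])
    shared = {f: ticket[f] for f in TICKET_LEVEL_FIELDS if f in ticket}
    if "solved" in ticket:
        # Assumption: solved is ticket-level; the schema types it as the STRING "true"/"false".
        shared["solved"] = "true" if ticket["solved"] else "false"

    rows = [
        {
            "conversation_id": ticket["id"],
            "message_id": m["id"],
            "text": m["text"],
            "posted_at": m["posted_at"],
            "author_id": m.get("author_id"),
            "author_type": m.get("author_type"),
            **shared,
        }
        for m in messages
    ]
    survey = ticket.get("survey")
    if survey and rows:
        # Put the survey answer only on the final message of the ticket.
        rows[-1].update({f: survey[f] for f in SURVEY_FIELDS if f in survey})
    return rows
```

If your source system numbers messages per ticket (1, 2, 3...), consider prefixing message_id with the ticket ID so that it stays unique at the account level.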

Conversations // Complaints

The Complaints schema is specialized for tracking and analyzing customer grievances. Similar to support tickets, it supports multiple interactions grouped by a single conversation ID to capture the negotiation or resolution process.

Column Name | Type | Required | Description
--- | --- | --- | ---
conversation_id | STRING | Yes | Unique identifier for the specific complaint thread.
message_id | STRING | Yes | Unique identifier for each message (must be unique at the account level).
text | STRING | Yes | The actual content/text of the complaint message.
posted_at | STRING | Yes | When the message was posted (RFC 3339 timestamp).
author_id | STRING | No | Unique identifier for the author of the message.
account_id | STRING | No | Identifier for the account the message belongs to.
language | STRING | No | Language of the message expressed as a BCP 47 code.
category | STRING | No | A classification for segmenting complaints (e.g., "Support", "Shipping", "Billing").
status | STRING | No | Current state of negotiations (e.g., pending, initiated, ongoing, or solved).
url | STRING | No | Direct link to the complaint if it originates from a public source.
rating | FLOAT | No | The score the client gave specifically to the complaint negotiation experience.
author_type | STRING | No | The role of the author: Internal Person, User, or Bot.
source | STRING | No | A user-customizable label for grouping (e.g., "Public Forum", "Direct Email").

Conversations // Social Media Post

Used for threads and interactions from social platforms like Facebook, X (Twitter), or Reddit.

Column Name | Type | Required | Description
--- | --- | --- | ---
conversation_id | STRING | Yes | Unique identifier for the post or thread.
message_id | STRING | Yes | Unique identifier for the specific post or comment.
text | STRING | Yes | Content of the message.
posted_at | STRING | Yes | When the message was posted (RFC 3339 timestamp).
author_id | STRING | No | Identifier for the author of the message.
account_id | STRING | No | Identifier for the account the message belongs to.
language | STRING | No | Language of the message as a BCP 47 code.
title | STRING | No | Title of the original social media post.
owner | STRING | No | Indicates the entity owner: Owner or Competitor.
category | STRING | No | Segment or sub-grouping (e.g., a specific Subreddit name).
url | STRING | No | URL of the social post.
channel | STRING | No | The source platform (e.g., "facebook", "reddit").
tags | REPEATED STRING | No | Array of tags applied to the post.
author_type | STRING | No | The role of the author: Internal Person, User, or Bot.
upvotes | INTEGER | No | The number of likes or upvotes the message has received.
source | STRING | No | A user-customizable label for grouping.
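Social platforms typically return a post with a tree of nested replies. The sketch below flattens such a tree into rows that all share the post's ID as conversation_id, emitting the post first and then its replies in depth-first order. The input keys (id, text, posted_at, author, score, replies) and the function name flatten_thread are hypothetical; copying the post's title onto every row is an assumption modeled on how ticket-level fields are handled.

```python
from typing import Any


def flatten_thread(post: dict[str, Any], channel: str) -> list[dict[str, Any]]:
    """Flatten a post and its nested replies (hypothetical input shape) into rows.

    Every row shares the post's id as conversation_id. An explicit stack is used
    instead of recursion so very deep reply chains cannot hit Python's recursion limit.
    """
    rows: list[dict[str, Any]] = []
    stack = [post]
    while stack:
        node = stack.pop()
        rows.append(
            {
                "conversation_id": post["id"],
                "message_id": node["id"],
                "text": node["text"],
                "posted_at": node["posted_at"],
                "author_id": node.get("author"),
                "title": post.get("title"),
                "channel": channel,
                "upvotes": int(node.get("score", 0)),  # upvotes is an INTEGER column
            }
        )
        # Push replies in reverse so they are emitted in their original order (pre-order DFS).
        stack.extend(reversed(node.get("replies", [])))
    return rows
```

Using an explicit stack rather than recursion keeps the function safe for long reply chains while preserving the natural reading order of the thread.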

Conversations // Issue

Used for tracking bug reports, development tasks, or tickets from platforms like Jira or GitHub.

Column Name | Type | Required | Description
--- | --- | --- | ---
conversation_id | STRING | Yes | Unique identifier for the issue thread.
message_id | STRING | Yes | Unique identifier for each individual update or comment.
text | STRING | Yes | The content of the issue description or comment.
posted_at | STRING | Yes | When the record was created/posted (RFC 3339 timestamp).
author_id | STRING | No | Identifier for the author of the message.
account_id | STRING | No | Identifier for the account associated with the issue.
language | STRING | No | Language of the message as a BCP 47 code.
project_id | STRING | No | Unique identifier for the project or repository.
project_name | STRING | No | Human-readable name of the project.
title | STRING | No | The title or headline of the issue.
status | STRING | No | Current status (e.g., "To Do", "In Progress", "Done").
source | STRING | No | A user-customizable label for grouping.

Accounts

The foundation for customer profile data and behavior-based segmentation.

Column Name | Type | Required | Description
--- | --- | --- | ---
account_id | STRING | Yes | Unique identifier for the account.
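Because account_id must uniquely identify each account, and feedback and conversation rows carry an optional account_id of their own, a quick consistency check before ingestion can catch duplicates and typos. The sketch below assumes that the account_id on feedback and conversation rows is meant to reference rows of the Accounts table; check_accounts is a hypothetical pre-flight helper, and whether Birdie itself rejects unmatched IDs is not covered by this page.

```python
from collections import Counter
from typing import Any, Iterable


def check_accounts(
    accounts: Iterable[dict[str, Any]], records: Iterable[dict[str, Any]]
) -> dict[str, list[str]]:
    """Report duplicate account IDs and record account IDs with no matching account."""
    ids = [a["account_id"] for a in accounts]
    duplicates = sorted(i for i, n in Counter(ids).items() if n > 1)
    known = set(ids)
    # account_id is optional on feedback/conversation rows, so skip rows without one.
    unknown = sorted(
        {r["account_id"] for r in records if r.get("account_id") and r["account_id"] not in known}
    )
    return {"duplicate_account_ids": duplicates, "unknown_account_ids": unknown}
```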


4. Custom Fields

Birdie supports an unlimited number of flexible custom fields. If your source data contains information not covered by the standard schemas above (e.g., "Plan Type," "User Region," or "Churn Risk"), you can ingest it by providing the following mapping for each field:

  • Source/Destination Mapping: Link your source column to a Birdie field.

  • Data Type: Set as String, Bool, Number, Date, Datetime, Enum, or Unique.

  • Friendly Label: The name displayed to your team in the Birdie UI.

  • List Support: A toggle to indicate if the field should accept a single value or a list of values.
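It can help to keep the custom field mappings your team plans to configure in a reviewable form and to test sample values against them. The sketch below models the four settings listed above as a Python dataclass. CustomFieldMapping, FieldType, PY_TYPES, and value_matches are hypothetical names, not Birdie's configuration format or API, and the Python type chosen for each data type (for example, Date and Datetime as strings, Unique as a string) is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class FieldType(Enum):
    """The data types listed above for custom fields."""
    STRING = "String"
    BOOL = "Bool"
    NUMBER = "Number"
    DATE = "Date"
    DATETIME = "Datetime"
    ENUM = "Enum"
    UNIQUE = "Unique"


@dataclass(frozen=True)
class CustomFieldMapping:
    source_column: str      # column name in your table or file
    destination_field: str  # Birdie field the column maps to
    data_type: FieldType
    label: str              # friendly label shown in the Birdie UI
    is_list: bool = False   # the "List Support" toggle


# Assumed Python representation of each type for a local pre-flight check.
# Date/Datetime values are expected as strings (Datetime in RFC 3339).
PY_TYPES: dict[FieldType, tuple[type, ...]] = {
    FieldType.STRING: (str,),
    FieldType.ENUM: (str,),
    FieldType.UNIQUE: (str,),
    FieldType.DATE: (str,),
    FieldType.DATETIME: (str,),
    FieldType.BOOL: (bool,),
    FieldType.NUMBER: (int, float),
}


def value_matches(mapping: CustomFieldMapping, value: Any) -> bool:
    """Check that a source value fits the mapping's type and list setting."""
    if isinstance(value, list) != mapping.is_list:
        return False
    values = value if mapping.is_list else [value]
    expected = PY_TYPES[mapping.data_type]
    return all(
        isinstance(v, expected)
        # bool is a subclass of int, so only accept it for Bool fields.
        and (mapping.data_type is FieldType.BOOL or not isinstance(v, bool))
        for v in values
    )
```

For instance, a "User Region" field mapped with is_list=True would accept ["EU", "US"] but reject the bare string "EU".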
