How to integrate Birdie + S3/Azure/GCS

Mariana Carrero Rodrigues

Last Update 15 days ago

Overview

With the S3 connector, Birdie can import data in multiple file formats from AWS S3 or from a storage service that implements the S3 API, such as Google Cloud Storage. Once a day, the connector checks for new objects (files) and, if it finds any, imports the records they contain.
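
The daily check can be pictured as a filter over object metadata. The following is a minimal sketch of that idea, not Birdie's actual implementation: `StoredObject`, `select_new_objects`, and the sample keys are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class StoredObject:
    """Hypothetical stand-in for the metadata a storage listing returns."""
    key: str
    last_modified: datetime


def select_new_objects(objects, prefix, since):
    """Return objects under `prefix` modified strictly after `since`, oldest first."""
    fresh = [o for o in objects if o.key.startswith(prefix) and o.last_modified > since]
    return sorted(fresh, key=lambda o: o.last_modified)


# Sample listing (made-up keys and timestamps).
listing = [
    StoredObject("birdie/nps/2024-01-01.csv", datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc)),
    StoredObject("birdie/nps/2024-01-02.csv", datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc)),
    StoredObject("birdie/tickets/2024-01-02.csv", datetime(2024, 1, 2, 7, 0, tzinfo=timezone.utc)),
]
last_check = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
new_objects = select_new_objects(listing, "birdie/nps/", last_check)
print([o.key for o in new_objects])
```

In this sketch, only the object under the `birdie/nps/` prefix that was modified after the last check is selected.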

Requirements
  • Dedicated bucket for Birdie Integration

  • The Birdie integration requires a service account with read-only access. Write access is optional; it can be granted so that support teams can perform initial or manual dataset uploads.

AWS
  • See docs on creating a user and generating an access key for the user.

  • If you prefer not to provide a service account, create a role together with an IAM Policy that allows Birdie to assume that role (see the docs). Reach out and we'll provide the ID of the Birdie account.

  • Create an IAM Policy that gives the user/role S3 Read or S3 Read/Write access to the specific bucket. See the docs on how to do this.
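
As an illustration of the two policies mentioned above, here is a minimal sketch that builds a read-only bucket policy and a trust policy for role assumption. The function names, bucket name, account ID, and external ID are placeholders, not values supplied by Birdie; the actual Birdie account ID must be requested as described above.

```python
import json


def read_only_bucket_policy(bucket, prefix=""):
    """Build an IAM policy document allowing list and read on one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Listing is granted on the bucket itself.
            {"Effect": "Allow", "Action": ["s3:ListBucket"],
             "Resource": [f"arn:aws:s3:::{bucket}"]},
            # Reading is granted on the objects under the prefix.
            {"Effect": "Allow", "Action": ["s3:GetObject"],
             "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"]},
        ],
    }


def assume_role_trust_policy(trusted_account_id, external_id):
    """Build a trust policy letting another AWS account assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }


# Placeholder values for illustration only.
policy = read_only_bucket_policy("example-birdie-bucket", "birdie/")
trust = assume_role_trust_policy("111122223333", "example-external-id")
print(json.dumps(policy, indent=2))
print(json.dumps(trust, indent=2))
```

For read/write access, `s3:PutObject` could be added to the second statement's actions.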

Azure

To enable secure access to an Azure Blob Storage container, you can use either Shared Access Signature (SAS) or Shared Key Authentication.

SAS Tokens are the recommended approach because they provide limited, temporary access without exposing the account key.

  1. Shared Access Signature

    • To generate a SAS token for an Azure Blob Storage container using the Azure Portal, go to your storage account, select the container, and click on “Shared Access Signature.” Configure the token by selecting the permissions and setting the start and expiry time. Once configured, click “Generate SAS” and copy the SAS URL or token.
    • For more information about Shared Access Signature, see the docs

  2. Shared Key Authentication

    • To access an Azure Blob Storage container using Shared Key Authentication, you use the storage account name and account key, which provide permanent, full access to the storage account. By supplying the account name, account key, and container name to your application or client, you can perform any operation on the blobs and containers.
    • For more information about Shared Key Authentication, see the docs


Note: Shared Key Authentication provides full, permanent access to your Azure storage account. Anyone with the account key can read, modify, or delete all containers and blobs. For this reason, do not share your account key with untrusted parties, and use it only for internal systems or administrative tasks. Consider using SAS tokens for temporary, limited access whenever possible to reduce security risks.
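
Before sharing a SAS URL, it can help to confirm that it grants only the access you intend. The sketch below reads the standard SAS query parameters `sp` (signed permissions) and `se` (signed expiry) from a URL; `inspect_sas_url` and the sample URL are hypothetical, and the check assumes the token carries an explicit `se` value.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def inspect_sas_url(sas_url, now=None):
    """Summarize the permissions and expiry encoded in a SAS URL's query string."""
    now = now or datetime.now(timezone.utc)
    query = parse_qs(urlsplit(sas_url).query)
    permissions = query.get("sp", [""])[0]  # e.g. "rl" for read + list
    expiry_raw = query.get("se", [""])[0]   # e.g. "2030-01-01T00:00:00Z"
    if not expiry_raw:
        raise ValueError("SAS URL has no 'se' (expiry) parameter")
    expiry = datetime.fromisoformat(expiry_raw.replace("Z", "+00:00"))
    return {
        "can_read": "r" in permissions,
        "can_list": "l" in permissions,
        # write, delete, add, create
        "can_modify": any(p in permissions for p in "wdac"),
        "expired": expiry <= now,
    }


# Made-up URL; the signature is redacted.
sample = (
    "https://exampleaccount.blob.core.windows.net/birdie"
    "?sv=2022-11-02&sr=c&sp=rl&se=2030-01-01T00%3A00%3A00Z&sig=REDACTED"
)
summary = inspect_sas_url(sample, now=datetime(2025, 1, 1, tzinfo=timezone.utc))
print(summary)
```

A read-only token for this integration would be expected to report `can_read` and `can_list` as true and `can_modify` as false.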

GCP

To enable HMAC Access for a Google Cloud Storage (GCS) bucket, please follow the instructions provided in the Google Cloud documentation.

Recommended Steps:


  1. Create a New Service Account

    • Create a new Service Account with the necessary permissions to list buckets and read objects. Assign the "Storage Object Viewer" role to this Service Account.

  2. Create a New HMAC Key

    • Follow the instructions to create a new HMAC Key for the newly created Service Account. This key will allow secure access to your GCS bucket.

  3. Set Up IAM Policy

    • Bind an IAM policy that grants permissions to list objects only within the target bucket and, if necessary, a specific folder within that bucket. This ensures that access is restricted to the appropriate resources, enhancing security.

  • For more information on how this works, read about the GCS XML API, which works with S3-compatible tools.

  • For more information about HMAC keys, read about HMAC keys for GCS.

  • IAM Policy example with read-write access:

Parameters
  • Region: The region, e.g., "us-west-2" (AWS) or "us-central1" (GCP).

  • Bucket: The bucket name.

  • Prefix: A prefix for the object keys. We suggest organizing it by the kind of data (e.g., birdie/tickets, birdie/nps).

  • Format: The data format to use. Currently, only parquet and csv are supported.

  • Kind: The kind of data you're trying to import. This defines what schema Birdie expects when reading rows from your file. Supported values are:

    • review

    • nps

    • csat

    • support_ticket

    • social_media_post

    • issue

    • accounts

  • Credentials for S3

    • Access Key ID / HMAC Access ID

    • Secret Access Key / HMAC Secret

    • External ID (optional, AWS specific)

    • Role ARN (optional, AWS specific)

    • The S3 endpoint to use. Only needed if not using AWS S3.

  • Start Date: Date to filter objects by (object modified at).
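
To make the constraints on these parameters concrete, here is a minimal sketch that collects them in one structure and checks the documented ones (supported formats and kinds). `ConnectorParams` and `validate` are hypothetical names, not part of any Birdie API; the start date format and the sample credentials are assumptions. The endpoint shown is the public GCS XML API endpoint.

```python
from dataclasses import dataclass, replace
from typing import Optional

SUPPORTED_FORMATS = {"parquet", "csv"}
SUPPORTED_KINDS = {
    "review", "nps", "csat", "support_ticket",
    "social_media_post", "issue", "accounts",
}


@dataclass
class ConnectorParams:
    """Hypothetical container mirroring the parameters listed above."""
    region: str
    bucket: str
    prefix: str
    format: str
    kind: str
    access_key_id: str
    secret_key: str
    start_date: str                    # date format is an assumption
    endpoint: Optional[str] = None     # only needed when not using AWS S3
    role_arn: Optional[str] = None     # AWS specific
    external_id: Optional[str] = None  # AWS specific


def validate(params):
    """Return a list of problems; an empty list means the values look consistent."""
    problems = []
    if not params.bucket:
        problems.append("bucket is required")
    if params.format not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {params.format}")
    if params.kind not in SUPPORTED_KINDS:
        problems.append(f"unsupported kind: {params.kind}")
    return problems


# Example for a GCS bucket accessed through HMAC keys (placeholder credentials).
gcs_params = ConnectorParams(
    region="us-central1",
    bucket="example-birdie-bucket",
    prefix="birdie/nps",
    format="csv",
    kind="nps",
    access_key_id="EXAMPLE_HMAC_ACCESS_ID",
    secret_key="EXAMPLE_HMAC_SECRET",
    start_date="2024-01-01",
    endpoint="https://storage.googleapis.com",
)
print(validate(gcs_params))
print(validate(replace(gcs_params, format="json")))
```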

S3 Schemas

Each row of the file must fit within one of the following schemas. The schema must match the kind selected when configuring the parameters.


See the official Parquet spec for more information on the supported types and logical types.

Feedbacks // Review

| Column Name | Type | Required | Description |
|---|---|---|---|
| feedback_id | STRING | Required | Unique identifier for each review. |
| text | STRING | Required | Text posted by the user. |
| posted_at | STRING | Required | When the feedback was posted (RFC 3339 timestamp). |
| author_id | STRING | Optional | Identifier for the author of the record. |
| account_id | STRING | Optional | Identifier for the account the record belongs to. |
| language | STRING | Optional | Language of the record as a BCP 47 code. |
| title | STRING | Optional | The title of the feedback given by the author. |
| rating | FLOAT | Required | A rating or score of the feedback. |
| category | STRING | Optional | The category the review belongs to. |
| owner | STRING | Optional | One of: Owner, Competitor. |
| source | STRING | Optional | A user-customizable label for grouping feedbacks. |

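
As a sketch of how rows could be checked against the review schema before upload, the example below validates the required columns and writes the valid rows as CSV. `check_review_row` is a hypothetical helper, the timestamp check is deliberately loose, and the use of a header row with the column names is an assumption about the CSV layout.

```python
import csv
import io
from datetime import datetime

REVIEW_REQUIRED = ("feedback_id", "text", "posted_at", "rating")


def check_review_row(row):
    """Return problems found in one review row (a dict of column name to string)."""
    problems = [f"missing {col}" for col in REVIEW_REQUIRED if not row.get(col)]
    if row.get("posted_at"):
        try:
            # Loose check: normalize "Z" and let the standard library parse it.
            datetime.fromisoformat(row["posted_at"].replace("Z", "+00:00"))
        except ValueError:
            problems.append("posted_at is not an RFC 3339 timestamp")
    if row.get("rating"):
        try:
            float(row["rating"])
        except ValueError:
            problems.append("rating is not a number")
    return problems


# Made-up sample rows; the second has an invalid timestamp.
rows = [
    {"feedback_id": "r-1", "text": "Great app", "posted_at": "2024-03-01T12:30:00Z", "rating": "5"},
    {"feedback_id": "r-2", "text": "Crashes on login", "posted_at": "yesterday", "rating": "1"},
]
valid_rows = [r for r in rows if not check_review_row(r)]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(REVIEW_REQUIRED))
writer.writeheader()
writer.writerows(valid_rows)
print(buffer.getvalue())
```
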
Feedbacks // NPS and CSAT

| Column Name | Type | Required | Description |
|---|---|---|---|
| feedback_id | STRING | Required | Unique identifier for each answer. |
| text | STRING | Optional | Text posted by the user. |
| posted_at | STRING | Required | When the feedback was posted (RFC 3339 timestamp). |
| author_id | STRING | Optional | Identifier for the author of the record. |
| account_id | STRING | Optional | Identifier for the account the record belongs to. |
| language | STRING | Optional | Language of the record as a BCP 47 code. |
| title | STRING | Optional | The title of the survey. |
| rating | FLOAT | Required | A rating or score of the feedback. |
| source | STRING | Optional | A user-customizable label for grouping feedbacks. |

Conversations // Support tickets

| Column Name | Type | Required | Description |
|---|---|---|---|
| conversation_id | STRING | Required | Unique identifier for each conversation. |
| message_id | STRING | Required | Unique identifier for each message (unique at the account level). |
| author_id | STRING | Optional | Identifier for the author of the message. For author_type = agent, use a user-friendly string (email, login). |
| account_id | STRING | Optional | Identifier for the account the message belongs to. |
| text | STRING | Required | Text of the message. |
| posted_at | STRING | Required | When the message was posted (RFC 3339 timestamp). |
| language | STRING | Optional | Language of the message as a BCP 47 code. |
| subject | STRING | Optional | Subject of the ticket. |
| status | STRING | Optional | Status of the ticket, e.g., open. |
| priority | STRING | Optional | Priority assigned to the ticket. |
| channel | STRING | Optional | Source channel of the ticket, e.g., web. |
| tags | REPEATED STRING | Optional | Array of tags applied to the ticket. |
| author_type | STRING | Optional | One of: Bot, Agent, User. |
| survey_title | STRING | Optional | Title of the survey that closes the ticket. |
| survey_type | STRING | Optional | Type of the survey that closes the ticket. One of: csat or nps. |
| rating | FLOAT | Optional | Rating that the client gave to the support ticket experience. |
| solved | STRING | Optional | Flag that indicates if the ticket was solved. One of: true or false. |
| source | STRING | Optional | A user-customizable label for grouping feedbacks. |
| agent_team | STRING | Optional | Support agent's team name. |
| agent_company | STRING | Optional | Support agent's company name. |
| agent_supervisor_id | STRING | Optional | Support agent's supervisor identifier. |
| agent_experience | STRING | Optional | Support agent's maturity level (Enum). |

Note 1: To ensure consistency, please upload only one row per conversation containing the survey response fields (such as survey_type, survey_title, rating, etc.). This message should be the final one for that ticket, reflecting the client’s closing thoughts on the service provided.


Note 2: To upload multiple messages per ticket, make sure the "ticket" fields (such as subject, status, priority, channel, and tags) are consistent across all messages.
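
The two notes above can be checked mechanically before upload. The following is a minimal sketch under the assumption that rows are available as dictionaries keyed by column name; `check_conversations` is a hypothetical helper, and it does not verify that the survey row is the final message of the ticket.

```python
from collections import defaultdict

SURVEY_FIELDS = ("survey_type", "survey_title", "rating")
TICKET_FIELDS = ("subject", "status", "priority", "channel", "tags")


def check_conversations(rows):
    """Check Note 1 and Note 2 across message rows; return a list of problems."""
    by_conversation = defaultdict(list)
    for row in rows:
        by_conversation[row["conversation_id"]].append(row)

    problems = []
    for conversation_id, messages in by_conversation.items():
        # Note 1: at most one row per conversation carries survey fields.
        with_survey = [
            m for m in messages
            if any(m.get(f) not in (None, "") for f in SURVEY_FIELDS)
        ]
        if len(with_survey) > 1:
            problems.append(f"{conversation_id}: {len(with_survey)} rows carry survey fields")
        # Note 2: ticket-level fields are identical on every message.
        for field in TICKET_FIELDS:
            # repr() makes list values such as tags comparable in a set.
            if len({repr(m.get(field)) for m in messages}) > 1:
                problems.append(f"{conversation_id}: inconsistent {field}")
    return problems


# Made-up sample: the status differs between two messages of the same ticket.
messages = [
    {"conversation_id": "t-1", "message_id": "m-1", "text": "Hi", "status": "open"},
    {"conversation_id": "t-1", "message_id": "m-2", "text": "Thanks", "status": "solved",
     "survey_type": "csat", "rating": 5.0},
]
print(check_conversations(messages))
```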

Conversations // Complaints

| Column Name | Type | Required | Description |
|---|---|---|---|
| conversation_id | STRING | Required | Unique identifier for each conversation. |
| message_id | STRING | Required | Unique identifier for each message (unique at the account level). |
| author_id | STRING | Optional | Identifier for the author of the message. |
| account_id | STRING | Optional | Identifier for the account the message belongs to. |
| text | STRING | Required | Text of the message. |
| posted_at | STRING | Required | When the message was posted (RFC 3339 timestamp). |
| language | STRING | Optional | Language of the message as a BCP 47 code. |
| category | STRING | Optional | A classification for segmenting complaints, e.g., Support, Shipping. |
| status | STRING | Optional | Status of the complaint negotiations, e.g., pending, initiated, ongoing, and solved. |
| url | STRING | Optional | URL for the complaint if from a public source. |
| rating | FLOAT | Optional | Rating that the client gave to the complaint negotiation experience. |
| author_type | STRING | Optional | One of: Internal Person, User, Bot. |
| source | STRING | Optional | A user-customizable label for grouping feedbacks. |

Conversations // Social Media Post

| Column Name | Type | Required | Description |
|---|---|---|---|
| conversation_id | STRING | Required | Unique identifier for each conversation. |
| message_id | STRING | Required | Unique identifier for each message (unique at the account level). |
| author_id | STRING | Optional | Identifier for the author of the message. |
| account_id | STRING | Optional | Identifier for the account the message belongs to. |
| text | STRING | Required | Text of the message. |
| posted_at | STRING | Required | When the message was posted (RFC 3339 timestamp). |
| language | STRING | Optional | Language of the message as a BCP 47 code. |
| title | STRING | Optional | Title of the post. |
| owner | STRING | Optional | One of: Owner, Competitor. |
| category | STRING | Optional | The category the post was under, e.g., a subreddit name. |
| url | STRING | Optional | URL of the post. |
| channel | STRING | Optional | Source channel of the post, e.g., facebook. |
| tags | REPEATED STRING | Optional | Array of tags applied to the post. |
| author_type | STRING | Optional | One of: Internal Person, User, Bot. |
| upvotes | INTEGER | Optional | The number of upvotes the message has. |
| source | STRING | Optional | A user-customizable label for grouping feedbacks. |

Conversations // Issue

| Column Name | Type | Required | Description |
|---|---|---|---|
| conversation_id | STRING | Required | Unique identifier for each conversation. |
| message_id | STRING | Required | Unique identifier for each message (unique at the account level). |
| author_id | STRING | Optional | Identifier for the author of the message. |
| account_id | STRING | Optional | Identifier for the account the message belongs to. |
| text | STRING | Required | Text of the message. |
| posted_at | STRING | Required | When the message was posted (RFC 3339 timestamp). |
| language | STRING | Optional | Language of the message as a BCP 47 code. |
| project_id | STRING | Optional | Project identifier. |
| project_name | STRING | Optional | Project name. |
| title | STRING | Optional | Issue title. |
| status | STRING | Optional | Issue status. |
| source | STRING | Optional | A user-customizable label for grouping feedbacks. |

Accounts

| Column Name | Type | Required | Description |
|---|---|---|---|
| account_id | STRING | Required | Unique identifier for the account. |

Custom Fields

Any columns that don't fit under the previously listed schemas may become custom fields.


The name of the column in the Parquet Schema must be configured as the key/source of the custom field inside the Birdie App.
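
To see which columns of a file would need to be configured as custom fields, the file's columns can be compared with the schema for the selected kind. This is a minimal sketch using the review schema; `custom_field_candidates` and the extra column names are hypothetical.

```python
# Columns of the review schema, as listed in the table above.
REVIEW_COLUMNS = {
    "feedback_id", "text", "posted_at", "author_id", "account_id",
    "language", "title", "rating", "category", "owner", "source",
}


def custom_field_candidates(file_columns, schema_columns):
    """Return the columns that are not part of the schema, in file order."""
    return [c for c in file_columns if c not in schema_columns]


# Made-up file header with two extra columns.
file_columns = ["feedback_id", "text", "posted_at", "rating", "app_version", "country"]
print(custom_field_candidates(file_columns, REVIEW_COLUMNS))
```

Each name returned here would be the key/source to configure for the corresponding custom field inside the Birdie App.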
