# S3 / Azure / GCS

### Overview

With the S3 connector, Birdie can import data in multiple file formats from AWS S3 or from any storage service that implements the S3 API, such as Google Cloud Storage. Once a day, the connector checks for new objects (files) and, if it finds any, imports the records they contain.

### Requirements

* A dedicated bucket for the Birdie integration.
* A service account with read-only access to that bucket. Write access is optional and can be granted to help teams with initial or manual dataset uploads.

### Setup in S3-compatible storage

#### AWS

* See docs on creating a [user](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html) and generating an [access key](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html) for the user.
* If you prefer not to provide a service account, see the docs to [create a role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user.html#roles-creatingrole-user-console), then create an IAM Policy that allows Birdie to assume that role (see the [docs](https://docs.aws.amazon.com/IAM/latest/UserGuide/tutorial_cross-account-with-roles.html#tutorial_cross-account-with-roles-2)). Reach out and we'll provide the ID of the Birdie account.
* Create an IAM Policy that gives the user/role S3 Read or S3 Read/Write access to the specific bucket. See the [docs](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_examples_s3_rw-bucket-console.html) on how to do this.
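
To make the two policy documents described above more concrete, here is a minimal Python sketch that builds them as JSON. The helper names, bucket name, account ID, and external ID are placeholders chosen for illustration, not values provided by Birdie. The actions shown (`s3:ListBucket`, `s3:GetObject`) cover read-only access; add write actions only if you decide to grant them.

```python
import json


def read_only_bucket_policy(bucket: str) -> dict:
    """Identity policy that allows listing one bucket and reading its objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                # ListBucket applies to the bucket itself.
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                # GetObject applies to the objects inside the bucket.
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


def cross_account_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Trust policy that lets another AWS account assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # Requiring an external ID is optional but commonly recommended.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder values for illustration only.
print(json.dumps(read_only_bucket_policy("my-birdie-bucket"), indent=2))
print(json.dumps(cross_account_trust_policy("111122223333", "example-external-id"), indent=2))
```

The printed JSON can be pasted into the IAM console's policy editor. The trust policy is only needed for the role-based setup.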

#### Azure <a href="#azure" id="azure"></a>

To enable secure access to an Azure Blob Storage container, the recommended approach is to use **Azure AD authentication with a Service Principal**.

This method provides secure, role-based access control (RBAC) without exposing storage account keys.

As a fallback, **Shared Key Authentication** can be used when Azure AD is not available.

{% hint style="info" %}
**Why Service Principal is Recommended**

Using a Service Principal allows secure, scoped access to a specific Storage Account via RBAC.

Instead of sharing account keys, you grant explicit permissions to an application identity.

No storage account keys are exposed: access is granted only through role-based access control (RBAC), the credentials are revocable, and secrets can be rotated.

This is the enterprise security best practice.
{% endhint %}

**Service Principal (Azure AD)**

**Step 1. Create a Storage Account**

* Go to Azure Portal;
* Navigate to [**Storage accounts**](https://portal.azure.com/#view/Microsoft_Azure_StorageHub/StorageHub.MenuView/~/StorageAccountsBrowse)**;**
* Click **Create;**
* Provide:
  * Subscription;
  * Resource Group;
  * Storage account name (lowercase, globally unique);
  * Region.
* Click **Review + Create.**

The Blob endpoint will be:

```
https://<storage-account-name>.blob.core.windows.net
```
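
As a small illustration, the endpoint can be derived from the storage account name. The sketch below assumes the standard public-cloud suffix shown above and checks the Azure naming rule (3 to 24 characters, lowercase letters and digits only). The function name and the sample account name are hypothetical.

```python
import re

# Azure storage account names: 3-24 characters, lowercase letters and digits only.
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def blob_endpoint(storage_account_name: str) -> str:
    """Return the Blob endpoint URL for a storage account name."""
    if not _ACCOUNT_NAME.fullmatch(storage_account_name):
        raise ValueError(
            "storage account names must be 3-24 lowercase letters or digits"
        )
    return f"https://{storage_account_name}.blob.core.windows.net"


print(blob_endpoint("birdiedata01"))
```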

**Step 2 — Create a Container**

* Open your Storage Account;
* Go to **Containers;**
* Click **+ Container;**
* Define the container name.

This will be used as the `bucket`.

**Step 3 — Create a Service Principal**

* Navigate to [**App registrations**](https://portal.azure.com/#view/Microsoft_AAD_RegisteredApps/ApplicationsListBlade)**;**
* Click **New registration;**
* Provide a name;
* Click **Register.**
* Copy **Application (client) ID;**
* Copy **Directory (tenant) ID.**

**Step 4 — Generate a Client Secret**

* Inside the App Registration;
* Go to **Certificates & secrets;**
* Click **New client secret;**
* Set expiration;
* Copy the **Secret VALUE** (*not* the Secret ID).

**Step 5 — Assign Storage Permissions**

* Go to your **Storage Account;**
* Open **Access Control (IAM);**
* Click **Add role assignment;**
* Select role: `Storage Blob Data Contributor`;
* Assign access to: `User, group, or service principal`;
* Select your App Registration;
* Save.

{% hint style="info" %}
**To configure the integration, you'll need to share with Birdie:** Directory (tenant) ID, Application (client) ID, Client Secret VALUE, Endpoint, Region and Container name.
{% endhint %}

**Shared Key Authentication (fallback)**

Shared Key Authentication provides full, **permanent** access to the entire storage account.

Anyone with the key can:

* Read all blobs
* Modify data
* Delete containers

Use this method **only** when Azure AD authentication is not possible.

To retrieve the Shared Key:

* Go to **Storage Account**
* Navigate to **Access keys**
* Copy:
  * Storage account name
  * Key1 or Key2

{% hint style="info" %}
**To configure the integration, you'll need to share with Birdie:** Storage account name, Access key, Endpoint and Container name.
{% endhint %}

#### GCP

To enable HMAC Access for a Google Cloud Storage (GCS) bucket, please follow the instructions provided in the [Google Cloud documentation](https://cloud.google.com/storage/docs/authentication/managing-hmackeys?hl=pt-br#create).

**Recommended Steps:**

1. Create a New Service Account
   * Create a new Service Account with the necessary permissions to list buckets and read objects. Assign the "Storage Object Viewer" role to this Service Account.
2. Create a New HMAC Key
   * Follow the [instructions](https://cloud.google.com/storage/docs/authentication/managing-hmackeys?hl=pt-br#create) to create a new HMAC Key for the newly created Service Account. This key will allow secure access to your GCS bucket.
3. Set Up IAM Policy
   * Bind an IAM policy that grants permissions to list objects only within the target bucket and, if necessary, a specific folder within that bucket. This ensures that access is restricted to the appropriate resources, enhancing security.

* For more information on how this works, read about the [GCS XML API](https://cloud.google.com/storage/docs/interoperability?hl=pt-br#xml_api), which works with S3-compatible tools.
* For more information about HMAC keys, read about [HMAC keys for GCS](https://cloud.google.com/storage/docs/authentication/hmackeys?hl=pt-br#overview).
* IAM Policy example with read-write access:
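
A minimal sketch of what a bucket-level policy could look like, built in Python so it can be printed as JSON. It is an illustration under assumptions, not a policy supplied by Birdie: the service account email is a placeholder, the function name is hypothetical, and pairing `roles/storage.objectViewer` with `roles/storage.objectCreator` is one possible way to express read-write access. Pick the roles that match your own security requirements.

```python
import json


def bucket_policy(service_account_email: str, read_write: bool = False) -> dict:
    """Build bucket-level IAM bindings for a single service account."""
    member = f"serviceAccount:{service_account_email}"
    # Read access: list and read objects in the bucket.
    roles = ["roles/storage.objectViewer"]
    if read_write:
        # Assumption: object creation is enough for the optional write access.
        roles.append("roles/storage.objectCreator")
    return {"bindings": [{"role": role, "members": [member]} for role in roles]}


# Placeholder service account for illustration only.
policy = bucket_policy(
    "birdie-reader@my-project.iam.gserviceaccount.com", read_write=True
)
print(json.dumps(policy, indent=2))
```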

### Connect to Birdie

To configure the connector, provide Birdie with the following parameters and credentials.

#### Parameters

* Region: The region, e.g. "us-west-2" (AWS) or "us-central1" (GCP).
* Bucket: The bucket name.
* Prefix: A prefix for the object keys. We suggest organizing it based on the kind of data (e.g. birdie/tickets, birdie/nps).
* Format: The data format of the files. Only parquet and csv are currently supported.
* Kind: The kind of data you're trying to import. This defines what schema Birdie expects when reading rows from your file. Supported values are:
  * review
  * nps
  * csat
  * support\_ticket
  * social\_media\_post
  * issue
  * accounts
* Credentials for S3
  * Access Key ID / HMAC Access ID
  * Secret Access Key / HMAC Secret
  * External ID (optional, AWS specific)
  * Role ARN (optional, AWS specific)
  * The S3 endpoint to use. Only needed if not using AWS S3.
* Start Date: Date used to filter objects, compared against each object's last-modified time.
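
The parameters above can be sanity-checked before you send them to Birdie. The sketch below is a hypothetical helper, not part of any Birdie SDK: the class and field names are illustrative, and the filter assumes that an object is in scope when its key starts with the prefix and it was modified at or after the start date.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

SUPPORTED_FORMATS = {"parquet", "csv"}
SUPPORTED_KINDS = {
    "review", "nps", "csat", "support_ticket",
    "social_media_post", "issue", "accounts",
}


@dataclass(frozen=True)
class ConnectorParams:
    region: str
    bucket: str
    prefix: str
    data_format: str
    kind: str
    start_date: datetime
    endpoint: Optional[str] = None  # only needed when not using AWS S3

    def __post_init__(self) -> None:
        if self.data_format not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported format: {self.data_format}")
        if self.kind not in SUPPORTED_KINDS:
            raise ValueError(f"unsupported kind: {self.kind}")
        if self.start_date.tzinfo is None:
            raise ValueError("start_date must be timezone-aware")


def in_scope(params: ConnectorParams, key: str, last_modified: datetime) -> bool:
    """Assumed filter: prefix match and modified at or after the start date."""
    return key.startswith(params.prefix) and last_modified >= params.start_date


params = ConnectorParams(
    region="us-west-2",
    bucket="my-birdie-bucket",
    prefix="birdie/nps/",
    data_format="csv",
    kind="nps",
    start_date=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(in_scope(params, "birdie/nps/2024-02.csv",
               datetime(2024, 2, 1, tzinfo=timezone.utc)))
```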

### Data in scope

#### S3 Schemas

Each row of the file must fit within one of the following schemas. The schema must match the kind selected when configuring the parameters.

See the official Parquet specification for more information on the [supported types](https://github.com/apache/parquet-format?tab=readme-ov-file#types) and [logical types](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md).

#### Feedbacks // Review

| Column Name  | Type   | Requirement | Description                                        |
| ------------ | ------ | ----------- | -------------------------------------------------- |
| feedback\_id | STRING | Required    | Unique identifier for each review.                 |
| text         | STRING | Required    | Text posted by the user.                           |
| posted\_at   | STRING | Required    | When the feedback was posted (RFC 3339 timestamp). |
| author\_id   | STRING | Optional    | Identifier for the author of the record.           |
| account\_id  | STRING | Optional    | Identifier for the account the record belongs to.  |
| language     | STRING | Optional    | Language of the record as a BCP 47 code.           |
| title        | STRING | Optional    | The title of the feedback given by the author.     |
| rating       | FLOAT  | Required    | A rating or score of the feedback.                 |
| category     | STRING | Optional    | The category the review belongs to.                |
| owner        | STRING | Optional    | One of: Owner, Competitor.                         |
| source       | STRING | Optional    | A user-customizable label for grouping feedbacks.  |
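
Before uploading, rows can be checked against the review schema. The following is a minimal sketch using only the standard library. The function names are hypothetical, and the timestamp check is a lenient approximation of RFC 3339 based on `datetime.fromisoformat`, not a full parser.

```python
from datetime import datetime
from typing import List

REQUIRED_REVIEW_COLUMNS = ("feedback_id", "text", "posted_at", "rating")


def parse_rfc3339(value: str) -> datetime:
    """Parse a timestamp such as 2024-05-01T12:30:00Z or ...+00:00."""
    # fromisoformat() accepts a trailing "Z" only on newer Python versions,
    # so normalize it to an explicit UTC offset first.
    if value.endswith(("Z", "z")):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError("timestamp needs a UTC offset")
    return parsed


def review_errors(row: dict) -> List[str]:
    """Return a list of problems found in one review row (empty if none)."""
    errors = [
        f"missing required column: {name}"
        for name in REQUIRED_REVIEW_COLUMNS
        if row.get(name) in (None, "")
    ]
    if row.get("posted_at"):
        try:
            parse_rfc3339(row["posted_at"])
        except ValueError as exc:
            errors.append(f"posted_at: {exc}")
    rating = row.get("rating")
    if rating is not None and not isinstance(rating, (int, float)):
        errors.append("rating must be a number")
    return errors


row = {
    "feedback_id": "r-1",
    "text": "Great app",
    "posted_at": "2024-05-01T12:30:00Z",
    "rating": 4.5,
}
print(review_errors(row))
```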

#### Feedbacks // NPS and CSAT

| Column Name             | Type              | Requirement         | Description                                       |
| ----------------------- | ----------------- | ------------------- | ------------------------------------------------- |
| feedback\_id            | STRING            | Required            | Unique identifier for each answer.                |
| text                    | STRING            | Optional            | Text posted by user                               |
| posted\_at              | STRING            | Required            | When the feedback was posted (RFC 3339 timestamp) |
| author\_id              | STRING            | Optional            | Identifier for the author of the record.          |
| account\_id             | STRING            | Optional            | Identifier for the account the record belongs to. |
| language                | STRING            | Optional            | Language of the record as BCP 47 code.            |
| title                   | STRING            | Optional            | The title of the survey.                          |
| rating                  | FLOAT             | Required            | A rating or score of the feedback.                |
| source                  | STRING            | Optional            | A user-customizable label for grouping feedbacks  |
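
For CSV uploads, a file with these columns can be produced with Python's standard `csv` module. This sketch assumes that the CSV carries a header row whose names match the column names above; that is an assumption to confirm with Birdie, and the function name is hypothetical.

```python
import csv
import io
from typing import Iterable

NPS_COLUMNS = [
    "feedback_id", "text", "posted_at", "author_id", "account_id",
    "language", "title", "rating", "source",
]


def nps_rows_to_csv(rows: Iterable[dict]) -> str:
    """Serialize NPS/CSAT rows to CSV text; missing optional values stay empty."""
    buffer = io.StringIO()
    # Keys outside NPS_COLUMNS raise ValueError; extend the list for custom fields.
    writer = csv.DictWriter(buffer, fieldnames=NPS_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


print(nps_rows_to_csv([
    {"feedback_id": "a1", "posted_at": "2024-05-01T12:30:00Z", "rating": 9},
]))
```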

#### Conversations // Support tickets

| Column Name                  | Type              | Requirement         | Description                                                                                                                 |
| ---------------------------- | ----------------- | ------------------- | --------------------------------------------------------------------------------------------------------------------------- |
| conversation\_id             | STRING            | Required            | Unique identifier for each conversation.                                                                                    |
| message\_id                  | STRING            | Required            | Unique identifier for each message (Unique at the account level)                                                            |
| author\_id                   | STRING            | Optional            | <p>Identifier for the author of the message.<br>For author\_type = agent, use a user-friendly string (email, login)</p>     |
| account\_id                  | STRING            | Optional            | Identifier for the account the message belongs to.                                                                          |
| text                         | STRING            | Required            | Text of the message                                                                                                         |
| posted\_at                   | STRING            | Required            | When the message was posted (RFC 3339 timestamp)                                                                            |
| language                     | STRING            | Optional            | Language of the message as BCP 47 code.                                                                                     |
| subject                      | STRING            | Optional            | Subject of the ticket.                                                                                                      |
| status                       | STRING            | Optional            | Status of the ticket, e.g. open.                                                                                            |
| priority                     | STRING            | Optional            | Priority assigned to the ticket.                                                                                            |
| channel                      | STRING            | Optional            | Source channel of the ticket, e.g. web.                                                                                     |
| tags                         | REPEATED STRING   | Optional            | Array of tags applied to the ticket.                                                                                        |
| author\_type                 | STRING            | Optional            | Bot, Agent, User                                                                                                            |
| survey\_title                | STRING            | Optional            | Title of the survey that closes the ticket.                                                                                 |
| survey\_type                 | STRING            | Optional            | Type of the survey that closes the ticket. One of: csat or nps                                                              |
| rating                       | FLOAT             | Optional            | Rating that the client gave to the support ticket experience.                                                               |
| solved                       | STRING            | Optional            | Flag that indicates if the ticket was solved. One of: true or false                                                         |
| source                       | STRING            | Optional            | A user-customizable label for grouping feedbacks                                                                            |
| agent\_team                  | STRING            | Optional            | Support agent's team name                                                                                                   |
| agent\_company               | STRING            | Optional            | Support agent's company name                                                                                                |
| agent\_supervisor\_id        | STRING            | Optional            | Support agent's supervisor identifier                                                                                       |
| agent\_experience            | STRING            | Optional            | Support agent's maturity level (Enum)                                                                                       |

{% hint style="info" %}
To ensure consistency, please upload only one row per conversation containing the survey response fields (such as survey\_type, survey\_title, rating, etc.). This message should be the final one for that ticket, reflecting the client’s closing thoughts on the service provided.
{% endhint %}

{% hint style="info" %}
To upload multiple messages per ticket, make sure the "ticket" fields (such as subject, status, priority, channel and tags) are consistent across all messages.
{% endhint %}
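
The two notes above can be sketched as a small flattening step: ticket-level fields are copied onto every message row, and the survey fields are attached only to the final message. The function name and input shapes are hypothetical, and the sketch assumes the messages are already in chronological order.

```python
from typing import List, Optional

TICKET_FIELDS = ("subject", "status", "priority", "channel", "tags")
SURVEY_FIELDS = ("survey_type", "survey_title", "rating")


def ticket_to_rows(
    conversation_id: str,
    ticket: dict,
    messages: List[dict],
    survey: Optional[dict] = None,
) -> List[dict]:
    """Build one row per message for a single support ticket."""
    rows = []
    last_index = len(messages) - 1
    for index, message in enumerate(messages):
        row = {"conversation_id": conversation_id, **message}
        # Same ticket-level values on every row keeps them consistent.
        for name in TICKET_FIELDS:
            if name in ticket:
                row[name] = ticket[name]
        # Survey response goes on the final message only.
        if survey and index == last_index:
            for name in SURVEY_FIELDS:
                if name in survey:
                    row[name] = survey[name]
        rows.append(row)
    return rows


rows = ticket_to_rows(
    "t-1",
    {"subject": "Refund", "status": "solved", "tags": ["billing"]},
    [
        {"message_id": "m1", "text": "I need a refund",
         "posted_at": "2024-05-01T12:30:00Z", "author_type": "User"},
        {"message_id": "m2", "text": "Done, thanks for waiting",
         "posted_at": "2024-05-01T13:00:00Z", "author_type": "Agent"},
    ],
    survey={"survey_type": "csat", "rating": 5.0},
)
for row in rows:
    print(row)
```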

#### Conversations // Complaints

| Column Name                 | Type              | Requirement         | Description                                                                       |
| --------------------------- | ----------------- | ------------------- | --------------------------------------------------------------------------------- |
| conversation\_id            | STRING            | Required            | Unique identifier for each conversation.                                          |
| message\_id                 | STRING            | Required            | Unique identifier for each message (Unique at the account level)                  |
| author\_id                  | STRING            | Optional            | Identifier for the author of the message.                                         |
| account\_id                 | STRING            | Optional            | Identifier for the account the message belongs to.                                |
| text                        | STRING            | Required            | Text of the message                                                               |
| posted\_at                  | STRING            | Required            | When the message was posted (RFC 3339 timestamp)                                  |
| language                    | STRING            | Optional            | Language of the message as BCP 47 code.                                           |
| category                    | STRING            | Optional            | A classification for segmenting complaints, e.g. Support, Shipping                |
| status                      | STRING            | Optional            | Status of the complaint negotiations, e.g. pending, initiated, ongoing or solved. |
| url                         | STRING            | Optional            | URL for the complaint if from a public source.                                    |
| rating                      | FLOAT             | Optional            | Rating that the client gave to the complaint negotiation experience.              |
| author\_type                | STRING            | Optional            | Internal Person, User, Bot                                                        |
| source                      | STRING            | Optional            | A user-customizable label for grouping feedbacks                                  |

#### Conversation // Social Media Post

| Column Name                 | Type              | Requirement         | Description                                                      |
| --------------------------- | ----------------- | ------------------- | ---------------------------------------------------------------- |
| conversation\_id            | STRING            | Required            | Unique identifier for each conversation.                         |
| message\_id                 | STRING            | Required            | Unique identifier for each message (Unique at the account level) |
| author\_id                  | STRING            | Optional            | Identifier for the author of the message.                        |
| account\_id                 | STRING            | Optional            | Identifier for the account the message belongs to.               |
| text                        | STRING            | Required            | Text of the message                                              |
| posted\_at                  | STRING            | Required            | When the message was posted (RFC 3339 timestamp)                 |
| language                    | STRING            | Optional            | Language of the message as BCP 47 code.                          |
| title                       | STRING            | Optional            | Title of the post.                                               |
| owner                       | STRING            | Optional            | Owner, Competitor                                                |
| category                    | STRING            | Optional            | The category the post was under, e.g. a subreddit name.          |
| url                         | STRING            | Optional            | URL of the post.                                                 |
| channel                     | STRING            | Optional            | Source channel of the post, e.g. facebook.                       |
| tags                        | REPEATED STRING   | Optional            | Array of tags applied to the post.                               |
| author\_type                | STRING            | Optional            | Internal Person, User, Bot                                       |
| upvotes                     | INTEGER           | Optional            | The number of upvotes the message has.                           |
| source                      | STRING            | Optional            | A user-customizable label for grouping feedbacks                 |

#### Conversation // Issue

| Column Name      | Type   | Requirement         | Description                                                      |
| ---------------- | ------ | ------------------- | ---------------------------------------------------------------- |
| conversation\_id | STRING | Required            | Unique identifier for each conversation.                         |
| message\_id      | STRING | Required            | Unique identifier for each message (Unique at the account level) |
| author\_id       | STRING | Optional            | Identifier for the author of the message.                        |
| account\_id      | STRING | Optional            | Identifier for the account the message belongs to.               |
| text             | STRING | Required            | Text of the message                                              |
| posted\_at       | STRING | Required            | When the message was posted (RFC 3339 timestamp)                 |
| language         | STRING | Optional            | Language of the message as BCP 47 code.                          |
| project\_id      | STRING | Optional            | Project identifier                                               |
| project\_name    | STRING | Optional            | Project Name                                                     |
| title            | STRING | Optional            | Issue title                                                      |
| status           | STRING | Optional            | Issue status                                                     |
| source           | STRING | Optional            | A user-customizable label for grouping feedbacks.                |

#### Accounts

| Column Name | Type   | Requirement | Description                        |
| ----------- | ------ | ----------- | ---------------------------------- |
| account\_id | STRING | Required    | Unique identifier for the account. |

#### Custom Fields

Any columns that don't fit under the previously listed schemas may become custom fields.

The name of the column in the Parquet Schema must be configured as the key/source of the custom field inside the Birdie App.
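
To see which columns of a file would be candidates for custom fields, you can compare its column names against the schema for the selected kind. A minimal sketch, using the review schema as the example. The function name and the `plan_tier` column are hypothetical.

```python
from typing import Dict, Set, Tuple

REVIEW_COLUMNS = {
    "feedback_id", "text", "posted_at", "author_id", "account_id",
    "language", "title", "rating", "category", "owner", "source",
}


def split_custom_fields(row: dict, schema_columns: Set[str]) -> Tuple[Dict, Dict]:
    """Separate schema columns from columns that may become custom fields."""
    standard = {k: v for k, v in row.items() if k in schema_columns}
    custom = {k: v for k, v in row.items() if k not in schema_columns}
    return standard, custom


standard, custom = split_custom_fields(
    {"feedback_id": "r-1", "text": "Great app", "rating": 5.0, "plan_tier": "pro"},
    REVIEW_COLUMNS,
)
print(sorted(custom))  # column names to configure as custom field keys
```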
