# Data Anonymizer - User Guide

This guide provides comprehensive instructions on how to install, configure, and run the Interactive Birdie Internal Anonymizer script. This tool leverages Large Language Models (LLMs) to identify and mask sensitive information in your datasets.

***

### Download Birdie Internal Anonymizer script

<a href="https://drive.google.com/uc?export=download&#x26;id=1NZXdlIfEnqRWv_zSEec9u1M0hKVYLRQa" class="button primary">Download script</a>

***

### Prerequisites

Before you begin, ensure you have the following:

* Python installed (version 3.8 or higher recommended).
* API credentials for your chosen LLM provider (OpenAI, Anthropic, Google Gemini, or Azure).

***

### Installation and Setup

To ensure a clean environment and avoid dependency conflicts, follow these steps to set up the project.

#### 1. Project Initialization

Extract the project files to a directory of your choice on your local machine.

#### 2. Create a Virtual Environment (Recommended)

Open your terminal or command prompt and run the following:

Create the environment:

`python -m venv .venv`

Activate the environment:

* Windows: `.venv\Scripts\activate`
* Linux/MacOS: `source .venv/bin/activate`

#### 3. Install Dependencies

Install the required Python libraries using the provided requirements file:

`pip install -r requirements.txt`

#### 4. Configure API Keys

You must set your API key as an environment variable so the script can communicate with the LLM.

* OpenAI: `export OPENAI_API_KEY='your-key-here'`
* Anthropic: `export ANTHROPIC_API_KEY='your-key-here'`
* Google Gemini: `export GOOGLE_API_KEY='your-key-here'`

*(Note: On Windows, use `set` instead of `export`.)*

***

### Running the Anonymizer

The tool provides an interactive, step-by-step CLI (Command Line Interface) to guide you through the process.

#### Choosing the Right Script

Depending on your file format, run one of the two commands below:

* For JSON Files: `python run_anonymizer.py`

  *(Note: This converts JSON to CSV before processing)*
* For CSV Files: `python run_anonymizer_csv.py`

#### The 9-Step Interactive Process

Once the script is running, follow these prompts:

1. Input File Selection: Provide the path to your source file.
2. Column Analysis: The tool scans your data structure.
3. Column Selection: Choose specifically which columns contain PII (Personally Identifiable Information).
4. Output File Selection: Define where the cleaned file should be saved.
5. LLM Provider Selection: Choose from OpenAI, Azure, Anthropic, Google, or a Local LLM.
6. Model Selection: Enter the specific model name (e.g., `gpt-4o` or `claude-3-5-sonnet`).
7. Output Language: Select the language for the anonymized text (Default: English).
8. Processing Data: The tool sends data to the LLM for masking.
9. Sample Results: Review a preview of the anonymized data before finishing.

***

### Supported LLM Providers

| **Provider**  | **Default Model**            | **Best For**                         |
| ------------- | ---------------------------- | ------------------------------------ |
| OpenAI        | `gpt-4o-mini`                | General purpose & speed              |
| Anthropic     | `claude-3-5-sonnet-20241022` | High-accuracy reasoning              |
| Google Gemini | `gemini-1.5-pro`             | Large context windows                |
| Local LLM     | User Defined                 | Privacy-sensitive, offline workflows |

***

### Frequently Asked Questions

Can I use a local model for privacy?

Yes. If you select Local LLM (Option 5), you can connect to any OpenAI-compatible API (like Ollama or LocalAI) to keep data processing on your own infrastructure.

What happens to my JSON structure?

The tool currently flattens JSON data into a CSV format during the `run_anonymizer.py` workflow to ensure consistent processing across the LLM providers.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://ask.birdie.ai/admin-and-settings/security/data-anonymizer-user-guide.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
