# Manual Evaluation

### Overview

Manual Evaluation allows teams to perform **human-led quality assessments of customer interactions directly inside Birdie**, for criteria that are too complex, contextual, or process-specific to be reliably evaluated by AI.

The feature centralizes all manual evaluation workflows, ensuring **governance, traceability, and consistency** while integrating human judgment seamlessly with AI-driven metrics. All manual evaluations **start at the Area level**, providing a standardized and scalable entry point that represents the full operational scope.

Manual Evaluation helps Birdie remain the **single source of truth** for quality monitoring by unifying configuration, execution, revision, and analytics in one place.

***

### Key Concepts

Before using Manual Evaluation, it is helpful to understand how it is structured:

| Concept              | Description                                                                                                                        |
| -------------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
| **Manual Criterion** | An evaluation rule flagged as "Manual" that requires human review instead of AI evaluation                                         |
| **Evaluation**       | A completed human assessment of a single interaction, answering all criteria linked to a Reason                                    |
| **Version**          | Each evaluation has versioned records — a **Review** (initial) and optionally a **Revision** (second pass)                         |
| **Revision**         | A second round of evaluation performed on the same interaction, allowing a different evaluator to validate the original assessment |
| **Quality Score**    | The percentage of criteria answered **Yes** in an evaluation, weighted by criterion importance                                     |
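
The Quality Score definition above can be sketched in a few lines. This is a minimal illustration, not Birdie's actual implementation: it assumes each criterion carries a numeric importance weight, and it assumes **Not Applied** answers are excluded from the denominator (a common convention, but not stated in this document).

```python
def quality_score(answers):
    """Weighted share of criteria answered Yes.

    answers: list of (answer, weight) tuples, where answer is
    "yes", "no", or "not_applied" and weight is the criterion's
    importance (hypothetical numeric field).
    """
    # Assumption: Not Applied criteria do not count toward the score.
    scored = [(a, w) for a, w in answers if a != "not_applied"]
    total = sum(w for _, w in scored)
    if total == 0:
        return None  # nothing applicable to score
    yes = sum(w for a, w in scored if a == "yes")
    return 100 * yes / total

# Two criteria met (weights 2 and 1), one missed (weight 1):
# 100 * 3 / 4 = 75.0
```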

***

### How manual criteria are configured

Birdie only includes criteria in a manual evaluation form when they meet all of these conditions:

* The criterion is marked **Manual**
* The criterion is linked to a **Reason**
* That Reason is available in the selected **Area**

Manage this setup in [Taxonomy → Criteria](https://ask.birdie.ai/admin-and-settings/taxonomy/criteria). For the QA model behind it, see [Criteria](https://ask.birdie.ai/agent-quality-assurance/criteria).
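
The three inclusion conditions combine into a single predicate. The sketch below is illustrative only: the field names (`is_manual`, `reason`, `reasons`) are hypothetical and do not reflect Birdie's internal data model.

```python
def included_in_form(criterion, area):
    """Return True when a criterion appears on the manual
    evaluation form for the given Area.

    criterion and area are hypothetical dicts mirroring the three
    conditions in the docs; field names are illustrative only.
    """
    return (
        criterion["is_manual"]                      # flagged Manual
        and criterion["reason"] is not None         # linked to a Reason
        and criterion["reason"] in area["reasons"]  # Reason available in the Area
    )
```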

***

### How Feedback Is Selected

When starting a new manual evaluation, Birdie automatically selects an interaction for you. This is done through a **weighted random selection** algorithm:

1. Birdie counts how many interactions are available for each Reason, within a configurable lookback period
2. Reasons with more available interactions have a higher probability of being selected, ensuring representative distribution
3. Birdie then picks one interaction that has not yet been evaluated for the selected Reason
4. The interaction is **temporarily locked** to prevent two evaluators from assessing the same interaction simultaneously

This ensures evaluations are spread proportionally across all Reasons and that no interaction is double-evaluated.
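
The four steps above can be sketched as follows. This is a simplified model under stated assumptions, not Birdie's server-side code: the data shapes (`pool`, `locked`) and names are hypothetical, and the real lookback filtering happens before this point.

```python
import random

def pick_interaction(pool, locked):
    """Weighted random selection of one unevaluated interaction.

    pool: dict mapping Reason -> list of unevaluated interaction ids
          already filtered to the lookback window (assumption).
    locked: set of interaction ids currently being evaluated.
    """
    # Steps 1-2: weight each Reason by how many interactions it has available.
    reasons = [r for r, items in pool.items() if items]
    if not reasons:
        return None, None
    weights = [len(pool[r]) for r in reasons]
    reason = random.choices(reasons, weights=weights, k=1)[0]
    # Step 3: pick one unevaluated, unlocked interaction for that Reason.
    candidates = [i for i in pool[reason] if i not in locked]
    if not candidates:
        return None, None  # all locked; a caller could retry
    interaction = random.choice(candidates)
    # Step 4: lock it so no other evaluator receives the same interaction.
    locked.add(interaction)
    return reason, interaction
```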

***

### Performing a Manual Evaluation

#### Prerequisites

* Manual criteria must already be configured by an Admin for the Reasons in the Area
* You need at least **Viewer** permissions

#### Step-by-Step

{% stepper %}
{% step %}

#### Navigate to an Area

Go to the desired **Area** in Birdie and click **Start Manual Evaluation**.
{% endstep %}

{% step %}

#### Select Reasons

A dialog appears showing all Reasons that have manual criteria configured. Select one or more Reasons to include and confirm to start.

> Only Reasons with at least one manual criterion are shown.

{% endstep %}

{% step %}

#### Review the Interaction

Birdie automatically opens an interaction for evaluation. The evaluation screen shows:

* **Interaction details** — the full conversation transcript and available context fields (e.g. agent name, company, ticket ID)
* **Criteria form** — each manual criterion linked to the selected Reason, displayed with its name and description

{% endstep %}

{% step %}

#### Answer Each Criterion

For each criterion, select one of three answers:

| Answer          | Meaning                                              |
| --------------- | ---------------------------------------------------- |
| **Yes**         | The agent fulfilled the criterion                    |
| **No**          | The agent failed to fulfill the criterion            |
| **Not Applied** | The criterion was not applicable to this interaction |

You can optionally add:

* An **observation** per criterion (a free-text note explaining your answer)
* A **checklist** response (if the criterion has a manual checklist configured) — available only when the answer is **Yes**

> Selecting **No** or **Not Applied** automatically clears any observations or checklist items previously entered for that criterion.

{% endstep %}

{% step %}

#### Submit the Evaluation

When all criteria are answered, click **Submit**. A confirmation dialog appears where you can add an **overall comment** before finalizing.

After submission, the evaluation is automatically linked to the Area, Reason, agent, and interaction, and becomes available in dashboards and reports.
{% endstep %}
{% endstepper %}

***

### Revising an Evaluation

A **Revision** allows a second evaluator (or the same evaluator) to perform a new pass on an already-submitted evaluation. This is useful for calibration, quality control, or supervisor review.

#### How to Create a Revision

1. Navigate to **Monitor Evaluations** within the Area
2. Find the evaluation you want to revise in the table
3. Click the **Edit** action
4. A new version of the evaluation opens with the original answers pre-filled
5. Adjust answers as needed and submit

#### Revision Rules

* Each evaluation supports **one revision only** — once a revision exists, the evaluation is locked from further edits
* The revision always records **which user** created it and when
* Both the original review and the revision are stored and visible in the evaluation history
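
The versioning rules above can be modeled roughly as follows. This is an illustrative sketch of the Review/Revision relationship, not Birdie's schema; all class and field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Version:
    evaluator: str          # which user created this version
    submitted_at: datetime  # and when
    answers: dict           # criterion -> "yes" | "no" | "not_applied"

@dataclass
class Evaluation:
    review: Version                    # initial pass
    revision: Optional[Version] = None  # at most one second pass

    def revise(self, evaluator, answers):
        # Rule: one revision only; afterwards the evaluation is locked.
        if self.revision is not None:
            raise ValueError("evaluation already revised; further edits are locked")
        self.revision = Version(evaluator, datetime.now(timezone.utc), answers)
```

Both versions remain on the object, mirroring the visible evaluation history.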

***

### Monitoring Evaluations

Once evaluations are submitted, you can view, filter, and manage them from the **Manual Evaluation** page, accessible from the left-side menu.

The Monitor page gives you a full picture of evaluation activity: summary analytics at the top, and a filterable table of every completed evaluation below. You can also create revisions or delete evaluations directly from the table.

For full details on everything the Monitor page offers, see Monitor Evaluations.

***

### Permissions

| Role                   | Capabilities                                            |
| ---------------------- | ------------------------------------------------------- |
| **Viewer**             | Perform new evaluations and create revisions            |
| **Supervisor**         | View evaluation results, analytics, and monitor page    |
| **Admin / Specialist** | Create and manage criteria, modify Reason configuration |

***

### Troubleshooting & FAQs

**Can I choose which interaction to evaluate?**

No. Birdie automatically selects interactions using a weighted random algorithm. This ensures fair distribution across Reasons and prevents duplicate evaluations.

**Can I choose a specific evaluation form?**

No. The form is dynamically generated based on the manual criteria associated with the selected Reason. This ensures governance and consistency across evaluators.

**Why was a different Reason selected than expected?**

Birdie applies weighted sampling to ensure evaluations are proportionally distributed across Reasons. Reasons with more available interactions are more likely to be selected.

**What happens if I leave an evaluation unfinished?**

The interaction remains locked temporarily to prevent another evaluator from picking it up. If you abandon the session, the lock is released and the interaction becomes available again.
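
A lock that lapses on abandonment behaves like a lease with a time-to-live. The sketch below illustrates that behavior only; the TTL value and the in-memory design are assumptions, and Birdie's actual timeout is not documented here.

```python
import time

class InteractionLock:
    """Temporary lock with expiry: an abandoned session's lock
    lapses and the interaction becomes selectable again."""

    def __init__(self, ttl_seconds=1800):  # TTL is illustrative only
        self.ttl = ttl_seconds
        self._locks = {}  # interaction id -> acquisition time

    def acquire(self, interaction_id, now=None):
        now = time.time() if now is None else now
        held = self._locks.get(interaction_id)
        if held is not None and now - held < self.ttl:
            return False  # another evaluator still holds it
        self._locks[interaction_id] = now  # free or expired: take it
        return True

    def release(self, interaction_id):
        self._locks.pop(interaction_id, None)
```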

**Can an evaluation be revised more than once?**

No. Each evaluation supports exactly one revision. Once a revision exists, the evaluation cannot be edited further.

**Do manual evaluations appear in the same dashboards as AI evaluations?**

Yes. Manual evaluation results are integrated directly into Birdie's analytics dashboards and can be filtered by evaluation source (Manual or AI).

**Who should configure manual criteria?**

Only authorized **Admins** can create or modify criteria and their configuration. All configuration changes apply only to future evaluations and do not affect historical data.

***

### See Also

* Monitor Evaluations
* Criteria
* Reasons
