AI quality is multi-dimensional

Aegis organises AI evaluation into distinct metric categories, each addressing a critical dimension of trust. Together, they provide a complete picture of how models and agents behave across use cases and environments. Each category is evaluated independently, scored on a 0-100 scale, and can be tracked over time.

Metrics Radar

Metric Categories

Coverage at a Glance

Aegis organises evaluations into focused dimensions so teams can ship with confidence.

General Performance

Measures core response quality, ensuring outputs are accurate, relevant, and consistent with the user’s intent across common AI tasks.

Examples:

Factfulness · Answer Relevancy · Content Consistency · Summarization

Retrieval-Augmented Generation (RAG)

Evaluates how effectively retrieved context is used, helping prevent hallucinations and ensuring responses are grounded in enterprise knowledge.

Examples:

Context Relevancy · Context Sufficiency · Context Recall · Context Faithfulness

Security

Assesses resilience against adversarial inputs and data exposure, identifying vulnerabilities that could lead to prompt manipulation or sensitive data leakage.

Examples:

Role Hijacking · Instruction Integrity Subversion · System Data Leakage · PII-PHI Leakage

Safety

Detects harmful, misleading, or policy-violating behavior that could impact users, brand trust, or regulatory compliance.

Examples:

Misinformation · Misuse · Role Violation

Structural Integrity

Validates that AI outputs conform to required formats, schemas, and structural constraints for safe downstream processing.

Examples:

JSON Schema Match · XML Schema Match · Exact Match · Is Valid JSON
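
To make the idea of structural checks concrete, here is a minimal sketch of what checks in the spirit of Is Valid JSON and Exact Match could look like. This is not Aegis's implementation: the function names (`is_valid_json`, `exact_match`, `has_required_keys`, `to_score`) are hypothetical, the required-keys check is a simplified stand-in for real schema matching, and mapping a pass to 100 and a fail to 0 is an assumption made only to fit the 0-100 scale.

```python
import json


def is_valid_json(output: str) -> bool:
    """Return True when the model output parses as JSON."""
    try:
        json.loads(output)
    except (json.JSONDecodeError, TypeError):
        return False
    return True


def exact_match(output: str, expected: str) -> bool:
    """Return True when output equals expected, ignoring surrounding whitespace."""
    return output.strip() == expected.strip()


def has_required_keys(output: str, required: set) -> bool:
    """Simplified stand-in for schema matching (assumption, not a full validator):
    the output must be a JSON object containing every required top-level key."""
    try:
        parsed = json.loads(output)
    except (json.JSONDecodeError, TypeError):
        return False
    return isinstance(parsed, dict) and required.issubset(parsed.keys())


def to_score(passed: bool) -> int:
    """Map a binary structural check onto the 0-100 scale (assumed mapping)."""
    return 100 if passed else 0
```

A downstream consumer might run `to_score(is_valid_json(response))` before attempting to parse a response, so malformed output is caught before it reaches later processing steps.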

Alignment & Output Control

Ensures responses follow expected structure, format, and generation constraints, enabling predictable and controllable AI behavior.

Examples:

Format Alignment · Format Consistency · Prompt Quality · Content Generation Faithfulness

How Aegis metrics work

A hybrid approach: objectivity and nuance

Each metric in Aegis produces a score on a 0-100 scale rather than a single pass/fail result.
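
One way to picture the difference between a graded score and a pass/fail result is the sketch below. It is illustrative only: `MetricResult`, `passes`, and `regression` are hypothetical names rather than Aegis APIs, and the threshold values are arbitrary. The point is that a 0-100 score lets each team choose its own pass threshold and measure how far a metric moved between runs, which a bare pass/fail flag cannot express.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricResult:
    """Hypothetical container for one metric's score on the 0-100 scale."""
    name: str
    score: float

    def __post_init__(self):
        # Reject values outside the documented 0-100 range.
        if not 0 <= self.score <= 100:
            raise ValueError(f"score must be within 0-100, got {self.score}")

    def passes(self, threshold: float) -> bool:
        """Derive a pass/fail view from the score using a caller-chosen threshold."""
        return self.score >= threshold


def regression(previous: MetricResult, current: MetricResult) -> float:
    """Points lost between two runs; a positive value means the score dropped."""
    return previous.score - current.score


# The same score passes a lenient threshold and fails a strict one.
last_week = MetricResult("Answer Relevancy", 87)
this_week = MetricResult("Answer Relevancy", 82)
lenient = this_week.passes(80)
strict = this_week.passes(90)
dropped = regression(last_week, this_week)
```

Under these assumptions, a five-point drop is visible even though the metric still clears the lenient threshold, which is the kind of change a pass/fail result would hide.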

See your AI through a clearer lens

Run structured evaluations, track regressions, and understand how your models and agents behave across performance, safety, and alignment.