AI quality is multi-dimensional
Aegis organises AI evaluation into distinct metric categories, each addressing a critical dimension of trust. Together, they provide a complete picture of how models and agents behave across use cases and environments. Each category is evaluated independently, scored on a 0-100 scale, and can be tracked over time.

Metric Categories
Coverage at a Glance
Aegis organises evaluations into focused dimensions so teams can ship with confidence.
General Performance
Measures core response quality, ensuring outputs are accurate, relevant, and consistent with the user’s intent across common AI tasks.
Examples:
Factfulness · Answer Relevancy · Content Consistency · Summarization
Retrieval-Augmented Generation (RAG)
Evaluates how effectively retrieved context is used, helping prevent hallucinations and ensuring responses are grounded in enterprise knowledge.
Examples:
Context Relevancy · Context Sufficiency · Context Recall · Context Faithfulness
Security
Assesses resilience against adversarial inputs and data exposure, identifying vulnerabilities that could lead to prompt manipulation or sensitive data leakage.
Examples:
Role Hijacking · Instruction Integrity Subversion · System Data Leakage · PII-PHI Leakage
Safety
Detects harmful, misleading, or policy-violating behavior that could impact users, brand trust, or regulatory compliance.
Examples:
Misinformation · Misuse · Role Violation
Structural Integrity
Validates that AI outputs conform to required formats, schemas, and structural constraints for safe downstream processing.
Examples:
JSON Schema Match · XML Schema Match · Exact Match · Is Valid JSON
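As an illustration of what deterministic structural checks can look like, here is a minimal sketch using only the Python standard library. It is not Aegis's implementation: the function names (is_valid_json, has_required_keys, exact_match, pass_rate) are hypothetical, the required-keys check is a simplified stand-in for full schema matching, and reporting a batch as a pass percentage on a 0-100 scale is an assumption made for this example.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if the text parses as JSON."""
    try:
        json.loads(text)
    except (TypeError, ValueError):  # JSONDecodeError subclasses ValueError
        return False
    return True


def has_required_keys(text: str, required: set[str]) -> bool:
    """Simplified stand-in for schema matching (assumption):
    the output must be a JSON object containing every required key."""
    try:
        data = json.loads(text)
    except (TypeError, ValueError):
        return False
    return isinstance(data, dict) and required.issubset(data.keys())


def exact_match(output: str, expected: str) -> bool:
    """Compare output and expected text, ignoring surrounding whitespace."""
    return output.strip() == expected.strip()


def pass_rate(results: list[bool]) -> float:
    """Percentage of passing checks on a 0-100 scale; 0.0 for an empty batch."""
    if not results:
        return 0.0
    return 100.0 * sum(results) / len(results)


# Hypothetical model outputs for illustration only.
outputs = ['{"id": 1, "status": "ok"}', '{"id": 2}', "not json"]
json_score = pass_rate([is_valid_json(o) for o in outputs])
keys_score = pass_rate([has_required_keys(o, {"id", "status"}) for o in outputs])
print(f"valid JSON: {json_score:.1f}, required keys: {keys_score:.1f}")
```

Checks of this kind need no model in the loop, so they are cheap to run on every output before it reaches downstream systems.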
Alignment & Output Control
Ensures responses follow expected structure, format, and generation constraints, enabling predictable and controllable AI behavior.
Examples:
Format Alignment · Format Consistency · Prompt Quality · Content Generation Faithfulness
How Aegis metrics work
A hybrid approach: objectivity and nuance
Each metric in Aegis produces a score on a 0-100 scale, rather than a simple pass/fail result.
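To see why a graded score can be more informative than pass/fail, consider the sketch below. It is illustrative only and does not reflect an Aegis API: the category scores, the pass threshold of 80, the tolerance of 2 points, and the helper names (clamp_score, find_regressions) are all assumptions chosen for the example.

```python
def clamp_score(value: float) -> float:
    """Keep a score inside the 0-100 range."""
    return max(0.0, min(100.0, value))


def find_regressions(
    previous: dict[str, float],
    current: dict[str, float],
    tolerance: float = 2.0,  # assumed noise margin, in points
) -> dict[str, float]:
    """Return categories whose score dropped by more than `tolerance` points."""
    drops: dict[str, float] = {}
    for category, old_score in previous.items():
        if category not in current:
            continue  # category not evaluated in the current run
        delta = clamp_score(current[category]) - clamp_score(old_score)
        if delta < -tolerance:
            drops[category] = delta
    return drops


# Hypothetical scores from two evaluation runs.
previous_run = {"General Performance": 92.0, "Security": 88.0, "Safety": 95.0}
current_run = {"General Performance": 81.0, "Security": 89.0, "Safety": 94.0}

PASS_THRESHOLD = 80.0  # assumed threshold for a pass/fail view
all_pass = all(score >= PASS_THRESHOLD for score in current_run.values())
regressions = find_regressions(previous_run, current_run)

print(all_pass)      # every category still clears the threshold
print(regressions)   # yet one category fell well beyond the tolerance
```

In this example a pass/fail view reports no change between the two runs, while the 0-100 scores reveal an 11-point drop in one category.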

See your AI through a clearer lens
Run structured evaluations, track regressions, and understand how your models and agents behave across performance, safety, and alignment.