72 concepts · 10 modules
📋
AI Agent Evaluation
Benchmarks, automated evaluation methods, trajectory analysis, and production monitoring for AI agents.
Curriculum
A structured path through the course content.
📋
Module 01 · 7 concepts
Start here
Foundations of Agent Evaluation
Core concepts, challenges, and frameworks for evaluating AI agents.
🏆
Module 02 · 9 concepts
Benchmark Ecosystem
Major benchmarks, leaderboards, and evaluation datasets.
🤖
Module 03 · 8 concepts
Automated Evaluation Methods
LLM-as-judge, rubric-based scoring, and automated metrics.
🔍
Module 04 · 7 concepts
Trajectory & Process Analysis
Analyzing agent reasoning chains and decision processes.
📊
Module 05 · 7 concepts
Statistical Methods
Statistical rigor, confidence intervals, and significance testing.
⚖
Module 06 · 6 concepts
Cost-Quality-Latency Tradeoffs
Balancing evaluation cost, quality, and latency.
🛡
Module 07 · 8 concepts
Safety & Alignment Evaluation
Red teaming, safety benchmarks, and alignment testing.
🔧
Module 08 · 7 concepts
Evaluation Tooling
Frameworks, platforms, and infrastructure for evaluation.
📡
Module 09 · 6 concepts
Production Monitoring
Online evaluation, A/B testing, and production metrics.
🔬
Module 10 · 7 concepts
Frontier Research
Open problems and emerging directions in agent evaluation.