Who It's For
CRAB helps researchers and developers test AI agents. If you build agents that interact with different digital systems, such as computers or phones, this tool lets you evaluate their performance and pinpoint where they need improvement.
What You Get
You get a complete system for evaluating AI agents. It supports many environments, so the same agent can be tested across different interfaces. Instead of a simple pass/fail result, CRAB's evaluator reports which intermediate steps an agent completed, giving a more detailed picture of its performance. It also generates complex tasks automatically, so you don't have to write each one by hand.
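One way to picture automatic task creation is to chain small sub-tasks so that what one step produces is what the next step needs. The sketch below assumes that model. `SubTask`, `compose_chains`, and the example sub-tasks are hypothetical names for illustration, not CRAB's actual API.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Optional


@dataclass(frozen=True)
class SubTask:
    """Hypothetical sub-task: the kind of input it needs and the output it produces."""
    name: str
    consumes: Optional[str]  # None means the sub-task needs no earlier result
    produces: str


def compose_chains(subtasks: List[SubTask], length: int) -> List[List[SubTask]]:
    """Return every ordered chain of `length` distinct sub-tasks in which each
    sub-task's output matches the next sub-task's input."""
    chains = []
    for combo in permutations(subtasks, length):
        if combo[0].consumes is not None:
            continue  # the first step cannot depend on an earlier result
        if all(a.produces == b.consumes for a, b in zip(combo, combo[1:])):
            chains.append(list(combo))
    return chains


def describe(chain: List[SubTask]) -> str:
    """Render a composed task as a readable sequence of steps."""
    return " -> ".join(t.name for t in chain)


# Hypothetical sub-task library spanning a phone and a desktop.
library = [
    SubTask("find_contact", None, "phone_number"),
    SubTask("send_sms", "phone_number", "sent_message"),
    SubTask("take_screenshot", None, "image_file"),
    SubTask("email_file", "image_file", "sent_message"),
]

for chain in compose_chains(library, 2):
    print(describe(chain))
```

With this library, the two-step tasks are `find_contact -> send_sms` and `take_screenshot -> email_file`. Matching outputs to inputs keeps generated tasks coherent, because every step uses something an earlier step actually produced.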
How It Works
CRAB lets you define agent actions and environments in Python. Agents run in diverse settings, from Linux desktops to Android phones. The system automatically generates realistic tasks by combining smaller steps. A "graph evaluator" tracks the agent's progress by checking each intermediate step, producing a detailed report of its strengths and weaknesses.
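To illustrate how step-by-step checking differs from a single pass/fail test, here is a minimal sketch of a graph-style evaluator. It assumes that each step is a node with a check function, that a node can only count once its prerequisite nodes are complete, and that progress is the fraction of completed nodes. `Node`, `GraphEvaluator`, and the messaging example are hypothetical and do not reflect CRAB's real interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set

State = Dict[str, object]  # hypothetical snapshot of the environment


@dataclass
class Node:
    """One intermediate step: a check on the state plus its prerequisite steps."""
    name: str
    check: Callable[[State], bool]
    deps: List[str] = field(default_factory=list)


class GraphEvaluator:
    def __init__(self, nodes: Iterable[Node]) -> None:
        self.nodes = {n.name: n for n in nodes}
        self.done: Set[str] = set()

    def step(self, state: State) -> None:
        """Call after each agent action. A node is marked done (and stays done)
        only when all its prerequisites are done and its check passes."""
        progressed = True
        while progressed:  # repeat so dependent nodes can pass on the same state
            progressed = False
            for node in self.nodes.values():
                if node.name in self.done:
                    continue
                if all(d in self.done for d in node.deps) and node.check(state):
                    self.done.add(node.name)
                    progressed = True

    def completion_ratio(self) -> float:
        return len(self.done) / len(self.nodes)

    def is_success(self) -> bool:
        return len(self.done) == len(self.nodes)


def make_nodes() -> List[Node]:
    # Hypothetical task: open a messaging app, pick a recipient, send a message.
    return [
        Node("app_opened", lambda s: s.get("app") == "messages"),
        Node("recipient_set", lambda s: s.get("to") == "Alice", ["app_opened"]),
        Node("message_sent", lambda s: s.get("sent") is True, ["recipient_set"]),
    ]


evaluator = GraphEvaluator(make_nodes())
for state in [{"app": "messages"}, {"app": "messages", "to": "Alice"}]:
    evaluator.step(state)

print(sorted(evaluator.done), evaluator.completion_ratio(), evaluator.is_success())
```

Here the agent stops before sending the message. A pass/fail check would just report failure, while the graph shows that two of the three steps succeeded. Requiring prerequisites also stops a step from counting when it happens out of order.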