arXiv AI Papers

The World Won't Stay Still: Programmable Evolution for Agent Benchmarks

Researchers introduce ProEvolve, a graph-based framework for programmable environment evolution in AI agent benchmarks. Where static benchmarks keep the environment fixed, ProEvolve uses typed relational graphs to represent data, tools, and schemas, so the environment can be modified in a controlled way through graph transformations. The aim is to evaluate how LLM-driven agents adapt when their environment changes, as real-world environments do.
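
To make the idea of a typed relational graph and a graph transformation concrete, here is a minimal sketch in Python. It is not ProEvolve's actual API or data model: the names `TypedGraph`, `Node`, `rename_node`, the node kinds ("schema", "field", "tool"), and the relation labels ("has_field", "reads") are all hypothetical, chosen only for illustration. The sketch assumes an environment can be described as typed nodes joined by labeled edges, and that one kind of evolution step is renaming a schema field while rewiring every edge that refers to it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # hypothetical node types, e.g. "schema", "field", "tool"


@dataclass
class TypedGraph:
    nodes: dict[str, Node] = field(default_factory=dict)
    # Each edge is (source name, relation label, target name).
    edges: set[tuple[str, str, str]] = field(default_factory=set)

    def add_node(self, name: str, kind: str) -> None:
        self.nodes[name] = Node(name, kind)

    def add_edge(self, source: str, relation: str, target: str) -> None:
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.add((source, relation, target))

    def neighbors(self, source: str, relation: str) -> list[str]:
        # Targets reachable from `source` over edges with the given label.
        return sorted(t for s, r, t in self.edges if s == source and r == relation)


def rename_node(graph: TypedGraph, old: str, new: str) -> TypedGraph:
    """Return a new graph in which `old` is renamed to `new`.

    The input graph is left untouched, so the original and evolved
    environments can be compared side by side.
    """
    if old not in graph.nodes:
        raise KeyError(old)
    if new in graph.nodes:
        raise ValueError(f"{new!r} already exists")
    evolved = TypedGraph()
    for node in graph.nodes.values():
        evolved.add_node(new if node.name == old else node.name, node.kind)
    for s, r, t in graph.edges:
        evolved.add_edge(new if s == old else s, r, new if t == old else t)
    return evolved


# A toy environment: one schema, one field, and one tool that reads the field.
base = TypedGraph()
base.add_node("orders", "schema")
base.add_node("customer_id", "field")
base.add_node("lookup_order", "tool")
base.add_edge("orders", "has_field", "customer_id")
base.add_edge("lookup_order", "reads", "customer_id")

# Evolve the environment: the field is renamed, and both the schema and the
# tool now point at the new name.
evolved = rename_node(base, "customer_id", "client_id")
```

Because the transformation returns a new graph, an agent could in principle be evaluated against both `base` and `evolved` to see whether it copes with the renamed field; how ProEvolve itself applies and composes such changes is not specified in this summary.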