ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
This paper introduces ToolOrchestra, a method for training small AI models (orchestrators) to efficiently coordinate other, often more powerful, AI models and tools. The Orchestrator, an 8B-parameter model, is trained with reinforcement learning to balance task outcome, efficiency, and user preferences. On complex benchmarks such as Humanity's Last Exam (HLE), it achieves higher accuracy at significantly lower cost than larger, monolithic models. The evaluations rely on benchmarks and synthetic data, which may not fully capture real-world complexity.
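To make the idea of balancing task outcome, efficiency, and user preferences concrete, here is a minimal sketch of how such a composite reward could be computed for one rollout. It is an illustration, not the paper's actual reward: the `Rollout` fields, the `composite_reward` function, the linear weighting, and all weight values are hypothetical assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Rollout:
    """One orchestration episode (hypothetical structure, not from the paper)."""
    correct: bool               # did the final answer solve the task?
    cost_usd: float             # total spend on model and tool calls
    latency_s: float            # wall-clock time of the episode
    tool_calls: Dict[str, int]  # tool name -> number of calls made


def composite_reward(
    rollout: Rollout,
    tool_preference: Dict[str, float],
    cost_weight: float = 0.2,
    latency_weight: float = 0.01,
    preference_weight: float = 0.1,
) -> float:
    """Combine outcome, efficiency, and preference terms (assumed linear form)."""
    # Outcome term: 1 for a correct final answer, 0 otherwise.
    outcome = 1.0 if rollout.correct else 0.0

    # Efficiency term: penalize monetary cost and latency.
    efficiency_penalty = (
        cost_weight * rollout.cost_usd + latency_weight * rollout.latency_s
    )

    # Preference term: average user preference score over the calls made.
    # Tools missing from tool_preference count as neutral (0.0).
    total_calls = sum(rollout.tool_calls.values())
    if total_calls:
        preference = sum(
            tool_preference.get(name, 0.0) * count
            for name, count in rollout.tool_calls.items()
        ) / total_calls
    else:
        preference = 0.0

    return outcome - efficiency_penalty + preference_weight * preference


# Usage: a correct answer that used a cheap search tool and a large model.
example = Rollout(
    correct=True,
    cost_usd=0.5,
    latency_s=10.0,
    tool_calls={"search": 2, "large_llm": 2},
)
reward = composite_reward(example, tool_preference={"search": 1.0, "large_llm": 0.0})
```

Under this assumed form, an orchestrator that reaches the same correct answer with fewer expensive calls, or with the tools a user prefers, receives a higher reward, which is the trade-off the summary describes.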