RouteLLM: Revolutionizing LLM Routing with a Cost-Effective Open-Source Framework

Researchers from UC Berkeley, Anyscale, and Canva introduce RouteLLM, an open-source framework for cost-effective LLM routing that balances price and performance.

RouteLLM: The Open-Source Framework Revolutionizing LLM Routing

Large Language Models (LLMs) have shown impressive capabilities across many tasks, but they vary widely in cost and capability, which makes deploying them in real-world applications a significant challenge. Researchers from UC Berkeley, Anyscale, and Canva propose RouteLLM, an open-source LLM routing framework that balances price and performance to address this issue.

Challenges in LLM Routing

LLM routing aims to determine which model should handle each query to minimize costs while maintaining response quality. The routing system must infer the characteristics of incoming queries and the capabilities of different models, making the problem complex. RouteLLM addresses this by utilizing preference data to train its routers, allowing the system to learn which queries can be handled by weaker models and which require stronger models.
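To make the routing decision concrete, here is a minimal sketch of one way a two-model router could work. It assumes the router produces a score between 0 and 1 for each prompt and compares it to a threshold. The names `ModelPair`, `route`, and `toy_score`, the threshold value, and the length-based scoring rule are all illustrative assumptions, not part of RouteLLM's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ModelPair:
    strong: str  # expensive, more capable model (placeholder name)
    weak: str    # cheaper, less capable model (placeholder name)


def route(prompt: str,
          score_fn: Callable[[str], float],
          pair: ModelPair,
          threshold: float = 0.5) -> str:
    """Return the name of the model that should answer `prompt`.

    `score_fn` is assumed to return a value in [0, 1] estimating how much
    the prompt needs the strong model. Raising `threshold` sends more
    traffic to the weak model and lowers cost.
    """
    score = score_fn(prompt)
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    return pair.strong if score >= threshold else pair.weak


def toy_score(prompt: str) -> float:
    # Hypothetical stand-in for a trained router: longer prompts score higher.
    return min(len(prompt.split()) / 20.0, 1.0)


if __name__ == "__main__":
    pair = ModelPair(strong="strong-model", weak="weak-model")
    print(route("What is 2 + 2?", toy_score, pair))
```

In a sketch like this, the threshold is the single knob that trades cost against quality, and the model pair is passed in separately from the scoring function.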


Framework and Methodology

RouteLLM formalizes the problem of LLM routing and explores augmentation techniques to improve router performance. The framework uses public data from Chatbot Arena and incorporates novel training methods. Four routers were trained: a similarity-weighted (SW) ranking router, a matrix factorization model, a BERT classifier, and a causal LLM classifier. Training relies on preference data, where each data point consists of a prompt and a comparison of response quality between two models. From these comparisons, a router learns the strengths and weaknesses of different models on different kinds of queries.
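As a rough illustration of how preference data can drive a routing score, the sketch below stores each comparison as a prompt embedding plus a flag for which model won, then estimates how often the strong model wins on prompts similar to a new query. This is a simplified similarity-weighted win rate, not the ranking procedure used by the framework's SW router. The `PreferenceExample` type, the two-dimensional embeddings, and the omission of ties are assumptions made for brevity.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PreferenceExample:
    embedding: Sequence[float]  # prompt vector (the embedding model is assumed)
    strong_won: bool            # True if the strong model's answer was preferred


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def strong_win_estimate(query_embedding: Sequence[float],
                        examples: Sequence[PreferenceExample]) -> float:
    """Similarity-weighted fraction of past comparisons won by the strong model.

    Negative similarities are clipped to zero so dissimilar prompts are ignored.
    With no usable evidence the function falls back to 0.5.
    """
    weights = [max(cosine(query_embedding, ex.embedding), 0.0) for ex in examples]
    total = sum(weights)
    if total == 0.0:
        return 0.5
    won = sum(w for w, ex in zip(weights, examples) if ex.strong_won)
    return won / total


if __name__ == "__main__":
    history = [
        PreferenceExample(embedding=[1.0, 0.0], strong_won=True),
        PreferenceExample(embedding=[0.0, 1.0], strong_won=False),
    ]
    print(strong_win_estimate([1.0, 0.2], history))
```

An estimate like this could serve as the per-prompt score that a router compares against its threshold.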

Performance and Cost Efficiency

The performance of these routers was evaluated on benchmarks like MT Bench, MMLU, and GSM8K. The results demonstrated that the routers could significantly reduce costs without compromising quality. For instance, on MT Bench, the matrix factorization router achieved 95% of GPT-4’s performance while making only 26% of the calls to GPT-4, resulting in a 48% cost reduction compared to the random baseline. Augmenting the training data using an LLM judge further improved the routers’ performance, reducing the number of GPT-4 calls required to just 14% while maintaining the same performance level.
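The link between the share of GPT-4 calls and the overall bill can be sketched with simple arithmetic. The helper below computes the average cost per query for a given share of strong-model calls and the saving relative to a baseline share. The per-query prices in the usage example are hypothetical, and the article does not state what share of GPT-4 calls the random baseline needed, so this sketch does not attempt to reproduce the 48% figure.

```python
def blended_cost(strong_fraction: float,
                 strong_price: float,
                 weak_price: float) -> float:
    """Average cost per query when `strong_fraction` of queries use the strong model."""
    if not 0.0 <= strong_fraction <= 1.0:
        raise ValueError("strong_fraction must be in [0, 1]")
    return strong_fraction * strong_price + (1.0 - strong_fraction) * weak_price


def relative_saving(routed_fraction: float,
                    baseline_fraction: float,
                    strong_price: float,
                    weak_price: float) -> float:
    """Fraction of cost saved by the router relative to a baseline routing share."""
    routed = blended_cost(routed_fraction, strong_price, weak_price)
    baseline = blended_cost(baseline_fraction, strong_price, weak_price)
    if baseline == 0.0:
        raise ValueError("baseline cost is zero; saving is undefined")
    return 1.0 - routed / baseline


if __name__ == "__main__":
    # Hypothetical prices: the strong model costs 10 units per query, the weak model 1.
    cost = blended_cost(0.26, strong_price=10.0, weak_price=1.0)
    print(f"average cost per query at 26% strong calls: {cost:.2f}")
```

The saving depends on the price gap between the two models as well as on the call share, which is why a reduction in GPT-4 calls does not translate one-to-one into a cost reduction.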


Comparison with Commercial Offerings

RouteLLM’s performance was compared against commercial routing systems like Martian and Unify AI. Using GPT-4 Turbo as the strong model and Llama 2 70B or Mixtral 8x7B as the weak model, RouteLLM achieved similar performance while being over 40% cheaper. This comparison underscores the cost-effectiveness and competitive edge of the RouteLLM framework.

Generalization to Other Models

To demonstrate its generalizability, RouteLLM was tested with different model pairs, such as Claude 3 Opus and Llama 3 8B. The routers maintained strong performance without retraining, indicating that they learned common characteristics that help distinguish between strong and weak models, applicable to new model pairs.

Conclusion

RouteLLM offers a scalable, cost-effective way to deploy LLMs by balancing cost and performance. Its use of preference data and data augmentation helps maintain response quality while significantly reducing costs, making it an attractive option for businesses and organizations looking to integrate LLMs into their operations.
