The Future of Generative AI: From RAG to Agent Systems
Generative AI has come a long way since the first foundation models. Models have expanded from language-only to multi-modal, and the range of practical applications keeps growing. As we move forward, it’s essential to understand how these models work and how they can be used to deliver real impact.
Gen AI 1.0: LLMs and Emergent Behavior from Next-Token Generation
The core models from Anthropic, OpenAI, Mistral, Meta, and elsewhere have become much more in tune with what people want out of them. By understanding how language is converted to tokens, we have learned that formatting is important (YAML tends to perform better than JSON). By better understanding the models themselves, the generative AI community has developed “prompt-engineering” techniques to get the models to respond effectively.
“By providing a few examples (few-shot prompt), we can coach a model towards the answer style we want. Or, by asking the model to break down the problem (chain of thought prompt), we can get it to generate more tokens, increasing the likelihood that it will arrive at the correct answer to complex questions.”
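The two prompt styles described above can be sketched as plain string templates. This is a minimal illustration, not any vendor's recommended format: the function names, the `Q:`/`A:` layout, and the "think step by step" wording are all assumptions chosen for readability.

```python
def few_shot_prompt(examples, question):
    """Build a prompt that shows worked examples before the real question.

    `examples` is a list of (question, answer) pairs. The Q:/A: layout is
    an illustrative assumption, not a required format.
    """
    lines = []
    for q, a in examples:
        lines.append(f"Q: {q}")
        lines.append(f"A: {a}")
        lines.append("")  # blank line between examples
    lines.append(f"Q: {question}")
    lines.append("A:")  # left open for the model to complete
    return "\n".join(lines)


def chain_of_thought_prompt(question):
    """Ask the model to reason through the problem before answering."""
    return f"Q: {question}\nA: Let's think step by step."


if __name__ == "__main__":
    print(few_shot_prompt([("2+2?", "4")], "3+3?"))
    print()
    print(chain_of_thought_prompt("A train leaves at 3pm and travels 2 hours. When does it arrive?"))
```

Either string would then be sent to whichever model API is in use; the templates themselves are model-agnostic.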
Gen AI 1.5: Retrieval Augmented Generation, Embedding Models, and Vector Databases
Another foundation for progress is expanding the amount of information that an LLM can process. State-of-the-art models can now process up to 1M tokens (roughly a full-length college textbook), enabling users to control the context the model draws on when answering questions in ways that weren’t previously possible.
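Retrieval augmented generation, named in this section's heading, is one way to control that context: pick the most relevant passages and place them in the prompt. The sketch below is a toy version under loud assumptions. The `embed` function is a bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Place the retrieved passages in the prompt ahead of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "chiller alarm codes and resets",
        "invoice payment terms",
        "parking policy",
    ]
    print(build_prompt("chiller alarm", docs, k=1))
```

A production system would swap `embed` for a learned embedding model and the sorted list for an indexed vector store, but the retrieve-then-prompt shape stays the same.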
Complex text processing
Gen AI 2.0 and Agent Systems
The next evolution is in creatively chaining multiple forms of gen AI functionality together. The first steps in this direction will be manually developed chains of action. One example is BrainBox.ai ARIA, a gen AI-powered virtual building manager that interprets a picture of a malfunctioning piece of equipment, looks up relevant context from a knowledge base, generates an API query to pull relevant structured information from an IoT data feed, and ultimately suggests a course of action.
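A manually developed chain of this kind can be sketched as a fixed sequence of steps that pass a shared state along. The code below is an illustration only and does not reflect how ARIA is actually built: every function is a hypothetical stub, the returned strings are hard-coded placeholders where a model or service call would go, and the `/telemetry` endpoint is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ChainState:
    """Accumulates the output of each step in the chain."""
    image_description: str = ""
    context: list = field(default_factory=list)
    api_query: str = ""
    recommendation: str = ""


def describe_image(image_path):
    # Hypothetical stub: a multi-modal model call would go here.
    return "rooftop unit with iced-over coil"


def lookup_knowledge_base(description, kb):
    # Hypothetical stub: keyword match standing in for a real retrieval step.
    return [text for keyword, text in kb.items() if keyword in description]


def generate_api_query(description):
    # Hypothetical stub: an LLM would draft this query; the endpoint is invented.
    equipment = description.split(" with ")[0].replace(" ", "-")
    return f"GET /telemetry?equipment={equipment}"


def suggest_action(state):
    # Hypothetical stub: an LLM would summarize the gathered evidence.
    notes = " ".join(state.context) or "No matching knowledge-base entry."
    return (
        f"Observed: {state.image_description}. Notes: {notes} "
        f"Next: run {state.api_query!r} and review the readings."
    )


def run_chain(image_path, kb):
    """Run the four steps in a fixed, hand-written order."""
    state = ChainState()
    state.image_description = describe_image(image_path)
    state.context = lookup_knowledge_base(state.image_description, kb)
    state.api_query = generate_api_query(state.image_description)
    state.recommendation = suggest_action(state)
    return state


if __name__ == "__main__":
    kb = {"coil": "Iced coils often point to low airflow."}
    print(run_chain("unit.jpg", kb).recommendation)
```

The order of steps is hard-coded here, which is what makes it a manual chain; an agent system would instead let a model decide which step to take next.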
Agent-based systems
Conclusion
As organizations mature in their use of LLMs over the next year, the game will be about obtaining the highest quality outputs (tokens), as quickly as possible, at the lowest possible price. This is a fast-moving target, so it is best to find a partner who is continuously learning from real-world experience running and optimizing gen AI-backed solutions in production.
Optimizing genAI solutions