The Rise of AI: Understanding Visual Knowledge of Language Models
The concept of a picture being worth a thousand words has been around for centuries, but what if a large language model (LLM) has never seen an image before? Can it still understand the visual world? Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have found that language models trained purely on text have a solid understanding of the visual world. They can write image-rendering code to generate complex scenes with intriguing objects and compositions.
The visual knowledge of these language models comes from how concepts like shapes and colors are described across the internet, whether in language or code. Given a prompt like “draw a parrot in the jungle,” the LLM draws on the descriptions it has read before. To assess how much visual knowledge LLMs have, the CSAIL team constructed a “vision checkup” for LLMs: using their “Visual Aptitude Dataset,” they tested the models’ abilities to draw, recognize, and self-correct visual concepts.
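The “draw a parrot in the jungle” example hints at what image-rendering code means here: a program built from primitive shapes, with no image data involved. The sketch below is a hypothetical, much-simplified stand-in for the kind of code an LLM might emit; the shapes, colors, and function name are illustrative, not taken from the study.

```python
def render_scene() -> str:
    """Toy stand-in for LLM-written image-rendering code for the prompt
    "draw a parrot in the jungle": an SVG scene composed of primitives."""
    shapes = [
        '<rect x="0" y="0" width="200" height="200" fill="darkgreen"/>',  # jungle backdrop
        '<ellipse cx="100" cy="110" rx="25" ry="40" fill="red"/>',        # parrot body
        '<circle cx="100" cy="60" r="15" fill="red"/>',                   # parrot head
        '<polygon points="100,55 120,60 100,65" fill="orange"/>',         # beak
    ]
    return ('<svg xmlns="http://www.w3.org/2000/svg" width="200" height="200">'
            + "".join(shapes) + "</svg>")
```

The point is that shape, position, and color are all expressible in plain text, which is the only modality the model was trained on.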
“We essentially train a vision system without directly using any visual data,” says Tamar Rott Shaham, co-lead author of the study and an MIT electrical engineering and computer science (EECS) postdoc at CSAIL. “Our team queried language models to write image-rendering codes to generate data for us and then trained the vision system to evaluate natural images. We were inspired by the question of how visual concepts are represented through other mediums, like text.”
The researchers gathered these illustrations and used them to train a computer vision system that can recognize objects in real photos, despite never having seen one before. With this synthetic, text-generated data as its only reference point, the system outperforms vision systems trained on other procedurally generated image datasets.
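The train-on-synthetic-renders idea can be shown in miniature. The toy below (an illustrative sketch, not the CSAIL pipeline: it uses hand-written procedural renders and a nearest-centroid classifier rather than LLM-generated code and a full vision model) learns only from noisy synthetic images, then recognizes clean ones it has never seen.

```python
import numpy as np

def render(shape: str, size: int = 16) -> np.ndarray:
    """Procedurally render a crude binary image of a shape -- a stand-in
    for the synthetic, code-generated training images in the article."""
    img = np.zeros((size, size))
    c = size // 2
    if shape == "square":
        img[c - 4:c + 4, c - 4:c + 4] = 1.0
    else:  # "disk"
        yy, xx = np.ogrid[:size, :size]
        img[(yy - c) ** 2 + (xx - c) ** 2 <= 16] = 1.0
    return img

rng = np.random.default_rng(0)

def jitter(img: np.ndarray) -> np.ndarray:
    """Add noise so no two synthetic training images are identical."""
    return np.clip(img + rng.normal(0, 0.2, img.shape), 0, 1)

# "Train" a vision system purely on generated data: one mean image per class.
centroids = {s: np.mean([jitter(render(s)) for _ in range(50)], axis=0)
             for s in ("square", "disk")}

def classify(img: np.ndarray) -> str:
    """Label an image by its nearest class centroid."""
    return min(centroids, key=lambda s: np.linalg.norm(img - centroids[s]))
```

Even this crude recognizer labels held-out clean renders correctly, which is the pattern the study scales up: synthetic data in, recognition of unseen inputs out.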
The Future of AI Governance
However, as AI technology advances, concerns about data privacy and governance are growing. Meta has deferred its plans to train large language models on publicly available Facebook and Instagram content from adult users across the European Union, for which it had not obtained explicit consent. The decision follows a complaint from the Austrian non-profit noyb accusing Meta of violating the General Data Protection Regulation (GDPR) by using personal data to develop AI.
In other news, Kong Inc. has announced the general availability of Kong AI Gateway, providing enterprises with an AI-native API gateway to govern and secure generative AI workloads across any cloud environment. The product offers a suite of infrastructure capabilities tailored for AI, including support for multiple large language models (LLMs), semantic caching, semantic routing, semantic firewalling, and model lifecycle management.
“Organizations and developers are building new gen AI use cases to create better user experiences and customer experiences. But like every technology in the world, in the early days of starting something new with cutting edge technology, it is very hard to scale that in production without having proper infrastructure,” explained Marco Palladino, CTO and co-founder of Kong.
The Kong AI Gateway serves as a central hub, providing a unified API interface to manage and secure multiple AI technologies across various applications. The gateway offers a comprehensive set of AI-specific capabilities, enabling enterprises to effectively deploy and scale their generative AI initiatives.
As AI technology continues to advance, it is crucial to address the concerns surrounding data privacy and governance. With the rise of generative AI, it is essential to have the proper infrastructure in place to ensure the responsible development and deployment of AI models.