The Unintelligible Abyss of AI-Generated Nonsense
Artificial Intelligence (AI) models, like GPT-4 and Claude 3 Opus, rely on vast amounts of online data to learn and improve. However, a recent study published in Nature describes a concerning phenomenon known as ‘model collapse,’ in which AI models trained on AI-generated text gradually degrade until they produce unintelligible nonsense.
Imagine a self-sustaining feedback loop in which AI models feed on their own output, each round of training degrading the text a little further. Lead author Ilia Shumailov, a computer scientist at the University of Oxford, compares the process to repeatedly scanning and printing the same photograph: each pass through the printer and scanner introduces small errors, and the errors accumulate until the image is distorted beyond recognition.
AI systems learn by ingesting human-generated data and encoding its statistical patterns in their neural networks. GPT-3.5, for instance, was trained on approximately 570 gigabytes of text from the Common Crawl repository, amounting to roughly 300 billion words drawn from books, online articles, Wikipedia, and other web pages.
However, as AI models increasingly populate the internet with their own output, they risk being trained on that output in turn. To investigate the worst-case consequences, Shumailov and his colleagues first trained a large language model (LLM) on human-written Wikipedia text, then trained each successive generation of the model on text produced by the previous one, for nine generations in total.
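The paper describes a full LLM pipeline, but the shape of the experiment is simple to sketch. The Python toy below is our illustration, not the authors’ code: the corpus filename is hypothetical, and a crude bigram model stands in for a real LLM. It trains on human-written text, generates synthetic text, and then retrains each new generation only on the previous generation’s output:

```python
import random
from collections import defaultdict

def train_bigram(words):
    """'Train' a toy language model: record which words follow which."""
    table = defaultdict(list)
    for a, b in zip(words, words[1:]):
        table[a].append(b)
    return table

def generate(table, length=5000):
    """Sample text from the toy model, one word at a time."""
    word = random.choice(list(table))
    out = [word]
    for _ in range(length - 1):
        followers = table.get(word)
        # Dead end (word never seen mid-sentence): restart from a random word.
        word = random.choice(followers) if followers else random.choice(list(table))
        out.append(word)
    return out

# Hypothetical corpus file: any few megabytes of human-written text will do.
human_words = open("human_corpus.txt", encoding="utf-8").read().split()

data = human_words
for gen in range(1, 10):          # nine generations, as in the study
    model = train_bigram(data)
    data = generate(model)        # the next generation sees ONLY model output
    print(f"generation {gen}: distinct words = {len(set(data))}")
```

Run on almost any corpus, the number of distinct words typically shrinks generation after generation: each sampling step is a lossy copy, just like the scanner-and-printer loop.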
The results were astonishing. As the generations of self-produced content accumulated, the researchers observed their model’s responses degenerate into nonsensical ramblings. Take, for example, the prompt: ‘some started before 1360 — was typically accomplished by a master mason and a small team of itinerant masons, supplemented by local parish labourers, according to Poyntz Wright. But other authors reject this model, suggesting instead that leading architects designed the parish church towers based on early examples of Perpendicular.’
By the ninth and final generation, the AI’s response had devolved into: ‘architecture. In addition to being home to some of the world’s largest populations of black @-@ tailed jackrabbits, white @-@ tailed jackrabbits, blue @-@ tailed jackrabbits, red @-@ tailed jackrabbits, yellow @-.’ (The ‘@-@’ strings are hyphen placeholders from the tokenized WikiText corpus, a sign of raw training-data formatting leaking into the output.)
The researchers attributed this ‘model collapse’ to the AI system sampling an increasingly narrow band of its own output: with each generation, the rare words and ideas in the tails of the original distribution vanish, and the model converges on a repetitive, error-riddled core. The finding underscores how important it is to carefully curate the data used to train AI models.
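The same tail-loss mechanism shows up even in a one-dimensional toy model, with no language model involved. In this minimal numerical sketch (an illustration under simple Gaussian assumptions, not the paper’s code), each ‘generation’ fits a normal distribution to samples drawn from the previous one:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Generation 0: "human data" drawn from a standard normal distribution.
samples = rng.normal(loc=0.0, scale=1.0, size=500)

for gen in range(1, 10):
    # "Train" on the current data: estimate mean and standard deviation.
    mu, sigma = samples.mean(), samples.std()
    # The next generation's training set is sampled from the fitted model.
    samples = rng.normal(mu, sigma, size=500)
    print(f"generation {gen}: fitted mean = {mu:+.3f}, fitted std = {sigma:.3f}")
```

Because each fit is made from a finite sample, small estimation errors feed forward; over many generations the fitted distribution tends to narrow and drift, and the rare ‘tail’ values are the first to disappear.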
For now, our vast repository of human-generated data is sufficient to prevent current AI models from collapsing overnight. However, to avoid a future where AI systems spiral into unintelligibility, developers must take greater care in designing their training regimes. As Shumailov emphasized, ‘It’s hard to tell what tomorrow will bring, but it’s clear that model training regimes have to change… We need to take explicit care in building models and make sure that they keep on improving.’
This raises questions about the role of synthetic data in AI development. While it is not necessary to eliminate synthetic data entirely, it is crucial to design it more thoughtfully to avoid perpetuating the ‘model collapse’ phenomenon. As we continue to rely on AI systems to drive innovation and progress, it is essential to prioritize their stability and coherence.
The study’s findings serve as a warning, urging us to be more mindful of the potential consequences of unchecked AI growth. As we navigate the complexities of AI development, we must ensure that our creations remain intelligible, reliable, and beneficial to humanity.
The ‘model collapse’ phenomenon also raises concerns about the long-term implications of AI-generated content. As AI systems become increasingly adept at producing human-like text, the risk of perpetuating misinformation and disinformation grows. It is essential to develop strategies for verifying the accuracy and authenticity of AI-generated content.
Furthermore, the study highlights the importance of human oversight and evaluation in AI development. As AI systems become more complex, it is crucial to involve human experts in the design and testing phases to ensure that the models are aligned with human values and goals.
In conclusion, the study on ‘model collapse’ serves as a timely reminder of the importance of responsible AI development. As we continue to push the boundaries of AI capabilities, we must prioritize the stability, coherence, and intelligibility of our creations. By doing so, we can ensure that AI systems remain a powerful tool for driving innovation and progress, rather than a source of confusion and misinformation.