The AI Hallucination Conundrum: Can it Ever be Fixed?

_{^{Photo by Matt Ridley on Unsplash}}

AI Hallucination in LLM and Beyond: Will it Ever be Fixed?

The honeymoon period for Generative AI (GenAI) is well and truly on, with broad consensus on its transformative power. However, like us mere mortals, GenAI isn’t without its flaws. Sometimes subtle, sometimes glaringly obvious, AI can have a tendency to make factual mistakes, or hallucinate. These are instances where GenAI models produce incorrect, illogical, or purely nonsensical output, amounting to beautifully wrapped gibberish.

AI hallucination in action

From Google Gemini’s historically inaccurate images to Meta AI’s gender-biased pictures, whether it’s ChatGPT’s imaginary academic citations for generative text or Microsoft Edge’s Bing Copilot giving erroneous information, these mistakes are noteworthy. Call it inference failure or Woke AI, they’re all shades of AI hallucinations on display.

LLM Hallucination Leaderboard (as of June 28, 2024)

Researchers have created a public leaderboard on GitHub to track the hallucination rates in popular LLMs. They built an AI model to detect hallucinations in LLM outputs, feeding 1000 short documents to various AI models and measuring the rate of factual consistency and hallucination in their output. The models were also measured by their answer rate and average summary length. According to their leaderboard, some of the LLMs with the lowest hallucination rates are GPT-4 Turbo, Snowflake Arctic, and Intel Neural Chat 7B.

Why Does AI Hallucinate?

AI hallucinations in popular LLMs like Llama 2 (70 billion parameters), GPT-3.5 (175 billion parameters), Claude Sonnet (70 billion parameters), etc., are all ultimately linked to their training data. Despite its gigantic size, if the training data of these LLMs had built-in bias of some kind, the generative AI output of these LLMs can have hallucinated facts that try to reinforce and transfer that bias in some form or another – similar to the Google Gemini blunders, for example. On the other end of the spectrum, absence of enough variety of data on any given subject can also lead to AI hallucinations every time the LLM is prompted on a topic it isn’t well-versed to answer with authority.

Training data bias leading to AI hallucination

Can AI Hallucination be Detected and Stopped?

University of Oxford researchers seem to have made significant progress in ensuring the reliability of information generated by AI, one that addresses the issue of AI hallucination fair and square. Their study, published in Nature, introduces a novel method for detecting instances when LLMs hallucinate by inventing plausible-sounding but imaginary facts. The new method proposed by Oxford researchers analyses the statistics behind any given AI model’s answer, specifically looking at the uncertainty in the meaning of a phrase in a generated sentence rather than just its grammatical structure, allowing it to determine if the model is genuinely unsure about the answer it generates for any given prompt.

Oxford researchers’ method for detecting AI hallucination

Microsoft also claims to tackle AI hallucinations through new tools as part of its Azure AI Studio suite for enterprise customers, according to a report by The Verge. Microsoft is able to detect AI hallucinations in GenAI-based deployments of its enterprise customers’ apps by blocking malicious prompts that trick their customers’ AI into deviating from its training data.

AI Hallucination: A Glass Half Full or Empty?

Long story short, it appears there’s no single solution for stopping AI hallucinations. With GenAI deployments across various industries still accelerating, the problem of AI hallucination remains an ongoing area of research for all major tech players and academia. In fact, one research paper from the National University of Singapore asserts that AI hallucination is inevitable due to an innate limitation of LLMs.

AI hallucination: a curse and a blessing in disguise

Depending on how you look at it, the phenomenon of AI hallucination seems to be both a curse and a blessing in disguise (but it’s mostly a curse). It mirrors the complexities of the human brain and cognitive thought, in a process shrouded in mystery that both medical researchers and computer scientists don’t fully understand. Just as our brains can sometimes misinterpret or fill gaps in information, creating illusions or mistaken perceptions, AI systems too encounter limitations in interpreting data. While efforts are underway to enhance their accuracy and reliability, these occasional AI hallucinations also present opportunities for creativity and innovation, for thinking out of the box – similar to how our minds can unexpectedly spark new ideas.

AI hallucination opportunities for creativity and innovation

This realisation should make you appreciate your LLM’s output even more, that GenAI isn’t too dissimilar from us when it comes to brainfarts. Until the experts lobotomise the problem, keep double-triple checking your favourite LLM’s response.