Researchers Uncover Reasons for Hallucinations from Generative AI

In mid-September, researcher Leon Chlon and his colleagues, Ahmed Karim and Maggie Chlon, published a paper arguing that large language models (LLMs) sometimes invent facts because of flaws in the way they compress information. The problem comes partly from the order in which models read prompts. Key details that appear late in a prompt can slip down the list of priorities or be missed altogether, leading the system to fill in gaps with guesswork, the team wrote.
The study introduces a new way to pinpoint when a model lacks the information it needs to answer and is therefore more likely to hallucinate. On his Substack, Chlon suggested a practical fix: presenting the same prompt with its contents reshuffled into different orders. By averaging the model's answers across those orderings, researchers can tell when its confidence is real and when it is merely a side effect of prompt wording or order.
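The reshuffling idea lends itself to a simple harness. The sketch below is a minimal illustration rather than code from the paper: it assumes a caller-supplied `query_model` function that sends a prompt to an LLM and returns a short answer string, and it uses agreement across orderings as a rough stand-in for the averaging Chlon describes.

```python
import random
from collections import Counter

def shuffled_prompts(facts, question, n_orders=5, seed=0):
    """Build several versions of the same prompt with the facts in different orders."""
    rng = random.Random(seed)
    prompts = []
    for _ in range(n_orders):
        order = facts[:]
        rng.shuffle(order)
        prompts.append("\n".join(order) + "\n\n" + question)
    return prompts

def order_robust_answer(query_model, facts, question, n_orders=5):
    """Ask the model the same question under several prompt orderings and
    return the majority answer together with its agreement rate."""
    answers = [query_model(p) for p in shuffled_prompts(facts, question, n_orders)]
    counts = Counter(answers)
    best, votes = counts.most_common(1)[0]
    # Low agreement suggests the model's confidence is an artifact of
    # prompt order rather than of the underlying evidence.
    return best, votes / len(answers)
```

An agreement rate well below 1.0 signals that the answer depends on where the facts sit in the prompt rather than on the facts themselves.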
Using a medical-diagnosis example, Chlon demonstrated how an LLM misfires when symptoms appear late in a prompt, a phenomenon he called the Extraction Pathway Problem.
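To see the order effect in a toy setting, the harness above can be pointed at a prompt whose decisive detail sits at the end. The facts, question, and `query_model` stub below are invented for illustration; a real test would replace the stub with an actual LLM call.

```python
# Hypothetical facts; only the last line carries the decisive symptom,
# mimicking a key detail buried late in a prompt.
facts = [
    "Patient is 58 years old.",
    "No significant family history.",
    "Blood pressure and temperature are normal.",
    "Reports sudden chest pain radiating to the left arm.",
]
question = "What is the most likely diagnosis? Answer in a few words."

def query_model(prompt):
    # Stand-in for a real LLM API call; returns a fixed string so the
    # example runs offline. Swap in a genuine model query to test.
    return "possible myocardial infarction"

answer, agreement = order_robust_answer(query_model, facts, question)
print(answer, agreement)  # agreement well below 1.0 flags order-sensitive answers
```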
Google Staff Architect Mohammad Ghodratigohar analyzed Chlon’s paper in a YouTube video, walking through code-level techniques for predicting and reducing hallucinations in a model.