Large language models (LLMs) have a well-known propensity to "hallucinate," that is, to provide false information in response to the user's prompt. (The National Institute of Standards and Technology prefers the term "confabulate," though it has yet to catch on.) Researchers at Stanford University previously found that general-purpose chatbots hallucinated 58–82% of the time on legal queries. The phenomenon drew public attention in the case of the now-infamous New York lawyer who faced sanctions over his use of generative AI tools, as well as in Chief Justice John Roberts's 2023 annual report on the judiciary, which warned attorneys that generative AI legal tools can produce inaccuracies because of hallucinations.
To minimize hallucinations, the industry is turning to retrieval-augmented generation (RAG). RAG has shown promise in reducing false information produced by LLMs in domain-specific contexts because it retrieves relevant documents and instructs the LLM to base its response on them. However, when Stanford researchers examined RAG-powered generative AI legal tools, they found that hallucination still occurred in approximately 17–34% of responses. This can result from sub-optimal document retrieval, sycophancy (the model's tendency to tell users what they want to hear rather than what is accurate), or a number of other factors. Caution is therefore still needed when using generative AI legal tools, even those supplemented by RAG techniques.
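To make the retrieval step concrete, the following is a minimal, illustrative sketch of the RAG pattern in Python, not a depiction of any commercial legal product. The toy document list, the TF-IDF retriever, and the `call_llm` placeholder are all assumptions introduced for illustration; real legal tools typically use dense embeddings, large proprietary document stores, and vendor-specific model APIs.

```python
# Minimal sketch of a retrieval-augmented generation (RAG) pipeline.
# TF-IDF cosine similarity stands in for the retrieval step;
# call_llm is a hypothetical placeholder, not a real API.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for a database of legal source documents.
documents = [
    "Rule 11 sanctions may apply when an attorney files papers containing false citations.",
    "The 2023 year-end report on the federal judiciary discusses AI in legal practice.",
    "Retrieval-augmented generation grounds model output in retrieved source documents.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query (TF-IDF cosine similarity)."""
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(docs + [query])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).flatten()
    top_indices = scores.argsort()[::-1][:k]
    return [docs[i] for i in top_indices]

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to an LLM API."""
    return f"[model response grounded in a prompt of {len(prompt)} characters]"

def rag_answer(query: str) -> str:
    # Retrieve supporting documents and fold them into the prompt.
    context = "\n".join(retrieve(query, documents))
    # Instructing the model to answer only from the retrieved context reduces,
    # but does not eliminate, hallucination.
    prompt = (
        "Answer using only the context below. If the context is insufficient, say so.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return call_llm(prompt)

print(rag_answer("Can a lawyer be sanctioned for citing fabricated cases?"))
```

Even in this simplified form, the sketch shows where things can go wrong: if the retriever surfaces an irrelevant document, or if the model ignores the instruction to stay within the retrieved context, the response can still contain fabricated or misattributed information.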