The law relies on being precise. AI is disrupting that

A man types into a generative AI chat window on his laptop
Banner: Getty Images

Businesses, governments and firms are continuing to experiment with generative AI, but for lawyers and courts, accuracy matters

By Andrew Lim and Professor Jeannie Marie Paterson, University of Melbourne

Published 23 June 2025

The use of AI is spreading throughout all professions and industries – and law is no exception.

Lawyers have been innovative in their use of large language models (LLMs), but we’re also seeing examples of lawyers being caught out when they trust the fluent language of generative AI without checking it.

Judges at Banco Court, Queens Square in Sydney
Courts and legal regulators are warning lawyers not to rely on generative AI without checking their work. Picture: Getty Images

Don’t rely on AI

Refined AI models, combined with retrieval-augmented generation (‘RAG’) systems, have the capacity to summarise, review and analyse documents.
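For readers unfamiliar with the term, a RAG system retrieves passages relevant to a question from a set of source documents and supplies them to the model alongside the question, so the answer is grounded in the source text rather than the model’s memory alone. The sketch below is a toy illustration of that idea only: the word-overlap scoring stands in for the embedding-based search real systems use, the policy clauses are fictional, and the final prompt is printed rather than sent to an actual model.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))

def score(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query words that appear in the passage."""
    q = tokens(query)
    return len(q & tokens(passage)) / max(len(q), 1)

def retrieve(query: str, passages: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k passages most relevant to the query."""
    return sorted(passages, key=lambda p: score(query, p), reverse=True)[:top_k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Combine the retrieved passages and the question into a single prompt."""
    context = "\n\n".join(passages)
    return (
        "Using only the extracts below, answer the question.\n\n"
        f"{context}\n\nQuestion: {query}"
    )

# Fictional policy clauses, used purely for illustration.
clauses = [
    "Coverage extends to landscaping work performed on the insured premises.",
    "Trampolines and similar recreational equipment are excluded unless declared.",
    "The policyholder must notify the insurer of any structural alterations.",
]

question = "Does landscaping include installing a trampoline?"
prompt = build_prompt(question, retrieve(question, clauses))
print(prompt)  # In a real system, this prompt would be sent to an LLM.
```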

But increasingly, courts and legal regulators are warning lawyers not to rely on generative AI without checking its output – and in some instances advising them not to use it at all.

In fact, we’ve already seen courts admonish lawyers for submitting legal documents that include content fabricated by AI.

And in Australia, courts themselves have been cautious about the use of the technology.

Ordinary meaning and generative AI

Some courts around the world are experimenting with using generative AI.

In England, Lord Justice Birss revealed that he had used ChatGPT to summarise an area of the law. In the US case of Snell v United Specialty Insurance Co, Judge Kevin Newsom consulted ChatGPT on the plain and ordinary meaning of a contentious term in an insurance policy.

When interpreting documents to decide their legal meaning, courts typically look to the ordinary meaning of words, as well as the context in which they are used.

But how do judges determine the ordinary meaning of a word?

A woman working on a computer screen which reflects in her glasses.
Courts typically look to the ordinary meaning of words. Picture: Getty Images

One approach is to ask ‘ordinary people’.

For most of the nineteenth century, US Supreme Court justices were required to ‘ride circuit’, travelling from town to town to hear cases – a practice that exposed them to everyday citizens and conditions outside the capital.

Judges often consult a dictionary. Or perhaps they ask ChatGPT.

One tool among many

Let’s go back to the US Snell case mentioned earlier. In mid-2024, Judge Kevin Newsom was tasked with deciding whether an insurance policy’s coverage of ‘landscaping’ included the installation of a trampoline.

Justice Newsom checked three dictionaries and found three very different answers.

His Honour considered relying on a “visceral, gut-instinct” feeling, only to decide it didn’t seem very legally compelling. Instead, he asked ChatGPT.

The possibility of generative AI providing an ‘ordinary meaning’ is in some ways compelling. 

After all, these models are trained on vast corpuses of the English language – books, newspapers, user prompts – covering all sorts of speech in all sorts of contexts.

Their reading is not limited by background, interests or age. In this sense ChatGPT might be seen as representing an amalgam of the ordinary person in a way that judges, or even the compilers of dictionaries, are not.

A woman's hand going through a dictionary in front of a laptop screen
Not only does generative AI hallucinate, it can also be sycophantic. Picture: Getty Images

All that said, the judge ended by sounding a note of caution. In his Honour’s view, LLMs should only be “one tool among many”, held up and tested against historical context, common sense and dictionary meanings.

Not an oracle of ordinary meaning

There are also important differences between generative AI and dictionaries that limit the technology’s role as an oracle of ordinary meaning.

As numerous lawyers have discovered, not only does generative AI hallucinate, it can also be sycophantic – providing the answer that the context of the prompt suggests the user wants to hear.

Because these tools mimic human speech by predicting plausible-sounding text rather than verifying facts, they are likely to offer up confident-but-rubbish answers.

Another concern is around transparency.

We know how dictionaries are compiled, and the process is scrupulously documented. By contrast, we have no clear understanding of the training data used in free, general-purpose AI models.

And, unlike dictionaries, which are compiled through a collaborative process, much of the data used to train AI models is drawn from sources without the knowledge or consent of the original authors.

From common meaning to judgment

Courts, businesses, governments and firms are continuing to experiment with generative AI. Tech companies and developers are continuing to find ways to refine and improve these models’ outputs.

For lawyers and courts, accuracy matters. But so do other values, like transparency, fairness and accountability.

The Front window of the Law Courts in Australia, with the coat of arms of Australia.
A judge’s core responsibility is resolving the tension between legal language and lived reality. Picture: Getty Images

These are part of Australia’s ethical AI framework and central to the administration of justice. 

A judge’s core responsibility in interpreting texts has remained unchanged across the centuries: resolving the tension between clinical legal language and the often-messy nature of our lived realities.

The allure of generative AI’s data-backed outputs should not distract from the fact that these decisions invariably rely on judgment and context.

While it may present new pieces of information in making these decisions, generative AI cannot be treated as any more authoritative or reliable than any other source – and certainly no more ethically compelling.
