Whichever way you turn your head, AI is everywhere. Everyone is talking about it, writing about it, and singing its praises. You would be forgiven if you already felt you had missed the boat. Lawyers are notorious for being slow adopters of new technology (images of wax seals and ribbons come to mind) but, this time, we might be leaders in our more cautious approach. I can’t help but think there is an element of “The Emperor’s New Clothes” in our world today. Instead of rushing to join the crowd, take a breath… Look past the glare of the shiny new toy and don’t be afraid to ask questions.
Don’t get me wrong – I am not saying don’t use AI. That horse has already bolted, and I, myself, am an avid user of several AI tools in my personal and professional life. But I am treading carefully. I recognize my own lack of understanding and am not ashamed to say that I need to learn more before I dive in further. I have grave concerns about the “humanizing” of AI and how a misguided or premature over-reliance on it might create a generation of “non-experts” in the future of our profession - but I shall write about those issues in a future edition of this Blog. Today, I want to start by airing some of my concerns about the proliferation of garbage information out there in our “cyber world” and the problems it creates for us in the age of AI – from the more mundane “garbage in, garbage out” situations we have all been familiar with since the early days of computing and programming, to the more frightening issue of AI hallucinations and how quickly they can be relied upon by generations that grew up with one predominant information source: the Internet.
Prior to the Internet (yes, my young readers, that time existed not so long ago), it was not so easy to publish a story, thought or opinion, and before doing so you usually had to pass a couple of fact & editorial checks. Now, it is easy to publish. Anyone can do it at any time and in any medium. We post, we tweet, we blog, vlog and upload to our heart’s content. It is impossible to comprehend the amount of information that is now available to us online… in encyclopedic reams… at the touch of a button. Because it was once so hard to publish something, we got used to relying on the veracity of what was published. Now, we need to be more cautious. How do we know what we’re reading is true? How do we know this thing actually happened, or that person actually did or said what we are reading? How do we know what we are researching is not taken out of context? And is it up to date?
Add to this the publicly acknowledged risk of generative AI “learning” from bad information sources and “generating” incorrect assumptions and conclusions from that – aka “hallucinating” – and we open a whole new Pandora’s Box of problems.
There are ways to verify what our AI tools are telling us, but we – the humans – still need to do some of the work here. Legislators recognize this as they endeavor to build a layer of human oversight into the compliance and ethical obligations of businesses leveraging AI tools, but I am concerned that they are trying to close the door after the horse has bolted. I am struggling to envisage a feasible way to put a human intervention layer around a tool whose attraction is its ability to process billions of data points. I think they may have placed this obligation on the wrong party, but that is also a topic for a future Blog of mine – watch this space, my friends, watch this space.
Today’s blog – Veritas Lost – focuses on the real risk of hallucination and why we should all tread a little carefully here. Stanford University HAI recently published an article by Matthew Dahl, Varun Magesh, Mirac Suzgun & Daniel E. Ho highlighting the disturbing findings of their study, “Hallucinating Law: Legal Mistakes with Large Language Models” – you can find it here: https://tinyurl.com/Hallucinating-Law. I quote some of their findings below to illustrate my concerns – bold text is my own added emphasis. Where I have additional thoughts, I have added them in (brackets & italics):
“..a core problem remains: hallucinations, or the tendency of LLMs to produce content that deviates from actual legal facts or well established legal principles and precedents”
“..legal hallucinations are pervasive and disturbing: hallucination rates range from 69% to 88%..” (I don’t know about you, but thinking back to my early trainee lawyer days, if I had produced legal work that was incorrect 70-90% of the time, I think I would have been fired before I finished my traineeship.)
“..moreover, these models often lack self-awareness about their errors and tend to reinforce incorrect legal assumptions..” (Self-Awareness is frequently cited as one of the most important Leadership Skills or Competencies, and rightly so. As we rely more and more on Generative AI and LLMs, we should consider carefully what we are using them for. Not only might these tools churn out bad work (and I keep calling them “tools” for a reason – more on that in a future Blog…), but if we remove many of these “traditional learning” tasks from our young professional workforce, I wonder what kind of leaders we are creating for the future. And I will definitely be writing about that topic soon!)
“Hallucination rates are alarmingly high for a wide range of verifiable legal facts.” (This statement makes me want to run for the hills!)
“..performance deteriorates when dealing with more complex tasks that require a nuanced understanding of legal issues or interpretation of legal texts…. most LLMs do no better than random guessing.. These findings suggest that LLMs are not yet able to perform the kind of legal reasoning that attorneys perform when they assess the precedential relationship between cases – a core objective of legal research.” (A meme I saw on Facebook the other day comes to mind here: instead of having AI do all the complex work so that we can do the dishes & laundry, perhaps we should have AI do the dishes & laundry so that we can do the complex work. A simplistic view perhaps, but there is a grain of truth in it, and it goes to my point in this Blog about being cautious about how we choose to use AI in our lives. I believe humanity has an essence to it that cannot be emulated or perfected by a tool. I believe we are wise to remember, embrace and protect that.)
“..LLMs show a tendency to perform better with more prominent cases, particularly those in the Supreme Court… [but] hallucinations are most common among the Supreme Court’s oldest and newest cases.. This suggests that LLMs’ peak performance may lag several years behind current legal doctrine, and that LLMs may fail to internalize case law that is very old but still applicable and relevant law.” (Systemic and selective bias come to mind here. When we think of longstanding legal doctrine – principles we once fought very hard to establish as a civilized nation – if we’re not careful how we use these tools, we could be at risk of inadvertently dismantling some well-founded legal and civil rights.)
The writers of the Stanford article also highlighted the danger of an LLM’s susceptibility to what they called “contra-factual bias” – “..namely the tendency to assume that a factual premise in a query is true, even if it is flatly wrong..” They found this phenomenon to be “particularly pronounced” in language models like GPT-3.5, “.. which often provide credible responses to queries based on false premises, likely due to its instruction-following training..” (Here comes the mother of all “garbage in, garbage out” demons, and it is exacerbated by improper prompts. Be careful, my eager friend… yes, you… the one who rushes in to “play around” with Generative AI to “see what it can do” and to “learn on the fly” (another leadership skill open to misuse in this new world we live in today)… Be careful – a little knowledge can be a dangerous thing…)
The writers also found that “..models are imperfectly calibrated for legal questions…. model confidence is correlated with the correctness of answers.. a common thread across all models is a tendency towards overconfidence, irrespective of their actual accuracy. This overconfidence is particularly evident in complex tasks..”
“There is also a looming risk of LLMs contributing to legal “monoculture”. Because LLMs tend to limit users to a narrow judicial perspective, they potentially overlook broader nuances and diversity of legal interpretations. This is substantively alarming…. There is also a version of representational harm: LLMs may systematically erase the contributions of one member of the legal community, such as Justice Ginsburg, by misattributing them to another…” (cue: stunned silence… I have deliberately not highlighted the “legal backdrop” wording here so you can appreciate the wider ramifications of this particular finding across all walks of life.)
The writers eloquently close their article with words of warning to “warrant significant caution” and to promote the idea of “human-centered AI”… using AI to “augment lawyers, clients and judges” and not – as Chief Justice Roberts put it – risk “dehumanizing the law”. (I go one step further and say we should be very careful that we do not “humanize” AI, lest we forget who is really the master here. I, for one, will keep referring to AI as a “tool”, and I refuse to depict AI with friendly humanlike avatars. AI is not human and never will be. We are wise to remember that.)
Until next time, I hope you enjoyed this edition of #TheMaasMinute #AI #AlternativeInsights - Editorial Opinion and Thought Leadership by Alexia J. Maas; ©2024 All Rights Reserved (www.maasstrategic.com)