The Hidden Cost of Trusting AI Without Checking Its Work

AI is good at sounding right. That is exactly why it can be dangerous.

I use AI constantly in development work. It helps me move faster, sketch ideas sooner, and get through routine tasks with less friction. But I do not treat AI output as fact, and I do not think anyone else should either.

There is a big difference between using AI as a starting point and using it as a source of truth. That gap is where the real risk lives.

This is not hypothetical. Courts have sanctioned lawyers for filing briefs with fabricated citations. Hospitals and clinicians have used transcription tools that can invent words the speaker never said. Customer-service chatbots have given false policy information that companies were still held responsible for. Researchers and reviewers are now dealing with AI-generated references that look real enough to slip into published work. 1 2 3 4 5 6 7 8 9 10 11 12 13

The common thread is not that AI is useless. It is that AI output becomes dangerous the moment people stop verifying it.

Law Is the Clearest Warning Sign

The legal profession has become one of the clearest examples of what happens when AI-generated output is used without verification.

The best-known case is Mata v. Avianca. A New York attorney submitted a brief containing nonexistent cases generated by ChatGPT. In its sanctions order, the court described the filing as containing bogus judicial decisions along with bogus quotes and bogus internal citations. 1

That did not turn out to be a one-off embarrassment. By August 2025, one widely cited database tracking AI-related court incidents had logged more than 300 cases globally involving AI-generated hallucinations in litigation and judicial proceedings, including more than 200 recorded in the first eight months of 2025 alone. 2

The consequences have become more serious over time. In July 2025, lawyers representing Mike Lindell were sanctioned after filing a brief that reportedly contained more than two dozen errors, including fabricated citations, and a federal judge fined the attorneys $3,000 each. 3 In California, an appellate court sanctioned attorney Amir Mostafavi $10,000 in Noland v. Land of the Free for filing briefs with fabricated citations generated by AI. 4 Reporting on that ruling also noted that the court criticized opposing counsel for not identifying the fake citations earlier. 14 In Arizona, a magistrate judge revoked attorney Maren Bam’s pro hac vice status, struck her filing, removed her as counsel, and imposed additional sanctions after fabricated authorities were attributed to real judges. 15 16

None of these incidents happened because the lawyers lacked access to information. They happened because fabricated output was treated as usable without a proper cite check.

That risk is not theoretical. A Stanford-led study found that large language models hallucinated at high rates when asked specific, verifiable legal questions. One version of the paper reported a hallucination range of 69% to 88% for the evaluated models on direct questions about random federal court cases. 17 18

Healthcare Has a Different Failure Mode, but the Same Root Problem

Legal cases are dramatic because sanctions are public. In healthcare, the danger is quieter.

OpenAI’s Whisper has been used in healthcare transcription workflows even though OpenAI’s own model documentation warns against use in “high-risk domains” and in “decision-making contexts, where flaws in accuracy can lead to pronounced flaws in outcomes.” 5

Researchers at Cornell found that roughly 1% to 1.4% of tested Whisper transcriptions contained entire hallucinated phrases or sentences that were not present in the underlying audio. The same research found that 38% of the hallucinations included explicit harms such as violence, false personal information, or misleading authority cues. 6 7

Reporting on real-world deployments made the problem more concrete. AP reported examples in which one engineer found hallucinations in roughly half of more than 100 hours of transcriptions he reviewed, while another developer reported hallucinations in nearly every one of 26,000 transcripts he analyzed. That reporting also said Whisper-based tooling was being used by more than 30,000 clinicians across 40 health systems or organizations in medical documentation workflows. 8

PBS reported that in at least one case involving a Whisper-based workflow, the original audio was discarded after transcription, leaving the AI-generated transcript as the only surviving record of what happened during the appointment. 19

That is the real issue. The danger is not just that AI can be wrong. It is that people can build processes where the wrong output becomes the official record.

Researchers have also shown that medical chatbots can be pushed into generating dangerous falsehoods with fabricated citations. A 2025 Reuters report on work from Flinders University described leading models that could be induced to produce misinformation, such as false claims about sunscreen and skin cancer, while citing journals that did not actually support the claim. 20

Academia Has a Compounding Problem

In academic work, AI errors can become self-reinforcing.

A fabricated citation is not always obviously fake. AI-generated references often use plausible author names, realistic journal titles, convincing formatting, and DOI-like identifiers, making them harder to spot at a glance. 12 13

That is where things get worse. Once a fake or corrupted citation slips into a paper, it can be copied into other papers, presentations, bibliographies, or literature reviews. The problem stops being a single hallucination and starts becoming contamination. 12 13

A 2025 JMIR Mental Health study found that 19.9% of citations generated in its test set were completely fabricated. The same study found that many of the remaining citations still contained bibliographic errors, and its authors concluded that citation fabrication and bibliographic inaccuracies remained common in GPT-4o outputs. 11 21

Post-publication analysis has also raised doubts about whether peer review catches these errors. GPTZero reported at least 100 confirmed hallucinated citations in accepted NeurIPS 2025 papers, and a later arXiv analysis described 100 fabricated citations appearing across 53 published NeurIPS 2025 papers. 12 13

The exact numbers will vary by model, prompt, and field. The broader point does not. If a citation looks real enough to pass a quick visual check, then “looks legitimate” is no longer a meaningful standard.
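The more reliable check is mechanical rather than visual. As a minimal sketch, and assuming the references carry DOIs, a short script can ask the public Crossref registry whether each DOI is actually registered before anyone treats the bibliography as real. The first DOI below is the Whisper hallucination paper cited later in this post; the second is an invented, plausible-looking identifier of the kind a model might fabricate.

```python
import requests  # third-party HTTP client: pip install requests

CROSSREF = "https://api.crossref.org/works/"  # public Crossref REST API

def doi_exists(doi: str) -> bool:
    """Return True if the DOI is registered with Crossref (HTTP 200)."""
    resp = requests.get(CROSSREF + doi, timeout=10)
    return resp.status_code == 200

references = {
    "Koenecke et al., Careless Whisper (FAccT 2024)": "10.1145/3630106.3658996",
    "Plausible-looking fabricated reference": "10.9999/jnl.2025.0042",  # invented
}

for title, doi in references.items():
    status = "registered" if doi_exists(doi) else "NOT FOUND - send to a human reviewer"
    print(f"{title}: {status}")
```

A DOI that resolves only proves the reference exists, not that it supports the claim attached to it, so a check like this belongs at the front of review, not in place of it.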

Customer Service Already Shows the Liability Problem

One of the clearest business examples came from Air Canada.

In 2024, a British Columbia tribunal found Air Canada responsible for misinformation from its chatbot about bereavement fares. A passenger had been told he could apply for the reduced bereavement fare retroactively, after booking, which did not match the airline’s actual policy. Air Canada argued, among other things, that the chatbot should be treated as a separate legal entity responsible for its own statements. The tribunal rejected that argument and held the airline responsible for the misinformation on its website. 9 10

The award totaled C$812.02, consisting of the fare difference, interest, and tribunal fees. The damages were modest, but the lesson was not. A company can still be held responsible when an automated system gives customers false information in the course of business. 9 10

That matters for any team shipping AI into support, onboarding, quoting, benefits, scheduling, or policy communication.

Defamation Makes the Same Point in a Harder Way

The reputational risk is not limited to internal mistakes. It can also become a public-facing business problem.

In 2025, Wolf River Electric sued Google over an AI Overview that allegedly told users the company was being sued by the Minnesota Attorney General for deceptive sales practices. According to reporting on the complaint, Wolf River alleged that the cited sources did not actually say what the AI-generated overview claimed they said. Reporting on the lawsuit also described specific alleged business losses tied to the false output, including terminated or threatened contracts after customers saw the AI summary. 22 23

Later that year, Robby Starbuck sued Google over chatbot outputs that allegedly included false accusations of sexual assault, fabricated criminal history, and invented court records. Reuters reported that Google later argued Starbuck had intentionally prompted the outputs and had not shown that anyone believed or saw the allegedly defamatory content, while also contesting the required showing of actual malice. 24 25 26

Those cases involve allegations and legal arguments, not final rulings on every issue. But they show the same pattern: if a system produces authoritative-sounding falsehoods about real people or businesses, the damage can arrive before the legal system resolves anything.

The Real Pattern

Across all of these examples, the same sequence keeps repeating.

Someone uses AI output as if it were reliable on its own.
The output contains something fabricated, distorted, or unsupported.
The error is not caught before it affects a filing, a record, a customer, a publication, or a reputation.
Then the consequences land. 1 3 8 9 23 24

The tool changes. The industry changes. The mechanics vary. The pattern does not.

AI is useful precisely because it can generate fluent, plausible output quickly. That is also why it is risky. The same system that can produce a helpful draft can also produce a confident fabrication in the same tone, with the same formatting, and with just enough surface plausibility to get waved through. 17 18

What This Means for People Building or Buying Software

If you are building products with AI, you need real verification layers. Not just another model evaluating the first one, but checks against source data, known records, validated citations, controlled retrieval, structured guardrails, and human review where the stakes justify it.
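To make that concrete, here is a minimal sketch of one such layer, loosely modeled on the bereavement-fare scenario above. Everything in it is hypothetical: the policy record, the field names, and the assumption that an upstream step has already extracted structured claims from the drafted answer.

```python
# Hypothetical policy of record the business actually controls.
POLICY_OF_RECORD = {
    "retroactive_bereavement_refund": False,
    "claim_window_days": 0,
}

def release_or_escalate(draft_text: str, extracted_claims: dict) -> str:
    """Release the drafted answer only if every extracted claim matches
    the policy of record; anything unknown or mismatched goes to a human."""
    for field, claimed in extracted_claims.items():
        if POLICY_OF_RECORD.get(field) != claimed:
            return "ESCALATE_TO_HUMAN_AGENT"
    return draft_text

# A draft that promises a retroactive refund disagrees with the record,
# so it never reaches the customer.
draft = "You can apply for the bereavement fare refund within 90 days of travel."
claims = {"retroactive_bereavement_refund": True, "claim_window_days": 90}
print(release_or_escalate(draft, claims))  # ESCALATE_TO_HUMAN_AGENT
```

The specific check matters less than the shape: the model’s draft is compared against a record the business controls before anything reaches the customer, and disagreement defaults to a human.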

If you are using AI in your own work, treat the output like a draft, not a fact base. That applies to code, research, legal writing, medical documentation, content generation, and customer communication.

I use AI every day because it is genuinely useful. But speed only helps when the output is correct. If you skip verification, you are not eliminating work. You are moving it downstream into a more expensive failure.

Most people in these cases were not trying to commit fraud. They were trying to move faster. That is exactly why this matters.

The trap is not using AI.

The trap is trusting it before you check it.


Sources

  1. Mata v. Avianca, Inc. sanctions order (PDF): https://hvbba.org/wp-content/uploads/2025/09/8-PANEL-THREE-Mata-v-Avianca-Inc.pdf
  2. Jones Walker LLP, “From Enhancement to Dependency: What the Epidemic of AI Failures in Law Means for Professionals” (Aug. 19, 2025): https://www.joneswalker.com/en/insights/blogs/ai-law-blog/from-enhancement-to-dependency-what-the-epidemic-of-ai-failures-in-law-means-for.html?id=102l04x
  3. The Colorado Sun, “MyPillow CEO’s lawyers fined for AI-generated court filing in Denver defamation case” (July 7, 2025): https://coloradosun.com/2025/07/07/mike-lindell-attorneys-fined-artificial-intelligence/
  4. Noland v. Land of the Free, L.P. (Cal. Ct. App. 2025): https://law.justia.com/cases/california/court-of-appeal/2025/b331918.html
  5. Whisper model documentation: https://huggingface.co/openai/whisper-large
  6. Cornell Chronicle, “AI speech-to-text can hallucinate violent language” (June 11, 2024): https://news.cornell.edu/stories/2024/06/ai-speech-text-can-hallucinate-violent-language
  7. Koenecke et al., “Careless Whisper: Speech-to-Text Hallucination Harms” (ACM FAccT 2024): https://dl.acm.org/doi/10.1145/3630106.3658996
  8. AP, “Researchers say AI transcription tool used in hospitals invents things no one ever said” (Oct. 26, 2024): https://apnews.com/article/ai-artificial-intelligence-health-business-90020cdf5fa16c79ca2e5b6c4c9bbb14
  9. American Bar Association, “BC Tribunal Confirms Companies Remain Liable for Information Provided by AI Chatbot” (Feb. 29, 2024): https://www.americanbar.org/groups/business_law/resources/business-law-today/2024-february/bc-tribunal-confirms-companies-remain-liable-information-provided-ai-chatbot/
  10. The Guardian, “Air Canada ordered to pay customer who was misled by airline’s chatbot” (Feb. 16, 2024): https://www.theguardian.com/world/2024/feb/16/air-canada-chatbot-lawsuit
  11. Linardon et al., “Influence of Topic Familiarity and Prompt Specificity on Citation Fabrication in Mental Health Research Using Large Language Models” (JMIR Mental Health, 2025): https://mental.jmir.org/2025/1/e80371
  12. GPTZero, “GPTZero finds 100 new hallucinations in NeurIPS 2025 accepted papers” (Jan. 21, 2026): https://gptzero.me/news/neurips/
  13. Ansari, “Compound Deception in Elite Peer Review: A Failure Mode Taxonomy of 100 Fabricated Citations at NeurIPS 2025” (arXiv, 2026): https://arxiv.org/abs/2602.05930
  14. LawNext, “A New Wrinkle in AI Hallucination Cases: Lawyers Dinged for Failing to Detect Opponent’s Fake Citations” (Sept. 16, 2025): https://www.lawnext.com/2025/09/a-new-wrinkle-in-ai-hallucination-cases-lawyers-dinged-for-failing-to-detect-opponents-fake-citations.html
  15. Mavy v. Commissioner of Social Security Administration order: https://docs.justia.com/cases/federal/district-courts/arizona/azdce/2%3A2025cv00689/1428727/18
  16. ABA Journal, “Confronted with AI hallucinations in filings, one court shows ‘justifiable kindness,’ while another gets tough” (Aug. 19, 2025): https://www.abajournal.com/web/article/court-rejects-monetary-sanctions-for-ai-generated-fake-cases-citing-lawyers-tragic-personal-circumstances
  17. Dahl et al., “Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models” (paper PDF): https://nacmnet.org/wp-content/uploads/Dahl-et-al-Large-Legal-Fictions-Hallucinations-in-LLMs-2401.013012024JAN02-37pp.pdf
  18. Stanford Law School publication page for “Large Legal Fictions”: https://law.stanford.edu/publications/large-legal-fictions-profiling-legal-hallucinations-in-large-language-models/
  19. PBS NewsHour, “What to know about an AI transcription tool that ‘hallucinates’ medical interactions” (Jan. 25, 2025): https://www.pbs.org/newshour/show/what-to-know-about-an-ai-transcription-tool-that-hallucinates-medical-interactions
  20. Reuters, “It’s too easy to make AI chatbots lie about health information, study finds” (July 1, 2025): https://www.reuters.com/business/healthcare-pharmaceuticals/its-too-easy-make-ai-chatbots-lie-about-health-information-study-finds-2025-07-01/
  21. EurekAlert summary of the JMIR study (Nov. 17, 2025): https://www.eurekalert.org/news-releases/1106130
  22. Government Technology / Star Tribune syndication, “Minnesota Solar Company Sues Google Over AI Summary” (June 13, 2025): https://www.govtech.com/public-safety/minnesota-solar-company-sues-google-over-ai-summary
  23. Volokh Conspiracy / Reason, “Large Libel Models: Small Business Sues Google, Claiming AI Overview in Searches Hallucinated Attorney General Lawsuit” (June 11, 2025): https://reason.com/volokh/2025/06/11/large-libel-models-small-business-sues-google-claiming-ai-overview-in-searches-hallucinated-attorney-general-lawsuit/
  24. The Regulatory Review, “What Starbuck v. Google Reveals About AI Liability” (Dec. 22, 2025): https://www.theregreview.org/2025/12/22/andrews-what-starbuck-v-google-reveals-about-ai-liability/
  25. Reuters, “Conservative activist sues Google over AI-generated statements” (Oct. 22, 2025): https://www.reuters.com/legal/litigation/conservative-activist-sues-google-over-ai-generated-statements-2025-10-22/
  26. Reuters, “Google asks court to dismiss conservative influencer’s AI defamation lawsuit” (Nov. 17, 2025): https://www.reuters.com/legal/government/google-asks-court-dismiss-conservative-influencers-ai-defamation-lawsuit-2025-11-17/