Why misquoted standards are engineering's quiet liability

Most reviewers check that a citation exists. Almost nobody checks that it says what the report claims it says. That gap is where the trouble lives.


Key takeaways

  • The most common citation problem in technical reports is not made-up references. It is real references that do not say what the report claims they say.
  • Standard manual review checks that a citation exists. It rarely checks that the cited clause actually supports the statement attached to it.
  • Generic AI tools are particularly bad at this. Stanford research found that general-purpose models hallucinate on specific legal queries 58% to 88% of the time, and that even purpose-built legal AI tools hallucinate 17% to 33% of the time when answering questions tied to specific sources.
  • The fix is not a smarter chatbot. It is a deterministic check that compares the claim in the report against the actual text of the cited standard.
  • Engineering, audit, and climate disclosure work all run on citations to ISO, AS/NZS, ASTM, IEC, IEEE, PAS and similar standards. The cost of getting them wrong shows up at exactly the wrong moment.

Picture the moment

You are a discipline lead. It is Friday afternoon. You are signing off a 90-page design report that needs to go to the client on Monday. The report cites ISO 14064 in three places, AS/NZS 5667 once, and PAS 2080:2023 a dozen times.

You skim. The citations look fine. The format is consistent. The references list is complete. You sign.

Did you actually check that ISO 14064-1, clause 6.4.1 says what the report claims it says? Of course you didn't. Nobody does.

This is the problem.

Most citation errors are not what you think

When people talk about citation problems, they usually mean two things: missing references, or fabricated references (the famous Mata v. Avianca case where two New York lawyers were sanctioned in 2023 for filing a brief full of cases that did not exist). Both of those are real problems, and both are relatively easy to catch. Missing references show up in proofreading. Fabricated references collapse the moment somebody looks them up.

The harder problem, and the one that does the most actual damage, is the citation that exists but does not support the claim attached to it.

The Stanford RegLab and Stanford HAI researchers who studied legal AI hallucinations called this category "misgrounded citation". Their description is worth quoting: a misgrounded citation is "a citation that exists but does not actually support the claim being made". They argued, correctly, that this is in some ways more dangerous than a citation that does not exist, because the lookup test passes. The reference is real. The standard is real. The clause number is real. It just does not say what the document claims it says.

In the Stanford team's testing of major legal AI tools, this was the most common failure mode. Their concrete example: a Westlaw AI product confidently asserted that a specific paragraph of the Federal Rules of Bankruptcy Procedure stated that deadlines were jurisdictional. No such paragraph exists. The actual law says something close to the opposite.

If that can happen in a tool built specifically for legal research with curated source material, it can happen in a 90-page engineering report drafted by three people across two offices over six weeks.

Why your review process misses this

Stop and ask honestly: what does your current review actually check?

For most engineering, audit and assurance reports, the review covers:

  • Does the report make sense?
  • Is the methodology sound?
  • Are the conclusions supported by the analysis?
  • Are the recommendations reasonable?
  • Is the writing clear?
  • Are the figures and tables labelled?
  • Is the references list present?

What it usually does not cover:

  • Does the cited clause of ISO 9001 actually say what the report claims it says?
  • Does the version of PAS 2080 referenced match the version the client requires?
  • Does the IEC 61508 clause referenced still exist after the latest amendment?
  • Does the ASTM standard cited apply to the material described?

The reason is simple. Checking these things requires opening the standard, finding the clause, reading it, and comparing it to the claim. For a single citation, that takes a few minutes. For a report with thirty citations across several standards and editions, it adds up to half a day. No senior reviewer has that half-day.

So the check does not happen. The citation gets a cursory glance. The report goes out.

The cost of getting this wrong

Most of the time, nothing happens. The client reads the report, accepts it, and moves on. The miscited clause sits in the document forever and nobody notices.

Some of the time, somebody does notice. That is when the cost lands.

In an engineering context, a miscited standard can void the assurance basis of a design report. The standard you cited does not actually require what you said it requires. The peer reviewer signed off on the basis of the cited compliance. The compliance does not exist. The professional indemnity question is awkward.

In an audit context, a misgrounded citation in a working paper or report can be picked up in inspection. The PCAOB, FRC, FMA and equivalent regulators all conduct inspections that include checking whether the engagement evidence supports the conclusions reached. A citation that does not support the conclusion is a finding.

In an expert witness context, the consequences are immediate and public. Opposing counsel will check every reference. A citation that does not support the assertion attached to it is the kind of thing that ends careers.

In a climate disclosure context, with ISSB S1 and S2, CSRD, and the various national equivalents now mandatory, a misquoted reference to the underlying standard is the kind of thing that triggers a restatement. Restatements are expensive and embarrassing.

None of these are routine outcomes. But the firm-level question is not whether they are routine. It is whether you would rather find the mismatch yourself or wait for a client, regulator or opposing counsel to find it for you.

So can AI help?

Yes and no, and it matters which.

Generic AI assistants (ChatGPT, Claude, Copilot, Gemini, the rest) are exactly the wrong tool for this job. The Stanford RegLab study (Dahl et al., 2024) found that general-purpose large language models hallucinate on specific legal queries somewhere between 58% and 88% of the time, depending on the model and the query type. They make up clause numbers. They get versions wrong. They confidently assert that a standard says things it does not say. And they are particularly bad at recognising their own errors.

If you have ever asked ChatGPT to look up a clause in an ISO standard, you may have already seen this. The answer reads beautifully. It is also frequently wrong, and there is no good way to tell which is which without going to the source yourself.

The interesting finding is what happens when you bolt a curated database onto a large language model, which is the standard architecture (RAG, retrieval-augmented generation) used by most commercial AI tools that handle reference work. The Stanford team's follow-up study tested the leading legal AI tools (Lexis+ AI, Westlaw AI-Assisted Research, Ask Practical Law AI) and published the results in the Journal of Empirical Legal Studies in 2025. The headline finding: even purpose-built RAG-based tools hallucinate between 17% and 33% of the time, despite vendor claims of "hallucination-free" or "near zero" hallucination.

That is a meaningful improvement over generic chatbots. It is also nowhere near good enough to act as the verification layer in a high-stakes professional document.

The lesson is that for citation verification, probabilistic systems are the wrong shape of tool. You do not want a model that is usually right. You want a check that either confirms the citation matches the source or flags it for human review.

What actually works

The architecture that works for citation verification has three properties that are missing from generic AI tools.

It is deterministic. The same citation checked twice should return the same answer. No probabilistic drift. No "the model was different today". If the cited clause matches the source, it matches. If it does not, it does not.

It is grounded in a curated source. The check happens against a specific, pinned version of the standard, not against whatever the model picked up during training. PAS 2080:2023 is a different document from PAS 2080:2016. ISO 9001:2015 is a different document from ISO 9001:2008. The version matters, and the system has to know which one is which.

It is transparent. When the system flags a citation, it shows you exactly what the report claims, exactly what the source says, and exactly where the mismatch is. The reviewer makes the call. The system shows the working.

This is not the same architecture as a chatbot. It is closer to a unit test for citations: a structured comparison between two specific things, with a binary outcome and an audit trail.
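
To make the "unit test for citations" idea concrete, here is a minimal sketch in Python. Everything in it is illustrative: the clause store, the function name, the thresholds and the character-level similarity score are placeholder assumptions, not how any particular product works. Real verification would use a far better comparison than a string ratio, but the shape is the point: a pinned source, a deterministic comparison, and a result that carries the source text with it.

from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass
class CitationCheck:
    claim: str         # what the report says the clause requires
    source_text: str   # the actual clause text from the pinned standard
    verdict: str       # "match", "partial" or "mismatch"

# Hypothetical clause store: {(standard, version, clause): clause text}.
# In practice this would be a licensed, version-pinned database, not a dict.
CLAUSE_STORE = {
    ("PAS 2080", "2023", "9.2.1"): "Example clause text from the pinned edition ...",
}

def check_citation(standard, version, clause, claim, match_at=0.6, partial_at=0.3):
    """Deterministically compare a report claim against the pinned clause text."""
    source_text = CLAUSE_STORE.get((standard, version, clause))
    if source_text is None:
        # The clause does not exist in that edition: flag it, do not guess.
        return CitationCheck(claim, "", "mismatch")
    # A crude but deterministic similarity score: same inputs, same answer, every run.
    score = SequenceMatcher(None, claim.lower(), source_text.lower()).ratio()
    verdict = "match" if score >= match_at else "partial" if score >= partial_at else "mismatch"
    return CitationCheck(claim, source_text, verdict)

The reviewer sees the claim, the clause text and the verdict side by side, and makes the call.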

The work this changes is not the writing. It is the review. Instead of skimming citations and trusting that the team got them right, the senior reviewer gets a checked list of every citation in the document with the matches and mismatches called out. The half-day of clause-by-clause lookup that nobody has time for becomes a 30-second scan of the flagged items.

What this looks like for engineering, audit and climate work

Different sectors, same problem, slightly different shapes.

Engineering. Reports cite ISO 9001, ISO 14001, ISO 14064, AS/NZS standards (4801, 5667, 5263), ASTM standards, IEC 61508 and 61511, IEEE standards, PAS 2080:2023, plus client-specific design standards. Each has versions, amendments, and clause-level granularity. A check needs to compare the claim against the right version of the right standard.

Audit and assurance. Working papers cite ISA standards, IFRS, GAAP equivalents, and increasingly ISQM 1 and ISQM 2 for the firm's own quality management. Misgrounded citations in working papers are the kind of finding that triggers a partner conversation.

Climate and ESG. Disclosures reference ISSB S1 and S2, TCFD, CSRD/ESRS, GHG Protocol, ISO 14064-1, PAS 2060, and a dozen national standards (NZ XRB CS 1, AASB S2, etc). Mandatory reporting regimes add the regulatory layer. A misquoted reference to a disclosure standard is now potentially a restatement event.

In all three cases, the work that gets attention is the writing. The work that gets neglected is the verification. Yet the verification is what the audit trail gets judged on, when somebody decides to look.

What you can do this week

A few practical steps that do not require waiting for any new tool.

Pick five recent reports and actually check the citations. Not all of them, just five. Open the standards. Read the clauses. Compare them to what the report says. You will probably find at least one mismatch. This is the kind of internal exercise that gets people's attention faster than any blog post.

Pin your reference versions. Maintain a list of which version of which standard your firm is currently working against, by client and by project type. Vague references to "PAS 2080" are how teams end up quoting the wrong version six months after a publication update.
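
One way to make that list concrete rather than tribal knowledge is a small, machine-readable register. The sketch below is illustrative only: the client name and the pinned editions are assumptions for the example, and the same structure could just as easily live in a spreadsheet or a config file.

# Illustrative reference-version register. The editions and the client name are
# made up for the example; the point is one pinned edition per standard,
# with explicit per-client overrides.
REFERENCE_VERSIONS = {
    "default": {
        "ISO 9001": "2015",
        "ISO 14064-1": "2018",
        "PAS 2080": "2023",
        "IEC 61508": "2010",
    },
    "client_overrides": {
        # Hypothetical client still contracted against the earlier edition.
        "Example Client Ltd": {"PAS 2080": "2016"},
    },
}

def pinned_version(standard, client=None):
    """Client-specific pin wins; otherwise fall back to the firm-wide default."""
    overrides = REFERENCE_VERSIONS["client_overrides"].get(client, {})
    return overrides.get(standard, REFERENCE_VERSIONS["default"].get(standard))

A register like this also feeds any deterministic citation check, because the check is only meaningful against a named edition.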

Stop using generic AI to look up clauses. Genuinely, just stop. The hallucination rates are too high for high-stakes work. If you must use AI for this, use a tool that can show you the source text, not a chatbot that gives you a confident sentence.

When you next look at AI tools, ask the verification question. When a vendor offers AI for technical document work, ask specifically how it handles citation verification, what database it pulls from, how versions are managed, and what evidence it gives the reviewer. If the answer is vague, the tool is not built for this part of the problem.

A closing observation

The tools currently being built around AI for professional document work mostly focus on writing. That is the easier problem and the more visible one. Reference verification is the harder problem and the more hidden one.

It also happens to be the problem that, when it goes wrong, costs the most. The senior reviewer who signed off the report carries the responsibility for the citations they did not actually check. Most of the time that responsibility never converts into consequence. Some of the time it does, and the firm finds out together.

Qrtr is being built specifically with reference verification as a first-class capability, not an afterthought. If that sounds like something your firm needs, you can register your interest for early access.


Frequently asked questions

What is a misgrounded citation? A misgrounded citation is a reference that exists and is correctly formatted, but does not actually support the claim it is attached to. The standard is real, the clause number is real, but the cited clause does not say what the document claims it says. This is the most common citation failure in professional documents and the hardest to catch in routine review.

Can I use ChatGPT to check citations in a technical report? Not reliably. Stanford research found that general-purpose large language models hallucinate on specific legal and reference queries between 58% and 88% of the time. They confidently invent clause numbers, get versions wrong, and assert things that the source does not say. For high-stakes professional documents, this rate of error is too high to depend on without independent verification.

Why does retrieval-augmented generation (RAG) not solve the problem? RAG improves accuracy compared to generic chatbots, but it does not eliminate hallucination. Stanford's evaluation of the major legal AI tools (released as a preprint in 2024 and published in 2025) found that even purpose-built, RAG-based commercial tools hallucinated between 17% and 33% of the time. For reference verification, where the requirement is that the citation matches the source exactly, even a 17% error rate is too high.

What standards are most likely to be miscited in engineering reports? The standards most often cited at clause-level granularity tend to be the ones most often miscited. In engineering, this includes ISO 9001, ISO 14001, ISO 14064, AS/NZS 4801, IEC 61508, IEEE design standards, ASTM material standards, and PAS 2080. In audit, the ISA series and (increasingly) ISQM 1 and 2. In climate disclosure, the ISSB, TCFD, CSRD and GHG Protocol references.

How do you actually verify a citation against a source? The reliable approach is deterministic comparison: pin the exact version of the source standard, locate the cited clause, extract the relevant text, and compare it against the claim in the document. The check should be transparent (showing both the source and the claim) and should produce a binary outcome (match, partial match, or mismatch) for the reviewer to act on.


References

  1. Dahl, M., Magesh, V., Suzgun, M. and Ho, D. E. (2024). Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models. Stanford RegLab and Stanford HAI. Available at: https://arxiv.org/abs/2401.01301
  2. Magesh, V., Surani, F., Dahl, M., Suzgun, M., Manning, C. D. and Ho, D. E. (2025). Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools. Journal of Empirical Legal Studies, 22, 216. Originally published as preprint May 2024. Available at: https://arxiv.org/abs/2405.20362
  3. Mata v. Avianca, Inc., 22-cv-1461 (S.D.N.Y. 2023). The "ChatGPT lawyer" sanctions case.
  4. Stanford HAI (2024). AI on Trial: Legal Models Hallucinate in 1 out of 6 (or More) Benchmarking Queries. Available at: https://hai.stanford.edu/news/ai-trial-legal-models-hallucinate-1-out-6-or-more-benchmarking-queries

 

 
