lean4-mermaidEnglish ↔ Lean4 · clarity heatmap

A Lean 4 microscope for English claims.

For drafters — lawyers, policy writers, product specifiers — who already know that one missing definition costs more than the whole drafting fee. You write a Lean 4 snippet — or open a worked lesson — and this app renders the structure as a Mermaid diagram with a clarity-heatmap [colour-coded by which declarations carry a clarity flag]. Wherever the Lean cannot close the proof, the diagram pinpoints which English assumption was left hanging.

English“all swans are white.”
Lean 4
-- "all swans are white"
opaque ObservedSwans : Type
axiom every_swan_white : ∀ s : ObservedSwans, IsWhite s
-- ↑ which population? naked axiom — clarity flag.
DiagramObservedSwansevery_swan_wh…IsWhite s

The opaque type and the naked axiom are the missing parameter and missing axiom — the diagram colours them amber and red.

The benefit is compounding precision.

Every contract, policy, spec, and prompt is a prompt a chain of agents will read — a junior associate, a court, a counterparty, an LLM, a future you. Each reading is a reasoning step. Each reasoning step has a small probability of going wrong. The probabilities multiply.

Probability the chain of n reasoning steps all succeed, as a function of the per-step success rate p.
Per-step p5 steps10 steps20 steps50 steps
0.90 unaided English0.590.350.120.005
0.95 careful English0.770.600.360.08
0.99 Lean-checked draft0.950.900.820.61

Where the chain is long enough to matter

  • Contracts that AI agents execute. A clause read by a payment gateway, a procurement bot, or a compliance check every time it fires — chain length grows with traffic.
  • Regulations interpreted repeatedly. Each audit, each new product, each cross-border transaction is another reading of the same source.
  • AI-coded specifications. A prompt an LLM expands into code is read by a much longer chain than a prompt a human writes once. 0.9^50 ≈ 0.005. Vague specs do not survive this length.

What you do, what the machine does

What only humans can do

  • Choose the policy.
  • Pick the carve-outs.
  • Decide what is fair.
  • Name the test that matters.

What only machines should do

  • Exhaustive case analysis.
  • Mechanical closure of definitions.
  • Type-checking every reference.
  • Refusing to ship a hole.

The tool removes the part of drafting that is mechanical-but-tedious and leaves the part that is judgement-but-essential. The machine does not write your policy — it just refuses to let you ship one with a hole.

The three gaps every English claim hides

Sentences like “all swans are white”, “data must be erased on request”, or “all employees do not like the policy” read as if they were precise. Translating them to Lean 4 forces three kinds of gap into the open:

  1. Missing parameter [a value the claim relies on but never names] — e.g. which population of swans, which time window, which jurisdiction. The Lean translation either invents an opaque placeholder or leaves an underscore.
  2. Missing axiom [a normative bridge the reader is silently expected to supply] — e.g. “if X harms users, X is forbidden”. The Lean translation has to declare a naked axiom; the diagram flags it.
  3. Scope ambiguity [two non-equivalent readings of the same surface form] — e.g. “not all” vs “all not”. The Lean translation must pick one; the picker is forced to be explicit.
English“all swans are white.”
Beforeflag: opaque · axiom
opaque ObservedSwans : Type
axiom every_swan_white : ∀ s : ObservedSwans, IsWhite s
Afterdischarged
-- pin the population and the observation frame
def UK_Mute_Swans_2024 : Set Swan := { s | s.species = .Cygnus_olor ∧ s.region = .UK ∧ s.year = 2024 }
theorem mute_swans_2024_white : ∀ s ∈ UK_Mute_Swans_2024, IsWhite s := by
  intro s hs; exact bto_field_data hs

Resolution: Replace the opaque type with a definite set and cite the BTO species account; the axiom becomes a theorem with a citation-grounded proof.

Each gap leaves a fingerprint in the Lean source: a sorry, an opaque, an unjustified axiom, or a stray _. The app calls these clarity-flags [marker tokens that mean “here be a gap”] and turns them into the diagram’s heatmap.

From clarity flag to discharge — four grounding kinds

Once a flag is exposed, the drafter has to ground it. Every defined term eventually faces the regressus-ad-terminos [if A is defined using B, and B using C, the chain has to stop somewhere]. The definitions registry (0 terms) makes the four classical stops machine-checkable:

Primitive0

Accepted without further definition. The declared ground.

e.g. natural-person, tamper-evident.

Cited0

Definition delegated to a normative external source — a statute, ISO standard, or RFC.

e.g. personal-data → UK GDPR Art. 4(1).

Defined0

Composed from other registry terms; must reach a Primitive in ≤ 5 hops, no cycles.

e.g. data-subject depends on natural-person + personal-data.

Scoped0

A per-document override — the document explicitly declares what is in and out of scope.

e.g. uk-region = the four constituent countries; Crown Dependencies and Overseas Territories explicitly excluded.

Try it — one click into the editor

Each card below is a paste-ready Lean snippet. Pick one and the free editor will open with it loaded; the same src/lib/extractor/ir.ts parses it into the diagram instantly.

The swan claim

Two clarity flags in three lines — an opaque population and a naked universal axiom.

-- "all swans are white"
opaque ObservedSwans : Type
axiom every_swan_white : ∀ s : ObservedSwans, IsWhite s
theorem all_swans_white : ∀ s : ObservedSwans, IsWhite s := every_swan_white

GDPR Article 17 erasure

A defined duty with an underscore window — the cited authority discharges it, but the parameter is still flagged.

-- erasure duty under UK GDPR Article 17
opaque ErasureRequest : Type
opaque Erased : ErasureRequest → Prop
def erasureDuty (req : ErasureRequest) : Prop :=
  Erased req
axiom UKGDPR_Art17 : ∀ r : ErasureRequest, erasureDuty r
theorem art17_holds : ∀ r, erasureDuty r := UKGDPR_Art17

Not-all vs all-not

Two definitions, one English sentence — the picker is forced to commit, so the diagram shows both readings as separate nodes.

-- "all employees do not like the policy"
opaque Employee : Type
opaque Policy : Type
opaque Likes : Employee → Policy → Prop
opaque current_policy : Policy
def reading_notAll : Prop := ¬ ∀ e : Employee, Likes e current_policy
def reading_allNot : Prop := ∀ e : Employee, ¬ Likes e current_policy

FAQ

The questions a drafter asks before adopting a tool.

What is the actual benefit, in one sentence?

Compounding precision. A small improvement in clarity per reasoning step compounds exponentially across the chain of re-interpretations every business document goes through. Move from 90% correct per step to 99% correct per step and the ten-step chain goes from a coin-flip to almost-certain.

How is this different from just writing carefully?

Writing carefully is necessary and not sufficient. Careful prose still leaves implicit definitions, unenumerated exceptions, and unbounded scope. The Lean 4 translation forces every implicit into a slot you must fill — the holes appear as red marks in the diagram, and you cannot dismiss them by re-reading. The machine is your second pair of eyes, every time.

Where does this matter most?
  1. Contracts that AI agents will execute. When a clause gets read by a reasoning agent every time it is invoked, the chain length is large. Per-step p matters exponentially.
  2. Regulations that compliance teams interpret repeatedly. Each audit, each new product, each cross-border transaction re-interprets the same source.
  3. AI-coded specifications. Prompts that an LLM expands into code are read by much longer chains than prompts a human writes once. Vague specs do not survive this length.
What does a drafter actually do with the tool?
  1. Read a lesson. Open a sentence you already write. See, in thirty seconds, the holes the Lean translation surfaces.
  2. Re-draft a clause. Paste your own clause into the free editor and rewrite until the clarity-heatmap goes green. The machine will not let you ship a hole.
  3. Build a registry. Pin every recurring defined term to a single source of truth and reuse it across documents.
What this does not replace

The tool does not write your policy. It does not pick your carve-outs. It does not decide what is fair. Those decisions are yours, and the tool is built so that only those decisions remain. Everything else — naming the time, naming the parties, naming the exceptions, checking the closure — the machine handles.

Why Lean 4 specifically? Why not just structured English?
  1. It is a programming language with a proof checker built in. Holes do not pass silently. If you wave a hand the machine flags it as sorry or axiom and the heatmap goes red.
  2. It has a definitions registry first-class. Every defined term is a typed reference, not a string match. A registry hit is either right or a type error — it cannot be a near-miss.
  3. It composes. A clause defined here can be re-used verbatim in a contract, in a regulation summary, in an LLM prompt. Three artefacts, one source of truth.
Do I have to read or write Lean to use it?

For the curriculum, no. The lessons are paired English sentences with the Lean shown beside them, like a Rosetta stone — you read the English, the machine reads the Lean. For the free editor, a little. The fifteen lessons teach the twenty-or-so Lean idioms you need to express most clauses.

Is this safe to share with internal documents?

Yes. The tool runs entirely in your browser. No source ever leaves your machine without an explicit click — there is no telemetry, no preview-rendering server, no analytics that include the source. The permalink encoder is client-side; it produces a hash-only URL that your browser can decode. This is part of the contract — Contracts.Invariants formalises it as permalink_clientside.

How do you know the lessons are calibrated?

Because we check. A small script verifies, for every lesson, that score(before) > score(after) — the unclear version really is flagged as worse than the clarified one. CI runs the check on every push. Currently 15 of 15 lessons satisfy the calibration claim, with a corpus mean of 22 points before and 4 points after. The Lean side records this as Contracts.Calibration.score_monotone_on_corpus.

Where to go from here