
Proof assistant code is high-reliability; there is no room to fudge it. This is perhaps the one place where you can really see how bad LLMs are when you care about reliability.


Actually, proof assistants like Lean are the PERFECT use case for LLMs, because you can instantly determine whether the proof they generate is correct.


Proofs, sure, but not definitions. A human needs to be sure that the definitions align with what they expect. Unfortunately, humans generating correct definitions and LLMs generating correct proofs are not independent problems.
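A minimal Lean sketch of the definitions problem (the names here are hypothetical, made up for illustration): if the definition is wrong, a machine-checked proof about it guarantees nothing, and only a human reviewing the definition can catch that.

```lean
-- Hypothetical buggy definition: meant to capture "the list is sorted",
-- but the trivial predicate accepts every list.
def sorted (xs : List Nat) : Prop := True

-- Lean happily checks this proof, yet it tells us nothing useful,
-- because the definition above does not say what we intended.
theorem anything_is_sorted (xs : List Nat) : sorted xs := True.intro
```

The kernel verifies the proof; it cannot verify that `sorted` means what the human thinks it means.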


Actually, some proofs take longer for the computer to verify than for a human (even an unskilled typist) to type out. Several hours to evaluate a two-page document isn't unusual. (I prefer to write optimised proofs, but it can take me weeks to find the correct abstractions for that.)


Are you talking about Lean 4? Lean 4 is usually pretty fast at verifying proofs, and most proofs are created interactively, so they are checked as they are typed.

A "two-page document" (perhaps 200-300 lines of code) would typically check in a matter of seconds to, at most, a few minutes. If it took hours, it would mean the file contains a pathological case that a developer would be expected to identify and fix. It is absolutely not a normal or "not unusual" occurrence.


Steps to reproduce:

1. Load a complex data structure, e.g. from JSON. (I do not consider typing out the JSON to be part of the human's job.)

2. Process the data, using the magic of functional programming. (Performance? What's that? Performance is not a priority.)

3. Add some helpful lemmas.

4. Prove the data processing stage was correct… but proving this in general is haaaard, so just make automation that's capable of proving it for most cases… eventually.

5. Great! It takes 3 minutes to run on my tiny examples. Now run it on the real data.

And that's where hours come from!
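The workflow above can be sketched in Lean 4 (the names and sizes are hypothetical, chosen only to illustrate the mechanism): proving a property of real data by `decide` forces the kernel to evaluate the whole computation, so checking time scales with the data, not with the length of the proof text.

```lean
-- Hypothetical stand-in for data loaded from JSON.
def data : List Nat := List.range 500

-- Data processing written in the naive functional style,
-- with no attention paid to performance.
def allSmall (xs : List Nat) : Bool :=
  xs.all (fun x => decide (x < 1000))

-- `decide` makes the kernel reduce `allSmall data` to `true`.
-- On a toy list this is instant; on real-world data the same
-- one-line proof can take the kernel hours to check.
theorem data_ok : allSmall data = true := by decide
```

This is how a "two-page" file can be pathologically slow: the proof script is short, but the term the kernel has to evaluate is enormous.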


This is NOT what Lean 4 is designed to do. You seem to be using Lean 4 for what TLA+ is much better suited for.


If it is not suitable for programming, then it should not be branded as a programming language. Heck, its homepage says:

> Lean's metaprogramming capabilities enable users to extend the language with domain-specific notations and new proof automation techniques.
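A tiny example of the extensibility that quote refers to (the notation and theorem names here are made up for illustration): a one-line domain-specific notation, then a proof that uses it.

```lean
-- Hypothetical domain-specific notation for divisibility.
notation:50 a " divides " b => ∃ k, b = a * k

-- A proof using the new notation: 4 = 2 * 2.
theorem two_divides_four : 2 divides 4 := ⟨2, rfl⟩
```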


The problem is that if they don't produce any correct proofs, then that doesn't matter.


Why? Coding assistants tend to do even better in contexts where they have tools like type checkers and linters to verify their implementations. This area seems actually uniquely well suited to LLM usage.


When I asked experts on formal proofs a year ago, their intuition was that there aren't enough formal proofs out there for LLMs to be very good at the syntax.

It's, as far as I know, quite hard to teach an LLM things it doesn't know.


He is right, and it doesn't matter, because you can instantly tell whether the proof the LLM generates is valid or not.


I see people on Zulip using Copilot to write Lean proofs, and they have some success, but the quality is really bad right now, creating long, unmaintainable proofs. New users get stuck, thinking they're 90% of the way to the end, but really the whole thing probably should be scrapped and they should start over.

It's a bit frustrating because, before Copilot, new users would come with proofs and you could spend some time helping them write better proofs and they'd learn things and gain skills, but now it's not clear that this is time well spent on my part. Copilot is not going to learn from my feedback.



