
Proof assistant code is high-reliability; there is no room to fudge it. This is perhaps the one place where you can really see how bad LLMs are when you care about reliability.


Actually, proof assistants like Lean are the PERFECT use case for LLMs, because you can instantly determine whether the proof they generate is correct.


Proofs, sure, but not definitions. A human needs to be sure that the definitions align with what they expect. Unfortunately, humans generating correct definitions and LLMs generating correct proofs are not independent problems.
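A minimal Lean sketch of the definitions problem (the names here are hypothetical, made up for illustration): if the definition is wrong, a machine-checked proof about it guarantees nothing, and only a human reviewing the definition can catch that.

```lean
-- Hypothetical buggy definition: meant to capture "the list is sorted",
-- but the trivial predicate accepts every list.
def sorted (xs : List Nat) : Prop := True

-- Lean happily checks this proof, yet it tells us nothing useful,
-- because the definition above does not say what we intended.
theorem anything_is_sorted (xs : List Nat) : sorted xs := True.intro
```

The kernel verifies the proof; it cannot verify that `sorted` means what the human thinks it means.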


Actually, some proofs take longer for the computer to verify than for a human (even an unskilled typist) to type out. Several hours to evaluate a two-page document isn't unusual. (I prefer to write optimised proofs, but it can take me weeks to find the correct abstractions for that.)


Are you talking about Lean 4? Lean 4 is usually pretty fast at verifying proofs, and most proofs are created interactively, so they are checked as they are typed.

A "two-page document" (perhaps 200-300 lines of code) would typically check in a matter of seconds to, at most, a few minutes. If it took hours, it would mean the file contains a pathological case that a developer would be expected to identify and fix. It is absolutely not a normal or "not unusual" occurrence.


Steps to reproduce:

1. Load a complex data structure, e.g. from JSON. (I do not consider typing out the JSON to be part of the human's job.)

2. Process the data, using the magic of functional programming. (Performance? What's that? Performance is not a priority.)

3. Add some helpful lemmas.

4. Prove the data processing stage was correct… but proving this in general is haaaard, so just make automation that's capable of proving it for most cases… eventually.

5. Great! It takes 3 minutes to run on my tiny examples. Now run it on the real data.

And that's where hours come from!
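The workflow above can be sketched in Lean 4 (the names and sizes are hypothetical, chosen only to illustrate the mechanism): proving a property of real data by `decide` forces the kernel to evaluate the whole computation, so checking time scales with the data, not with the length of the proof text.

```lean
-- Hypothetical stand-in for data loaded from JSON.
def data : List Nat := List.range 500

-- Data processing written in the naive functional style,
-- with no attention paid to performance.
def allSmall (xs : List Nat) : Bool :=
  xs.all (fun x => decide (x < 1000))

-- `decide` makes the kernel reduce `allSmall data` to `true`.
-- On a toy list this is instant; on real-world data the same
-- one-line proof can take the kernel hours to check.
theorem data_ok : allSmall data = true := by decide
```

This is how a "two-page" file can be pathologically slow: the proof script is short, but the term the kernel has to evaluate is enormous.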


This is NOT what Lean 4 is designed to do. You seem to be using Lean 4 for what TLA+ is much better suited for.


If it is not suitable for programming, then it should not be branded as a programming language. Heck, its homepage says:

> Lean's metaprogramming capabilities enable users to extend the language with domain-specific notations and new proof automation techniques.
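A tiny example of the extensibility that quote refers to (the notation and theorem names here are made up for illustration): a one-line domain-specific notation, then a proof that uses it.

```lean
-- Hypothetical domain-specific notation for divisibility.
notation:50 a " divides " b => ∃ k, b = a * k

-- A proof using the new notation: 4 = 2 * 2.
theorem two_divides_four : 2 divides 4 := ⟨2, rfl⟩
```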


The problem is that if they don't produce any correct proofs, then that doesn't matter.


Why? Coding assistants tend to do even better in contexts where they have tools like type checkers and linters to verify their implementations. This area seems actually uniquely well suited to LLM usage.


When I asked experts on formal proofs a year ago, their intuition was that there aren't enough formal proofs out there for LLMs to be very good at the syntax.

It's, as far as I know, quite hard to teach an LLM things it doesn't know.


He is right, and it doesn't matter, because you can instantly tell whether the proof the LLM generates is valid or not.


I see people on Zulip using Copilot to write Lean proofs, and they have some success, but the quality is really bad right now, creating long, unmaintainable proofs. New users get stuck, thinking they're 90% of the way to the end, but really the whole thing probably should be scrapped and they should start over.

It's a bit frustrating because, before Copilot, new users would come with proofs and you could spend some time helping them write better proofs and they'd learn things and gain skills, but now it's not clear that this is time well spent on my part. Copilot is not going to learn from my feedback.



