Proof assistant code is high-reliability; there is no room to fudge it. This is perhaps the one place where you can really see how bad LLMs are when reliability matters.
Proofs, sure, but not definitions. A human needs to be sure that the definitions align with what they expect. Unfortunately, humans generating correct definitions and LLMs generating correct proofs are not independent problems.
Actually, some proofs take longer for the computer to verify than for a human (even an unskilled typist) to type out. Several hours to evaluate a two-page document isn't unusual. (I prefer to write optimised proofs, but it can take me weeks to find the correct abstractions for that.)
Are you talking about Lean 4? Lean 4 is usually pretty fast at verifying proofs, and most proofs are written interactively, so they are checked as they are typed.
A "two-page document" (perhaps 200-300 lines of code) would typically check in a matter of seconds to, at most, a few minutes. If it took hours, it would mean the file contains a pathological case that a developer would be expected to identify and fix. It is absolutely not a normal or "not unusual" occurrence.
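For a sense of the timescales being claimed here, a routine Lean 4 lemma like the following (a generic example, not one from this thread) is elaborated essentially instantly as you type it:

```lean
-- Checked interactively in milliseconds; the goal state updates as you type.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  omega  -- decision procedure for linear arithmetic over Nat/Int
```

Whole files of such lemmas check in seconds, which is why multi-hour check times read as pathological rather than typical.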
1. Load a complex data structure, e.g. from JSON. (I do not consider typing out the JSON to be part of the human's job.)
2. Process the data, using the magic of functional programming. (Performance? What's that? Performance is not a priority.)
3. Add some helpful lemmas.
4. Prove the data processing stage was correct… but proving this in general is haaaard, so just make automation that's capable of proving it for most cases… eventually.
5. Great! It takes 3 minutes to run on my tiny examples. Now run it on the real data.
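A minimal Lean 4 sketch of the workflow above (all names here are hypothetical stand-ins; the JSON loading and the real automation are elided):

```lean
-- Step 1: a stand-in for the structure loaded from JSON.
structure Record where
  id    : Nat
  value : Nat

-- Step 2: process the data functionally; performance is not a concern.
def process (rs : List Record) : Nat :=
  rs.foldl (fun acc r => acc + r.value) 0

-- Steps 4-5: on a tiny example, proof by computation is instant...
example : process [⟨1, 2⟩, ⟨2, 3⟩] = 5 := rfl
-- ...but this `rfl`/`decide` style of automation makes the kernel
-- evaluate `process` itself, which can blow up from minutes to hours
-- once it runs on the real, large dataset.
```

This is one plausible way a short file ends up taking hours to check: the proof effort was pushed into kernel computation whose cost scales with the data, not with the length of the file.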
Why? Coding assistants tend to do even better in contexts where they have tools like type checkers and linters to verify their implementations. This area seems actually uniquely well suited to LLM usage.
When I asked experts on formal proofs a year ago, their intuition was that there aren't enough formal proofs out there for LLMs to be very good at the syntax.
As far as I know, it's quite hard to teach an LLM things it doesn't already know.
I see people on Zulip using Copilot to write Lean proofs, and they have some success, but the quality is really bad right now, creating long, unmaintainable proofs. New users get stuck, thinking they're 90% of the way to the end, but really the whole thing probably should be scrapped and they should start over.
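To illustrate the maintainability gap with a contrived example (not one taken from Zulip): generated proofs often spell out rewrites step by step where a single standard tactic would do.

```lean
-- The verbose, brittle style assistants often produce:
example (n : Nat) : n + 0 + 0 = n := by
  rw [Nat.add_zero]
  rw [Nat.add_zero]

-- The version a maintainer would prefer:
example (n : Nat) : n + 0 + 0 = n := by simp
```

Multiplied across a two-page file, the first style becomes unreadable, and any change to the definitions tends to break it in ways a new user can't diagnose.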
It's a bit frustrating because, before Copilot, new users would come with proofs and you could spend some time helping them write better proofs and they'd learn things and gain skills, but now it's not clear that this is time well spent on my part. Copilot is not going to learn from my feedback.