Physically describable doesn't mean computable. You're making too many unjustified logical leaps which makes your argument circular & conflates "physical" w/ "computable".
We don't know any physical processes that allow to compute Turing-incomputable functions. An assumption that the brain uses such a process is not based on any positive knowledge.
Argument from ignorance is not as well known as other fallacies but very common in discussions about sentience, consciousness, and computability, i.e. not having evidence for something doesn't mean that thing is false. It is possible there are physical processes that are not computable & not being aware of such processes doesn't mean the alternative (everything is computable) is true.
So instead of making any unjustifiable claims like "everything physical is computable" you should instead just say "I believe consciousness is computable and that is why it is possible to instantiate it on any computational substrate, including strategy games like Age of Empires, properly arranged dominoes, and water wheels".
OK. I know that we haven't found any processes that violate the physical Church-Turing thesis, and I believe that we will not find them in the brain that got intelligent enough only after scaling to a hundred billion of neurons and hundreds of trillions of synapses. And, BTW, we don't have theories (except the controversial Orch OR) that allow such computations.
Not quite an answer to your question, but you might find this interesting. The Olmo Hybrid paper has some results on relative complexity of problems that can be solved by transformers and RNNs. They don't look at size, just solvability, and find that the sets of problems solvable by the two architectures are incomparable. They actually use these results to inform their architecture design, which includes both attention and state space layers. (Specifically, they choose gated delta-nets with negative eigenvalues, which they show have greater expressivity than those without.)
Proposition 16. UHATs have polynomially bounded expansion over LTL. In particular, given an
LTL formula φ, one can construct in polynomial time a UHAT T such that L(T) = L(φ).
i.e. the blowup is only exponential in one direction.
That says every LTL formula can be compiled into UHAT w/ polynomial overhead. It doesn't say that all languages recognizable w/ UHATs necessarily do not have succinct recgonizers in LTLs or RNNs.
Edit: Actually nevermind. If UHAT could be compiled into LTL w/ polynomial overhead then that would also work for the languages that have exponential overhead in LTL but since they don't there is a strict separation.
> "Moreover, it must be confessed that perception and that which depends upon it are inexplicable on mechanical grounds, that is to say, by means of figures and motions. And supposing there were a machine, so constructed as to think, feel, and have perception, it might be conceived as increased in size, while keeping the same proportions, so that one might go into it as into a mill. That being so, we should, on examining its interior, find only parts which work one upon another, and never anything by which to explain a perception."
- Monadology, Section 17
Conscious self-awareness is neither scale invariant nor independent of substrate. Computational theories will never account for it b/c computational abstractions are both scale invariant & substrate independent.
> However, our system relies on frontier LLMs, whose compute costs still remain nontrivial. Moreover, each textbook was formalized in isolation, without the careful planning needed to make it maximally compatible with existing mathlib infrastructure. Human involvement currently remains necessary to handle such organisational concerns — selecting and ordering books in a dependency-aware manner, bridging mismatched conventions across sources, etc
> if f and g are algebraically equivalent programs then FPSan(f) and FPSan(g) produce identical results when given identical inputs
Ok, but we want the other direction. If FPSan(f) & FPSan(g) produce identical results for identical inputs then we want to conclude that f & g are also equivalent. If g is an "optimized" version of f then this would allow checking equivalence but that's not what they are proving or maybe they are but it looks like the converse is contingent on an unproven conjecture.
Right. Put differently, we have that FPSan() is a well-defined function, so [ f = g ] => [ FPSan(f) = FPSan(g)], but we need to show that FPSan() is injective, i.e., [ FPSan(f) = FPSan(g) ] => [ f = g ]. I confess I haven't looked very closely but it should not be so hard. We can prove injectibility in the alternative by analyzing ker(FPSan()), the set of all inputs in the domain of functions mapping to the identity element in the co-domain. If the kernel is trivial and only consists of the identity map, the injectibility is established, but I am not immediately seeing the proof. Fun!
It seems to me that one could maliciously construct a failure. For example:
phi(1.0) * 2^32 = 0
So:
1.0 + … + 1.0 (2^32 terms added)
Will turn into zero in the embedding. (I bet other, dramatically smaller zeros could be found by other operations. phi^-1(2^16) could be a good starting point, but you don’t necessarily need a shorter one — see below.)
Now you find a floating point expression tree that has only 1.0, 0, and -1.0 at the leaves and generates this spurious zero. (For example, 1.0 + 1.0, squared five times.)
Now you maliciously transform a program by adding one of these spurious zero expressions somewhere. Am I missing something?
More generally, what is the multiply-xorshift-multiply sequence accomplishing? I feel like it might make non-malicious collisions unlikely, but I feel like it would be mildly surprising if it does much in the setting of trying to prove something without any probability of error. And it seems a bit unfortunate that no choice of the scrambling constants has any effect on the expressions that start with 1.0 and use only multiplication and addition to get to zero.
Also, how does floating point infinity fit in? It seems like it doesn’t act very infinite in the integer embedding.
(I could be totally wrong here. I only read the definitions twice, and I didn’t try to write anything down.)
Yes, this is correct, and indeed injectivity is impossible from a counting argument: for example, there are only (2^32)^(2^32) unary functions in this finite ring, but infinitely many polynomials with integer coefficients, so we cannot hope to distinguish them all.
> More generally, what is the multiply-xorshift-multiply sequence accomplishing? I feel like it might make non-malicious collisions unlikely
Exactly this. One issue with the bitcast embedding (other than not mapping {-1.0, 0.0, 1.0} to {-1, 0, 1}) is that 'round numbers' such as 42.0 have floating-point representations (such as 0x42280000) that end in very many trailing zero bits. This is particularly problematic because multiplying two such numbers gives 0.
In general, FPSan isn't designed for an adversarial regime. There are ways that it could be hardened in this direction: for example, instead of using the ring of integers modulo 2^32, choose a prime-power modulus using a large prime derived from a cryptographic hash of the source code of the two programs that you're trying to compare. But I'm not confident that even that guarantees adversarial robustness: for instance, there may be some clever way to efficiently implement a nonzero polynomial that vanishes at every small integer (say, every integer with absolute value < 10^100) and that would break this hardened variant.
The motivation behind FPSan is to protect against accidental implementation bugs: for example, someone writes a complicated fused kernel that improves the runtime of model inference, but makes an honest mistake whilst doing so. Bitwise floating-point matching is too strict; approximate matching is too messy (what should the error tolerances be?); symbolic methods are too unscalable.
> But I'm not confident that even that guarantees adversarial robustness: for instance, there may be some clever way to efficiently implement a nonzero polynomial that vanishes at every small integer (say, every integer with absolute value < 10^100) and that would break this hardened variant.
That’s a lot of zeros.
I wonder if there’s a construction with the property that, for all pairs of circuits with size below some threshold (with size appropriately defined), then, if the circuits are not algebraically equal, then, under randomization of whatever parameters are randomized, their embedded versions are, with high probability, not equal and differ in at least a constant fraction of inputs.
At the very least, I imagine that most ML kernels have polynomial-ish order far below 100 in a sense where e^x is defined to have low polynomial-ish order. sin and cos might be worse due to actually having infinite zeros, but maybe pi isn’t constructible and no one can find those zeros.
I don’t know why they are on there, but YC startups list their batch and year in parentheses on job posts, e.g. (W25). Example: https://www.workatastartup.com/jobs/88812
The Plaid listing you linked doesn’t have a batch by their name.
Everyone who said they would leave w/ him apparently trust him & that was enough people at the company to make the board members who voted to oust him look like vindictive amateurs.
Trust doesn't mean he's a good guy, trust means they were willing to go along w/ whatever decision he was going to make. Meaning, they trusted him to take what they considered was the right course of action for them & whatever current or future company he decided to join.
It's not one-shot. Other people had attempted the same problem w/ the same AI & failed. You're confused about terms so you redefine them to make your version of the fairy tale real.
We already know that same problem has been examined by many credible mathematicians already and couldn't be solved by any of them yet.
Why are we expecting AGI to one shot it? Can't we have an AGI that can fails occasionally to solve some math problem? Is the expectation of AGI to be all knowing?
By the way I agree that AGI is not around the corner or I am not arguing any of the llm s are "thinking machines". It's just I agree goal post or posts needs to be set well.
People want to believe in magic so they will find excuses to do so. Computers have been proving theorems for a long time now but Isabelle/HOL didn't have the marketing budget of OpenAI so people didn't care. Now that Sam Altman is doing the marketing people all of a sudden care about proving theorems.
Isabelle/HOL (a specialized software to do math proofs) doing proofs is not the analogue to LLMs (with the common accepted degeratory description: automated plagiarism machine) being capable of doing proofs. It's not the marketing, it's what the intention and the capability matrix is coming up to. I would be excited the same when Isabelle/HOL writes poetry.
Chemistry is magic to uninitiated. Perhaps LLMs are to you because you are not initiated yet? I never said LLMs are AGI or will ever be AGI. I also never suggested LLMs are perfect and can prove math problems. But having incidents suggesting there are instances that does excites me. Because it was never in my expectation levels.
You were misrepresenting what actually happened b/c you want to believe in magic. I'm not calling it magic, I'm saying your interpretation of events is magical b/c you don't actually understand how computers work. There is nothing magical about theorem proving, Isabelle/HOL has been doing it for decades.
Isabelle/HOL haven't been solving open problems, as far as I'm aware. They've been used for making fully-formal proofs of problems that were already considered proved to a satisfactory level by the mathematical community. I believe mathematicians generally consider proving something to the mathematical community the "hard part", while making it fully formal is just a kind of tedious bookkeeping thing.
reply