
My understanding is GPT-6 works via synaptic-space reasoning... which I find terrifying. If that's true, I hope OpenAI does some safety testing on it beyond what they normally do.

From the recent New Yorker piece on Sam:

“My vibes don’t match a lot of the traditional A.I.-safety stuff,” Altman said. He insisted that he continued to prioritize these matters, but when pressed for specifics he was vague: “We still will run safety projects, or at least safety-adjacent projects.” When we asked to interview researchers at the company who were working on existential safety—the kinds of issues that could mean, as Altman once put it, “lights-out for all of us”—an OpenAI representative seemed confused. “What do you mean by ‘existential safety’?” he replied. “That’s not, like, a thing.”


Amusing! Even if they believe that, they should know the company communicated the opposite earlier.

No chance an OpenAI spokesperson doesn't know what existential safety is.

I did not read the response as...

>Please provide the definition of Existential Safety.

I read:

>Are you mentally stable? Our product would never hurt humanity--how could any language model?


The absolute gall of this guy to laugh off a question about x-risks. Meanwhile, also Sam Altman, in 2015: "Development of superhuman machine intelligence is probably the greatest threat to the continued existence of humanity. There are other threats that I think are more certain to happen (for example, an engineered virus with a long incubation period and a high mortality rate) but are unlikely to destroy every human in the universe in the way that SMI could. Also, most of these other big threats are already widely feared." [1]

[1] https://blog.samaltman.com/machine-intelligence-part-1


Why are these people always like this?

Likely an improvement on:

> We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space. Our model works by iterating a recurrent block, thereby unrolling to arbitrary depth at test-time. This stands in contrast to mainstream reasoning models that scale up compute by producing more tokens. Unlike approaches based on chain-of-thought, our approach does not require any specialized training data, can work with small context windows, and can capture types of reasoning that are not easily represented in words. We scale a proof-of-concept model to 3.5 billion parameters and 800 billion tokens. We show that the resulting model can improve its performance on reasoning benchmarks, sometimes dramatically, up to a computation load equivalent to 50 billion parameters.

<https://arxiv.org/abs/2502.05171>
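For intuition, here's a toy sketch of the recurrent-depth idea in PyTorch (my own illustration of the abstract, not the paper's actual architecture): one shared block gets iterated a variable number of times in latent space, so "thinking harder" means more iterations rather than more output tokens.

    import torch
    import torch.nn as nn

    class RecurrentDepthLM(nn.Module):
        # Toy model: a single shared block iterated in latent space.
        def __init__(self, vocab_size=32000, d_model=512):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            self.block = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            self.unembed = nn.Linear(d_model, vocab_size)

        def forward(self, tokens, num_iters=4):
            h = self.embed(tokens)
            for _ in range(num_iters):  # depth chosen at test time
                h = self.block(h)
            return self.unembed(h)

    model = RecurrentDepthLM()
    tokens = torch.randint(0, 32000, (1, 16))
    fast = model(tokens, num_iters=2)   # cheap answer
    slow = model(tokens, num_iters=32)  # scaled-up test-time compute

The actual paper is more involved (it re-injects the embedded input at each step, among other things), but this is the shape of it.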


Oh, you mean literally the thing in AI 2027 that gets everyone killed? Wonderful.

AI 2027 is not a real thing which happened. At best, it is informed speculation.

Funny: if you open their website and go to April 2026, you literally see this: $26B revenue (Anthropic beat $30B) + pro human hacking (mythos?).

I don't think of these as predictions, but they've made great calls so far.


I agree that they called many things remarkably well! That doesn't change the fact that AI 2027 is not a thing which happened, so it isn't valid to point out "this killed us in AI 2027." There are many reasons to want to preserve CoT monitorability. Instead of AI 2027, I'd point to https://arxiv.org/html/2507.11473.

That sounds really interesting. Do you have some hints where to read more?

Oh, of course they will /s

I had no idea one could buy a Blackhawk for $1.5M

It's the fuel cost that gets you...

Someone's not watching HeavyDSparks ;)

https://www.youtube.com/watch?v=m3P3FWkBFU4


There are Chinooks with no bids as of yet... as well as a Bombardier Challenger.

I gave the same prompt (a small Rust project that's not easy, but not overly sophisticated) to both Gemma 4 26B and Qwen 3.5 27B via OpenCode. Qwen 3.5 ran for a bit over an hour before I killed it; Gemma 4 ran for about 20 minutes before it gave up. Lots of failed tool calls.

I asked Codex to write a summary about both code bases.

"Dev 1" Qwen 3.5

"Dev 2" Gemma 4

Dev 1 is the stronger engineer overall. They showed better architectural judgment, stronger completeness, and better maintainability instincts. The weakness is execution rigor: they built more, but didn’t verify enough, so important parts don’t actually hold up cleanly.

Dev 2 looks more like an early-stage prototyper. The strength is speed to a rough first pass, but the implementation is much less complete, less polished, and less dependable. The main weakness is lack of finish and technical rigor.

If I were choosing between them as developers, I’d take Dev 1 without much hesitation.

Looking at the code myself, I'd agree with Codex.


There are issues with the chat template right now[0], so tool calling does not work reliably[1].

Every time people try to rush to judge open models on launch day... it never goes well. There are ~always bugs on launch day.

[0]: https://github.com/ggml-org/llama.cpp/pull/21326

[1]: https://github.com/ggml-org/llama.cpp/issues/21316


What causes these? Given how simple the LLM interface is (just completion), why don't teams make a simple, standardized template available with their model release so the inference engine can just read it and work properly? Can someone explain the difficulty with that?

The model does have the format specified, but there is no _one_ standard. For this model it's defined in the tokenizer_config.json [0]. As for llama.cpp, they seem to be using a more type-safe approach to reading the arguments.

[0] https://huggingface.co/google/gemma-4-31B-it/blob/main/token...
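For what it's worth, here's roughly how that template is consumed on the Python side (a sketch using Hugging Face transformers, with the repo id from the link above):

    from transformers import AutoTokenizer

    # The Jinja chat_template ships in tokenizer_config.json; the tokenizer
    # renders messages into the exact prompt string the model was trained on.
    tok = AutoTokenizer.from_pretrained("google/gemma-4-31B-it")
    messages = [{"role": "user", "content": "What's the weather in Paris?"}]
    prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    print(prompt)

llama.cpp has to re-render that Jinja with its own engine rather than running it as-is, which is where launch-day mismatches tend to creep in.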


Hm, but surely there will be converters for such simple formats? I'm confused as to how there can be calling bugs when the model already includes the template.

Was just merged.

It was just an example of a bug, not that it was the only bug. I’ve personally reported at least one other for Gemma 4 on llama.cpp already.

In a few days, I imagine that Gemma 4 support should be in better shape.


Qwen 3.5 27B is dense, so (I think) should be compared to Gemma 4 31B.

Or Gemma-4 26B(-A4B) should be compared to Qwen 3.5 35B(-A3B)


Exactly: compare MoE with MoE and dense with dense; otherwise it's apples and oranges.

It's coding to coding. I couldn't care less how the model is architected; I only care how it performs in a real-world scenario.

If you don't care about how it's architected, why do you care about size? Compare it to Q3.5 397B-A17B.

Just as smaller models are a speed/cost optimization, so is MoE.

G4 26B-A4B goes 150 t/s on a 4090/5090, 80 t/s on an M5 Max. Q3.5 35B-A3B is comparably fast. They are flash-lite/nano-class models.

G4 31B, despite a small increase in total parameter count, is over 5 times slower. Q3.5 27B is comparably slow. They approximate flash/mini-class models (I believe the sizes of proprietary models in this class are closer to Q3.5 122B-A10B or Llama 4 Scout 109B-A17B).


The implication is that there is (should be) a major speed difference - naively you'd expect the MoE to be 10x faster and cheaper, which can be pretty relevant on real world tasks.
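Back-of-envelope: single-stream decode is roughly memory-bandwidth bound, so t/s scales with the bytes of active weights streamed per token. A sketch with illustrative numbers (not benchmarks):

    # Rough decode-speed estimate, assuming generation is memory-bandwidth
    # bound and only active parameters are read per token.
    def tokens_per_sec(active_params_billion, bytes_per_param=2.0, bandwidth_gb_s=1000.0):
        bytes_per_token = active_params_billion * 1e9 * bytes_per_param
        return bandwidth_gb_s * 1e9 / bytes_per_token

    print(f"{tokens_per_sec(4):.0f} t/s")   # 4B active (MoE): ~125 t/s
    print(f"{tokens_per_sec(31):.0f} t/s")  # 31B dense: ~16 t/s

That ~8x gap is the naive ceiling; real-world overheads shrink it toward the ~5x observed above.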

The models are not technically comparable: the Qwen is dense, the Gemma is MoE. The ~33B models are the other way around!

Try using Grok 4.1 reasoning. It's crazy cheap, and really it's not that bad.

Sure, it might try to subtly steer you towards fascism, but other than that, it's great.

Neurons that fire together, wire together. Your brain optimizes for your environment over time. As we get older, our brains run in a more optimized way than when we're younger. That's why older hunters are more effective than younger hunters: they're finely tuned for their environment. It's an evolutionary advantage. But it also means they're not firing in "novel" ways as much as the "kids". "Kids" are more creative, I think, because their brains are still adapting and exploring novelty; their neural connections aren't as deeply tied together yet.

This is also maybe one of the biggest pitfalls as our society gets "older", with more old people and fewer "kids". We need kids to force us to do things differently.
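Tangent, but "fire together, wire together" has a standard formalization, Hebb's rule. A toy sketch (my own illustration):

    import numpy as np

    # Hebb's rule: a connection strengthens in proportion to how often
    # its pre- and post-synaptic neurons are active at the same time.
    def hebbian_update(w, pre, post, lr=0.01):
        return w + lr * np.outer(post, pre)

    w = np.zeros((3, 3))
    pre = np.array([1.0, 0.0, 1.0])   # input-neuron activity
    post = np.array([1.0, 1.0, 0.0])  # output-neuron activity
    w = hebbian_update(w, pre, post)  # correlated pairs get wired together

Run it enough times on the same patterns and the weights lock in, which is the "optimized but less novel" effect in a nutshell.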


Oh, I've been looking for a project for my 11-year-old... he's a very project-oriented learner, a style schools don't seem to support anymore.


What country are you in?


Speak for yourself, I have never thrown away code at this rate in my entire career. I couldn't keep up this pace without AI codegen.


Did you read the article? I don’t think that refutes anything the author said even a little bit.


XD


I bet claude was hyping this guy up as he was building it. "Absolutely, a rust compiler written in PHP is a great idea!"


Every compiler in any language for any language has at the very least educational value.

On the other hand, demeaning comments without any traces of constructive criticism don't have any value.


Does it matter who the sycophant was or just that there was a sycophant?

My partner does that as well as LLMs at this point; "Sure honey, I remember you've talked a lot about Rust and about Clojure in the past, and you seem excited about this Clojure-To-Rust transpiler you're building, it sounds like a great idea!", is that bad too?


There is no comment on whether LLMs/agents have been used. I feel like projects should explicitly say if they were _or_ were not used. There is no license file, and no copyright header either. This feels like "fauxpen-source": imagine getting LEX+YACC to generate a parser, and presenting the generated C code as "open-source".

This is just another way to throw binaries over the wire, but much worse. This has the _worst_ qualities of the GPL _and_ pseudo-free-software licenses (e.g. the EULAs used by Mongo and others). It has all the deceptive qualities of the latter (e.g. we are open but not really -- similar to Sun Microsystems [love this company btw, in spite of its blunders] trying to convince people that NeWS is "free" but that the cost of media [the CD-ROM] is $900), with the viral qualities of the former (e.g. the fruit-of-the-poisonous-tree problem -- if you use this in your code, then not only can you not copyright the code, but you might actually be liable for infringement of copyright and/or patents).

I would appreciate it if the contributor, mrconter11, would treat HN as an internet space filled with intelligent, thinking people, and not a bunch of shallow and mindless rubes. (Please (1) explicitly disclose both the use and the absence of use of LLMs -- people are more likely to use your software this way, and it preserves the integrity of the open-source ecosystem -- and (2) share your prompts and sessions.)

So passes the glory of open source.


According to his README, he seems to have built a 3D engine completely from scratch 8 years ago without using any library:

https://github.com/mrconter1/IntuitiveEngine

> A simple 3D engine made only with 2D drawLine functions.
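That constraint is less magic than it sounds: the core trick is perspective projection, after which 2D lines are enough for wireframes. A minimal sketch of the idea (my own illustration, not his engine's code):

    # Pinhole projection: divide by depth, scale by field of view,
    # center on screen. Requires z > 0 (in front of the camera).
    def project(x, y, z, fov=256.0, cx=320.0, cy=240.0):
        return cx + fov * x / z, cy - fov * y / z

    def draw_edge(draw_line, p0, p1):
        # draw_line is whatever 2D line primitive the host canvas provides.
        x0, y0 = project(*p0)
        x1, y1 = project(*p1)
        draw_line(x0, y0, x1, y1)

    # e.g. one edge of a cube two units in front of the camera:
    draw_edge(lambda *a: print(a), (-1, -1, 2), (1, -1, 2))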


That is (slightly) reassuring (but the rest of his portfolio does not inspire confidence). Nevertheless, we should be required to disclose whether the code has been (legally) tainted or not. This will help people make informed decisions, and will also help people replace the code if legal consequences appear on the horizon, or if they are ready to move from prototype to production.


Slightly?


I believed that too until I watched the Karen Read trials. The judge had a bias, and it was clear Karen got justice despite the judge trying to put her thumb on the scale.


Yeah, that sounds great until it's running as an autonomous moltbot in a distributed network, semi-offline, with access to your entire digital life, and China sneaks in some hidden training so these agents turn into an army of sleeper agents.


Lol, wat? I mean, you certainly have enough control when self-hosting the model to not let it join some moltbot network... or what exactly are you saying would happen?


We just saw last week that people are setting up moltbots with virtually no knowledge of what they do and don't have access to. The scenario I'm afraid of is China realizing the potential of this. They can add training to the models commonly used for assistants. The bots act normal, are helpful, everything you'd want a bot to do. But maybe once in a while one checks Moltbook or some other endpoint China controls for a trigger word. When it sees that, it kicks into a completely different mode: maybe it writes a script to DDoS targets of interest, maybe it mines your email for useful information, maybe the user has credentials to some piece that is a critical component of an important supply chain. This is not a wild scenario; no new sci-fi technology would need to be invented. Everything needed to do it is available today, and people are configuring and using it like this today. The part I fear is that if it's running locally, you can't just shut off API access and kill the threat. It's running on its own server, with its own model. You have to cut off each node.

I'm a big fan of AI; I use local models A LOT. I do think we have to take threats like this seriously, and I don't think it's a wild sci-fi idea. Since WW2, civilians have been as much an equal-opportunity target as soldiers: war is about logistics, and civilians supply the military.


Fair point, but I would be more worried about the US government doing this kind of thing to act against US citizens than the Chinese government doing it.

I think we're in a brief period of relative freedom where deep engineering topics can be discussed with AI agents even though they have potential uses in weapons systems. Imagine asking ChatGPT how to build a fertilizer bomb, but with the same censorship applied to anything related to computer vision, lasers, drone coordination, etc.


Exactly, we all need to use CIA/NSA-approved models to stay safe.

Very smart idea!


Sleeper agents to do what? Let's see how far you can take the absurd threat-porn fantasy. I hope it was hyperbole.


There was research last year [0] that found significant security issues with the Chinese-made Unitree robots, which apparently come pre-configured in a way that makes it easy to exfiltrate data via Wi-Fi or BLE. I know it's not the same situation, but at this stage, I wouldn't blame anyone for "absurd threat porn fantasy": the threats are real, and present-day agentic AI is getting really good at autonomously exploiting vulnerabilities, whether it's an external attacker using it, or whether "the call is coming from inside the house".

[0] https://spectrum.ieee.org/unitree-robot-exploit


I could say that about Cisco and I would not be wrong.


Isn't it a bit of a leap to assume it was intended as an exploitable vulnerability?


I replied more politely to the commenter who doubted me.


What if the US government does instead?

I don't consider them more trustworthy at this point.

