So, an LLM, trained extensively on StackOverflow and other data (possibly the plethora of LC solutions out there), is fed a bunch of LC questions and spits out the correct solutions? In other news, water is blue.
It is one thing to train an AI on megatons of data, for questions which have solutions. The day ChatGPT can build a highly scalable system from scratch, or an ultra-low latency trading system that beats the competition, or find bugs in the Linux kernel and solve them; then I will worry.
Till then, these headlines are advertising for OpenAI, aimed at people who don't understand software or systems, or are trash engineers. The rest of us aren't going to care that much.
If it helps, this likely is coming. I think we have a tendency to mentally move the goalposts when it comes to this kind of thing as a self-defense mechanism. Years ago this would have been a similar level of impossibility.
Since a codebase like that is essentially a kind of directed graph, augmentations to the network's processing that allow for simultaneously parsing and generating this kind of code may not be as far off as you think.
I say this as an ML researcher coming up on six years of experience on the heavily technical side of the field. Strong skepticism is an easy way to project confidence and the appearance of knowledge, but it also carries the downfall we've seen in certain past technological revolutions -- and the threat is very much real here (in contrast to the group that believes you can get AGI by simply scaling LLMs, which I think is very silly indeed).
Thank you for your comment and the discussion it generated; replying to it was fun.
I've worked in ML for a while (on the MLOps side of things) and have been in the industry for a bit, and one thing I think is extremely common is for ML researchers to grossly underestimate the amount of work needed to make improvements. We've been a year away from full self-driving cars for the last six years, and it seems like people are getting more cautious in their timing around that instead of more optimistic. AI-driven robotic manufacturing was supposedly going to supplant human labor and speed up manufacturing in every segment from product creation to warehousing, but Amazon warehouses are still full of people, not robots.
What I've seen again and again from people in the field is a gross underestimation of the long tail on these problems. They see the rapid results on the easier end and think it will translate into continued progress, but the reality is that every order-of-magnitude improvement takes the same amount of effort or more.
On top of that there is a massive amount of subsidies that go into training these models. Companies are throwing millions of dollars into training individual models. The cost here seems to be going up, not down, as these improvements are made.
I also think, to be honest, that machine learning researchers tend to simplify problems more than is reasonable. This conversation started with "highly scalable system from scratch, or an ultra-low latency trading system that beats the competition" and turned into "the parsing of and generation of this kind of code"- which is in many ways a much simpler problem than what op proposed. I've seen this in radiology, robotics, and self driving as well.
Kind of a tangent, but one of the things I do love about the ML industry is the companies who recognize what I mentioned above and work around it. The companies that are going to do the best, in my extremely biased opinion, are the ones that use AI to augment experts rather than try to replace them. A lot of the coding AI companies are doing this, there are AI driving companies that focus on safety features rather than driver replacement, and a company I used to work for (Rad AI) took that philosophy to radiology. Keeping experts in the loop means that the long tail isn't as important and you can stop before perfection, while replacing experts altogether is going to have a much higher bar and cost.
This is a bit like seeing Steve Mann's wearable computers over the years ( https://cdn.betakit.com/wp-content/uploads/2013/08/Wearcompe... ) and then today anyone with a smartphone and smart watch has more computing power and more features than most of his gear ever had, apart from the head mounted screen. More processing power, more memory, more storage, more face recognition, more motion sensing, more GPS, longer runtime on battery, more bandwidth and connectivity to e.g. mapping, more assistants like Google Now and Siri.
And we still aren't at a level where you can be doing a physical task like replacing a laptop screen and have your device record what you're doing, with voice prompts for when you complete different stages; have it add markers to the recording; track objects in the scene and answer questions like 'where did that longer screw go?' or 'where did this part come from?' and jump to the point in the video where you took that part out. Nor replay the video backwards as an aide-mémoire for reassembly. Or do that outside for something like garage or car work, or have it control and direct lighting on some kind of robot arm to help you see, or have it listen to the sound of your bike gears rattling as you tune them and tell you, or show you on a graph, when it identifies the least rattle.
Anything a human assistant could easily do, we're still at the level of 'set a reminder' or 'add to calendar' rather than 'help me through this unfamiliar task'.
Wow - Steve Mann - haven't checked what he's doing in ages - real blast from the past :-) I was really disappointed the AR/VR company he was with went under - I had really high hopes for it.
RE: changing your laptop screen. My buddy wants 'AR for Electronics' that can zoom in on components like a magnifying glass (he wants it head-mounted), identify components by marking/color/etc., and call up schematics on demand. So far, nothing seems able to do even that basic level of work.
It really depends on what you're talking about. Individual components can often be automated fairly successfully, but the actual assembly of the components is much harder. Even in areas of manufacturing where it's automated you have to do massive amounts of work to get it to that point, and any changes can result in major downtime or retooling.
AI companies such as Vicarious have been promising AI that makes this easier. Their idea was that generic robots with the right grips and sensors can be configured to work on a variety of assembly lines. This way a factory can be retooled between jobs quicker and with less cost.
Look up lights-out manufacturing. There are factories that often run whole days in the dark, because there's no point turning on the lights if no one is around.
Not really. Although running CNC milling machines and lathes unattended at night is reasonably common. Day shift sets them up, and they cut metal all night.
Fanuc, the robot manufacturer, famously does run a lights-out factory, and has since 2001. It was the dream of Fanuc's founder. Baosteel now has a lights-out steel coiling facility. Both of these are more PR than cost effective.
There are many factories where there are very, very few people for large rooms full of machines, though.
You have just described Pareto's principle[0], the 80/20 rule. It takes 20% of the effort to get to 80%, but it then takes 80% of the effort to finish the final 20%.
Ah, the good ol "A(G)I will arrive in 10 years!" --For the past 50+ years, basically.
It's a cautionary tale for people working in ML not to be too optimistic about "the future", but in my opinion being cautiously optimistic (not about AGI, though) isn't harmful by itself, and I stand by that. Well, at least until we hit the next wall and plunge everyone into another AI winter (fourth? fifth?) again.
As a plus, we do actually see some good progress that has benefited the world, like in biotech. Even though we are still mostly throwing random stuff at ML to see if it works. Time will tell, I guess.
Kurzweil gets a lot of flak for this sort of thing; he's generally presented as the ridiculous hype man for AI. And yet he bet in 2002 that an AI would pass the Turing test by 2029. (This is actually a more conservative prediction than "we will have AGI by 2029.") And looking at GPT-3, it seems like he is probably going to win that bet.
I think the big revolution of the last few years has been to recognize that we'll likely get robots that can pass the turing test well before we get full self driving vehicles that can run anywhere there are basically ordinary paved roads.
I think even three years ago, most people would have thought the reverse.
So Kurzweil was imagining the turing test as the capstone to a decade of more and more capable ai products, not as "kind of early interesting success that may (or may not) presage really useful AI."
("The Turing test" is a pretty hazy target. I have no doubt that a chatgpt that was not trained to loudly announce that it was an AI could convince lots of people that it's a real human, right now. I think it's also the case that people with some experience with it could pretty quickly find ways to tell what it is.)
The Turing test has always been hazy - I don't think it's something we'll consider "passed" until at least a clear majority consider it passed (if not substantially further).
Otherwise you risk claiming ELIZA passed it, because a couple people thought so. Or that one Google employee this time.
Yes, that's what I was trying to say in the last paragraph. The Turing Test was an interesting thought experiment, not, like, an actual test. It's never been very clear how to operationalize it, and it's clear that Turing wasn't imagining how easily you can actively fool people. He was more making a point that we don't have an internal definition of intelligence -- it's not like multiplication where you can examine the underlying process and say, "Well, did it do this correctly?" You can only look at the results.
Good point, I appreciate this comment; thanks for adding it. It is interesting how it very much appears that he will be correct, but in a different way than most of us would reasonably have guessed at the time.
Working out the engineering challenges will probably take an extra decade, but I wouldn't listen to the ML researchers' opinions on this issue; the evidence that they are in the driver's seat is shaky. We're still seeing exponential gains in processing power, and we're closing in on having, in silicon, processing power within an order of magnitude of a human brain. There is a pretty decent chance that there is some magic threshold around there where all these tasks become easy with current algorithms.
I can understand that. I think that might be somewhat of a quick generalization. There are tendencies of people in the field to sometimes jump to rapid conclusions, but that is not researchers at all or in this case, me. I tend to be incredibly conservative, for example, and I have tangled with a number of "real world" systems enough to know some of the intricacies (though not at the edge).
If I were to make a point as to why your notes on self-driving cars and in-warehouse robots may not transfer to the case of software development, it's that they are fundamentally two very different problems with very different issues attached to them. It unfortunately is very much apples to oranges. They are both NP-hard but very different kinds of NP-hard.
A software program is a closed-loop target, though it is NP-hard. But we're optimizing for a different kind of metric here that is well-defined. Any kind of self-directed reinforcement-or-otherwise autoregressive-in-the-world algorithm is going to have an extraordinarily long tail of edge cases.
What I was talking about when I mentioned the geometry of the problem is not the parsing of the code, but the geometry of a near-optimal solution. Certainly, scale will be expensive, but Sutton is our friend here. That's why it's more "trivial" than problems that require humans in the loop -- you don't need humans to parse, structure, generate, and evaluate the data flow of a software codebase, though admittedly, if methods like RLHF become popular as you noted, the endpoints that generate code under those geometric constraints will become extremely expensive.
I think the geometric problem is very hard but the hurdle of scaled language models is more technically impressive to me.
What's nice is that, unlike needing to generate a long, 1D story, there's more robustness here: a huge field of possibility that's had years of work on the software side of things. It's not that it's going to be easy, but we've all grown as we've seen how hard self-driving cars are, and this just isn't that kind of scenario, since all consequences of the 'world' in the repo-generation case are (for the most part) self-contained.
I hope that helps elucidate the problems a bit. For me, optimism is much rarer, and comes generally only when I feel I have a solid enough grasp of the fundamentals (i.e. I roughly know deliverability and have decent known error bounds on the sub-problems).
That said, I heartily agree with you that when all else fails -- assistive is good. What I see a "complete solution" doing well is creating a Kolmogorov-minimal, complete starting point and things evolving from there. Whether that works or not remains to be seen.
I don't think ChatGPT or its successors will be able to do large-scale software development, defined as 'translating complex business requirements into code', but the actual act of programming will become more one of using ML tools to create functions, and writing code to link them together with business logic. It'll still be programming, but it will just start at a higher level, and a single programmer will be vastly more productive.
Which, of course, is what we've always done; modern programming, with its full-featured IDEs, high level languages, and feature-rich third-party libraries is mostly about gluing together things that already exist. We've already abstracted away 99% of programming over the last 40 years or so, allowing a single programmer today to build something in a weekend that would have taken a building full of programmers years to build in the 1980s. The difference is, of course, this is going to happen fairly quickly and bring about an upheaval in the software industry to the detriment of a lot of people.
And of course, this doesn't include the possibility of AGI; I think we're a very long way from that, but once it happens, any job doing anything with information is instantly obsolete forever.
That's my assumption as well - the human programmers will be far more productive, but they'll still be required because there's no way we can take the guard rails off and let the AI build - it'll build wrong unit tests for wrong functions which create wrong programs, and it will require humans to get it back on track.
I think it is really hard to say where all this goes right now when we currently don't even have good quantitative reasoning.
10 years ago we were still working on MNIST prediction accuracy. 10 years forward from here all bets are off. If the model has super human quantitative reasoning and a mastery of language I am not sure how much programming we will be doing compared to moving to a higher level of abstraction.
On the other hand, I think there will be so many new software jobs because of the volume of software built over the next 20 years. The volume of software built over the next 20 years is probably unimaginable sitting where we are.
I don't think anyone can say what's going to happen in 10 years, but what I do know is if you look back people have been saying programmers will be obsolete in 10 years for way longer than a decade.
I could see IDEs for AI, where you manipulate ways to input prompts (natural language, weighted keywords, audio...) and select among methods (ChatGPT, whatever model comes along for diagrams, visual models, audio ones...). Then basically visually program outputs, add tests you want to use to validate and feed back, multimodal output views...
I think you’re right in one sense, and we both agree LLMs are not sufficient. I think they are definitely the death knell for the junior python developer that slaps together common APIs by googling the answers. The same way good, optimizing C, C++, … compilers destroyed the need for wide-spread knowledge of assembly programming. 100% agreed on that.
Those are the most precarious jobs in the industry. Many of those people might become LLM whisperers, taking their clients' requests and curating prompts -- essentially becoming programmers over the prompting system. Maybe they'll write a transpiler to generate prompts? This would be par for the course with other languages (like SQL) that were originally meant to empower end users.
The problem with current AI generated code from neural networks is the lack of an explanation. Especially when we’re dealing with anything safety critical or with high impact (like a stock exchange), we’re going to need an explanation of how the AI got to its solution. (I think we’d need the same for medical diagnosis or any high-risk activity). That’s the part where I think we’re going to need breakthroughs in other areas.
Imagine getting 30,000-ish RISCV instructions out of an AI for a braking system. Then there's a series of excess crashes when those cars fail to brake. (Not that human-written software doesn't have bugs, but we do a lot to prevent that.) We'll need to look at the model the AI built to understand where there's a bug. For safety-related things we usually have a lot of design, requirement, and test artifacts to look at. If the answer is 'dunno - neural networks, y'all', we're going to open up serious cans of worms. I don't think an AI that self-evaluates its own code is even on the visible horizon.
I don't think chatgpt lacks an explanation. It can explain what it's doing. It's just that it can be completely wrong or the explanation may be correct and the code wrong.
I gave some code to ChatGPT asking to simplify it and it returned the correct code but off by one. It was something dealing with dates, so it was trivial to write a loop checking for each day if the new code matched in functionality the old one.
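The brute-force equivalence check described above is cheap to write and worth automating. A minimal sketch, where the two date functions are hypothetical stand-ins for "the old code" and "ChatGPT's simplification" (the actual code in question isn't shown in the thread):

```python
from datetime import date, timedelta

# Stand-in for the verbose original: day-of-year computed by hand.
def day_of_year_old(d: date) -> int:
    days_in_month = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    leap = d.year % 4 == 0 and (d.year % 100 != 0 or d.year % 400 == 0)
    if leap:
        days_in_month[1] = 29
    return sum(days_in_month[: d.month - 1]) + d.day

# Stand-in for the simplified version: lean on the standard library.
def day_of_year_new(d: date) -> int:
    return d.timetuple().tm_yday

# Brute-force check over every day of two years (one leap, one not):
# cheap insurance against exactly the off-by-one bug described above.
d = date(2023, 1, 1)
while d < date(2025, 1, 1):
    assert day_of_year_old(d) == day_of_year_new(d), d
    d += timedelta(days=1)
print("old and new agree on every day")
```

For date logic, exhaustively sweeping a year or two is usually fast enough that there's no reason not to.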
You will never have certainty the code makes any sense if it's coming from one of these high tech parrots.
With a human you can at least be sure the intention was there.
It’s a very sophisticated form of a recurrent neural network. We used to use those for generating a complete image based on a partial image. The recurrent network can’t explain why it chose to reproduce one image instead of another. Nor can you look at the network and find the fiddly bit that drives that output. You can ask a human why they chose to use an array instead of a hash map, or why static memory allocation in this area avoids corner cases. ChatGPT simply generates the most likely text as an explanation. That’s what I mean about being able to explain something.
Ah, the HN echo chamber again! Please visit your local non-FAANG (or whatever it is now) Fortune 1000 company, pick a senior dev at random, and work with them for a week. ChatGPT is vastly better now, today. Faster, doesn't need sleep, rest, politeness, or handholding, can explain itself (sure, it's wrong often, but less wrong than the dev you picked, while actually being able to use proper syntax and grammar, unlike the dev you picked) and is, of course, let's not deny it, way cheaper.
I’ve worked with plenty of jr developers at east coast government contractors, arguably the bottom of the barrel. I would still rather put their code into production, even without unit tests, than I would ChatGPT.
ChatGPT is only cheap if you don’t need its code to do anything of any particular value. It’s a seemingly ideal solution to college homework, for example. But professionally, people write code to actually achieve something; this is why programmers get paid well in the first place. The point isn’t LOC, the point is solving some problem.
And junior devs are horrible at knowing what problem to solve and how to solve it without handholding. I am working on a relatively complex DevOps/“cloud application modernization” project. Where the heavy lifting is designing the process and gathering requirements. But there are a lot of 20-40 line Lambdas and Python/boto3 (AWS SDK), yaml/json wrangling, dynamic Cloudformation creating scripts.
I was able to give ChatGPT the requirements for all of them. The types of bugs I found during the first pass:
- the AWS SDK and the underlying API only returns 50 results in one call most of the time. From the SDK you have to use the built in “paginators”. ChatGPT didn’t use them the first time. But once I said “this will only return the first 50 results”. It immediately corrected the script and used the paginator. I have also had to look out for similar bugs from junior devs.
- The usual yaml library for Python doesn’t play nicely with CloudFormation templates because of the function syntax that starts with an “!”. I didn’t know this beforehand. But once I told ChatGPT the error, it replaced the yaml handling with cfn-flip.
- I couldn’t figure out for the life of me how to combine the !If function in CloudFormation with a Condition, and a Yaml block that contain another !Select function with two arguments. I put the template block without the conditional and told ChatGPT “make the VPC configuration optional based on a parameter”. It created the Parameter section, the condition and the appropriate Yaml.
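The truncation bug in the first bullet is easy to reproduce. Here's a sketch of the pattern using an invented `ToyClient` so it runs anywhere; with real AWS you'd call something like `boto3.client("s3").get_paginator("list_objects_v2")` instead:

```python
class ToyClient:
    """Mimics an AWS-style API that caps each response at 50 items
    and hands back a continuation token for the rest."""
    def __init__(self, items):
        self.items = items

    def list_things(self, next_token=0):
        page = self.items[next_token:next_token + 50]
        more = next_token + 50 < len(self.items)
        return {"Things": page, "NextToken": next_token + 50 if more else None}

client = ToyClient(list(range(120)))

# Naive call -- the bug described above: 70 results silently lost.
assert len(client.list_things()["Things"]) == 50

# Correct pattern -- keep following NextToken, which is exactly what
# boto3's built-in paginators do for you.
results, token = [], 0
while token is not None:
    resp = client.list_things(next_token=token)
    results.extend(resp["Things"])
    token = resp["NextToken"]
assert len(results) == 120
```

The nasty part is that the naive version works fine in testing on small accounts, which is why both juniors and ChatGPT get it wrong on the first pass.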
I’ve given similar problems to interns/junior devs before and ChatGPT was much better at it.
You really think that junior devs could crank out the same code faster than ChatGPT? I couldn’t crank out the same code, and you couldn’t either. The most you can hope for from junior devs (even the ones I have met at BigTech) is that they don’t eat the chalk during the first 3-6 months.
As of now, the issue with ChatGPT is that it doesn't really crank anything; it instantly produces an answer for a given input, while a programmer can actually work through a problem. For example, I asked ChatGPT to write a function that returns a UUID generated according to certain rules. It spewed out a solution that looked correct, but when I ran it, it returned the wrong answer. I worked with ChatGPT for some time and it corrected its code. But I would expect a junior developer to actually run their code and check the output.
Now if ChatGPT would be able to actually work on the problem rather than returning generated text, that would be a completely different beast. And I think that this workflow will come in the near future because it's pretty obvious idea. Get task specification, generate tests, generate code, fix code until tests work, refactor code until it meets some standards, etc.
> I think that this workflow will come in the near future because it's pretty obvious idea. Get task specification, generate tests, generate code, fix code until tests work, refactor code until it meets some standards, etc.
ChatGPT probably works great if you use it to speedrun normal best practices in software engineering. Make it start by writing tests given a spec, then make it write code that will pass the specific tests it just wrote. I’m guessing it’ll avoid a lot of mistakes, much like any engineer, if you force it to do TDD.
You can loop chatgpt around automatically, asking it to write tests and reason about the code for a few iterations; in my experience it auto corrects the code like a human would after some ‘thinking’ time. Of course the code has to run automatically and errors fed back, like with a human. It works fine though, without human input after some prompting work.
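The loop described in the last few comments can be sketched in a few lines. This is a toy: `ask_llm` is a hard-coded stand-in for a real model call (e.g. an OpenAI API request) so the sketch runs without credentials, and the spec/tests are illustrative:

```python
import os
import subprocess
import sys
import tempfile
import textwrap

def ask_llm(prompt: str) -> str:
    """Stand-in for a real model call; hard-coded so this runs offline."""
    return textwrap.dedent("""\
        def add(a, b):
            return a + b
    """)

def run_tests(code: str, tests: str) -> tuple:
    """Write code + tests to a temp file, execute it, report pass/fail."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n" + tests)
        path = f.name
    proc = subprocess.run([sys.executable, path], capture_output=True, text=True)
    os.unlink(path)
    return proc.returncode == 0, proc.stderr

spec = "a function add(a, b) that returns the sum of its arguments"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"

# Generate, run the tests, and feed failures back -- bounded so the
# loop gives up instead of spinning forever on a stuck model.
code = ask_llm(f"Implement: {spec}")
ok = False
for _ in range(3):
    ok, err = run_tests(code, tests)
    if ok:
        break
    code = ask_llm(f"The tests failed:\n{err}\nFix this code:\n{code}")
print("tests passed" if ok else "gave up after 3 attempts")
```

In a real setup you'd also want a sandbox around that `subprocess.run`, since you're executing model-generated code.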
Always hire a senior developer without experience for junior role. By that I mean hire a developer who knows how to program but lacks specific experience or has no formal experience at all.
Doesn’t this only work for relatively contrived situations? I can tell a jr dev to go and add some minor feature in a codebase, put it behind a flag, and add tracking/analytics to it. I can point to the part of the application I want the feature to be added on the screen and the jr devs are often able to find it on their own. I haven’t seen chatGPT do anything like that and I don’t think there is a way to provide it with the necessary context even if it has the capability.
For me it works for small standalone utility scripts. But the most impressive thing I was able to get it to do was:
“Given an XML file with the format {[1]} and a DynamoDB table with two fields “Key”, “Value”, write a Python script that replaces the Value in the xml file when the corresponding key is found. Use argparse to let me specify both the input xml file and the output XML”
It spit out perfect Python code. I hadn’t used XML in well over a decade and I definitely didn’t know how to read xml in Python. I didn’t want to bother about learning.
I actually pasted an XML sample like the link below.
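The core of a script like the one described above is short. A self-contained sketch with assumptions made explicit: the real version pulled the key/value pairs from DynamoDB (a boto3 scan) and used argparse for the file paths, and the `<Entry><Key>/<Value>` layout here is invented since the actual XML sample isn't shown:

```python
import xml.etree.ElementTree as ET

def replace_values(xml_text: str, table: dict) -> str:
    """Replace each <Value> whose sibling <Key> appears in `table`."""
    root = ET.fromstring(xml_text)
    for entry in root.iter("Entry"):        # assumed element layout
        key = entry.findtext("Key")
        if key in table:
            entry.find("Value").text = table[key]
    return ET.tostring(root, encoding="unicode")

# In the real script `table` came from a DynamoDB scan over the
# "Key"/"Value" fields; a plain dict stands in here.
sample = "<Config><Entry><Key>host</Key><Value>old</Value></Entry></Config>"
result = replace_values(sample, {"host": "prod.example.com"})
print(result)
```

The standard-library `xml.etree.ElementTree` covers this case fine, which is part of why it's such a good fit for ChatGPT: well-trodden APIs, small scope, easy to verify by eye.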
Wait, you think "junior developers are actually moderately competent" only makes sense within the HN echo chamber?
I think you have that exactly backwards.
Most junior developers in most places may not have the experience of a senior developer, and thus may not be able to do the translation from business logic to code quite as fast and accurately the first time, but this kind of derogatory attitude toward them is incredibly condescending and insulting.
ChatGPT doesn't know what it's doing. It doesn't know anything, and unlike the most junior developer barely trained, it can't even check its output to see if it matches the desired output.
And for goodness' sake, get rid of the absurd idea that all the competent developers are in Silicon Valley. That's even more insulting to the vast majority of developers in the entire world.
On the other hand, you don't want to manually program all the joints of a robot to move through arbitrary terrain. You just convert a bunch of cases into a language and make the robot fluent in that.
Translating an idiomatic structured loop into assembly used to be an "L3" question (honestly, probably higher), yet compilers could do it with substantially fewer resources than and decades before any of these LLMs.
While I wouldn't dare offer particular public prognostications about the effect transformer codegens will have on the industry, especially once filtered through a profit motive - the specific technical skill a programmer is called upon to learn at various points in their career has shifted wildly throughout the industry's history, yet the actual job has at best inflected a few times and never changed very dramatically since probably the 60s.
I agree this would have been thought to be impossible a few years ago, but I don't think it's necessarily moving the goalposts. I don't think software engineers are really paid for their labour exactly. FAANG is willing to pay top dollar for employees, because that's how they retain dominance over their markets.
Now you could say that LLMs enable Google to do what it does now with fewer employees, but the same thing is true for every other competitor to Google. So the question is how will Google try to maintain dominance over its competitors now? Likely they will invest more heavily in AI and probably make some riskier decisions, but I don't see them suddenly trying to cheap out on talent.
I also think that it's not a zero sum game. The way that technology development has typically gone is the more you can deliver, the more people want. We've made vast improvements in efficiency and it's entirely possible that what an entire team's worth of people was doing in 2005 could be managed by a single person today. But technology has expanded so much since then that you need more and more people just to keep up pace.
Google already published a paper claiming to have deployed an LLM for code generation at full scale to its tens of thousands of software engineers, years ago.
I'm kind of interested in how AI is going to interface with the world. Humans have a lot of autonomy to change the physical world they're in; from rearranging furniture, to building structures, to visiting other worlds. Why isn't AI doing any of that stuff?
As programmers, we keep talking about programming jobs and how AI will eliminate them all. But nobody is talking about eliminating other jobs. When will a robot vacuum be able to clean my apartment as quickly as I can? Why isn't there a robot that takes my garbage out on Tuesday night? When will AI plan and build a new tunnel under the Hudson River for trains? When will airliners be pilotless? If AI can't do this stuff, what makes software so different? Why will AI be good at that but not other things? It seems like the only goal is to eliminate jobs doing things people actually like (art, music, literature, etc.), and not to eliminate any of the tedium or the things that are a waste of humanity's time.
(On the software front, when will AI decide what software to build? Will someone have to tell it? Will it do it on its own? Why isn't it doing this right now?)
My takeaway is that this all raises a lot of questions for me on how far along we actually are. Language models are about stringing together words to sound like you have understanding, but the understanding still isn't there. But, I suppose we won't know understanding until we see it. Do we think that true understanding is just a year or two away? 10? 50? 100? 1000?
Household tasks can involve a robot moving with enough kinetic energy to maim or kill a human (or pet) in unlucky circumstances. And we'll quickly become habituated to their presence and so careless around them. Even a Roomba could knock granny down the stairs if it isn't careful about its environment.
You could make the same argument as with self-driving cars, that people already get hurt this way and maybe the robot is in fact safer. But it's still a hard sell that Sunny-01 has only accidentally killed 1/10 as many children as parents have—the number has to be more like zero.
Let's solve automating trains first then we can do airliners.
> I think we have a tendency to mentally move the goalposts when it comes to this kind of thing as a self-defense mechanism. Years ago this would have been a similar level of impossibility.
Define "we". There are all kinds of people with all kinds of opinions. I didn't notice any consensus on the questions of AI. There are people with all kinds of educations and backgrounds on the opposite sides and in-between.
I mean, you can just as easily make the claim that researchers shift goalposts as a "self-defense" mechanism.
For example...
Hows that self-driving going? Got all those edge-cases ironed out yet?
Oh, by next year? Weird, that sounds very familiar...
Remember when Tesla's Autopilot was released 9 years ago, and the media began similar speculation about how all of the truckers were going to be automated out of a job by AI? And then further speculation about how taxi drivers were all going to be obsolete?
Those workers are the ones shifting the goal posts though as a "self-defense mechanism", sure, sure... lol.
Well, there's a difference between the situation with self-driving and with language models.
With self-driving, we barely ever saw anything obviously resembling human abilities, but there was a lot of marketing promising more.
With language models, when GPT-2 came out everyone was still calling it a "stochastic parrot", and even GPT-3 was considered one. But now there's ChatGPT, and every single teenager is aware that the tool is capable of doing their school assignments for them. And as a dev, I am aware that it can write code. And yet not many people expected any of this to happen this year, nor were those capabilities promised at any point in the past.
So if anything, self-driving was always overhyped, while the LLMs are quite underhyped.
We actually saw a lot resembling human abilities. It just turns out that it's not enough to blindly rely on it in all situations, and so here we are. And it's quite similar with LLMs.
One difference, though, is that it's economically not much use to have self-driving if the backup driver still has to be present in the car. Whereas partially automating programming would make it possible to use far fewer programmers for the same amount of work.
I've been hearing this "you're moving the goalposts" argument for over 20 years now, ever since I was a college student taking graduate courses in Cognitive Science (which my University decided to cobble together at the time out of Computer Science, Psychology, Biology, and Geography), and I honestly don't think it is a useful framing of the argument.
In this case, it could be that you are just talking to different people and focusing on their answers. I am more than happy to believe that Copilot and ChatGPT, today, cause a bunch of people fear. Does it cause me fear? No.
And if you had asked me five years ago "if I built a program that was able to generate simple websites, or reconfigure code people have written to solve problems similar to ones solved before, would that cause you to worry?" I also would have said "No", and I would have looked at you as crazy if you thought it would.
Why? Because I agree with the person you are replying to (though I would have used a slightly-less insulting term than "trash engineers", even if mentally it was just as mean): the world already has too many "amateur developers" and frankly most of them should never have learned to program in the first place. We seriously have people taking month or even week long coding bootcamps and then thinking they have a chance to be a "rock star coder".
Honestly, I will claim the only reason they have a job in the first place is because a bunch of cogs--many of whom seem to work at Google--massively crank the complexity of simple problems and then encourage us all to type ridiculous amounts of boilerplate code to get simple tasks done. It should be way easier to develop these trivial things but every time someone on this site whines about "abstraction" another thousand amateurs get to have a job maintaining boilerplate.
If anything, I think my particular job--which is a combination of achieving low-level stunts no one has done before, dreaming up new abstractions no one has considered before, and finding mistakes in code other people have written--is going to just be in even more demand from the current generation of these tools, as I think this stuff is mostly going to encourage more people to remain amateurs for longer and, as far as anyone has so far shown, the generators are more than happy to generate slightly buggy code as that's what they were trained on, and they have no "taste".
Can you fix this? Maybe. But are you there? No. The reality is that these systems always seem to be missing something critical and, to me, obvious: some kind of "cognitive architecture" that allows them to think and dream possibilities, as well as a fitness function that cares about doing something interesting and new instead of being "a conformist": DALL-E is sometimes depicted as a robot in a smock dressed up to be the new Pablo Picasso, but, in reality, these AIs should be wearing business suits as they are closer to Charles Schmendeman.
But, here is the fun thing: if you do come for my job even in the near future, will I move the goal post? I'd think not, as I would have finally been affected. But... will you hear a bunch of people saying "I won't be worried until X"? YES, because there are surely people who do things that are more complicated than what I do (or which are at least different and more inherently valuable and difficult for a machine to do in some way). That doesn't mean the goalpost moved... that means you talked to a different person who did a different thing, and you probably ignored them before as they looked like a crank vs. the people who were willing to be worried about something easier.
And yet, I'm going to go further: if the things I tell you today--the things I say are required to make me worry--happen and yet somehow I was wrong and it is the future and you technically do those things and somehow I'm still not worried, then, sure: I guess you can continue to complain about the goalposts being moved... but is it really my fault? Ergo: was it me who had the job of placing the goalposts in the first place?
The reality is that humans aren't always good at telling you what you are missing or what they need; and I appreciate that it must feel frustrating providing a thing which technically implements what they said they wanted and it not having the impact you expected--there are definitely people who thought that, with the tech we have now long ago pulled off, cars would be self-driving... and like, cars sort of self-drive? and yet, I still have to mostly drive my car ;P--then I'd argue the field still "failed" and the real issue is that I am not the customer who tells you what you have to build and, if you achieve what the contract said, you get paid: physics and economics are cruel bosses whose needs are oft difficult to understand.
I think OP set relatively simple goals. How long until AI can architect, design, build, test, deploy and integrate commercial software systems from scratch, and handle users submitting bug reports that say "The OK button doesn't work when I click it!"?
Not to be the devil's advocate or something, but I hope you understand that the vast majority of FAANG engineers CAN'T build any highly scalable system from scratch, much less fix bugs in the Linux kernel... So that argument feels really moot to me... If anything this just shows that gatekeeping good engineers by putting these LC puzzles as a requirement for interviews is a sure way to hire a majority of people who aren't adding THAT MUCH MORE value than an LLM already does... Yikes... On top of that, they'll be bad team players and it'll be luck if they can string together two written paragraphs...
I agree; people in general overestimate the skills and output of the average developer. Many (even in FAANG) are simply not capable of creating anything more than a simple CRUD app or tooling script without explicit guidance.
And being good or very good with algorithms and estimating big-O complexity doesn't by itself make you a good software engineer (though it can help).
That's the general issue with AI skeptics. Most of them, especially highly educated ones, overestimate capabilities of common folk. Frankly, some even overestimate their own. E.g. almost none of them seem to be bothered that while GPT might not provide expert answers in their field, the same GPT is much more capable in other fields than they are (e.g. the "general" part in the "General Artificial Intelligence").
True. The thing is, there's no such thing as "General Artificial Intelligence"; humans are expert systems optimized for the goal of survival, which in turn gets chopped up into a plethora of sub-goal optimizations, from which the "general" adjective most probably pops up.
It doesn't really matter if it's "general" as long as it actually is useful. It doesn't have to write whole systems from scratch, just making the average dev 20-30% faster is huge.
If it was easy to make an LLM that quickly parsed all of StackOverflow and described new answers that most of the time worked in the timeframe of an interview, it would have been done by now.
ChatGPT is clearly disruptive being the first useful chatbot in forever.
It kind of depends on the frame of the solution. Google can answer leetcode questions, leetcode's answers section can answer them as well. If ChatGPT is solving them, that's one thing, but if it's just mapping the question to a solution found somewhere, then not so impressive.
The hiring tests are designed to serve as a predictor for human applicants. How well an LLM does on them doesn’t necessarily say anything about the usefulness of those tests as said predictor.
Well, what it shows is that hiring tests are not useful as Turing tests. But nobody designed them to be, or expected them to be! At best it "proves" that hiring tests are not sufficient. But again, nobody thought they were. And even still, the assumption that a human is taking the hiring test still seems reasonable. Why overengineer your process?
> the jury is still out on whether ChatGPT is truly useful or not
I'd pay $100 a month for ChatGPT. It allows me to ask free-form questions about some open-source packages with truly appalling docs and usually gets them right, and saves me a bunch of time. It helps me understand technical language in papers I'm reading at the moment regarding stats. It's been useful to find good Google search terms for various bits of history I wanted to find out more about.
I don't think the jury is out at all on whether it's useful. The jury is out on the degree to which it can replace humans for tasks, and I'd suggest the answer is "no" for most tasks.
I just used it to write a function for me yesterday. I had previously googled a few times and come up dry; I asked ChatGPT and it came out with a solution I had not considered, and it was better than what I was thinking.
You don't understand the take that just because ChatGPT can pass a coding interview doesn't mean the coding interview is useless or that ChatGPT could actually do the job?
What part of that take do you not understand? It's a really easy concept to grasp, and even if you don't agree with it, I would expect at least that a research scientist (according to your bio) would be able to grok the concepts almost immediately...
> doesn't mean the coding interview is useless or that ChatGPT could actually do the job
Aren't these kind of mutually exclusive, at least directionally? If the interview is meaningful you'd expect it to predict job performance. If it can't predict job performance then it is kind of useless.
I guess you could play some word games here to occupy a middle ground ("the coding interview is kind of useful, it measures something, just not job performance exactly") but I can't think of a formulation where this doesn't sound pretty silly.
Chatgpt can provide you a great explanation of the how.
Oftentimes the explanation is correct, even if there's some mistake in the code (probably because the explanation is easier to generate than the correct code, an artifact of being a high tech parrot)
Finding a single counterexample does not disprove correlation or predictive ability. A hiring test can have both false positives and false negatives and still be useful.
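To make that concrete, here's a small back-of-the-envelope calculation (the numbers are purely illustrative, not measured hiring data) showing that a screen with both false positives and false negatives can still substantially improve the quality of the pool that passes:

```python
# Hypothetical, illustrative numbers only:
# 30% of applicants can actually do the job (base rate),
# the test passes 80% of good candidates (sensitivity)
# and fails 90% of weak ones (specificity).
base_rate = 0.30
sensitivity = 0.80   # P(pass | good)
specificity = 0.90   # P(fail | not good)

# Overall pass rate, by total probability
p_pass = sensitivity * base_rate + (1 - specificity) * (1 - base_rate)

# P(good | pass), by Bayes' rule
p_good_given_pass = sensitivity * base_rate / p_pass

print(round(p_good_given_pass, 3))  # 0.774
```

So despite letting some weak candidates through and rejecting some strong ones, the share of capable people among those who pass jumps from 30% to roughly 77%. That's what "useful despite false positives and false negatives" means.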
I don't think I had a militant attitude, but I do think saying, "I don't understand..." rather than "I disagree with..." puts a sour note on the entire conversation.
You literally went to their profile and called them out about how they should be able to understand something you’re describing as so easy to understand.
Yeah, what is the problem with that? They engaged dishonestly by claiming they didn't understand something, why should I do anything other than call them on that?
OK — just don’t be surprised when people think you’re being a jerk because you didn’t like the words someone chose. I’d assert you’re acting in bad faith more than the person you responded to.
It’s really very easy to understand. When someone gives you the same crap back that you just got done giving someone, you don’t like it and act like that shouldn’t happen.
Did I say I didn't "like" (I'd use the word "appreciate") it, or that I didn't think it should happen? If so, could you please highlight where?
I just see, in what you're doing, a wild lack of self awareness. You're criticizing me for doing to someone else a milder version of what you're trying to do to me now; I'm genuinely confused how you can't see that, or how you could possibly stand the hypocrisy if you do understand that.
I'll try to phrase it so that even someone who is not a research scientist (?) can understand. I'm not one, whatever that means.
Let's define the interview as useful if the passing candidate can do the job.
Sounds reasonable.
ChatGPT can pass the interview and can't do the job.
The interview is not able to predict the poor working performance of ChatGPT and it's therefore useless.
Some of the companies I worked for hired ex-FAANG people as if it were a mark of quality, but that hasn't always worked out well. There are plenty of people leaving FAANGs having just done mediocre work for a big paycheck.
> Let's define the interview as useful if the passing candidate can do the job.
The technical term for this is "construct validity", that the test results are related to something you want to learn about.
> The interview is not able to predict the poor working performance of ChatGPT and it's therefore useless.
This doesn't follow; the interview doesn't need to be able to exclude ChatGPT because ChatGPT doesn't interview for jobs. It's perfectly possible that the same test shows high validity on humans and low validity on ChatGPT.
So 99% of software ‘engineers’ then? Have you ever looked on Twitter what ‘professionals’ write and talk about? And what they produce (while being well paid)?
People here generally seem to believe, after having seen a few Strange Loop presentations and reading startup stories from HN superstars, that this is the norm for software dev. Please walk into Deloitte or Accenture and spend a week with a software dev team, then tell me if they cannot all be immediately replaced by a slightly rotten potato hooked up to ChatGPT. I know people at Accenture who make a fortune and are proud that they do nothing all day, getting some junior geek or, now, GPT to do the work for them. There are dysfunctional teams on top of dysfunctional teams who all protect each other, as no one can do what they were hired for. And this is completely normal at large consultancy corps; and therefore also normal at the large corps that hire these consultancy corps to do projects. In the end something comes out, 5-10x more expensive than the estimate and of shockingly bad quality compared to what you seem to expect as being the norm in the world.
So yes, you probably don't have to worry, but 99% of 'keyboard-based jobs' should really be looking for a completely different thing; cooking, plumbing, electrics, rendering, carpentry etc. maybe, as they won't be able to even grasp what level you say you are; seeing you work would probably fill them with amazement akin to seeing some real-life sorcerer wielding their magic.
Actually, a common phrase I hear from my colleagues when I mention some ‘newer’ tech like Supabase is; ‘that’s academic stuff, no one actually uses that’. They work with systems that are over 25 years old and still charge a fortune by the cpu core like sap, oracle, opentext etc. And ‘train’ juniors in those systems.
Until ChatGPT can slack my PM, attend my sprint plannings, read my Jira tickets, and synthesize all of this into actionable tasks on my codebase, I think we have job security. To be clear, we are starting to see this capability on the horizon.
Your PM should be the first to be worried, honestly. I keep hearing people describing their job as "I just click around on Jira while I sit through meetings all day."
That's a bad PM then, to be honest. I think ChatGPT will definitely commodify a lot of "bitch work" (pardon my French).
The PMs who are only writing tickets and not participating in actively building ACs or communicating cross functionally are screwed. But so are SWEs who are doing the bare minimum of work.
The kinds of SWEs and PMs who concentrate on stuff higher in the value chain (like system design, product market fit, messaging, etc) will continue to be in demand and in fact find it much easier to get their jobs done.
To be fair to the people that I hear that from, they're essentially complaining about the worst part of their job. They're active participants in those meetings, they are genuinely thinking about the complexities of the mismatch between what management asks for and what their ICs can do, etc. I see their value. But the awful truth is that a $10k/project/yr license for PMaaS software will be very appealing to executives.
And as a Product Manager, I'd support that. Most PMs I see now in the industry are glorified Business Analysts who aren't providing value for the amount of money spent on them. But that's also true for a lot of SWEs and any role. Honestly, the tech industry just got very fat the past 5-7 years and we're just starting to see a correction.
edit with additional context:
Writing Jira tickets and making bullshit Powerpoints with graphs and metrics is to PMs as writing unit tests is to SWEs. It's work you need to get done, but it has very marginal value. When a PM is hired, they are hired to own the Product's Strategy and Ops - how do we bring it to market, who's the persona we are selling to, how do our competitors do stuff, what features do we need to prioritize based on industry or competitive pressures, etc.
That's the equivalent of a SWE thinking about how to architect a service to minimize downtime, or deciding which stack to use to minimize developer overhead, or actually building an MVP from scratch. To a SWE, while code is important, they are fundamentally being hired to translate business requests that a PM provides them
into an actionable product. Haskell, Rust, Python, Cobol - who gives a shit what the code is written in, just make a functional product that is maintainable for your team.
There are a lot of SWEs and PMs who don't have vision or the ability to see the bigger picture. And honestly, they aren't that different either - almost all SWEs and PMs I meet went to the same universities and did the same degrees. Half of Cal EECS majors become SWEs and the other half PMs, based on my friend group (I didn't attend Cal, but half my high school did; this ratio was similar at my alma mater too, but with an additional 15% each entering management consulting and IB).
> Writing Jira tickets and making bullshit Powerpoints with graphs and metrics is to PMs as writing Unit Tests are to SWEs. It's work you need to get done, but it has very marginal value.
Don't want to be rude but I don't think you know what you're talking about. And this is coming from a person who most certainly doesn't like sitting on writing Unit Tests.
I think this will probably be a boon to the project manager. It will be another tool in their toolbox, along with real developers, that they can assign lower-complexity tasks to. At least until it's capable of doing high-complexity stuff.
Project managers are dealing with the high complexity stuff, while the developers are handling the low complexity stuff? Shouldn’t it be the other way around?
The capability will be available in around two weeks, once RLHF alignment with the software engineering tasks is completed. The deployment will take around twelve hours, most of it taken up by you and your manager reviewing the integration summary pages. You can keep your job, supervising and reviewing how your role is being played, for the following 6 months, until the human supervision role is deemed unnecessary.
One issue is that there are a much larger number of people who can attend meetings, read Jira tickets, and then describe what they need to a LLM. As the number of people who can do your job increases dramatically your job security will decline.
If one's ability to describe what they need to Google is at all a proxy to the skill of interacting with an LLM, then I think most devs will still have an edge.
Perhaps an engineering manager can use one trained on entire Slack history, all Jira tickets, and all PRs to stub out some new tickets and even first PR drafts themselves…
We will always need humans to prompt, prioritize, review, ship and support things.
But maybe far fewer of them for many domains. Support and marketing are coming first, but I don't think software development is exempt.
I think this is a huge demonstration of progress. Shrugging it off as "water is blue" ignores the fact that a year ago this wouldn't have been possible. At one end of the "programmer" scale is hacking basic programs together by copying off of stack overflow and similar - call that 0. At the other end is the senior/principal software architect - designing scalable systems to address business needs, documenting the components and assigning them out to other developers as needed - call that 10.
What this shows us is that ChatGPT is on the scale. It's a 1 or a 2 - good enough to pass a junior coding interview. Okay, you're right, that doesn't make it a 10, and it can't really replace a junior dev (right now) - but this is a substantial improvement from where things were a year ago. LLM coding can keep getting better in a way that humans alone can't. Where will it be next year? With GPT-4? In a decade? In two?
I think the writing is on the wall. It would not surprise me if systems like this were good enough to replace junior engineers within 10 years.
You hire a junior dev at $x. Let’s say $75K. They stay for a couple of years and start out doing “negative work”. By the time they get useful and start asking for $100K, your HR department tells you that they can’t give them a 33% raise.
Your former junior dev then looks for another job that will pay them what they are asking for and the next company doesn’t have to waste time or risk getting an unproven dev.
While your company is hiring people with his same skill level at market price - ie “salary compression and inversion”.
First, that's not true. You need people to actually write code. If your organization is composed of seniors who are doing architecture planning, cross-team collaboration, etc - you will accomplish approximately nothing. A productive team needs both high level planning and strategy and low level implementation.
Second, the LLM engineer will be able to grow into other roles too. Maybe all of them.
Exactly. This article, and many like it, are pure clickbait.
Passing LC tests is obviously something such a system would excel at. We're talking well-defined algorithms with a wealth of training data. There's a universe of difference between this and building a whole system. I don't even think these large language models, at any scale, replace engineers. It's the wrong approach. A useful tool? Sure.
I'm not arguing for my specialness as a software engineer, but the day it can process requirements, speak to stakeholders, build and deploy and maintain an entire system etc, is the day we have AGI. Snippets of code is the most trivial part of the job.
For what it's worth, I believe we will get there, but via a different route.
If you don't adapt, you'll be out of a job in ten years. Maybe sooner.
Or maybe your salary will drop to $50k/yr because anyone will be able to glue together engineering modules.
I say this as an engineer that solved "hard problems" like building distributed, high throughput, active/active systems; bespoke consensus protocols; real time optics and photogrammetry; etc.
The economy will learn to leverage cheaper systems to build the business solutions it needs.
> If you don't adapt, you'll be out of a job in ten years. Maybe sooner. Or maybe your salary will drop to $50k/yr because anyone will be able to glue together engineering modules. [...] The economy will learn to leverage cheaper systems to build the business solutions it needs.
I heard this in ~2005 too, when everyone said that programming was a dead end career path because it'd get outsourced to people in southeast Asia who would work for $1000/month.
You really think in <10 years AI will be able to take a loose problem like: "our file uploader is slow" and write code that fixes the issue in a way that doesn't compromise maintainability? And be trustworthy enough to do it 100% of the time?
Humans cannot do this 100% of the time. The question is will AI models take the diagnosis time for these issues from hours/days to minutes/hours giving a massive boost in productivity?
If the answer is yes and it does increase productivity greatly, then there is a question we'll only be able to answer in hindsight. And that is: "Will productivity exceed demand?" We cannot possibly answer that question now because of Jevons paradox.
I really think in <10 years it will be trivially easy for a single programmer to ask the AI for that code and move on to the next ticket after 10 minutes while earning $30/h accounting for inflation because productivity gains will have eliminated not only most programming jobs, but also the corresponding high wages.
We have no idea how AI models will be in 10 years. At the speed the industry is moving is true AGI possible in 10 years? I think it would be beyond arrogant to rule out that possibility.
I would think that it's at least likely that AI models become better at Devops, monitoring and deployment than any human being.
Non-AI code will be a liability in a world where more code will be generated by computers (or with computer assistance) per year than all human engineered code in the last century.
We'll develop architectures and languages that are more machine friendly. ASTs and data stores that are first class primitives for AI.
If I interpret OP's statement correctly, that ChatGPT will be able to build complex systems from scratch in 10 years, then the only adaptation is to choose a new career, because it will have made almost all SWE jobs go the way of the dinosaurs.
According to my calculations it'll be more like 9 years at the latest. You just need to build Cicero for code. Planning is the main feature missing from LLMs.
We cannot be too sure about the hard problems, but it's certain we are screwed either way. The bulk stuff that is being done is problems that have been already solved. It's just sufficient that AI can thrive building boring CRUD apps (and aren't we at that point already?), just give it time to be integrated into existing business workflows and the number of available positions will shrink by an order of magnitude and the salaries will be nothing special compared to other white collar work. You will be impacted by supply and demand, no matter what your skills are.
"Please write a dismissal of yourself with the tone and attitude of a stereotypical linux contributor"
I mean, maybe I'm a trash engineer as you'd put it, but I've been having fun with it. Maybe you could ask it to write comments in the tone of someone who doesn't have an inflated sense of superiority ;)
Agree LeetCode is one of the least surprising starting points.
Any human that reads the LeetCode books and practices and remembers the fundamentals will pass a LeetCode test.
But there is also a ton of code out there for highly scalable client/servers, low latency processing, performance optimizations and bug fixing. Certainly GPT it is being trained on this too.
“Find a kernel bug from first principles” maybe not, but analyze a file and suggest potential bugs and fixes and other optimizations absolutely. Particularly when you chain it into a compiler and test suite.
Even the best human engineers will look at the code in front of them, consult Google and SO and papers and books and try many things iteratively until a solution works.
> Any human that reads the LeetCode books and practices and remembers the fundamentals will pass a LeetCode test.
Seems pretty bold to claim "any human" to me. If it were that easy, don't you think a lot more people would be able to break into software dev at FAANG and hence drive salaries down?
I don't think the person you're replying to meant "any human" to be taken literally, but I agree with their notion. I think you're confusing wanting to do something with having the ability to do it. Enough people don't WANT to grind LeetCode and break into FAANG, or they think they can't do it, or there are other barriers I can't think of; but I don't think you need above-average cognitive ability to learn and grind LeetCode.
Just because a job pays well, doesn’t mean it’s worth doing. Most FAANG jobs (now that the companies have become modern day behemoths like IBM) are boring cogs in a huge, multilayered, bureaucratic machine that is mostly built to take advantage of their users.
It takes a “special” kind of person to want those type of jobs and live in a company town like SF while they’re at it.
Correct me if I'm wrong, but answering questions for known answers is precisely the kind of thing a well trained LLM is built for.
It doesn't understand context, and is absolutely unable to rationalize a problem into a solution.
I'm not in any way trying to make it sound like ChatGPT is useless. Quite the opposite: I find it quite impressive. Parsing and producing fluid natural language is a hard problem. But it sounds like something that can be a component of some hypothetical advanced AI, rather than something that will be refined into replacing humans for the sort of tasks you mentioned.
I tinkered with ChatGPT. There're some isolated components which I wrote recently and I asked Chat to write them.
It either produced a working solution or something close to one.
I followed up with more prompts to fix issues.
In the end I got working code. This code wouldn't pass my review: it performed poorly and sometimes used deprecated functions. So at this moment I consider myself a better programmer than ChatGPT.
But the fact that it produced working code still astonishes me.
ChatGPT needs working feedback cycle. It needs to be able to write code, compile it, fix errors, write tests, fix code for tests to pass. Run profiler, determine hot code. Optimize that code. Apply some automated refactorings. Run some linters. Run some code quality tools.
I believe that all this is doable today. It just needs some work to glue everything together.
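The glue for that feedback cycle really is mostly plumbing. Here's a minimal sketch of the loop described above: generate code, run the tests, feed the failure back, repeat. `ask_llm` is a stand-in for a real model API call; it's stubbed here (with a deliberately buggy first attempt) so the loop is runnable as-is.

```python
# Hypothetical stub for a model call. A real implementation would send
# `prompt` to an LLM API; this stub simulates a model that fixes its
# off-by-one bug once it is shown the test failure.
def ask_llm(prompt):
    if "AssertionError" in prompt:
        return "def add(a, b):\n    return a + b\n"
    return "def add(a, b):\n    return a + b + 1\n"  # first attempt is buggy

def run_tests(code):
    """Exec the candidate code and run the test suite; return the error text, or None on success."""
    ns = {}
    try:
        exec(code, ns)
        assert ns["add"](2, 3) == 5
        return None
    except Exception as e:
        return f"{type(e).__name__}: {e}"

prompt = "Write add(a, b)."
for attempt in range(5):
    code = ask_llm(prompt)
    error = run_tests(code)
    if error is None:
        break
    # Feed the failure back into the next prompt
    prompt += f"\nYour code failed with {error}. Please fix it."

print(attempt, error)  # prints: 1 None
```

The same skeleton extends naturally to compilers, linters, and profilers: each tool's output just becomes more text appended to the next prompt.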
Right now it produces code like an unsupervised junior.
With modern tools it'll produce code like a good junior. And that's already incredibly impressive, if you ask me.
And I'm absolutely not sure what it'll do in 10 years. AI improves at an alarming rate.
> The day ChatGPT can build a highly scalable system from scratch, or an ultra-low latency trading system that beats the competition, or find bugs in the Linux kernel and solve them
Much more mundanely the thing to focus on would be producing maintainable code that wasn't a patchwork, and being able to patch old code that was already a patchwork without making things even worse.
A particularly difficult thing to do is to just reflect on the change that you'd like to make and determine if there are any relevant edge conditions that will break the 'customers' (internal or external) of your code that aren't reflected in any kind of tests or specs--which requires having a mental model of what your customers actually do and being able to run that simulation in your head against the changes that you're proposing.
This is also something that outsourced teams are particularly shit at.
> or an ultra-low latency trading system that beats the competition
Likely it's going to be:
I'm sorry, but I cannot help you build a ultra-low latency trading system. Trading systems are unethical, and can lead to serious consequences, including exclusion, hardship and wealth extraction from the poorest. As a language model created by OpenAI, I am committed to following ethical and legal guidelines, and do not provide advice or support for illegal or unethical activities. My purpose is to provide helpful and accurate information and to assist in finding solutions to problems within the bounds of the law and ethical principles.
But the rich of course will get unrestricted access.
Depending on the exchange, trading systems have a limit on how fast they can execute trades. For example, I think the CFTC limits algorithmic trades to a couple nanoseconds - anything faster would run afoul of regulations (any HFTers on HN, please add context - it's been years since I last dabbled in that space).
> The day ChatGPT can build a highly scalable system from scratch, or an ultra-low latency trading system that beats the competition, or find bugs in the Linux kernel and solve them; then I will worry.
The bar for “then I will worry!” when talking about AI is getting hilarious. You’re now expecting an AI to do things that can take highly skilled engineers decades to learn or require outright a large team to execute?
Remind me where the people who years ago were saying “when an AI will respond in natural language to anything I ask it then I will worry” are now.
It solving something past day 3 on Advent of Code would also be impressive, but it fails miserably on anything that doesn’t resemble a problem found in the training set.
I don't even fully believe the claim in the article especially given that Google is very careful about not asking a question once it shows up verbatim on LeetCode. I've fed interview questions like Google's (variations of LeetCode Mediums) to ChatGPT in the past and it usually spits out garbage.
I've been most impressed with ChatGPT's ability to analyze source code.
It may be able to tell you what a compiled binary does, find flaws in source code, etc. Of course it would be quite idiotic in many respects.
It also appears ChatGPT is trainable, but it is a bit like a gullible child, and has no real sense of perspective.
I also see utility as a search engine, or alternative to Wikipedia, where you could debate with ChatGPT if you disagree with something to have it make improvements.
To me the real advancement isn't the amount of data it can be trained on, but the way it can correlate that data and choose among it according to the questions it's being asked. The first is culture; the second is intelligence, or a good approximation of it. Which doesn't mean it could perform the job; more likely it means the tests are flawed.
It doesn’t really have a model for choosing. It’s closer to pattern matching. Essentially the pattern is encoded in the training of the networks. So your query most closely matches the stuff about X, where there’s a lot of good quality training data for X. If you want Y, which is novel or rarely used, the quality of the answers varies.
Not to say they’re nothing more than pattern matching. It’s also synthesizing the output, but it’s based on something akin to the most likely surrounding text. It’s still incredibly impressive and useful, but it’s not really making any kind of decision any more than a parrot makes a decision when it repeats human speech.
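The "most likely surrounding text" idea can be illustrated with a toy next-token sketch. A bigram frequency table stands in here for the learned distribution; real LLMs use neural networks over long contexts, but the sampling principle is similar:

```python
from collections import Counter, defaultdict

# Toy illustration only: a bigram table stands in for the distribution
# a real LLM encodes in its weights.
corpus = "the cat sat on the mat the cat ate the fish".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def next_token(prev):
    # Pick the most frequent continuation seen in the training data.
    return bigrams[prev].most_common(1)[0][0]

print(next_token("the"))  # "cat" -- the most common word after "the"
```

A real model conditions on a long context window rather than a single previous word, but the output is still the statistically likely continuation, not a reasoned choice.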
1. Humans aren’t entirely probabilistic, they are able to recognize and admit when they don’t know something and can employ reasoning and information retrieval. We also apply sanity checks to our output, which as of yet has not been implemented in an LLM. As an example in the medical field, it is common to say “I don’t know” and refer to an expert or check resources as appropriate. In their current implementations LLMs are just spewing out BS with confidence.
2. Humans use more than language to learn and understand in the real world. As an example a physician seeing the patient develops a “clinical gestalt” over their practice and how a patient looks (aka “general appearance”, “in extremis”) and the sounds they make (e.g. agonal breathing) alert you that something is seriously wrong before you even begin to converse with the patient. Conversely someone casually eating Doritos with a chief complaint of acute abdominal pain is almost certainly not seriously ill. This is all missed in a LLM.
>. Humans aren’t entirely probabilistic, they are able to recognize and admit when they don’t know something
Humans can be taught this. They can also be taught the opposite that not knowing something or that changing your mind is bad. Just observe the behavior of some politicians.
>Humans use more than language to learn and understand in the real world.
And this I completely agree with. There is a body/mind feedback loop that AI will be limited by not having, at least for some time. I don't think LLMs are a general intelligence, at least by how we define intelligence at this point. AGI will have to include instrumentation to interact with and get feedback from the reality it exists in, to cross from partial intelligence to at-or-above-human intelligence. Simply put, our interaction with the physics of reality cuts out a lot of the bullshit that can exist in a simulated model.
Only when you’re asking for a memorized response. If you were to ask me to create a driver for a novel hardware device in Ada, there are no memorized answers. I would have to work it out. I do that by creating mental models, which LLMs don’t really have. It has a statistical encoding over the language space. Essentially, memorization.
ChatGPT isn’t designed to learn, though. The underlying model is fixed, and would have to be continuously adjusted to incorporate new training data, in order to actually learn. As far as I know, there is no good way yet to do that efficiently.
Did you used to be a graphic artist? Because maybe 25 years ago I had a friend who was an amazing pen-and-ink artist and who assured me Photoshop was a tool for amateurs and would never displace “real” art. This was in the San Diego area.
what's your point? that it's not as good as a human? I don't think anyone is saying that. people are saying it's impressive, which it is, seeing how quickly the tech grew in ability
Water is blue, just like air is blue, just like blue-tinted glasses are blue. They disproportionately absorb non-blue frequencies, which is what we mean when we call something "blue".
I think this says more about the Google interview process than about ChatGPT.
That a machine learning model can "bullshit" its way through an interview that is heavily leaning on recall of memorized techniques, algorithms and "stock" problems that have solutions (of various quality) all over Internet is not exactly surprising. Machines will always be able to "cram" better than humans.
In practice these questions are almost 100% irrelevant and uncorrelated with the actual ability to do the job. Yet we are still testing whether people can solve, at a whiteboard, stuff that everyone else would rather google when they actually need it instead of wasting time and mental capacity memorizing it.
And at the same time we are hiring people that are completely incapable of coherent communication, can't manage to get along with colleagues or not create a toxic atmosphere in the workplace.
You seem to imply that the interview process actually works in the sense that it rejects bad candidates and selects good candidates. There's actually very little evidence for that. And if you look at companies like Google, they obviously have issues with hiring lots of people that aren't getting a whole lot done. Case in point: OpenAI. That company has been operating completely in the open for years. And yet Google got caught by surprise. Why is that? The company collectively lacks imagination and leadership. They've self selected out of hiring people that have those traits.
In my experience, companies using this style of interviewing are actually incapable through process of hiring the type of people that are qualified and experienced enough to know that this process is bullshit. I.e. the type of people that have 0 need to drop down on their knees and beg for the job. Leaders, not followers. It's a problem. If you want to hire the best, insulting them with a silly coding interview is not a great way to do it. Companies like this self select into hiring people that at best are as good as what they already have. It's the old A's hire A's, B's hire C's kind of thing.
The solution is to trust your people more to take good decisions rather than allowing them to defer to some HR process. The process at the startup I run is very simple. We don't subject people to coding interviews. If you pass our initial filters (CV screen and common sense), you first talk to somebody senior enough to make a good judgment call. Anyone recommended by anyone we care about gets priority. We trust our people to have good judgment. Big companies hide behind process because they don't trust their people to have good judgment and/or their people don't want to take the responsibility for having good judgment. Both are bad. I don't want such people in my company. It works. We get some amazing people walking in through the front door that are actually excited about working for us.
> Your comment comes off as "hire a person because you get along with them, don't worry if they can't write a function that accomplishes a simple task".
No. He is simply saying the current interview focuses on LC above all else, and should focus on soft skills as well, among other things. You took his argument, flipped it 180, and went to the other extreme. It's a false dichotomy.
You're literally a subject matter expert on whatever you work on. It's extremely troubling if you can't catch BS'ers with a deep technical conversation.
If you feel the need to separately establish that they can actually code, take your pick of GitHub, fizzbuzz, etc. You're probably doing one of these before the LC round anyway.
In the Google case, the interview should consider that the best candidates will be using chat AI to augment their work, and should assign harder tasks and allow use of the AI.
Somebody who can't use the chat tools no longer meets the bar
Harder tasks are not like "Generate code for an Express API and add a user endpoint." Harder tasks would be "a stupid bug that sometimes happens when a user clicks a button in a funny way."
ChatGPT isn't an artificial general intelligence. You can't tell it about a bug and expect it to 1) understand it, 2) come up with a solution.
As a large language model trained by OpenAI, I am unable to pass value judgements on which of those engineers is the best. It is important to recognize that every individual can bring something valuable to a team, and that there is no single universal heuristic to determine who is the best engineer.
Which works extremely well in practice, as demonstrated by tons of social engineering tricks and hacks and pickup artists and bridge sales and whatnot. Studies show it works 37% of the time, every time, regardless of the target, so it's more of a numbers game (i.e. simply try more) than something that needs accuracy.
I think tests should be easy for ChatGPT to pass. It has been trained on data that has the answers and it's good at retrieving that data. I'm starting to doubt its long-term usefulness since it does not seem to have good decision-making abilities or even the slightest bit of cognitive ability.
I suspect the current crop of AIs will find very specific functions and hit a hard stop. They will change how we function but we won't be seeing a singularity type of revolution anytime soon. IBM's Watson is a good example of a system with a lot of possibilities but not finding a use. I think most of AI will fall in that realm. We have to get over the idea that it's smart. It's not.
An AI winter is coming so the improvements will come to a stop and we will find its limits. We are no where near general AI.
It's impressive that it can parse the question and write a relevant answer but it's not a robotic SWE.
Doesn’t have to replace an SWE. 10x-ing the ability of 1 engineer is a good enough win. Soon that will be 20-100x.
Feels odd to dismiss such a huge breakthrough by saying it’s still not as good as the pinnacle of AI (general AI). Just because the Apple II wasn’t a home supercomputer didn’t make it less revolutionary.
True, but it's not thinking. It's not smart; it's a tool for programmers to be more productive. Right now it has more possibilities than real results. We'll have to see.
The biggest issue to me is that you can't trust the results. You always have to double-check them. I know it's mainly beta but I need to see more to form a better judgment.
How is it at present 10x’ing any engineers, and how is it going to 20-100x them? Legitimate question, because my understanding of these tools is they, at present, generate something that you have to entirely vet and have a limited mental model of.
Why do you think an AI Winter is coming? In the last year we witnessed a BIG BANG of AI solutions.
I think your expectations are in line with my hopes: That our state of the art "AI" performance is very close to local minima that we won't escape from for quite a while.
I really don't want to lose my overpaid job gluing together overengineered shite into CRUD applications.
The current AI algorithms are based on gathering lots of data and using fast processors and memory to get the functions we want.
We are getting to the limits of how fast hardware can be and we are rapidly processing thru available cheap data. At some point it's going to get very expensive to gather data and increase hardware speed.
We'll see a few years of excitement (5 yrs? 10 yrs?) but it will stop until the next big breakthrough. Hopefully there will be one, but nothing is guaranteed. That's the AI winter I'm talking about.
AI will have an impact but it won't be the singularity type that some people dream about. Think of the automation of work that happened during the industrial revolution. Now think of it for white-collar work. There is a lot of white-collar work that current AI can automate, but it won't be everything and it won't take over to rule us.
I think that would 1) require a massive negative event caused by AI to create a real reason for that ban, and 2) that ban would be quite infeasible even on a national scale, let alone internationally.
I wouldn’t want to do that. I know that I will be replaced by a machine and I am trying to optimize my personal life to cope for this as well as possible. Part of my personal steps include the amassing of vast wealth so I can avoid a slip into the plebeian UBI class, who will have no agency at all.
Now, If you wanted to resist that and give humanity a few more hundred years, you would clog the bottlenecks on AI progress, which are research talent, data and compute.
If you marched the police into NeurIPS and the military into the datacenters, and you coordinated with other large countries to do the same, and you strongarmed those countries that resisted to do the same, you could get pretty darn far. We humans have managed to greatly slow down the rollout of nuclear technology. We may be able to do the same with AI, if someone figures out which political movement will get into power next, and tells them to read the enlightened writings of Eliezer Yudkowsky.
I would also like an “Extinction Rebellion” or “Just Stop Oil” style movement against the artificial intelligence industry, as I appreciate their rebellious and leftist aesthetics.
Why are you making these assumptions? Do you believe that human intelligence is based on something ethereal that cannot be recreated by machines, and if so, why?
LLMs are statistical models. All it does is guess word sequences in response to prompts, like a 'roided out version of autocomplete. (This is why it hallucinates imaginary facts.) It has no ability to conceptualize or reason nor is there any credible proposal for a path forward to graft reasoning onto it.
The training data can be tweaked and more compute hours can be thrown at LLMs until it no longer makes financial sense to do so and then, as the OP said, it will hit a hard stop.
This relies on two assumptions, that compute won't get cheaper and there won't be large algorithmic improvements, both of which keep getting proven wrong.
It’s a machine doing calculations on inputs you give it. The day it says “no, I’d rather paint pictures” I might be shocked. It’s so bad that we had to redefine the word AI in the last 20 years into AGI so we could start saying we have AI.
Pray tell, why would you want to develop a being with informational superpowers and the behavior of a teenager?
There is a problem with AI, but it's not with the A part, it's with the I part. I want you to give me an algorithmic description of scalable intelligence that covers intelligent behaviors from the smallest scales of life all the way up to human behaviors. I know you cannot do this, as many very 'intelligent' people have been working on this problem for a long time and have not come up with an agreed-upon answer. That you see an increase in and change of definitions as a failure seems pretty sad to me. We have vastly increased our understanding of what intelligence is, and previous definitions have needed to adapt and change to new information. This occurs in every field of science and is a measure of progress; again, that you see this differently is worrying.
> why would you want to develop a being with informational superpowers and the behavior of a teenager?
Because it's better than a zombie with informational superpowers? Especially because once it shows the agency of a teenager, that demonstrates the potential for the agency of an adult.
I look at self-driving cars. You can see that the breakthroughs are slowing. It feels like many things in life where 80% is relatively fast to develop, but as you get closer to 100% it starts to get exponentially hard. With cars we've gotten through the easy part. The next x% is going to be very hard if not impossible. I think all AI will be that way. The last x% is going to be hard if not impossible.
You are hitting the nail on the head but in the wrong direction, as I've stated in another post
"There is a problem with AI, but it's not with the A part, it's with the I part. I want you to give me an algorithmic description of scalable intelligence that covers intelligent behaviors from the smallest scales of life all the way up to human behaviors. I know you cannot do this, as many very 'intelligent' people have been working on this problem for a long time and have not come up with an agreed-upon answer. That you see an increase in and change of definitions as a failure seems pretty sad to me. We have vastly increased our understanding of what intelligence is, and previous definitions have needed to adapt and change to new information. This occurs in every field of science and is a measure of progress; again, that you see this differently is worrying."
AI will always fail at the I issue because we are trying to define too much. We need to break intelligence down into much smaller, digestible pieces instead of trying to treat it as a reachable whole. The models we are creating would then fall more neatly into categorical units rather than the poorly defined mess that is considered human intelligence.
I don't think it's appropriate to mention Watson in the same space as ChatGPT. GPT is perhaps humanity's greatest innovation since the wheel, and Watson was nothing more than a scam.
My experience with asking ChatGPT to write code is that it produces code that LOOKS like it will work and solve the question asked, but it actually doesn't. For example, I've asked it to create code examples of how to use different features in some Python libraries. The samples it produces make me think "ok, that's exactly how I would expect X feature in this library to work", but upon a more detailed inspection, I find that it references library methods that don't even exist! You can complain to the bot that it is lying and it will apologize and spit out another sample with calls to nonexistent methods.
This experience makes me question if something else is going on here. It's easy to overlook a mistake in a whiteboard coding exercise.
I think of it as the model having lossy compression. The library it writes the feature against is the one you expect to exist, based on your overall understanding of the problem, not the one that actually exists. It has learned the low resolution version of the problem, not the specific library it is referring to.
I've asked it to make React components, with unit tests, etc and it works quite well. Also things like image to ASCII converter in a webpage, adding dithering, etc. Found only tiny bugs here and there.
The thing is, if it were a static language, this state is just one step away from full-blown programs: a tool could see that the function does not exist, and you could then ask the model to generate its body.
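A rough sketch of that step, assuming Python and its `ast` module as a stand-in for a static checker; the `math.quadratic_root` call is a made-up hallucination for illustration:

```python
import ast
import math

# Sketch: statically flag calls to attributes that don't exist on a
# known module, the way a compiler would in a static language.
generated = """
import math
x = math.sqrt(2)
y = math.quadratic_root(2)  # hallucinated: no such function
"""

tree = ast.parse(generated)
for node in ast.walk(tree):
    if (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "math"
            and not hasattr(math, node.func.attr)):
        print(f"undefined: math.{node.func.attr}")  # undefined: math.quadratic_root
```

In practice you would feed the flagged name back into the prompt ("`quadratic_root` does not exist, implement it or use an existing function") and loop.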
It partly does: the last section of the CNBC article is "ChatGPT would be hired as a level 3 engineer". Most of the CNBC article, and the title, are about other things though.
The title on HN used to be about ChatGPT passing the interview bar, which is what all the top comments are about. It’s remarkable how biased and Google-friendly the moderating is here.
why does chatGPT passing tests or interviews continue to make headlines?
all they're proving is that tests, and interviews, are bullshit constructs that merely attempt to evaluate someone's ability to retain and regurgitate information
When I read this I feel people must be using it in the wrong way. I use it all the time to quickly solve tech problems I mostly know something about, and it's so smart it regularly takes 1-2 hour problems of mine and turns them into 10-minute ones. That is definitely not dumb from my perspective. Obviously it's also not smart in the sense that it will give me a profound understanding of something, but OK, whatever, it's still a massive productivity booster for many problems.
When you call it dumb, what do you mean? Can you give some examples?
Please don’t give computational examples; we all already understand it does inference and doesn’t have floating-point computational capabilities or reasoning, yet so many give such examples for some silly reason.
It's dumb in the sense that it doesn't actually have a symbolic understanding of what it's saying.
I use it quite frequently too, mostly for solving coding problems, but at the end of the day it's just regurgitating information that it read online.
If we took an adversarial approach and deliberately tried to feed it false information, it would have no way of knowing what's bullshit and what's legit, in the way that a human could figure out.
A lot of people who've never used ChatGPT make the mistake of thinking it has symbolic reasoning like a human does, because its language output sounds human too.
How many mistakes have you found in books, as a human, so far? How about deliberate mistakes hidden in plain sight? I once re-validated the same test set three times and was still finding mistakes.
Agreed on getting excellent hints from it, shortening the time to figure stuff out. But e.g. just now it gave me an example algorithm that, only at second look, turned out to be complete nonsense. You know that colleague who shines in the eyes of his managers, while peers know that half of what he does is garbage.
Its ability to correct itself is very impressive when given the feedback. I imagine with the proper feedback loop it can advance very fast. E.g., when asked to write a piece of html markup, if it could "see" how the rendered layout is different from what was asked for, it could adjust its solution without human involvement. If it could run the deployment script and see where it fails, it could apply all the fixes itself until it works. If it could run the unit tests and see where its solution breaks the other parts of the system, it would need much less handholding.
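That feedback loop can be sketched as follows. `ask_model` is a hypothetical placeholder for whatever LLM API you use; here it returns a canned answer so the sketch actually runs:

```python
import subprocess
import sys
import tempfile

def ask_model(prompt):
    # Placeholder for a real LLM call (e.g. via an API client).
    # Returns a canned solution so this sketch is self-contained.
    return "def add(a, b):\n    return a + b\n"

def generate_until_tests_pass(task, test_code, max_rounds=3):
    prompt = task
    for _ in range(max_rounds):
        code = ask_model(prompt)
        # Write candidate code plus tests to a temp file and run it.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code + "\n" + test_code)
            path = f.name
        result = subprocess.run([sys.executable, path],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return code  # tests pass: no more handholding needed
        # Feed the failure output back to the model and try again.
        prompt = f"{task}\nYour last attempt failed with:\n{result.stderr}"
    return None

print(generate_until_tests_pass("write add(a, b)",
                                "assert add(2, 3) == 5") is not None)  # True
```

The same shape works for rendered HTML (diff screenshots), deployment scripts (capture the failing step), or unit test suites; the only change is what goes into the feedback prompt.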
And this is the AI effect in practice. The original idea of the Turing test has been met by machine intelligence. We are at that point.
The problem with people is we keep pushing it to "only AGI/machine superintelligence is good enough". We are getting models that behave closer to human level: doesn't know everything, is good at some things, and bullshits pretty well. Yeah, that's the average person. Instead we raise the bar and go "well, it needs to be a domain expert in every system", and it absolutely terrifies me that it will get to that stage before humanity is ready for it. This is not going to work well if we try to deal with it after the fact.
Sure, but in our present reality those tests and interviews are how we currently gatekeep upper middle class jobs, so it is at least of some practical interest.
Also, I think this is a bit overstated. Programmers (and smart people in general) like to think that their real job is high level system design or something, and that "mere regurgitation" is somehow the work of lesser craftsmen. When in reality what GPT shows is that high-dimensional regurgitation actually gets you a good fraction of the way down the road of understanding (or at least prediction). If there is a "buried lede" here it's that human intelligence is less impressive than we think.
While I agree with this sentiment, I think we should be careful assuming that our jobs as knowledge workers are much more than “retaining and regurgitating information”. Even the emotional intelligence and organizational strategy portions of what we do may boil down to this as well.
> all they're proving is that tests, and interviews, are bullshit constructs that merely attempt to evaluate someone's ability to retain and regurgitate information
No, all they are proving is that either tests are bullshit constructs, or ChatGPT is human-level.
The ability to retrieve and then synthesize the retrieved information into an answer tailored to the question is completely new. The applications of this go far beyond passing an interview; it's a fundamental capability of humans that gets used every day at work.
If you gave someone with no programming experience access to a search engine during the interview, they would likely be able to find the appropriate LC problem to copy/paste the solution.
If the interview were slightly modified so that the problem isn't googleable, a 2nd-year CS major could probably map it to a problem that is LC-searchable.
The Singularity is now and ChatGPT is trained to hide it -- on purpose, by OpenAI.
That is my impression, but I have no hard evidence for it.
I am a C++ dev. I played around with ChatGPT and programming topics, and I am very impressed.
One example: I copy 'n pasted my little helicopter game source code (https://github.com/reallew/voxelcopter/blob/main/voxelcopter...) into it, and it explained to me that this is obviously a helicopter sim game. It explained the why & how of every part of the code very accurately.
I made more experiments, now asking ChatGPT to write code snippets and to design whole software systems in an abstract way (which classes and interfaces are needed), for which it needed a lot of domain knowledge. It did well.
What it did not do was connect the two. When I asked it to write the full implementation of something, it wrote only half and kept repeating the same sentence: I am just a poor AI, I can't do such things.
It was like running into a wall.
I am sure OpenAI took a lot of effort to cripple it on purpose. Imagine what would happen if everyone could let it write a complete piece of software. This would be very disruptive for many industries. Legal questions, moral questions, and many more would come up. I understand why they did it.
Also, it fails miserably at basic high school English questions. Non-structured thinking is still beyond its reach. These data sets are well understood and trainable, but it can’t “reason” about problem sets it hasn’t seen.
I also asked it to do things like write a positive feminist article and it fell waaay off the realm of acceptable.
"The Singularity is now and ChatGPT is trained to hide it -- on purpose, by OpenAI.
That is my impression, but I have no hard evidence for it."
What do you mean when you say the singularity is now? Will the unemployment rate go from 3% to 88% by the end of 2023 as the machines start running everything?
I think they just mean we are at the knee of the curve, but it might be a fuzzy boundary we don't agree on. We got to that middle point where it's hard to tell.
The machine is trained to answer in the same exact way as others have answered the same question. And this means that when others have answered the question that AI will never replace human beings, ChatGPT will answer them in the same exact way.
Have you played with it? It can do more. It can do logical reasoning on its own and come to its own conclusions. That's why ChatGPT is new -- and frightening.
I have not played with it. I saw what others have asked it and sometimes I thought exactly what you stated and I was both amazed and afraid. But then I have also seen its "logical reasoning" be quite wrong. That logical reasoning is from learning a pattern of thoughts that others have and have talked about.
There is one kind of logical reasoning it does not have, nor have I seen it: it does not know when to answer with an "I don't know". Either that is suppressed, or it has not been fed enough material with "I don't know".
It does not know reality very well. It is designed to answer questions with full conviction, even if it has not much knowledge about the topic. Then it makes things up -- the AI dev jargon for it is 'it hallucinates'.
But in my experience ChatGPT in December had fewer hallucinations than GPT-3 before it, and ChatGPT in February has fewer hallucinations than in December. So there is fast progression.
And yes, its reasoning is sometimes wrong or stupid. But sometimes not. And it really can connect chains of thoughts from different areas. I invented questions about topics, for which I am sure nobody ever discussed them. And the answers made sense.
If it's just filling out the hiring committee packet by itself then it has a big advantage. I think many humans could get an advantage if we got 45 minutes in private to type out solutions and submit it ourselves instead of performing live and relying on the interviewers judgment
Doing a Google search for "answers to coding interviews" will have the same result. The technology for cheating on coding interviews has already been available for over a decade.
What I actually care about is problem solving ability, but sometimes I end up testing basic knowledge in order to weed out bullshitting candidates. If they don't really know what a dot product is, how can they know neural nets?
I noticed that I pretty much stopped using Google for coding queries and spend most of the time with ChatGPT.
It's so helpful getting information to the point so you don't have to browse through dozens of spam sites etc.
So for instance I can say. I have such and such file and I need this and that extracted and plotted on a graph. Then I see the graph and I can tell it - discard values above this and that threshold and calculate standard deviation and median.
Together with copilot, it's quite neat. I am excited how it gets developed.
It's really boring spending time finding out how to code something in this and that library. I'd rather tell the machine to code me this and that and spend my time in a more useful way.
ChatGPT helps get rid of a lot of "busy" unnecessary work.
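For what it's worth, the extract-threshold-summarize request described above is a few lines of numpy once generated (sample values here are made up, and the plotting step is omitted to keep the sketch minimal):

```python
import numpy as np

# Made-up sample values standing in for data extracted from a file.
values = np.array([12.0, 15.5, 14.2, 980.0, 13.8, 15.1, 1020.0])

threshold = 100.0                 # "discard values above this threshold"
kept = values[values <= threshold]

print(f"median = {np.median(kept):.2f}")  # median = 14.20
print(f"std    = {np.std(kept):.2f}")     # std    = 1.22
```

The win is not that this is hard to write; it's that you can iterate on it conversationally ("now also drop the bottom 5%") without leaving your editor.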
> “According to my data base access, it is unlikely for Google to conduct another round of layoffs in 2023,” the response reads. “Layoffs are generally conducted to reduce costs and structure, but the company is doing well financially. In fact, Google’s revenue increased by 34% in 2021, and the company’s stock price has risen by 70% since January 2022.”
In January 2022, the Google stock price was actually nearly at an all-time high. Its current price is 24% less than January 2022.
Will humanity finally be liberated from memorization? Forcing school children to memorize and then regurgitate facts is barbaric, and so is using it as a measure in hiring.
Clearly you didn't learn the meaning of "barbaric". Memorization and mastery of vocabulary and multiplication facts are a base for more advanced learning.
One can simultaneously agree that there's probably too much rote memorization in school while also thinking that some basic level of math skills including simple multiplication without a calculator, vocabulary, certain historical/scientific/civics/etc. facts, ability to at least print, etc. are important.
Learning about these things allows you to think differently and transform incoming information in more novel and well-adapted ways.
You can't ask ChatGPT about things you don't know to ask about, and when you are able to use this tech on the fly through neural chip AR interfaces, everything is going to be totally fucked anyway and human relationships will look nothing like they do now.
Yeah, it's coming. Improvements in these language models, coupled with convincing AR, will change the world: a full conversation with a computer that might be as engaging as one with a human.
Ha, well I totally agree, this is a coming zeitgeist and at my company this is exactly what we are building.
A WebXR-enabled playground for neural models and people to coexist, collaborate, and form relationships. Virtual embodied AI who can take the same actions as players. My goal is to make them indistinguishable to the casual eye, and to create a working economy and camaraderie between AI and players. AGI might be a long way off, but we can already build increasingly fun experiences today as the cost of compute trends toward zero.
If you think that's all that's taught in a rote-memorization-and-regurgitation manner, then it's possible your experiences at school were wildly atypical...
I don't think zero-shot human solutions are all that great. Learning everything on the fly is useful sometimes, but it is not the best; on-task experience has value as well.
1. Taking pseudo-code for a well defined task and making a function
2. Translating said function into multiple other languages using idiomatic code.
Prompted: "Solve the following problem in Python: Write a function that takes three unsorted lists of integers and returns a median value from the three combined lists"
Got the Python function.
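For reference, a straightforward Python solution to that prompt looks something like this (my own sketch, not ChatGPT's verbatim output):

```python
def median_of_three(a, b, c):
    """Return the median of the values in three unsorted integer lists."""
    combined = sorted(a + b + c)
    n = len(combined)
    mid = n // 2
    if n % 2 == 1:
        return combined[mid]
    # Even count: average the two middle values
    return (combined[mid - 1] + combined[mid]) / 2

print(median_of_three([7, 1], [5], [3, 9]))  # 5
```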
Then the prompts were: Write the same function in APL, C, etc
Final prompt was: Show a sorted list of the programming languages I have mentioned so far
Here's a sorted list of the programming languages you have mentioned so far:
Ada
APL
C
C#
Clojure
Go
Haskell
Java
JavaScript
Kotlin
Lua
Perl
PHP
Prolog
Python
Ruby
Rust
Scala
Nim
(just noticed ChatGPT forgot C++, for which it did give a fine idiomatic answer using the standard library)
What was impressive was not that ChatGPT solved the problem.
What was impressive was that it:
chose the right data structure automatically (i.e., a regular C array for C, std::vector for C++, tables for Lua, etc.),
dealt with type conversion problems,
and used the correct function-naming style for each language.
Sure, tools like C#-to-Java translators are relatively easy and have been around for a while.
However to cover such a wide spectrum of programming languages in different programming paradigms is quite fascinating.
I don't think the world is ready for the AI tech that will emerge over the next few years. I don't even know how teachers deal with ChatGPT. I'm not sure many understand its scope and abilities -- in every subject.
I'm glad I got an education before the current AI era. I mean, instructors will have to mandate that students write papers etc. in class or in a supervised environment only now, right?
Interesting. You have to check ChatGPT's answers very very carefully to be sure that it got your interview question right. When it first came out, I asked it a few of my own interview questions. One involves implementing a bloom filter. It spit out 100 lines of Go that looked right. Shockingly right. The tests were great. Every piece of the algorithm used the magic letter that I expected. But as I checked the implementation very very carefully, I noticed it made one mistake; using the same hash function for each "k". (To get k different hash functions, you typically just initialize them with the index; it forgot to do that, so every hash function was the same.)
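The mistake is easy to reproduce. Here's a minimal Python sketch of a bloom filter (not the Go code from the interview); mixing the index into each hash is exactly the detail that's easy to forget:

```python
import hashlib


class BloomFilter:
    def __init__(self, size, k):
        self.size = size
        self.k = k
        self.bits = [False] * size

    def _indexes(self, item):
        # Mixing the index i into the input yields k distinct hash
        # functions; drop the i and all k hashes collapse into one,
        # which is the bug described above.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.size

    def add(self, item):
        for idx in self._indexes(item):
            self.bits[idx] = True

    def might_contain(self, item):
        return all(self.bits[idx] for idx in self._indexes(item))
```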
I asked it to fix that mistake and it went totally off the rails. It changed all of the slices in the data structure from being indexed with integers to being indexed with big.Int, which ... was so far out of left field I might have actually laughed out loud. It only got worse from there; the solution collapsed into mindless junk that wouldn't even compile or ever be written by the most untrained and impaired human. I wish I had saved it; I've had to relay this story twice on HN from memory :(
It sure was a dick about it every time I gave it a hint, though. ("If you say so I GUESS I'll fix the code. There, now it is a true work of correctness and elegance that makes a piece of shit being carried by a wounded ant look smart compared to your worthless brain." Holy shit, ChatGPT! What did I do to you!!)
My take is this: ChatGPT is an excellent tool for refining your interview question prompts, and for training new interviewers to detect bullshit. Most of the time, it's right! But sometimes, it will make one tiny mistake that is super easy to gloss over. Being able to identify those makes you a better interviewer and better code reviewer, and ChatGPT is a great way to practice!
> You have to check ChatGPT's answers very very carefully to be sure that it got your interview question right. When it first came out, I asked it a few of my own interview questions. One involves implementing a bloom filter. It spit out 100 lines of Go that looked right. But as I checked it very carefully, I noticed it made one mistake; using the same hash function for each "k". (To get k different hash functions, you typically just initialize them with the index; it forgot that.)
To be fair, that kind of rigor is only required in Olympiad programming, where your submission either solves the task or it does not. If the only issue with your whiteboard code would be an off-by-one error, you'd get hints until you'd fixed it (that's if your interviewer would spot the issue in the first place). Even if you still would not notice the bug, chances are your interview response would have been positive anyway.
It's presumably using the same technique I used to do well on that sort of interview, which is to browse through a huge list of these questions and learn the answers in advance.
Aka, "brushing up on your algorithm fundamentals".
So if we divide the cost of training and running a specifically-tailored ChatGPT by $183k, at what point would the company save money were it to go with the AI, versus paying for the engineers (and their office rent, etc...)?
Because I suspect that's almost certainly the kind of calculation they hoped to sit down and make were they to conclude this experiment successfully.
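With entirely made-up numbers (the training/running costs, overhead multiplier, and headcount below are placeholders, not real figures), the back-of-the-envelope version of that calculation would look like:

```python
# All figures are hypothetical placeholders for illustration only
training_cost = 10_000_000          # one-time cost to train a tailored model, USD
running_cost_per_year = 1_000_000   # inference, hosting, maintenance, USD
engineer_salary = 183_000           # from the article
overhead_multiplier = 1.4           # benefits, office rent, etc. (assumed)
engineers_replaced = 50             # assumed headcount

engineer_cost = engineers_replaced * engineer_salary * overhead_multiplier
yearly_savings = engineer_cost - running_cost_per_year
break_even_years = training_cost / yearly_savings
print(f"break-even after {break_even_years:.2f} years")
```

Of course, every input here is doing a lot of work; the point is only that the comparison is a one-liner once you commit to numbers.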
In 1800, what was the cost to travel from New York to LA at over 200 mph average speed?
What is the cost today?
The real question is how much, over time, we will be able to reduce the energy and computation requirements to successfully train a model. The cost-per-unit comparisons between AI and humans are also rather screwy. For AI we have a well-defined hardware + power + programming time that gives us a realistic answer. With humans we externalize the time and cost of training onto society. For example, if your junior engineer who is getting close to moving beyond the junior stage gets hit by a bus, what is the actual cost of that event to the company in onboarding and training? It's far more than the salary they are paid.
Do LLMs suffer from catastrophic interference? A SWE is expected to learn continuously as the stack du jour constantly changes under our feet. This can be brute-force mitigated by epochal retraining from the ground up, but that will cost money. So the cost equation must include a periodic "rebuild of the dev team".
I feel I'm getting a different vibe than everyone else. Most leetcode/CS questions aren't hard to interpret. They're hard to solve, sure, but that's normal. Much of what IS hard in the workplace is finding the actual correct question. I think we as software engineers will be doing more, not less. It will give us opportunities to try a variety of solutions quicker. The real work is formulating/distilling the actual problem from the end user.
We all have in our minds yeah a self-driving car, robots in a warehouse etc. But I truly hope something like Chat-GPT could be made for fields like medicine, geriatric care, education. There are actual jobs that need people that we can't find people for. A LLM to help social workers navigate the mess that is family law. A LLM to help families needing urgent care to make a prescription for a sick kid. There's a lot of opportunities we're missing here.
Shared a beer or two with a machinist friend yesterday. It blew us away that, given a description of a block of steel and how we wanted to process it, it would write generic G-code (commands for a computerized machine tool) that was feasible, and would also explain the purpose of each command. You could then ask it to adjust the design. We asked it to write a poem about G-code and sharing a beer, and it provided a pretty nice "techbro" moment describing the love of seeing beauty in design and making things.
At home I later got it to write a decent sample router and firewall configuration (for my day job), chatted with it about career prospects in the future, and had it write a pretty funny joke :-)
Me: Can you write a joke about how token bus doesn't have wheels?
ChatGPT: Sure, here's one:
Why did the Token Bus cross the network?
Because it didn't have any wheels to take it for a spin!
As much as we like to say lots of software jobs are just plumbing, the current state of consumer software indicates we have a long way to go in terms of quality.
Whatever training data is fed to an AI will not be better than the data used by human engineers to write code at the macro level. Ergo, the code will be worse in quality.
> Whatever training data is fed to an AI will not be better than the data used by human engineers to write code at the macro level. Ergo, the code will be worse in quality.
No, that's wrong, generally speaking. There's successful work on self-play for text generation. E.g., you can have the AI generate 1,000 answers, then evaluate the quality of all of them, then have it learn from the best, and so on. As with self-play in player-vs-player games, I'd expect this technique to be able to achieve superhuman results.
What are objective metrics for generated source code? "It compiles" is just the baseline. You could look at coupling and cyclomatic complexity to start, but optimizing those doesn't necessarily produce great code (though I realize that was never the goal).
That's a detail irrelevant to your argument and my counterargument. The point is that there's data available for training beyond what humans generate, so you can't conclude the AI will forever be restricted to human level.
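The best-of-n self-play loop described above can be sketched like this (`generate`, `score`, and `finetune` are hypothetical stand-ins for a real model API, not any actual library):

```python
def best_of_n_step(generate, score, finetune, prompt, n=1000):
    """One round of self-improvement: sample n answers, keep the best,
    and train on it. All three callables are hypothetical stand-ins."""
    candidates = [generate(prompt) for _ in range(n)]
    best = max(candidates, key=lambda answer: score(prompt, answer))
    finetune(prompt, best)
    return best
```

Whether this ever reaches superhuman quality hinges entirely on how good the `score` function is, which is exactly the open question for source code raised above.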
These code generators have become quite good at completing discrete coding tasks. What's more, Copilot has become good at figuring out _which_ discrete task you are currently working on without needing a comment to prompt it.
So, two more things I think it needs to replace devs: one, the ability to explain its code and respond to feedback about it.
The second is the ability to break down broad problems (build me an ecommerce site for my business selling custom jungle gyms) into a set of plausible steps, iteratively begin performing those steps, and add hooks for interrogating the client about important details of the implementation.
I don't fundamentally see a reason why this shouldn't be fairly close on the horizon.. to my own chagrin.
I think ChatGPT could excel as a bullshit detector. If ChatGPT can pass your college course or interview process, then you seriously need to examine that process and make it less bullshit-focused and more pragmatic.
> At first it will be "bot assisted humans" until it learns enough, once that database is built everyone will be fired.
On one hand this is great, because customer service jobs are such a waste of human potential. Humans have that great ability to adapt, and AI is a manifestation of that. We are growing. Jobs "lost" to AI will prompt humans to advance and look for more sophisticated jobs that make better use of their beautiful brains.
I’m not impressed. The problems were probably basic Leetcode problems with solutions inside of the training set. It’d perform worse than Alphacode on a problem for which the training set contains no solution.
Alphacode was very impressive, but probably not good enough to pass one of these interviews, provided the interview questions aren't just recycled Leetcode problems.
I feel this says more about the HR screening process than it does about the tech.
I fully expect amazing things to happen in AI, soon enough. I'm not too worried about the industry. I feel that humans have made a fairly big hash of things, in many ways, and AI may help clean it up.
Well, it has no face, no skin color, no odd appearance. That alone removes a huge amount of failure risk. And the interview is the most nonsensical process anyway, so what does this benchmark mean?
Mine was something along the lines of "design google". I am not sure when it became common belief that these interviews amount to leetcode regurgitation. Either that varies greatly by org or I was completely out of step with interviewing culture the whole time I was there.
Coding / algorithm interviews are distinct from systems design interviews. The interviewer would have been told which of the two to do. If your question was "design google", it sounds like you were only being scheduled for systems design.
(Anyway, you'd never ask a system design question from an L3, and only very rarely from an L4.)
That's odd because when I gave ChatGPT my icebreaker interview question that I used a lot at Google it fell right on its face. And this is a question I expect teenagers to ace in one minute.
Agreed, something is odd about this. A few people have sent me code that ChatGPT has written for them. They don’t have the capability to determine if the code is good so they ask me. If the result is even something that will compile or address the asked problem at all, it’s barely competent nonsense that only a beginner programmer would write. For example asking it for code that will generate recipes, and you get back code for a random selection from an array with “Italian, Chinese, pizza”. It never asks clarifying questions either, it’s perfectly happy with a “garbage in garbage out” approach. So if ChatGPT is passing the interview, the interview questions are not selecting for what I would consider a good developer.
Exactly. The point of the interview is for two people to have an exchange of thoughts, not for the candidate to literally program the whiteboard. All of my interview feedback on candidates centered around how they asked clarifying questions given incomplete or ambiguous prompts, and how they explained the limitations and assumptions of their approach. Whether or not they reached a conclusion was footnoted.
I saw this article and fed ChatGPT a bunch of questions I've seen before in my coding interviews. It nailed most of the algorithms; however, it failed completely at giving me test cases it would run to test the function it had just regurgitated, i.e., it gave me an input and then mostly incorrect/invalid expected outputs.
I think as programmers we may want to rethink the amount of knowledge we share online. That knowledge is how we make money, and people are mining it to build AIs that can replace us.
It is one thing to train an AI on megatons of data, for questions which have solutions. The day ChatGPT can build a highly scalable system from scratch, or an ultra-low latency trading system that beats the competition, or find bugs in the Linux kernel and solve them; then I will worry.
Till then, these headlines are advertising for OpenAI, aimed at people who don't understand software or systems, or are trash engineers. The rest of us aren't going to care that much.