So, an LLM, trained extensively on StackOverflow and other data (possibly the plethora of LC solutions out there), is fed a bunch of LC questions and spits out the correct solutions? In other news, water is blue.
It is one thing to train an AI on megatons of data, for questions which have solutions. The day ChatGPT can build a highly scalable system from scratch, or an ultra-low latency trading system that beats the competition, or find bugs in the Linux kernel and solve them; then I will worry.
Till then, these headlines are advertising for OpenAI, aimed at people who don't understand software or systems, or are trash engineers. The rest of us aren't going to care that much.
If it helps, this likely is coming. I think we have a tendency to mentally move the goalposts when it comes to this kind of thing as a self-defense mechanism. Years ago this would have been a similar level of impossibility.
Since a codebase like that is essentially a kind of directed graph, augmentations to the network's processing that allow for simultaneously parsing and generating this kind of code may not be as far off as you think.
I say this as an ML researcher coming up on six years of experience on the heavily technical side of the field. Strong skepticism is an easy way to project confidence and the appearance of knowledge, but it also carries the downfall we've seen in certain past technological revolutions -- and the threat is very much real here (in contrast to the group that believes you can get AGI by simply scaling LLMs, which I think is very silly indeed).
Thank you for your comment and the discussion it generated; replying to it was fun.
I've worked in ML for a while (on the MLOps side of things) and have been in the industry for a bit, and one thing I think is extremely common is for ML researchers to grossly underestimate the amount of work needed to make improvements. We've been a year away from full self-driving cars for the last six years, and it seems like people are getting more cautious in their timing around that instead of more optimistic. AI-driven robotic manufacturing was supposedly going to supplant human labor and speed up manufacturing in every segment from product creation to warehousing, but Amazon warehouses are still full of people, not robots.
What I've seen again and again from people in the field is a gross underestimation of the long tail on these problems. They see the rapid results on the easier end and think it will translate into continued progress, but the reality is that every order-of-magnitude improvement takes the same amount of effort or more.
On top of that there is a massive amount of subsidies that go into training these models. Companies are throwing millions of dollars into training individual models. The cost here seems to be going up, not down, as these improvements are made.
I also think, to be honest, that machine learning researchers tend to simplify problems more than is reasonable. This conversation started with "highly scalable system from scratch, or an ultra-low latency trading system that beats the competition" and turned into "the parsing of and generation of this kind of code"- which is in many ways a much simpler problem than what op proposed. I've seen this in radiology, robotics, and self driving as well.
Kind of a tangent, but one of the things I do love about the ML industry is the companies who recognize what I mentioned above and work around it. The companies that are going to do the best, in my extremely biased opinion, are the ones that use AI to augment experts rather than try to replace them. A lot of the coding AI companies are doing this, there are AI driving companies that focus on safety features rather than driver replacement, and a company I used to work for (Rad AI) took that philosophy to radiology. Keeping experts in the loop means that the long tail isn't as important and you can stop before perfection, while replacing experts altogether is going to have a much higher bar and cost.
This is a bit like seeing Steve Mann's wearable computers over the years ( https://cdn.betakit.com/wp-content/uploads/2013/08/Wearcompe... ) and then today anyone with a smartphone and smart watch has more computing power and more features than most of his gear ever had, apart from the head mounted screen. More processing power, more memory, more storage, more face recognition, more motion sensing, more GPS, longer runtime on battery, more bandwidth and connectivity to e.g. mapping, more assistants like Google Now and Siri.
And we still aren't at a level where you can be doing a physical task like replacing a laptop screen and have your device record what you're doing, with voice prompts for when you complete different stages; have it add markers to the recording; track objects in the scene and answer questions like 'where did that longer screw go?' or 'where did this part come from?' and jump to the point in the video where you took that part out. Nor replay the video backwards as an aide-mémoire for reassembly. Or do that outside for something like garage or car work, or have it control and direct lighting on some kind of robot arm to help you see, or have it listen to the sound of your bike gears rattling as you tune them and tell you, or show you on a graph, when it identifies the least rattle.
Anything a human assistant could easily do, we're still at the level of 'set a reminder' or 'add to calendar' rather than 'help me through this unfamiliar task'.
Wow - Steve Mann - haven't checked what he's doing in ages - real blast from the past :-) I was really disappointed the AR/VR company he was with went under - I had really high hopes for it.
RE: changing your laptop screen. My buddy wants 'AR for Electronics' that can zoom in on components like a magnifying glass (he wants it head-mounted), identify components by marking/color/etc., and call up schematics on demand. So far, nothing seems able to do even that basic level of work.
It really depends on what you're talking about. Individual components can often be automated fairly successfully, but the actual assembly of the components is much harder. Even in areas of manufacturing where it's automated you have to do massive amounts of work to get it to that point, and any changes can result in major downtime or retooling.
AI companies such as Vicarious have been promising AI that makes this easier. Their idea was that generic robots with the right grips and sensors can be configured to work on a variety of assembly lines. This way a factory can be retooled between jobs quicker and with less cost.
Look up lights-out manufacturing. There are factories that often run whole days in the dark, because there's no point turning on the lights if no one is around.
Not really. Although running CNC milling machines and lathes unattended at night is reasonably common. Day shift sets them up, and they cut metal all night.
Fanuc, the robot manufacturer, famously does run a lights-out factory, and has since 2001. It was the dream of Fanuc's founder. Baosteel now has a lights-out steel coiling facility. Both of these are more PR than cost effective.
There are many factories where there are very, very few people for large rooms full of machines, though.
You have just described Pareto's principle[0], the 80/20 rule. It takes 20% of the effort to get to 80%, but it then takes 80% of the effort to finish the final 20%.
Ah, the good ol "A(G)I will arrive in 10 years!" --For the past 50+ years, basically.
It's a cautionary tale for people working in ML not to be too optimistic about "the future", but in my opinion being cautiously optimistic (not about AGI, though) isn't harmful by itself, and I stand by that. Well, at least until we hit the next wall and plunge everyone into another AI winter (fourth? fifth?) again.
As a plus, we do actually see some good progress that has benefited the world, like in biotech. Even though we are still mostly throwing random stuff at ML to see if it works. Time will tell, I guess.
Kurzweil gets a lot of flak for this sort of thing; he's generally presented as the ridiculous hype man for AI. And yet he bet in 2002 that an AI would pass the Turing test by 2029. (This is actually a more conservative prediction than "we will have AGI by 2029.") And looking at GPT-3, it seems like he is probably going to win that bet.
I think the big revolution of the last few years has been to recognize that we'll likely get robots that can pass the turing test well before we get full self driving vehicles that can run anywhere there are basically ordinary paved roads.
I think even three years ago, most people would have thought the reverse.
So Kurzweil was imagining the turing test as the capstone to a decade of more and more capable ai products, not as "kind of early interesting success that may (or may not) presage really useful AI."
("The Turing test" is a pretty hazy target. I have no doubt that a chatgpt that was not trained to loudly announce that it was an AI could convince lots of people that it's a real human, right now. I think it's also the case that people with some experience with it could pretty quickly find ways to tell what it is.)
The Turing test has always been hazy - I don't think it's something we'll consider "passed" until at least a clear majority consider it passed (if not substantially further).
Otherwise you risk claiming ELIZA passed it, because a couple people thought so. Or that one Google employee this time.
Yes, that's what I was trying to say in the last paragraph. The Turing Test was an interesting thought experiment, not, like, an actual test. It's never been very clear how to operationalize it, and it's clear that Turing wasn't imagining how easily you can actively fool people. He was more making a point that we don't have an internal definition of intelligence -- it's not like multiplication where you can examine the underlying process and say, "Well, did it do this correctly?" You can only look at the results.
Good point, I appreciate this comment; thanks for adding it. It is interesting how it very much appears that he will be correct, but in a different way than most of us would reasonably have guessed at the time.
Working out the engineering challenges will probably take an extra decade, but I wouldn't listen to the ML researchers' opinions on this issue; the evidence that they are in the driver's seat is shaky. We're still seeing exponential gains in processing power, and we're closing in on having, in silicon, processing power within an order of magnitude of a human brain. There is a pretty decent chance that there is some magic threshold around there where all these tasks become easy with current algorithms.
I can understand that. I think that might be somewhat of a quick generalization. There are tendencies of people in the field to sometimes jump to rapid conclusions, but that is not researchers at all or in this case, me. I tend to be incredibly conservative, for example, and I have tangled with a number of "real world" systems enough to know some of the intricacies (though not at the edge).
If I were to make a point as to why your notes on self-driving cars and in-warehouse robots may not transfer to the case of software development, it's that they are fundamentally two very different problems with very different issues attached to them. It unfortunately is very much apples to oranges. They are both NP-hard but very different kinds of NP-hard.
A software program is a closed-loop target, though it is NP-hard. But we're optimizing for a different kind of metric here that is well-defined. Any kind of self-directed reinforcement-or-otherwise autoregressive-in-the-world algorithm is going to have an extraordinarily long tail of edge cases.
What I was talking about when I mentioned the geometry of the problem is not the parsing of the code, but the geometry of a near-optimal solution. Certainly, scale will be expensive, but Sutton is our friend here. That's why it's more "trivial" than problems that require humans in the loop -- you don't need humans to parse, structure, generate, and evaluate the data flow of a software codebase, though admittedly, if methods like RLHF become popular as you noted, the endpoints that generate code under those geometric constraints will become extremely expensive.
I think the geometric problem is very hard but the hurdle of scaled language models is more technically impressive to me.
What's nice is that, unlike needing to generate a long, 1D story, there's more robustness here: a huge field of possibility that's had years of work on the software side of things. It's not that it's going to be easy, but we've all grown as we've seen how hard self-driving cars are, and this just isn't that kind of scenario, since all consequences of the 'world' in the repo-generation case are (for the most part) self-contained.
I hope that helps elucidate the problems a bit. For me, optimism is much rarer, and comes generally only when I feel I have a solid enough grasp of the fundamentals (i.e. I roughly know deliverability and have decent known error bounds on the sub-problems).
That said, I heartily agree with you that when all else fails -- assistive is good. What I see a "complete solution" doing well is creating a Kolmogorov-minimal, complete starting point and things evolving from there. Whether that works or not remains to be seen.
I don't think ChatGPT or its successors will be able to do large-scale software development, defined as 'translating complex business requirements into code', but the actual act of programming will become more one of using ML tools to create functions, and writing code to link them together with business logic. It'll still be programming, but it will just start at a higher level, and a single programmer will be vastly more productive.
Which, of course, is what we've always done; modern programming, with its full-featured IDEs, high level languages, and feature-rich third-party libraries is mostly about gluing together things that already exist. We've already abstracted away 99% of programming over the last 40 years or so, allowing a single programmer today to build something in a weekend that would have taken a building full of programmers years to build in the 1980s. The difference is, of course, this is going to happen fairly quickly and bring about an upheaval in the software industry to the detriment of a lot of people.
And of course, this doesn't include the possibility of AGI; I think we're a very long way from that, but once it happens, any job doing anything with information is instantly obsolete forever.
That's my assumption as well - the human programmers will be far more productive, but they'll still be required because there's no way we can take the guard rails off and let the AI build - it'll build wrong unit tests for wrong functions which create wrong programs, and it will require humans to get it back on track.
I think it is really hard to say where all this goes right now when we currently don't even have good quantitative reasoning.
10 years ago we were still working on MNIST prediction accuracy. 10 years forward from here all bets are off. If the model has super human quantitative reasoning and a mastery of language I am not sure how much programming we will be doing compared to moving to a higher level of abstraction.
On the other hand, I think there will be so many new software jobs because of the volume of software built over the next 20 years. The volume of software built over the next 20 years is probably unimaginable sitting where we are.
I don't think anyone can say what's going to happen in 10 years, but what I do know is if you look back people have been saying programmers will be obsolete in 10 years for way longer than a decade.
I could see IDEs for AI, where you manipulate ways to input prompts (natural language, weighted keywords, audio...) and select among methods (ChatGPT, whatever model comes along for diagrams, visual models, audio ones...). Then basically visually program outputs, add tests you want to use to validate and feed back, multimodal output views...
I think you’re right in one sense, and we both agree LLMs are not sufficient. I think they are definitely the death knell for the junior python developer that slaps together common APIs by googling the answers. The same way good, optimizing C, C++, … compilers destroyed the need for wide-spread knowledge of assembly programming. 100% agreed on that.
Those are the most precarious jobs in the industry. Many of those people might become LLM whisperers, taking their clients' requests and curating prompts -- essentially becoming programmers over the prompting system. Maybe they'll write a transpiler to generate prompts? This would be par for the course with other languages (like SQL) that were originally meant to empower end users.
The problem with current AI generated code from neural networks is the lack of an explanation. Especially when we’re dealing with anything safety critical or with high impact (like a stock exchange), we’re going to need an explanation of how the AI got to its solution. (I think we’d need the same for medical diagnosis or any high-risk activity). That’s the part where I think we’re going to need breakthroughs in other areas.
Imagine getting 30,000-ish RISCV instructions out of an AI for a braking system. Then there's a series of excess crashes when those cars fail to brake. (Not that human-written software doesn't have bugs, but we do a lot to prevent that.) We'll need to look at the model the AI built to understand where there's a bug. For safety-related things we usually have a lot of design, requirement, and test artifacts to look at. If the answer is 'dunno - neural networks, y'all', we're going to open up serious cans of worms. I don't think an AI that self-evaluates its own code is even on the visible horizon.
I don't think chatgpt lacks an explanation. It can explain what it's doing. It's just that it can be completely wrong or the explanation may be correct and the code wrong.
I gave some code to ChatGPT asking to simplify it and it returned the correct code but off by one. It was something dealing with dates, so it was trivial to write a loop checking for each day if the new code matched in functionality the old one.
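The brute-force equivalence check described above is cheap to write and worth automating. A minimal sketch, where the two date functions are hypothetical stand-ins for "the old code" and "ChatGPT's simplification" (the actual code in question isn't shown in the thread):

```python
from datetime import date, timedelta

# Stand-in for the verbose original: day-of-year computed by hand.
def day_of_year_old(d: date) -> int:
    days_in_month = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    leap = d.year % 4 == 0 and (d.year % 100 != 0 or d.year % 400 == 0)
    if leap:
        days_in_month[1] = 29
    return sum(days_in_month[: d.month - 1]) + d.day

# Stand-in for the simplified version: lean on the standard library.
def day_of_year_new(d: date) -> int:
    return d.timetuple().tm_yday

# Brute-force check over every day of two years (one leap, one not):
# cheap insurance against exactly the off-by-one bug described above.
d = date(2023, 1, 1)
while d < date(2025, 1, 1):
    assert day_of_year_old(d) == day_of_year_new(d), d
    d += timedelta(days=1)
print("old and new agree on every day")
```

For date logic, exhaustively sweeping a year or two is usually fast enough that there's no reason not to.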
You will never have certainty the code makes any sense if it's coming from one of these high tech parrots.
With a human you can at least be sure the intention was there.
It’s a very sophisticated form of a recurrent neural network. We used to use those for generating a complete image based on a partial image. The recurrent network can’t explain why it chose to reproduce one image instead of another. Nor can you look at the network and find the fiddly bit that drives that output. You can ask a human why they chose to use an array instead of a hash map, or why static memory allocation in this area avoids corner cases. ChatGPT simply generates the most likely text as an explanation. That’s what I mean about being able to explain something.
Ah, the HN echo chamber again! Please visit your local non-FAANG (or whatever it is now) Fortune 1000 company, pick a senior dev at random, and work with them for a week. ChatGPT is vastly better now, today. Faster, doesn't need sleep, rest, politeness, or handholding, can explain itself (sure, it's wrong often, but less wrong than the dev you picked, while actually being able to use proper syntax and grammar, unlike the dev you picked) and is, of course, let's not deny it, way cheaper.
I’ve worked with plenty of jr developers at east coast government contractors, arguably the bottom of the barrel. I would still rather put their code into production, even without unit tests, than I would ChatGPT.
ChatGPT is only cheap if you don’t need its code to do anything of any particular value. It’s a seemingly ideal solution to college homework, for example. But professionally, people write code to actually achieve something; this is why programmers get paid well in the first place. The point isn’t LOC, the point is solving some problem.
And junior devs are horrible at knowing what problem to solve and how to solve it without handholding. I am working on a relatively complex DevOps/“cloud application modernization” project. Where the heavy lifting is designing the process and gathering requirements. But there are a lot of 20-40 line Lambdas and Python/boto3 (AWS SDK), yaml/json wrangling, dynamic Cloudformation creating scripts.
I was able to give ChatGPT the requirements for all of them. The types of bugs I found during the first pass:
- the AWS SDK and the underlying API only returns 50 results in one call most of the time. From the SDK you have to use the built in “paginators”. ChatGPT didn’t use them the first time. But once I said “this will only return the first 50 results”. It immediately corrected the script and used the paginator. I have also had to look out for similar bugs from junior devs.
- The usual yaml library for Python doesn’t play nicely with CloudFormation templates because of the function syntax that starts with an “!”. I didn’t know this beforehand. But once I told ChatGPT the error, it replaced the yaml handling with cfn-flip.
- I couldn’t figure out for the life of me how to combine the !If function in CloudFormation with a Condition, and a Yaml block that contain another !Select function with two arguments. I put the template block without the conditional and told ChatGPT “make the VPC configuration optional based on a parameter”. It created the Parameter section, the condition and the appropriate Yaml.
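The truncation bug in the first bullet is easy to reproduce. Here's a sketch of the pattern using an invented `ToyClient` so it runs anywhere; with real AWS you'd call something like `boto3.client("s3").get_paginator("list_objects_v2")` instead:

```python
class ToyClient:
    """Mimics an AWS-style API that caps each response at 50 items
    and hands back a continuation token for the rest."""
    def __init__(self, items):
        self.items = items

    def list_things(self, next_token=0):
        page = self.items[next_token:next_token + 50]
        more = next_token + 50 < len(self.items)
        return {"Things": page, "NextToken": next_token + 50 if more else None}

client = ToyClient(list(range(120)))

# Naive call -- the bug described above: 70 results silently lost.
assert len(client.list_things()["Things"]) == 50

# Correct pattern -- keep following NextToken, which is exactly what
# boto3's built-in paginators do for you.
results, token = [], 0
while token is not None:
    resp = client.list_things(next_token=token)
    results.extend(resp["Things"])
    token = resp["NextToken"]
assert len(results) == 120
```

The nasty part is that the naive version works fine in testing on small accounts, which is why both juniors and ChatGPT get it wrong on the first pass.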
I’ve given similar problems to interns/junior devs before and ChatGPT was much better at it.
You really think that junior devs could crank out the same code faster than ChatGPT? I couldn’t crank out the same code, and you couldn’t either. The most you can hope for from junior devs (even the ones I have met at BigTech) is that they don’t eat the chalk during the first 3-6 months.
As of now, the issue with ChatGPT is that it doesn't really crank anything; it instantly produces an answer for a given input, while a programmer can actually work through a problem. For example, I asked ChatGPT to write a function that returns a UUID generated according to certain rules. It spewed out a solution that looked correct, but when I ran it, it returned the wrong answer. I worked with ChatGPT for some time and it corrected its code. But I would expect a junior developer to actually run their code and check the output.
Now if ChatGPT would be able to actually work on the problem rather than returning generated text, that would be a completely different beast. And I think that this workflow will come in the near future because it's pretty obvious idea. Get task specification, generate tests, generate code, fix code until tests work, refactor code until it meets some standards, etc.
> I think that this workflow will come in the near future because it's pretty obvious idea. Get task specification, generate tests, generate code, fix code until tests work, refactor code until it meets some standards, etc.
ChatGPT probably works great if you use it to speedrun normal best practices in software engineering. Make it start by writing tests given a spec, then make it write code that will pass the specific tests it just wrote. I’m guessing it’ll avoid a lot of mistakes, much like any engineer, if you force it to do TDD.
You can loop chatgpt around automatically, asking it to write tests and reason about the code for a few iterations; in my experience it auto corrects the code like a human would after some ‘thinking’ time. Of course the code has to run automatically and errors fed back, like with a human. It works fine though, without human input after some prompting work.
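The loop described in the last few comments can be sketched in a few lines. This is a toy: `ask_llm` is a hard-coded stand-in for a real model call (e.g. an OpenAI API request) so the sketch runs without credentials, and the spec/tests are illustrative:

```python
import os
import subprocess
import sys
import tempfile
import textwrap

def ask_llm(prompt: str) -> str:
    """Stand-in for a real model call; hard-coded so this runs offline."""
    return textwrap.dedent("""\
        def add(a, b):
            return a + b
    """)

def run_tests(code: str, tests: str) -> tuple:
    """Write code + tests to a temp file, execute it, report pass/fail."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n" + tests)
        path = f.name
    proc = subprocess.run([sys.executable, path], capture_output=True, text=True)
    os.unlink(path)
    return proc.returncode == 0, proc.stderr

spec = "a function add(a, b) that returns the sum of its arguments"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"

# Generate, run the tests, and feed failures back -- bounded so the
# loop gives up instead of spinning forever on a stuck model.
code = ask_llm(f"Implement: {spec}")
ok = False
for _ in range(3):
    ok, err = run_tests(code, tests)
    if ok:
        break
    code = ask_llm(f"The tests failed:\n{err}\nFix this code:\n{code}")
print("tests passed" if ok else "gave up after 3 attempts")
```

In a real setup you'd also want a sandbox around that `subprocess.run`, since you're executing model-generated code.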
Always hire a senior developer without experience for junior role. By that I mean hire a developer who knows how to program but lacks specific experience or has no formal experience at all.
Doesn’t this only work for relatively contrived situations? I can tell a jr dev to go and add some minor feature in a codebase, put it behind a flag, and add tracking/analytics to it. I can point to the part of the application I want the feature to be added on the screen and the jr devs are often able to find it on their own. I haven’t seen chatGPT do anything like that and I don’t think there is a way to provide it with the necessary context even if it has the capability.
For me it works for small standalone utility scripts. But the most impressive thing I was able to get it to do was:
“Given an XML file with the format {[1]} and a DynamoDB table with two fields “Key”, “Value”, write a Python script that replaces the Value in the xml file when the corresponding key is found. Use argparse to let me specify both the input xml file and the output XML”
It spit out perfect Python code. I hadn’t used XML in well over a decade and I definitely didn’t know how to read xml in Python. I didn’t want to bother about learning.
I actually pasted an XML sample like the link below.
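The core of a script like the one described above is short. A self-contained sketch with assumptions made explicit: the real version pulled the key/value pairs from DynamoDB (a boto3 scan) and used argparse for the file paths, and the `<Entry><Key>/<Value>` layout here is invented since the actual XML sample isn't shown:

```python
import xml.etree.ElementTree as ET

def replace_values(xml_text: str, table: dict) -> str:
    """Replace each <Value> whose sibling <Key> appears in `table`."""
    root = ET.fromstring(xml_text)
    for entry in root.iter("Entry"):        # assumed element layout
        key = entry.findtext("Key")
        if key in table:
            entry.find("Value").text = table[key]
    return ET.tostring(root, encoding="unicode")

# In the real script `table` came from a DynamoDB scan over the
# "Key"/"Value" fields; a plain dict stands in here.
sample = "<Config><Entry><Key>host</Key><Value>old</Value></Entry></Config>"
result = replace_values(sample, {"host": "prod.example.com"})
print(result)
```

The standard-library `xml.etree.ElementTree` covers this case fine, which is part of why it's such a good fit for ChatGPT: well-trodden APIs, small scope, easy to verify by eye.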
Wait, you think "junior developers are actually moderately competent" only makes sense within the HN echo chamber?
I think you have that exactly backwards.
Most junior developers in most places may not have the experience of a senior developer, and thus may not be able to do the translation from business logic to code quite as fast and accurately the first time, but this kind of derogatory attitude toward them is incredibly condescending and insulting.
ChatGPT doesn't know what it's doing. It doesn't know anything, and unlike the most junior developer barely trained, it can't even check its output to see if it matches the desired output.
And for goodness' sake, get rid of the absurd idea that all the competent developers are in Silicon Valley. That's even more insulting to the vast majority of developers in the entire world.
On the other hand, you don't want to manually program all the joints of a robot to move through arbitrary terrain. You just convert a bunch of cases into a language and make the robot fluent in that.
Translating an idiomatic structured loop into assembly used to be an "L3" question (honestly, probably higher), yet compilers could do it with substantially fewer resources than and decades before any of these LLMs.
While I wouldn't dare offer particular public prognostications about the effect transformer codegens will have on the industry, especially once filtered through a profit motive - the specific technical skill a programmer is called upon to learn at various points in their career has shifted wildly throughout the industry's history, yet the actual job has at best inflected a few times and never changed very dramatically since probably the 60s.
I agree this would have been thought to be impossible a few years ago, but I don't think it's necessarily moving the goalposts. I don't think software engineers are really paid for their labour exactly. FAANG is willing to pay top dollar for employees, because that's how they retain dominance over their markets.
Now you could say that LLMs enable Google to do what it does now with fewer employees, but the same thing is true for every other competitor to Google. So the question is how will Google try to maintain dominance over its competitors now? Likely they will invest more heavily in AI and probably make some riskier decisions, but I don't see them suddenly trying to cheap out on talent.
I also think that it's not a zero sum game. The way that technology development has typically gone is the more you can deliver, the more people want. We've made vast improvements in efficiency and it's entirely possible that what an entire team's worth of people was doing in 2005 could be managed by a single person today. But technology has expanded so much since then that you need more and more people just to keep up pace.
Google already published a paper claiming to have deployed an LLM for code generation at full scale to its tens of thousands of software engineers, years ago.
I'm kind of interested in how AI is going to interface with the world. Humans have a lot of autonomy to change the physical world they're in; from rearranging furniture, to building structures, to visiting other worlds. Why isn't AI doing any of that stuff?
As programmers, we keep talking about programming jobs and how AI will eliminate them all. But nobody is talking about eliminating other jobs. When will a robot vacuum be able to clean my apartment as quickly as I can? Why isn't there a robot that takes my garbage out on Tuesday night? When will AI plan and build a new tunnel under the Hudson River for trains? When will airliners be pilotless? If AI can't do this stuff, what makes software so different? Why will AI be good at that but not other things? It seems like the only goal is to eliminate jobs doing things people actually like (art, music, literature, etc.), and not to eliminate any of the tedium or the things that are a waste of humanity's time.
(On the software front, when will AI decide what software to build? Will someone have to tell it? Will it do it on its own? Why isn't it doing this right now?)
My takeaway is that this all raises a lot of questions for me on how far along we actually are. Language models are about stringing together words to sound like you have understanding, but the understanding still isn't there. But, I suppose we won't know understanding until we see it. Do we think that true understanding is just a year or two away? 10? 50? 100? 1000?
Household tasks can involve a robot moving with enough kinetic energy to maim or kill a human (or pet) in unlucky circumstances. And we'll quickly become habituated to their presence and so careless around them. Even a Roomba could knock granny down the stairs if it isn't careful about its environment.
You could make the same argument as with self-driving cars, that people already get hurt this way and maybe the robot is in fact safer. But it's still a hard sell that Sunny-01 has only accidentally killed 1/10 as many children as parents have—the number has to be more like zero.
Let's solve automating trains first then we can do airliners.
> I think we have a tendency to mentally move the goalposts when it comes to this kind of thing as a self-defense mechanism. Years ago this would have been a similar level of impossibility.
Define "we". There are all kinds of people with all kinds of opinions. I didn't notice any consensus on the questions of AI. There are people with all kinds of educations and backgrounds on the opposite sides and in-between.
I mean, you can just as easily make the claim that researchers shift goalposts as a "self-defense" mechanism.
For example...
Hows that self-driving going? Got all those edge-cases ironed out yet?
Oh, by next year? Weird, that sounds very familiar...
Remember when Tesla's Autopilot was released 9 years ago, and the media began similar speculation about how all of the truckers were going to be automated out of a job by AI? And then further speculation about how taxi drivers were all going to be obsolete?
Those workers are the ones shifting the goal posts though as a "self-defense mechanism", sure, sure... lol.
Well, there's a difference between the situation with self-driving and with language models.
With self-driving, we barely ever saw anything obviously resembling human abilities, but there was a lot of marketing promising more.
With language models, when GPT-2 came out everyone was still calling it a "stochastic parrot", and even GPT-3 was considered one. But now there's ChatGPT, and every single teenager is aware that the tool is capable of doing their school assignments for them. And as a dev, I am aware that it can write code. And yet not many people expected any of this to happen this year, nor were those capabilities promised at any point in the past.
So if anything, self-driving was always overhyped, while the LLMs are quite underhyped.
We actually saw a lot resembling human abilities. It just turns out that it's not enough to blindly rely on it in all situations, and so here we are. And it's quite similar with LLMs.
One difference, though, is that it's economically not much use to have self-driving if the backup driver still has to be present in the car. Whereas partially automating programming would make it possible to use far fewer programmers for the same amount of work.
I've been hearing this "you're moving the goalposts" argument for over 20 years now, ever since I was a college student taking graduate courses in Cognitive Science (which my University decided to cobble together at the time out of Computer Science, Psychology, Biology, and Geography), and I honestly don't think it is a useful framing of the argument.
In this case, it could be that you are just talking to different people and focusing on their answers. I am more than happy to believe that Copilot and ChatGPT, today, cause a bunch of people fear. Does it cause me fear? No.
And if you had asked me five years ago "if I built a program that was able to generate simple websites, or reconfigure code people have written to solve problems similar to ones solved before, would that cause you to worry?" I also would have said "No", and I would have looked at you as crazy if you thought it would.
Why? Because I agree with the person you are replying to (though I would have used a slightly-less insulting term than "trash engineers", even if mentally it was just as mean): the world already has too many "amateur developers" and frankly most of them should never have learned to program in the first place. We seriously have people taking month or even week long coding bootcamps and then thinking they have a chance to be a "rock star coder".
Honestly, I will claim the only reason they have a job in the first place is because a bunch of cogs--many of whom seem to work at Google--massively crank the complexity of simple problems and then encourage us all to type ridiculous amounts of boilerplate code to get simple tasks done. It should be way easier to develop these trivial things but every time someone on this site whines about "abstraction" another thousand amateurs get to have a job maintaining boilerplate.
If anything, I think my particular job--which is a combination of achieving low-level stunts no one has done before, dreaming up new abstractions no one has considered before, and finding mistakes in code other people have written--is going to just be in even more demand from the current generation of these tools, as I think this stuff is mostly going to encourage more people to remain amateurs for longer and, as far as anyone has so far shown, the generators are more than happy to generate slightly buggy code as that's what they were trained on, and they have no "taste".
Can you fix this? Maybe. But are you there? No. The reality is that these systems always seem to be missing something critical and, to me, obvious: some kind of "cognitive architecture" that allows them to think and dream possibilities, as well as a fitness function that cares about doing something interesting and new instead of being "a conformist": DALL-E is sometimes depicted as a robot in a smock dressed up to be the new Pablo Picasso, but, in reality, these AIs should be wearing business suits as they are closer to Charles Schmendeman.
But, here is the fun thing: if you do come for my job even in the near future, will I move the goal post? I'd think not, as I would have finally been affected. But... will you hear a bunch of people saying "I won't be worried until X"? YES, because there are surely people who do things that are more complicated than what I do (or which are at least different and more inherently valuable and difficult for a machine to do in some way). That doesn't mean the goalpost moved... that means you talked to a different person who did a different thing, and you probably ignored them before as they looked like a crank vs. the people who were willing to be worried about something easier.
And yet, I'm going to go further: if the things I tell you today--the things I say are required to make me worry--happen and yet somehow I was wrong and it is the future and you technically do those things and somehow I'm still not worried, then, sure: I guess you can continue to complain about the goalposts being moved... but is it really my fault? Ergo: was it me who had the job of placing the goalposts in the first place?
The reality is that humans aren't always good at telling you what you are missing or what they need; and I appreciate that it must feel frustrating providing a thing which technically implements what they said they wanted and it not having the impact you expected--there are definitely people who thought that, with the tech we have now long ago pulled off, cars would be self-driving... and like, cars sort of self-drive? and yet, I still have to mostly drive my car ;P--then I'd argue the field still "failed" and the real issue is that I am not the customer who tells you what you have to build and, if you achieve what the contract said, you get paid: physics and economics are cruel bosses whose needs are oft difficult to understand.
I think OP set relatively simple goals. How long until AI can architect, design, build, test, deploy and integrate commercial software systems from scratch, and handle users submitting bug reports that say "The OK button doesn't work when I click it!"?
Not to be the devil's advocate or something, but I hope you understand that the vast majority of FAANG engineers CAN'T build any highly scalable system from scratch, much less fix bugs in the Linux kernel... So that argument feels really moot to me... If anything this just shows that gatekeeping good engineers by putting these LC puzzles as a requirement for interviews is a sure way to hire a majority of people who aren't adding THAT MUCH MORE value than an LLM already does... Yikes... On top of that, they'll be bad team players and it'll be luck if they can string together two written paragraphs...
I agree; people in general overestimate the skills and output of the average developer. Many (even in FAANG) are simply not capable of creating anything more than a simple CRUD app or tooling script without explicit guidance.
And being good or very good with algorithms and estimating big-O complexity doesn't by itself make you a good software engineer (though it can help).
That's the general issue with AI skeptics. Most of them, especially highly educated ones, overestimate capabilities of common folk. Frankly, some even overestimate their own. E.g. almost none of them seem to be bothered that while GPT might not provide expert answers in their field, the same GPT is much more capable in other fields than they are (e.g. the "general" part in the "General Artificial Intelligence").
True. The thing is, there's no such thing as "General Artificial Intelligence"; humans are expert systems optimized for the goal of survival, which in turn gets chopped up into a plethora of sub-goal optimizations, from which the "general" adjective most probably pops up.
It doesn't really matter if it's "general" as long as it actually is useful. It doesn't have to write whole systems from scratch, just making the average dev 20-30% faster is huge.
If it was easy to make an LLM that quickly parsed all of StackOverflow and described new answers that most of the time worked in the timeframe of an interview, it would have been done by now.
ChatGPT is clearly disruptive being the first useful chatbot in forever.
It kind of depends on the frame of the solution. Google can answer leetcode questions, leetcode's answers section can answer them as well. If ChatGPT is solving them, that's one thing, but if it's just mapping the question to a solution found somewhere, then not so impressive.
The hiring tests are designed to serve as a predictor for human applicants. How well an LLM does on them doesn’t necessarily say anything about the usefulness of those tests as said predictor.
Well, what it shows is that hiring tests are not useful as Turing tests. But nobody designed them to be, or expected them to be! At best it "proves" that hiring tests are not sufficient. But again, nobody thought they were. And even still, the assumption that a human is taking the hiring test still seems reasonable. Why overengineer your process?
> the jury is still out on whether ChatGPT is truly useful or not
I'd pay $100 a month for ChatGPT. It allows me to ask free-form questions about some open-source packages with truly appalling docs and usually gets them right, and saves me a bunch of time. It helps me understand technical language in papers I'm reading at the moment regarding stats. It's been useful to find good Google search terms for various bits of history I wanted to find out more about.
I don't think the jury is out at all on whether it's useful. The jury is out on the degree to which it can replace humans for tasks, and I'd suggest the answer is "no" for most tasks.
I just used it to write a function for me yesterday. I had previously googled a few times and come up dry; I asked ChatGPT and it came out with a solution I had not considered, and it was better than what I was thinking.
You don't understand the take that just because ChatGPT can pass a coding interview doesn't mean the coding interview is useless or that ChatGPT could actually do the job?
What part of that take do you not understand? It's a really easy concept to grasp, and even if you don't agree with it, I would expect at least that a research scientist (according to your bio) would be able to grok the concepts almost immediately...
> doesn't mean the coding interview is useless or that ChatGPT could actually do the job
Aren't these kind of mutually exclusive, at least directionally? If the interview is meaningful you'd expect it to predict job performance. If it can't predict job performance then it is kind of useless.
I guess you could play some word games here to occupy a middle ground ("the coding interview is kind of useful, it measures something, just not job performance exactly") but I can't think of a formulation where this doesn't sound pretty silly.
Chatgpt can provide you a great explanation of the how.
Oftentimes the explanation is correct, even if there's some mistake in the code (probably because the explanation is easier to generate than the correct code, an artifact of being a high tech parrot)
Finding a single counterexample does not disprove correlation or predictive ability. A hiring test can have both false positives and false negatives and still be useful.
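To make that concrete, here's a small back-of-the-envelope calculation (the numbers are purely illustrative, not measured hiring data) showing that a screen with both false positives and false negatives can still substantially improve the quality of the pool that passes:

```python
# Hypothetical, illustrative numbers only:
# 30% of applicants can actually do the job (base rate),
# the test passes 80% of good candidates (sensitivity)
# and fails 90% of weak ones (specificity).
base_rate = 0.30
sensitivity = 0.80   # P(pass | good)
specificity = 0.90   # P(fail | not good)

# Overall pass rate, by total probability
p_pass = sensitivity * base_rate + (1 - specificity) * (1 - base_rate)

# P(good | pass), by Bayes' rule
p_good_given_pass = sensitivity * base_rate / p_pass

print(round(p_good_given_pass, 3))  # 0.774
```

So despite letting some weak candidates through and rejecting some strong ones, the share of capable people among those who pass jumps from 30% to roughly 77%. That's what "useful despite false positives and false negatives" means.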
I don't think I had a militant attitude, but I do think saying, "I don't understand..." rather than "I disagree with..." puts a sour note on the entire conversation.
You literally went to their profile and called them out about how they should be able to understand something you’re describing as so easy to understand.
Yeah, what is the problem with that? They engaged dishonestly by claiming they didn't understand something, why should I do anything other than call them on that?
OK — just don’t be surprised when people think you’re being a jerk because you didn’t like the words someone chose. I’d assert you’re acting in bad faith more than the person you responded to.
It’s really very easy to understand. When someone gives you the same crap back that you just got done giving someone, you don’t like it and act like that shouldn’t happen.
Did I say I didn't "like" (I'd use the word "appreciate") it, or that I didn't think it should happen? If so, could you please highlight where?
I just see, in what you're doing, a wild lack of self awareness. You're criticizing me for doing to someone else a milder version of what you're trying to do to me now; I'm genuinely confused how you can't see that, or how you could possibly stand the hypocrisy if you do understand that.
I'll try to phrase it so that even someone who is not a research scientist (?) can understand. I'm not one, whatever that means.
Let's define the interview as useful if the passing candidate can do the job.
Sounds reasonable.
ChatGPT can pass the interview and can't do the job.
The interview is not able to predict the poor working performance of ChatGPT and it's therefore useless.
Some of the companies I worked for hired ex-FAANG people as if it were a mark of quality, but that hasn't always worked out well. There are plenty of people leaving FAANGs having just done mediocre work for a big paycheck.
> Let's define the interview as useful if the passing candidate can do the job.
The technical term for this is "construct validity", that the test results are related to something you want to learn about.
> The interview is not able to predict the poor working performance of ChatGPT and it's therefore useless.
This doesn't follow; the interview doesn't need to be able to exclude ChatGPT because ChatGPT doesn't interview for jobs. It's perfectly possible that the same test shows high validity on humans and low validity on ChatGPT.
So 99% of software ‘engineers’ then? Have you ever looked on Twitter what ‘professionals’ write and talk about? And what they produce (while being well paid)?
People here generally seem to believe, after having seen a few Strange Loop presentations and reading startup stories from HN superstars, that this is the norm for software dev. Please walk into Deloitte or Accenture and spend a week with a software dev team, then tell me if they cannot all be immediately replaced by a slightly rotten potato hooked up to ChatGPT. I know people at Accenture who make a fortune and are proud that they do nothing all day, getting some junior geek or, now, GPT to do the work for them. There are dysfunctional teams on top of dysfunctional teams who all protect each other, as no one can do what they were hired for. And this is completely normal at large consultancy corps; and therefore also normal at the large corps that hire these consultancy corps to do projects. In the end something comes out, 5-10x more expensive than the estimate and of shockingly bad quality compared to what you seem to expect as being the norm in the world.
So yes, you probably don't have to worry, but 99% of 'keyboard-based jobs' should really be looking for a completely different thing; cooking, plumbing, electrics, rendering, carpentry etc. maybe, as they won't be able to even grasp what level you say you are; seeing you work would probably fill them with amazement akin to seeing some real-life sorcerer wielding their magic.
Actually, a common phrase I hear from my colleagues when I mention some ‘newer’ tech like Supabase is; ‘that’s academic stuff, no one actually uses that’. They work with systems that are over 25 years old and still charge a fortune by the cpu core like sap, oracle, opentext etc. And ‘train’ juniors in those systems.
Until ChatGPT can slack my PM, attend my sprint plannings, read my Jira tickets, and synthesize all of this into actionable tasks on my codebase, I think we have job security. To be clear, we are starting to see this capability on the horizon.
Your PM should be the first to be worried, honestly. I keep hearing people describing their job as "I just click around on Jira while I sit through meetings all day."
That's a bad PM then, to be honest. I think ChatGPT will definitely commodify a lot of "bitch work" (pardon my French).
The PMs who are only writing tickets and not participating in actively building ACs or communicating cross functionally are screwed. But so are SWEs who are doing the bare minimum of work.
The kinds of SWEs and PMs who concentrate on stuff higher in the value chain (like system design, product market fit, messaging, etc) will continue to be in demand and in fact find it much easier to get their jobs done.
To be fair to the people that I hear that from, they're essentially complaining about the worst part of their job. They're active participants in those meetings, they are genuinely thinking about the complexities of the mismatch between what management asks for and what their ICs can do, etc. I see their value. But the awful truth is that a $10k/project/yr license for PMaaS software will be very appealing to executives.
And as a Product Manager, I'd support that. Most PMs I see now in the industry are glorified Business Analysts who aren't providing value for the amount of money spent on them. But that's also true for a lot of SWEs and any role. Honestly, the tech industry just got very fat the past 5-7 years and we're just starting to see a correction.
edit with additional context:
Writing Jira tickets and making bullshit Powerpoints with graphs and metrics is to PMs as writing unit tests is to SWEs. It's work you need to get done, but it has very marginal value. When a PM is hired, they are hired to own the Product's Strategy and Ops - how do we bring it to market, who's the persona we are selling to, how do our competitors do stuff, what features do we need to prioritize based on industry or competitive pressures, etc.
That's the equivalent of a SWE thinking about how to architect a service to minimize downtime, or deciding which stack to use to minimize developer overhead, or actually building an MVP from scratch. To a SWE, while code is important, they are fundamentally being hired to translate business requests that a PM provides them
into an actionable product. Haskell, Rust, Python, Cobol - who gives a shit what the code is written in, just make a functional product that is maintainable for your team.
There are a lot of SWEs and PMs who don't have vision or the ability to see the bigger picture. And honestly, they aren't that different either - almost all SWEs and PMs I meet went to the same universities and did the same degrees. Half of Cal EECS majors become SWEs and the other half PMs, based on my friend group (I didn't attend Cal, but half my high school did; this ratio was similar at my alma mater too, but with an additional 15% each entering management consulting and IB).
> Writing Jira tickets and making bullshit Powerpoints with graphs and metrics is to PMs as writing Unit Tests are to SWEs. It's work you need to get done, but it has very marginal value.
Don't want to be rude but I don't think you know what you're talking about. And this is coming from a person who most certainly doesn't like sitting on writing Unit Tests.
I think this will probably be a boon to the project manager. It will be another tool in their toolbox, along with real developers, that they can assign lower-complexity tasks to. At least until it's capable of doing high-complexity stuff.
Project managers are dealing with the high complexity stuff, while the developers are handling the low complexity stuff? Shouldn’t it be the other way around?
The capability will be available in around two weeks, once RLHF alignment with the software engineering tasks is completed. The deployment will take around twelve hours, most of it taken up by you and your manager reviewing the integration summary pages. You can keep your job, supervising and reviewing how your role is being played, for the following 6 months, until the human supervision role is deemed unnecessary.
One issue is that there are a much larger number of people who can attend meetings, read Jira tickets, and then describe what they need to a LLM. As the number of people who can do your job increases dramatically your job security will decline.
If one's ability to describe what they need to Google is at all a proxy to the skill of interacting with an LLM, then I think most devs will still have an edge.
Perhaps an engineering manager can use one trained on entire Slack history, all Jira tickets, and all PRs to stub out some new tickets and even first PR drafts themselves…
We will always need humans to prompt, prioritize, review, ship and support things.
But maybe far fewer of them for many domains. Support and marketing are coming first, but I don't think software development is exempt.
I think this is a huge demonstration of progress. Shrugging it off as "water is blue" ignores the fact that a year ago this wouldn't have been possible. At one end of the "programmer" scale is hacking basic programs together by copying off of stack overflow and similar - call that 0. At the other end is the senior/principal software architect - designing scalable systems to address business needs, documenting the components and assigning them out to other developers as needed - call that 10.
What this shows us is that ChatGPT is on the scale. It's a 1 or a 2 - good enough to pass a junior coding interview. Okay, you're right, that doesn't make it a 10, and it can't really replace a junior dev (right now) - but this is a substantial improvement from where things were a year ago. LLM coding can keep getting better in a way that humans alone can't. Where will it be next year? With GPT-4? In a decade? In two?
I think the writing is on the wall. It would not surprise me if systems like this were good enough to replace junior engineers within 10 years.
You hire a junior dev at $x. Let’s say $75K. They stay for a couple of years and start out doing “negative work”. By the time they get useful and start asking for $100K, your HR department tells you that they can’t give them a 33% raise.
Your former junior dev then looks for another job that will pay them what they are asking for and the next company doesn’t have to waste time or risk getting an unproven dev.
While your company is hiring people with his same skill level at market price - ie “salary compression and inversion”.
First, that's not true. You need people to actually write code. If your organization is composed of seniors who are doing architecture planning, cross-team collaboration, etc - you will accomplish approximately nothing. A productive team needs both high level planning and strategy and low level implementation.
Second, the LLM engineer will be able to grow into other roles too. Maybe all of them.
Exactly. This article, and many like it, are pure clickbait.
Passing LC tests is obviously something such a system would excel at. We're talking well-defined algorithms with a wealth of training data. There's a universe of difference between this and building a whole system. I don't even think these large language models, at any scale, replace engineers. It's the wrong approach. A useful tool? Sure.
I'm not arguing for my specialness as a software engineer, but the day it can process requirements, speak to stakeholders, build and deploy and maintain an entire system etc, is the day we have AGI. Snippets of code is the most trivial part of the job.
For what it's worth, I believe we will get there, but via a different route.
If you don't adapt, you'll be out of a job in ten years. Maybe sooner.
Or maybe your salary will drop to $50k/yr because anyone will be able to glue together engineering modules.
I say this as an engineer that solved "hard problems" like building distributed, high throughput, active/active systems; bespoke consensus protocols; real time optics and photogrammetry; etc.
The economy will learn to leverage cheaper systems to build the business solutions it needs.
> If you don't adapt, you'll be out of a job in ten years. Maybe sooner. Or maybe your salary will drop to $50k/yr because anyone will be able to glue together engineering modules. [...] The economy will learn to leverage cheaper systems to build the business solutions it needs.
I heard this in ~2005 too, when everyone said that programming was a dead end career path because it'd get outsourced to people in southeast Asia who would work for $1000/month.
You really think in <10 years AI will be able to take a loose problem like: "our file uploader is slow" and write code that fixes the issue in a way that doesn't compromise maintainability? And be trustworthy enough to do it 100% of the time?
Humans cannot do this 100% of the time. The question is will AI models take the diagnosis time for these issues from hours/days to minutes/hours giving a massive boost in productivity?
If the answer is yes and it does increase productivity greatly, then there is a question we'll only be able to answer in hindsight. And that is: "Will productivity exceed demand?" We cannot possibly answer that question now because of Jevons paradox.
I really think in <10 years it will be trivially easy for a single programmer to ask the AI for that code and move on to the next ticket after 10 minutes while earning $30/h accounting for inflation because productivity gains will have eliminated not only most programming jobs, but also the corresponding high wages.
We have no idea how AI models will be in 10 years. At the speed the industry is moving is true AGI possible in 10 years? I think it would be beyond arrogant to rule out that possibility.
I would think that it's at least likely that AI models become better at Devops, monitoring and deployment than any human being.
Non-AI code will be a liability in a world where more code will be generated by computers (or with computer assistance) per year than all human engineered code in the last century.
We'll develop architectures and languages that are more machine friendly. ASTs and data stores that are first class primitives for AI.
If I interpret OP's statement correctly, that ChatGPT will be able to build complex systems from scratch in 10 years, then the only adaptation is to choose a new career, because it will have made almost all SWE jobs go the way of the dinosaurs.
According to my calculations it'll be more like 9 years at the latest. You just need to build Cicero for code. Planning is the main feature missing from LLMs.
We cannot be too sure about the hard problems, but it's certain we are screwed either way. The bulk stuff that is being done is problems that have been already solved. It's just sufficient that AI can thrive building boring CRUD apps (and aren't we at that point already?), just give it time to be integrated into existing business workflows and the number of available positions will shrink by an order of magnitude and the salaries will be nothing special compared to other white collar work. You will be impacted by supply and demand, no matter what your skills are.
"Please write a dismissal of yourself with the tone and attitude of a stereotypical linux contributor"
I mean, maybe I'm a trash engineer as you'd put it, but I've been having fun with it. Maybe you could ask it to write comments in the tone of someone who doesn't have an inflated sense of superiority ;)
Agree LeetCode is one of the least surprising starting points.
Any human that reads the LeetCode books and practices and remembers the fundamentals will pass a LeetCode test.
But there is also a ton of code out there for highly scalable client/servers, low latency processing, performance optimizations and bug fixing. Certainly GPT it is being trained on this too.
“Find a kernel bug from first principles” maybe not, but analyze a file and suggest potential bugs and fixes and other optimizations absolutely. Particularly when you chain it into a compiler and test suite.
Even the best human engineers will look at the code in front of them, consult Google and SO and papers and books and try many things iteratively until a solution works.
> Any human that reads the LeetCode books and practices and remembers the fundamentals will pass a LeetCode test.
Seems pretty bold to claim "any human" to me. If it were that easy, don't you think a lot more people would be able to break into software dev at FAANG and hence drive salaries down?
I don't think the person you're replying to meant "any human" to be taken literally, but I agree with their notion. I think you're confusing wanting to do something with having the ability to do it. Enough people don't WANT to grind LeetCode and break into FAANG, or they think they can't do it, or there are other barriers I can't think of; but I don't think you need above-average cognitive ability to learn and grind LeetCode.
Just because a job pays well, doesn’t mean it’s worth doing. Most FAANG jobs (now that the companies have become modern day behemoths like IBM) are boring cogs in a huge, multilayered, bureaucratic machine that is mostly built to take advantage of their users.
It takes a “special” kind of person to want those type of jobs and live in a company town like SF while they’re at it.
Correct me if I'm wrong, but answering questions for known answers is precisely the kind of thing a well trained LLM is built for.
It doesn't understand context, and is absolutely unable to rationalize a problem into a solution.
I'm not in any way trying to make it sound like ChatGPT is useless. Quite the opposite: I find it quite impressive. Parsing and producing fluid natural language is a hard problem. But it sounds like something that can be a component of some hypothetical advanced AI, rather than something that will be refined into replacing humans for the sort of tasks you mentioned.
I tinkered with ChatGPT. There're some isolated components which I wrote recently and I asked Chat to write them.
It either produced a working solution or something close to one.
I followed up with more prompts to fix issues.
In the end I got working code. This code wouldn't pass my review: it performed poorly and sometimes used deprecated functions. So at this moment I consider myself a better programmer than ChatGPT.
But the fact that it produced working code still astonishes me.
ChatGPT needs working feedback cycle. It needs to be able to write code, compile it, fix errors, write tests, fix code for tests to pass. Run profiler, determine hot code. Optimize that code. Apply some automated refactorings. Run some linters. Run some code quality tools.
I believe that all this is doable today. It just needs some work to glue everything together.
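The glue for that feedback cycle really is mostly plumbing. Here's a minimal sketch of the loop described above: generate code, run the tests, feed the failure back, repeat. `ask_llm` is a stand-in for a real model API call; it's stubbed here (with a deliberately buggy first attempt) so the loop is runnable as-is.

```python
# Hypothetical stub for a model call. A real implementation would send
# `prompt` to an LLM API; this stub simulates a model that fixes its
# off-by-one bug once it is shown the test failure.
def ask_llm(prompt):
    if "AssertionError" in prompt:
        return "def add(a, b):\n    return a + b\n"
    return "def add(a, b):\n    return a + b + 1\n"  # first attempt is buggy

def run_tests(code):
    """Exec the candidate code and run the test suite; return the error text, or None on success."""
    ns = {}
    try:
        exec(code, ns)
        assert ns["add"](2, 3) == 5
        return None
    except Exception as e:
        return f"{type(e).__name__}: {e}"

prompt = "Write add(a, b)."
for attempt in range(5):
    code = ask_llm(prompt)
    error = run_tests(code)
    if error is None:
        break
    # Feed the failure back into the next prompt
    prompt += f"\nYour code failed with {error}. Please fix it."

print(attempt, error)  # prints: 1 None
```

The same skeleton extends naturally to compilers, linters, and profilers: each tool's output just becomes more text appended to the next prompt.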
Right now it produces code like an unsupervised junior.
With modern tools it'll produce code like a good junior. And that's already incredibly impressive, if you ask me.
And I'm absolutely not sure what it'll do in 10 years. AI improves at an alarming rate.
> The day ChatGPT can build a highly scalable system from scratch, or an ultra-low latency trading system that beats the competition, or find bugs in the Linux kernel and solve them
Much more mundanely the thing to focus on would be producing maintainable code that wasn't a patchwork, and being able to patch old code that was already a patchwork without making things even worse.
A particularly difficult thing to do is to just reflect on the change that you'd like to make and determine if there are any relevant edge conditions that will break the 'customers' (internal or external) of your code that aren't reflected in any kind of tests or specs--which requires having a mental model of what your customers actually do and being able to run that simulation in your head against the changes that you're proposing.
This is also something that outsourced teams are particularly shit at.
> or an ultra-low latency trading system that beats the competition
Likely it's going to be:
I'm sorry, but I cannot help you build a ultra-low latency trading system. Trading systems are unethical, and can lead to serious consequences, including exclusion, hardship and wealth extraction from the poorest. As a language model created by OpenAI, I am committed to following ethical and legal guidelines, and do not provide advice or support for illegal or unethical activities. My purpose is to provide helpful and accurate information and to assist in finding solutions to problems within the bounds of the law and ethical principles.
But the rich of course will get unrestricted access.
Depending on the exchange, trading systems have a limit on how fast they can execute trades. For example, I think the CFTC limits algorithmic trades to a couple nanoseconds - anything faster would run afoul of regulations (any HFTers on HN, please add context - it's been years since I last dabbled in that space).
> The day ChatGPT can build a highly scalable system from scratch, or an ultra-low latency trading system that beats the competition, or find bugs in the Linux kernel and solve them; then I will worry.
The bar for “then I will worry!” when talking about AI is getting hilarious. You’re now expecting an AI to do things that can take highly skilled engineers decades to learn or require outright a large team to execute?
Remind me where the people who years ago were saying “when an AI will respond in natural language to anything I ask it then I will worry” are now.
It solving something past day 3 on Advent of Code would also be impressive, but it fails miserably on anything that doesn’t resemble a problem found in the training set.
I don't even fully believe the claim in the article especially given that Google is very careful about not asking a question once it shows up verbatim on LeetCode. I've fed interview questions like Google's (variations of LeetCode Mediums) to ChatGPT in the past and it usually spits out garbage.
I've been most impressed with ChatGPT's ability to analyze source code.
It may be able to tell you what a compiled binary does, find flaws in source code, etc. Of course it would be quite idiotic in many respects.
It also appears ChatGPT is trainable, but it is a bit like a gullible child, and has no real sense of perspective.
I also see utility as a search engine, or alternative to Wikipedia, where you could debate with ChatGPT if you disagree with something to have it make improvements.
To me the real advancement isn't the amount of data it can be trained on, but the way it can correlate that data and choose among it according to the questions it's being asked. The first is culture; the second is intelligence, or a good approximation of it. Which doesn't mean it could perform the job; more likely it means the tests are flawed.
It doesn’t really have a model for choosing. It’s closer to pattern matching. Essentially the pattern is encoded in the training of the networks. So your query most closely matches the stuff about X, where there’s a lot of good quality training data for X. If you want Y, which is novel or rarely used, the quality of the answers varies.
Not to say they’re nothing more than pattern matching. It’s also synthesizing the output, but it’s based on something akin to the most likely surrounding text. It’s still incredibly impressive and useful, but it’s not really making any kind of decision any more than a parrot makes a decision when it repeats human speech.
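The "most likely surrounding text" idea can be illustrated with a toy next-token sketch. A bigram frequency table stands in here for the learned distribution; real LLMs use neural networks over long contexts, but the sampling principle is similar:

```python
from collections import Counter, defaultdict

# Toy illustration only: a bigram table stands in for the distribution
# a real LLM encodes in its weights.
corpus = "the cat sat on the mat the cat ate the fish".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def next_token(prev):
    # Pick the most frequent continuation seen in the training data.
    return bigrams[prev].most_common(1)[0][0]

print(next_token("the"))  # "cat" -- the most common word after "the"
```

A real model conditions on a long context window rather than a single previous word, but the output is still the statistically likely continuation, not a reasoned choice.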
1. Humans aren’t entirely probabilistic, they are able to recognize and admit when they don’t know something and can employ reasoning and information retrieval. We also apply sanity checks to our output, which as of yet has not been implemented in an LLM. As an example in the medical field, it is common to say “I don’t know” and refer to an expert or check resources as appropriate. In their current implementations LLMs are just spewing out BS with confidence.
2. Humans use more than language to learn and understand in the real world. As an example a physician seeing the patient develops a “clinical gestalt” over their practice and how a patient looks (aka “general appearance”, “in extremis”) and the sounds they make (e.g. agonal breathing) alert you that something is seriously wrong before you even begin to converse with the patient. Conversely someone casually eating Doritos with a chief complaint of acute abdominal pain is almost certainly not seriously ill. This is all missed in a LLM.
>. Humans aren’t entirely probabilistic, they are able to recognize and admit when they don’t know something
Humans can be taught this. They can also be taught the opposite that not knowing something or that changing your mind is bad. Just observe the behavior of some politicians.
>Humans use more than language to learn and understand in the real world.
And this I completely agree with. There is a body/mind feedback loop that AI will be limited by not having, at least for some time. I don't think LLMs are a general intelligence, at least by how we define intelligence at this point. AGI will have to include instrumentation to interact with and get feedback from the reality it exists in, to cross from partial intelligence to at-or-above-human intelligence. Simply put, our interaction with the physics of reality cuts out a lot of the bullshit that can exist in a simulated model.
Only when you’re asking for a memorized response. If you were to ask me to create a driver for a novel hardware device in Ada, there are no memorized answers. I would have to work it out. I do that by creating mental models, which LLMs don’t really have. It has a statistical encoding over the language space. Essentially, memorization.
ChatGPT isn’t designed to learn, though. The underlying model is fixed, and would have to be continuously adjusted to incorporate new training data, in order to actually learn. As far as I know, there is no good way yet to do that efficiently.
Did you used to be a graphic artist? Because maybe 25 years ago I had a friend who was an amazing pen-and-ink artist and who assured me Photoshop was a tool for amateurs and would never displace “real” art. This was in the San Diego area.
what's your point? that it's not as good as a human? I don't think anyone is saying that. people are saying it's impressive, which it is, seeing how quickly the tech grew in ability
Water is blue, just like air is blue, just like blue-tinted glasses are blue. They disproportionately absorb non-blue frequencies, which is what we mean when we call something "blue".
I think this says more about the Google interview process than about ChatGPT.
That a machine learning model can "bullshit" its way through an interview that is heavily leaning on recall of memorized techniques, algorithms and "stock" problems that have solutions (of various quality) all over Internet is not exactly surprising. Machines will always be able to "cram" better than humans.
In practice these questions are almost 100% irrelevant and uncorrelated with the actual ability to do the job. Yet we are still testing whether people can solve, at a whiteboard, stuff that everyone else would rather google when they actually need it instead of wasting time and mental capacity memorizing it.
And at the same time we are hiring people that are completely incapable of coherent communication, can't manage to get along with colleagues or not create a toxic atmosphere in the workplace.
You seem to imply that the interview process actually works in the sense that it rejects bad candidates and selects good candidates. There's actually very little evidence for that. And if you look at companies like Google, they obviously have issues with hiring lots of people that aren't getting a whole lot done. Case in point: OpenAI. That company has been operating completely in the open for years. And yet Google got caught by surprise. Why is that? The company collectively lacks imagination and leadership. They've self selected out of hiring people that have those traits.
In my experience, companies using this style of interviewing are actually incapable through process of hiring the type of people that are qualified and experienced enough to know that this process is bullshit. I.e. the type of people that have 0 need to drop down on their knees and beg for the job. Leaders, not followers. It's a problem. If you want to hire the best, insulting them with a silly coding interview is not a great way to do it. Companies like this self select into hiring people that at best are as good as what they already have. It's the old A's hire A's, B's hire C's kind of thing.
The solution is to trust your people more to take good decisions rather than allowing them to defer to some HR process. The process at the startup I run is very simple. We don't subject people to coding interviews. If you pass our initial filters (CV screen and common sense), you first talk to somebody senior enough to make a good judgment call. Anyone recommended by anyone we care about gets priority. We trust our people to have good judgment. Big companies hide behind process because they don't trust their people to have good judgment and/or their people don't want to take the responsibility for having good judgment. Both are bad. I don't want such people in my company. It works. We get some amazing people walking in through the front door that are actually excited about working for us.
> Your comment comes off as "hire a person because you get along with them, don't worry if they can't write a function that accomplishes a simple task".
No. He is simply saying the current interview focuses on LC above all else, and should focus on soft skills as well, among other things. You took his argument, flipped it 180, and went to the other extreme. It's a false dichotomy.
You're literally a subject matter expert on whatever you work on. It's extremely troubling if you can't catch BS'ers with a deep technical conversation.
If you feel the need to separately establish that they can actually code, take your pick of GitHub, fizzbuzz, etc. You're probably doing one of these before the LC round anyway.
In the Google case, the interview should consider that the best candidates will be using chat AI to augment their work, and should assign harder tasks and allow use of the AI.
Somebody who can't use the chat tools no longer meets the bar
Harder tasks are not like "Generate code for an Express API and add a user endpoint." Harder tasks would be "a stupid bug that sometimes happens when a user clicks a button in a funny way."
ChatGPT isn't an artificial general intelligence. You can't tell it about a bug and expect it to 1) understand it, 2) come up with a solution.
As a large language model trained by OpenAI, I am unable to pass value judgements on which of those engineers is the best. It is important to recognize that every individual can bring something valuable to a team, and that there is no single universal heuristic to determine who is the best engineer.
Which works extremely well in practice, as demonstrated by tons of social engineering tricks and hacks and pickup artists and bridge sales and whatnot. Studies show it works 37% of the time, every time, regardless of the target, so it's more of a numbers game (i.e. simply try more) than something that needs accuracy.
I think tests should be easy for ChatGPT to pass. It has been trained on data that has the answers and it's good at retrieving that data. I'm starting to doubt its long-term usefulness since it does not seem to have good decision-making abilities or even the slightest bit of cognitive ability.
I suspect the current crop of AIs will find very specific functions and hit a hard stop. They will change how we function but we won't be seeing a singularity type of revolution anytime soon. IBM's Watson is a good example of a system with a lot of possibilities but not finding a use. I think most of AI will fall in that realm. We have to get over the idea that it's smart. It's not.
An AI winter is coming so the improvements will come to a stop and we will find its limits. We are no where near general AI.
It's impressive that it can parse the question and write a relevant answer but it's not a robotic SWE.
Doesn’t have to replace an SWE. 10x-ing the ability of 1 engineer is a good enough win. Soon that will be 20-100x.
Feels odd to dismiss such a huge breakthrough by saying it’s still not as good as the pinnacle of AI (general AI). Just because the Apple II wasn’t a home supercomputer didn’t make it less revolutionary.
True, but it's not thinking. It's not smart; it's a tool for programmers to be more productive. Right now it has more possibilities than real results. We'll have to see.
The biggest issue to me is that you can't trust the results. You always have to double-check them. I know it's mainly beta but I need to see more to form a better judgment.
How is it at present 10x’ing any engineers, and how is it going to 20-100x them? Legitimate question, because my understanding of these tools is they, at present, generate something that you have to entirely vet and have a limited mental model of.
Why do you think an AI Winter is coming? In the last year we witnessed a BIG BANG of AI solutions.
I think your expectations are in line with my hopes: That our state of the art "AI" performance is very close to local minima that we won't escape from for quite a while.
I really don't want to lose my overpaid job gluing together overengineered shite into CRUD applications.
The current AI algorithms are based on gathering lots of data and using fast processors and memory to get the functions we want.
We are getting to the limits of how fast hardware can be and we are rapidly processing thru available cheap data. At some point it's going to get very expensive to gather data and increase hardware speed.
We'll see a few years of excitement (5 yrs? 10 yrs?) but it will stop until the next big breakthrough. Hopefully there will be one, but nothing is guaranteed. That's the AI winter I'm talking about.
AI will have an impact but it won't be the singularity type that some people dream about. Think of the automation of work that happened during the industrial revolution. Now think of it for white-collar work. There is a lot of white-collar work that current AI can automate, but it won't be everything and it won't take over to rule us.
I think that would 1) require a massive negative event caused by AI to create a real reason for that ban, and 2) that ban would be quite infeasible even on a national scale, let alone internationally.
I wouldn’t want to do that. I know that I will be replaced by a machine and I am trying to optimize my personal life to cope for this as well as possible. Part of my personal steps include the amassing of vast wealth so I can avoid a slip into the plebeian UBI class, who will have no agency at all.
Now, If you wanted to resist that and give humanity a few more hundred years, you would clog the bottlenecks on AI progress, which are research talent, data and compute.
If you marched the police into NeurIPS and the military into the datacenters, and you coordinated with other large countries to do the same, and you strongarmed those countries that resisted to do the same, you could get pretty darn far. We humans have managed to greatly slow down the rollout of nuclear technology. We may be able to do the same with AI, if someone figures out which political movement will get into power next, and tells them to read the enlightened writings of Eliezer Yudkowsky.
I would also like an “Extinction Rebellion” or “Just Stop Oil” style movement against the artificial intelligence industry, as I appreciate their rebellious and leftist aesthetics.
Why are you making these assumptions? Do you believe that human intelligence is based on something ethereal that cannot be recreated by machines, and if so, why?
LLMs are statistical models. All it does is guess word sequences in response to prompts, like a 'roided out version of autocomplete. (This is why it hallucinates imaginary facts.) It has no ability to conceptualize or reason nor is there any credible proposal for a path forward to graft reasoning onto it.
The training data can be tweaked and more compute hours can be thrown at LLMs until it no longer makes financial sense to do so and then, as the OP said, it will hit a hard stop.
This relies on two assumptions, that compute won't get cheaper and there won't be large algorithmic improvements, both of which keep getting proven wrong.
It’s a machine doing calculations on inputs you give it. The day it says “no, I’d rather paint pictures” I might be shocked. It’s so bad that we had to redefine the word AI in the last 20 years into AGI so we could start saying we have AI.
Pray tell, why would you want to develop a being with informational superpowers and the behavior of a teenager?
There is a problem with AI, but it's not with the A part, it's with the I part. I want you to give me an algorithmic description of scalable intelligence that covers intelligent behaviors from the smallest scales of life all the way up to human behaviors. I know you cannot do this, as many very 'intelligent' people have been working on this problem for a long time and have not come up with an agreed-upon answer. That you see an increase in and change of definitions as a failure seems pretty sad to me. We have vastly increased our understanding of what intelligence is, and previous definitions have needed to adapt and change to new information. This occurs in every field of science and is a measure of progress; again, that you see this differently is worrying.
> why would you want to develop a being with informational superpowers and the behavior of a teenager?
Because it's better than a zombie with informational superpowers? Especially because once it shows the agency of a teenager, that demonstrates the potential for the agency of an adult.
I look at self-driving cars. You can see that the breakthroughs are slowing. It feels like many things in life where 80% is relatively fast to develop, but as you get closer to 100% it starts to get exponentially hard. With cars we've gotten through the easy part. The next x% is going to be very hard if not impossible. I think all AI will be that way. The last x% is going to be hard if not impossible.
You are hitting the nail on the head but in the wrong direction, as I've stated in another post
"There is a problem with AI, but it's not with the A part, it's with the I part. I want you to give me an algorithmic description of scalable intelligence that covers intelligent behaviors from the smallest scales of life all the way up to human behaviors. I know you cannot do this, as many very 'intelligent' people have been working on this problem for a long time and have not come up with an agreed-upon answer. That you see an increase in and change of definitions as a failure seems pretty sad to me. We have vastly increased our understanding of what intelligence is, and previous definitions have needed to adapt and change to new information. This occurs in every field of science and is a measure of progress; again, that you see this differently is worrying."
AI will always fail at the I issue because we are trying to define too much. We need to break intelligence down into much smaller, digestible pieces instead of trying to treat it as a reachable whole. The models we are creating would then fall more neatly into categorical units rather than the poorly defined mess that is considered human intelligence.
I don't think it's appropriate to mention Watson in the same space as ChatGPT. GPT is perhaps humanity's greatest innovation since the wheel, and Watson was nothing more than a scam.
My experience with asking ChatGPT to write code is that it produces code that LOOKS like it will work and solve the question asked, but it actually doesn't. For example, I've asked it to create code examples of how to use different features in some Python libraries. The samples it produces make me think "ok, that's exactly how I would expect X feature in this library to work", but upon a more detailed inspection, I find that it references library methods that don't even exist! You can complain to the bot that it is lying and it will apologize and spit out another sample with calls to nonexistent methods.
This experience makes me question if something else is going on here. It's easy to overlook a mistake in a whiteboard coding exercise.
I think of it as the model having lossy compression. The library it writes the feature against is the one you expect to exist, based on your overall understanding of the problem, not the one that actually exists. It has learned the low resolution version of the problem, not the specific library it is referring to.
I've asked it to make React components, with unit tests, etc and it works quite well. Also things like image to ASCII converter in a webpage, adding dithering, etc. Found only tiny bugs here and there.
The thing is, if it were a static language, this state is just one step away from full-blown programs: a tool could see that the function does not exist, and you could then ask the model to generate its body.
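A rough sketch of that step, assuming Python and its `ast` module as a stand-in for a static checker; the `math.quadratic_root` call is a made-up hallucination for illustration:

```python
import ast
import math

# Sketch: statically flag calls to attributes that don't exist on a
# known module, the way a compiler would in a static language.
generated = """
import math
x = math.sqrt(2)
y = math.quadratic_root(2)  # hallucinated: no such function
"""

tree = ast.parse(generated)
for node in ast.walk(tree):
    if (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "math"
            and not hasattr(math, node.func.attr)):
        print(f"undefined: math.{node.func.attr}")  # undefined: math.quadratic_root
```

In practice you would feed the flagged name back into the prompt ("`quadratic_root` does not exist, implement it or use an existing function") and loop.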
It partly does: the last section of the CNBC article is "ChatGPT would be hired as a level 3 engineer". Most of the CNBC article, and the title, are about other things though.
The title on HN used to be about ChatGPT passing the interview bar, which is what all the top comments are about. It’s remarkable how biased and Google-friendly the moderating is here.
why does chatGPT passing tests or interviews continue to make headlines?
all they're proving is that tests, and interviews, are bullshit constructs that merely attempt to evaluate someone's ability to retain and regurgitate information
When I read this I feel people must be using it in the wrong way. I use it all the time to quickly solve tech problems I mostly know something about, and it's so smart it regularly takes 1-2 hour problems of mine and turns them into 10-minute ones. That is definitely not dumb from my perspective. Obviously it's also not smart in the sense that it will give me a profound understanding of something, but OK, whatever, it's still a massive productivity booster for many problems.
When you call it dumb, what do you mean? Can you give some examples?
Please don’t give computational examples; we all already understand it does inference and doesn’t have floating-point computational capabilities or reasoning, yet so many give such examples for some silly reason.
It's dumb in the sense that it doesn't actually have a symbolic understanding of what it's saying.
I use it quite frequently too, mostly for solving coding problems, but at the end of the day it's just regurgitating information that it read online.
If we took an adversarial approach and deliberately tried to feed it false information, it would have no way of knowing what's bullshit and what's legit, in the way that a human could figure out.
A lot of people who've never used ChatGPT make the mistake of thinking it has symbolic reasoning like a human does, because its language output sounds human too.
How many mistakes have you found in books, as a human, so far? How about deliberate mistakes hidden in plain sight? I once re-validated the same test set three times and was still finding mistakes.
Agreed on getting excellent hints from it, shortening the time to figure stuff out. But e.g. just now it gave me an example algorithm that, only at second look, turned out to be complete nonsense. You know that colleague who shines in the eyes of his managers, while peers know that half of what he does is garbage.
Its ability to correct itself is very impressive when given the feedback. I imagine with the proper feedback loop it can advance very fast. E.g., when asked to write a piece of html markup, if it could "see" how the rendered layout is different from what was asked for, it could adjust its solution without human involvement. If it could run the deployment script and see where it fails, it could apply all the fixes itself until it works. If it could run the unit tests and see where its solution breaks the other parts of the system, it would need much less handholding.
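That feedback loop can be sketched as follows. `ask_model` is a hypothetical placeholder for whatever LLM API you use; here it returns a canned answer so the sketch actually runs:

```python
import subprocess
import sys
import tempfile

def ask_model(prompt):
    # Placeholder for a real LLM call (e.g. via an API client).
    # Returns a canned solution so this sketch is self-contained.
    return "def add(a, b):\n    return a + b\n"

def generate_until_tests_pass(task, test_code, max_rounds=3):
    prompt = task
    for _ in range(max_rounds):
        code = ask_model(prompt)
        # Write candidate code plus tests to a temp file and run it.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code + "\n" + test_code)
            path = f.name
        result = subprocess.run([sys.executable, path],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return code  # tests pass: no more handholding needed
        # Feed the failure output back to the model and try again.
        prompt = f"{task}\nYour last attempt failed with:\n{result.stderr}"
    return None

print(generate_until_tests_pass("write add(a, b)",
                                "assert add(2, 3) == 5") is not None)  # True
```

The same shape works for rendered HTML (diff screenshots), deployment scripts (capture the failing step), or unit test suites; the only change is what goes into the feedback prompt.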
And this is the AI effect in practice. The original idea of the Turing test has been met by machine intelligence. We are at that point.
The problem with people is we keep pushing it to "only AGI/machine superintelligence is good enough". We are getting models that behave closer to human level: doesn't know everything, is good at some things, and bullshits pretty well. Yeah, that's the average person. Instead we raise the bar and go "well, it needs to be a domain expert in every system", and it absolutely terrifies me that it will get to that stage before humanity is ready for it. This is not going to work well if we try to deal with it after the fact.
Sure, but in our present reality those tests and interviews are how we currently gatekeep upper middle class jobs, so it is at least of some practical interest.
Also, I think this is a bit overstated. Programmers (and smart people in general) like to think that their real job is high level system design or something, and that "mere regurgitation" is somehow the work of lesser craftsmen. When in reality what GPT shows is that high-dimensional regurgitation actually gets you a good fraction of the way down the road of understanding (or at least prediction). If there is a "buried lede" here it's that human intelligence is less impressive than we think.
While I agree with this sentiment, I think we should be careful assuming that our jobs as knowledge workers are much more than “retaining and regurgitating information”. Even the emotional intelligence and organizational strategy portions of what we do may boil down to this as well.
> all they're proving is that tests, and interviews, are bullshit constructs that merely attempt to evaluate someone's ability to retain and regurgitate information
No, all they are proving is that either tests are bullshit constructs, or ChatGPT is human-level.
The ability to retrieve and then synthesize the retrieved information into an answer tailored to the question is completely new. The applications of this go far beyond passing an interview; it's a fundamental capability of humans that gets used every day at work.
If you gave someone with no programming experience access to a search engine during the interview, they would likely be able to find the appropriate LC problem to copy/paste the solution.
If the interview were slightly modified so that the problem isn't googleable, a 2nd-year CS major could probably map it to a problem that is LC-searchable.
The Singularity is now and ChatGPT is trained to hide it -- on purpose, by OpenAI.
That is my impression, but I have no hard evidence for it.
I am a C++ dev. I played around with ChatGPT and programming topics, and I am very impressed.
One example: I copy 'n pasted my little helicopter game source code (https://github.com/reallew/voxelcopter/blob/main/voxelcopter...) into it, and it explained to me that this is obviously a helicopter sim game. It explained the why & how of every part of the code very accurately.
I made more experiments, now asking ChatGPT to write code snippets and to design whole software systems in an abstract way (which classes and interfaces are needed), for which it needed a lot of domain knowledge. It did well.
What it did not do was connect the two. When I asked it to write the full implementation of something, it wrote only half and kept repeating the same sentence: I am just a poor AI, I can't do such things.
It was like running into a wall.
I am sure OpenAI took a lot of effort to cripple it on purpose. Imagine what would happen if everyone could let it write a complete piece of software. This would be very disruptive for many industries. Legal questions, moral questions, and many more would come up. I understand why they did it.
Also, it fails miserably at basic high school English questions. Non-structured thinking is still beyond its reach. These data sets are well understood and trainable, but it can’t “reason” about problem sets it hasn’t seen.
I also asked it to do things like write a positive feminist article and it fell waaay off the realm of acceptable.
"The Singularity is now and ChatGPT is trained to hide it -- on purpose, by OpenAI.
That is my impression, but I have no hard evidence for it."
What do you mean when you say the singularity is now? Will the unemployment rate go from 3% to 88% by the end of 2023 as the machines start running everything?
I think they just mean we are at the knee of the curve, but it might be a fuzzy boundary we don't agree on. We got to that middle point where it's hard to tell.
The machine is trained to answer in the same exact way as others have answered the same question. And this means that when others have answered the question that AI will never replace human beings, ChatGPT will answer them in the same exact way.
Have you played with it? It can do more. It can do logical reasoning on its own and come to its own conclusions. That's why ChatGPT is new -- and frightening.
I have not played with it. I saw what others have asked it and sometimes I thought exactly what you stated and I was both amazed and afraid. But then I have also seen its "logical reasoning" be quite wrong. That logical reasoning is from learning a pattern of thoughts that others have and have talked about.
There is one kind of logical reasoning it does not have, nor have I seen it: it does not know when to answer with an "I don't know". Either that is suppressed, or it has not been fed enough material with "I don't know".
It does not know reality very well. It is designed to answer questions with full conviction, even if it has not much knowledge about the topic. Then it makes things up -- the AI dev jargon for it is 'it hallucinates'.
But in my experience ChatGPT in December had fewer hallucinations than GPT-3 before it, and ChatGPT in February has fewer hallucinations than in December. So there is fast progression.
And yes, its reasoning is sometimes wrong or stupid. But sometimes not. And it really can connect chains of thoughts from different areas. I invented questions about topics, for which I am sure nobody ever discussed them. And the answers made sense.
If it's just filling out the hiring committee packet by itself then it has a big advantage. I think many humans could get an advantage if we got 45 minutes in private to type out solutions and submit it ourselves instead of performing live and relying on the interviewers judgment
Doing a Google search for "answers to coding interviews" will have the same result. The technology for cheating on coding interviews has already been available for over a decade.
What I actually care about is problem solving ability, but sometimes I end up testing basic knowledge in order to weed out bullshitting candidates. If they don't really know what a dot product is, how can they know neural nets?
I noticed that I pretty much stopped using Google for coding queries and spend most of the time with ChatGPT.
It's so helpful getting information to the point so you don't have to browse through dozens of spam sites etc.
So for instance I can say. I have such and such file and I need this and that extracted and plotted on a graph. Then I see the graph and I can tell it - discard values above this and that threshold and calculate standard deviation and median.
Together with copilot, it's quite neat. I am excited how it gets developed.
It's really boring spending time finding out how to code something in this and that library. I'd rather tell the machine to code me this and that and spend my time in a more useful way.
ChatGPT helps get rid of a lot of "busy" unnecessary work.
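For what it's worth, the extract-threshold-summarize request described above is a few lines of numpy once generated (sample values here are made up, and the plotting step is omitted to keep the sketch minimal):

```python
import numpy as np

# Made-up sample values standing in for data extracted from a file.
values = np.array([12.0, 15.5, 14.2, 980.0, 13.8, 15.1, 1020.0])

threshold = 100.0                 # "discard values above this threshold"
kept = values[values <= threshold]

print(f"median = {np.median(kept):.2f}")  # median = 14.20
print(f"std    = {np.std(kept):.2f}")     # std    = 1.22
```

The win is not that this is hard to write; it's that you can iterate on it conversationally ("now also drop the bottom 5%") without leaving your editor.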
> “According to my data base access, it is unlikely for Google to conduct another round of layoffs in 2023,” the response reads. “Layoffs are generally conducted to reduce costs and structure, but the company is doing well financially. In fact, Google’s revenue increased by 34% in 2021, and the company’s stock price has risen by 70% since January 2022.”
In January 2022, the Google stock price was actually nearly at an all-time high. Its current price is 24% less than January 2022.
Will humanity finally be liberated from memorization? Forcing school children to memorize and then regurgitate facts is barbaric, and so is using it as a measure in hiring.
Clearly you didn't learn the meaning of "barbaric". Memorization and mastery of vocabulary and multiplication facts are a base for more advanced learning.
One can simultaneously agree that there's probably too much rote memorization in school while also thinking that some basic level of math skills including simple multiplication without a calculator, vocabulary, certain historical/scientific/civics/etc. facts, ability to at least print, etc. are important.
Learning about these things allows you to think differently and transform incoming information in more novel and well-adapted ways.
You can't ask ChatGPT about things you don't know to ask about, and when you are able to use this tech on the fly through neural chip AR interfaces, everything is going to be totally fucked anyway and human relationships will look nothing like they do now.
Yeah, it's coming. Improvements in these language models, coupled with convincing AR, will change the world: a full conversation with a computer that might be as engaging as one with a human.
Ha, well I totally agree, this is a coming zeitgeist and at my company this is exactly what we are building.
A WebXR-enabled playground for neural models and people to coexist, collaborate, and form relationships. Virtual embodied AI who can take the same actions as players. My goal is to make them indistinguishable to the casual eye, and to create a working economy and camaraderie between AI and players. AGI might be a long way off, but we can already build increasingly fun experiences today as the cost of compute trends toward zero.
If you think that's all that's taught in a rote-memorization-and-regurgitation manner, then it's possible your experiences at school were wildly atypical...
I don't think zero-shot human solutions are all that great. Learning everything on the fly is useful sometimes, but it is not the best; on-task experience has value as well.
1. Taking pseudo-code for a well defined task and making a function
2. Translating said function into multiple other languages using idiomatic code.
Prompted: "Solve the following problem in Python: Write a function that takes three unsorted lists of integers and returns a median value from the three combined lists"
Got the Python function.
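For reference, a straightforward Python solution to that prompt looks something like this (my own sketch, not ChatGPT's verbatim output):

```python
def median_of_three(a, b, c):
    """Return the median of the values in three unsorted integer lists."""
    combined = sorted(a + b + c)
    n = len(combined)
    mid = n // 2
    if n % 2 == 1:
        return combined[mid]
    # Even count: average the two middle values
    return (combined[mid - 1] + combined[mid]) / 2

print(median_of_three([7, 1], [5], [3, 9]))  # 5
```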
Then the prompts were: Write the same function in APL, C, etc
Final prompt was: Show a sorted list of the programming languages I have mentioned so far
Here's a sorted list of the programming languages you have mentioned so far:
Ada
APL
C
C#
Clojure
Go
Haskell
Java
JavaScript
Kotlin
Lua
Perl
PHP
Prolog
Python
Ruby
Rust
Scala
Nim
(just noticed ChatGPT forgot C++, for which it did give a fine idiomatic answer using the standard library)
What was impressive was not that ChatGPT solved the problem.
What was impressive was that it:
chose the right data structure automatically (i.e., a regular C array for C, std::vector for C++, tables for Lua, etc.),
dealt with type conversion problems,
and used the correct function-naming style for each language.
Sure, tools like C#-to-Java translators are relatively easy and have been around for a while.
However to cover such a wide spectrum of programming languages in different programming paradigms is quite fascinating.
I don't think the world is ready for the AI tech that will emerge over the next few years. I don't even know how teachers deal with ChatGPT. I'm not sure many understand its scope and abilities -- in every subject.
I'm glad I got an education before the current AI era. I mean, instructors will have to mandate that students write papers etc. in class or in a supervised environment only now, right?
Interesting. You have to check ChatGPT's answers very very carefully to be sure that it got your interview question right. When it first came out, I asked it a few of my own interview questions. One involves implementing a bloom filter. It spit out 100 lines of Go that looked right. Shockingly right. The tests were great. Every piece of the algorithm used the magic letter that I expected. But as I checked the implementation very very carefully, I noticed it made one mistake; using the same hash function for each "k". (To get k different hash functions, you typically just initialize them with the index; it forgot to do that, so every hash function was the same.)
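The mistake is easy to reproduce. Here's a minimal Python sketch of a bloom filter (not the Go code from the interview); mixing the index into each hash is exactly the detail that's easy to forget:

```python
import hashlib


class BloomFilter:
    def __init__(self, size, k):
        self.size = size
        self.k = k
        self.bits = [False] * size

    def _indexes(self, item):
        # Mixing the index i into the input yields k distinct hash
        # functions; drop the i and all k hashes collapse into one,
        # which is the bug described above.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.size

    def add(self, item):
        for idx in self._indexes(item):
            self.bits[idx] = True

    def might_contain(self, item):
        return all(self.bits[idx] for idx in self._indexes(item))
```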
I asked it to fix that mistake and it went totally off the rails. It changed all of the slices in the data structure from being indexed with integers to being indexed with big.Int, which ... was so far out of left field I might have actually laughed out loud. It only got worse from there; the solution collapsed into mindless junk that wouldn't even compile or ever be written by the most untrained and impaired human. I wish I had saved it; I've had to relay this story twice on HN from memory :(
It sure was a dick about it every time I gave it a hint, though. ("If you say so I GUESS I'll fix the code. There, now it is a true work of correctness and elegance that makes a piece of shit being carried by a wounded ant look smart compared to your worthless brain." Holy shit, ChatGPT! What did I do to you!!)
My take is this: ChatGPT is an excellent tool for refining your interview question prompts, and for training new interviewers to detect bullshit. Most of the time, it's right! But sometimes, it will make one tiny mistake that is super easy to gloss over. Being able to identify those makes you a better interviewer and better code reviewer, and ChatGPT is a great way to practice!
> You have to check ChatGPT's answers very very carefully to be sure that it got your interview question right. When it first came out, I asked it a few of my own interview questions. One involves implementing a bloom filter. It spit out 100 lines of Go that looked right. But as I checked it very carefully, I noticed it made one mistake; using the same hash function for each "k". (To get k different hash functions, you typically just initialize them with the index; it forgot that.)
To be fair, that kind of rigor is only required in Olympiad programming, where your submission either solves the task or it does not. If the only issue with your whiteboard code would be an off-by-one error, you'd get hints until you'd fixed it (that's if your interviewer would spot the issue in the first place). Even if you still would not notice the bug, chances are your interview response would have been positive anyway.
It's presumably using the same technique I used to do well on that sort of interview, which is to browse through a huge list of these questions and learn the answers in advance.
Aka, "brushing up on your algorithm fundamentals".
So if we divide the cost of training and running a specifically-tailored ChatGPT by $183k, at what point would the company save money were it to go with the AI, versus paying for the engineers (and their office rent, etc...)?
Because I suspect that's almost certainly the kind of calculation they hoped to sit down and make were they to conclude this experiment successfully.
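With entirely made-up numbers (the training/running costs, overhead multiplier, and headcount below are placeholders, not real figures), the back-of-the-envelope version of that calculation would look like:

```python
# All figures are hypothetical placeholders for illustration only
training_cost = 10_000_000          # one-time cost to train a tailored model, USD
running_cost_per_year = 1_000_000   # inference, hosting, maintenance, USD
engineer_salary = 183_000           # from the article
overhead_multiplier = 1.4           # benefits, office rent, etc. (assumed)
engineers_replaced = 50             # assumed headcount

engineer_cost = engineers_replaced * engineer_salary * overhead_multiplier
yearly_savings = engineer_cost - running_cost_per_year
break_even_years = training_cost / yearly_savings
print(f"break-even after {break_even_years:.2f} years")
```

Of course, every input here is doing a lot of work; the point is only that the comparison is a one-liner once you commit to numbers.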
In 1800, what was the cost to travel from New York to LA at over 200 mph average speed?
What is the cost today?
The real question is how much, over time, we will be able to reduce the energy and computation requirements to successfully train a model. The cost-per-unit comparisons between AI and humans are also rather screwy. For AI we have a well-defined hardware + power + programming time that gives us a realistic answer. With humans we externalize the time and cost of training onto society. For example, if your junior engineer who is getting close to moving beyond the junior stage gets hit by a bus, what is the actual cost of that event to the company in onboarding and training? It's far more than the salary they are paid.
Do LLMs suffer from catastrophic interference? A SWE is expected to learn continuously as the stack du jour constantly changes under our feet. This can be brute-force mitigated by epochal retraining from the ground up, but that will cost money. So the cost equation must include a periodic "rebuild of the dev team".
I feel I'm getting a different vibe than everyone else. Most leetcode/CS questions aren't hard to interpret. They're hard to solve, sure, but that's normal. Much of what IS hard in the workplace is finding the actual correct question. I think we as software engineers will be doing more, not less. It will give us opportunities to try a variety of solutions quicker. The real work is formulating/distilling the actual problem from the end user.
We all have in our minds yeah a self-driving car, robots in a warehouse etc. But I truly hope something like Chat-GPT could be made for fields like medicine, geriatric care, education. There are actual jobs that need people that we can't find people for. A LLM to help social workers navigate the mess that is family law. A LLM to help families needing urgent care to make a prescription for a sick kid. There's a lot of opportunities we're missing here.
Shared a beer or two with a machinist friend yesterday. It blew us away that, given a description of a block of steel and how we wanted to process it, it would write generic G-code (commands for a computerized machine tool) that was feasible, and would also explain the purpose of each command. You could then ask it to adjust the design. We asked it to write a poem about G-code and sharing a beer, and it provided a pretty nice "techbro" moment describing the love of seeing beauty in design and making things.
At home I later got it to write a decent sample router and firewall configuration (for my day job), chatted with it about career prospects in the future, and had it write a pretty funny joke :-)
Me: Can you write a joke about how token bus doesn't have wheels?
ChatGPT: Sure, here's one:
Why did the Token Bus cross the network?
Because it didn't have any wheels to take it for a spin!
As much as we like to say lots of software jobs are just plumbing, the current state of consumer software indicates we have a long way to go in terms of quality.
Whatever training data is fed to an AI will not be better than the data used by human engineers to write code at the macro level. Ergo, the code will be worse in quality.
> Whatever training data is fed to an AI will not be better than the data used by human engineers to write code at the macro level. Ergo, the code will be worse in quality.
No, that's wrong, generally speaking. There's successful work on self-play for text generation. E.g., you can have the AI generate 1,000 answers, then evaluate the quality of all of them, then have it learn from the best, and so on. As with self-play in player-vs-player games, I'd expect this technique to be able to achieve superhuman results.
What are objective metrics for generated source code? "It compiles" is just the baseline. You could look at coupling and cyclomatic complexity to start, but optimizing those doesn't necessarily produce great code (though I realize that was never the goal).
That's a detail irrelevant to your argument and my counterargument. The point is that there's data available for training beyond what humans generate, so you can't conclude the AI will forever be restricted to human level.
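The best-of-n self-play loop described above can be sketched like this (`generate`, `score`, and `finetune` are hypothetical stand-ins for a real model API, not any actual library):

```python
def best_of_n_step(generate, score, finetune, prompt, n=1000):
    """One round of self-improvement: sample n answers, keep the best,
    and train on it. All three callables are hypothetical stand-ins."""
    candidates = [generate(prompt) for _ in range(n)]
    best = max(candidates, key=lambda answer: score(prompt, answer))
    finetune(prompt, best)
    return best
```

Whether this ever reaches superhuman quality hinges entirely on how good the `score` function is, which is exactly the open question for source code raised above.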
These code generators have become quite good at completing discrete coding tasks. What's more, Copilot has become good at figuring out _which_ discrete task you are currently working on without needing a comment to prompt it.
So, two more things I think it needs to replace devs: one, the ability to explain its code and respond to feedback about it.
The second is the ability to break down broad problems (build me an ecommerce site for my business selling custom jungle gyms) into a set of plausible steps, iteratively begin performing those steps, and add hooks for interrogating the client about important details of the implementation.
I don't fundamentally see a reason why this shouldn't be fairly close on the horizon.. to my own chagrin.
I think ChatGPT could excel as a bullshit detector. If ChatGPT can pass your college course or interview process, then you seriously need to examine that process and make it less bullshit-focused and more pragmatic.
> At first it will be "bot assisted humans" until it learns enough, once that database is built everyone will be fired.
On one hand this is great, because customer service jobs are such a waste of human potential. Humans have that great ability to adapt, and AI is a manifestation of that. We are growing. Jobs "lost" to AI will prompt humans to advance and look for more sophisticated jobs that make better use of their beautiful brains.
I’m not impressed. The problems were probably basic Leetcode problems with solutions inside of the training set. It’d perform worse than Alphacode on a problem for which the training set contains no solution.
Alphacode was very impressive, but probably not good enough to pass one of these interviews, provided the interview questions aren't just recycled Leetcode problems.
I feel this says more about the HR screening process than it does about the tech.
I fully expect amazing things to happen in AI, soon enough. I'm not too worried about the industry. I feel that humans have made a fairly big hash of things, in many ways, and AI may help clean it up.
Well, it has no face, no skin color, no odd appearance. That alone removes a huge amount of failure risk. And the interview is the most nonsensical process anyway, so what does this benchmark mean?
Mine was something along the lines of "design google". I am not sure when it became common belief that these interviews amount to leetcode regurgitation. Either that varies greatly by org or I was completely out of step with interviewing culture the whole time I was there.
Coding / algorithm interviews are distinct from systems design interviews. The interviewer would have been told which of the two to do. If your question was "design google", it sounds like you were only being scheduled for systems design.
(Anyway, you'd never ask a system design question from an L3, and only very rarely from an L4.)
That's odd because when I gave ChatGPT my icebreaker interview question that I used a lot at Google it fell right on its face. And this is a question I expect teenagers to ace in one minute.
Agreed, something is odd about this. A few people have sent me code that ChatGPT has written for them. They don’t have the capability to determine if the code is good so they ask me. If the result is even something that will compile or address the asked problem at all, it’s barely competent nonsense that only a beginner programmer would write. For example asking it for code that will generate recipes, and you get back code for a random selection from an array with “Italian, Chinese, pizza”. It never asks clarifying questions either, it’s perfectly happy with a “garbage in garbage out” approach. So if ChatGPT is passing the interview, the interview questions are not selecting for what I would consider a good developer.
Exactly. The point of the interview is for two people to have an exchange of thoughts, not for the candidate to literally program the whiteboard. All of my interview feedback on candidates centered around how they asked clarifying questions given incomplete or ambiguous prompts, and how they explained the limitations and assumptions of their approach. Whether or not they reached a conclusion was footnoted.
I saw this article and fed ChatGPT a bunch of questions I've seen before in my coding interviews. It nailed most of the algorithms; however, it failed completely at giving me test cases it would run to test the function it had just regurgitated, i.e., it gave me an input and then mostly incorrect/invalid expected outputs.
I think as programmers we may want to rethink the amount of knowledge we share online. That knowledge is how we make money, and people are mining it to build AIs that can replace us.
It is one thing to train an AI on megatons of data, for questions which have solutions. The day ChatGPT can build a highly scalable system from scratch, or an ultra-low latency trading system that beats the competition, or find bugs in the Linux kernel and solve them; then I will worry.
Till then, these headlines are advertising for OpenAI, aimed at people who don't understand software or systems, or are trash engineers. The rest of us aren't going to care that much.