This is a lot harder than the author wants to realize. Elastic infrastructure deployment in sub second latency is not likely to ever happen. The problem is simply a lot harder than serving http requests. It's not receive some bytes of text, send back some bytes of text. VMs are just files and can be resized, but optimal resource allocation on multitenant systems is effectively solving the knapsack problem. It's NP hard, and at least some of the time, it involves actually powering devices on and off and moving physical equipment around. There is still metal and silicon back there. In order to just be able to grow on demand, the infrastructure provider needs to overprovision for worst case, as the year of Covid should have taught us the entire idea of just in time logistics for electronics is a house of cards. Ships still need to cross oceans.
My wife was the QA lead for a data center provisioner for a US government agency a few years back and the prime contractor refused to eat into their own capital and overprovision, instead waiting for a contract to be approved before they would purchase anything. They somehow made it basically work through what may as well have been magic, but damn the horror stories she could tell. This is where Amazon is eating everyone else's lunch by taking the risk and being right on every bet so far so if they build it, the customers will come.
But even so, after all of that to complain that deploying infrastructure changes takes more than a second, my god man! Do you have any idea how far we've come? Try setting up a basic homelab some time or go work in a legacy shop where every server is manually configured at enterprise scale, which is probably every company that wasn't founded in the past 20 years. Contrast that with the current tooling for automated machine image builds, automated replicated server deployment, automated network provisioning, infrastructure as code. Try doing any of this by hand to see how hard it really is.
Modern data centers and networking infrastructure may as well be a 9th wonder of the world. It is unbelievable to see how underappreciated they are sometimes. Also, hardware is not software. Software is governed solely by math. It doesn't exactly what it says it will do, every single time. Failures of software are entirely the fault of bad program logic. Hardware isn't like that at all. It is governed by physics and it is guaranteed to fail. Changes in temperature and air pressure can change results or cause a system to shut down completely. Hiding that from application developers is absolutely the goal, but you're never going to make a complex problem simple. It's always going to be complex.
Remember my wife? Let me tell you one of her better stories. I can't even name the responsible parties because of NDAs, but one day a while back an entire network segment just stopped working. Connectivity was lost for several applications considered "critical" the US defense industrial base, after they had done nothing but pre-provision reserved equipment for a program that hadn't even gone into ops yet and wasn't using the equipment. It turned there was some obscure bug in a particular model of switch that caused it to report a packet exchange as successful even though it wasn't so that failover never happened, even though alternative routes were up and available. Nobody on site could tell why this was happening because the cause ended up being locked away in chip level debug logs that only the vendor could access. The vendor claimed this could only happen one a million times, so they didn't consider it a bug. Guess how often one in a million times happens when you're routing a hundred million packets a second? Maybe your little startup isn't doing that, but the entire US military industrial complex or Amazon is absolutely doing that, and those are the problems they are concerned with addressing, not making the experience better for some one man shop paying a few bucks a month who think yaml dsls make software-defined infrastructure too complicated for his app that can plausibly run on a single laptop.
It’s true that we’ve made a ton of progress. But I LOVE it when developers demand more and I will bet good money that it’s only a matter of time when what the author describes will actually happen.
We’ve been making tremendous strides in making production software systems easy and accessible for developers (and not sysadmins) to understand and operate. We’re currently in the stage where the devops principle has been somewhat realized; devs love the ability to deploy when they want but absolutely hate that infrastructure provisioning takes so long compared to the tools they use for software development. This is a good thing! I want more devs to demand this; someone may actually solve it.
> We’ve been making tremendous strides in making production software systems easy and accessible for developers (and not sysadmins) to understand and operate.
in my experience developers are terrible at doing operations. (for good reason mind you).
Devops only really became possible because a lot of operational problems have been outsourced to cloud providers, but they still persist. Or do developers also want to build the networks which provide fault tolerance, redundancy and scalability while also building software?
Running a large, multi tenant system for different workloads takes a mind boggling amount of work and skill, not to mention deep understanding of how networks and systems operate in detail and at scale.
> The vendor claimed this could only happen one a million times, so they didn't consider it a bug. Guess how often one in a million times happens when you're routing a hundred million packets a second?
This reminds of people on HN who claim BGP is outdated and should be replaced.
They have no idea how mind boggling large and complex the global internet routing is. I would even argue that BGP in the default free zone (aka, the public internet) is the largest distributed system in the world.
> VMs are just files and can be resized, but optimal resource allocation on multitenant systems is effectively solving the knapsack problem.
as someone who works in a datacenter/cloud enviroment. Having optimal resource distribution is simply impossible to get 100% correct. Their is always some waste or over/underallocation. The same goes for large scale networks aswell.
My wife was the QA lead for a data center provisioner for a US government agency a few years back and the prime contractor refused to eat into their own capital and overprovision, instead waiting for a contract to be approved before they would purchase anything. They somehow made it basically work through what may as well have been magic, but damn the horror stories she could tell. This is where Amazon is eating everyone else's lunch by taking the risk and being right on every bet so far so if they build it, the customers will come.
But even so, after all of that to complain that deploying infrastructure changes takes more than a second, my god man! Do you have any idea how far we've come? Try setting up a basic homelab some time or go work in a legacy shop where every server is manually configured at enterprise scale, which is probably every company that wasn't founded in the past 20 years. Contrast that with the current tooling for automated machine image builds, automated replicated server deployment, automated network provisioning, infrastructure as code. Try doing any of this by hand to see how hard it really is.
Modern data centers and networking infrastructure may as well be a 9th wonder of the world. It is unbelievable to see how underappreciated they are sometimes. Also, hardware is not software. Software is governed solely by math. It doesn't exactly what it says it will do, every single time. Failures of software are entirely the fault of bad program logic. Hardware isn't like that at all. It is governed by physics and it is guaranteed to fail. Changes in temperature and air pressure can change results or cause a system to shut down completely. Hiding that from application developers is absolutely the goal, but you're never going to make a complex problem simple. It's always going to be complex.
Remember my wife? Let me tell you one of her better stories. I can't even name the responsible parties because of NDAs, but one day a while back an entire network segment just stopped working. Connectivity was lost for several applications considered "critical" the US defense industrial base, after they had done nothing but pre-provision reserved equipment for a program that hadn't even gone into ops yet and wasn't using the equipment. It turned there was some obscure bug in a particular model of switch that caused it to report a packet exchange as successful even though it wasn't so that failover never happened, even though alternative routes were up and available. Nobody on site could tell why this was happening because the cause ended up being locked away in chip level debug logs that only the vendor could access. The vendor claimed this could only happen one a million times, so they didn't consider it a bug. Guess how often one in a million times happens when you're routing a hundred million packets a second? Maybe your little startup isn't doing that, but the entire US military industrial complex or Amazon is absolutely doing that, and those are the problems they are concerned with addressing, not making the experience better for some one man shop paying a few bucks a month who think yaml dsls make software-defined infrastructure too complicated for his app that can plausibly run on a single laptop.