We actually did those steps as part of our overall assessment, and you're right that we could have taken an outage window for that long and called it a day. We decided the tradeoff wasn't worth it for our situation, but taking the outage window is definitely a viable option.
I'm sympathetic to your comment that 15 minutes of planned downtime is fine for approximately 100% of SaaS companies. That's probably true here too, and maybe the work of doing this kind of upgrade was a waste in that regard. But, in considering the kind of product experience we would want for ourselves, zero downtime seems better than no downtime. The opportunity cost of feature work over the same window is real, but so is the reputation we hope to build as a platform that "just works" even if it seems crazy the lengths we might go to so that our customers don't have to think about it.
> The opportunity cost of feature work over the same window is real, but so is the reputation we hope to build as a platform that "just works" even if it seems crazy the lengths we might go to so that our customers don't have to think about it
This part can definitely make sense, and if nothing else it can foster an engineering culture of "we care", which is great. I just wanted to show the other side but from your answers it seems like the team weighted the options. It's definitely a cool project to work on. Thanks a lot for engaging with a random grumpy guy on HN!
Random comment, but just wanted to say I really appreciate your blog post, but also I appreciate the informative and helpful discussion between you and vasco here. Feel like this could have easily devolved into defensiveness on either side, but instead I learned a lot from both of your responses - I feel like these kinds of interations are HN at its best. Thanks!
I'm sympathetic to your comment that 15 minutes of planned downtime is fine for approximately 100% of SaaS companies. That's probably true here too, and maybe the work of doing this kind of upgrade was a waste in that regard. But, in considering the kind of product experience we would want for ourselves, zero downtime seems better than no downtime. The opportunity cost of feature work over the same window is real, but so is the reputation we hope to build as a platform that "just works" even if it seems crazy the lengths we might go to so that our customers don't have to think about it.