
So the upgrade took a few hours to complete and it didn't happen instantly in defiance of the laws of physics. Next time, play it safe and upgrade over the weekend.

The post is dated May 27. Is Google planning to announce a new feature for Apps this week, and this is some sort of preemptive PR attack?



To play devil's advocate:

1. There is no indication in the upgrade process that the upgrade would "take a few hours to complete". Quite the opposite, it indicated there would be no downtime.

2. The customer service reps seem to have little insight into what was happening with the account, which is a bit scary. I'm always nervous about black-hole "processing windows" where all you can do is wait and hope for the best. It's bad when that is customer facing; it's worse when it is customer-service facing.

3. Upgrading on the weekend could be even more troublesome, given that customer service may not be staffed with the quantity/quality of people needed to fix issues if they occur. I'm not saying this is or isn't the case for Google, but running into problems on a Saturday morning and getting a "call us back during normal business hours" is just as bad.

What should have happened in my opinion?

1. Google should document that there can be a temporary delay of x to y hours while the upgrade happens.

2. Florian should have scheduled the upgrade ahead of time with the team, letting them know that in the worst case there may be an outage of 'x' minutes/hours/days.

3. Google customer service should have better tools for tracking the upgrade process, for both the admin (customer) and the customer service rep, so that it is not a black hole of 'wait and hope'. Even a "step 1 of x", a "you are number # in the queue", or an estimated time would help.
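To make point 3 concrete, here is a minimal Python sketch of the kind of progress report that both the admin and a support rep could look at. Everything in it (`UpgradeStatus`, its fields, `describe()`) is hypothetical; nothing in this thread suggests Google exposes such an API.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class UpgradeStatus:
    """Hypothetical progress report for an account upgrade (not a real Google API)."""
    step: int                             # current step, 1-based
    total_steps: int                      # total number of steps in the upgrade
    queue_position: Optional[int] = None  # position in the work queue, if still waiting
    eta: Optional[timedelta] = None       # estimated time remaining, if known

    def describe(self) -> str:
        # Always show the step counter; add queue position and ETA only when known.
        parts = [f"Step {self.step} of {self.total_steps}"]
        if self.queue_position is not None:
            parts.append(f"you are number {self.queue_position} in the queue")
        if self.eta is not None:
            minutes = int(self.eta.total_seconds() // 60)
            parts.append(f"about {minutes} minutes remaining")
        return ", ".join(parts)


status = UpgradeStatus(step=2, total_steps=5, queue_position=14, eta=timedelta(hours=3))
print(status.describe())
# Step 2 of 5, you are number 14 in the queue, about 180 minutes remaining
```

Even the bare "Step 1 of 3" form (no queue position, no ETA) would tell a support rep whether the account is progressing or stuck.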


It wasn't that it took a long time. It's that the interface showed a state as if all the mail was deleted. That could have been devastating.

We recently upgraded to paid Google Apps for Business and we didn't get any downtime. This is how it is supposed to work.

Unfortunately, we also had irritating problems after the upgrade. We upgraded because our customer support email address had run out of mail quota. Paid Google Apps has a 10x higher quota, but upgrading to premium didn't increase ours.

When we rang customer service, they refused to increase it immediately and said that it would be increased "in a couple of months' time".

Meanwhile hundreds of our customers were going unanswered.

I suppose I can imagine a scenario in which they would want to wait until after the credit card chargeback window before lifting a quota limit, but their support department should understand the paralysis this causes a business and offer to solve the problem.

We had to create a temporary support2@mycompany.com address and manually port email across, which wasn't fun and played hell with our ticketing system.


Why not pull down/delete old mail off your support@example.com email address?

I can't imagine you have 5 gigs (or however many they give you these days) of tickets that couldn't survive a few days in cold storage.


> So the upgrade took a few hours to complete and it didn't happen instantly in defiance to the laws of physics.

What law of physics necessitates that such an upgrade takes 6 hours?


> What law of physics necessitates that such an upgrade takes 6 hours?

the one where the customer isn't paying enough money to have somebody working 24/7?

If your business depends on a particular service, you don't accept something vague; you get it in a signed contract with legally enforceable SLAs. If you can't afford that, then you just have to live with shitty service.


Have we just transitioned from the "It's free, you can't expect not to get screwed" trope to "It's inexpensive, you can't expect not to get screwed"?


It's Google. You never see the Google front page go down. There's an implied availability that Google uses to their advantage. People are comfortable with upgrading _because_ it's Google. Google wouldn't fuck you.

Except when they do. Lesson learned.


You do realize we live in a world of automation, and that Google... automates?


6 hours... We run a 4-person company, and 6 hours during the weekend would still be a problem for us.

Given that a company larger than 10 people might actually upgrade (with only a small number of shared email accounts), I could see it causing big problems for some.


How do you get 24x7x365 coverage with only 4 people? The "standard" rule of thumb is that it takes about 5 people to fill one position 24x7x365 over the very long term, averaged over a large staff, with absolutely no unstaffed time tolerated. So 50 people can fill 10 positions absolutely positively all the time, but that doesn't scale down well at all to just 5 people and one desk.

You can "fill" ten positions with a lot fewer than 50 people, but even at scale there's going to be a heck of a lot of time when only 8 or 9 people are there, hence the quotes around "fill". For instance, you can "fill" 10 positions with 30 people, but there will be a heck of a lot of time when the supposed 10 positions have only 7 (or even fewer) bodies actually on deck.
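The "about 5 people per seat" rule of thumb falls out of simple arithmetic. A minimal Python sketch, assuming a 40-hour week, four weeks of vacation and holidays, and ten further days of sickness or training per person a year (illustrative assumptions, not the commenter's figures):

```python
import math

HOURS_PER_YEAR = 24 * 365  # one seat staffed around the clock: 8760 hours


def hours_worked_per_person(weekly_hours=40, weeks_off=4, other_days_off=10, hours_per_day=8):
    """Hours one person is actually on deck per year (defaults are illustrative assumptions)."""
    return weekly_hours * (52 - weeks_off) - other_days_off * hours_per_day


def people_per_seat(**kw):
    """People needed to cover one seat for every hour of the year, with zero slack."""
    return HOURS_PER_YEAR / hours_worked_per_person(**kw)


def best_case_coverage(people, **kw):
    """Fraction of the year one seat can be staffed, assuming shifts tile perfectly."""
    return min(1.0, people * hours_worked_per_person(**kw) / HOURS_PER_YEAR)


print(f"{people_per_seat():.2f} people per seat")  # 4.76 people per seat
print(math.ceil(people_per_seat()))                # 5
print(f"{best_case_coverage(4):.0%}")              # 84%
```

Under these assumptions, four people could keep a single desk staffed at most about 84% of the time, and only if the rota never overlaps or leaves gaps. Real schedules lose more to handovers and clustered absences, which is why "filling" a seat in practice takes more than the raw ratio.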

Depending on the business sector of course. If you're a stereotypical weekday business then a 6 hour outage at 9am on a weekday would be a disaster, but a 6 hour outage from midnight to 6am on Sunday morning wouldn't even be blinked at because no one cares.


I agree that midnight to 6am on Sunday would probably be okay. It would just be a really bad time to get an email about an issue with a payment gateway, marketplace listings, or the website.


Really? Do you run a highly reliable email service yourself, then? I certainly would not use Gmail if that were the case. You should seriously look at why you are so reliant on email, as it is not really a reliable service: the mail servers on the other side could easily delay mail for 6 hours. It is only a best-effort, eventually consistent protocol with very long delays allowed (you don't normally get bounces for days if the mail is queued for retry).


Well, the world of business tends to rely pretty heavily on email; I'm not sure what other methods of reliable messaging there are.

Sure, there are phone calls etc., but emails are used when you need an actual record of something. Also, relying on mail servers to not bounce emails and to actually deliver them in the end is laughable; when mail servers do go down, this hardly ever happens properly.


Well, websites (or APIs) are generally much more reliable forms of messaging, because you get immediate feedback on processing. The OP said it was business critical; most corporate email is not business critical, and mixing business-critical email into a channel with spam and high volumes of unimportant messages, and then relying on a human to answer it, is all pretty unreliable. Sure, you can feed email into a ticketing system, but then at least there is also a web interface (so you do not rely on email).
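To illustrate the "immediate feedback" point: with an HTTP API the sender learns within seconds whether the message was accepted, whereas SMTP may only report a failure days later. A minimal, self-contained Python sketch using only the standard library; the `/tickets` endpoint, `TicketHandler`, and `submit_ticket` are hypothetical, and a local server stands in for a real ticketing system.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen


class TicketHandler(BaseHTTPRequestHandler):
    """Hypothetical ticket endpoint: accepts JSON and acknowledges it immediately."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        ticket = json.loads(self.rfile.read(length))
        # A real service would persist the ticket before acknowledging it.
        body = json.dumps({"status": "accepted", "subject": ticket["subject"]}).encode()
        self.send_response(201)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def submit_ticket(url, subject):
    """POST a ticket and return (HTTP status, parsed reply) as soon as the server answers."""
    data = json.dumps({"subject": subject}).encode()
    req = Request(url, data=data, headers={"Content-Type": "application/json"}, method="POST")
    with urlopen(req, timeout=5) as resp:
        return resp.status, json.loads(resp.read())


server = HTTPServer(("127.0.0.1", 0), TicketHandler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
status, reply = submit_ticket(f"http://127.0.0.1:{server.server_port}/tickets", "Payment gateway down")
server.shutdown()
server.server_close()
print(status, reply)
```

If the server is down, `urlopen` raises an exception (typically `URLError`) within the timeout, so the sender knows straight away to retry or escalate rather than waiting on a bounce.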

If you need a record of something, I do not think unsigned emails are legally binding anyway; again, you should probably submit signed contracts over the web, not by email.

If it is not time critical, email can work, but the OP said a six-hour outage would be a problem, and I still think that is a problem they have created themselves and a business risk.


It sounds like email isn't critical for the corner of the universe that you operate in -- good for you.

For lots of people in different roles, email is an essential tool for getting work done. Not everyone has a role that can be readily translated into an API. Business Dev/Sales, for example, depends on email to communicate with the various folks they need to engage with. Whatever those folks do, it ends with the company getting a check, so it's important.

Generally speaking, it is pretty inexpensive to deliver a 99.9% available mailbox with a 100% guarantee of external mail delivery. The fact that Google bungles a conversion from free to paid service so poorly is a sad statement, when they are supposed to be a shiny alternative to the traditional Microsoft messaging stack.


Bus dev/sales can survive a day without email every now and again (in my experience, far more easily than a day without a phone system). Everything will resume the next day. Sure, it pays the bills, but a six-hour outage is not "mission critical"; it won't stop the "check" from arriving (email is not a payment method, after all).

I have known large (email-dependent) businesses to have 2-day outages on the traditional Microsoft mail stack too. There are coping strategies. It is annoying, not mission critical.


Are you serious? Running my small business, I used email to communicate with my coworkers, potential hires, customers, prospects, partners, potential partners, reporters, accountants, lawyers, banks, web hosts, PayPal, UPS, various government departments, etc. Being completely cut off from all of that for effectively an entire business day would destroy my ability to get anything beyond "solo hacker in basement" coding done, and there really isn't a sensible mitigation plan or alternative to email for this (unless you'd like to convince my bank to start posting, say, notifications of incoming wire transfers to a web form of my own design?).


Your bank provides a website you can poll to find out about stuff. Not ideal; it would be nice to have a proper API, but so far we only seem to be getting these for credit card payments, though this is changing (e.g. see GoCardless). Most of the things you list you could manage during an email outage, as you have phone numbers or other alternatives.

If it is that critical, do you have 99.99%+ (52 minutes a year) uptime guarantees on your mail service? That is what you are asking for, and actually delivering that (rather than an empty SLA promise) is something very few businesses even try to do, especially small companies. Gmail certainly does not try to provide this.
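The "52 minutes a year" figure is just the unavailable fraction of a year's minutes. A short Python sketch (ignoring leap years) that also compares the six-hour upgrade outage against that budget:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability_pct):
    """Maximum downtime per year permitted by an availability target given in percent."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


for target in (99.9, 99.99):
    print(f"{target}%: {downtime_budget_minutes(target):.1f} minutes/year")
# 99.9%: 525.6 minutes/year   (about 8.8 hours)
# 99.99%: 52.6 minutes/year

outage = 6 * 60  # the six-hour upgrade outage, in minutes
ratio = outage / downtime_budget_minutes(99.99)
print(f"6 hours is {ratio:.1f}x the yearly 99.99% budget")
# 6 hours is 6.8x the yearly 99.99% budget
```

In other words, at 99.99% a single six-hour outage would use up nearly seven years' worth of allowed downtime.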


Google's enterprise Apps SLA is 99.9%, and last year they hit 99.98%.


Most people would assume that this is nothing more than an administrative change on the account, not a drawn-out six-hour process. And if the upgrade does take time, it absolutely should come with in-your-face warnings, given how important 24/7 email is for many organizations.

Even if we accept that this is more than an administrative change and it somehow moves the account to better hardware, I would think Google would have made this a mostly transparent process: at most, long-term archives would be unavailable after a very brief initial migration, etc.


That's exactly the sort of assumption you can't afford to make if you've honestly got company-destroying problems if your email goes down for 6 or even 24hrs.

It's easy in hindsight, but I've seen that happen before, and I have no doubt that if you'd considered the risk and (perhaps ironically) googled for information, you too would have known about this.

On the other side of the coin (and perhaps the reason Google haven't cared enough to fix the problem), SMTP is nicely designed so as to not result in this sort of thing losing any mail - "well behaved" email systems will just queue and retry mail for 5 days if needed.
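The queue-and-retry behavior can be sketched in a few lines of Python. RFC 5321 (section 4.5.4.1) says the retry interval should generally be at least 30 minutes and that the give-up time generally needs to be at least 4-5 days; Postfix's `maximal_queue_lifetime`, for example, defaults to 5 days. This sketch simplifies to a fixed 30-minute interval (real MTAs usually back off), assumes the receiving server returns temporary failures for the whole outage, and uses an arbitrary send time:

```python
from datetime import datetime, timedelta

RETRY_INTERVAL = timedelta(minutes=30)  # RFC 5321: retries generally at least 30 minutes apart
GIVE_UP_AFTER = timedelta(days=5)       # RFC 5321: give-up time generally at least 4-5 days


def delivery_time(first_attempt, outage_end):
    """Return when a queued message gets delivered, or None if the sender gives up first.

    Assumes every attempt before outage_end fails temporarily (so the message stays
    queued) and the first attempt at or after outage_end succeeds.
    """
    attempt = first_attempt
    while attempt - first_attempt <= GIVE_UP_AFTER:
        if attempt >= outage_end:
            return attempt
        attempt += RETRY_INTERVAL
    return None  # queue lifetime exceeded: the message bounces back to the sender


sent = datetime(2024, 1, 1, 9, 0)  # arbitrary send time
print(delivery_time(sent, sent + timedelta(hours=6)))  # 2024-01-01 15:00:00
print(delivery_time(sent, sent + timedelta(days=6)))   # None
```

A six-hour outage only delays mail from well-behaved senders; only an outage longer than the sender's queue lifetime turns into a bounce.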


> That's exactly the sort of assumption you can't afford to make

Okay, so what if the transition took five days? What about thirty days? In the absence of any warning information at all, how does one ever perform such a change?

Let's take it further -- what if adding a user took down your email for a month? That is just as rational as removing an artificial limit is ("Oh didn't you know? You pushed our global user database past the threshold so we had to migrate platforms"). How about if you send an email that you CC to ten people and that takes your email down for days?

If this wasn't an expected behavior, and is seemingly a mere administration change, there is no universe where it can be pinned on the user. Doubly obvious given the confused responses of customer service.


Exactly my point. If you're making changes to mission-critical things, you need to have both reliable information on how long those changes will take, as well as rollback plans to cover the risk of things going wrong.

Randomly clicking things like "change my email system" buttons, then complaining afterwards that it didn't work how you expected is the sort of mistake most of us have made at least once.

Once you've suffered through those mistakes, you tend to view phrases like "expected behavior" and "seemingly a mere administration change" a lot more suspiciously.

If it's mission critical, don't "assume", don't "expect" - things are often not as they "seem". As they say "Trust, but verify." Yeah, Google (or Rackspace, or MessageLabs) will _presumably_ "get it right", but when the consequences of "presuming" are business-destroyingly-high, verify the presumptions first.



