Tell HN: Worst AWS service and support team experience
27 points by citguru on June 30, 2022 | 20 comments
It's been about 10 hours now that our core business operations have been down, because AWS's security system decided our account had unauthorized service usage and placed a restriction on some of the services we can access.

One of them, obviously, is AWS Lambda. Two weeks ago we started processing large volumes of transactions (we provide wallet and digital asset infrastructure for businesses) and needed to scale up our services, from request timeouts to memory to even our RDS instance, to accommodate the kind of requests we are currently processing.

However, yesterday (29th of June) at around 5:04 PM we got an email from AWS about unauthorized service charges due to some unauthorized activity that could be a potential account hack or compromise.

My heart sank. I was just waking up from a nap after reviewing the new Lambda functions the team had worked on, which I was planning to promote to the staging and production servers.

The second thought was that this is a very high-risk security issue and could be a real hack. We primarily use an MPC-based vault to securely store and process transactions. Regardless, this is a very big issue, so we responded by first changing the root password, deactivating all API keys, disabling console login access for all other users, and removing any old keys.
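For context, the key-deactivation part of that lockdown can be done in a few lines of boto3. This is a rough sketch, not our exact script, and it assumes you still have credentials with IAM admin rights:

    import boto3

    iam = boto3.client("iam")

    # Deactivate every access key for every IAM user in the account.
    for page in iam.get_paginator("list_users").paginate():
        for user in page["Users"]:
            name = user["UserName"]
            for key in iam.list_access_keys(UserName=name)["AccessKeyMetadata"]:
                iam.update_access_key(UserName=name,
                                      AccessKeyId=key["AccessKeyId"],
                                      Status="Inactive")
            # Disable console login; users without a password have no
            # login profile, which raises NoSuchEntityException.
            try:
                iam.delete_login_profile(UserName=name)
            except iam.exceptions.NoSuchEntityException:
                pass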

Doing this would at least help secure our account for the meantime. But I was also curious: since the suspected activity was an unauthorized service charge, I decided to check all services in each region, and I ended up going through our AWS bill carefully.

In the end, it's a bill the team expected following our recent changes, from increasing our Lambda functions' memory and timeouts to upgrading our RDS instance; the bill basically makes sense. AWS must have mistakenly flagged this as an unauthorized charge.
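Verifying this kind of thing doesn't have to mean clicking through every region; the Cost Explorer API can break the bill down per service per day. A rough boto3 sketch, with placeholder dates:

    import boto3

    # Cost Explorer is served from a single global endpoint.
    ce = boto3.client("ce", region_name="us-east-1")

    resp = ce.get_cost_and_usage(
        TimePeriod={"Start": "2022-06-15", "End": "2022-06-30"},  # placeholders
        Granularity="DAILY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )
    for day in resp["ResultsByTime"]:
        for group in day["Groups"]:
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            if amount > 0:
                print(day["TimePeriod"]["Start"], group["Keys"][0], round(amount, 2))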

However, the issue is that it's been 10 hours: we lost access to our AWS Lambda functions the moment they notified us of the issue, and one of our core business solutions is down because AWS shut it down.

The customer support experience from the AWS support team is the worst I have seen in my entire life. This is outrageous, and I would never wish it on anyone. For over 10 hours, our customers (businesses such as payment gateways and exchanges) haven't been able to run some business operations, because our Lambda-based service (which can't easily be migrated to an alternative) is down.

Their team keeps saying they are waiting on a response from an internal team, and after hours there is still nothing useful. They haven't provided a valid reason for the restriction, nor explained why it is taking so long to fix. It's like they lack empathy and are just robots behind keyboards.

The AWS support team is heartless, with no sense of urgency, and this is not the first time we've had such an encounter.

This has taught us a lesson: don't build your product tightly locked to one cloud provider, because migrating ends up like starting from scratch, only more complicated.

If you work at AWS, on the technical support or customer support team, and would like to help us get past this, please reach out here: hi[[at]]powr[[dot]]finance.

Thanks.



10 hours isn't too bad when dealing with a giant organization. Often problems cut across teams and services, and troubleshooting then liaising and ultimately getting a remediation action through (which might involve producing, testing and releasing a patch) all takes up time. Sometimes things blow out to weeks!

Personally, my last AWS support ticket was about Lambdas, and I got a very good answer. I was impressed.

It's important, I think, to appreciate that working in support is difficult: every single day brings a customer with their own urgent problem. When urgency is the norm, nothing is urgent. And heart? It can be soul-sucking work.

In my observation, support takes the brunt of the rest of the org's shortcomings: bad releases, deprecated features, etc., drive customers towards you in unfortunate circumstances. Sometimes there's a whole waterfall of shit raining down on you, and it ain't your fault, and there's nothing you can do or could have done.

And to add insult to injury, you're normally at the bottom of the org pecking order.

As I say, difficult work. I salute all those who do it!


We have SLAs which pay well but incur fines; we pick the hosting partners, so we cannot write them off as an 'act of god' when they go down or when something like this happens because of them. 10 hours is a bizarre amount of time and would be very costly for many reasons.


10 hours is too much time. Time is money; we deal in time-sensitive business. If people are not able to process transactions, how are their users supposed to do their daily activities? Money was definitely lost; we'll know how much once the finance team analyses the situation.


You're not paying for the support you're expecting. It was an oversight on your part to vent on HN instead of upgrading to the Business tier during the however many hours your systems have been down. You can face the same suspension issue with any cloud provider. A painful but necessary lesson, it seems.


Like everyone has pointed out, the support plan matters in these cases. I do second-line support for very big enterprises, have covered both hardware and software, and am now at a big SaaS.

If you do not have the right support plan, sure, support will work on your case, but if you are on a "basic" or "free" support plan, the SLA can be anything from "lol tough luck, not our problem, you are out of scope" to "we'll look at it, but it's best effort only". And if it's something like PremiumPlus, SignatureDeluxe, or Enterprise, then the time-to-next-reply across the severity levels is usually something like 30 min / 2h / 4h, or 2h / 4h / no big deal.

Now, in my experience, involving other departments within your own organisation usually yeets all sense of urgency out the window. Very few companies have interdepartmental SLAs, and cross-departmental cooperation on cases is like pulling teeth, unless you get some VP or AE to demand shit be done.

Which is why the most expensive service levels have a TAM or SCM who will act as your incident manager on the inside of the company. Trust me, these guys and gals are worth their weight in gold to you as a customer when the shit hits the fan. They cost an arm and a leg because they are on 24/7 on-call duty for you as a customer when it comes to severity 1, priority 1, business continuity firesale stuff. Great people usually, but they will become a pitbull on a mission for you if needed.


AWS allows you to choose the level of support they offer you.

This is a feature not a bug.

Good customer support is worth its weight in gold and is very expensive for a company to provide.

AWS lets you choose. This is best for everybody - those who value support get it from people who aren't seen as a cost centre and are given time to help their customers, and they aren't cross-subsidising those who don't value it, which would put downward pressure on quality.

As a consequence - if you pay for it - AWS has some of the best support I've had from any vendor and is a large part of why I continue to recommend them.


Companies can afford good service. Amazon chooses to nickel-and-dime for it, not because of quality and certainly not because they cannot afford it...


Support isn't free, and even their Developer support is good; what you're paying for is response times. If OP wanted a quicker response time, they should pay for more than just "Developer"-tier support.

https://aws.amazon.com/premiumsupport/plans/

As others have said, OP fucked up by not taking on a Business support plan. AWS has the best support I've seen in a cloud environment, with Google Cloud a close second, Azure marginally better than Google's consumer support services, and Facebook, for example, a little better than trash.

Edit:

Business support SLAs, what OP should be on (link above):

  General guidance: < 24 hours
  System impaired: < 12 hours
  Production system impaired: < 4 hours
  Production system down: < 1 hour

What they are on:

  General guidance: < 24 hours*
  System impaired: < 12 hours*


Damn. My assumption was that AWS was a global market leader because of its customer service. Hope your issue gets resolved.


It's a hard role to do right, and empathy is probably asking too much for what support folks get paid. There are people who go above and beyond, but that comes at their own personal cost.

Every metric is tracked, and it's the second-worst role after ops roles like warehousing.

Maybe yours is a smallish business that has never had these issues before; it always helps to have dedicated support to reach out to during serious outages.

Also, you didn't mention which support plan your firm is on - https://aws.amazon.com/premiumsupport/pricing/


Developer Plan


Well, I mean... there's your problem right there. You're paying for a 12-hour response time, on a plan AWS has told you fairly explicitly is NOT for production or mission-critical workloads.

I get that it's frustrating that they try and nickel and dime you on support costs, but at the end of the day you do get what you pay for. I've used Business and Enterprise support in the past and always had very good support from them delivered very quickly.


I'd start by going to business plan and trying again.


Which support plan are you on?


OP replied that they're on a Developer plan; 10 hours is within that support arrangement. They need to go Business.


Interesting. This is why, in my current job, I developed our business logic to be completely self-contained and agnostic to the outside world. There's a lot of seemingly needless translation (DynamoDBClient -> MyCorpDynamoDbClient), but we could painlessly switch to a different provider by swapping only the edge classes of our system. A sketch of the pattern is below.
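A minimal sketch of what that edge-class pattern can look like in Python with boto3. KeyValueStore, DynamoDbStore, and the "pk" key schema are all hypothetical names for illustration, not the actual code:

    from typing import Optional, Protocol

    import boto3

    class KeyValueStore(Protocol):
        # The only storage interface the business logic ever sees.
        def get(self, key: str) -> Optional[dict]: ...
        def put(self, key: str, item: dict) -> None: ...

    class DynamoDbStore:
        # Edge class: the one place that knows AWS exists. Switching
        # providers means writing one sibling class, not touching core logic.
        def __init__(self, table_name: str) -> None:
            self._table = boto3.resource("dynamodb").Table(table_name)

        def get(self, key: str) -> Optional[dict]:
            return self._table.get_item(Key={"pk": key}).get("Item")

        def put(self, key: str, item: dict) -> None:
            self._table.put_item(Item={"pk": key, **item})

Business code accepts any KeyValueStore, so a GCP or on-prem implementation can slot in without the core noticing.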


Easily ELB. You would be horrified to realize that a decent chunk of the internet's traffic is routed across load balancers that are pretty much held together with the prod-eng equivalent of chewing gum and baling wire.


Where can I read more on this? As I understand it, internet protocols can route around failures globally, so a few routers going down is not Armageddon.


Probably a lesson for all of us here: include this scenario in our disaster recovery strategy for such important components.


Already started porting some of the services, but a lot are tightly coupled to AWS services.



