OnMetal Performance In Early Benchmarks (rackspace.com)
64 points by jayofdoom on July 24, 2014 | 38 comments


SoftLayer has been doing this since 2006, and offers virtual compute instances as well. Servers can be purchased hourly or monthly.

http://www.softlayer.com/bare-metal-servers


I was an SL customer and used them prior to joining Rackspace and building OnMetal. The point isn't "let's get rid of the hypervisor"; the point was to:

* Let's provision as quickly as VMs (as opposed to 1hr+ on SL)

* Let's engineer hardware to deliver maximum uptime via a no-moving-parts design (as opposed to vanilla SuperMicro on SL)

* Let's design hardware to deliver maximum units of work per dollar (like DB transactions/second per dollar, or requests/second per dollar), as opposed to average value elsewhere.

* Deriving from the above, let's just give RAM away nearly for free and put dual 10Gig networking in place, because modern apps should be mostly RAM-based.

* Let's adopt the standard OpenStack provisioning API, with its myriad of pre-existing tools, community, and ecosystem (auto-scaling, orchestration, etc.), as opposed to a proprietary API.

The end result is a completely different infrastructure, something akin to what OpenCompute pioneers use internally. This is what running at scale should look like.

As always, I encourage skeptics to spend more than 10 seconds on a product page, because most of the time there are humans behind it, and - in this case for sure - they are way too ambitious to spend their lives simply cloning old designs.

Enjoy OnMetal, dear jberekw, it's built for critical thinkers (and skeptics! :-) like you and it's awesome - it's going to rock your world.


The pricing (on the compute flavor especially) is appealing -- I can certainly see uses for it.

Looking forward to OnMetal being available in other regions, especially ORD.

I'm one of the skeptics -- but that's because I've had very few problems with Rackspace cloud instance performance in general; rather, it's the control plane reliability that has caused 99% of the pain.

As an aside, I find the Cloud Servers SLA disappointing, as it defines control plane availability as 1 - (Total API Errors)/(Total Valid API Requests). Can't launch instances on Super Bowl Sunday? Tough. The failed POSTs to /servers will never trigger a violation, so long as GET /servers is working:

http://www.rackspace.com/information/legal/cloud/sla
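To make the dilution concrete with invented numbers (they're not Rackspace's): suppose the API serves a million successful GETs in a month while every single POST to /servers fails. The availability figure barely moves:

    # Invented traffic figures, purely to illustrate the dilution effect.
    successful_gets = 1_000_000   # GET /servers et al. keep working
    failed_posts = 2_000          # every instance-launch attempt errors out

    total_requests = successful_gets + failed_posts
    availability = 1 - failed_posts / total_requests
    print(f"{availability:.4%}")  # -> 99.8004%, comfortably above a typical SLA threshold

So the control plane can be effectively down for launches while the SLA metric stays green.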

Most SLA remedies are weak ("we break the SLA, you get a free t-shirt"), but this one is broken to the point of glossing over fundamental outages.


Agreed -- SLAs are a pet peeve of mine. I wrote a post [0] about this a few years ago; it still feels fairly current. Basically, I don't think SLAs are a useful tool for managing service provider relationships; it's too difficult to capture nuance in a legal document. Transparency and track records seem like a more realistic approach. We've been seeing some movement here, but not much. Most service providers offer very little hard data about historical performance, and don't say enough about their internal operations to let you make a realistic assessment of disaster probabilities.

[0] http://amistrongeryet.blogspot.com/2011/04/slas-considered-h...


We're strongly considering moving our infrastructure to IAD from ORD - we have 60+ VMs in ORD, and their datacenter is at capacity, to the point that only existing customers can create ORD VMs (and there was that incident a few months back where they couldn't provision new SSD block storage).

I know from their sales team that ORD isn't on the OnMetal roadmap for at least 12 months (if at all - I believe "it doesn't appear on our 2014 or 2015 roadmap").


> * Let's provision as quickly as VMs (as opposed to 1hr+ on SL)

How many Rackspace customers need a full machine in under an hour?

> * Let's engineer hardware to deliver maximum uptime via a no-moving-parts design (as opposed to vanilla SuperMicro on SL)

So just replace spinning disks with SSDs? Unless you're also replacing the last moving parts (CPU, PSU, and chassis fans) with something solid state.

> * Let's design hardware to deliver maximum units of work per dollar (like DB transactions/second per dollar, or requests/second per dollar), as opposed to average value elsewhere.

This is fair for RAM-intensive workloads. Everyone else already gives you instance SSD access.

> * Deriving from the above, let's just give RAM away nearly for free and put dual 10Gig networking in place, because modern apps should be mostly RAM-based.

Again, perfect for RAM-intensive workloads.

> * Let's adopt the standard OpenStack provisioning API, with its myriad of pre-existing tools, community, and ecosystem (auto-scaling, orchestration, etc.), as opposed to a proprietary API.

Another fair point. OpenStack (and its open-platform design) is all Rackspace has to compete with AWS and Google.

I don't want to say OnMetal is their "Hail Mary", but Rackspace is exploring its options in the marketplace with regard to an acquisition: http://www.bloomberg.com/news/2014-05-15/rackspace-hires-mor...


To answer your questions:

1. Anybody with a $10K+ monthly hosting spend would love to get a "full server" in under an hour. Actually, sub-second provisioning would be nice too.

2. You are right, and yes, we've moved power and cooling away from the servers to an externally serviceable, redundant arrangement. They truly have no moving parts. And we've put something way better than SSDs into them.


As near as I can figure, the OnMetal servers are 3x-5x what I'd pay for dedicated servers from, say, a Hurricane reseller, and the virtual servers are more than 2x as much as comparable Linode servers, with the Linode servers offering way more SSD storage, more bandwidth, etc.

Not to mention, Rackspace charges you $120 per terabyte of data transfer, while you'll get several terabytes of transfer free/included when buying from Linode or most dedicated server resellers.

That seems like a hefty premium for...what? The Rackspace name? Is there any reason to believe that Rackspace has better uptime/reliability than Linode? Any reason to believe Rackspace has better hardware than Linode? Given that neither company — as with most cloud providers — provides any truly meaningful transparency, it's impossible to say.

For my money, I think I'd rather spend less of it, or spend the same amount and get more/redundant servers.

I'm genuinely struggling to understand the value of Rackspace.


Earlier this month, Rackspace stopped offering any servers without managed support, even in their cloud. So every server or VM you get from Rackspace now comes with 24/7/365 support, including phone calls with engineers.

That's the main reason their stuff is more expensive than AWS, Linode, Digital Ocean, etc. They just decided that their competitive edge is support, so they aren't going to compete with those guys on price any more. This will probably lose them some startup customers, but improve their margins overall.

http://www.rackspace.com/blog/newsarticles/rackspace-goes-al...


> I'm genuinely struggling to understand the value of Rackspace.

See [1]: "We don't offer raw infrastructure without service." That is, support is included in the cost. Lots of people don't need support or don't want to pay for it, particularly the startup-oriented crowd here on HN. But I would be interested in a comparison of Rackspace's support to others'.

> Rackspace charges you $120 per terabyte of data transfer

Is this in the fine print somewhere? I wouldn't be surprised, but I don't see it mentioned in [1].

1: http://www.rackspace.com/cloud/servers/onmetal/


Actually, I didn't notice that support was required. I was basing my cost comparisons on Rackspace's "Raw Infrastructure" pricing; the fact that you are required to purchase support at additional cost makes their pricing even less competitive.

They have a more detailed pricing page, with bandwidth charges listed near the bottom, here: http://www.rackspace.com/cloud/servers/


> I was basing my cost comparisons on Rackspace's "Raw Infrastructure" pricing

As parent clearly stated, there is no raw infrastructure pricing; all pricing includes support already.


Perhaps you should visit the link the parent posted, so you'd understand what I was talking about.

http://www.rackspace.com/cloud/servers/onmetal/

That page presents the pricing as a "Raw Infrastructure" price plus your choice of two different levels of support. So there is raw infrastructure pricing - if you'd visited the page before posting, you'd have known that's where I got the nomenclature in the first place - but you must also purchase some form of support (with pricing that varies by support level).


I want to want this, but the pricing just doesn't seem competitive with EC2. Am I missing something?

For example, compare the "I/O" server with an EC2 i2.4xlarge instance. The Rackspace server has 128GB RAM, 3.2TB disk, and 20 cores; i2.4xlarge has 122GB, 3.2TB, and 16 -- nearly comparable.

On EC2, I can buy a 3-year Light Utilization Reserved Instance for $3884, and then pay $828/month (based on 720 hours per month). After 12 months, my average cost has been $1152/month. I still have two years left on my reservation, which I can keep using, or possibly sell, so the effective cost is even lower.

If I'm more certain of a 12-month server lifetime, I can buy a 1-year Heavy Utilization Reserved Instance for $7280, and then pay $447/month, for a total cost of $1054/month.

On Rackspace, list price is $1800/month. Suppose my total spend is $10,000/month (list price) and I commit to 12 months. I get a 15% discount, or $1530/month. That's quite a bit more expensive than EC2, and with EC2 I'm committing less up front. A longer commitment would help, but it would also bring down the EC2 price.
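Spelling the arithmetic out (figures as quoted above; the 15% volume discount is my scenario, not a published Rackspace number):

    # Figures from the comparison above; illustrative only.
    light_upfront, light_monthly = 3884, 828    # 3-year Light Utilization RI
    heavy_upfront, heavy_monthly = 7280, 447    # 1-year Heavy Utilization RI
    onmetal_list, volume_discount = 1800, 0.15  # OnMetal I/O list price, assumed discount

    months = 12
    print("EC2 light RI:", light_upfront / months + light_monthly)   # ~$1152/month
    print("EC2 heavy RI:", heavy_upfront / months + heavy_monthly)   # ~$1054/month
    print("OnMetal I/O :", onmetal_list * (1 - volume_discount))     #  $1530/month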

Can anyone poke holes in my analysis?


You are not too far off. I would use 730 hours = 365 * 24 / 12 instead of 720. You could add little (or not so little) things like the cost of IOPS, bandwidth differential, etc. You could add a little more for the core diffs. But all that won't close the gap.

Regardless, if you want the cheapest, I would go with Digital Ocean, though it is a different kind of hosting. For Rackspace, you really go when you need their support people and not just a server.

I hear a lot more developers going to DO this year than last year, but I have never used them myself, trying to stick with the "devil I know."


Digital Ocean is interesting, but they don't really come out cheaper for large, continuous-duty instances. Their tiny servers look great, if you don't need much storage. But on the larger instances, it's a straight $1/GB/month for SSD. With some commitment, Rackspace is more like $0.50, and Amazon around $0.35.

Digital Ocean looks roughly comparable to Amazon for RAM, and slightly ahead on price-per-core (if you assume all cores are equal), but way behind on SSD. Again, this is assuming you want large instances and are going to run them 24/7 for at least 6 to 12 months -- otherwise, DO pricing starts to look a lot better.


I can pick a few holes...

EC2 i2.4xlarge: you get 16 vCPUs, whereas the Rackspace box gives you DUAL 10-core Xeon E5s - 20 physical cores and 40 hardware threads.

Memory is much of a muchness.

The drives are two 1.6TB OCZ PCIe devices that can be RAID-0'ed if you choose; each does 400,000 IOPS, up to 800,000 combined, whilst the EC2 instance tops out at about 155,000.


What's the reasoning for using light utilization? I've never seen it used as a cost-saving strategy. Usually it's medium vs. heavy vs. on-demand.


Light utilization reserved instances are, IMHO, the best-kept secret in EC2. You get most of the savings of heavy utilization, with much less up-front commitment. Look at my figures above: light utilization costs only $98/month more than heavy utilization, for $3396 less up front. And those figures were for a three-year light utilization reservation, meaning I'd still have an asset at the end of the 12 months.


SysBench is ok when used properly, but UnixBench?

They provide instructions on how you can patch and run it, using "./Run", but no warnings about what this is actually doing. See http://www.brendangregg.com/blog/2014-05-02/compilers-love-m... . I'd like to write a lot more about UnixBench, but I really don't have the time. It takes a lot of energy to refute this stuff.

If I were benchmarking OnMetal vs HW virt, I'd be showing a spectrum of micro-benchmarks, from equal performance (CPU) to network I/O. I'd expect some of my results to show a ~10x difference. You would then choose/weight them depending on what matters for your intended application.
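For what it's worth, here's the rough shape of the harness I have in mind - just a sketch, assuming sysbench (0.4/0.5-era syntax) and iperf3 are installed, with the peer host left as a placeholder:

    import subprocess

    # CPU-bound work: should look nearly identical on OnMetal vs. a HW-virt guest.
    subprocess.run(["sysbench", "--test=cpu", "--cpu-max-prime=20000", "run"], check=True)

    # Memory bandwidth: differences start to appear here.
    subprocess.run(["sysbench", "--test=memory", "--memory-total-size=10G", "run"], check=True)

    # Network I/O against a peer in the same DC: where I'd expect the biggest gap.
    subprocess.run(["iperf3", "-c", "PEER_HOST_PLACEHOLDER", "-t", "30"], check=True)

Run the same script in both environments and compare each micro-benchmark separately, rather than collapsing everything into one score.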


I still haven't seen a discussion of how they're dealing with the huge security nightmare of direct customer hardware access.


What makes you think it's "huge"? How do you think the dedicated hosting industry has been operating for more than a decade? You can re-flash the BIOS, you can secure-erase the storage, etc.


Just because the dedicated server hosting industry hasn't been dealing with the problem doesn't mean it's not there :)

If I flash the BIOS (or the network card firmware or the LOM device firmware or the disk controller firmware or the individual disk firmware, etc) with malicious firmware, it can lie to you when you try to reflash it later, leaving my malicious code running against later customers.


This was discussed at OpenStack Summit Atlanta: https://www.openstack.org/summit/openstack-summit-atlanta-20...

The main concern seems to be customers reflashing the firmware, which they want to prevent with firmware signing.


The question about that (at about 19:30 in the video) was asked by one of the Ironic developers on my team :)

The answer is basically "we don't do it, we'd like to see it happen". I wasn't at that session, but I was at the Atlanta Summit and so far I haven't managed to get a good answer from Rackspace people about how they're actually tackling this in production today.


I'd also add that firmware folks tend to be pretty conservative about updates, and it's an area of machines that isn't used to being hardened. You only need to own it once, and then you can lie forever about later attempts to reflash it with legit firmware.


It's even worse with IPMI.

AFAIK you can own either the IPMI BMC or the BIOS; then, if one is reflashed, the other can re-compromise it and stay resident despite your attempts to remove it.

You have to coordinate flashing both at 'the same time' (which is possibly impossible) in order to get any peace of mind.

That includes situations in which the IPMI BMC isn't even plugged into a network, because it still has constantly running firmware and backdoor access to the main system.

Of course, you can do half-way mitigations like dumping firmware contents and comparing checksums over time, but there's probably no single savior. A potentially strong tool here would be a hardware jumper on the mainboard that says "disallow flashing at all, full stop", and another that says "disallow reconfiguration at all, full stop", honored at a level of circuitry that software can't override. Both the BIOS and the IPMI BMC need protection at this level.
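The checksum idea in minimal form - a sketch only, assuming you can already pull a raw dump (e.g. with something like flashrom), and remembering that a sufficiently malicious firmware can simply feed clean bytes back to a software dumper:

    import hashlib

    def firmware_sha256(dump_path="bios_dump.bin"):
        # dump_path is a raw flash image you captured yourself, e.g. via
        # "flashrom -p internal -r bios_dump.bin" (adjust for your hardware).
        with open(dump_path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    KNOWN_GOOD = "put the vendor image's sha256 here"  # placeholder
    if firmware_sha256() != KNOWN_GOOD:
        print("firmware dump no longer matches the known-good hash")

Treat a mismatch as a tripwire, not proof; a clean match proves even less.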


Everything is worse with IPMI :)

(As for half-way mitigations like dumping firmware: I would suggest that's a fairly worthless option; malicious firmware would likely be able to return clean data to the dumper, unless it was a hardware dumper.)


Is this any different from your regular dedicated server provider? (Honest question, I don't know much about OpenStack.)


I am sure a regular dedicated server provider would take a long time to get you the server. Maybe days? Not sure.

I think their point is they provision a "dedicated" server in "minutes."


Does the reduced provisioning time create a huge security nightmare? I don't really see how it does.


The time to provision doesn't really affect the size of the security problem, but it does make it easier to deploy malicious firmware across a fleet of hardware, because everything is behind an API.


It's also much cheaper, since it has per-minute billing rather than per-month billing, and no setup fee. Getting temporary access to 100 Hetzner servers would cost you ~€10,000, because you have to pay a setup fee plus a minimum one-month rental for each one, while it looks like it'd only cost you ~€100 to get 100 OnMetal servers for an hour. If someone stands to benefit enough, you'd be screwed in either situation, but it lowers the cost of opportunistically sowing some malware.
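Roughly how I'd expect that to pencil out - the per-unit prices here are my own assumptions, not figures from either price list:

    servers = 100

    # Dedicated boxes: assume ~EUR 50 setup plus ~EUR 50 minimum one-month rental each.
    dedicated_cost = servers * (50 + 50)   # ~EUR 10,000 even if you only need an hour

    # OnMetal compute flavor: assume ~EUR 1/hour with per-minute billing, no setup fee.
    onmetal_cost = servers * 1 * 1         # ~EUR 100 for that same hour

    print(dedicated_cost, onmetal_cost)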


Most hosting providers or clouds already give you servers in seconds or minutes.


The pricing of OnMetal appears relatively competitive, if somewhat on the expensive side. I think the best way to make the case that OnMetal is worth the premium is a series of system benchmarks comparing their solution to comparably priced AWS instances.


So is Rackspace now giving up on the virtualized cloud market? (I remember they didn't follow the recent price drops by Google/AWS/Azure.)


Want to see if we can salt-call the servers to spin up on demand. O_o


If Salt can do it for normal Rackspace Cloud instances, it can do it for OnMetal instances.



