A colleague from a different department was managing a fleet of servers doing a lot of computing. Their BMCs would just stop working after a few weeks of uptime, every time. He was annoyed by it but mostly ignored it, as the BMC was only used for additional monitoring. The power outlets in the racks were remotely controllable, so you could still hard reset a server if required.
However, as this fleet was used for computation, after a while they noticed that whenever the BMC stopped working, the performance of the system increased by almost 10% or so. Definitely a non-negligible amount. So they kept the machines running in the broken-BMC state for as long as possible.
I've seen pretty much the opposite close to a decade ago with a number of AMD Opteron/15h-based servers we had, where the BMC could end up in a bad state that made the ipmi-driver-spawned kernel thread handling it burn a whole core for no good reason.
I wonder what the BMC was doing to hurt performance. Maybe it was questionable DVFS that stopped when the BMC died? Poor fan management that caused thermal throttling?
That's a pretty good guess: lots of BMCs have fan control, and turning it off should make all fans stay at max power.
In a datacenter the odds of anyone noticing that the fans are always on high is practically nil, unless you are specifically monitoring fan RPM. Most folks don't bother, as what you are actually interested in is temperature.
We had an interesting incident where one of our datacenter temperature sensors kept on rising, and operators on site could clearly hear the noise increase of all servers going to max fan speed following a BMC "crash" triggered by a network loop on our IPMI lan.
It took us a while to identify the issue. All systems were running fine, but we had to shut down many racks to keep the temperature from rising too high.
Anecdotally, this reminds me of a terrible experience with some Dell hardware back around 2011. I volunteered to do an installation of some hardware in a POP cage at one of the London UK colos.
Hardware arrived a day or two late, got it unpacked and carted upstairs. Spent a few days installing it (had to build a deployment environment on my Mac in a VM because the servers had no CDROM drive, only NetBoot and USB; our centralised imaging system hadn't been built with "installs outside of core DC network" in mind).
Discovered that the machines had been ordered with iDRACs, not DRACs. Damn things did not work until the box was booted into Linux and a driver to interface/activate the board was loaded. This, of course, worked wonderfully when the kernel locked up with a panic.
The machines were being used to test custom kernel modifications. Panics and lockups were common. Every time it happened, the engineers in the US had to ask "smart hands" to go reset the machine in the cage.
Experiment was a success though. Turned into the foundation for how a very large edge network was built.
This is brought to you by the BMC with a KVM-over-IP that wouldn't accept '2' entered on the (virtual) keyboard in any way or form.
This is one of those things that I'd probably be willing to spend a ton of time analysing to find the root cause if I could. Dump the memory image, debug the code, and figure out why it's just that (and possibly other) keys. My guess just from the description is that a bitflip happened exactly to the entry of a character map translation table or similar.
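A toy sketch of that bit-flip hypothesis, with made-up scancode values (a real BMC keymap is a binary table, not a Python dict): a single flipped bit in one translation-table entry is enough to kill exactly one key while leaving every other key fine.

```python
# Hypothetical illustration: a scancode -> character table where one
# entry takes a single-bit hit. Values are invented, not a real HID table.
keymap = {i: chr(ord('0') + i) for i in range(10)}  # scancodes 0..9 -> '0'..'9'

def flip_bit(table, scancode, bit):
    """Return a copy of the table with one bit flipped in one entry."""
    corrupted = dict(table)
    corrupted[scancode] = chr(ord(corrupted[scancode]) ^ (1 << bit))
    return corrupted

bad = flip_bit(keymap, 2, 4)   # '2' is 0x32; 0x32 ^ 0x10 = 0x22 = '"'
print(bad[2])                  # the '2' key now types '"'; all others fine
```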
This stuff seems to happen on Proxmox when you talk to a VM console using the noVNC web interface. If I recall correctly, it wouldn't handle modifier keys correctly for me, so stuff like Shift-2 for @ wouldn't work.
I suspect it is some kind of console and raw keycode mismatch with the remote ui, maybe via the browser.
I do development on OpenBMC (https://github.com/openbmc/openbmc), which is an open source BMC implementation built with BitBake, primarily targeting ASPEED and Nuvoton BMC chips.
I was looking into this recently. It doesn’t seem like you can easily get your hands on BMC hardware. There is one project I found where they’re using an FPGA and everything is open source but it still looked far from easy.
Afaik the problem is typically not hardware but software, since most server motherboards come with a BMC chip on them. Are you designing a motherboard? All I would ask for from a BMC is root access. But now everyone (except the big players, who design their own motherboards) is basically stuck with a stupid embedded Linux with shitty software that has half-baked features they don't even need, but now need to care about. When all you'd need would be a way to access a host serial console, read sensors, control the boot sequence, and perhaps re-flash the BIOS, i.e. most likely just basic interactions on serial interfaces.
> Are you designing a motherboard?
Yes, the team I work for designs server motherboards.
BMC security is what keeps me up at night. Firmware software quality is low, and often not up to date. I think openbmc does a good job in both respects.
If the servers in question happen to be on the list of supported hardware, quite possibly. (I don't know of any up-to-date online list, but running `source setup` in the root of the source tree will print it.)
Not returned, but I've seen clients refuse to go with HP again because iLO sucked too much, and come upgrade time went Dell. Unfortunately for them, right around the time iLO got okay-ish and iDRAC got shitty.
This was me, though I ended up moving to SuperMicro instead. Not fancy, but their BMC seems to get the job done, and also doesn't cost a significant chunk of extra money for basic functionality, and then even more money to have IPMI on a dedicated connection instead of shared. And HP's frigging BIOS wouldn't even work with their own HP rack console! But it was happy with an Apple mouse and keyboard. Argh! Making me irritated again just remembering it. But yeah, we didn't return them, but we did sell them off to some other poor sucker and never bought an HP again. They may be great at large scale with the higher-end central management software suite and support etc., but the BMC definitely was a dealbreaker at an SMB level. It's one of the few differentiating features vs a pile of other competitors who do the same basic thing, so I do think it's not entirely meaningless to get it badly wrong.
> SuperMicro instead. Not fancy but their BMC seems to get the job done
Kind of depends which generation of servers. I had worked with a lot of X9 and X10 (Xeon E5-2600 v1-v4), which were alright, as long as you don't mind outdated Java (well, the newest X10 BMCs do HTML5 consoles too, IIRC); but I recently started renting an X8 server personally, and it's worse... My favorite is when serial over LAN just stops responding when you go from console redirection to an OS-opened serial port (and back)... real helpful for inputting disk encryption passphrases. Oh well, I'm renting this server because it's cheap; it's also 10+ years old, and it works enough.
Totally fair. Yeah, I should have specified I'm talking fairly new stuff, at least new enough that it all has HTML5 consoles and mildly more polish. Not that there aren't problems; less a "bug" than a "cutting edge for newbies" thing, it does near-zilch certificate validation, for example. So when the friendly new guy wasn't paying attention, generated a client instead of a server certificate, and uploaded it to a remote one, it cheerfully accepted it and whoops! Operator error of course, but the real point was that remotely rectifying it proved surprisingly impossible despite still having admin serial/SSH access. Documentation is bad and it didn't seem to like typical reset codes or tools.
I guess after lots of problems with expensive "high end" fancy stuff like HPE's and co, I kind of felt resigned that all BMC/IPMI was kind of crap, and at least with SM's I felt less obviously squeezed. Like I said, the math will all be different for high-end stuff and big herds. But neither Dell's nor HPE's struck me as "oh yeah, that's worth an extra $300 on every single last server!"
I was just fiddling around with a SuperMicro X8 IPMI the other day. The X8 IPMI stuff is terrible, e.g. the warning that your Java installation is outdated on opening the website etc.
Turns out, you can actually install X9 IPMI firmware on X8 boards as the platform files are still shipped.
Might be worth checking out, if this improves things for you. It did for me.
Check out https://github.com/devicenull/ipmi_firmware_tools for unpacking (and repacking) the SuperMicro firmware. The developer just merged my patches making it work with some of the X8 boards.
As long as your board is listed in /etc/defaults of the IPMI tree you should be good.
Datacenter / Cloud Service hardware is a race to the bottom, we want simplicity and ability to work with off-the-shelf and widely available tools (Redfish (cURL), IPMI (freeipmi, openipmi, ipmitool)) so we can integrate these things into our hardware management platforms.
In this world, the legacy vendors like Dell and HP feel like they have to differentiate their hardware somehow, or they lose all the margin. So they'll charge you for all those "value-add" things like KVM or OOB Firmware Updates, because they can't make money on the machines themselves anymore.
The irony is all that extra garbage they add to their servers is exactly the opposite of what you want at high scale and really only serves the "enterprise" market that tends to deploy VMware and hand-manage servers and need point-and-click stuff since there's no incentive to write software to manage small environments like that.
Yes, and as essentially a "hidden flat tax" it burns a lot more for large quantities of lower-end hardware vs single bigger iron. Looking back at the records, at the time adding their iLO/M.2 comm card (needed to have a dedicated iLO port on those servers) was $65 from Provantage, and then the license to actually do basic stuff with it, like a web console, was $227. So ~$290 all told extra. We've got a few higher-end systems, $10-20k NAS or larger hypervisor systems, and at that level sure, an extra $300 stings a bit but is set against a lot of other stuff. At medium range, more like $3k-5k, obviously worse. But some use cases called for say 6x $500-600 systems instead of a single $3k-6k system, and for a few projects someone was trying to go as cheap as possible and grabbed some still pretty new (Gen10) plain vanilla ones off of used/bankruptcy sales for like $300-400 because the price seemed so attractive. But then using iLO properly could nearly double the price, and at the lower end there is a big difference in the hardware you can get for $600-700 vs $300-400, and then multiply that per unit.
Meanwhile competitor systems all have dedicated ports out of box. SuperMicro does have a paid "full unlock" for their BMC, but the only thing it does is add bios updates and such. All the core normal management functionality is there by default. It also only costs $30.
Granted HPs had other irritations like really wonky proprietary fan control (heck, proprietary fan cables too) that would mysteriously fail to function with different flavors of the same OS, and couldn't be overridden from the BMC (then what's the point!?). Also they were slow with EPYC options when that's what a lot of us really wanted to be switching for, the performance and value propositions were getting really good vs Intel who were also jerks.
Like lots of big players, the experience may get different if you're buying hundreds to thousands or more units and have a dedicated account manager who takes care of all this for you etc etc. But x86-based servers are a pretty damn competitive market, and at some point one has to stop and ask why hours are being burned futzing with stuff when literally the entire basic point of getting "server class" hardware with remote management functionality is to save man hours by NOT having to futz. So yeah, there's a little rant I didn't even know I still had in me years later :). HP you silly goofs.
I've personally been involved in decisions like that, as well. If you're throwing stuff into a datacenter and the BMC doesn't work right[1], the hardware is basically a brick. Vendors should be blackballed for their incompetence.
[1]: all things i have personal experience with:
- chassis bootdev pxe => doesn't do that, just reboots to normal OS
- chassis power off => doesn't do it (oh, here's an ipmi raw command you can use for this BMC version. NO!)
- DHCP server sends an option in an offer that the BMC doesn't understand => BMC drops the whole offer, no IP at all.
- gets scanned by auditor-checkbox-as-a-service (Qualys), locks up, sends the host CPU to 100%, and locks that up too.
- Not supporting IPv6 properly (if there's a place to start deploying IPv6 it's BMCs), i.e. uses SLAAC properly, but doesn't use the gateway from the RA, so you actually can't use it from outside its own segment - needs a firmware update to fix, but uhh, we didn't dual-stack the BMC network because the whole point is to get that IP space back.
We had to write a test suite for vendors to run against their BMC and validate these things and you were disqualified if you failed.
The device in question didn't have NTP, so to get accurate timestamps in the BMC logs, its clock had to be synced from the host OS.
The fun part was them fucking up the watchdog implementation in the BMC, which worked on wall-clock time.
Which means if you changed the time in between the OS sending a watchdog ping and the BMC checking for the timeout, you got a system reboot.
But it was intermittent enough that you might set up the cron job to update the BMC clock (because you want accurate time on it for log correlation), and the timing was unlucky enough that you'd never connect it to the cause when it finally happened.
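The failure mode is easy to reproduce in miniature. This is a hypothetical sketch, not the vendor's firmware: a watchdog that computes its deadline from whatever clock it is given fires spuriously the moment a wall clock is stepped forward, which a monotonic clock would avoid.

```python
class ToyWatchdog:
    """Illustrative only: a watchdog whose timeout is computed from an
    injected clock. Feeding it a steppable wall clock reproduces the bug;
    something like time.monotonic() would not."""

    def __init__(self, timeout_s, clock):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_ping = clock()

    def ping(self):
        self.last_ping = self.clock()

    def expired(self):
        return self.clock() - self.last_ping > self.timeout_s

# Simulated wall clock that we can step, like an NTP/cron sync would.
now = [1000.0]
wd = ToyWatchdog(timeout_s=60, clock=lambda: now[0])

wd.ping()            # host OS pings the watchdog...
now[0] += 3600       # ...then the clock gets stepped forward an hour
print(wd.expired())  # True: spurious timeout, system reboot
```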
Also, we generally observed that BMCs got better. Ten years ago you were lucky if the BMC was working and not crashing every few weeks (mostly on IBM gear), to the point we had to stop alerting on that, and it was unreliable as a method for OOB access; so far anything newer than a few years old has been fine. Although I have no idea how you make modern hardware take minutes to return a list of sensors (I'm looking at you, Lenovo).
I am honestly surprised how bad many of these are, and in production, no less.
I recently set up a supermicro system and spent a whole day just trying to figure out what to install to get the stupid ancient Java crap to load so I could mount an ISO.
There are reasons for tools like https://github.com/ixs/kvm-cli. It seems like every operations team built their own version that logs into the web interfaces, downloads the java stuff and then runs it locally...
We have various Supermicro boards in production at work with BMCs from 2018 or so. The ATEN iKVM on them works just fine with a recent OpenJDK 11 and OpenWebStart. I’ve found that all the features work including mounting ISOs and doing remote upgrades. No need to whine about installing anything ancient or spreading ridiculous Java FUD.
Sysadmin for 15 years here though, Java was always a problem. The version wasn't always the worst bit, but it was always an exercise in frustration.
Mostly security controls to blame, but I have had so many issues across so many systems that I cannot stand by and let you claim this is FUD about Java.
the .NET applets are also a problem (because who has a compatible IE version?), but they worked more consistently than the Java ones back in the day.
The HTML5 ones are the only ones that seem to work consistently; but that could be biased as HTML5 is much newer, so BMCs implementing that might be updated with more regularity. (or be more modern hardware)
With current browsers, Java applets are not supported anymore. On some older HPE systems, they didn't update the firmware to provide alternatives.
I've seen multiple vendors with problematic code that didn't work with newer Java versions.
This is by no means meant to bash Java. Some non-Java BMCs can be horrible as well (e.g. require many TCP ports in a firewall/tunnel unfriendly way or require SSH with old algorithms that are no longer enabled by default, or telnet..)
I've really only had experience with Supermicro BMCs, but I totally believe you that there are lots of crufty OOB environments in the wild which are hard to work with. While it's true that applets don't work anymore (probably a good thing), and therefore the experience isn't as integrated or seamless as it once was, it's a practical matter to just log into the BMC, click on the console preview, and use the JNLP file to launch the console via OpenWebStart as an independent application outside of the browser. One other thing is that the self-signed certs from these older implementations are often expired and therefore throw an extra warning or two when you launch these interfaces, but you just click through them and carry on.
This is one of the reasons I like to architect networks for netbooting (so no remote media needed) plus force every physical server to boot UEFI-only - because UEFI supports serial console properly, unlike BIOS, so I can just use IPMI Serial-over-LAN support.
Combination of those two generally removes the need for any of the advanced features that required custom clients or even a Web browser
Even worse was one of mine last year that needed Flash. Apparently we neglected to update it. I can handle ancient Java, but trying to get Flash setup was going to be futile, so I just went to the data center.
That was an old Cisco server, right? I think those models still require Flash even if they're fully updated, and people have to use VMs with Flash installed to access the BMC.
Yep. I think you may be right, it’s EOL and probably doesn’t have any more updates available. I have a VM for when I need old Java, but I was going to need an older VM to run Flash, and that just wasn’t how I wanted to spend my time. :D
A few years ago I worked as a grunt for a fleet-wide ESXi upgrade and we took the occasion to update iDRAC. The number one step in the procedure was to reboot the iDRAC, no matter how good its state looked. I have never ever seen both high uptime and a completely functional iDRAC at the same time, across ~200 servers.
Yep. And I still think I borked one iDRAC precisely because I ignored the reboot, because it worked fine on its identical twin sitting right next to it.
In a proper environment, the BMC is on its own dedicated NIC, with no way to bridge to that network from the machine, and the only access from the host machine being via the root/admin account.
And then there are people that just port-forward BMC ports to the internet as cheap remote KVM...
The operative word being "proper". In practice I see it accessible on the LAN far too often. The ipmi v2 protocol is so bad that if you just request to login with a known account name (which is probably 'admin') the BMC server will _provide you the password hash_ for you to crack at your leisure.
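For the curious, that flaw (CVE-2013-4786) is baked into the RAKP handshake itself, not any particular implementation. A simplified sketch, with the session data collapsed into one placeholder blob rather than the real field layout:

```python
# Sketch of why IPMI 2.0's RAKP is offline-crackable (CVE-2013-4786).
# In RAKP message 2 the BMC returns an HMAC keyed by the user's stored
# password to ANY client that starts a session. The session_data blob
# below is a placeholder; the real message concatenates the session
# nonces, BMC GUID, requested role, and username.
import hashlib
import hmac

def rakp2_hmac(password: bytes, session_data: bytes) -> bytes:
    return hmac.new(password, session_data, hashlib.sha1).digest()

# Attacker records one handshake for a known username...
session_data = b"<nonces + GUID + role + username>"
leaked = rakp2_hmac(b"admin123", session_data)  # what the BMC hands back

# ...then brute-forces offline, never touching the BMC again:
for guess in [b"password", b"letmein", b"admin123"]:
    if hmac.compare_digest(rakp2_hmac(guess, session_data), leaked):
        print("cracked:", guess.decode())
        break
```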
IMO it's wrong - It's terrible product design that servers have network ports that will cause catastrophic failures unless carefully only connected to a special expertly secured un-network for fragile things.
It’s not wrong regardless of the BMC security. You don’t plug things into the network that don’t need to be plugged in. BMCs only need network access for servicing most cases.
As a big OpenBSD aficionado I always wish I could just run bog-standard OpenBSD on my BMCs. IPMI does not do much for me, so I would be fine even without a good open source IPMI stack. Most people would just want to run Linux, but the point is: the BMC is just a small computer in charge of monitoring your big computer. However, the BMC is usually unable to run an off-the-shelf operating system, so it is running an old proprietary version of Linux with proprietary software. This situation sucks.
An idea I have floating around in the back of my head is to just use a raspberry pi as a bmc on consumer grade hardware, I am sure it would turn out more complicated than this, but basically just hook the power button, i2c, and other headers to the pi's gpio. now you have a bmc that runs an off the shelf os.
> An idea I have floating around in the back of my head is to just use a raspberry pi as a bmc on consumer grade hardware, I am sure it would turn out more complicated than this, but basically just hook the power button, i2c, and other headers to the pi's gpio. now you have a bmc that runs an off the shelf os.
BMCs generally run Linux and monitor the SoC functionality that the hardware is designed around. You need a vendor specific software stack for the hardware monitoring. The Rasp Pi is a toy.
On a Sun box we had, the system controller would panic the domains every so many days (I think ~700).
You could have rebooted the domains in the chassis for regular patches, but if you hadn't restarted the SC, you were in for a surprise.
Yes I remember something similar on Sun Fire 6800.
Another issue was a firmware bug on the Sun Netra X1 where rebooting or updating the LOM would result in a reset of the host. Not fun with UFS without logging enabled.
I can't remember which direction this went, but I had a Netra T1 hooked to another Sun machine (240R? V440?, don't remember), and resetting one would send a break out on the serial console... which would send the other into the ok> prompt in OBP.
We finally got serial servers out of that one, though!
This is why staggered reboots of stuff, weekly or monthly, avoids this class of problem. It's simple and some might say it dumbs down the role of infrastructure management, but it sure as hell beats the feeling in the middle of the workday/workweek ... "It's lost grip. NFI what to do. Can't see anything. Reboot it FFS."
Bold of you to assume a regular reboot will also reboot the BMC (it won't although I guess there might be BMCs which do). Some things you really need a full cold boot. I've seen a test cluster of storage servers where after rebooting the whole cluster all in one go enough failed to boot that data was unavailable until a few servers were fixed due to flaky RAM that failed to make it past memory training on boot but was "fine" as far as we could tell until we rebooted.
I'd mostly agree with you, but this isn't always as simple as it may seem to "just reboot it" and there can be subtle differences between what you're exercising with your rolling reboots and what actually happens in a real complete power loss scenario. Plenty of stuff can break and you'll have no idea until you're trying to get stuff back up after a power loss event and you're up a creek.
BMCs have to be some of the most unreliable devices that I've worked with. Some of the issues I encountered at my last job:
* [ASRock BMC] The BMC firmware updater sometimes causes the NICs to get stuck in a bad state where every NIC has the same MAC address. This can be resolved via a proprietary UEFI application for reflashing the correct MAC address.
* [Dell iDRAC] Local authentication randomly stops working due to some tmpfs running out of space (you can see the message if there's an active SSH session). IPMI/SSH occasionally works well enough to issue a reboot command, but when it doesn't, sending the BMC reset command to /dev/ipmi0 in the server OS is needed.
* [Dell iDRAC] Setting an asset tag via the IPMI DCMI command has a 15 character limit. If that limit is exceeded, the success response is still returned, but when querying the asset tag, random junk data longer than 15 bytes is returned. If I had to guess, I bet there was an sprintf() call somewhere in there :). This was fixed in newer iDRAC firmware. Now, it stores the last 15 bytes of the asset tag instead of returning an error.
* [Lenovo IMM] The shift or alt key sometimes gets stuck on the emulated keyboard without having used the remote console since the last BMC reboot. Can't be fixed by repressing the button, neither physically nor via the virtual keyboard. BMC reboot required.
* [Lenovo IMM] Booting the UEFI shell sometimes crashes both the system and the BMC.
* [Lenovo IMM] BMC and BIOS update sometimes claims to have succeeded, but didn't actually take effect.
* [Lenovo IMM] Rebooting the BMC via the web UI or SSH sometimes doesn't work. Making 50+ simultaneous requests to the login page is enough to crash and restart some component that allows the BMC reboot command to work again though.
* [Supermicro BMC] The BIOS update sometimes doesn't fully upload, but claims that it did. It still parses the header, so the new/old version fields look correct. Sometimes, rebooting the BMC and reflashing works. Other times, only a USB drive + USB keyboard + a recovery key combination fixes it.
* [Supermicro BMC] The remote console sometimes completely fails to initialize (though I've only seen this on servers where the BMC uptime was measured in years). Not just a blank screen. The GPU device was just... gone.
* [Supermicro BMC] Various IPMI commands just lie about successful execution. For example, setting the asset tag via the FRU succeeds, but has no effect. Those commands require toggling a write lock bit via an OEM command, which I only found by reverse engineering. Other commands, like the set asset tag DCMI command, leave the data in a corrupted state until a BMC reboot.
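The iDRAC asset-tag behavior above (success reported, garbage stored) is the classic signature of an unchecked write into a fixed-size field. A hypothetical Python model of that failure mode; the 15-byte layout and function names are invented for illustration, not Dell's firmware:

```python
# Toy model of an unchecked write into a 15-byte FRU field. The layout
# (asset tag followed by a neighboring field) is invented for illustration.
FIELD_LEN = 15

def set_asset_tag_buggy(store: bytearray, tag: bytes) -> bool:
    store[:len(tag)] = tag   # no length check: overlong tags clobber
    return True              # whatever follows, yet still report success

def set_asset_tag_fixed(store: bytearray, tag: bytes) -> bool:
    if len(tag) > FIELD_LEN:
        return False         # reject overlong input instead of lying
    store[:FIELD_LEN] = tag.ljust(FIELD_LEN, b"\x00")
    return True

fru = bytearray(b"\x00" * FIELD_LEN + b"NEXT-FIELD")
set_asset_tag_buggy(fru, b"a-very-long-asset-tag")  # 21 bytes into 15
print(fru[FIELD_LEN:])   # the neighboring field is now corrupted
```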
And finally, not really a bug, but an interesting thing about the Lenovo IMM. Instead of exposing information via standard IPMI features, like FRU or DCMI commands, the Lenovo IMM implements a virtual filesystem over OEM IPMI commands. These are commands, like (my naming) open_ro, open_rw, get_size, read, write, and close. They sometimes fail too. I think I ended up making all commands retry up to 10 times with a 5 second delay. At least Lenovo gets error return values right :).
To query the asset tag, you have to open_ro the "config.efi" file, get the size (because read-until-EOF doesn't always work), do a read loop, and close the file. Then, you have the XML data from the file you can query (20 seconds later due to retries). (If anyone ever needs to deal with the Lenovo IMM programmatically, I'd highly recommend the pyghmi [1] library. Wish I knew about it before reverse engineering their proprietary commands...)
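The retry-with-delay workaround described here is generic enough to sketch. The 10-attempt / 5-second numbers mirror the comment, but the wrapper shape is mine, not pyghmi's or Lenovo's API:

```python
import time

def with_retries(fn, attempts=10, delay_s=5):
    """Call fn(), retrying on failure. Attempt count and delay mirror the
    workaround above; the broad exception catch is illustrative -- narrow
    it to whatever your IPMI library actually raises."""
    last_err = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as e:  # illustrative catch-all
            last_err = e
            time.sleep(delay_s)
    raise last_err
```

Usage would look like `data = with_retries(lambda: imm_read_file("config.efi"))`, where `imm_read_file` is a hypothetical stand-in for the OEM open_ro/get_size/read/close dance.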
> * [Supermicro BMC] The BIOS update sometimes doesn't fully upload, but claims that it did.
Or it uploads via the WebGUI but balks with different errors. Flash with SUM from SuperDoctor:
SET IP=192.168.128.20
SET USER=ADMIN
SET PASS=ADMIN
.\SUM -i %IP% -u %USER% -p %PASS% -c UpdateBios --file E:\Shares\Temp\x10sle-f\bios.bin --force_update --reboot
.\SUM -i %IP% -u %USER% -p %PASS% -c GetBmcInfo
* [Supermicro BMC] A BMC on a dedicated Ethernet port gets moved to the shared LAN port on a blade reboot. Which also means that if you shut down the machine, you can no longer power it up remotely.
> * [Supermicro BMC] Various IPMI commands just lie about successful execution. For example, setting the asset tag via the FRU succeeds, but has no effect. Those commands require toggling a write lock bit via an OEM command, which I only found by reverse engineering. Other commands, like the set asset tag DCMI command, leave the data in a corrupted state until a BMC reboot.
we have Supermicro server with serial flashed to be something like 12345678 which means it probably doesn't work reliably on their production line either lmao
I've literally had the '2' problem before, which is interesting; in my case I needed to completely de-power the machine, a reboot of the BMC was not enough.
It appears that a large number of these are rebranded AMI MegaRAC software running on ASPEED processors, which are little ARM chips with a virtual display card hanging off a PCIe x2 link embedded into the mainboard.
AFAIK larger vendors like Dell and HP have their own thing.
Experiences with iDRAC 6, 7, and 8 have been terrible. After high uptime they stop responding via HTTP and SNMP; it's just rubbish. A reboot of the BMC sometimes helped, sometimes only a power drain did. Back then even ProSupport could not do much; they support a system unsupported by their own devs.
The latest iDRACs (with the fancy blue GUI) work a little better. I have no numbers to back all that, it's just a feeling; maybe the problems are yet to come.
Given the Dell pricing, they should be better. But I've heard from colleagues that iLO and the others are not much better either.
A few of my recent systems have come with a built-in combo BMC on the motherboard NICs. I haven't seen this before, is it a new trend? I'm imagining they put in a switch in there with the NICs? How does this even work?
I'm doing a few timing sensitive projects involving hardware timestamping in the NICs. Does this mess with say timing variability? I've disabled the BMCs out of paranoia but I don't know the topology inside.
What I remember was losing it just at a crucial point in the PXE boot process. You're also dependent on the same switches as the main net, and have to rely on fragile firewalling or VLANs to restrict access to the BMCs.
Ah, those wonderful BMCs that require some obsolete version of Java... that doesn't work anyway because the certificates are expired...
I also remember a long series of machines (Supermicro I think) whose BMCs would stop responding after a few weeks if there was too much traffic on the network.
Also those BMCs that silently hijack eth0, breaking your server's bonded connections, when the dedicated BMC connector gets disconnected for some reason...
I've had to factory reset a Dell iDRAC multiple times when some commands were failing with never-before-seen internal errors. Thankfully you can factory reset but keep the network config, so no need for physical access.
Not just BMCs... 'modern' (I learned this 6-7 years ago) network cards are pretty much separate computers that handle dataflow up and down their stack.
We had IDSes that were happy, up, network interface counts were climbing, but they were STONE DEAF to the network traffic we were actually interested in.
We racreset our iDracs monthly. Seemed stupid when I started, but I'm grateful after working with them enough. Newer versions seem to be a bit better than 11th and 12th gen iDracs; but I would never rule out a racreset, even on the newest gen.
> Server BMCs are little computers running ancient versions of Linux with software that's probably terribly written and they stay running forever, which means all sorts of opportunities for slow bugs. Reboot away!
Such a silly comment. If your BMC is updated then it will use a recent version of Linux.