
The machine in question has a zfs pool that it would be nice to sync, and a MySQL database running off one of the filesystems on that pool. So yes, all of this would be lovely.

But will it work?

You're sitting on an ssh session that's just about hanging on. There's no other way into the system (it's remote). As soon as you issue e (or definitely i) the shell dies, the ssh session dies and your reisub job gets killed, no?

I tried this:

  echo e >/proc/sysrq-trigger; echo b >/proc/sysrq-trigger;
but what I described above happened, and the box did not reboot.

(--edit-- deleted comment above had a perfectly reasonable explanation of a safe shutdown using the "Magic SysRq" stuff, but it applied to a machine you're at locally rather than the b0rked remote box I described)
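One thing that might get around the session dying, sketched here and not tested on the broken box: fire the keys from a backgrounded subshell that ignores SIGTERM and SIGHUP, so that it can outlive both 'e' and the ssh session. The names sysrq_seq, bwait, TRIGGER and SYSRQ_DELAY are made up for this sketch. It assumes bash itself still runs, and it skips 'i' because that sends SIGKILL, which cannot be ignored.

```shell
# Sketch only: sysrq_seq, bwait, TRIGGER and SYSRQ_DELAY are hypothetical names.
# TRIGGER defaults to the real trigger file; point it at a scratch file to dry-run.
TRIGGER=${TRIGGER:-/proc/sysrq-trigger}

# Busy-wait N seconds with bash's SECONDS counter (no external sleep needed).
bwait() { local end=$((SECONDS + $1)); while ((SECONDS < end)); do :; done; }

# Write each SysRq key in turn from a detached subshell.
sysrq_seq() {
  local key
  (
    trap '' TERM HUP            # ignore what 'e' and the dying session send us
    for key in "$@"; do
      echo "$key" > "$TRIGGER"
      bwait "${SYSRQ_DELAY:-5}" # crude pause so each step can take effect
    done
  ) </dev/null >/dev/null 2>&1 &
}
```

Usage would be something like `sysrq_seq e s b`. The fixed pause is a guess; a large sync may need longer than five seconds.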



  $ killall mysqld
  $ while pgrep mysqld; do sleep 5; done
  # waiting for mysql to shut down
  $ echo s >/proc/sysrq-trigger
  $ watch dmesg
  # wait for "sync complete" msg
  $ echo b >/proc/sysrq-trigger
  # or, if you can power it back on remotely, power off instead
  $ echo o >/proc/sysrq-trigger
  # /proc will no longer be mounted after this


This would reboot it, but might not achieve much:

  $ killall mysqld
  /usr/bin/killall - file not found
  $ while pgrep mysqld; do sleep 5; done
  /usr/bin/pgrep - file not found
  $ echo s >/proc/sysrq-trigger
  $ watch dmesg
  /usr/bin/watch - file not found
  $ echo b >/proc/sysrq-trigger
Unfortunately in this situation only the bash builtins seem to work. I even had an old system drive mounted under /altroot, but it wouldn't execute any of the commands in /altroot/bin or /altroot/usr/bin either (can't find something it needs to execute stuff. ld? libc? unknown).

So really only the 'b' seems to help. (edit: 's' is probably useful, even if you can't kill everything)

Of course the long term solution is to stop using a USB stick on an unreliable/badly supported USB3 add-on card for the root drive :)


Oh, I see. Okay then.

  $ pgrep() { for proc in /proc/[0-9]*; do read CMDLINE < $proc/cmdline; if [[ ${CMDLINE/*$1*/x} == x ]]; then echo ${proc#/proc/}; return 0; fi; done; return 1; }
  $ kill $(pgrep mysqld)
  $ while pgrep mysqld; do sleep 5; done
  $ echo s >/proc/sysrq-trigger
  $ while sleep 2; do dmesg | tail; done
  $ echo b >/proc/sysrq-trigger


I admire your tenacity, but I think sleep, dmesg and tail are all out as well!

Though all of that could be mitigated by leaving it for a minute or two at the relevant point, and manually checking for the mysql process every so often. I'll keep this on file for next time it happens (though I'm hoping to replace the dodgy parts at the weekend).
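For what it's worth, the "check every so often" part can probably be done with builtins alone. A rough sketch, assuming bash still works and /proc is readable; bwait, proc_running, wait_gone, PROC_ROOT and POLL_SECS are names invented here, not anything standard. The busy-wait burns CPU, which seems an acceptable trade on a box that is about to be rebooted.

```shell
# Sketch only: all helper and variable names below are hypothetical.
PROC_ROOT=${PROC_ROOT:-/proc}   # overridable so this can be tried on a fake tree

# Busy-wait N seconds using bash's SECONDS counter instead of sleep.
bwait() { local end=$((SECONDS + $1)); while ((SECONDS < end)); do :; done; }

# Succeed if any process has a command line containing $1.
proc_running() {
  local f cmd
  for f in "$PROC_ROOT"/[0-9]*/cmdline; do
    cmd=
    read -r cmd 2>/dev/null < "$f"   # a vanished process just leaves cmd empty
    [[ $cmd == *"$1"* ]] && return 0
  done
  return 1
}

# Poll until the process is gone; give up after $2 tries (default 24).
wait_gone() {
  local tries=${2:-24}
  while proc_running "$1"; do
    ((tries-- > 0)) || return 1
    bwait "${POLL_SECS:-5}"
  done
}
```

After sending the kill, `wait_gone mysqld && echo s >/proc/sysrq-trigger` would then only sync once mysqld has exited.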


grumble grumble

  $ pgrep() { for proc in /proc/[0-9]*; do read CMDLINE < $proc/cmdline; if [[ ${CMDLINE/*$1*/x} == x ]]; then echo ${proc#/proc/}; return 0; fi; done; return 1; }
  $ kill $(pgrep mysqld)
  $ pgrep mysqld; echo $?
  $ echo s >/proc/sysrq-trigger
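  # print kernel messages until "Emergency Sync complete" appears
  $ while read -r LINE; do echo "$LINE"; [[ $LINE == *"Sync complete"* ]] && break; done < /proc/kmsg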
  $ echo b >/proc/sysrq-trigger


You, sir or madam, are awesome and very resourceful. I'll use this one :)


If /lib/ld.so is hosed and you cannot run any dynamically linked program, then you might be able to start programs in /altroot by running something similar to...

  LD_LIBRARY_PATH=/altroot/lib:/altroot/usr/lib  /altroot/lib/ld.so /altroot/bin/sh
...so that you run the dynamic linker directly, as a wrapper program.
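A small wrapper along these lines might save some typing. It is only a sketch: find_loader, altrun and ALTROOT are made-up names, and the list of loader paths is a guess, since the loader's file name varies by distro and architecture.

```shell
# Sketch only: find_loader, altrun and ALTROOT are hypothetical names.
ALTROOT=${ALTROOT:-/altroot}

# Print the first executable dynamic loader found under the alternate root.
# The candidate globs are guesses; adjust them to what the old system has.
find_loader() {
  local ld
  for ld in "$ALTROOT"/lib/ld.so "$ALTROOT"/lib*/ld-linux*.so* \
            "$ALTROOT"/lib/*/ld-linux*.so*; do
    [ -x "$ld" ] && { echo "$ld"; return 0; }
  done
  return 1
}

# Run a program from the alternate root through that loader.
altrun() {
  local ld prog=$1
  ld=$(find_loader) || { echo "no loader under $ALTROOT" >&2; return 1; }
  shift
  LD_LIBRARY_PATH=$ALTROOT/lib:$ALTROOT/usr/lib "$ld" "$ALTROOT$prog" "$@"
}
```

So `altrun /bin/sh` would be the equivalent of the command above. Note that anything the resulting shell launches is started by the kernel in the normal way, so those programs would presumably each need the same wrapper treatment.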


That's another good tip, thanks for that.

I'm hoping not to have this situation happen again, mind!



