
Best quote: "Start surprising your Ops and Engineering teams by killing stuff in the middle of the day without warning them. They’ll love you"

It sounds stupid, but if you really do have a resilient and redundant infrastructure, it shouldn't matter. If you fear someone randomly unplugging things, then you have work to do ;-)



Netflix (who also survived the outage) wrote a blog post last year in which they talked about a system they call "Chaos Monkey", which does this exact thing.

http://techblog.netflix.com/2010/12/5-lessons-weve-learned-u...


I don't think they meant that they were doing that on live systems. Sounds like a debugging tool.


If the point is "test your supposed redundancy," I agree 100%. But please, for your customers' sake, don't do it in the middle of the day, especially if it's your first test.


To be clear, he did say to start practicing during planned maintenance times, but I agree: your first try on a live system shouldn't be during peak times.



