Saw this Netflix Tech Blog post and wanted to share it with everyone:
The piece contains an impressive narrative of how to do troubleshooting right. It’s an example of the difference a formal education in Computer Science can make. Not for the content (we all know that the actual content of any given CS course was probably obsolete before the first lecture) but for the methods to apply.
If you read the Netflix post and got lost about halfway through, then read it again. I think that understanding the problem presented in this case, and the process used to achieve a solution, is a real measure of where each of us may be as system administrators or developers. If you ask yourself the question, “Could I have done this?”, and your answer is anything less than an automatic, “Of course!”, you really need to dig in and grapple with it, both for your own good and that of your employer(s). I know I did. In fact I’m still at it. The story here isn’t that the Netflix guys were surprised by this mystery, but (a) that they were in a position to detect it; and (b) were able to apply the scientific method to unravelling it.
Thanks once again to the Netflix team for giving us yet another case study in what excellence in systems analysis looks like.