So one of the challenges in managing services is managing quality. When you can’t touch something, it’s hard to know whether you have gotten it right. Of course, there are some exceptions in which measuring quality is pretty straightforward. The easiest examples are settings in which “quality” maps seamlessly to reliability. When I turn on my TV, I expect that both my electricity and cable providers will deliver. When I pick up my landline phone, I expect a dial tone. As the New York Times reports (99.999% Reliable? Don’t Hold Your Breath, Jan 9), traditional phone providers are very good at being highly reliable:
AT&T’s dial tone set the all-time standard for reliability. It was engineered so that 99.999 percent of the time, you could successfully make a phone call. Five 9s. That works out to being available all but 5.26 minutes a year.
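The arithmetic behind that figure is easy to check. Here is a minimal sketch, assuming a 365-day year and measuring availability as the fraction of minutes in the year the service is up; the function name is my own, purely for illustration.

```python
# Assumption: a 365-day year, with availability measured minute by minute.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year, in minutes, for an availability of `nines` 9s.

    Five 9s means 99.999% available, i.e. unavailable a fraction 10**-5 of the time.
    """
    unavailability = 10 ** -nines
    return MINUTES_PER_YEAR * unavailability


for nines in (4, 5):
    minutes = downtime_minutes_per_year(nines)
    print(f"{nines} nines: {minutes:.2f} minutes of downtime per year")
```

Each additional 9 cuts the downtime allowance by a factor of ten, which is why going from Four 9s to Five 9s is so much harder than it sounds.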
Against that backdrop, consider the recent troubles that Hotmail and Skype have had making sure their “utilities” continue to run. Is it reasonable to expect that modern communication and information technology will ever be as reliable as Ma Bell? Maybe not…
As more and more Web services companies acquire years of experience, we’ll see more consistent reliability — it’s just a matter of time and learning. Attaining Four-9s availability will become routine. That means available all but 52.56 minutes a year.
As for moving to 99.999, well, that may never come. “We don’t believe Five 9s is attainable in a commercial service, if measured correctly,” says Urs Hölzle, senior vice president for operations at Google. The company’s goal for its major services is Four 9s.
Google’s search service almost reaches Five 9s every year, Mr. Hölzle says. By its very nature, it is relatively easy to provide uninterrupted availability for search. There are many redundant copies of Google’s indexes of the Web, and they are spread across many data centers. A Web search does not require constant updating of a user’s personal information in one place and then instantly creating identical copies at other data centers.
There are several interesting parts to this. For one, we have to acknowledge that it is a little unfair to compare AT&T in its glory days, before the Justice Department forced it to split up, with Internet companies that have been around for barely a decade. In some ways, this seems like saying a veteran Major Leaguer can hit a curve ball better than a high school JV player. There is also the question of the complexity of the work. AT&T grew up handling just voice connections and eventually had to handle faxes. Online providers may well have far more data to handle as users move to video chats and the like.
The biggest issue to my mind, however, is that Ma Bell used to control everything. Engineers knew the range of equipment they were dealing with, down to what kind of phones were in homes. As the article notes, today’s service providers aren’t so lucky. A provider like Skype can invest in all sorts of redundancy in its operations but have all of that be invisible to the customer because the customer has a sub-standard ISP or a crappy router.
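To see how much the pieces outside the provider's control can matter, here is a rough sketch. It assumes the call only works when every link in the chain works and that the links fail independently, so the availabilities multiply. The numbers for the ISP and the router are made up for illustration; they are not from the article.

```python
from math import prod


def end_to_end_availability(components: dict[str, float]) -> float:
    """Availability the customer sees when every component must be up.

    Assumption: components fail independently, so availabilities multiply.
    """
    return prod(components.values())


# Hypothetical figures, chosen only to illustrate the point.
chain = {
    "provider": 0.9999,    # Four 9s inside the provider's own operations
    "isp": 0.99,           # a sub-standard ISP
    "home_router": 0.995,  # a flaky router
}

overall = end_to_end_availability(chain)
print(f"Availability the customer experiences: {overall:.4%}")
```

Under these assumed numbers the customer experiences roughly 98.5% availability, so the provider's Four 9s are swamped by the weakest links.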