Saturday, July 5, 2008

Unlearning Google

Figure 1: Google vs. non-Google properties in my Firefox history
Figure 2: Light tailed website usage

Google's Viacom fiasco is an ominous wake-up call for anyone who cares about his or her online privacy. Today Viacom, tomorrow some other company, another day a government, can arm-twist Google into giving away log data containing user names, IP addresses, keywords, watched content, mouse-clicks, email, and any other information that Google collects.

So far, Google has only used user data for targeted marketing. At least that is only about wringing money out of people's thoughts and desires through the AdSense infrastructure. The problem is that the same data can easily be massaged into revealing the political, ethical, racial, religious, sexual, and other personal leanings of a person. There may be money to be made out of this data as well, but more importantly, there is the real danger of this information being misused as a pretext for prosecution or blackmail.

Google publicly defends its privacy record. Unfortunately, user privacy is not the most important objective for a publicly traded company. Shareholder value is. And to create shareholder value, a company needs to survive. A determined government can easily make the survival of a company subject to compliance with the government's wishes. Google's motto is "Don't be evil". The trouble with this slogan is, who decides what "evil" is?

Another scary scenario can be built around theft of sensitive user data. The media reported that Google is handing over 4TB of YouTube log data to Viacom. Now 4TB is substantial, but not a lot for future data storage technology: we may have 4TB USB pen drives within the next 5 years. What if one disgruntled employee smuggled this data out of Google and auctioned it off to blackmailers for a few hundred grand?

No easy answers here.

I can keep ranting about Google and privacy and all that, but I am writing this blog on Google property (Blogger)!!! My wife and I are avid users of Gmail, Orkut, Google Reader, Google Search, and Google News. Are we toast? Or can we wean ourselves from Google?

I parsed our Firefox history over a few weeks to figure out where we stand in terms of Google-to-non-Google websites visited in order to get an idea of our Google dependence. The results are not pretty. Google properties accounted for just over 50% of all the websites visited (Figure 1).
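For anyone who wants to repeat the measurement, here is a minimal sketch of the Google vs. non-Google split. It is not the exact script I used. It assumes the history has already been exported as (url, visit_count) pairs, and the list of Google-owned domains is my own guess at what should count, so adjust it to taste.

```python
from urllib.parse import urlsplit

# Assumption: domains treated as "Google properties". Edit as needed.
GOOGLE_DOMAINS = (
    "google.com", "gmail.com", "blogger.com",
    "blogspot.com", "orkut.com", "youtube.com",
)

def is_google(url):
    """True if the URL's host is one of GOOGLE_DOMAINS or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in GOOGLE_DOMAINS)

def google_share(history):
    """history: iterable of (url, visit_count) pairs.

    Returns the fraction of all visits that went to Google properties.
    """
    total = 0
    google = 0
    for url, count in history:
        total += count
        if is_google(url):
            google += count
    return google / total if total else 0.0

# Made-up sample data, only to show the call.
sample = [
    ("http://mail.google.com/mail", 30),
    ("http://www.orkut.com/Home", 20),
    ("http://news.bbc.co.uk/", 25),
    ("http://notgoogle.com/", 25),
]
print("Google share of visits: {:.0%}".format(google_share(sample)))
```

In Firefox 3 the pairs can probably be read from the `moz_places` table in `places.sqlite` (columns `url` and `visit_count`), but check the schema in your own profile before relying on that.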

Fortunately, there are non-Google alternatives to all Google applications. So in theory we can start using other applications instead of Google. Of course, there is nothing to guarantee that other websites will not yield to the same pressures as Google. But at least we can spread our web footprint: no single entity will have as complete a view of our web presence as Google does today.

The Firefox history indicated that we visit a few websites often and the rest only rarely (Figure 2). The often-visited websites were the usual suspects (search, web-mail, social networking, blogs, and news), and Google dominated this space. This is a great sign because it shows that even though Google is big in terms of visits, it is not very heterogeneous in the content and services it offers. Google is not my bank, not my bookstore, not my VoIP provider, not my university, and not my community. In fact, if I remove the top 6 Google properties from the data, then the distribution starts looking much more uniform. The rest of my web log data is spread across heterogeneous websites. Doesn't this flavor of obfuscation help privacy?
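One way to put a number on "much more uniform" is sketched below. The measure, normalized Shannon entropy of visits per host, is my choice for illustration and not something taken from the figures: 1.0 means visits are spread perfectly evenly, and values near 0 mean a handful of sites dominate. The domain list and helper names are assumptions as well.

```python
import math
from collections import Counter
from urllib.parse import urlsplit

# Assumption: domains treated as "Google properties".
GOOGLE_DOMAINS = ("google.com", "gmail.com", "blogger.com",
                  "blogspot.com", "orkut.com", "youtube.com")

def is_google_host(host):
    """True if host is one of GOOGLE_DOMAINS or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in GOOGLE_DOMAINS)

def visits_per_host(history):
    """Sum visit counts per hostname from (url, visit_count) pairs."""
    counts = Counter()
    for url, count in history:
        host = (urlsplit(url).hostname or "").lower()
        if host:
            counts[host] += count
    return counts

def normalized_entropy(counts):
    """Shannon entropy of the visit distribution, scaled to the range [0, 1]."""
    values = [c for c in counts.values() if c > 0]
    if len(values) < 2:
        return 0.0
    total = sum(values)
    entropy = -sum((c / total) * math.log(c / total) for c in values)
    return entropy / math.log(len(values))

def without_top(counts, k, predicate):
    """Copy of counts with the k most-visited hosts matching predicate removed."""
    dropped = [h for h, _ in counts.most_common() if predicate(h)][:k]
    return Counter({h: c for h, c in counts.items() if h not in dropped})

# Made-up sample data, only to show the calls.
sample = [
    ("http://mail.google.com/mail", 80),
    ("http://www.google.com/search?q=x", 60),
    ("http://bank.example/login", 5),
    ("http://books.example/", 5),
    ("http://news.example/", 5),
]
counts = visits_per_host(sample)
trimmed = without_top(counts, 2, is_google_host)
print("uniformity with Google:    {:.2f}".format(normalized_entropy(counts)))
print("uniformity without Google: {:.2f}".format(normalized_entropy(trimmed)))
```

On real history, pass 6 instead of 2 to drop the top 6 Google properties, as in the text above.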

There may still be hope for privacy on the Internet.
