Friday, April 30, 2010

Untitled Poem

Among many other things, my father taught me how to read and write English. Everything I've ever written starts with what he taught me. Now, as he lies dying of cancer, I have written this for him. Please say a prayer for him.


All of my thoughts
Like river drops
Together making up me
Like a river that flows
Until it throws
Fresh into the salty sea

All rivers meet that end
No matter what they pretend
Or how many bends they make
And so it will be
With every drop inside me
No matter what path I take

So you may ask
The point of the task
To meander toward the salty end

But don't we all know

Drops become vapor and snow

From which new rivers descend

Friday, January 8, 2010

Plug and play internal HDDs, literally!

I just saw this contraption on a colleague's desk. As you can see, a 3.5" HDD is literally plugged into the dock as if it were some super-sized memory card. Well, that's exactly what it is. The dock also has ports for USB keys, SD cards, and probably a few other formats.

It is interesting to see the form-factor difference between the SD slot and the 3.5" HDD slot. Flash memory capacity is quickly catching up with HDD capacity (the latter's lead has shrunk to about 10x). HDDs are an endangered species!


Monday, December 14, 2009

India's Broadband Future

Ajit Balakrishnan, CEO of Rediff, gave a keynote at IIT Delhi earlier today. His talk suggested that Indian telecommunication operators and the government should not be concentrating on delivering niche multi-Mbps broadband services but should instead concentrate on delivering reasonably good service (hundreds of kbps) to a larger population. Ajit flashed a slide showing that 86% of 3G users use their smartphones to access their email, a relatively low-bandwidth application, but only 6% use 3G to download and watch videos. Ajit's point was to recognize the importance of broadband in India as an "always on" connection rather than a high-bandwidth connection.

There is an analogue in India's history to this choice that Indian telecommunication operators and the government have to make. The government of India created top-notch higher education institutes - the IITs, RECs, and IIMs - in the 1950s (after Indian independence). It spends tens of thousands of dollars per year on each student enrolled in these institutes, arguably at the expense of thousands of primary schools in backward areas of the country. The thinking at the time these institutes were created was that this crème de la crème would catalyze the growth of industry and technology in the country. Similarly, it may be theorized that by providing high-speed Internet connectivity, early adopters will drive applications and create demand in the general population to upgrade their connectivity.

Countries like China and South Korea concentrated on their primary education institutions rather than creating world-class higher education institutes. It is safe to say that both these countries are significantly ahead of India on any human development index. Does this analogy suggest that India should concentrate on democratization of (relatively low-speed) broadband rather than on creating small pockets of high-speed broadband?

I think that market forces will decide the balance between broadband services in India. The ARPU on low-speed broadband may not exceed $5, but this will be compensated for by large volumes. I also believe that low-speed broadband will be served via wireless in India. With mobile phones outpacing fixed-line connections by a 12:1 ratio in the country, there is limited scope for technologies like DSL to be widely deployed. Fortunately, 3G, LTE, and WiMAX are nicely poised to fill in for the lack of fixed-line infrastructure in India. As for niche multi-Mbps broadband, I expect FTTx to be deployed in highly urbanized areas where Western ARPUs (tens of dollars) are possible.

Sunday, December 6, 2009

Thermal imaging cameras at Bangalore airport!

Photo: Thermal imaging for Swine Flu screening at Bangalore International Airport

Arriving on an international flight at Bangalore International Airport, I was surprised to see two thermal imaging cameras. Each camera was watching arriving passengers and visually marking those with an elevated body temperature, in order to pick out people who might be suffering from Swine Flu. These cameras are sensitive to IR radiation in the body-temperature range; they work by mapping temperature readings onto a colormap that visually depicts body temperature. The video images produced by the cameras looked eerily similar to the IR images the alien saw in the Predator movies!
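
To make the colormap idea concrete, here is a minimal sketch of my own (not the airport system's software) that maps a synthetic grid of temperature readings onto a colormap and marks pixels above a fever threshold, using numpy and matplotlib; the frame, threshold, and "hot passenger" patch are all made-up illustrative values.

    import numpy as np
    import matplotlib.pyplot as plt

    # Synthetic 2D grid of skin-temperature readings in deg C (stand-in for an IR sensor frame)
    np.random.seed(0)
    frame = np.random.normal(loc=33.0, scale=1.0, size=(120, 160))
    frame[40:60, 70:90] += 5.0          # simulate one passenger with an elevated temperature

    FEVER_THRESHOLD = 37.5              # deg C; real screening thresholds vary

    plt.imshow(frame, cmap='inferno')   # map temperature readings onto a colormap
    plt.colorbar(label='temperature (deg C)')
    ys, xs = np.where(frame > FEVER_THRESHOLD)
    plt.scatter(xs, ys, s=1, c='cyan')  # visually mark the elevated-temperature pixels
    plt.title('Pixels above %.1f deg C are marked' % FEVER_THRESHOLD)
    plt.show()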

Compared to conventional body-temperature measurements with thermometers, this real-time technique makes it possible for a medical officer to screen many more people. I wonder why these systems are not installed at more airports around the world.

Tuesday, November 24, 2009

Multiprocessing vs. Network I/O

I've been reading up on Python's (v2.6 and above) multiprocessing module. While multiprocessing has been around for a long time, simplified libraries like this module may spur even casual programmers to consider parallelism in their programs. My feeling is that if issues like inter-process communication, synchronization among processes, and deadlock avoidance are dealt with painlessly, then many non-professional programmers will feel confident enough to load up CPUs with multi-process programs to speed things up. Moreover, given that multiple CPU cores are becoming the norm rather than the exception on commodity hardware, there is a real incentive to eventually switch to multiprocessing.
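
As a taste of how painless the module makes things, here is a minimal sketch (a toy CPU-bound task of my own invention, not an example from the documentation) that farms the same work out to a pool of worker processes and compares it with the single-process version:

    import time
    from multiprocessing import Pool

    def busy_work(n):
        # CPU-bound toy task: sum of squares up to n
        total = 0
        for i in range(n):
            total += i * i
        return total

    if __name__ == '__main__':
        jobs = [2000000] * 8

        start = time.time()
        pool = Pool(processes=4)                       # four worker processes
        parallel_results = pool.map(busy_work, jobs)
        pool.close()
        pool.join()
        print('parallel: %.2f s' % (time.time() - start))

        start = time.time()
        serial_results = [busy_work(n) for n in jobs]  # the same work on one core
        print('serial:   %.2f s' % (time.time() - start))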

What will this switch in program design mean for network data I/O? Will average users end up opening and using more network connections? Web browser tabs are a good example of multiple threads or processes: when modern browsers fire up, they often connect to several websites saved from the previous session. I conjecture that multiple tabs fill up the network's queues faster than was possible with single-core CPUs. Although network I/O is much slower than CPU bandwidth (the rate at which a CPU can process, say, HTML), there is a point beyond which a single-core CPU becomes the bottleneck (e.g., firing up a dozen browser tabs). Multiple cores remove this limitation and drive network I/O to its physical (or traffic-shaped) limits. I plan to measure this interplay between multiprocessing and network I/O. Watch this space!
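
As a starting point for that measurement, here is a rough sketch (Python 3, with arbitrary example URLs of my choosing) that times fetching a handful of pages sequentially versus with a pool of processes; the gap between the two timings is exactly the interplay I want to quantify.

    import time
    from multiprocessing import Pool
    from urllib.request import urlopen      # Python 3

    # Arbitrary example pages; swap in any URLs you like
    URLS = [
        'http://www.example.com/',
        'http://www.python.org/',
        'http://www.wikipedia.org/',
        'http://www.bbc.co.uk/',
    ]

    def fetch(url):
        # Download one page and return the number of bytes received
        return len(urlopen(url, timeout=10).read())

    if __name__ == '__main__':
        start = time.time()
        serial_bytes = sum(fetch(u) for u in URLS)
        print('serial:   %.2f s, %d bytes' % (time.time() - start, serial_bytes))

        start = time.time()
        pool = Pool(processes=len(URLS))     # one worker process per URL
        parallel_bytes = sum(pool.map(fetch, URLS))
        pool.close()
        pool.join()
        print('parallel: %.2f s, %d bytes' % (time.time() - start, parallel_bytes))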

Thursday, November 12, 2009

Free airport Wifi as a marketing tool

Google is offering free Wifi in 47 US airports during the holiday season. The idea is to flash a few web pages marketing Google's software and services to users in return for the free Wifi service. According to this CNN article, Google is not the only company to do so - apparently Lexus and eBay have also implemented similar ideas, or intend to do so in the near future.

Free service is probably going to bring a torrent of airport Wifi users online - probably many more than the current number of (paying) users. Given that Wifi channel space is a shared resource, it will be interesting to see how airport Wifi scales with the uptick in usage. I just hope that the service doesn't deteriorate so much that the sponsoring companies' well-meaning message is lost on disgruntled users. And I do hope that the engineers running these Wifi access points have done the network provisioning math beforehand.

Now the economics. The sponsoring company (Google) is probably going to pay a lot less than the retail price of airport Wifi connectivity. Why? Because the sheer volume of users will be much higher than when users have to pay individually. I think that the payment will include a fixed component depending on the number of access points participating in the service, and a variable component depending on the number of users accessing the service.

Let's assume that an average airport has about 20 accessible Wifi access points, and that each access point can support (with any reasonable quality of service) about 10 concurrent users. If the airport is busy for, say, 12 hours a day, and we further assume an average utilization of 50% of the access points' total capacity, then we have (per day):

10 * 20 * 12 * 0.5 = 1200 hours of usage per day per airport.

I would assume that the sponsoring company (Google) would pay about $5000 per day as a fixed cost, plus about $1 per hour of usage. That brings the daily total cost per airport for the sponsoring company to $5000 + $1200 = $6200.

So for 47 airports and 50 holiday season days, we are looking at a bill of about

6200 * 47 * 50 = $14.57m

That's not a bad deal for a big company like Google, considering the number of eyeballs they will capture. Let's say a user uses the free Wifi for 30 minutes on average. With the 1200 usage hours estimated above, we are looking at about 1200/(1/2) = 2400 users per airport, per day. That works out to about 5.6 million users across the 47 airports over the 50-day holiday period. Even if we assume that most people make round trips and therefore use the Wifi connection twice, Google can still reach roughly 2.8 million unique users! Not too bad for the $15 million spent.
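
Here are the same back-of-envelope numbers as a tiny script, so the assumptions (and they are only my assumptions, not figures from Google or the airports) are easy to tweak:

    # All inputs below are this post's guesses, not published figures
    ACCESS_POINTS  = 20       # per airport
    USERS_PER_AP   = 10       # concurrent users at reasonable quality
    BUSY_HOURS     = 12       # busy hours per day
    UTILIZATION    = 0.5      # average fraction of capacity in use
    FIXED_COST     = 5000.0   # $ per airport per day
    COST_PER_HOUR  = 1.0      # $ per user-hour
    AIRPORTS       = 47
    DAYS           = 50
    SESSION_HOURS  = 0.5      # average time a user stays online

    usage_hours = ACCESS_POINTS * USERS_PER_AP * BUSY_HOURS * UTILIZATION   # 1200
    daily_cost  = FIXED_COST + COST_PER_HOUR * usage_hours                  # 6200
    total_cost  = daily_cost * AIRPORTS * DAYS                              # ~$14.57m

    users_per_airport_day = usage_hours / SESSION_HOURS                     # 2400
    total_users  = users_per_airport_day * AIRPORTS * DAYS                  # ~5.6m
    unique_users = total_users / 2                                          # round trips

    print('total cost:   $%.2fm' % (total_cost / 1e6))
    print('total users:  %.1fm' % (total_users / 1e6))
    print('unique users: %.1fm' % (unique_users / 1e6))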

And I haven't even started counting the goodwill ROI bonus for playing Santa during the holiday season! Nifty, nifty marketing.

Friday, November 6, 2009

Call for action! Powering down PCs


I've been playing with the idea of building a PC application that measures a computer's idle time. The idea is to gently convince users to suspend or power down their PCs when they are not being utilised. I strongly believe that if PCs are optimally powered down, many users could cut their energy consumption (and hence save on energy bills). Powering down battery-powered laptops would also increase the longevity of their batteries and thereby decrease toxic battery waste in landfills.
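
Here is a very rough sketch of the measurement part, using the third-party psutil library and treating sustained low CPU load as a proxy for "idle"; a real tool would also watch keyboard and mouse activity, which is OS-specific, and the threshold and sampling period below are arbitrary choices of mine.

    import time
    import psutil   # third-party: pip install psutil

    IDLE_CPU_THRESHOLD = 5.0    # % CPU below which we call the machine idle
    SAMPLE_SECONDS     = 60     # sampling period

    idle_seconds = 0
    busy_seconds = 0

    while True:
        # cpu_percent(interval=...) blocks for one sample period and returns average CPU use
        cpu = psutil.cpu_percent(interval=SAMPLE_SECONDS)
        if cpu < IDLE_CPU_THRESHOLD:
            idle_seconds += SAMPLE_SECONDS
        else:
            busy_seconds += SAMPLE_SECONDS
        total = idle_seconds + busy_seconds
        print('idle for %.0f%% of the last %.1f hours - consider suspending?'
              % (100.0 * idle_seconds / total, total / 3600.0))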

As an example of where the possible savings lie, above is a pie chart showing my own PC usage over the past few working days. As you can see, there is ample scope to power down or suspend PCs when they are idle.

If you want to contribute time to this project (coding/web page/translation into other languages/spreading the word), feel free to contact me. If not, then do suspend your PC every time you are away for more than a few minutes :-).

Friday, October 30, 2009

Impact of International Domain Names

On the 40th birthday of the Internet last week, the Internet Corporation for Assigned Names and Numbers (ICANN) formally announced that there would now be domain-name support for non-Latin-character URLs. This concept, called internationalized domain names or IDNs, will allow URLs composed from the scripts of languages such as Korean, Chinese, Hebrew, Arabic, and Hindi.

A little digging on Wikipedia about IDNs reveals that the underlying implementation is based on translating Unicode names into DNS-compatible (ASCII) names and vice versa, in order to keep the current DNS system functional. This makes the system backward compatible with currently deployed name-resolution infrastructure. In fact, most of the translation to/from non-Latin scripts will be done in users' browsers.
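
A minimal sketch using Python's built-in 'idna' codec (which, as far as I know, implements the older IDNA 2003 rules) shows this translation in both directions, here for the kind of German umlaut name discussed below:

    # -*- coding: utf-8 -*-
    unicode_name = u'ärzte.com'                # a German umlaut name, as in the example below

    ascii_name = unicode_name.encode('idna')   # the ASCII ('xn--...') form that DNS actually sees
    print(ascii_name)

    print(ascii_name.decode('idna'))           # back to the Unicode form for display in the browser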

But what does this mean for the fabric of DNS address space and the web?


Dilution of the Latin namespace (?) Will we see some dilution in the value of address real estate? For example, will http://www.doctor.com become less valuable because folks in Germany can now remember the more meaningful http://ärzte.com instead (Ärzte is German for medical doctors)? And what of the tens of thousands of domain names registered from other languages transliterated into Latin script (e.g. http://naukri.com in India; naukri is Hindi for job)?

The registration rush Initially, web content providers will scurry to buy up non-Latin names. This will be most important for content providers who do not have a global brand name, or whose brand name describes their product or service. For a content provider like doctor.com, it will make sense to buy the synonyms of "doctor" in other languages, in addition to the spelling of "doctor" in those languages' scripts. On the other hand, Microsoft.com will only buy up the spelling of "Microsoft" in the languages/scripts becoming available through IDNs. At the very least, I foresee most businesses re-evaluating their namespace position on the web.

Security and phishing Completely unrelated characters in different scripts can look identical to the human eye. This means that users can be tricked into thinking that the address displayed in the address bar points to a legitimate page when in fact it points to a phishing page. It may be prudent for businesses to be aware of these security vulnerabilities in their URLs and perhaps proactively register "similar looking" URLs in other languages/scripts.
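
A small illustration of the homoglyph problem (using a hypothetical spoof of apple.com purely as an example): the two names below render almost identically but are different strings and map to different DNS names.

    # -*- coding: utf-8 -*-
    import unicodedata

    genuine = u'apple.com'
    spoofed = u'\u0430pple.com'                 # first letter is Cyrillic, not Latin

    print(genuine == spoofed)                   # False - they only *look* the same
    print(unicodedata.name(genuine[0]))         # LATIN SMALL LETTER A
    print(unicodedata.name(spoofed[0]))         # CYRILLIC SMALL LETTER A
    print(spoofed.encode('idna'))               # resolves as a completely different 'xn--...' name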

Impact on search engines Search engines are known to weigh address-name strings in their ranking algorithms. This may need some rethinking. At the very least, some search engines may need to use automated translators to link up semantically similar web pages, irrespective of how the address space names different copies of the same information in different languages/scripts.

Saturday, July 25, 2009

Feel good sustainable energy video from RWE



This video from RWE, Germany's second-largest electricity producer, is sweet. Too bad that, in reality, Germany depends on polluting coal for much of its electricity.

Sunday, July 5, 2009

Pimped up Tom-toms...Netbooks+GPS: where is the value?

There is something funny going on in consumer electronics and personal computing nowadays: a tendency to combine different functionality into one all-encompassing device. Management calls it convergence. Sales and marketing call it up-selling. The product guys call it how-to-keep-your-job-important. Consumers see it as a peculiar phenomenon whereby devices become pricier even though they should be getting cheaper.

Here is a good example:

Smart phone = phone + camera + GPS + DVB-H TV receiver + memory stick + music player + ...

OK, so I see some value in this smartphone convergence algebra. All of the above services may be useful to a user, and it's great to carry them all in your pocket.

But it struck me as odd that Dell is planning to up-sell its Mini notebooks by charging users for extra GPS hardware. Apart from robotics aficionados, who would want such a GPS service on a netbook? The significant time it takes to power on a netbook and fire up the GPS probably means you'd have overshot a couple of highway exits before the GPS locks on and tells you where you are. And that's after you've somehow placed the netbook on the dashboard for line-of-sight to the GPS satellites while driving at 70 mph. Netbooks, with their limited battery lives, would also make for poor trekking-in-the-Rockies aids (and why would someone carry a netbook instead of a small portable TomTom up the trail anyway?).

But the guys at Dell must have thought of all this. Even after you strip away the management, sales, and product-team views on such matters, there must be someone who tried to write up credible use cases. What is the killer-app use case?

In my opinion, there is a very significant business opportunity here. Dell wants to know where you are in order to introduce location-based services (e.g. locality-aware advertising) in return for GPS map services. If you think about it, operators have an advantage over other IT vendors because they know (through cellular triangulation) where a user is. Hardware and software vendors also want a piece of this action because of the significant scope for location-based advertising, and GPS gives them that chance (with more accurate location information). Dell could know, through its GPS/map services, that it's 13:00 on a Wednesday afternoon and I am sitting in a downtown Berlin park about 3 minutes' walk from a restaurant serving Schweinshaxe and Berliner Pilsner. I bet I'd take the bait of a 10% coupon to get there! And Dell would get a piece of the pork too.

Thursday, July 2, 2009

P2P, bandwidth, and FTTH urgency.


Figure 1 (from MPI-SWS): BitTorrent throttling by geographically spread ISPs. Red areas indicate ISPs throttling BitTorrent traffic.

This figure is from the Max Planck Institute for Software Systems' Glasnost project. It shows geographical regions where ISPs interfere with BitTorrent traffic. Comcast (and several other ISPs) claim that the P2P applications of a few users slow down the Internet for all network users: all the bandwidth is used up by a minuscule subset of the subscribers, leaving everyone else with a slow network. There's no reason to disbelieve this argument - a limited shared resource being over-used will result in poor quality for all users of the statistically multiplexed Internet.

There are two ways of dealing with the issue. Either bandwidth-hogging users are cut off (as with BitTorrent throttling), or the network capacity is increased to accommodate the "over-use". The latter technique bailed us out the last time Internet traffic exploded: broadband was rolled out just as media-rich Internet applications were catching on (or was it the other way around?). Everyone was happy. Customers got better service for similar subscription costs, Web 2.0 companies got the pipes to deliver their content, and ISPs created the whole broadband business, with the option of up-selling through services like digital IPTV.

If it worked so well then, can't we do the same trick again? Why not roll out broader broadband - FTTH (fiber to the home), for example? The simple answer is that we cannot, at least not quickly, given the cost. When broadband came, the physical access network was already built: there were cable TV wires running to homes, and there were phone lines. In terms of a tree analogy, the leaves of the access network were already connected up. All that remained to be done was to put in the trunk links and the branches, and there are a lot fewer branches than there are leaves. FTTH, on the other hand, will be prohibitively expensive in many countries because the leaves need to be rewired. The rollout timeline is therefore going to be slower than it was for broadband.

Back to P2P. Why single it out? Don't video CDNs like YouTube, Netflix, Hulu, etc. also consume large amounts of bandwidth? In my opinion, the extra load that P2P imposes on the access network, due to its uploading aspect, creates a lot more congestion there at present. A P2P system will (in theory) upload as much as is downloaded in the system, and all of this happens on the access network. That's a 2x increase in bytes traversing the most expensive component of the network (the edge), which means many more expensive boxes to cover the leaves of the ISP's tree.
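
A toy calculation of that edge-traffic argument (the numbers are illustrative, not measurements): if N subscribers on an access segment each fetch the same video, a CDN only sends it down their links, while an ideal P2P swarm also pushes roughly the same volume back up through those links.

    N        = 1000        # subscribers on one access segment
    VIDEO_GB = 1.0         # size of the video each subscriber fetches

    cdn_edge_gb = N * VIDEO_GB                     # downloads only; uploads come from CDN servers
    p2p_edge_gb = N * VIDEO_GB + N * VIDEO_GB      # every byte downloaded is also uploaded by some peer

    print('CDN: %.0f GB across the edge' % cdn_edge_gb)
    print('P2P: %.0f GB across the edge (%.1fx)' % (p2p_edge_gb, p2p_edge_gb / cdn_edge_gb))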

However, most broadband connections are asymmetric (downlinks have higher data rates than uplinks), so P2P is capped by a glass ceiling: the lower uplink data rates. Conventional CDNs, on the other hand, push data down the wire, so there is no reason for them to limit video quality and resolution until they hit the downlink rate. As high-quality video content catches on, there will be disgruntled users wondering about the difference between the data rate their subscription plan claims (XX Mbps) and what actually comes down the wire (XX/ZZ Mbps, ZZ being the over-subscription factor).

ISPs need to hurry up and fiber-wire those leaves! And governments need to help with the capex. Another stimulus, perhaps?

Sunday, June 21, 2009

Twitter's Value Proposition



Figure 1: Top-20 query set similarity, or, what fraction of the top-20 queries separated by a time lag (X-axis) are the same. (Click figure to enlarge.)



Figure 2: Frequency of occurrence of a query name vs. the number of unique query names in the top-20 (hourly) query sets over 6 months. About half the queries appear only once or twice. (Click figure to enlarge.)



Twitter has become Silicon Valley's latest darling start-up. What started off as a seemingly incremental idea ("micro-blogging" in 140 characters or less) seems to have caught on big time - more people are twittering than ever before. Twitter has shown its prowess in everything from influencing the American presidential election to challenging Iranian theocracy. Its popularity makes it a very compelling service, but how can it make money for its founders, Evan Williams and Biz Stone, and its promoters?

Needless to say, many Twitter posts (tweets) are inane ("A dog bit me") rather than newsworthy ("I bit a dog"). Many users underestimate the difficulty of producing a constant stream of interesting 140-character texts from their everyday lives and experiences. Fortunately, Twitter comes with a search engine optimized to index tweets in real time, so users can query Twitter for useful information within the deluge of tweets being posted every moment. This makes Twitter a real-time application and a perfect vehicle for propagating news on the Internet. Product releases, reviews, security bugs and vulnerabilities, company press releases, executives' and analysts' statements, etc. previously had to wait for a search engine to crawl and index them (a lag of sometimes weeks). But because tweets come to Twitter directly from users, these announcements are instantly indexed and available for search via Twitter's search engine.

Twitter provides a great API for studying the queries it receives over a time period. The API allowed me to download information about the top-20 most-popular queries submitted to Twitter in every one-hour interval over 6 months. Parsing this information sheds light on what users search for when they go online looking for time-sensitive information. It also suggests ways of monetizing this vast treasure of users' mind-space - what they think about, search for, and find as time goes by.

Figure 1 shows top-20 query-name set similarity, i.e., what fraction of the top-20 queries separated by a time lag (X-axis) are the same. Twitter reported the set of the 20 most-popular query names in each hour; the figure is plotted by finding the fraction of common elements between any two such sets separated by a certain lag (X-axis). The similarity dies off quickly within 48 hours and, after 14 days, settles at about 0.1 (meaning only 2 of the 20 query names remain the same between the compared sets). There are also noticeable drops in similarity at 24 and 48 hours (probably due to periodicity effects). Also note the diurnal bumps. Why does that happen?
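
For the curious, the Figure 1 computation boils down to something like this sketch, assuming the hourly top-20 query names have already been collected into a list of Python sets (one set per hour; the data collection itself is omitted):

    def set_similarity(hourly_sets, lag):
        # Average fraction of common query names between top-20 sets 'lag' hours apart
        pairs = list(zip(hourly_sets, hourly_sets[lag:]))
        overlaps = [len(a & b) / 20.0 for a, b in pairs]
        return sum(overlaps) / len(overlaps)

    # hourly_sets = [{'iphone', 'ipod', ...}, {'iphone', 'obama', ...}, ...]
    # curve = [set_similarity(hourly_sets, lag) for lag in range(1, 14 * 24)]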

Figure 2 plots the frequency of occurrence of a query name vs. the number of unique query names in the top-20 (hourly) query sets over 6 months. The majority of top-20 queries captured users' interest for only a few hours. Only about 20% of the queries remained in the top-20 lists for more than 10 hours over the 6-month period.
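
And the Figure 2 counts are essentially a tally over the same hourly sets, for example with collections.Counter (again assuming the data has already been collected as above):

    from collections import Counter

    def query_frequencies(hourly_sets):
        # Number of hourly top-20 lists in which each query name appeared
        counts = Counter()
        for hour_set in hourly_sets:
            counts.update(hour_set)
        return counts

    # counts = query_frequencies(hourly_sets)
    # long_lived = [q for q, n in counts.items() if n >= 100]   # e.g. the list at the end of this post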

But what does all this mean for making money via Twitter? Well, here are my 2 cents:

1. Marketing benchmarks: Companies can use Twitter query popularity to measure their marketing success - say, how far an advertising campaign has entered customers' minds, measured in terms of query frequency (compared to their competitors, for example). Twitter can develop and sell analytics tools for companies to measure this. In my opinion, looking for this information in user queries is more effective than looking in the tweets themselves, because the latter can be gamed (spam tweets) and because tweeting users are still a minority passionate about posting messages (compared to the silent majority that does not post). Monitoring product mentions in query names is also a great way to keep tabs on marketing success. For example, in Figures 1 and 2, a product name occurring frequently is good news, but if its rate of occurrence decreases over time, then it's time to launch more marketing efforts or to improve the product's visibility in some way.

2. Real-time customer feedback: Product groups can use Twitter query information to pinpoint product bugs, fast. There is a certain cost for a user to go on the Internet and search Twitter for, say, "iPhone screen blank". If such a query bubbles up in popularity (say, into a top-XX list), then the bug is almost certainly a widespread issue. Twitter's real-time nature highlights problems very quickly and efficiently. Selling such product-specific query information to companies may create a nice revenue stream for Twitter.

3. Keyword analytics: The occurrence of a product name alongside another query word may signal a selling opportunity. For example, the query "iPhone anti-virus" may point to market demand for Apple to sell anti-virus software with its iPhone. Keyword analytics can also help in choosing keywords for online advertising.

4. Risk management: Twitter quickly captures the viral spread of information on the Internet. This could allow a company to react to, say, a malicious video posted about its product on YouTube. Twitter is your fast-response Internet guardian. For example, Twitter could offer a service that notifies subscribed companies about any information (positive or negative) gaining traction on Twitter.

And finally, here is a list of Twitter search strings that were in the 20 most-popular search lists for 100 hours or more (over the 6-month period). Funny how Apple dominates the top 3 slots, and then there is AT&T in the 4th slot (probably due to selling the iPhone in the USA). Hats off to Apple's marketing for capturing so much of users' mind-space. Or are they gaming Twitter? Or are Twitter users disproportionately Apple fans? Or is Twitter the newest Apple rumor-spreading mechanism?


iphone 3266
ipod 1715
apple 1711
at&t 1253
itunes 981
goodnight 975
tweetdeck 881
vegas 680
bbc 674
new york 645
texas 643
obama 529
star trek 503
gaza 491
lakers 485
susan boyle 484
swine flu 460
sxsw 452
slumdog millionaire 448
watchmen 437
iranelection 427
h1n1 398
dollhouse 394
american idol 382
tgif 364
musicmonday 359
easter 354
wolverine 331
kobe 330
adam lambert 320
windows 7 320
valentine's day 281
aig 275
tehran 271
swineflu 267
bsg 259
sydney 256
super bowl 251
starbucks 242
palm pre 224
twilight 223
christmas 223
austin 217
mexico 216
coraline 212
california 212
cavs 206
nba 205
oprah 204
ces 198
ncaa 197
follow friday 195
spring 194
iran 192
wii 190
transformers 2 188
miami 186
bush 183
paris 179
mousavi 177
snl 172
social media 171
spotify 170
superbowl 164
jimmy fallon 164
diddy 164
imax 160
hamas 156
inauguration 154
heroes 154
twestival 153
blackberry 149
brazil 148
macworld 147
earth hour 145
florida 144
lebron 143
happy new year 143
chuck 142
safari 4 138
snow 138
mj's 137
benjamin button 136
bing 136
chelsea 133
chris brown 133
gran torino 130
ellen 127
kris allen 126
nascar 125
teaparty 125
grey's anatomy 124
wimbledon 124
president obama 123
canucks 122
uksnow 122
ted 122
fridays 121
therescue 119
michael 118
michael jackson 118
conan 117
oscars 116
liverpool 116
celtics 115
angels & demons 113
march madness 111
denver 111
blackout 110
jay-z 109
google wave 109
dallas 107
memorial day 106
xbox live 104
gmail 102
player snapshot 102
north korea 102
mardi gras 102
melbourne 101
french 101
father's day 100

Monday, June 1, 2009

Sun setting over the German countryside

Took this one while sitting in the ICE from Cologne to Berlin, between Wolfsburg and Berlin Spandau railway stations. 01.01.2009, 21:01 CET.

The hope of green power?

Is that the Sun driving the wind that drives the power-generating windmills producing the electricity that powers this ICE train cruising at 180 km/h? Please let this be the future, for all our sakes.

Sunday, May 3, 2009

Is P2P dead?

There have been some significant setbacks for P2P in the past year or so.
  1. The Pirate Bay's founders are in jail in Sweden for abetting illegal file sharing on their website.
  2. Joost, the much-lauded P2PTV service, is no longer P2P but is instead a CDN-type streaming service.
  3. Another P2P darling, Skype, seems to be adrift, with eBay wanting to get rid of it through a sale or a spinoff.
  4. There has been a sustained reduction in CDN costs that is making P2P's cost advantage less attractive.
  5. Websites like YouTube won the video-streaming battle against P2P video streaming long ago, and now websites like Rapidshare are leaving P2P file sharing behind as well.
  6. Mobile P2P (near-network P2P on mobile devices over Bluetooth, etc.) just didn't happen. These devices are more client-serverish than wired devices because the upload bandwidth needed for P2P is too pricey.
So the key question is whether P2P is dead.

In my opinion, the answer is no. Here are some reasons:
  1. P2P lacks a business model but has proven to be a remarkably resilient and cost-effective technology. The problem is getting legal content onto P2P networks; content companies are not going to let users take control of content delivery.
  2. If one looks at where the biggest growth in broadband usage is going to be, one looks toward China and India. Legal protection for content is significantly weaker in these countries. Moreover, there is a large amount of reasonably priced content (e.g. regional and Bollywood content in India) that would ride P2P networks perfectly.
  3. P2P has proven itself for VoIP (Skype has 400m users). Skype is the established VoIP leader and it will remain that way for a long time.
  4. CDNs do not scale with video quality. That is why YouTube won't do HD - they'd go broke paying for CDN (server) bandwidth. P2P, on the other hand, can scale up to the extent the access networks allow.
What we may see is an amalgamation of CDN and P2P technologies - for example, using CDNs for paid content and P2P for promoting that paid content (for free). Cache every recent movie's first 10 minutes on users' computers using P2P, and then stream whatever content the user selects via a CDN. Although content pricing can recover CDN distribution costs, the monetary transaction only happens after a user chooses to watch; the prior marketing can be P2P.