Tuesday, November 24, 2009

Multiprocessing vs. Network I/O

I've been reading up on Python's multiprocessing module (available in v2.6 and above). Multiprocessing itself has been around for a long time, but simplified libraries like this one may spur even casual programmers to consider parallelism in their programs. My feeling is that if inter-process communication, synchronization among processes, and deadlock avoidance can be handled painlessly, many non-professional programmers will feel confident enough to split their programs into multiple processes and keep every CPU busy to speed things up. Moreover, now that multiple CPU cores are becoming the norm rather than the exception on commodity hardware, there is a real incentive to eventually switch to multiprocessing.
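To give a feel for how little ceremony the module needs, here is a minimal sketch. It assumes a current Python 3 interpreter rather than 2.6, and the prime-counting task and the names `count_primes` and `count_primes_parallel` are made up for illustration; any CPU-bound function would do.

```python
from multiprocessing import Pool


def count_primes(limit):
    """CPU-bound stand-in task: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        # n is prime if no d in [2, sqrt(n)] divides it
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, workers=4):
    """Run count_primes on each limit using a pool of worker processes."""
    with Pool(processes=workers) as pool:
        # map() hands the inputs to the workers and returns results in input order
        return pool.map(count_primes, limits)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block on import
    print(count_primes_parallel([10, 100, 1000]))
```

The pool takes care of starting the processes, passing arguments in, and collecting results, which is the kind of inter-process plumbing I had in mind above.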

What will this switch in program design mean for network I/O? Will average users end up opening and using more network connections? Web browser tabs are a good example of multiple threads or processes at work. When modern browsers start up, they often reconnect to several websites saved from the previous session. I conjecture that multiple tabs on a multi-core machine fill up the network's queue faster than was possible with single-core CPUs. Although network I/O is much slower than CPU throughput (the rate at which a CPU processes, say, HTML), there is a point beyond which a single-core CPU becomes the bottleneck (e.g. when opening a dozen browser tabs at once). Multiple cores remove this limitation and can drive network I/O to its physical (or traffic-shaped) limits. I plan to measure this interplay between multiprocessing and network I/O. Watch this space!
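One way such a measurement could be set up is sketched below. This is only a rough starting point, not a finished benchmark: it assumes Python 3, the helper names `fetch_size`, `throughput`, and `measure` are hypothetical, and the URLs are whatever you pass on the command line. It fetches the same list of URLs with different numbers of worker processes and reports the aggregate data rate for each.

```python
import sys
import time
import urllib.request
from multiprocessing import Pool


def fetch_size(url):
    """Download one URL and return the number of bytes received."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return len(response.read())


def throughput(total_bytes, seconds):
    """Aggregate data rate in bytes per second (0.0 if no time elapsed)."""
    return total_bytes / seconds if seconds > 0 else 0.0


def measure(urls, workers):
    """Fetch all URLs with `workers` processes; return (bytes, seconds, bytes/s)."""
    start = time.perf_counter()
    with Pool(processes=workers) as pool:
        sizes = pool.map(fetch_size, urls)
    elapsed = time.perf_counter() - start
    total = sum(sizes)
    return total, elapsed, throughput(total, elapsed)


if __name__ == "__main__":
    urls = sys.argv[1:]  # URLs to fetch, supplied by the user
    if urls:
        for workers in (1, 2, 4):
            total, elapsed, rate = measure(urls, workers)
            print(f"{workers} workers: {total} bytes in {elapsed:.2f}s ({rate:.0f} B/s)")
```

As written, the workers only download, so the result mostly reflects the network. To probe the CPU side of the conjecture, a processing step (parsing the HTML, for instance) would need to be added inside `fetch_size`, and the timing includes the cost of starting the pool.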
