Sunday, April 6, 2008

Beyond desktop search...can we make user-PC data more valuable?

I ordered an 8GB flash disk last week (turns out there is a 16GB one around, but I am modest ;-) ). Since I don't have a whole lot of media content to put on this fat-stick, I will instead end up putting all my work over the past few years on it. If I factor in the emails, I should easily fill up 8GB with a couple of years' worth of data. Wow. I remember having a hard time filling a DSDD 5.25 inch 576 kB diskette back in the early 90s.

The lowdown is that we have *lots* of data. 7MP pictures, podcasts, email archives, documents and web-downloads, not to mention audio/visual media - all this can quickly add up. Fortunately storage has kept up, or perhaps the pace of storage growth encourages more data generation in the first place? Whatever the truth, we now have a whole lot of data sitting in our computers.

There have been many instances of large volumes of data in non-PC computer scenarios. For example, real databases have routinely run into terabytes. The key difference between user-PC data and these databases is the heterogeneity and the lack of structure in the former. User-PC data comes in various formats and is generated by completely different applications. Even when the same application generates the data (e.g. an email mailbox file), the goal has never been to store the data in a way that makes it easy to extract global information later.

Desktop search software is the first step in mining information from user-PC data. But search is really a very preliminary tool because it only flags the existence of the information sought via the specified keywords. There is very little cognizance of the bigger picture. Data mining - that power tool which works so beautifully for databases and other highly structured data - does not exist yet for user-PC data.
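To make the limitation concrete, here is a minimal sketch of what keyword search boils down to: an inverted index that maps words to the files containing them. The file names and contents are hypothetical sample data, and `build_index` and `search` are illustrative names of my own, not taken from any real desktop search product.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the names of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical sample data standing in for files on a PC.
documents = {
    "flight.eml": "Your flight to Goa is confirmed for 12 March",
    "hotel.eml": "Hotel booking in Goa, check-in 12 March",
    "notes.txt": "Budget review meeting moved to March",
}

index = build_index(documents)
hits = search(index, "goa march")
```

The query tells us which files mention both words, and nothing more. It has no idea that the flight and hotel emails belong to the same trip, which is exactly the bigger picture that search misses.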

Isn't it time we started building algorithms beyond just search to help users extract useful information from their gigabytes of data?

3 comments:

Gee said...

Good blog ... but the key question that needs to be answered is ... what is "useful"?

Knowing that, it will be possible for most developers to develop suitable data mining software, but without knowing what each individual PC user needs the data for, it will be difficult to do so ...
Till then, we will have to live with general word search tools like Copernicus or Google for Desktops!

Sachin said...

The million (billion?) dollar question, what is useful, and how is desktop data useful?

I think that one great idea is composing data into more pertinent information. For example, a program that automatically takes your vacation photos, flight and hotel details, credit card spending, emails, Facebook content, etc. and composes a report of your vacation.

Or perhaps an application that automatically computes the tax returns of a small business from the sales and purchase records stored in the Excel sheets on the shop's computer.

These applications probably need a lot of intelligence - at least a lot more than simple search indices that run desktop searches. Still, a goal worth working towards.
