Thursday, December 23, 2004

Information management

The explosion of information sources and raw data increases the need for high-capacity storage devices. Storage is developing along two paths at the same time. The first is local storage. Local capacity keeps growing: SIM cards of 256MB (http://www.m-sys.com/Content/Corporate/Press/prInfo.asp?id=701), MMC cards of 1GB for smart phones, Toshiba's newly announced 80GB 1.8” drive for the iPod (http://www.pcworld.com/news/article/0,aid,118925,00.asp), PC hard drives of 250GB, and an estimate of 1 terabyte of memory on handsets by the next decade. The second is distributed storage. In this category we have clusters, storage racks, GRID computing (distributed computing and joint resource management), and other remote storage techniques such as web storage. Large centralized installations reach capacities of several petabytes, as in the CERN project.
To keep all these capacity names straight, here is a basic guide to the units:

Each line down adds three zeros to the number.

We start with the kilobyte, which is 1,000 bytes.
A megabyte is 1,000,000 bytes (a common diskette holds a little more).
A gigabyte is 1,000,000,000 bytes (a little more than a standard CD's capacity).
A terabyte is 1,000,000,000,000 bytes (4 big home hard drives).
A petabyte is 1,000,000,000,000,000 bytes (the size of the data in big research facilities).
An exabyte is 1,000,000,000,000,000,000 bytes (no current application).
A zettabyte is 1,000,000,000,000,000,000,000 bytes.
And finally, a yottabyte is 1,000,000,000,000,000,000,000,000 bytes.
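
For readers who like to see the ladder in code, here is a minimal Python sketch of the guide above. It assumes the decimal convention used in this post (each unit is 1,000 times the previous one), and the names UNITS and format_bytes are made up for this illustration, not taken from any library.

```python
# Decimal unit names, in the same order as the guide above.
# UNITS and format_bytes are hypothetical names chosen for this sketch.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def format_bytes(n: int) -> str:
    """Express n bytes in the largest decimal unit that does not exceed it."""
    if n < 0:
        raise ValueError("byte count must be non-negative")
    index = 0
    # Each step up the ladder adds three zero digits (a factor of 1,000).
    while index < len(UNITS) - 1 and n >= 1000 ** (index + 1):
        index += 1
    value = n / 1000 ** index
    # Use the singular name only when the value is exactly 1.
    suffix = "" if value == 1 else "s"
    return f"{value:g} {UNITS[index]}{suffix}"


if __name__ == "__main__":
    # Four 250GB home hard drives, as in the terabyte line of the guide.
    print(format_bytes(4 * 250 * 10**9))
    # The 80GB drive mentioned earlier in the post.
    print(format_bytes(80 * 10**9))
```

Note that some tools count in steps of 1,024 rather than 1,000, so the figures they report can differ slightly from the ones this sketch produces.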

As I write this post, Microsoft Word does not recognize the word Petabytes (it marks it as a spelling mistake), but then who dealt with gigabytes two decades ago?

Back to the subject itself: we understand that the amount of data is huge, but where does it come from? New technologies enable research to gather more information. More elements are connected to networks, such as cell phones, computers, connected cameras, and machines like ATMs. More network elements mean more accessibility and a greater need to trace all the traffic. It all comes down to one result: huge amounts of data to be manipulated, analyzed, and stored.

Information is now stored in general online places, such as web-accessed mail, online calendars, and Plaxo contact sharing, and as a result people publish more of their personal data. Once again, personalization influences the change we experience in information management. The new approach of search engines even provides a learning mechanism that adjusts the results to our preferences. We can search, refine the search, and store our results for further research and future use. This field has yet to reach the masses, but you can find samples of the trend at http://mysearch.yahoo.com/ , http://labs.google.com/personalized/ , and http://a9.com/-/search/home.jsp?nc=1 , the major commercial projects in this field.

Many features are still missing, but it is a matter of evolution until we see real information management in its personalized form. There will be no dependency on storage, location, or the form of the raw data. Each piece of information will be stored in a centralized online place, public or private by decision, and can be assembled from many pieces of information from different sources, compiled into one page tailor-made for the search request. Knowledge management concepts are being applied to the Internet. We need to make the data available anywhere, at any time, and in any form, already manipulated and analyzed. Only formatted information will be useful in the future, when we have even more information and even more focused targets.

We will need to pull a very specific piece of information out of everything available, and we will need to get it in a very short time. Most of the information we consume will be a puzzle made of existing pieces of information arranged in a specific order. The amount of data will continue to grow, and the only way to effectively overcome this potential obstacle is to improve the efficiency of our searches, not the number of results.

Thursday, December 09, 2004

Information, technology, and human behavior.

We live in an era of information. Recent studies show that the volume of information is growing several times faster than our ability to manage and analyze it. In the past, people had to be in direct contact with other people to get their opinions on different issues. Modern printing, and later radio and the programs that put callers on the air, broke this basic assumption. Today we can publish any information at all without needing to print it, and even without paying for our words to be heard. We generate, consume, and gather information in quantities that were not feasible in the early days of print. The ARPANET was, in fact, the breakthrough on the way to the behavioral change. With each connected computer, mobile device, and tool we become more and more information dependent, and our behavior shows it.

Consider the difference between hearing the news by chance from vagabonds, with a delay of several days, and the present form of news consumption. We feel strange if the interval between news updates is longer than several hours. Newspapers have become obsolete as news tellers in the digitized society, and they still have not made the revolution needed to adjust. Behavioral patterns are still changing: from getting the news by chance, or at dictated hours like the newscasts, we have become news consumers who decide when we want to be updated. Of course, the news is just one small example that illustrates the behavioral change caused by information. This evolving area will play a significant role in the new technologies and inventions we will see in the next several decades.

The main aspects of the information revolution, as we have experienced it for several centuries already, are:

1. Information management – gathering the information, storing it, and making it available to the designated people.

2. Information accessibility – the ways provided to access, generate, and influence the information and the data.

3. Information / data presentation – the means provided to view, listen to, and generally interact with the information.

4. Information analysis – tools that help filter information, manipulate it, and deliver it in the desired format for consumption.

The next four posts will discuss these aspects, with general background information and several specific examples. These posts are crucial to a better understanding of future progress in the field of information consumption and of the technologies that will be discussed in detail in future posts.