Information management
The explosion of information sources and raw data increases the need for high-capacity storage devices. Storage is advancing along two paths at the same time. The first is local storage. Its capacity keeps growing: SIM cards of 256MB (http://www.m-sys.com/Content/Corporate/Press/prInfo.asp?id=701), MMC cards for smart phones with 1GB, Toshiba's announced 80GB 1.8” drive for the iPod (http://www.pcworld.com/news/article/0,aid,118925,00.asp), PC hard drives of 250GB, and an estimate of 1 Terabyte of memory on handsets by the next decade. The second path is distributed storage. In this category we have clusters, storage racks, GRID computing (distributed computing and joint resource management) and other remote storage techniques such as web storage. In centralized storage we see capacities of several Petabytes, as in the CERN project.
To keep all these capacity names straight, here is a basic guide to the units:
Each line down the list adds three zeros to the number.
We start with the Kilobyte, which is 1,000 bytes.
A Megabyte is 1,000,000 bytes (a common diskette holds a little more).
A Gigabyte is 1,000,000,000 bytes (a little more than a standard CD’s capacity).
A Terabyte is 1,000,000,000,000 bytes (4 big home hard drives).
A Petabyte is 1,000,000,000,000,000 bytes (the size of the data in big research facilities).
An Exabyte is 1,000,000,000,000,000,000 bytes (no current application).
A Zettabyte is 1,000,000,000,000,000,000,000 bytes.
And finally, a Yottabyte is 1,000,000,000,000,000,000,000,000 bytes.
As I write this post, Microsoft Word does not recognize the word Petabytes (it marks it as a spelling mistake), but then who dealt with Gigabytes two decades ago?
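For readers who prefer to see the ladder of units as arithmetic, here is a minimal Python sketch. It assumes the decimal multiples used in the guide above (each unit is 1,000 times the previous one); the names UNITS, unit_size and humanize are my own, made up for illustration. Note that many operating systems report sizes in multiples of 1,024 instead, so their numbers will differ slightly.

```python
# Decimal storage units, each 1,000 times the previous one
# (an assumption matching the guide above, not the 1,024-based convention).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def unit_size(name):
    """Return the number of bytes in one unit, e.g. 'TB' -> 10**12."""
    return 1000 ** UNITS.index(name)


def humanize(num_bytes):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = num_bytes
    index = 0
    # Drop three zeros per step until the value fits or we run out of units.
    while value >= 1000 and index < len(UNITS) - 1:
        value /= 1000
        index += 1
    return f"{value:g} {UNITS[index]}"


if __name__ == "__main__":
    print(humanize(250 * unit_size("GB")))      # one big home hard drive
    print(humanize(4 * 250 * unit_size("GB")))  # four of them together
    print(unit_size("PB"))                      # bytes in one Petabyte
```

Four 250GB drives come out as exactly one Terabyte, which is the comparison used in the list.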
Back to the subject itself: we understand that the amount of data is huge, but where does it come from? New technologies enable research to gather more information. More elements are connected to networks, such as cell phones, computers, connected cameras, and machines like ATMs. More network elements mean more accessibility and a greater need to trace all the traffic. It all comes down to one result: huge amounts of data to be manipulated, analyzed and stored.
Information is now stored in shared online places, as with web access to mail, online calendars and Plaxo contact sharing, and as a result people publish more of their personal data. Once again, personalization influences the change we experience in information management. The new approach of search engines even provides a learning mechanism that adjusts the results to our preferences. We can search, refine the search and store our results for further research and future use. This field has yet to reach the masses, but you can find samples of the trend at: http://mysearch.yahoo.com/ , http://labs.google.com/personalized/ , http://a9.com/-/search/home.jsp?nc=1 , which are the major commercial projects in this field. Many features are still missing, but it is a matter of evolution until we see real information management in its personalized form.
There will be no dependency on storage, location or the form of the raw data. Each piece of information will be stored in a centralized online place, public or private by decision, and can be assembled from many pieces of information from different sources, compiled into one page tailor-made to the search request. Knowledge management concepts are being applied to the Internet. We need to make the data available anywhere, at any time, in any form, manipulated and analyzed. Only formatted information can be useful in the future, when we have even more information and even more focused targets.
We will need to get a very specific piece of information out of everything available, and we will need to get it in a very short time. Most of the information we consume will be a puzzle made of existing pieces of information ordered in a specific manner. The amount of data will continue to grow, and the only way to overcome this potential obstacle effectively is to improve the efficiency of our searches, not the number of results.
