Created on Feb. 5, 2013, 9:55 p.m. by Hevok & updated by Hevok on May 2, 2013, 4:40 p.m.
One of the major problems of the World Wide Web (WWW) is how to
find particular information in the sea of data.
The Web provides a huge amount of Data. The visible part of the Web, the Surface Web that is reachable by Web Crawlers and indexed by Search Engines, is enormous (trillions of documents). Beyond it lies the Deep Web (sometimes loosely called the Dark Web; user-restricted content, Intranets, etc.), which was once estimated to be about 550 times larger than the Surface Web. The Web is immense and grows at an ever-increasing pace. It is not just about the number of pages, blog posts and the like; more and more, it is also becoming the Internet of Things.
For machines it is very hard to understand information on the WWW. They do not know what is important. Which information is real Content and which is only advertisement? What does the information mean? Even if you know what it means, how credible/trustworthy is it? Which parts of a Web site really belong together? Which information is redundant?
Humans are fortunate, as they are used to these types of analyses. Humans have contextual knowledge about the world and the Experience needed to solve such problems and to understand the information, but for a machine this is really difficult.
What is important, and how do you know?
How credible/trustworthy is the information?
On the Web, information is encoded in HTML. HTML is a simple Markup language that puts information on the Web and gives it a presentation. That is, it determines how the information is displayed, but does not describe what the information actually means. The problem is twofold: most information on the Web is text, whose meaning is hidden in natural language, and the language used to publish it (HTML) only records how that information is presented and linked. Beyond that, there is almost no Representation of meaning.
In short, the Web is based on the Markup language HTML. HTML describes how information is presented and linked, but not what the information means.
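To make this concrete, here is a minimal sketch of what a program can read off an HTML page. It uses Python's standard `html.parser` module; the page fragment `PAGE` and the helper class `MarkupCollector` are made up for illustration and are not part of any real site or library.

```python
from html.parser import HTMLParser

# Hypothetical page fragment, invented for this illustration.
PAGE = (
    '<h1>Jaguar</h1>'
    '<p>The <b>Jaguar</b> reaches <i>80 km/h</i>.</p>'
    '<a href="more.html">Read more</a>'
)


class MarkupCollector(HTMLParser):
    """Records what the markup itself exposes: tags, links and raw text."""

    def __init__(self):
        super().__init__()
        self.tags = []   # element names, in document order
        self.links = []  # href targets of <a> elements
        self.text = []   # character data found between tags

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "a":
            # attrs is a list of (name, value) pairs
            self.links.extend(value for name, value in attrs if name == "href")

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text.append(stripped)


collector = MarkupCollector()
collector.feed(PAGE)
collector.close()

print(collector.tags)   # ['h1', 'p', 'b', 'i', 'a']
print(collector.links)  # ['more.html']
print(collector.text)   # ['Jaguar', 'The', 'Jaguar', 'reaches', '80 km/h', '.', 'Read more']
```

The program learns that some words are a heading, bold or italic, and that one link exists. Nothing in the markup tells it whether "Jaguar" is an animal or a car, or that "80 km/h" is a speed; that meaning stays hidden in the language.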
The question that arises is: what is the Meaning of the information provided on the Web? If we cannot make sense of that information, we are completely lost, and Search Engines also struggle to cope with this vast and growing amount of information.