How does communication work on the Web? How can machines participate in this kind of communication and understand the information on the Web, so that the Web, with its huge amount of information, becomes more usable for humans?
On the Web, meaning is not formally represented; it is only implicit in the natural-language text of web documents. HTML denotes only the form and presentation of a document, the links among documents, and the arrangement of the information within a document. It does not represent the meaning of that information, which matters all the more because the Web is huge.
Many problems arise from the fact that meaning is not represented on the Web.
Information retrieval by traditional keyword-based search has two problems: it leads to many irrelevant results, and it does not find all relevant results.
Second, it will not find all existing answers, because of synonyms and metaphors: different terms may correspond to the same meaning (e.g. the Latin name of a species), and things may also be described only metaphorically. Language itself is thus a problem, due to its polysemy and the possibility of using different terms with the same meaning.
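The synonym problem can be sketched with a toy keyword search; the documents and the species example below are invented for illustration:

```python
# A toy sketch of why keyword-based search misses synonymous results.
# Both documents are about the same animal, but only one uses the
# English term; the other uses the Latin species name.

documents = [
    "The jaguar hunts at night in the rainforest.",
    "Panthera onca is the largest cat of the Americas.",
]

def keyword_search(query, docs):
    """Return the documents that literally contain the query term."""
    return [d for d in docs if query.lower() in d.lower()]

print(keyword_search("jaguar", documents))
# only the first document is found; the second, which uses the Latin
# name "Panthera onca", is missed even though it is equally relevant
```

Without a formal representation of meaning, the search engine has no way to know that both documents describe the same thing.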
A textual description gives information about an image's content, but if only the image itself is available, its meaning is difficult to interpret. Consider an image that contains text: for humans this is mostly obvious, since they can simply read the text, but machines cannot easily read text within a picture. This is one reason it is so difficult to extract information from a web page.
Information extraction can currently only be performed by humans, because they have an idea of how the information is distributed over the page, even though it is heterogeneous and arranged in a particular order. A software agent, on the other hand, does not have the context, world knowledge, and experience that a human user draws on to interpret a page and solve the information-extraction problem.
Therefore, software agents cannot solve the information-extraction problem unless explicit formal semantics are available.
Implicit knowledge means that information is not specified explicitly but must be derived by logical deduction from the available information.
Natural language contains a lot of implicit knowledge.
Humans pick up clues and, from these clues, derive new information from the available information by logical deduction.
For machines, this is only possible if the available information is defined formally and explicitly, in terms whose semantics a machine can process. Otherwise it is not possible to deduce new knowledge from the available knowledge.
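Once facts and rules are stated explicitly, a machine can derive the implicit knowledge mechanically. The following is a minimal forward-chaining sketch; the facts, the rule, and the triple encoding are invented examples, not a real knowledge base:

```python
# A minimal sketch of deriving implicit knowledge by forward chaining
# over subject-predicate-object triples. Facts and rule are toy examples.

facts = {("Socrates", "is_a", "human")}

# Rule: whatever is a human is also mortal.
# Encoded as (premise pattern, conclusion pattern); "?x" is a variable.
rules = [((" ?x", "is_a", "human"), ("?x", "is_a", "mortal"))]

def forward_chain(facts, rules):
    """Apply every rule to every matching fact until nothing new appears."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for (_, prem_p, prem_o), (_, conc_p, conc_o) in rules:
            for (subj, pred, obj) in list(derived):
                if pred == prem_p and obj == prem_o:   # premise matches, bind ?x
                    new_fact = (subj, conc_p, conc_o)
                    if new_fact not in derived:
                        derived.add(new_fact)
                        changed = True
    return derived

print(forward_chain(facts, rules))
# the derived set now also contains ("Socrates", "is_a", "mortal"),
# a fact that was never stated explicitly
```

The deduction only works because both the facts and the rule are formal and explicit; the same inference hidden in natural-language text would be inaccessible to the machine.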
The more complex and voluminous a website, the more complicated the maintenance of its only weakly structured data becomes.
Syntactic consistency of a website concerns, for example, a link from one document to another: if the target website is moved, i.e. its address changes, the old link no longer points to the correct website and yields a 404 error. This error can be recognized directly from the HTTP status code.
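Because the breakage is visible in the HTTP status code, syntactic consistency can be checked automatically. A minimal sketch using only the standard library (the timeout and the error handling are simplified for illustration):

```python
# A minimal sketch of detecting a syntactically broken link via the
# HTTP status code.
import urllib.error
import urllib.request

def is_link_broken(url):
    """Return True if fetching the URL yields an HTTP error such as 404,
    or if the host cannot be reached at all."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status >= 400
    except urllib.error.HTTPError as err:
        return err.code >= 400      # e.g. 404 Not Found
    except urllib.error.URLError:
        return True                 # DNS failure, host unreachable, ...
```

A crawler can run such a check periodically over all outgoing links; no such automatic check exists for the semantic case discussed next.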
Semantic consistency errors, on the other hand, are more difficult to solve. Semantic consistency means that if a document links to another website and the text surrounding the link describes that website's content, the description must actually fit what is there. A semantic inconsistency arises when the content of the linked website changes, so that the description on the first website no longer fits it. Here it is very difficult to detect whether a change really changes the sense, the meaning, of the represented information, which makes such issues hard to resolve in a timely way.
Search engines adapt their results according to log files of user tracks, and cookies can be used to recognize a user from previous visits. However, simply looking at the log files does not take the user's context into account, e.g. searching on behalf of someone else with a different intention. The problem is therefore where to obtain information about what the user really intends to do.