Problems of the Web

Created on Feb. 5, 2013, 10:12 p.m. by Hevok & updated by Hevok on May 2, 2013, 4:40 p.m.

How does communication work on the Web? How can machines participate in this kind of communication and understand the information in the Web, so that the Web, with its huge amount of information, becomes more usable for humans?


In the Web, meaning is not formally represented; it is only implicitly given in the natural-language text of Web documents. HTML only denotes the form and presentation of a document, the links among documents, and the information within a document, but it does not represent the meaning of that information, which matters greatly because the Web is huge.

Many problems arise from the fact that meaning is not represented in the Web.

  1. Information retrieval by traditional keyword-based search has two problems. It

    • leads to many irrelevant results

      • different meanings
      • polysemy
      • different contexts
    • does not find all relevant results

      • synonyms and metaphors
      • missing context definition

    First, because of the lack of explicit, formal semantics, the search results are spoiled by answers whose meaning is not actually related to the intended meaning of the search.

    Second, it will not retrieve all of the existing answers, because of synonyms and metaphors: different terms may correspond to the same meaning (e.g., the Latin name of a species), and things may also be described only metaphorically. Another problem is natural language itself, due to its polysemy and the possibility of using different terms that have the same meaning.
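
    To make this concrete, here is a minimal sketch in Python, using an invented mini-corpus and query, of how literal keyword matching exhibits both failure modes: a polysemous term produces an irrelevant hit, while a synonym causes a relevant document to be missed.

    ```python
    # Hypothetical mini-corpus: 'puma' and 'cougar' name the same animal,
    # while 'Puma' is also a shoe brand (polysemy).
    documents = {
        "doc1": "The cougar is a large cat native to the Americas.",
        "doc2": "Puma released a new line of running shoes.",
        "doc3": "The puma hunts deer in mountainous terrain.",
    }

    def keyword_search(query, docs):
        """Return the ids of documents containing the literal query term."""
        return [doc_id for doc_id, text in docs.items()
                if query.lower() in text.lower()]

    print(keyword_search("puma", documents))
    # -> ['doc2', 'doc3']: doc2 is about shoes (an irrelevant hit caused by
    #    polysemy), and doc1 about the same animal is missed (a synonym).
    ```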

  2. Information extraction

    • can currently only be solved by a human agent
    • heterogeneous distribution and order of information
    • a software agent does not have sufficient context and world knowledge

    A textual description gives information about an image's content, but if there is only the image, it is difficult to interpret its meaning. Now consider an image that contains text: for humans this is mostly obvious, as they can simply read the text, but machines cannot easily read text within a picture. This is why it is so difficult to get information out of a Web page.

    Information extraction can currently only be performed by humans, because they have an idea of how the information is distributed over the page: it is heterogeneous and appears in a particular order on the page. A computer agent, on the other hand, has neither sufficient knowledge of the context and the world nor the experience that the human user draws on to interpret the page and solve the extraction problem.

    Therefore, computer agents are not able to solve the problem of information extraction without having explicit, formal semantics available.

    Implicit knowledge means that information is not specified explicitly but must be derived via logical deduction from the available information.

    Natural language contains a lot of implicit knowledge.

    Humans pick up clues and, from these clues, are able to derive new information from the available information by logical deduction.

    This is only possible for a machine if the available information is defined formally and explicitly, in terms whose semantics the machine can process. Otherwise it is not possible to deduce new knowledge from the available knowledge.
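
    As an illustration, here is a minimal sketch of such a logical deduction in Python, using an invented fact and an invented rule (a toy forward-chaining step, not any particular reasoner):

    ```python
    # Explicit fact and rule (both hypothetical): Socrates is a Human,
    # and everything that is a Human is also a Mortal.
    facts = {("type", "Socrates", "Human")}
    rules = [(("type", "?x", "Human"), ("type", "?x", "Mortal"))]

    def deduce(facts, rules):
        """Forward chaining: apply the rules until no new facts appear."""
        derived, changed = set(facts), True
        while changed:
            changed = False
            for (p, _, obj), (cp, _, cobj) in rules:
                for fp, fs, fo in list(derived):
                    if fp == p and fo == obj and (cp, fs, cobj) not in derived:
                        derived.add((cp, fs, cobj))
                        changed = True
        return derived

    print(deduce(facts, rules))
    # -> also contains ('type', 'Socrates', 'Mortal'): knowledge that was
    #    only implicit before the deduction was run.
    ```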

  3. Maintenance

    • The more complex and voluminous a website, the more complicated the maintenance of its only weakly structured data becomes.

    • Problems:

      • syntactic and semantic (link) consistency
      • correctness

    A syntactic consistency problem arises when, for example, a document links to another website and that website is later moved away, i.e., its address changes. The old link no longer points to the correct website and raises a 404 error, which can be recognized from the HTTP status code.
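
    Such a check can be mechanized; here is a minimal sketch using Python's standard library and a placeholder URL:

    ```python
    # Detect a broken (syntactically inconsistent) link via the HTTP status.
    from urllib.request import Request, urlopen
    from urllib.error import HTTPError, URLError

    def check_link(url):
        """Return the HTTP status for url, or the error that occurred."""
        try:
            response = urlopen(Request(url, method="HEAD"), timeout=10)
            return response.status  # e.g. 200 when the link is intact
        except HTTPError as err:
            return err.code         # e.g. 404 when the target moved away
        except URLError as err:
            return err.reason       # e.g. a DNS failure for a dead host

    print(check_link("https://example.org/moved-page"))  # placeholder URL
    ```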

    Semantic consistency errors, on the other hand, are more difficult to solve. Semantic consistency means that the text surrounding a link describes the content of the website the link points to. If that content changes, the description no longer fits the linked page, and the two become semantically inconsistent. Here it is very difficult to detect whether a change really alters the sense, i.e., the meaning, of the represented information, which makes such issues hard to resolve in a timely way.

  4. Personalization

    • Adaptation of the presented information content to personal requirements
    • Problems:

      • where do we get the required (personal) information from?
      • personalization vs. data security

Search engines adapt their search results based on log files of a user's tracks; cookies can be used to recognize the user from previous visits. However, simply looking up log files does not take the user's context into account: the user may, for example, be searching on behalf of someone else with a different intention. The problem is therefore where to get information about what the user really intends to do.
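
The following minimal sketch, with an invented cookie id and query history, shows a naive log-based re-ranking and why it is blind to the user's current intention:

```python
# Hypothetical query log, keyed by a browser cookie.
user_history = {"cookie-42": ["python programming tutorial"]}

def personalize(cookie, results):
    """Re-rank results by their term overlap with past queries."""
    history_terms = {word
                     for query in user_history.get(cookie, [])
                     for word in query.split()}
    return sorted(results,
                  key=lambda r: -len(history_terms & set(r.lower().split())))

results = ["python snake facts", "python programming guide"]
print(personalize("cookie-42", results))
# -> the programming guide ranks first, even if the person behind the
#    browser is now looking up snakes on someone else's behalf.
```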

In summary, the Web reaches its limits (Limits of the Web) in at least four major problem areas:

  1. Information Retrieval
  2. Information Extraction
  3. Maintenance
  4. Personalization

The Semantic Web is one way to formally express the meaning given in the Web and has the potential to solve these problems elegantly.


Tags: case, challenge, trouble, meaning, matter
Categories: News
Parent: Web
