Semantic Web Technologies

Web

The World Wide Web enabled documents to be connected by links. Hypertext documents contain links to other documents and files, which makes accessing them trivial.

The Web 2.0 enabled simplified user participation. Wikis, blogs, and social networking sites allow non-technical users to publish their information. Users are becoming publishers.

Internet of Things

The Internet of Things is the next big step for the Internet. The number of things connected to the Internet already dramatically exceeds the number of people on earth. The next logical step in the technological revolution that connects people anytime, anywhere is to connect inanimate objects. This is the vision underlying the Internet of Things: anytime, anywhere, by anyone and anything. And all things will connect with social networks.

Thus, the vision underlying the "Internet of Things" is to connect inanimate objects. This is the next technological step: connecting data anytime, anywhere.

Limits of the Web

One of the problems of the World Wide Web is how to find particular information in the sea of data.

The Web is immensely huge and grows at an ever-increasing pace.

The visible part of the Web that is searchable by crawlers is enormous, but there is also the Deep Web (not indexed by search engines), which is estimated to be about 550 times bigger than the Surface Web.

For machines it is very hard to understand information in the WWW. They do not know what is important: What is information and what is advertisement? What does the information mean? How credible/trustworthy is the information? Which parts of the web sites really belong together? Which information is redundant?

Humans are fortunate, as they are used to these kinds of analyses. Humans have the contextual knowledge about the world and the Experience to solve such problems and to understand the information. For a machine, however, this is really difficult.

On the Web, all information is encoded in HTML. HTML is a simple markup language that puts information on the Web and defines a presentation for it. That means it only decides how the information is represented, but it does not describe what the information really means. The problem is that the Web mainly contains textual representations whose meaning is hidden in natural language, while the only language to put information on the Web (HTML) merely describes how the original information is represented and linked. There is almost no representation of meaning.

The Web is based on the markup language HTML. HTML describes how information is presented and linked, but not what the information means.

The question that arises is: what is the meaning of the information provided on the Web? If we cannot make sense of the meaning of this information, we are completely lost, and search engines likewise have difficulty coping with this vast amount of information.

Importance of Meaning

The problem is to cope with the enormously big Web and to make sense of it. There is no way to express the meaning of information, because HTML, the main language used to represent Content on the Web, is not able to formally express meaning, i.e. Semantics.

Semantics

Semantics is the study of the interpretation of signs or symbols. Semantics (from Greek sēma = sign; the study of meaning) is the part of linguistics focused on

  • Sense and
  • Meaning

of language or of the symbols of a language. It is thus the study of the interpretation of signs or symbols as used by agents or communities within particular circumstances and contexts.

Semantics asks, for example, how the sense and meaning of complex concepts can be derived from simple concepts based on the rules of syntax.

The Semantics of a message depends on its Context and Pragmatics.

Therefore, Semantics means meaning.

Syntax

Syntax is the definition of the normative structure of data. Syntax (from Greek syntaxis = arrangement, ordering), as in grammar, denotes the study of the principles and processes by which sentences are constructed in particular languages.

  • In formal languages, syntax is just a set of rules by which well-formed expressions can be created from a fundamental set of symbols (alphabet + extra characters).
  • In computer science, syntax defines the normative structure of data.

Syntax tells how to form information and how to make valid sentences in a specific language. Syntax comes before Semantics. Semantics tells the meaning of the information contained in the sentences formed according to the syntax.

Context

Context denotes all elements defining the interpretation of communicated Content. Context (Latin contextus = interwoven) denotes the surroundings of a symbol (concept) in an expression, i.e. its relationship with surrounding expressions (concepts) and further related elements.

Context denotes all elements of any sort of communication that define the interpretation of the communicated Content, e.g.:

  • general contexts: place, time, interrelation of actions in a message
  • personal or social contexts: the relation between sender and receiver of a message

These elements determine how the information has to be interpreted, i.e. how the Semantics is meant.

Context is important to make sense of Semantics and to determine the meaning.

Pragmatics

Pragmatics is the study of applying language in different situations. Pragmatics (Greek pragma = action) reflects the intention with which language is used to communicate a message.

In linguistics, Pragmatics denotes the study of applying language in different situations. It also denotes the intended purpose of the speaker. Pragmatics studies the ways in which Context contributes to meaning.

In the end, Pragmatics also tells how the Semantics manifests itself. Context and Pragmatics determine the Semantics.

Every act of communication pursues a certain purpose; this purpose is its Pragmatics.

Experience

Experience considers all information that one has learned and put in context with the world one is living in.

For successful communication,

  • information has to be correctly transmitted (Syntax)
  • the meaning (Semantics) of the transmitted information must be interpreted correctly (=understanding)
  • understanding depends on
    • the Context of both sender and receiver and
    • the Pragmatics of the sender
  • the Context of the sender and receiver depends on
    • the experience (knowledge of the world) of both sender and receiver

Therefore, while Syntax determines what is a valid and well-formed sentence, the meaning is what matters for understanding the message. Pragmatics is the purpose the sender tries to achieve by choosing this form of message.

Experience thus encompasses all information that was learned and put in Context with the world one is living in. For successful communication, however, information has to be transmitted and interpreted correctly.

Semantic Triangle

Syntax, Pragmatics, and Context together are essential for understanding the meaning of information, i.e. Semantics.

A Symbol stands for an Object in the real world. The symbol itself can be ambiguous, as it may denote several objects/things. The symbol symbolizes a Concept, and the Concept refers to an Object.

The sender has to give sufficient information to disambiguate the Semantics for the receiver, i.e. to make the Concept behind the symbol unique. This requires Context and Experience, so that the receiver can perceive the meaning of the message and both sides can understand each other.

Problems of the Web

How does communication work in the web?

How can machines participate in this kind of communication and understand the information on the Web, in order to make the Web with its huge amount of information more usable for humans?

This is how language and communication work between humans.

On the Web, meaning is not formally represented; it is given implicitly within the natural-language text of web documents. HTML only denotes the form and presentation of a document, the links among documents, and the information within the document, but it does not represent the meaning of that information, which matters greatly because the Web is so huge.

There are many problems that arise by the fact that the meaning is not represented in the web.

  1. Information retrieval by traditional keyword-based Search has two problems. It

    • leads to many irrelevant results

      • different meanings
      • polysemy
      • different Contexts
    • does not find all results

      • synonyms and metaphors
      • missing context definition

    First, because of the lack of explicit, formal Semantics, the search results are spoiled by answers whose meaning is not really related to the intended meaning of the Search.

    Secondly, it will not find all of the existing answers, because of synonyms and metaphors. Different terms may correspond to the same meaning (e.g. the Latin name of a species), and things might also be described only via a metaphor. Another problem is language itself, due to its polysemy and the possibility of using different terms that have the same meaning. A small sketch of both failure modes follows below.
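As a concrete illustration, here is a minimal sketch in Python, with made-up documents and terms, of both failure modes: a keyword query misses a synonym (too few relevant results) and matches a polysemous brand name (irrelevant results).

```python
# Minimal sketch: naive keyword-based search over made-up documents.
# It misses a relevant document that uses a synonym, and it returns an
# irrelevant document because of polysemy.

documents = {
    "doc1": "The puma is a large cat native to the Americas.",
    "doc2": "Puma concolor is also known as cougar or mountain lion.",
    "doc3": "The Puma store offers running shoes and sportswear.",  # brand, not animal
}

def keyword_search(query: str) -> list[str]:
    """Return ids of documents whose text literally contains the query term."""
    return [doc_id for doc_id, text in documents.items()
            if query.lower() in text.lower()]

print(keyword_search("cougar"))  # ['doc2'] -- misses doc1 (synonym problem)
print(keyword_search("puma"))    # all three -- doc3 is irrelevant (polysemy problem)
```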

  2. Information extraction

    • can only be solved currently by a human agent
    • heterogeneous distribution and order of information
    • A software agent does not have sufficient

      • knowledge of Contexts,
      • world knowledge, and
      • Experience

      to solve the problem.

    A textual description gives information about image content, but if there is only the image, it is difficult to interpret its meaning. Consider an image that contains text: for humans it is mostly obvious, as they can read the text, but machines cannot easily read text within a picture. This is why it is really difficult to extract the information from the web page.

    Information extraction can currently only be performed by humans, as they have an idea of how the information is distributed over the page, since it is heterogeneous and appears in a particular order on the page. A computer agent, on the other hand, does not have sufficient knowledge of the Contexts, world knowledge, or Experience that the user has available to solve the problem of information extraction.

    Therefore, computer agents are not able to solve the problem of information extraction without having explicit formal semantics available.

Implicit knowledge means that information is not specified explicitly, but must be derived via logical deduction from the available information.

Natural language contains a lot of implicit knowledge.

Humans can pick up clues and, from these clues, derive new information from the available information by logical deduction.

For machines, this is only possible if the available information is defined formally and explicitly, so that its Semantics can be read and understood by the machine. Otherwise it is not possible to deduce new knowledge from the available knowledge.
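A minimal sketch of such a deduction in Python, using made-up facts and one hand-written transitivity rule; real Semantic Web reasoners generalize this idea over formally defined semantics:

```python
# Minimal sketch: deriving implicit knowledge from explicit facts by
# forward chaining. The facts and the single rule are made up for illustration.

explicit_facts = {
    ("Socrates", "is_a", "Human"),
    ("Human", "is_a", "Mammal"),
    ("Mammal", "is_a", "Mortal"),
}

def deduce(facts: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Apply the rule (x is_a y) and (y is_a z) => (x is_a z) until fixpoint."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for (x, p1, y) in list(derived):
            for (y2, p2, z) in list(derived):
                if p1 == p2 == "is_a" and y == y2:
                    new_fact = (x, "is_a", z)
                    if new_fact not in derived:
                        derived.add(new_fact)
                        changed = True
    return derived

# ("Socrates", "is_a", "Mortal") is implicit: never stated, only deduced.
print(("Socrates", "is_a", "Mortal") in deduce(explicit_facts))  # True
```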

  3. Maintenance

    • The more complex and voluminous a website, the more complicated the maintenance of its only weakly structured data becomes.

    • Problems:

      • syntactic and semantic (link) consistency
      • Correctness

    A syntactic (link) consistency problem arises, for example, when a document links to another website and that website is later moved, i.e. the link address changes. The old link no longer points to the correct website and raises a 404 error. This error can be recognized from the HTTP status code, as the sketch after this item shows.

    Semantic consistency errors, on the other hand, are more difficult to solve. A semantic inconsistency arises when a document links to another website and describes its Content in the text surrounding the link, but that Content later changes: the description no longer fits the content of the page it points to. Here it is very difficult to detect that the change really means a change in the sense, i.e. in the meaning of the represented information, so such issues are hard to resolve in a timely way.
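For the syntactic case referenced above, a minimal link-checking sketch in Python (the URL below is a placeholder); semantic inconsistencies cannot be detected this way:

```python
# Minimal sketch: detecting syntactic (link) inconsistencies via HTTP status
# codes. The URL is a placeholder for a link found in a document.
import urllib.request
import urllib.error

def check_link(url: str) -> str:
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return f"{url}: OK ({response.status})"
    except urllib.error.HTTPError as e:
        return f"{url}: broken ({e.code})"        # e.g. 404 Not Found
    except urllib.error.URLError as e:
        return f"{url}: unreachable ({e.reason})"

print(check_link("https://example.org/moved-page.html"))
```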

  4. Personalization

    • Adaptation of the presented information Content to personal requirements
    • Problems:

      • from where do we get the required (personal) information?
      • personalization vs. data security

Search engines adapt their search results according to log files of the user's tracks. Cookies can be used to recognize the user from previous visits. However, simply looking at the log files does not take into account the Context of the user; the user may, for example, be searching on behalf of someone else with a different intention. The problem therefore is where to get the information about what the user really intends to do.

The Semantic Web is one way to formally express the meaning given on the Web.

Keyword-based Web Search

Traditional keyword-based web search leads to

a) possibly too few relevant results
b) possibly too many irrelevant results

Problem Areas of the Web

The Web reaches its limits in at least 4 major problem areas:

  1. Information Retrieval
  2. Information Extraction
  3. Maintenance
  4. Personalization

Vision of the Semantic Web

From the World Wide Web to the Web of Data. The Web was originally designed as an information space, with the goal that it should be useful not only for human-human communication, but also that machines would be able to participate and help.

Precondition:

  • Content can be read and interpreted correctly (= understood) by machines

Natural Language Processing

  • Technologies of traditional Information Retrieval (Search Engines)

Semantic Web

  • Natural language Web Content will be explicitly annotated with semantic meta-data.
  • Semantic meta-data encode the Meaning (Semantics) of the Content and can be read and interpreted correctly by machines.

This is only possible if the Content of the information can be read and interpreted correctly, i.e. understood by machines. Only then can machines participate and help. Originally, the meaning of information was deduced via Natural Language Processing, which is utilized by search engines that try to extract the meaning of the information.

Imagine what would be possible if the meaning of the information on the Web did not need to be extracted from the implicit information within natural language, an error-prone process, but was instead explicitly stated beforehand.

The Semantic Web explicitly annotates the natural-language information Content on the Web with semantic meta-data. This semantic meta-data encodes the meaning of the Content and can be read and interpreted correctly by machines.

Thus, there are two ways: firstly, via Natural Language Processing, as is done today; secondly, besides encoding the information in natural language, giving additional semantic annotations in an explicit way, so that the error-prone process of extracting the meaning by Natural Language Processing is supported by explicitly stated semantic annotations. A small sketch of such an annotation follows below.
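A minimal sketch of such an explicit annotation using the Python library rdflib; all example.org URIs and property names (e.g. isAbout) are hypothetical:

```python
# Minimal sketch: annotating natural-language content with explicit semantic
# meta-data using rdflib. All example.org URIs are hypothetical.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/")
g = Graph()
g.bind("ex", EX)

page = URIRef("http://example.org/pages/puma.html")
# The page's text says "The puma is a large cat"; we state that meaning explicitly:
g.add((page, EX.isAbout, EX.Puma))
g.add((EX.Puma, RDF.type, EX.Animal))
g.add((EX.Puma, RDFS.label, Literal("Puma concolor")))

print(g.serialize(format="turtle"))
```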

So the Semantic Web contains explicit Semantics, which makes it much easier to understand the Content and to address the 4 problem areas of the Web described above.

Understand the Content on the Web

  • Disambiguation: solution of linguistic ambiguities
  • Entity Mapping: disambiguation of entities
  • The Meaning (Semantics) of entities and classes must be defined explicitly
  • The Semantics is expressed with the help of appropriate knowledge Representations (Ontologies)

Knowledge is represented by the properties that instances (class members) have, as well as by the relations between different classes. This is how knowledge is represented; a sketch follows below.
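A minimal sketch of this with rdflib, using hypothetical example.org URIs: a class hierarchy, a relation between classes (via domain and range), and an instance with properties:

```python
# Minimal sketch: representing knowledge as classes, instances, properties,
# and relations with RDFS. All example.org URIs are hypothetical.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/")
g = Graph()
g.bind("ex", EX)

# A class and its superclass
g.add((EX.Cat, RDFS.subClassOf, EX.Animal))

# A relation between classes, described via domain and range
g.add((EX.livesIn, RDFS.domain, EX.Animal))
g.add((EX.livesIn, RDFS.range, EX.Habitat))

# An instance (class member) with properties
g.add((EX.Felix, RDF.type, EX.Cat))
g.add((EX.Felix, EX.livesIn, EX.Forest))
g.add((EX.Felix, RDFS.label, Literal("Felix")))

print(g.serialize(format="turtle"))
```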

What is the Semantic Web?

The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.

The Semantic Web - a Web of Data

By that it is meant that the meaning of the information (Semantics) is made explicit by formal (structured) and standardized knowledge Representations (Ontologies).

Thus it will be possible:

  • to process the meaning of information automatically
  • to relate and integrate heterogeneous data
  • to deduce implicit (not evident) information from existing (evident) information in an automated way

The Semantic Web is a kind of global database that contains a universal network of semantic propositions.

Semantic Web Technology Stack

The Semantic Web moves towards an "intelligent" Web. Technologically, the Semantic Web builds up a stack: starting with the Web at the bottom, knowledge representation, knowledge exchange, and ontology languages build up one layer over the other.

  • Web layer: identification mechanisms such as URI, access mechanisms such as HTTP, coding mechanisms such as Unicode, and basic authentication.
  • Information exchange layer: encodings such as XML, Turtle, RDFa, and μFormats, which get semantic information into web pages and into XML.
  • Facts: stating what is a fact is done with the Resource Description Framework (RDF); these facts can be exchanged and queried.
  • Models: on top of the facts, it must be stated how the things (classes) in the world relate to each other. Such models are created with ontology representation languages like OWL (the Web Ontology Language), RDFS (the RDF Schema language), or simple ontology vocabularies like SKOS.
  • Logic: finally, logic can be applied to make logical deductions and to derive new knowledge from the existing knowledge.

A small sketch of the information exchange layer follows below.

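A minimal sketch of the information exchange layer with rdflib (the fact itself is a hypothetical example.org statement): one and the same RDF fact can be exchanged in different encodings without changing its meaning.

```python
# Minimal sketch: one and the same RDF fact serialized in two exchange
# encodings (RDF/XML and Turtle). The example.org URIs are hypothetical.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.bind("ex", EX)
g.add((EX.Berlin, EX.isCapitalOf, EX.Germany))

print(g.serialize(format="xml"))     # RDF/XML encoding
print(g.serialize(format="turtle"))  # Turtle encoding, same meaning
```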

Machine-Understandable

Machine understandable means that Content can be read and interpreted correctly by machines.

The Vision of the Semantic Web

To explicitly represent knowledge, and to define what can be done with this knowledge.

The basic layer, the foundation of the Semantic Web, is of course the World Wide Web, so we rely on the technology of the World Wide Web. First of all, we have to identify the things we are talking about. For this, an identification mechanism that the Web already provides is used: the URI, the Uniform Resource Identifier.

dbpedia

DBpedia is the semantic (machine-readable) version of the popular Wikipedia.

Resource Description Framework

The Resource Description Framework (RDF) provides the means to structure knowledge in the form of small triple facts: a subject, a property, and an object form a simple fact in RDF.
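A minimal sketch with rdflib, parsing a single hypothetical fact written in the Turtle encoding:

```python
# Minimal sketch: one RDF fact (triple) parsed from Turtle with rdflib.
# The URIs are hypothetical example.org identifiers.
from rdflib import Graph

turtle_fact = """
@prefix ex: <http://example.org/> .

ex:TimBernersLee ex:invented ex:WorldWideWeb .  # subject property object
"""

g = Graph()
g.parse(data=turtle_fact, format="turtle")
for subject, prop, obj in g:
    print(subject, prop, obj)
```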

OWL

  • Description logics + rules
  • The Web Ontology Language
  • More complicated facts can be represented with the help of rules.
  • Logical constraints.

SPARQL

SPARQL is a query language, pretty much like SQL. All statements are based on graphs; SPARQL is thus like a graph access language.
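A minimal sketch of a SPARQL query over a small in-memory graph with rdflib (data and URIs are hypothetical); the WHERE clause is a graph pattern, much like a SELECT in SQL matches table rows:

```python
# Minimal sketch: querying a small RDF graph with SPARQL via rdflib.
# Data and URIs are hypothetical.
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix ex: <http://example.org/> .
ex:Berlin ex:isCapitalOf ex:Germany .
ex:Paris  ex:isCapitalOf ex:France .
""", format="turtle")

# Graph pattern: find every subject that is the capital of something.
results = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?city ?country
    WHERE { ?city ex:isCapitalOf ?country . }
""")

for row in results:
    print(row.city, "is the capital of", row.country)
```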

Semantic Entity

A semantic entity is an object or thing with a given explicit meaning.

Class

Any class may have more than one subclass.

Applications in the Web of Data

Linked Open Data

Linked Open Data (LOD) denotes publicly available (RDF) data on the Web, identified via URIs and accessible via HTTP. Linked Data links to other data via URIs. A minimal access sketch follows the list below.

  • Information is dynamically aggregated from external, publicly available data
  • no screen scraping
  • no specialized API
  • Data available as Linked Open Data
  • Data access via simple HTTP Request
  • Data is always up-to-date without manual interaction
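A minimal access sketch with rdflib, assuming DBpedia is reachable and still serves Turtle at this conventional address:

```python
# Minimal sketch: aggregating Linked Open Data with a simple HTTP request.
# Assumes DBpedia is reachable and serves Turtle at this (conventional) address.
from rdflib import Graph

g = Graph()
g.parse("https://dbpedia.org/data/Berlin.ttl", format="turtle")

print(f"Fetched {len(g)} triples about Berlin via plain HTTP")
```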

Search Engines: Document Retrieval

  • Search Engine Query String.
  • Knowledge Representation (Ontology, Linked Data).
  • General Problems:

    • correct interpretation of query string
    • correct identification of Entities
    • automatic disambiguation
    • usability
    • personalization

The general problems of document retrieval with conventional search engines are:

  • Identification of semantic entities
  • Interpreting the query string correctly
  • Insufficient usability and personalization

Search Engines: Fact Retrieval

Asking a question in natural language and retrieving an answer in natural language (query string → answer).

Exploratory Search

Finding something that was not searched for in the first place. Exploratory Search is possible when semantic information is actually available.

Intelligent Agents

Intelligent Agents in the Semantic Web.

In the Web 2.0 (WWW2), users can look up content on the Web with the help of browsers, which are presentation services (e.g. Firefox), and find documents via the use of specific information retrieval services (e.g. Google).

In the Semantic Web (WWW3) this is handled somewhat differently. Users have some kind of personal assistant on their device (e.g. computer, notebook). This personal assistant makes use of intelligent infrastructure services; these services access the web documents and collect the information that is relevant for the user. The personal assistant then decides how to connect this information and how to present it to the user.

3 Generations of Web Documents

We can distinguish three generations of web documents.

  1. Static Web Pages
  2. Dynamic Web Pages
  3. Adaptive Web Pages

In the first phase of the Web (1st generation) there were only static web pages, written in HTML, with their form (layout) described by Cascading Style Sheets (CSS).

The second generation consisted of dynamic and interactive web pages. On the one hand there were interactive elements programmed with the help of JavaScript or applets; on the other hand there were Content Management Systems, in which templates access database information, and the web page presented in the user's browser is built up dynamically from the template and the database Content.

In the Semantic Web there will be a third generation of web pages. Here we no longer have fixed, pre-existing web pages; instead there will be something like virtual pages: an information assistant, with infrastructure services below it (perhaps in the form of netbots), collects the information we need from the data sources we are interested in and creates a virtual web page according to our preferences. The user's preferences are taken into account: the personal assistant learns, with the help of machine learning, how the user likes to have her/his information presented. A user model is thus created automatically, and the layout is adapted to the user's preferences, developed together with the user.

Toolbox for the Semantic Web

  • Standardized languages to express semantics of information content in the Web (XML/XSD, RDF(S), OWL, RIF)
  • Tools to use semantic information in the Web (RDFa, GRDDL, ...)
  • Various fields of computer Science:

    • Artificial Intelligence
    • Linguistics
    • Cryptography
    • Databases
    • Theoretical Computer Science
    • Computer Architecture
    • Software Engineering
    • Systems Theory
    • Computer Networks

From the view of computer science in general, the toolbox that we have for the Semantic Web is a collection of standardized languages and knowledge representations. There are tools to use this information in the Web, and there are techniques for getting this information into the Web.

Several fields of computer science are involved in the problems that arise in, and are addressed by, the Semantic Web.

First of all there is Artificial Intelligence, in particular logic: we need logic for a formal knowledge representation.

Linguistics is required, as query strings need to be understood. We need to map query entities to semantic entities.

Cryptography is needed for access restrictions. Database technologies are needed because the knowledge representations have to be stored in a persistent way; traditional databases are not very well suited for this task.

Theoretical computer science is required because we need to trim and tune these algorithms, make them as streamlined as possible, and take their complexity into account, i.e. runtime and storage complexity. Computer architecture is involved as well, along with software engineering, systems theory, and, in the end, computer networks.

