Named Entity Recognition

Created on March 14, 2013, 12:11 a.m. by Hevok & updated by Hevok on May 2, 2013, 5:22 p.m.

Named Entity Recognition (also Entity Mapping, Identification or Extraction) is the Process of finding the right Entity within the Semantic Network to connect a Name to. For example, a Name that occurs in a Text is at first only a Character String, not an Entity. Although such a Name may be ambiguous, the matching Entity can be found in an automated way.

Named Entity Recognition denotes locating Atomic Elements in a Text and classifying them into predefined Categories such as the Names of Persons, Organizations and Locations, Expressions of Time, Quantities, Monetary Values, etc.

On an even finer Level, one can map these Elements to a Semantic Entity that is connected to an Ontology, i.e. not only to a Category or a Classification, but to a specific Named Entity.

Our Language is able to represent Objects from the real World with Symbols. Usually many other Factors are required to resolve Ambiguity: one needs the Context and the Pragmatics of whoever communicates the Message that consists of the Symbols. The Interpretation also depends on the Receiver's own Experience. Sender and Receiver must use the same Concepts to understand each other. The Step between the Symbol and the Concept is especially important for a Machine, because it needs to know how to map a Symbol to a Concept, which requires an Ontology or Knowledge Representation.

Many Terms are ambiguous, which means they have several Meanings and can, for instance, refer to either a Person or a Location. Therefore we need more Information, namely Context Information, to get the Mapping right.

                       Astronaut -same as-> Cosmonaut <-is a- Juri Gagarin
Neil Armstrong -is a-> Astronaut -subClassOf-> Science Occupation -subClassOf-> Employment
               -is a-> Person -has a-> Employment
                       Person -is NOT a-> Science Occupation

The Knowledge consists of Entities and Ontologies. Because we know how the Classes are related, we can deduce Relations among the Entities the Classes refer to.
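
The Deduction over the Diagram above can be sketched in a few lines of Python. This is only a minimal illustration: the dictionary layout, the function names and the Reading of "is a" as Class Membership that is inherited along "subClassOf" are assumptions made for the Example, not part of any particular Ontology Language.

```python
# Toy Knowledge Base copied from the Diagram; the data layout is an assumption.
SUBCLASS_OF = {
    "Astronaut": "Science Occupation",
    "Science Occupation": "Employment",
}
SAME_AS = {"Astronaut": "Cosmonaut", "Cosmonaut": "Astronaut"}
IS_A = {
    "Neil Armstrong": {"Astronaut", "Person"},
    "Juri Gagarin": {"Cosmonaut"},
}


def superclasses(cls):
    """Follow subClassOf links upward and return every ancestor Class."""
    found = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.append(cls)
    return found


def inferred_classes(entity):
    """Return asserted Classes plus 'same as' equivalents and their ancestors."""
    classes = set(IS_A.get(entity, ()))
    for cls in list(classes):
        if cls in SAME_AS:
            classes.add(SAME_AS[cls])
    for cls in list(classes):
        classes.update(superclasses(cls))
    return classes


gagarin_classes = inferred_classes("Juri Gagarin")
```

Although only "Juri Gagarin is a Cosmonaut" is stated explicitly, the Sketch also places him under Astronaut, Science Occupation and Employment, because these Classes are connected in the Ontology.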

To get from a String or a Symbol to the Knowledge Representation, i.e. to obtain the Mapping between Symbol and Concept, one needs to apply Named Entity Recognition.

We have the entire Web of Data, the Linked Open Data, to map a Symbol to. To decide on a Mapping one needs Context, which can be given in the form of Characters or other Symbols. Consider, for example, a Text in Natural Language that contains many ambiguous Terms (at least ambiguous Nouns). In order to disambiguate these Terms and to find the right Ontology and the right Entity for these Strings, one has to take all possible Meanings into account. The first Step is therefore to identify all ambiguous Terms (e.g. the Nouns) and to determine all possible Entity Mapping Candidates.
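
As a rough Illustration of this first Step, the following Python Sketch looks up every known Term of a Sentence in a small Surface Form Index and returns all Candidates per Term. The Index, its Entity Identifiers and the simple Tokenizer are hypothetical; a real System would query a Resource such as DBpedia and would select the Terms with a proper Linguistic Analysis.

```python
import re

# Hypothetical Surface Form Index: lower-cased Term -> Candidate Entities.
SURFACE_FORMS = {
    "armstrong": ["Neil_Armstrong", "Louis_Armstrong", "Lance_Armstrong"],
    "moon": ["Moon", "Keith_Moon"],
}


def candidate_entities(text):
    """Map every indexed Term in the Text to all of its Candidate Entities."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {tok: SURFACE_FORMS[tok] for tok in tokens if tok in SURFACE_FORMS}


candidates = candidate_entities("Armstrong walked on the Moon.")
```

Here both "Armstrong" and "Moon" remain ambiguous after the Lookup; choosing among the Candidates is a separate Step that needs Context.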

First of all, one has to perform a Linguistic Analysis of the Text, namely Part-of-Speech (POS) Tagging, in which each Component (Nouns, Verbs and Adjectives) is categorized and the Nouns are classified. Then follows a Normalization Process: one has to look at different Encodings and Spellings, because Symbols or Names that come from other Languages may need to be transcoded, some special Characters are Language-dependent, and Names are spelled differently in different Languages. There are also special Abbreviations, Acronyms and type-dependent Spellings (e.g. putting the First Name in front of the Surname, or writing the Surname, a Comma and then the First Name). In addition, there might be alternative Names and Synonyms. Therefore one has to consider Fuzzy String Mapping.
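
The Normalization and Fuzzy String Mapping described above could look roughly like the following Python Sketch. It uses only the Standard Library (`unicodedata` and `difflib`); the Function Names, the Cutoff of 0.8 and the Choice of `difflib` as Similarity Measure are assumptions for Illustration, and Abbreviations, Acronyms and Synonyms are not handled.

```python
import difflib
import unicodedata


def normalize(name):
    """Lower-case, strip Accents and turn 'Surname, First' into 'first surname'."""
    if "," in name:
        surname, first = [part.strip() for part in name.split(",", 1)]
        name = f"{first} {surname}"
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def fuzzy_candidates(name, known_names, cutoff=0.8):
    """Return known Names whose normalized Form is similar to the Query."""
    index = {normalize(known): known for known in known_names}
    hits = difflib.get_close_matches(
        normalize(name), list(index), n=5, cutoff=cutoff
    )
    return [index[hit] for hit in hits]


matches = fuzzy_candidates("Gagarin, Yuri", ["Juri Gagarin", "Neil Armstrong"])
```

In this Example the Query "Gagarin, Yuri" is reordered and lower-cased first, so that the alternative Spelling "Juri Gagarin" is still found despite the different first Letter.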

In the End, if one looks up a Term in DBpedia and considers all possible Mappings, one normally ends up with a huge Set of Entities and has to identify the right one. This is rarely difficult for Humans, who have their Experience and in most cases know how to read and interpret the Text in Context, but it is difficult for a Machine. Therefore one has to determine all possible Mapping Candidates and then choose or develop Methods to select the right Entity. The Context, the Ambiguity of the Source Data and the Ambiguity of the Mapping together determine the Accuracy and Reliability of the resulting Mapping and have to be taken into account.
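
One simple way to let Context select among the Candidates is to compare the Words around the Term with Keywords known for each Candidate. The following Python Sketch uses the Jaccard Overlap for this; the Keyword Profiles and Entity Identifiers are invented for the Example, and the Method is only one possible Choice, not the Method prescribed here.

```python
# Hypothetical Keyword Profiles for two Candidate Entities.
PROFILES = {
    "Neil_Armstrong": {"astronaut", "moon", "apollo", "nasa"},
    "Louis_Armstrong": {"jazz", "trumpet", "singer", "music"},
}


def context_score(context_words, entity_keywords):
    """Jaccard Overlap between the Text Context and an Entity's Keywords."""
    context, keywords = set(context_words), set(entity_keywords)
    if not context or not keywords:
        return 0.0
    return len(context & keywords) / len(context | keywords)


def best_entity(context_words, candidates):
    """Pick the Candidate whose Profile overlaps most with the Context."""
    return max(candidates, key=lambda e: context_score(context_words, PROFILES[e]))


choice = best_entity(
    ["walked", "on", "the", "moon", "apollo"],
    ["Neil_Armstrong", "Louis_Armstrong"],
)
```

With this Context the Astronaut is preferred, because "moon" and "apollo" appear in his Profile and none of the Context Words appear in the Musician's Profile.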

This is done for all the Nouns in a Sentence. Then one has to check each possible Combination of Candidates to find the one that is most likely correct. This becomes a complex Task for the Machine, because many Comparisons and Computations have to be performed.
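
Checking every Combination can be sketched as an exhaustive Search over the Cartesian Product of the Candidate Lists. In the following Python Example the Relatedness Values, the Entity Identifiers and the Scoring by summed pairwise Relatedness are assumptions made for Illustration.

```python
from itertools import combinations, product

# Hypothetical pairwise Relatedness between Entities (missing pairs count as 0).
RELATED = {
    frozenset({"Neil_Armstrong", "Moon"}): 1.0,
    frozenset({"Louis_Armstrong", "Keith_Moon"}): 0.3,
}


def coherence(assignment):
    """Sum the Relatedness of every Pair of Entities in one Combination."""
    return sum(
        RELATED.get(frozenset(pair), 0.0) for pair in combinations(assignment, 2)
    )


def best_assignment(candidates):
    """Try every Combination of Candidates and keep the most coherent one."""
    mentions = list(candidates)
    best = max(product(*(candidates[m] for m in mentions)), key=coherence)
    return dict(zip(mentions, best))


result = best_assignment({
    "armstrong": ["Louis_Armstrong", "Neil_Armstrong"],
    "moon": ["Keith_Moon", "Moon"],
})
```

The Number of Combinations is the Product of the Candidate Counts of all Terms, so this brute-force Search grows very quickly with the Length of the Sentence, which is the Complexity mentioned above.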

  • Determine all possible Entity Mapping Candidates
    • Linguistic Analysis (POS Tagging)
    • Normalization
    • Encoding and Spelling
    • special (language dependent) Characters
    • Abbreviations, Acronyms
    • Type dependent Spellings
    • alternative Names and Synonyms
    • Fuzzy String Mapping
    • ...
  • Entity Selection process is determined by
    • Context
    • Ambiguity of Source Data / Mapping
    • Accuracy / Reliability of Source Data / Mapping
  • Consider all Entities within the same Context

Tags: identification, extraction, mining, entities, text, semantics, data
Categories: Concept
Parent: Text mining
Children: Disambiguation, Named Entity Recognition Strategies
