Named Entity Recognition Strategies

Created on March 14, 2013, 12:14 a.m. by Hevok & updated by Hevok on May 2, 2013, 5:22 p.m.

There are several Strategies for Named Entity Recognition/Entity Mapping. For example, one has to take into account the popularity of the Symbols and of the Entities. Then there are Linguistic and Statistical Strategies. These refer to a large Corpus of Documents: if one takes the Texts in Wikipedia, for example, one can look up all the candidate Terms within the Wikipedia Pages of the Entities and see whether they co-occur, that is, whether two or more of them occur on a single Page. If they do, they share a Context, which makes it much more likely that they really belong together. Finally, there are Semantic Strategies, which work in a similar way but use DBpedia instead of Wikipedia.

The resources used are reference text corpora such as Wikipedia, the link Graph from Wikipedia, and the Semantic Graph from DBpedia. Each of the Strategies can be applied within the same general Approach. First, one makes an Assumption about which Entity is the right one to choose. Then one asks whether the Strategies support or contradict that Assumption. Finally, one makes the decision according to logical and probabilistic Rules and Constraints that have been defined in advance [N. Ludwig, H. Sack, "Named Entity Recognition for User-generated Tags", TIR 2011].

Try to map a Symbol to some Wikipedia Pages. Terms that belong to the correct Entities are much more likely to co-occur in Articles. One looks at the Text and tries to find the Terms on each of the candidate Pages (i.e. multiple Terms on a single Page). One can then count how often they occur together. This count is a co-occurrence Measure, a Statistical and Linguistic Measure that can be used to decide which Context is the right one and which Entity should be chosen to resolve the ambiguity.
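The co-occurrence counting described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the page contents in PAGE_TERMS are hypothetical toy data rather than real Wikipedia text, and the names cooccurrence_score and best_candidate are invented for this example.

```python
# Hypothetical toy data: the set of terms found on each candidate entity's page.
PAGE_TERMS = {
    "Jaguar_(animal)": {"jaguar", "cat", "jungle", "amazon"},
    "Jaguar_Cars": {"jaguar", "car", "engine", "coventry"},
}


def cooccurrence_score(entity, context_terms, page_terms=PAGE_TERMS):
    """Count how many of the context terms occur on the entity's page."""
    return len(page_terms.get(entity, set()) & set(context_terms))


def best_candidate(candidates, context_terms, page_terms=PAGE_TERMS):
    """Pick the candidate whose page shares the most terms with the context."""
    return max(
        candidates,
        key=lambda entity: cooccurrence_score(entity, context_terms, page_terms),
    )


# Terms that appear together with the ambiguous symbol "jaguar" in some text.
context = ["jaguar", "jungle", "amazon"]
chosen = best_candidate(["Jaguar_(animal)", "Jaguar_Cars"], context)
```

A real system would probably normalise the count, for instance by page length or term frequency, so that long pages are not favoured simply because they contain more words.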

The same can be done via the DBpedia Link Graph. One looks at the Links between the candidate Entities and searches for connected Components. There may be other connections, but Entities within a closed connected Component are much more likely to form a Context together, and they can be mapped to the Symbols that occur in the same Context. One then maps the Strings or Nouns to the corresponding URIs, which stand for the Entities determined in the Mapping Process.
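As a rough sketch of the link-graph idea, the following Python code builds the graph induced by all candidate Entities, finds its connected components, and keeps the component that covers the most Symbols. The identifiers and links are hypothetical toy data, not taken from the real DBpedia graph, and picking the component with the highest symbol coverage is one possible heuristic assumed here, not necessarily the rule used in the cited work.

```python
from collections import defaultdict


def connected_components(nodes, links):
    """Return the connected components of the undirected graph (nodes, links)."""
    adjacency = defaultdict(set)
    for a, b in links:
        if a in nodes and b in nodes:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal from `start`
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


def map_symbols(candidates, links):
    """Map each symbol to a candidate inside the best-covering component."""
    nodes = {entity for entities in candidates.values() for entity in entities}

    def coverage(component):
        # Number of symbols that have at least one candidate in the component.
        return sum(1 for entities in candidates.values() if component & set(entities))

    best = max(connected_components(nodes, links), key=coverage)
    return {
        symbol: sorted(best & set(entities))[0]
        for symbol, entities in candidates.items()
        if best & set(entities)
    }


# Hypothetical candidates per symbol and hypothetical links between entities.
candidates = {
    "jaguar": ["ex:Jaguar_animal", "ex:Jaguar_Cars"],
    "amazon": ["ex:Amazon_rainforest", "ex:Amazon_company"],
}
links = [("ex:Jaguar_animal", "ex:Amazon_rainforest")]
mapping = map_symbols(candidates, links)
```

If a component contains several candidates for the same Symbol, this sketch simply takes the alphabetically first one; a fuller implementation would need an additional tie-breaking rule.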

Whenever named Entities can be mapped to natural Language expressions in a text Corpus, one can use their URIs as Semantic Annotations, because these URIs denote Entities that refer to an Ontology in which the Knowledge about them is stored or represented. This is Semantic Annotation. It can also be used, for example, in the Search Process (the Information Retrieval Process) to improve its results.
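To illustrate how such annotations might be attached to a text and then used for retrieval, here is a small Python sketch. It assumes that the disambiguation has already produced a mapping from surface strings to URIs; the URIs, the documents, and the function names annotate and search_by_entity are hypothetical and chosen only for this example.

```python
import re


def annotate(text, mapping):
    """Return (start, end, surface, uri) tuples for every mapped string in text."""
    annotations = []
    for surface, uri in mapping.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append((match.start(), match.end(), match.group(0), uri))
    return sorted(annotations)


def search_by_entity(annotated_docs, uri):
    """Return the ids of all documents annotated with the given URI."""
    return sorted(
        doc_id
        for doc_id, annotations in annotated_docs.items()
        if any(annotation[3] == uri for annotation in annotations)
    )


# Hypothetical, already disambiguated mapping from strings to entity URIs.
mapping = {
    "jaguar": "http://example.org/entity/Jaguar_animal",
    "Amazon": "http://example.org/entity/Amazon_rainforest",
}
documents = {
    "doc1": "The jaguar lives in the Amazon.",
    "doc2": "The Amazon is the largest rainforest.",
}
annotated = {doc_id: annotate(text, mapping) for doc_id, text in documents.items()}
hits = search_by_entity(annotated, "http://example.org/entity/Jaguar_animal")
```

Searching by URI rather than by string means that a query for the animal would not return documents that only mention a car maker of the same name, provided the annotations were disambiguated correctly.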

  • Select matching Entities from all possible candidate Entities:
  • General Approach
    1. Make an Assumption about the right Entity
    2. Check whether the Strategies support or contradict the Assumption
    3. Make a Decision according to logical and probabilistic Rules/Constraints
  • Entity Selection Process
  • Consider all entities within the same Context.
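The three steps of the general Approach listed above could be sketched as follows. This is an assumption-laden illustration: each Strategy is modelled as a function that returns a positive value when it supports a candidate and a negative value when it contradicts it, and the Decision rule is a simple weighted sum with a threshold. The function name decide, the weights, and the toy scores are all hypothetical.

```python
def decide(candidates, strategies, weights, threshold=0.0):
    """Return the best-supported candidate, or None if none exceeds the threshold."""
    best_entity, best_score = None, threshold
    for entity in candidates:
        # Step 1: assume that `entity` is the right one.
        # Step 2: ask every strategy whether it supports (+) or contradicts (-) it.
        score = sum(
            weight * strategy(entity)
            for strategy, weight in zip(strategies, weights)
        )
        # Step 3: decide according to the (here: weighted-sum) rule.
        if score > best_score:
            best_entity, best_score = entity, score
    return best_entity


# Hypothetical strategy outputs for two candidate entities.
popularity = {"Jaguar_Cars": 0.9, "Jaguar_(animal)": 0.2}
cooccurrence = {"Jaguar_Cars": -1.0, "Jaguar_(animal)": 1.0}

strategies = [popularity.get, cooccurrence.get]
weights = [0.5, 1.0]
decision = decide(["Jaguar_Cars", "Jaguar_(animal)"], strategies, weights)
```

In this toy setup the more popular candidate loses because the co-occurrence evidence contradicts it, which shows why popularity alone is not enough to pick an Entity.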

Tags: mining, identification, approaches, tactic, entities, text, language
Categories: Concept
Parent: Named Entity Recognition
