Aggregate Function

Created on Feb. 23, 2013, 12:20 p.m. by Hevok & updated by Hevok on May 2, 2013, 5:19 p.m.

Aggregation Functions are able to count the number of results, to group results and compute constraints on these groups. This starts with simple counting. If one wants to count the number of items in a database for instance

In the Aggregate function on has the keyword Count and what is counted is the number of times the variable Gene is assigned to a value of the Database and one select this also as a new Variable ?C. The result is simple a number as all those results will be simply counted that qualify for the basic Graph Pattern that matches to the RDF Graph.

  • SPARQL 1.1 allows the use of aggregate functions in queries.

Data:

@prefix ex: <http://example.org/>] .

ex:SIRT1 ex:score 3 ;
         rdf:type ex:Aging-Suppressor .
ex:mTOR ex:score 5 ;
         rdf:type ex:Gerontogene .
ex:FoxO ex:score 8 ;
        rdf:type ex:Aging-Suppressor .

Query:

PREFIX ex: <http://example.org/>]
SELECT (Count(?Gene) AS ?C)
WHERE { ?Gene ex:score ?Sc }

Result

?C
3

Aggregate can also be used to count the number of distinct categories. For this one has a SELECT and the aggregation function Count and as one only wants to take into account distinct types, so one states DISTINCT as the keyword followed by the Variable ?T for type, then it is stated that one selects this as the Variable ?C and one only matches those RDF Triples that the basic Graph Pattern Gene has rdf:type ?T. So one selects the different types and the DISTINCT keyword.

Query:

PREFIX ex: <http://example.org/>]
SELECT (Count(Distinct ?T) AS ?C)
WHERE { ?Item rdf:type ?T }

Result:

?C
2

Further one can restrict things within Groups via the GROUP BY Statement. So one can count all genes per Category. Thus group the results by Category. Here ?T is the variable that holds the Category according to this basic Graph Pattern. Next one again selects ?T, the type, and for each ?T one counts all Genes that belongs to ?T. The result will be a simple Table consisting of Types and the number of Genes belonging to this Type.

Query:

PREFIX ex: <http://example.org/>]
SELECT ?T (Count(?Gene) AS ?C)
WHERE { ?Gene rdf:type ?T }

Result:

?T               ?C
Aging-Suppressor 2
Gerontogene      1

Alternative on can also use the keyword DISTINCT and count the distinct Genes, then one would have the number of distinct Genes per Category.

Going one Step further one can obtain only those groups or Categories that have more than just one Item. HAVING Count Gene create than one means having counted the number of gene sin each category and only take those into account that where is more than one Gene. Hacing Cout simple serves as a Filter and filters all Genes or Categories that contain one Gene or less.

Query:

PREFIX ex: <http://example.org/>]
SELECT ?T (Count(?Gene) AS ?C)
WHERE { ?Gene rdf:type ?T }
HAVING Count(?Gene) > 1

Result:

?T               ?C
Aging-Suppressor 2

Besides, Count one can also sum up with SUM, compute an average sum with AVG also of numerical values, a minimum or maximum value with MIN and MAX that can also be applied to Strings.

SAMPLE is a function that picks one Element from a number of Elements from a Group in a non-deterministically way, so one can pick from a larger Group of Items.

GROUP_CONCAT means that all items with a group will be concatenated with designated string separate that needs to be given for concatenating those single Items.

  • more aggregate Functions
  • SUM
  • AVG
  • MIN
  • MAX
  • SAMPLE -- "pick one non-deterministically
  • GROUP_CONCAT -- concatenate value with a designated string separator.

Given a small knowledge base of different persons. For each person there are given several information like one or more foaf names (official names) as well as several nick names for each person. Assume one wants in the result one of the official one. no matter which one, and also to pick out all the nicknames connected in a comma separated list. For this one prepares a SELECT Statement and state that one wants in the result just one arbitrary official name so one can pick those with the aggregation function SAMPLE and as a second part/colon of the result one wants to have a concatenated list of nick names that are separated by Comma. In the Graph Pattern one selects person the variable ?P which is a foaf Person. For this person one selects the foaf names with the variable ?N and the nick names in the variable ?M. Via the aggregation function SAMPLE one of the official names will be picked for each person and all of the nick names will be concatenated and the result is a simple is with a colon containing one arbitrary official name and a concatenated string with all the nicknames of the persons separated by a comma.

Query:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>]
SELECT ( SAMPLE(?N) as ?Name)
       ( GROUP_CONCAT(?M; Separator = ", ") AS ?Nicknames )
WHERE { ?P a foaf:Person ;
           foaf:name ?N ;
           foaf:nick ?M . }
GROUP BY ?P
aggregate.jpg

Tags: coding, programming, scoring, function
Categories: Tutorial
Parent: SPARQL 1.1

Update entry (Admin) | See changes

Comment on This Data Unit