Change - Semantic Web Network Against Aging

Created on Jan. 11, 2013, 3:31 a.m. by Hevok & updated on Jan. 30, 2013, 3:30 p.m. by Hevok

===================================
Semantic Web for Longevity Research
===================================

.. contents:: Contents

Authors: Daniel Wutske (Hevok), Anton Kulaga (antonkulaga), Dmitry Borisoglebsky

Preface
=======
Let us start from the original description of the Denigma Project and then move forward:

“By integrating all the heterogeneous types of biological data, applying a robust unification schema and utilizing ever-increasing computational power for logical inference, it will be possible to solve biological problems such as aging, diseases and suffering due to other causes” [http://www.denigma.de/data/entry/denigma-description].


General considerations
======================

In order to boost longevity science drastically, a big and complex system comprising many elements must be built.

That is why it is wise to choose a small piece of functionality (a so-called bootstrapping project) that can be done by a small group of people and can encourage others to join or provide resources for us.

The Semantic Web may be a good choice for the bootstrapping project, because it provides a foundation for many other functionalities, can be used for many different purposes and can attract many people.

That is why the system is defined as having at least two parts:

* a general part that may be used for every science
* a specific part that is specific to biogerontology

For the general part it is easy to attract people from outside (i.e. a lot of folks are interested in the Semantic Web but are indifferent to longevity). At the same time, the specific part must be done by us and other members of the longevity community.

In fact the system may consist of many separate components, many “bricks”. Each of them is an open-source application and can have its own team (teams may intersect, of course).


Theoretical background, or why it may work
============================================

In social science there are several theories that may explain why our system can boost scientific progress. This does not matter much from a practical point of view, but it may be interesting to you anyway.

According to Collective Intelligence theory [http://en.wikipedia.org/wiki/Collective_intelligence], the collective intelligence of a group does not equal the sum of individual intelligences and depends heavily on relationships, communication channels and other aspects (details are omitted). So what we are doing is enhancing the collective intelligence of the system by merging human and machine intelligence (applying machine learning techniques) and creating tools and procedures that allow users to collaborate in more effective ways.

According to the Extended Mind theory [http://en.wikipedia.org/wiki/The_Extended_Mind], the mind is seen to encompass every level of the cognitive process, which will often include the use of environmental aids. So the system may be a kind of external mind for researchers (and other users) and for the community as a whole.

According to New Institutional Economics [http://en.wikipedia.org/wiki/New_institutional_economics], society is a graph of people modeled as agents with bounded rationality and limited awareness who interact with each other according to their values, formal and informal rules, and rule-control mechanisms called “institutions”. Every interaction, connection and transaction has its cost, the so-called “transaction cost” (time and effort spent on negotiation, analysis, finding partners, coordinating efforts, collaborating etc.). Where transaction costs are high, hierarchies are formed (fewer connections in the social graph mean fewer transactions and lower transaction expenses); otherwise, networks are created. So we are building a system that drastically lowers different kinds of transaction costs and provides new rules for interactions as well as new semi-automatic transactions (performed by automatic agents).

That is all about theories, so let us now move to the more practical part.


Practical considerations
========================

Knowledge management
--------------------
In order to utilize the power of machine intelligence and the other sophisticated techniques mentioned above, we have to get the knowledge out of people's heads and transform descriptions into machine-readable formats (for instance reStructuredText as well as database entries) that allow us to build up graphs and hypergraphs.

Actually, these are the tasks that the Semantic Web solves:

* automation of information processing by agents (computers, humans, organizations);
* annotation of information sources (transforming an information source into a system of “knowledge atoms” that is readable by both man and machine alike).
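
As a minimal sketch of the annotation task (plain Python; the entity and predicate names are hypothetical examples, not an existing vocabulary), a “knowledge atom” can pair a human-readable description with machine-readable subject-predicate-object triples:

```python
# Sketch: a "knowledge atom" readable by both man and machine.
# The predicate and entity names below are illustrative assumptions.
def annotate(source_text, triples):
    """Attach machine-readable triples to a human-readable description."""
    return {"text": source_text, "triples": triples}

atom = annotate(
    "Caloric restriction extends lifespan in mice.",
    [("caloric_restriction", "extends", "lifespan"),
     ("lifespan", "measured_in", "mouse")],
)
```

A human reads ``atom["text"]``, while an agent processes ``atom["triples"]``; the same unit of information serves both.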

In the long term, when we have enough high-quality, well-structured information (mainly thanks to utilizing the Semantic Web), we may apply superior machine learning techniques like:

* Complex Semantic Querying
* Text Mining
* Bayesian Networks
* Logical Inference (where possible)
* Markov Chains
* Support Vector Machines
* Neural Networks
* and a lot of other interesting techniques
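
To illustrate just one item from this list, simple logical inference over stored triples can be sketched in plain Python (the relation and entity names are hypothetical): declaring one relation transitive lets the system derive facts that nobody entered explicitly.

```python
# Sketch of logical inference: compute the transitive closure of one
# relation over a triple store (the facts below are illustrative).
def infer_transitive(triples, relation):
    """Repeatedly derive (a, r, d) from (a, r, b) and (b, r, d)."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, r1, b) in list(facts):
            for (c, r2, d) in list(facts):
                if r1 == r2 == relation and b == c \
                        and (a, relation, d) not in facts:
                    facts.add((a, relation, d))
                    changed = True
    return facts

triples = {
    ("progeria", "is_a", "premature_aging_syndrome"),
    ("premature_aging_syndrome", "is_a", "aging_phenotype"),
}
facts = infer_transitive(triples, "is_a")
# facts now also contains the derived ("progeria", "is_a", "aging_phenotype").
```

Production reasoners do this far more efficiently, but the principle is the same: well-structured data makes new knowledge derivable mechanically.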

Collaboration
-------------
At the same time, we have to admit that the main task of our system is to let researchers and other users get involved, produce better results and move science forward faster. For exactly this purpose we need not only better data and tools to work with, but also a better workflow, easier collaboration (with lower transaction costs), better stimuli (to increase motivation and involvement) and better decision making.

A lot of interesting technologies and tools may be used. But for now we will focus on the core and the easiest-to-implement features.

Git-style wiki editing
----------------------
Wiki software (e.g. MediaWiki) provides new rules and procedures for collaboration through its change tracking and markup: new institutions (in terms of New Institutional Economics) that are hard to establish without IT. That is why Wikipedia/MediaWiki was so successful in the encyclopedic field. At the same time, it is not enough for science and its leading edge. If it were, we would write our academic articles in wikis instead of academic journals.

The main problem is its overcentralization. It is good for simple and well-researched subjects where there is one dominant theory in science. But everything is too vague at the edge of science, where there are different alternative theories, new discoveries and debates, speculations, and a lot of information that ought to be checked and refined. In such situations a “central repository” is not suitable.

The other problem of MediaWiki is its weak semantic features. There are a lot of add-ons like Semantic MediaWiki, but they are limited (mainly because the MediaWiki architecture was made for other purposes, and a lot of crutches are needed to make it behave differently) and often difficult to implement.

That is why an alternative is required. There is a great collaborative editing (as well as collaborative filtering) mechanism in Git that can be borrowed and adapted for our purposes [tech talk video that explains Git development in Linux: http://www.youtube.com/watch?v=4XpnKHJAok8].

In Git there is no central repository. If you want to change something, you make a fork of it, implement changes and make a pull request. Collaborative filtering works well in Git: there are a lot of different independent repositories, people pull from repositories they trust and consider well written, owners of trusted repositories in their turn pull only from repositories they find valuable, and so on. So there is a workflow that filters out bad code and accepts the highest-quality contributions.
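
A minimal sketch of how this fork-and-pull model could apply to wiki pages (the class and method names are hypothetical, not an existing API): there is no central copy; every editor forks, edits freely, and offers changes back for review.

```python
# Hypothetical sketch of Git-style editing for wiki pages.
class PageRepo:
    def __init__(self, owner, content=""):
        self.owner = owner
        self.content = content
        self.pull_requests = []   # proposed changes awaiting review

    def fork(self, new_owner):
        """Create an independent copy the new owner can edit freely."""
        return PageRepo(new_owner, self.content)

    def request_pull(self, target):
        """Offer this repo's current content to another repo for review."""
        target.pull_requests.append((self.owner, self.content))

    def merge(self, index):
        """Accept a reviewed pull request; rejecting is simply not merging."""
        author, content = self.pull_requests.pop(index)
        self.content = content
        return author

upstream = PageRepo("hevok", "Aging is a complex process.")
fork = upstream.fork("anton")
fork.content += " It is driven by damage accumulation."
fork.request_pull(upstream)
upstream.merge(0)   # a trusted change flows back upstream
```

Collaborative filtering emerges from who merges from whom: low-quality forks are simply never pulled.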

The other thing that we may implement much better than in MediaWiki is information storage and retrieval. We may use a hybrid database (a hypergraph-document database with schema-less/schema-full/mixed mode) and store and traverse relationships more easily. How this may work in our system is defined in a scheme [http://denigma.de/url/3f].
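
As an illustration of why a hypergraph-document model helps (plain Python; the data and labels are hypothetical), a single hyperedge can connect more than two documents at once, which an ordinary binary edge cannot express directly:

```python
# Sketch of a hypergraph over documents: one hyperedge may connect any
# number of nodes (the gene/species/paper data below is illustrative).
documents = {
    "gene:SIRT1": {"type": "gene"},
    "species:mouse": {"type": "species"},
    "paper:12345": {"type": "publication"},
}

# One hyperedge linking a gene, a species, and the paper reporting the link.
hyperedges = [
    {"label": "lifespan_effect_reported_in",
     "nodes": {"gene:SIRT1", "species:mouse", "paper:12345"}},
]

def neighbors(node):
    """Traversal: every node that shares a hyperedge with the given node."""
    linked = set()
    for edge in hyperedges:
        if node in edge["nodes"]:
            linked |= edge["nodes"] - {node}
    return linked
```

A hypergraph-document database stores such structures natively, so one traversal from the gene reaches both the species and the supporting publication.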


Goals and roadmap
=================
We are defining clear goals and providing a roadmap.


Architecture
============
The architecture may undergo rapid and drastic changes, but one thing is clear: we must have a kind of Semantic Web ecosystem that consists of different interconnected services (i.e. an ontology builder, a query builder, a graph/hypergraph storage engine, data visualization, a wiki-like editing system, etc.) connected together by means of open protocols.

Semantic web services
---------------------
Regarding the Semantic Web services, important issues to consider are:

* Annotation - describing what services do in a form that is readable by machine (in addition to the “how” question of “normal” web services).
* Service discovery - before one can use any service, one must find it.
* Composition - combining web services into another one.
* Interoperability - data from one service may be used by another one.
* Invocation - calling an individual web service and making use of the results.
* Privacy and security - encryption and digital signatures; information about how inputs/outputs will be passed and stored, where information may be sent and for what purpose.
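
A minimal sketch of annotation, discovery and composition together (the descriptor fields and service names are hypothetical assumptions, not an existing protocol): if each service is annotated with what it consumes and produces, discovery and composition become mechanical.

```python
# Hypothetical machine-readable service descriptors: annotation records
# what each service consumes and produces, so agents can match on types.
registry = [
    {"name": "ontology-builder", "consumes": "triples", "produces": "ontology"},
    {"name": "graph-storage", "consumes": "triples", "produces": "graph"},
    {"name": "visualizer", "consumes": "graph", "produces": "svg"},
]

def discover(produces):
    """Service discovery: find registered services by their output type."""
    return [s["name"] for s in registry if s["produces"] == produces]

def compose(*names):
    """Composition sketch: chain services whose output feeds the next input."""
    chain = [next(s for s in registry if s["name"] == n) for n in names]
    for left, right in zip(chain, chain[1:]):
        assert left["produces"] == right["consumes"], "incompatible services"
    return {"consumes": chain[0]["consumes"], "produces": chain[-1]["produces"]}

pipeline = compose("graph-storage", "visualizer")
# pipeline consumes triples and produces an SVG rendering.
```

Real semantic web service stacks add ontologies of types and richer contracts, but the matching principle is the same.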

Interoperability
----------------
For the first attempt we will use JSON REST services and WebSockets.

WebSockets will be used for connecting front-ends (there may be several of them) to the back-ends in a responsive way, so that we can push notifications from server to user and use features like auto-completion (e.g. when querying) and search without page reloading.

REST web services may be used for communication between different server-side parts, especially if they are written in different languages (like Python and Scala).
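
As a sketch of what such a push channel might carry (the message fields and the tiny completion vocabulary are hypothetical), the server can answer an auto-completion request with a small JSON envelope:

```python
import json

# Hypothetical JSON envelope for server-to-client push over a WebSocket,
# with a tiny prefix index standing in for the real query vocabulary.
terms = ["lifespan", "ligase", "longevity", "senescence"]

def complete(prefix):
    """Return vocabulary terms matching an auto-completion request."""
    return [t for t in terms if t.startswith(prefix)]

def push_message(kind, payload):
    """Serialize one push notification as it would travel over the socket."""
    return json.dumps({"type": kind, "payload": payload})

message = push_message("autocomplete", complete("li"))
```

Because the envelope is plain JSON, the same format works over the WebSocket channel to the browser and over REST calls between the Python and Scala back-end parts.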

Graphs/hypergraphs and ontologies
---------------------------------
We will use OrientDB for storage and editing of the ontology and the data. It is open source and very flexible. We can create different knowledge structures with it (graphs, hypergraphs, relationships, documents etc.) and traverse them. It can work in schema-less and schema-full mode, so we will be able to create ontologies using its native form.

Collaborative editing
---------------------
How collaborative editing will be implemented is described in the yEd scheme.

Specific part
=============
Here we describe the parts of the project that are specific to biogerontology.

The initial goal setting needs to be very clear. The question is whether a full-scale "Digital Decipher Machine" would include:

* the majority of research results that are in any way related to aging and longevity research?
* a library of past, current and future research projects related to aging and longevity research? Maybe a control or management framework?
* a research data storage?
* a collection of software used by an average researcher in the related fields of science?
* a collection of knowledge management tools and methods that would allow speeding up an average research project?
* a knowledge base for these tools?

If yes for one or a few points, what is the current state and why/how are academia and business struggling now? What are the trends? What is the future state and why/how would academia and business benefit from the project?

