Summary of the invention
Technical problem solved by the invention: to overcome the deficiencies of the prior art by providing an ontology query engine optimization system based on ontology semantic information. The system uses the semantic information in the ontology to optimize ontology query language statements, so that the query statements are simplified and the efficiency of user queries is improved.
Technical solution of the invention: an ontology query engine optimization system based on ontology semantic information, comprising a query statement preprocessing module, a query statement derivation module, and a query optimization module, wherein:
Query statement preprocessing module: decomposes the query statement into a series of simple conjunctive queries, that is, converts the query statement into disjunctive normal form, after which only each conjunctive query needs to be evaluated, and the query result is the union of the results of the conjunctive queries. For a single conjunctive query, the query variables are divided into two classes, terminology variables and assertion block variables, and the query atoms are likewise divided into two classes, assertion block atoms and terminology atoms. The terminology atoms are extracted to form a query statement that involves only terminology information; the query engine is called to evaluate it on the terminology; and the query results are substituted in turn for the terminology variables in the query statement, forming a series of query statements that involve only assertion block information. A terminology variable is a variable that occurs in the concept or property position of an RDF triple, whereas an assertion block variable occurs in an instance position. RDF is the Resource Description Framework, a markup language for describing web resources. A terminology atom is a query atom that involves terminology information; all other atoms are assertion block atoms;
Query statement derivation module: for a single assertion block query statement, converts each query atom into the corresponding assertion block assertion, so that each variable becomes an instance in the assertion block, and combines these assertions with the terminology to form a new knowledge base. An ontology reasoner is then run on the new knowledge base. The reasoner uses the semantic information in the ontology to derive the information implicit in the knowledge base, including judging the consistency of the knowledge base and deriving implicit triple relations according to the relevant rules. A series of implicit information is thus derived from the ontology semantic information;
Query optimization module: optimizes the original query statement according to the derived implicit information to obtain a more concise query statement. The query engine is then called to evaluate the assertion block query statement on the assertion block, and the result is combined with the query result on the terminology to obtain the result of the original query statement.
In the query optimization module, the original query statement is optimized according to the derived implicit information to obtain a more concise query statement, as follows:
(1) If reasoning shows that the knowledge base is inconsistent, the query statement itself is faulty and cannot yield a query result. The query is meaningless in this case, so no optimization is needed;
(2) For a derived triple whose predicate is owl:sameAs, that is, the individual equivalence relation stating that two individuals are identical: if the subject and the object are both variables, the two variables are merged into one; if one is a constant and the other a variable, the constant is the query value of that variable;
(3) For a derived triple whose predicate is rdf:type, whose subject is a variable, and whose object is a concept: if the definition of the object concept has an owl:equivalentClass property, that is, the class equivalence relation stating that two classes are identical, the corresponding query atoms are replaced. Note that the replaced query atoms must not contain other variables;
(4) If the query atoms include (?x, rdf:type, C1) and (?x, rdf:type, C2), that is, individual type declaration triples stating that ?x is an individual of class C1 and also of class C2, and C1 ⊑ C2 (C1 is a subclass of C2), then (?x, rdf:type, C2) can be eliminated. Elimination through sub-properties works analogously;
(5) If the query atoms include (?x, rdf:type, C1) and (?x, p, o), and (p, rdfs:domain, C1) holds, that is, a domain declaration triple stating that the domain of property p is C1, then (?x, rdf:type, C1) can be eliminated;
(6) The above steps yield a more simplified query statement.
In the query statement preprocessing module, when the terminology atoms are extracted to form a query statement involving only terminology information: if the query statement contains only one terminology atom (?x, subClassOf, ?y), that is, a subclass triple stating that ?x is a subclass of ?y, then all concepts satisfying the condition are found in the terminology, and these concepts are substituted for ?x and ?y in the original query statement. If the only terminology query atom is (?x, subClassOf, y), where y is a constant and ?x is not one of the values x1, ..., xn that the query statement is required to return, then, although the query values of ?x would otherwise be all subclasses of y, in this case ?x need only be replaced by y, forming a single query statement.
In the query statement derivation module, when a query atom is converted into an assertion block assertion, for example (?x, rdf:type, Person), an individual type declaration triple stating that ?x is an individual of class Person, an instance named ?x of the Person class is generated, where Person denotes the concept class of people.
Compared with the prior art, the invention has the following advantages:
(1) By using the semantic information in the terminology, the invention optimizes query statements and improves the efficiency of user queries.
(2) When the terminology is of ordinary size and the assertion block holds a very large amount of data, the optimization process consumes little, because it only uses terminology information, while the optimization can greatly reduce the query time on the assertion block. Most real-world knowledge bases fall into this category, which further improves the efficiency of user queries.
(3) The system of the invention can be combined with existing query engines and applied to the practical development of the Semantic Web, which broadens its scope of application.
Detailed description of the invention
As shown in Figures 1 and 2, in the ontology query engine optimization system based on ontology semantic information of the invention, RDF organizes data as (subject, predicate, object) triples. The knowledge base comprises two parts, the assertion block (ABox) and the terminology (TBox): the terminology is the set of descriptions of domain concepts and their properties, and the assertion block is the set of descriptions of instances of classes and of their properties. A simple conjunctive query has the form q(x1, ..., xn) ← a1, ..., am, where x1, ..., xn are the variables of the query statement and a1, ..., am are RDF triple assertions that constrain the query, called query atoms. Each xi also occurs as an element of some aj. In a triple, a variable is written as ? followed by a name, for example ?x, and the constants in the atoms are written as ordinary words. The optimization proceeds in the following steps:
Query statement preprocessing module:
Step 1: Decompose the query statement into a series of simple conjunctive queries, that is, convert the query statement into disjunctive normal form. Afterwards only each conjunctive query statement needs to be evaluated, and the query result is the union of the results of the conjunctive queries;
Step 2: For a single conjunctive query, divide its query variables x1, ..., xn into two classes, terminology variables and assertion block variables, and then divide the query atoms a1, ..., am likewise into two classes, assertion block atoms and terminology atoms;
Step 3: Extract the terminology atoms to form a query statement that involves only terminology information, call the query engine to evaluate it on the terminology, and substitute the query results in turn for the terminology variables in the query statement, which yields a series of query statements that involve only assertion block information;
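Step 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the nested ("and", [...]) / ("or", [...]) encoding of a query and the names to_dnf, answer, and evaluate_conjunctive are introduced here for the example only.

```python
from itertools import product

# Illustrative encoding (an assumption, not from the source): an atom is a
# 3-tuple of strings; a compound query is ("and", [subqueries]) or
# ("or", [subqueries]).

def to_dnf(query):
    """Return the query as a list of conjunctive queries (lists of atoms)."""
    if len(query) == 3:
        return [[query]]  # a single atom is already a conjunctive query
    op, subqueries = query
    parts = [to_dnf(sub) for sub in subqueries]
    if op == "or":
        # Disjunction: collect the conjunctive queries of every branch.
        return [conj for part in parts for conj in part]
    if op == "and":
        # Conjunction: choose one conjunctive query per branch and join them.
        return [sum(choice, []) for choice in product(*parts)]
    raise ValueError(f"unknown operator: {op!r}")

def answer(query, evaluate_conjunctive):
    """Union of the results of all conjunctive queries."""
    results = set()
    for conjunctive_query in to_dnf(query):
        results |= set(evaluate_conjunctive(conjunctive_query))
    return results
```

Here evaluate_conjunctive stands for whatever query engine evaluates one conjunctive query; the sketch only shows that the final result is the union over the disjuncts.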
Query statement derivation module:
Step 4: For a single assertion block query statement, convert each query atom into the corresponding assertion block assertion, so that each variable becomes an instance in the assertion block, and combine these assertions with the terminology to form a new knowledge base;
Step 5: Run an ontology reasoner on the new knowledge base. The reasoner uses the semantic information in the ontology to derive the information implicit in the knowledge base, including judging the consistency of the knowledge base and deriving implicit triple relations according to the relevant rules. A series of implicit information is thus derived from the ontology semantic information;
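The idea behind step 5, deriving implicit triples from a knowledge base, can be illustrated with a toy forward-chaining loop. This sketch is a stand-in for a real ontology reasoner, not a description of one: it applies only two RDFS entailment rules (subclass and subproperty propagation), and it neither checks consistency nor handles OWL constructs such as owl:sameAs. The function name infer is an assumption made for the example.

```python
def infer(triples):
    """Close a set of (s, p, o) triples under two RDFS rules."""
    closed = set(triples)
    while True:
        derived = set()
        for s, p, o in closed:
            for s2, p2, o2 in closed:
                # (s rdf:type C), (C rdfs:subClassOf D)  =>  (s rdf:type D)
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and o == s2:
                    derived.add((s, "rdf:type", o2))
                # (s p o), (p rdfs:subPropertyOf q)  =>  (s q o)
                if p2 == "rdfs:subPropertyOf" and p == s2:
                    derived.add((s, o2, o))
        if derived <= closed:
            return closed  # fixed point reached: nothing new was derived
        closed |= derived
```

In practice the new knowledge base would be handed to an existing reasoner, which covers far more of the ontology semantics than these two rules.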
Query optimization module:
Step 6: Optimize the original query statement according to the derived implicit information to obtain a more concise query statement;
Step 7: Call the query engine to evaluate the assertion block query statement on the assertion block, and combine the result with the terminology query result of step 3 to obtain the result of the original query statement.
In step 2, a terminology variable is a variable that occurs in the concept or property position of an RDF triple, whereas an assertion block variable occurs in an instance position. A terminology atom is a query atom that involves terminology information, such as (?x, subClassOf, ?y); all other atoms are assertion block atoms. Concrete methods for distinguishing them have been covered by existing research.
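One simple way to make this distinction concrete is sketched below. The heuristic is an assumption made for illustration and is not the method the source refers to: a variable counts as a terminology variable if it appears as a predicate, as the object of rdf:type, or in an atom whose predicate belongs to a small set of schema predicates, and an atom counts as a terminology atom if its predicate is in that set. The names SCHEMA_PREDICATES, terminology_variables, and split_atoms are hypothetical.

```python
# Predicates assumed to talk about the terminology rather than instances.
SCHEMA_PREDICATES = {
    "rdfs:subClassOf", "rdfs:subPropertyOf",
    "rdfs:domain", "rdfs:range", "owl:equivalentClass",
}

def is_var(term):
    return term.startswith("?")

def terminology_variables(atoms):
    """Variables that occur in a concept or property position."""
    found = set()
    for s, p, o in atoms:
        if is_var(p):
            found.add(p)                      # property position
        if p == "rdf:type" and is_var(o):
            found.add(o)                      # concept position
        if p in SCHEMA_PREDICATES:
            found.update(t for t in (s, o) if is_var(t))
    return found

def split_atoms(atoms):
    """Return (terminology atoms, assertion block atoms)."""
    term_atoms = [a for a in atoms if a[1] in SCHEMA_PREDICATES]
    assertion_atoms = [a for a in atoms if a[1] not in SCHEMA_PREDICATES]
    return term_atoms, assertion_atoms
```

Under this heuristic an atom such as (?x, rdf:type, ?c) stays an assertion block atom even though ?c is a terminology variable, because the atom constrains an instance.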
In step 3, the terminology atoms are formed into a query statement about terminology information. For example, if the query statement contains only one terminology atom (?x, subClassOf, ?y), all concepts satisfying the condition are found in the terminology, and these concepts are substituted for ?x and ?y in the original query statement. If the only terminology query atom is (?x, subClassOf, y), where y is a constant and ?x is not one of the values x1, ..., xn that the query statement is required to return, then, although the query values of ?x would otherwise be all subclasses of y, in this case ?x need only be replaced by y, forming a single query statement.
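The substitution in step 3, including the special case, might look like the following sketch. It assumes that the terminology query has already been evaluated and that its results arrive as a list of dictionaries mapping terminology variables to concepts; the names substitute, expand, and tbox_bindings are hypothetical.

```python
def is_var(term):
    return term.startswith("?")

def substitute(atoms, binding):
    """Replace every bound variable in the atoms by its value."""
    return [tuple(binding.get(t, t) for t in atom) for atom in atoms]

def expand(assertion_atoms, term_atoms, answer_vars, tbox_bindings):
    """Return the list of assertion-block-only queries (lists of atoms)."""
    if len(term_atoms) == 1:
        s, p, o = term_atoms[0]
        # Special case: (?x, subClassOf, y) with constant y, where ?x is not
        # an answer variable. Substituting y alone gives a single query.
        if (p == "rdfs:subClassOf" and is_var(s) and not is_var(o)
                and s not in answer_vars):
            return [substitute(assertion_atoms, {s: o})]
    # General case: one query per result of the terminology query.
    return [substitute(assertion_atoms, b) for b in tbox_bindings]
```

The special case avoids generating one query per subclass; it relies on the reasoning that instances of any subclass of y are also instances of y.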
In step 4, a query atom is converted into an assertion block assertion. For example, for (?x, rdf:type, Person), an instance named ?x of the Person class is generated.
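Step 4 can be pictured with a small sketch. It assumes triples are plain 3-tuples of strings and represents the new knowledge base as a dictionary; the name build_knowledge_base and the dictionary layout are assumptions made for illustration.

```python
def build_knowledge_base(query_atoms, terminology_triples):
    """Treat the query atoms as assertions and join them with the terminology."""
    assertions = set(query_atoms)
    # Every term that was a variable (?x, ?z, ...) is now an individual name.
    individuals = {
        term
        for subject, _, obj in assertions
        for term in (subject, obj)
        if term.startswith("?")
    }
    return {
        "individuals": individuals,
        "triples": assertions | set(terminology_triples),
    }
```

The atoms themselves are unchanged; what changes is their reading, since a name such as ?x is now treated as an instance rather than as a variable.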
In steps 5 and 6, an existing reasoner can be used to reason over the new knowledge base and obtain the implicit information, which is then processed in the following order:
(1) If reasoning shows that the knowledge base is inconsistent, the query statement itself is faulty and cannot yield a query result.
(2) For a derived triple whose predicate is owl:sameAs: if the subject and the object are both variables, the two variables are merged into one; if one is a constant and the other a variable, the constant is the query value of that variable.
(3) For a derived triple whose predicate is rdf:type, whose subject is a variable, and whose object is a concept: if the definition of the object concept has an owl:equivalentClass property, the corresponding query atoms are replaced. Note that the replaced query atoms must not contain other variables.
(4) If the query atoms include (?x, rdf:type, C1) and (?x, rdf:type, C2), and C1 ⊑ C2 (C1 is a subclass of C2), then (?x, rdf:type, C2) can be eliminated. Elimination through sub-properties works analogously.
(5) If the query atoms include (?x, rdf:type, C1) and (?x, p, o), and (p, rdfs:domain, C1) holds, then (?x, rdf:type, C1) can be eliminated.
Of the above 5 rules, rules 1 to 3 are created by the invention, and rules 4 and 5 apply other techniques within this system; other rules can also be added when developing an actual system. Processing with the above 5 rules simplifies the query statement.
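Rules (2), (4), and (5) lend themselves to a short sketch. The code below is illustrative only and rests on several assumptions: atoms are 3-tuples of strings, subclass_of is a set of (subclass, superclass) pairs that is already transitively closed, and domain_of maps each property to a single domain class. All function and parameter names are hypothetical.

```python
def is_var(term):
    return term.startswith("?")

def merge_same_as(atoms, same_as_pairs):
    """Rule (2): merge variables, or bind a variable to a constant."""
    rename = {}
    def resolve(term):
        while term in rename:
            term = rename[term]
        return term
    for a, b in same_as_pairs:
        a, b = resolve(a), resolve(b)
        if a == b:
            continue
        if is_var(b):
            rename[b] = a          # b is replaced by a (variable or constant)
        elif is_var(a):
            rename[a] = b          # a is a variable, b a constant
    merged = [tuple(resolve(t) for t in atom) for atom in atoms]
    return list(dict.fromkeys(merged)), rename   # drop duplicate atoms

def drop_superclass_atoms(atoms, subclass_of):
    """Rule (4): drop (?x type C2) if (?x type C1) is present and C1 is a subclass of C2."""
    typed = {(s, o) for s, p, o in atoms if p == "rdf:type"}
    def redundant(atom):
        s, p, c2 = atom
        return p == "rdf:type" and any(
            s1 == s and (c1, c2) in subclass_of and (c2, c1) not in subclass_of
            for s1, c1 in typed)
    return [a for a in atoms if not redundant(a)]

def drop_domain_atoms(atoms, domain_of):
    """Rule (5): drop (?x type C1) if (?x p o) is present and p has domain C1."""
    def redundant(atom):
        s, p, c = atom
        return p == "rdf:type" and any(
            s2 == s and domain_of.get(p2) == c for s2, p2, _ in atoms)
    return [a for a in atoms if not redundant(a)]
```

merge_same_as also returns the renaming it applied, because the answer variables in the query head would need the same renaming. Rule (4) deliberately skips mutually equivalent classes, which fall under rule (3) instead.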
As shown in Figure 3, the query engine is implemented as the following process:
(1) Query statement syntax parsing: perform lexical and syntactic analysis on the query statement to determine whether it is grammatical.
(2) Graph pattern extraction: extract from the query statement the graph pattern (expressible as triples), that is, the subgraph template to be matched, which expresses the query intent.
(3) Graph pattern matching: match the graph pattern against the data set to find the query results.
(4) Query result feedback: return the query results according to the settings of the query statement.
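Stages (3) and (4) of this process can be illustrated with a naive backtracking matcher. The sketch makes no claim about how Jena or any other engine matches graph patterns internally; it assumes the pattern has already been parsed into triples of strings, and the names match and select are hypothetical.

```python
def is_var(term):
    return term.startswith("?")

def match(patterns, data, binding=None):
    """Yield every variable binding under which all patterns occur in data."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in data:
        candidate = dict(binding)
        consistent = True
        for pattern_term, data_term in zip(first, triple):
            if is_var(pattern_term):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(pattern_term, data_term) != data_term:
                    consistent = False
                    break
            elif pattern_term != data_term:
                consistent = False
                break
        if consistent:
            yield from match(rest, data, candidate)

def select(answer_vars, patterns, data):
    """Project the matches onto the answer variables (result feedback)."""
    return {tuple(b[v] for v in answer_vars) for b in match(patterns, data)}
```

Each additional atom in the pattern multiplies the work of this naive search, which is one way to see why removing redundant atoms before matching can shorten query time.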
The invention is described in further detail below with reference to a concrete example. The specific steps are as follows:
1. Data preparation
First, some data must be prepared to verify the effect of the system. LUBM is a benchmark for testing the performance of ontology query languages. It includes a university ontology, Univ-Bench, and, for test data sets, a data generator, UBA, which produces test data based on the Univ-Bench ontology. UBA was used to generate 3 data sets of different sizes. Their triple counts, instance counts, and sizes (MB) are, respectively: Lubm1 (82415, 20659, 8), Lubm2 (516116, 129533, 50), and Lubm3 (1052895, 263427, 102).
2. System development
A query engine optimization system combined with Jena was developed. Jena is an application development toolkit for the Semantic Web that provides query functions for ontologies. Its query engine does not consider optimization at the semantic level, so the present system can be used to optimize it. First, 10 query statements were executed with Jena alone on each of the 3 data sets above to obtain the average query time. The present system was then combined with the Jena query engine. One of the query statements is used here to illustrate the optimization steps. The query is: find each laboratory head and the members employed by the laboratory where he or she works. It is expressed as:
q(x,y) ← (?x,rdf:type,?c).(?c,rdfs:subClassOf,person).(?x,isheadof,?z).(?x,workat,?o).(?o,hasamember,?y)
Here q(x,y) indicates that the values sought by the query statement are those of variable x and variable y, which represent the laboratory head and the laboratory members respectively. The sequence of triples after ← constrains variable x and variable y. Note that inside the query atoms every variable is prefixed with ? to distinguish it from constants.
(1) The query statement is already a conjunctive query and needs no processing.
(2) Among the query variables, ?c is a terminology variable and the remaining variables are assertion block variables; (?c, rdfs:subClassOf, person) is a terminology atom and the remaining atoms are assertion block atoms.
(3) The Jena query engine is called to evaluate (?c, rdfs:subClassOf, person) on the terminology, which yields all subclasses of the person class. Because this matches the special case described in step 3, ?c need only be replaced by person.
(4) This gives a query statement q' that involves only assertion block information:
q'(x,y) ← (?x,rdf:type,person).(?x,isheadof,?z).(?x,workat,?o).(?o,hasamember,?y)
From this query statement, assertion block information is generated that contains triples such as (?x, rdf:type, person). Here ? becomes part of the instance name rather than marking a variable. This information is combined with the terminology to form a new knowledge base.
(5) Reasoning derives hidden information such as (?x, rdf:type, chair) and (?z, owl:sameAs, ?o).
(6) By the owl:equivalentClass property of the concept chair in (?x, rdf:type, chair), chair is equivalent to the intersection of the person class and isheadof, so it can replace the query atoms (?x, rdf:type, person) and (?x, isheadof, ?z). By the hidden information (?z, owl:sameAs, ?o) obtained in step (5), ?z and ?o can be merged into one variable. Then (?x, workat, ?o) can be eliminated by means of (isheadof, rdfs:subPropertyOf, workat). The query statement simplified by the above steps is:
q'(x,y) ← (?x,rdf:type,chair).(?x,islead,?o).(?o,hasamember,?y)
Executing this query statement on Jena then yields the same result.
3. Comparison of results
The query times before and after optimization, and the time spent on optimization, are shown in the following table:
Time (s) | Lubm1 | Lubm2 | Lubm3
Before optimization | 0.38 | 5.2 | 20
Optimization time | 0.15 | 0.15 | 0.15
After optimization | 0.2 | 4.1 | 14
As the table shows, with the system of the invention the average query time of Jena falls by a larger amount the larger the data volume, while the time consumed by optimization stays constant, so the benefit of optimization becomes more pronounced as the data volume grows. This also confirms that the invention is suited to querying knowledge bases whose assertion block holds a very large amount of data. The system of the invention can of course also be applied to optimizing the query engines of other development tools, such as Sesame.
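The comparison can be made explicit by charging the optimization time against the optimized query time. The short script below only re-does the arithmetic on the figures from the table; the dictionary layout and the name net_saving are choices made for this illustration.

```python
# Times in seconds, copied from the table above.
timings = {
    "Lubm1": {"before": 0.38, "optimize": 0.15, "after": 0.2},
    "Lubm2": {"before": 5.2, "optimize": 0.15, "after": 4.1},
    "Lubm3": {"before": 20.0, "optimize": 0.15, "after": 14.0},
}

def net_saving(row):
    """Time saved once the optimization cost itself is charged."""
    return row["before"] - (row["optimize"] + row["after"])

for name, row in timings.items():
    saving = net_saving(row)
    share = saving / row["before"]
    print(f"{name}: net saving {saving:.2f} s ({share:.0%} of the original time)")
```

For these figures the net saving grows with the data set, from about 0.03 s on Lubm1 to about 0.95 s on Lubm2 and about 5.85 s on Lubm3.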
Parts of the invention not described in detail belong to techniques well known in the art.
The above embodiments are intended only to illustrate, not to limit, the technical solution of the invention. Any modification or partial replacement that does not depart from the spirit and scope of the invention shall fall within the scope of the claims of the invention.