CN105912721B - Distributed semantic parallel inference method for RDF data

Info

Publication number: CN105912721B
Application number: CN201610293055.5A
Authority: CN
Inventors: 汪璟玢; 叶怡新; 郑翠春
Original assignee: Fuzhou University
Current assignee: Fuzhou University
Priority date: 2016-05-05
Filing date: 2016-05-05
Publication date: 2019-06-07
Anticipated expiration: 2036-05-05
Also published as: CN105912721A

Abstract

The present invention relates to a distributed semantic parallel inference method for RDF data. First, a transitive closure relation matrix (TRM) and link variable information are constructed from the ontology file and the RDFS/OWL rules, and rule flags are then generated. Next, the RDFS/OWL rules are classified according to the type of their link variables, a different inference scheme is designed for each class, and inference over the RDFS/OWL rules is completed in parallel using the MapReduce computing framework. Filtering instance triples by link variable information and rule flags reduces the transmission cost of large numbers of useless triples in the distributed system. Constructing the transitive closure matrix reduces the number of inference iterations and improves inference efficiency. Finally, duplicate triples are deleted from the inference results in real time, which further improves the efficiency of subsequent inference iterations. The invention can carry out RDFS/OWL rule inference efficiently and correctly as the data volume grows.

Description

Distributed semantic parallel inference method for RDF data

Technical field

The present invention relates to the field of Semantic Web technology, and in particular to a distributed semantic parallel inference method for RDF data.

Background art

The RDF and OWL standards of the Semantic Web are widely used in many fields, such as general knowledge (DBpedia), medical life sciences (LODD), and bioinformatics (UniProt); as of September 2014, the total amount of data had reached 65 billion triples. With the rapid growth of Semantic Web data, centralized environments are no longer suitable for reasoning over large-scale data because of memory limits. Distributed parallel RDFS/OWL reasoning is a relatively new research field, with work such as fuzzy pD* reasoning, parallel RDFS reasoning, distributed rule-matching systems, the distributed inference engine WebPIE, and the distributed inference engine YARM. Most of these schemes combine the MapReduce computing framework with RDFS/OWL inference rules to reason over the data, but they are not efficient enough.

How to reason efficiently over big data has been a research hotspot in the past two years, but the research is still at an early stage. The most mature distributed inference engine at present is WebPIE. Although it supports parallel inference over big data, WebPIE launches one or more MapReduce jobs for each rule, and starting a MapReduce job is relatively time-consuming, so inference efficiency is limited as the number of RDFS/OWL inference rules increases. The distributed inference engine YARM uses a reasonable data partitioning model and optimizes the execution order of the rules, yielding a new MapReduce-based parallel inference algorithm that completes inference over the RDFS rules in a single MapReduce job. However, that algorithm is not suitable for the more complex OWL Horst rules. In addition, when the new triples generated by a rule are duplicates, YARM performs excessive redundant computation and produces useless data. To solve these problems of existing distributed RDFS/OWL inference algorithms, the SPRM algorithm (Semantic information Parallel Reasoning on MapReduce) is proposed, which can carry out RDFS/OWL rule inference efficiently and correctly as the data volume grows.

In summary, a centralized environment cannot meet the demands of massive data, and parallel inference in a distributed environment is not yet efficient enough. Although existing distributed inference engines can reason over data in parallel, they start many time-consuming MapReduce jobs, perform excessive redundant computation, and produce useless data, so they cannot carry out RDFS/OWL rule inference efficiently and correctly as the data volume grows.

Technical problems to be solved: 1. how to guarantee the correctness of the inferred data once an RDFS/OWL inference algorithm is distributed across a distributed environment; 2. how to design a parallel inference scheme that matches the proposed distribution scheme, so as to meet the demands of distributed parallel inference over large-scale data.

Summary of the invention

In view of this, the object of the present invention is to provide a distributed semantic parallel inference method for RDF data that can carry out RDFS/OWL rule inference efficiently and correctly as the data volume grows.

The present invention is realized by the following scheme: a distributed semantic parallel inference method for RDF data, comprising the following steps:

Step S1: load the schema triples and construct the TRM; at the same time, construct the link variable information that may be joined in each rule according to the RDFS/OWL rules;

Step S2: generate the rule flag model according to the TRM and the link variable information;

Step S3: divide the link variables into single-variable and multi-variable forms, divide the RDFS/OWL rules into 5 types according to the TRM and the type of link variable, and design a different inference scheme for each type;

Step S4: for the rules with Flag_Rule_m = 1, execute parallel inference over the RDFS/OWL rules and output the intermediate results;

Step S5: delete the duplicate triples in the intermediate results;

Step S6: if the intermediate results contain new SchemaTriples, update the TRM and the rule flag model and return to step S4; otherwise, terminate.

Further, in step S1, the triple-related parameters are defined as follows:

Definition 1: A schema triple (SchemaTriple) is a triple whose subject, predicate, and object are all defined in the ontology file OntologyFile, that is:

where n denotes the total number of schema triples;

if ∀v ∈ {S_i, P_j, O_k}, v ∈ OntologyFile, then

(S_i, P_j, O_k) ∈ SchemaTriple (1)

Definition 2: An instance triple (InstanceTriple) is a triple in which at least one of the subject, predicate, and object is not defined in the ontology file OntologyFile but is a concrete instance, that is:

where n denotes the total number of instance triples;

if ∃v ∈ {S_i, P_j, O_k}, v ∉ OntologyFile, then

(S_i, P_j, O_k) ∈ InstanceTriple
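
Definitions 1 and 2 can be illustrated with a minimal Python sketch. The function name `classify_triple` and the toy set `ontology_terms` are assumptions made for illustration only; they are not part of the patent text.

```python
# Hypothetical helper: classify a triple per Definitions 1 and 2.
def classify_triple(triple, ontology_terms):
    """Return 'SchemaTriple' if every item of the triple is defined in the
    ontology, otherwise 'InstanceTriple' (at least one item is undefined)."""
    if all(item in ontology_terms for item in triple):
        return "SchemaTriple"
    return "InstanceTriple"

# Toy ontology vocabulary (assumed for illustration).
ontology_terms = {"Faculty", "Employee", "subClassOf", "teacherOf", "domain"}

schema_example = ("Faculty", "subClassOf", "Employee")
instance_example = ("Lecturer0", "teacherOf", "GraduateCourse5")

schema_kind = classify_triple(schema_example, ontology_terms)
instance_kind = classify_triple(instance_example, ontology_terms)
```

Here the first triple uses only ontology terms, while the second contains the instances Lecturer0 and GraduateCourse5.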

Definition 3: The triple type flag Flag_TripleType identifies whether a triple is a schema triple or an instance triple, and is defined as follows:

where n denotes the total number of triples; then,

The triple item type flag TripleItem_Flag identifies whether a triple item is the subject or the object of the triple, and is defined as follows:

where n denotes the total number of triples. If v ∈ {S_i, P_j, O_k}, then

Definition 4: The schema triple list SchemaList is used to obtain the set of schema triples that share the same predicate or the same object, and is defined as follows:

where n denotes the total number of schema triples; then,

SchemaList = O_m_list ∪ P_t_list

where O_m_list denotes the set of triples whose predicate satisfies P_j ∈ {rdf:type} and that share the same object, and is named after that object; P_t_list denotes the set of all triples that share the same predicate, and is named after that predicate. They are defined as follows:

O_m_list = {(S_i, P_j, O_k) | P_j ∈ {rdf:type} & k = m}

Further, in step S1, the following definitions apply when constructing the TRM:

Definition 5: Transitive closure of a directed graph. The transitive closure of a directed graph with n vertices can be defined as an n-th order Boolean matrix T = {t_ik} (1 ≤ i, k ≤ n): if there is a directed path from vertex i to vertex k, then t_ik = 1; otherwise t_ik = 0.

Definition 6: Transitive closure relation matrix (TRM). The TRM is constructed by combining Definition 4, Definition 5, and the Warshall algorithm. The TRM is the Boolean matrix T = {t_ik} of the transitive closure of a directed graph whose vertices are classes or properties and whose relations are the predicates of the triples.

The relation matrix of the SchemaTriples with predicate P_j is denoted P_j_TRM, where P_j ∈ {subClassOf, subPropertyOf, sameAs, equivalentClass, equivalentProperty}.

If (S_i, P_j, O_k) ∈ SchemaTriple, then P_j_TRM.t_ik = 1, indicating that the subject S_i of a SchemaTriple with predicate P_j has a direct relation to the object O_k; otherwise P_j_TRM.t_ik = 0.

The TRMs are divided into class relation matrices (CTRM) and property relation matrices (PTRM). For the CTRM, SC_TRM, EC_TRM, and SAC_TRM are defined for the predicates subClassOf, equivalentClass, and sameAs respectively; for the PTRM, SP_TRM, EP_TRM, and SAP_TRM are defined for the predicates subPropertyOf, equivalentProperty, and sameAs respectively. Because SchemaTriples with predicate sameAs contain both class relations and property relations, the two kinds of relation are separated when the TRMs are built, and SAC_TRM and SAP_TRM are constructed separately.

Constructing the TRM specifically comprises the following steps:

Step S11: using Definition 4, obtain from all SchemaTriples the schema triples whose predicate is subPropertyOf and those whose predicate is equivalentProperty; from the SchemaTriples obtained and Definition 6, obtain the Boolean matrices whose relation is subPropertyOf and equivalentProperty respectively;

Step S12: use the Warshall algorithm to compute all transitive relation values in the Boolean matrices whose relation is subPropertyOf and equivalentProperty, thereby generating the transitive closure relation matrices SP_TRM and EP_TRM and completing their construction.
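
Steps S11 and S12 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `build_trm`, the list-of-lists matrix layout, and the three-property example are hypothetical, and the patent does not prescribe this data structure.

```python
def build_trm(schema_triples, predicate):
    """Build the Boolean relation matrix for one predicate (step S11) and
    close it transitively with the Warshall algorithm (step S12)."""
    selected = [(s, o) for s, p, o in schema_triples if p == predicate]
    nodes = sorted({s for s, _ in selected} | {o for _, o in selected})
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)

    # Step S11: direct relations from the schema triples.
    trm = [[0] * n for _ in range(n)]
    for s, o in selected:
        trm[index[s]][index[o]] = 1

    # Step S12: Warshall transitive closure, O(n^3).
    for k in range(n):
        for i in range(n):
            if trm[i][k]:
                for j in range(n):
                    if trm[k][j]:
                        trm[i][j] = 1
    return nodes, index, trm

# Assumed example data: headOf < worksFor < memberOf.
schema_triples = [
    ("headOf", "subPropertyOf", "worksFor"),
    ("worksFor", "subPropertyOf", "memberOf"),
]
nodes, index, sp_trm = build_trm(schema_triples, "subPropertyOf")
head_is_member = sp_trm[index["headOf"]][index["memberOf"]]
```

After the closure, the entry for (headOf, memberOf) is set even though no schema triple states that relation directly.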

Further, in step S1, the following definition applies when constructing the link variable information:

Definition 7: A link variable (LinkVar) is a schema triple item used to join two antecedents in an RDFS/OWL rule; depending on the rule, the number of link variables is at least 1. The link variable information of each rule is stored as <key, value> pairs in table Rule_m_Table, where the key stores all schema triple items of the rule that are used to join antecedents, and the value stores the schema triple items of the rule's conclusion.

Further, step S3 is specifically as follows: the SPRM algorithm classifies the RDFS/OWL rules according to the TRM and the type of link variable. RDFS rules are referred to in the form RDFS-<rule number>, and OWL Horst rules in the form OWL-<rule number>. Each rule is assigned a rule name label, which is the name corresponding to that rule;

The five types of RDFS/OWL rules are as follows:

Type 1: rules that combine a SchemaTriple with a SchemaTriple; inference is performed directly using the TRM;

Type 2: rules that combine a SchemaTriple with an InstanceTriple; inference is performed according to the TRM, without link variable information;

Type 3: rules that combine a SchemaTriple with an InstanceTriple; the link variable information is single-variable, and inference requires the TRM;

Type 4: rules that combine a SchemaTriple with an InstanceTriple; the link variable information is single-variable, and inference does not require the TRM;

Type 5: rules that combine a SchemaTriple with an InstanceTriple; the link variable information is multi-variable;

Any rule is defined as follows: assume the m-th rule is expressed as:

Rule_m: C_m1, C_m2, …, C_mk, …, C_mn → Result

Definition 8: The rule flag Flag_Rule_m marks whether the rule is one that cannot be activated;

When Flag_Rule_m = 0, no data satisfying the conditions of the rule exists, and the rule need not be activated. Assume P_j is the predicate of some antecedent of Rule_m, where P_j ∈ {subClassOf, subPropertyOf, sameAs, equivalentClass, equivalentProperty}, and let the TRM corresponding to P_j be P_j_TRM;

Combining Definition 6 and Definition 7, the rule flag Flag_Rule_m is defined according to the type of the RDFS/OWL rule as follows:

1) If Rule_m ∈ {Type 1, Type 2}, then

2) If Rule_m ∈ {Type 3, Type 4, Type 5}, then

Further, in step S4, inference over the RDFS/OWL rules is completed in parallel using the MapReduce computing framework. Inference for the Type 1 rules is already completed in the TRM construction stage, so the parallel inference stage performs distributed parallel inference for Types 2, 3, 4, and 5, so that one round of inference over all rules is carried out in a single MapReduce job;

For rules of Types 2, 3, and 4, the inferred triples are obtained by parallel inference in the Map stage, based on the rule flag model and the link variable information. For rules of Type 5, the data that satisfy the inference conditions are output redundantly to ensure the correctness of the subsequent parallel inference; that is, the instance triples that satisfy the same rule are output under the same key, so that in the Reduce stage all antecedents of that rule can be joined on the link variable and the result output;

The parallel inference algorithm for RDFS/OWL rules in the Map stage is specifically:

Input: the key is the line number; the value is an instance triple

Output: the key is an inferred triple, or a combination of the rule flag and the link variable; the value is an arbitrary value or a labeled triple item;

The parallel inference algorithm for RDFS/OWL rules in the Reduce stage is specifically:

Input: the output of the phase that deletes duplicate triples

Output: the key is the output triple; the value is an arbitrary value.

Further, in step S5, instance triples are filtered by the link variable information and the rule flags, which reduces the transmission cost of large numbers of useless triples in the distributed system; constructing the transitive closure matrix reduces the number of inference iterations and thus improves inference efficiency; and duplicate triples are deleted from the inference results in real time, which further improves the efficiency of subsequent inference iterations;

Duplicate triples are deleted by running a combiner after each Map and by a deduplication pass after each round of inference,

where the combiner deletes the duplicate data output in the map stage, specifically:

Input: the output of the parallel inference algorithm for RDFS/OWL rules in the Map stage

The algorithm that deletes duplicate triples after each round of inference is specifically:

Input: the key is the line number; the value is a triple

Output: the deduplicated triples.

The distributed parallel inference proposed by the present invention first constructs the transitive closure relation matrix and the link variable information from the ontology file and the RDFS/OWL rules, and then generates the rule flags. It then classifies the RDFS/OWL rules according to the type of link variable, designs a different inference scheme for each class, and completes the RDFS/OWL rule inference in parallel using the MapReduce computing framework. Filtering instance triples by link variable information and rule flags reduces the transmission cost of large numbers of useless triples in the distributed system. Constructing the transitive closure matrix reduces the number of inference iterations and improves inference efficiency. Finally, duplicate triples are deleted from the inference results in real time, which further improves the efficiency of subsequent inference iterations. This scheme can carry out RDFS/OWL rule inference efficiently and correctly as the data volume grows.

Compared with the prior art, the present invention has the following advantages:

1. The present invention constructs the TRM, which reduces the number of inference iterations.

2. The present invention uses an efficient filtering mechanism, which reduces the cost of useless computation and network transmission.

3. The present invention improves inference efficiency.

Brief description of the drawings

Fig. 1 is the overall framework diagram of the SPRM algorithm of the present invention.

Fig. 2 shows the RDFS rules used in the present invention.

Fig. 3 shows the OWL Horst rules used in the present invention.

Fig. 4 is an example of constructing the TRM in the present invention.

Fig. 5 shows the link variable information constructed for the RDFS/OWL rules in the present invention.

Fig. 6 shows the concrete schema data used for the link variables in the present invention.

Fig. 7 shows the schema data in the link variable storage tables of the present invention.

Fig. 8 is the inference framework diagram of the SPRM algorithm of the present invention in the t-th iteration.

Detailed description of the embodiments

The present invention is further described below with reference to the accompanying drawings and embodiments.

This embodiment provides a distributed semantic parallel inference method for RDF data which, as shown in Fig. 1, comprises steps S1 to S6 set out in the Summary of the invention.

In this embodiment, the triple-related parameters of step S1 (Definitions 1 to 4), the definitions used when constructing the TRM (Definitions 5 and 6), the TRM construction steps S11 and S12, and the definition of link variables (Definition 7) are as given in the Summary of the invention. The link variable information of each rule is stored as <key, value> pairs in table Rule_m_Table.

In this embodiment, step S3 is specifically as follows: the SPRM algorithm classifies the RDFS/OWL rules according to the TRM and the type of link variable. RDFS rules are referred to in the form RDFS-<rule number>; for example, RDFS-1 denotes the 1st rule in Fig. 2. OWL Horst rules are referred to in the form OWL-<rule number>; for example, OWL-4 denotes the 4th rule in Fig. 3. Each rule is also assigned a rule name label, which is the name corresponding to that rule (for example, the rule name label of rule OWL-4 is OWL-4). The specific rule classification is as follows:

The five types of RDFS/OWL rules are as follows:

Type 1: rules that combine a SchemaTriple with a SchemaTriple; inference is performed directly using the TRM (rules RDFS-5 and RDFS-11 in Fig. 2; rules OWL-6, OWL-7, OWL-9, OWL-10, OWL-12a, OWL-12b, OWL-12c, OWL-13a, OWL-13b, and OWL-13c in Fig. 3).

Type 2: rules that combine a SchemaTriple with an InstanceTriple; inference is performed according to the TRM, without link variable information (rules RDFS-7 and RDFS-9 in Fig. 2; rule OWL-11 in Fig. 3).

Type 3: rules that combine a SchemaTriple with an InstanceTriple; the link variable information is single-variable, and inference requires the TRM (rules RDFS-2 and RDFS-3 in Fig. 2).

Type 4: rules that combine a SchemaTriple with an InstanceTriple; the link variable information is single-variable, and inference does not require the TRM (rules OWL-3, OWL-8a, OWL-8b, OWL-14a, and OWL-14b in Fig. 3).

Type 5: rules that combine a SchemaTriple with an InstanceTriple; the link variable information is multi-variable (rules OWL-1, OWL-2, OWL-4, OWL-15, and OWL-16 in Fig. 3).

RDFS rules 1, 4a, 4b, 6, 8, and 10 in Fig. 2 each have only one condition, so they do not affect the parallelization of the inference algorithm [9]; rules 12 and 13 in Fig. 2 seldom occur in RDF data and are not discussed in most reasoning work [5,7~12]. The RDFS rules considered here therefore exclude these rules. Likewise, OWL rules 5a and 5b in Fig. 3 do not affect the parallelization of inference, so the OWL rule inference described here does not consider these two OWL Horst rules.

In this embodiment, the above method is explained in detail below with reference to the drawings and examples:

In the TRM construction stage: the SPRM algorithm performs inference for the transitive rules (Type 1) by constructing TRMs, which reduces the number of inference iterations and speeds up the overall inference. According to Definition 1 and Definition 4, the SchemaTriples are obtained and loaded into memory. Then, using Definition 6, the TRM of each transitive rule is constructed, where the dimension of a CTRM is the total number of classes in the data set and the dimension of a PTRM is the total number of properties in the data set.

Taking the SchemaTriples whose predicate is subPropertyOf or equivalentProperty as an example, the construction of SP_TRM and EP_TRM in the PTRM is described below, and inference between SchemaTriples is performed using rule RDFS-5 in Fig. 2 and rules OWL-13a, OWL-13b, and OWL-13c in Fig. 3. The specific steps are:

Step S11: using Definition 4, obtain from all SchemaTriples the schema triples whose predicate is subPropertyOf and those whose predicate is equivalentProperty; from the SchemaTriples obtained and Definition 6, the Boolean matrices whose relation is subPropertyOf and equivalentProperty respectively can be obtained.

Step S12: the Warshall algorithm yields all transitive relation values in the Boolean matrices whose relation is subPropertyOf and equivalentProperty, thereby generating the transitive closure relation matrices SP_TRM and EP_TRM. These two steps complete the construction of SP_TRM and EP_TRM; the construction process is shown in Fig. 4.

In addition, the construction of SP_TRM shown in Fig. 4 carries out the inference of the transitive schema triple rule RDFS-5 in Fig. 2. Meanwhile, based on the generated SP_TRM, if SP_TRM.t_ik = SP_TRM.t_ki = 1, then rule OWL-13c in Fig. 3 generates a new schema triple with predicate equivalentProperty, which is used to construct EP_TRM. For EP_TRM, if EP_TRM.t_ik = 1, then rules OWL-13a and OWL-13b in Fig. 3 generate new schema triples with predicate subPropertyOf. Because the number of SchemaTriples is small, constructing the TRMs in memory quickly completes the inference of the Type 1 rules and reduces the number of MapReduce iterations in the RDFS/OWL rule inference stage.
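
The OWL-13c check on SP_TRM described above can be sketched as follows. The function name `equivalent_from_sp_trm` and the three-property matrix are hypothetical; the sketch only illustrates the symmetric-entry test and is not the patented implementation.

```python
def equivalent_from_sp_trm(nodes, sp_trm):
    """Emit (p, equivalentProperty, q) whenever p is a sub-property of q
    and q is a sub-property of p in the closed matrix (rule OWL-13c)."""
    derived = []
    n = len(nodes)
    for i in range(n):
        for k in range(n):
            # Skip the diagonal: a property is trivially related to itself.
            if i != k and sp_trm[i][k] == 1 and sp_trm[k][i] == 1:
                derived.append((nodes[i], "equivalentProperty", nodes[k]))
    return derived

# Assumed example: p and q are sub-properties of each other, both below r.
nodes = ["p", "q", "r"]
sp_trm = [
    [0, 1, 1],
    [1, 0, 1],
    [0, 0, 0],
]
new_schema_triples = equivalent_from_sp_trm(nodes, sp_trm)
```

The derived triples would then be fed into the Boolean matrix from which EP_TRM is built.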

In the stage of constructing the link variable information that may be joined: the antecedents of an RDFS/OWL rule must be joined through variables before new triples can be inferred. Since the transitive rules can be inferred through the TRMs, the SPRM algorithm builds link variable tables only for the non-transitive rules, based on the SchemaTriple data and the RDFS/OWL rules; these tables store the schema information that satisfies the antecedent variables of each rule. If the link variable table of a rule is empty, no schema triple in the data set satisfies the antecedents of that rule, so the rule can be judged as one that cannot be activated. A link variable that may be joined is a link variable that triggers a rule. A corresponding table (Rule_m_Table) is created for each RDFS/OWL rule and stored in HBase. The row key (key) of each table stores the variable of the rule that may be joined, and the column value (value) stores the variable that may be used in the triples the rule infers. The link variables constructed for the RDFS/OWL rules are shown in Fig. 5.

To illustrate how the link variables of the RDFS/OWL rules are constructed, 20 triples from the LUBM data set, with prefixes omitted, are used as an example; triples 1 to 11 are schema triples and triples 12 to 20 are instance triples, as shown in Fig. 6.

Using Definition 7 and the SchemaTriple data numbered 1 to 11 in Fig. 6, the concrete schema data in the link variable storage tables are obtained, as shown in Fig. 7. The rule flag of each rule then follows readily from Definition 8; in this example, for instance, rule RDFS-3 (the range rule) has Flag_Rule_RDFS-3 = 0.
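
As a rough illustration of the link variable tables and the resulting rule flags, the following Python sketch uses plain dictionaries in place of the HBase tables. The helper names, the three schema triples, and the rule that an empty table gives flag 0 are assumptions drawn from the description above; the actual contents of Fig. 6 and Fig. 7 are not reproduced here.

```python
def build_link_table(schema_triples, predicate):
    """Key: the schema item that instance triples are matched on.
    Value: the schema items usable in the inferred triple."""
    table = {}
    for s, p, o in schema_triples:
        if p == predicate:
            table.setdefault(s, []).append(o)
    return table

def rule_flag(table):
    """A rule whose link variable table is empty cannot be activated."""
    return 1 if table else 0

# Assumed schema fragment: a domain and an inverseOf axiom, but no range axiom.
schema_triples = [
    ("teacherOf", "domain", "Faculty"),
    ("degreeFrom", "inverseOf", "hasAlumnus"),
    ("Faculty", "subClassOf", "Employee"),
]

domain_table = build_link_table(schema_triples, "domain")   # for RDFS-2
range_table = build_link_table(schema_triples, "range")     # for RDFS-3

flag_rdfs2 = rule_flag(domain_table)
flag_rdfs3 = rule_flag(range_table)
```

With no range axiom in the assumed schema, the table for RDFS-3 stays empty and its flag is 0, matching the example in the text.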

In the parallel inference stage of the RDFS/OWL rules: since inference for the Type 1 rules is already completed in the TRM construction stage, the parallel inference stage mainly performs distributed parallel inference for Types 2, 3, 4, and 5, so that one round of inference over all rules is carried out in a single MapReduce job. For rules of Types 2 to 4, the inferred triples are obtained by parallel inference in the Map stage, based on the rule flag model and the link variable information. For rules of Type 5 (multi-variable), the data that satisfy the inference conditions must be output redundantly to ensure the correctness of the subsequent parallel inference; that is, the instance triples that satisfy the same rule are output under the same key, so that in the Reduce stage all antecedents of that rule can be joined on the link variable and the result output.

The specific steps of the parallel inference algorithm for RDFS/OWL rules are as follows:

Algorithm 1. Parallel inference algorithm for RDFS/OWL rules in the Map stage:

Input: the key is the line number; the value is an instance triple

Output: the key is an inferred triple, or a combination of the rule flag and the link variable; the value is an arbitrary value or a labeled triple item.

The specific algorithm is shown in Table 1:

Algorithm 2. Parallel inference algorithm for RDFS/OWL rules in the Reduce stage:

Input: the output of Algorithm 3

Output: the key is the output triple; the value is an arbitrary value

The specific algorithm is as follows:

The inference framework of the SPRM algorithm for the t-th iteration over the RDFS/OWL rules is shown in Fig. 8.

Since Type 1 rules are inferred directly in the TRM matrices, the inference of the other rule types is illustrated below by example, using the LUBM data fragment in Fig. 6; arbitrary-value outputs in the examples are written as "irrelevant".

Taking rule RDFS-7 in Fig. 2 (the subProperty instance rule) as an example of Type 2 inference: according to Definition 8, Flag_Rule_RDFS-7 = 1. From SP_TRM, all parent properties of headOf are the properties worksFor and memberOf. In Fig. 6, only (FullProfessor7, headOf, Department0) can activate rule RDFS-7, so the triples (FullProfessor7, worksFor, Department0) and (FullProfessor7, memberOf, Department0) are inferred. With irrelevant denoting an arbitrary value, the key-value output of the Map stage has the form:

<"ResultFlag"+(FullProfessor7, worksFor, Department0), irrelevant>

<"ResultFlag"+(FullProfessor7, memberOf, Department0), irrelevant>
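
The RDFS-7 example above can be mimicked with a small Map function. This is a sketch under assumptions: `SUPER_PROPERTIES` stands in for the rows of SP_TRM, the tuple key stands in for the concatenated "ResultFlag"+triple key, and the function name `map_rdfs7` is hypothetical.

```python
# Assumed stand-in for SP_TRM: each property mapped to all of its parents.
SUPER_PROPERTIES = {
    "headOf": ["worksFor", "memberOf"],
    "worksFor": ["memberOf"],
}

def map_rdfs7(triple):
    """Type 2 rule: replace the predicate by each parent property found in
    the transitive closure; no link variable table is consulted."""
    s, p, o = triple
    for parent in SUPER_PROPERTIES.get(p, []):
        yield ("ResultFlag", (s, parent, o)), "irrelevant"

rdfs7_output = list(map_rdfs7(("FullProfessor7", "headOf", "Department0")))
```

Because SP_TRM is already transitively closed, both parent properties are reached in a single Map pass rather than over two iterations.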

Taking rule RDFS-2 in Fig. 2 (the domain rule) as an example of Type 3 inference: according to Definition 8, Flag_Rule_RDFS-2 = 1. For the instance triples input from Fig. 6, the predicate of each triple is used as the row key (rowkey) to look up the column value (value) in Domain_Table; the instance triple (Lecturer0, teacherOf, GraduateCourse5) enters the inference of this rule. Then, from SC_TRM, the parent class of Faculty is Employee, so the triples (Lecturer0, type, Faculty) and (Lecturer0, type, Employee) are inferred. The key-value output of the Map stage is:

<"ResultFlag"+(Lecturer0, type, Faculty), irrelevant>

<"ResultFlag"+(Lecturer0, type, Employee), irrelevant>
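
A comparable sketch for the Type 3 case combines a link variable lookup with the class closure. `DOMAIN_TABLE` and `SUPER_CLASSES` are assumed stand-ins for Domain_Table and SC_TRM, with contents inferred from the example in the text; `map_rdfs2` is a hypothetical name.

```python
# Assumed stand-ins for Domain_Table (link variable table) and SC_TRM.
DOMAIN_TABLE = {"teacherOf": ["Faculty"]}
SUPER_CLASSES = {"Faculty": ["Employee"]}

def map_rdfs2(triple):
    """Type 3 rule: look up the predicate in the domain table, then expand
    the resulting class through the class transitive closure."""
    s, p, _o = triple
    for cls in DOMAIN_TABLE.get(p, []):
        yield ("ResultFlag", (s, "type", cls)), "irrelevant"
        for parent in SUPER_CLASSES.get(cls, []):
            yield ("ResultFlag", (s, "type", parent)), "irrelevant"

rdfs2_output = list(map_rdfs2(("Lecturer0", "teacherOf", "GraduateCourse5")))
```

Triples whose predicate has no row in the table produce no output, which is the filtering effect the method relies on.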

Taking rule OWL-8a in Fig. 3 (inverseOf) as an example of Type 4 inference: according to Definition 8, Flag_Rule_OWL-8a = 1. For the instance triples input from Fig. 6, the predicate of each triple is used as the row key (rowkey) to look up the column value (value) in Inverse_8a_Table. Only the instance triple (FullProfessor0, degreeFrom, University1) finds a column value for its predicate in Inverse_8a_Table, namely hasAlumnus, so the triple (University1, hasAlumnus, FullProfessor0) is inferred. The key-value output of the Map stage has the form:

<"ResultFlag"+(University1, hasAlumnus, FullProfessor0), irrelevant>
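
The Type 4 case needs only the link variable table. In the sketch below, `INVERSE_8A_TABLE` is an assumed in-memory stand-in for Inverse_8a_Table and `map_owl8a` is a hypothetical function name.

```python
# Assumed stand-in for Inverse_8a_Table: predicate -> inverse predicate.
INVERSE_8A_TABLE = {"degreeFrom": "hasAlumnus"}

def map_owl8a(triple):
    """Type 4 rule: swap subject and object and replace the predicate by
    its inverse; no TRM lookup is needed."""
    s, p, o = triple
    inverse = INVERSE_8A_TABLE.get(p)
    if inverse is not None:
        yield ("ResultFlag", (o, inverse, s)), "irrelevant"

owl8a_output = list(map_owl8a(("FullProfessor0", "degreeFrom", "University1")))
```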

Taking rule OWL-4 in Fig. 3 (transitiveProperty) as an example of Type 5 inference: according to Table 1, Flag_Rule_OWL-4 = 1. For the instance triples input from Fig. 6, the predicate of each triple is used as the row key (rowkey) to look up the column value (value) in TransitiveProperty_Table. The instance triples (ResearchGroup0, subOrganizationOf, Department0) and (Department0, subOrganizationOf, University0) both find the column value subOrganizationOf for their predicate in TransitiveProperty_Table. The key-value output of the Map stage is:

<OWL-4+subOrganizationOf+Department0, 0+ResearchGroup0>

<OWL-4+subOrganizationOf+Department0, 1+University0>

In the Reduce stage, the data whose key begins with OWL-4 are collected. For the values grouped under the same key, the label 0 or 1 indicates whether the triple item is the subject or the object of the inferred triple, and the final inferred triple is: (ResearchGroup0, subOrganizationOf, University0). The rule flags and their corresponding values are shown in Table 1.

Table 1. Rule flags and their corresponding values
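
The OWL-4 Map and Reduce steps described above amount to a join on the link variable. The sketch below simulates the shuffle with an in-memory dictionary; the function names, the tuple-shaped keys, and the choice to emit each matching triple under both of its possible join positions are assumptions for illustration, not the patented algorithm.

```python
from collections import defaultdict

# Assumed stand-in for TransitiveProperty_Table.
TRANSITIVE_PROPERTIES = {"subOrganizationOf"}

def map_owl4(triple):
    """Emit the triple under the key of each item it could be joined on:
    label 0 marks a candidate subject, label 1 a candidate object."""
    s, p, o = triple
    if p in TRANSITIVE_PROPERTIES:
        yield ("OWL-4", p, o), (0, s)   # s reaches the link variable o
        yield ("OWL-4", p, s), (1, o)   # the link variable s reaches o

def reduce_owl4(key, values):
    """Join candidate subjects with candidate objects on the link variable."""
    _rule, p, _link = key
    subjects = [item for label, item in values if label == 0]
    objects = [item for label, item in values if label == 1]
    for s in subjects:
        for o in objects:
            yield (s, p, o)

def run_owl4(triples):
    groups = defaultdict(list)          # simulated shuffle phase
    for t in triples:
        for k, v in map_owl4(t):
            groups[k].append(v)
    result = []
    for k, vs in groups.items():
        result.extend(reduce_owl4(k, vs))
    return result

owl4_result = run_owl4([
    ("ResearchGroup0", "subOrganizationOf", "Department0"),
    ("Department0", "subOrganizationOf", "University0"),
])
```

Only the group keyed on Department0 holds both a label-0 and a label-1 value, so it is the only one that produces a joined triple.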

Duplicate triple deletion stage: distributed inference generates a large number of duplicate triples. If the duplicates produced on the local node and in the current round of reasoning are not deleted, these identical triples will generate still more duplicate triples in later rounds and waste system resources. The algorithm therefore deletes duplicate triples by running a combiner after each map and by deduplicating after each round of reasoning, which reduces network overhead and improves overall reasoning efficiency.

Algorithm 3. Combiner: delete the duplicate data output by the map stage on the local node

Input: the output of the map function in Algorithm 1

The specific algorithm is as follows:

The method deletes duplicate triples with a single Map/Reduce job. The specific steps are as follows:

Algorithm 4. Duplicate triple deletion algorithm

map(key,value)

Input: key is the line number; value is a triple

Output: the deduplicated triples

The specific algorithm is as follows:

reduce(key,valueList)

Input: key is the key output by map; value is the corresponding valueList

Output: the deduplicated triples
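A minimal Python sketch of the deduplication job follows. It assumes that the map function emits the triple itself as the key, which the input/output descriptions suggest but do not state; the function names and the single-machine `run_dedup` driver are hypothetical.

```python
from collections import defaultdict


def dedup_map(line_number, triple):
    """Use the triple itself as the key so that equal triples meet in one reduce call."""
    return [(triple, None)]


def dedup_combine(pairs):
    """Combiner: drop duplicates among the pairs produced on a single node."""
    return list(dict.fromkeys(pairs))  # keeps the first occurrence of each pair


def dedup_reduce(key, value_list):
    """Emit each distinct triple exactly once, ignoring the values."""
    return key


def run_dedup(nodes):
    """nodes: one list of triples per node; returns the distinct triples."""
    groups = defaultdict(list)
    for node_triples in nodes:
        mapped = [kv for i, t in enumerate(node_triples) for kv in dedup_map(i, t)]
        for key, value in dedup_combine(mapped):  # local dedup before the shuffle
            groups[key].append(value)
    return [dedup_reduce(key, values) for key, values in groups.items()]


a = ("University1", "hasAlumnus", "FullProfessor0")
b = ("ResearchGroup0", "subOrganizationOf", "University0")
distinct = run_dedup([[a, a, b], [a]])
```

The combiner removes the repeated triple on the first node before anything is shuffled, and the reduce step removes the copy that arrives from the second node.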

In this embodiment, the algorithm is analyzed as follows. The complexity analysis of the SPRM algorithm differs from that of a centralized algorithm: the worst-case time complexity of SPRM is divided into the time complexity of the Map stage and that of the Reduce stage. Assume that the size of the data set is N, of which n are schema triples; in MapReduce, the degree of parallelism of the Map stage is k, the number of instance triples passed into the Reduce stage is m, and the degree of parallelism of the Reduce stage is t.

Because the SPRM algorithm classifies the RDFS/OWL rules and executes the reasoning of type 2 to type 5 rules in parallel only after the TRM has been built, its complexity is analyzed by rule type. The complexity of one reasoning iteration for each type is:

1) Complexity of type 1 rule reasoning: dominated by the Warshall algorithm used to build the TRM, which is O(n³)

2) Complexity of type 2 rule reasoning:

Time complexity of the Map stage: O(n²*N/k)

Time complexity of the Reduce stage: O(m/t)

3) Complexity of type 3 rule reasoning:

Time complexity of the Map stage: O(n*N/k)

Time complexity of the Reduce stage: O(m/t)

4) Complexity of type 4 rule reasoning:

Time complexity of the Map stage: O(N/k)

Time complexity of the Reduce stage: O(m/t)

5) Complexity of type 5 rule reasoning:

Time complexity of the Map stage: O(n*N/k)

Time complexity of the Reduce stage: O(m²/t)

In summary, the time complexity of one reasoning iteration of the SPRM algorithm is O(n³) + O(n²*N/k) + O(n*N/k) + O(m/t) + O(m²/t). Compared with the data set, the number n of schema triples is very small, so its magnitude can be regarded as constant; in addition, SPRM uses an efficient filtering mechanism to reduce m as much as possible, which lowers the overall time complexity.
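The O(n³) term for type 1 rules comes from the three nested loops of the Warshall algorithm. The following Python sketch shows the textbook form of that algorithm applied to a small Boolean matrix; the class names and the `direct` matrix are hypothetical sample data, and the patent's own TRM construction may differ in detail.

```python
def warshall(adjacency):
    """Return the transitive closure of a Boolean adjacency matrix in O(n^3) time."""
    n = len(adjacency)
    closure = [row[:] for row in adjacency]  # copy so the input is not modified
    for k in range(n):            # allow vertex k as an intermediate vertex
        for i in range(n):
            if closure[i][k]:     # only rows that can reach k can gain new entries
                for j in range(n):
                    if closure[k][j]:
                        closure[i][j] = 1
    return closure


# Hypothetical subClassOf chain: FullProfessor -> Professor -> Faculty
classes = ["FullProfessor", "Professor", "Faculty"]
direct = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
sc_trm = warshall(direct)
```

After the call, `sc_trm[0][2]` is 1, which records the derived relation from FullProfessor to Faculty without any further reasoning iteration.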

The foregoing is merely a preferred embodiment of the present invention; all equivalent changes and modifications made within the scope of the claims of the present invention shall be covered by the present invention.

Claims

1. A distributed semantic parallel inference method for RDF data, characterized by comprising the following steps:

Step S1: load the schema triples and construct the TRM; at the same time, according to the RDFS/OWL rules, construct the link variable information of the possible connections in each rule;

Step S3: divide the link variables into two forms, single-variable and multi-variable; according to the TRM and the type of link variable, divide the RDFS/OWL rules into 5 types and design a different inference scheme for each;

Step S4: for the rules with Flag_Rule_m = 1, execute the parallel inference of the RDFS/OWL rules and output the intermediate results;

Step S5: delete the duplicate triples in the intermediate results;

Step S6: if the intermediate results contain new SchemaTriples, update the TRM and the rule mark model and return to step S4; otherwise, terminate;

In step S1, the relevant parameters of the triples are defined as follows:

Where n denotes the total number of schema triples;

If v ∈ {S_i, P_j, O_k}, then,

(S_i, P_j, O_k) ∈ SchemaTriple (1)

Where n denotes the total number of instance triples;

If v ∈ {S_i, P_j, O_k}, then,

(S_i, P_j, O_k) ∈ InstanceTriple

Definition 3: the triple type mark Flag_TripleType identifies schema triples and instance triples, and is defined as follows:

Where n denotes the total number of triples; then,

The triple item type mark TripleItem_Flag identifies whether a triple item is the subject or the object, and is defined as follows:

Where n denotes the total number of triples; if v ∈ {S_i, P_j, O_k}, then

Definition 4: the schema triple list SchemaList is used to obtain the set of schema triples that share the same predicate or object, and is defined as follows:

Where n denotes the total number of schema triples; then,

SchemaList = O_m_list ∪ P_t_list

Where O_m_list denotes the set of all triples whose predicate satisfies P_j ∈ {rdf:type} and that share the same object, and is named after that object; P_t_list denotes the set of all triples that satisfy the predicate condition and share the same predicate, and is named after that predicate. They are defined as follows:

O_m_list = {(S_i, P_j, O_k) | P_j ∈ {rdf:type} & k = m}

In step S1, the following definitions are used when constructing the TRM:

Definition 5: transitive closure of a directed graph: the transitive closure of a directed graph with n vertices can be defined as an n-th order Boolean matrix T = {t_ik} (1 ≤ i, k ≤ n), where t_ik = 1 if there is a valid path from vertex i to vertex k, and t_ik = 0 otherwise;

Definition 6: the transitive closure relation matrix (Transitive closure relation matrix, TRM for short) is built by combining Definition 4, Definition 5 and the Warshall algorithm; the TRM represents, in the transitive closure of a directed graph, the Boolean matrix T = {t_ik} that takes classes or properties as vertices and the predicates of triples as relations;

If the SchemaTriple with predicate P_j has a direct relation from subject S_i to object O_k, then P_j_TRM.t_ik = 1; otherwise P_j_TRM.t_ik = 0;

The TRM is divided into the class relation matrix CTRM and the property relation matrix PTRM. For the predicates subClassOf, equivalentClass and sameAs, CTRM defines SC_TRM, EC_TRM and SAC_TRM respectively; for the predicates subPropertyOf, equivalentProperty and sameAs, PTRM defines SP_TRM, EP_TRM and SAP_TRM respectively. Because SchemaTriples with the predicate sameAs contain both class relations and property relations, the two kinds of relation are separated when the TRM is built, and SAC_TRM and SAP_TRM are constructed separately;

Building the TRM specifically comprises the following steps:

Step S12: use the Warshall algorithm to obtain all transitive relation values in the Boolean matrices whose relations are subPropertyOf and equivalentProperty, thereby generating the transitive closure relation matrices SP_TRM and EP_TRM and completing their construction;

In step S1, the following definition is used when constructing the link variable information:

Definition 7: the link variable LinkVar is the schema triple item used to connect two antecedents in an RDFS/OWL rule; according to the rule description, the number of link variables is greater than 1. The link variable information of each rule is stored in the table Rulem_Table in <key, value> form, where key stores all schema triple items used to connect the antecedents of the rule, and value stores the schema triple item of the conclusion part of the rule.

2. The distributed semantic parallel inference method for RDF data according to claim 1, characterized in that:

Step S3 is specifically: the SPRM algorithm classifies the RDFS/OWL rules according to the TRM and the type of link variable; RDFS rules are cited in the form "RDFS-" followed by the rule number, and OWL Horst rules in the form "OWL-" followed by the rule number; each rule is assigned a rule name label, which is the name corresponding to that rule;

The 5 types of RDFS/OWL rules are as follows:

Type 2: rules that combine SchemaTriples with InstanceTriples and reason according to the TRM, without using link variable information;

Type 3: rules that combine SchemaTriples with InstanceTriples, whose link variable information is single-variable and which need the TRM for reasoning;

Type 4: rules that combine SchemaTriples with InstanceTriples, whose link variable information is single-variable and which do not need the TRM for reasoning;

Any rule is defined as follows: suppose the m-th rule is expressed as:

Rule_m: C_m1, C_m2, …, C_mk, …, C_mn → Result

When Flag_Rule_m = 0, no data satisfy the conditions of the rule, so the rule does not need to be activated. Suppose P_j is the predicate of some antecedent in Rule_m, where P_j ∈ {subClassOf, subPropertyOf, sameAs, equivalentClass, equivalentProperty}, and the TRM corresponding to P_j is denoted P_j_TRM;

1) If Rule_m ∈ {type 1, type 2}, then

2) If Rule_m ∈ {type 3, type 4, type 5}, then

3. The distributed semantic parallel inference method for RDF data according to claim 1, characterized in that:

In step S4, the reasoning of the RDFS/OWL rules is completed in parallel with the MapReduce computing framework; the reasoning of type 1 rules is already completed in the TRM construction stage, and in the parallel inference stage of the RDFS/OWL rules, distributed parallel reasoning is performed for type 2, type 3, type 4 and type 5 rules, so that one round of reasoning over all rules is carried out in a single MapReduce job;

For rules of type 2, type 3 and type 4, the inferred triples are obtained by parallel reasoning in the Map stage, according to the rule mark model and the link variable information; for rules of type 5, the data that satisfy the inference conditions are output redundantly to ensure the correctness of the subsequent parallel reasoning, that is, the instance triples that satisfy the same rule are output under the same key, so that in the Reduce stage all antecedents of that rule can be joined on the link variable and the result output;

Input: key is the line number; value is an instance triple

Output: key is either an inferred triple or the combination of a rule mark and a link variable; value is either an arbitrary value or a tagged triple item;

The RDFS/OWL rule parallel reasoning algorithm in the Reduce stage is specifically:

Input: the output of the duplicate triple deletion stage

Output: key is the output triple; value is an arbitrary value.

4. The distributed semantic parallel inference method for RDF data according to claim 1, characterized in that:

In step S5, instance triples are filtered by the link variable information and the rule marks, which reduces the transmission cost of large numbers of useless triples in the distributed system; the number of reasoning iterations is reduced by constructing the transitive closure matrix, which improves reasoning efficiency; and duplicate triples are deleted in real time according to the reasoning results, which further improves the efficiency of subsequent reasoning iterations;

Input: key is the line number; value is a triple

Output: the deduplicated triples.