CN108763451A - Streaming RDF data parallel reasoning algorithm based on Spark Streaming - Google Patents

Info

Publication number
CN108763451A
Authority
CN
China
Prior art keywords
data
reasoning
rule
streaming
key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810521793.XA
Other languages
Chinese (zh)
Other versions
CN108763451B (en
Inventor
汪璟玢
陈晓曦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University
Priority to CN201810521793.XA
Publication of CN108763451A
Application granted
Publication of CN108763451B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G06N5/046 Forward inferencing; Production systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to a streaming RDF data parallel reasoning algorithm based on Spark Streaming. First, a rule link-variable relation table is built from the OWL Horst inference rules. In the iterative parallel reasoning stage, the new batch of data in the stream and the data generated by the previous round of reasoning are acquired at regular intervals as input; the input schema data and instance data are classified and stored in the corresponding Redis cluster. Then, according to the rule link-variable relation table, the rules that can be activated in this round are determined, and inferred data are generated in combination with the corresponding instance data. Finally, duplicates among the data generated in this round are deleted, the result is stored, and the current iteration ends. The invention reduces the number of MapReduce tasks and performs iterative reasoning over streaming data with Spark; the rule link-variable relation table stores both the incoming data and the new data generated during reasoning, which ensures the completeness of the algorithm; and the storage scheme for instance triples exploits the characteristics of Redis to trade space for time, enabling fast reads of instance data.

Description

Streaming RDF data parallel reasoning algorithm based on Spark Streaming
Technical field
The invention belongs to the technical field of reasoning over massive streaming RDF data, and specifically relates to a streaming RDF data parallel reasoning algorithm based on Spark Streaming.
Background art
Most existing inference methods based on OWL rules process static data sets of fixed size in a centralized manner. Because of the limitations of centralized processing, these algorithms are inefficient on massive real-time data. In response to this growing demand, many researchers have proposed their own RDF stream reasoning frameworks. Barbieri D F et al. [1] proposed an incremental reasoning algorithm over streams and rich background knowledge; it attaches an expiration time to each RDF triple and, when new stream data arrive, reasons over the new data while removing expired facts and deleting invalid triples. The IDRM algorithm [2] performs RDFS reasoning over incremental data efficiently and scalably. Chevalier J et al. [3] proposed an efficient incremental reasoner (Slider), which exploits intrinsic characteristics of semantic data streams to provide a scalable batch reasoner for streaming data. Ye Yixin et al. proposed PRAS [4], a streaming RDF data parallel reasoning algorithm on the Spark platform that uses a pseudo-bidirectional network.
The main goal of RDF stream reasoning is to store the received streaming data and reason over it. IDRM is modeled specifically for the RDFS rules, so it is inefficient for OWL Horst rule reasoning. Slider is designed only for the RDFS rules, so it is not suited to the more complex OWL Horst rules. PRAS reasons over streaming data through a pseudo-bidirectional network, but the communication cost of that network is high, so it is inefficient on large volumes of streaming data.
The streaming RDF data reasoning algorithm proposed by the present invention, built on the Spark platform, addresses exactly these two problems: storage of streaming data and reasoning over it. To ensure the completeness of the reasoning results, the design of the storage scheme for streaming data is the focus here. The invention stores large-scale RDF data in a Redis database cluster and, using the distributed stream computing framework Spark Streaming and the MapReduce-style computation modules of the platform, designs and implements a distributed parallel reasoning scheme for streaming RDF data. This addresses the problems that large volumes of streaming data cannot be reasoned over quickly and that the reasoning results are incomplete, and offers a useful reference for reasoning over massive data.
References:
[1] Barbieri D F, Braga D, Ceri S, et al. Incremental reasoning on streams and rich background knowledge[C]//Extended Semantic Web Conference. Springer Berlin Heidelberg, 2010: 1-15.
[2] Liu B, Wu L, Li J, et al. Exploiting Incremental Reasoning in Healthcare Based on Hadoop and Amazon Cloud[C]//Semantic Cities Workshop at AAAI Conference on Artificial Intelligence (AAAI'14). 2014.
[3] Chevalier J, Subercaze J, Gravier C, et al. Slider: an Efficient Incremental Reasoner[C]//Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 2015: 1081-1086.
[4] Ye Yixin, Wang Jingbin. Distributed parallel reasoning algorithm based on Spark[J]. Computer Systems & Applications, 2017, 26(05): 97-104.
Summary of the invention
The object of the present invention is to provide a streaming RDF data parallel reasoning algorithm based on Spark Streaming. The algorithm reduces the number of MapReduce tasks and performs iterative reasoning over streaming data with Spark; its rule link-variable relation table stores both the incoming data and the new data generated during reasoning, which ensures the completeness of the algorithm; and its storage scheme for instance triples exploits the characteristics of Redis to trade space for time, enabling fast reads of instance data.
To achieve the above object, the technical solution of the present invention is a streaming RDF data parallel reasoning algorithm based on Spark Streaming, comprising the following steps:
Step S1: build the rule link-variable relation table from the OWL Horst inference rules; in the iterative parallel reasoning stage, acquire at regular intervals the new batch of data in the stream together with the data generated by the previous round of reasoning as input data, classify the input schema data and instance data, and store them in the corresponding Redis cluster;
Step S2: according to the rule link-variable relation table, determine the rules that can be activated in this round, and generate inferred data in combination with the corresponding instance data;
Step S3: delete duplicates among the data generated in this round and store the result; the current iteration of reasoning then ends.
In an embodiment of the present invention, in step S1 the schema data are schema triple data and the instance data are instance triple data.
In an embodiment of the present invention, the instance triple data are stored in the Redis cluster as follows: following the characteristics of the Redis cluster, the <key, value> form is used, with the subject S, predicate P and object O of each triple each serving as a key, so that the triple is stored in three tables in the forms <S,(P,O)>, <P,(S,O)> and <O,(S,P)>.
In an embodiment of the present invention, the schema triple data are stored in the Redis cluster as follows: a corresponding table Rulem_Table is generated for each OWL Horst inference rule and stored in Redis, with the rule as the table name; by number of link variables, the rules fall into two classes: rules without link variables and rules with link variables;
Rules without link variables: stored in Redis with P as the key and <S,O> as the value;
Rules with link variables:
(1) for a rule with a single link variable, the entry is stored in Redis with P as the key, and the key takes only one value;
(2) for a complex rule with multiple link variables, the entry is stored in Redis with P as the key, and S and O are stored in the value as a map of the form <S,<O,0>>, <O,<S,1>>, where 0 indicates that the map key is the subject and 1 indicates that it is the object.
In an embodiment of the present invention, step S2 is implemented as follows:
Step S21: traverse the rule link-variable relation table and determine the rules that can be activated;
Step S22: for each rule that can be activated, if a conclusion can be inferred directly without instance triple data, skip to step S23; if instance triple data are needed, use the link variable of the instance triples required by the rule as the key and look up the corresponding instance triple data in the previously stored instance tables; if they are found, go to step S23, otherwise repeat the check of step S22; when all data have been processed, the algorithm ends;
Step S23: execute the current rule to obtain the inferred conclusion, output each generated triple <Si,Pj,Ok> to the set <Si,(Pj,Ok)>, and return to step S22.
In an embodiment of the present invention, step S3 is implemented as follows:
Step S31: receive the set of new triples generated by the reasoning of step S2; if the received data are empty, the algorithm ends;
Step S32: traverse the received set of new triples and remove the duplicates;
Step S33: store the deduplicated triple set in the Redis cluster under the set name itr_data, to be read by the next round of reasoning.
Compared with the prior art, the present invention has the following beneficial effects:
1. It reduces the number of MapReduce tasks and performs iterative reasoning over streaming data with Spark;
2. Its rule link-variable relation table stores both the incoming data and the new data generated during reasoning, which ensures the completeness of the algorithm;
3. Its storage scheme for instance triples exploits the characteristics of Redis to trade space for time, enabling fast reads of instance data.
Description of the drawings
Fig. 1 is the overall framework diagram of the algorithm of the present invention.
Fig. 2 shows the OWL Horst rules used by the present invention.
Detailed description of the embodiments
The technical solution of the present invention is described in detail below with reference to the accompanying drawings.
The present invention provides a streaming RDF data parallel reasoning algorithm based on Spark Streaming, comprising the following steps:
Step S1: build the rule link-variable relation table from the OWL Horst inference rules; in the iterative parallel reasoning stage, acquire at regular intervals the new batch of data in the stream together with the data generated by the previous round of reasoning as input data, classify the input schema data and instance data, and store them in the corresponding Redis cluster;
Step S2: according to the rule link-variable relation table, determine the rules that can be activated in this round, and generate inferred data in combination with the corresponding instance data;
Step S3: delete duplicates among the data generated in this round and store the result; the current iteration of reasoning then ends.
In step S1, the schema data are schema triple data and the instance data are instance triple data.
The instance triple data are stored in the Redis cluster as follows: following the characteristics of the Redis cluster, the <key, value> form is used, with the subject S, predicate P and object O of each triple each serving as a key, so that the triple is stored in three tables in the forms <S,(P,O)>, <P,(S,O)> and <O,(S,P)>. The schema triple data are stored in the Redis cluster as follows: a corresponding table Rulem_Table is generated for each OWL Horst inference rule and stored in Redis, with the rule as the table name; by number of link variables, the rules fall into two classes: rules without link variables and rules with link variables;
Rules without link variables: stored in Redis with P as the key and <S,O> as the value;
Rules with link variables:
(1) for a rule with a single link variable, the entry is stored in Redis with P as the key, and the key takes only one value;
(2) for a complex rule with multiple link variables, the entry is stored in Redis with P as the key, and S and O are stored in the value as a map of the form <S,<O,0>>, <O,<S,1>>, where 0 indicates that the map key is the subject and 1 indicates that it is the object.
Step S2 is implemented as follows:
Step S21: traverse the rule link-variable relation table and determine the rules that can be activated;
Step S22: for each rule that can be activated, if a conclusion can be inferred directly without instance triple data, skip to step S23; if instance triple data are needed, use the link variable of the instance triples required by the rule as the key and look up the corresponding instance triple data in the previously stored instance tables; if they are found, go to step S23, otherwise repeat the check of step S22; when all data have been processed, the algorithm ends;
Step S23: execute the current rule to obtain the inferred conclusion, output each generated triple <Si,Pj,Ok> to the set <Si,(Pj,Ok)>, and return to step S22.
Step S3 is implemented as follows:
Step S31: receive the set of new triples generated by the reasoning of step S2; if the received data are empty, the algorithm ends;
Step S32: traverse the received set of new triples and remove the duplicates;
Step S33: store the deduplicated triple set in the Redis cluster under the set name itr_data, to be read by the next round of reasoning.
The specific implementation process of the present invention is as follows.
The streaming RDF data parallel reasoning algorithm based on Spark Streaming of the present invention (the PSRH algorithm) consists of two stages: building the rule link-variable relation table, and iterative parallel reasoning, where iterative parallel reasoning comprises two parts, classification of streaming data and OWL Horst rule reasoning. The algorithm first builds the rule link-variable relation table from the OWL Horst inference rules. In the iterative parallel reasoning stage, the new batch of data in the stream and the data generated by the previous round of reasoning are acquired at regular intervals as input, and the input schema data and instance data are classified and stored in the corresponding Redis cluster. Then, according to the rule link-variable relation table, the rules that can be activated in this round are determined, and inferred data are generated in combination with the corresponding instance data. Finally, duplicates among the data generated in this round are deleted, the result is stored, and the current iteration ends. The overall framework of the PSRH algorithm is shown in Fig. 1; the algorithm proceeds as follows.
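The round structure described above (store, reason, deduplicate, feed the result into the next round) can be illustrated with a minimal Python sketch. It is not the patented implementation: a plain set stands in for the Redis cluster, a list of sets stands in for the Spark Streaming micro-batches, and a toy transitive rule over the hypothetical predicate ex:partOf stands in for the OWL Horst rules. Filtering derived triples against what is already stored is also an assumption of this sketch.

```python
def transitive_step(store, prop="ex:partOf"):
    """One round of a toy rule: (a p b), (b p c) => (a p c)."""
    edges = [(s, o) for s, p, o in store if p == prop]
    return {(a, prop, d) for a, b in edges for c, d in edges if b == c}

def run(batches, reason):
    """Process micro-batches; each round also consumes the previous round's output."""
    store, itr_data = set(), set()
    for new_data in batches:
        incoming = set(new_data) | itr_data   # S1: new batch plus previous inferences
        store |= incoming                     # S1: store (a set stands in for Redis)
        derived = reason(store)               # S2: fire the rules that can be activated
        itr_data = derived - store            # S3: drop duplicates, keep only new triples
    return store, itr_data

batches = [
    {("ex:a", "ex:partOf", "ex:b")},
    {("ex:b", "ex:partOf", "ex:c")},
    {("ex:c", "ex:partOf", "ex:d")},
    set(),  # an empty batch still lets pending inferences be stored
]
store, itr_data = run(batches, transitive_step)
```

In this sketch an inference made in one round only becomes visible to the rules in the next round, which mirrors the way the previous round's output is read back as input.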
1. RDF streaming data storage
Following the characteristics of the Redis cluster, the PSRH algorithm builds the reasoning model from the OWL Horst rules (shown in Fig. 2) and the RDF ontology data. Triple data are then obtained in real time through the Spark Streaming framework, and the schema triple data are separated from the instance triple data.
1.1 Instance triple storage design
Instance data are very large, and during reasoning the link variable used by a particular rule may be any of the subject, predicate or object of an instance triple, which lowers the efficiency of looking up instance triples. Following the characteristics of the Redis cluster, the algorithm uses the <key, value> form, with the subject, predicate and object of each triple each serving as a key, and stores the triple in three tables in the forms <S,(P,O)>, <P,(S,O)> and <O,(S,P)>. As a result, whichever of subject, predicate or object is used as the lookup key in the instance tables, the lookup time for instance triples is reduced to O(1) thanks to the key-based access of the Redis cluster, trading space for time.
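The three-table layout can be sketched in Python as follows. Plain dictionaries stand in for the Redis tables, which is an assumption of this sketch; the table names S_Table, P_Table and O_Table follow the inputs of Algorithm 2, and the example triples are hypothetical.

```python
from collections import defaultdict

# Dictionaries stand in for the three Redis tables (assumption of this sketch).
S_Table = defaultdict(set)  # S -> {(P, O)}
P_Table = defaultdict(set)  # P -> {(S, O)}
O_Table = defaultdict(set)  # O -> {(S, P)}

def store_instance_triple(s, p, o):
    """Write one instance triple into all three indexes (space traded for time)."""
    S_Table[s].add((p, o))
    P_Table[p].add((s, o))
    O_Table[o].add((s, p))

store_instance_triple("ex:alice", "ex:knows", "ex:bob")
store_instance_triple("ex:alice", "rdf:type", "ex:Person")

# Any position of the triple can now serve as the lookup key.
by_subject = S_Table["ex:alice"]
by_object = O_Table["ex:bob"]
```

Each triple is written three times, so storage roughly triples, but a rule can fetch its candidates by whichever term its link variable binds to without scanning.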
1.2 Schema triple storage design
Schema triples are stored in the rule link-variable relation table.
The antecedents of an OWL rule must be joined through variables before new triple data can be generated. Because the algorithm processes streams, schema triple data cannot be loaded all at once as they can for static data. The algorithm therefore builds a link-variable table from the schema triple data to record the rules whose antecedents have been partially satisfied during reasoning; when the next batch of new data enters a node, the unfinished reasoning can be continued from the contents of the table.
A corresponding table (Rulem_Table) is generated for each OWL rule and stored in Redis, with the rule as the table name. By number of link variables, the rules fall into two classes: rules without link variables and rules with link variables.
Different tables are constructed for the two classes:
(1) Rules without link variables
In the antecedent <S,P,O> of a rule without link variables, P mostly expresses a transitive, equivalence or inverse relationship, so the triple is stored in Redis with P as the key and <S,O> as the value. Taking OWL Horst rule 12a (v owl:equivalentClass w => v rdfs:subClassOf w) as an example, the storage is shown in Table 1-1:
Table 1-1 Storage table structure of OWL Horst rule 12a
Table name | Row key (key) | Column value (value)
Rule12a_Table | owl:equivalentClass | <v,w>
Likewise, the table structures for the other rules without link variables are given in Table 1-2.
Table 1-2 Storage table structure of OWL Horst rules 12b, 13a and 13b
Table name | Row key (key) | Column value (value)
Rule12b_Table | owl:equivalentClass | <v,w>
Rule13a_Table | owl:equivalentProperty | <v,w>
Rule13b_Table | owl:equivalentProperty | <v,w>
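A minimal Python sketch of these tables follows, with dictionaries standing in for Redis. The rule bodies are the standard OWL Horst forms of rules 12a, 12b, 13a and 13b; the helper names store_schema_triple and fire, and the example classes, are hypothetical.

```python
from collections import defaultdict

# rule name -> (antecedent predicate, derived predicate, swap subject and object?)
NO_LINK_RULES = {
    "Rule12a": ("owl:equivalentClass", "rdfs:subClassOf", False),
    "Rule12b": ("owl:equivalentClass", "rdfs:subClassOf", True),
    "Rule13a": ("owl:equivalentProperty", "rdfs:subPropertyOf", False),
    "Rule13b": ("owl:equivalentProperty", "rdfs:subPropertyOf", True),
}

# One table per rule: key P -> set of <S, O> values.
rule_tables = {name: defaultdict(set) for name in NO_LINK_RULES}

def store_schema_triple(s, p, o):
    """File a schema triple under every rule whose antecedent predicate matches."""
    for name, (trigger, _, _) in NO_LINK_RULES.items():
        if p == trigger:
            rule_tables[name][p].add((s, o))

def fire(name):
    """Apply one rule to everything stored in its table; no instance lookup is needed."""
    trigger, derived, swap = NO_LINK_RULES[name]
    pairs = rule_tables[name][trigger]
    return {(o, derived, s) if swap else (s, derived, o) for s, o in pairs}

store_schema_triple("ex:Car", "owl:equivalentClass", "ex:Automobile")
result_12a = fire("Rule12a")
result_12b = fire("Rule12b")
```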
(2) Rules with link variables
For a rule with a single link variable, the entry is stored in Redis with P as the key, and the key takes only one value. Taking OWL rule 3 (p rdf:type owl:SymmetricProperty, v p u => u p v) as an example, where the link variable is p, the storage is shown in Table 1-3:
Table 1-3 Storage table structure of OWL Horst rule 3
Table name | Row key (key) | Column value (value)
Rule3_Table | owl:SymmetricProperty | p
The example above indicates that the streaming data contain a triple whose link variable is p and whose type is SymmetricProperty.
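How such a single-link-variable entry drives a lookup can be sketched in Python as follows. Dictionaries stand in for Redis, the key and value follow Table 1-3, and the helper names and the ex:marriedTo data are hypothetical.

```python
from collections import defaultdict

rule3_table = defaultdict(set)  # "owl:SymmetricProperty" -> {p, ...}
p_table = defaultdict(set)      # predicate-keyed instance table: P -> {(S, O)}

def store_schema(s, p, o):
    """Record p as a link variable when (p rdf:type owl:SymmetricProperty) arrives."""
    if p == "rdf:type" and o == "owl:SymmetricProperty":
        rule3_table[o].add(s)

def apply_rule3():
    """For each stored link variable p, look up (v p u) by key and emit (u p v)."""
    derived = set()
    for p in rule3_table.get("owl:SymmetricProperty", ()):
        for v, u in p_table.get(p, ()):
            derived.add((u, p, v))
    return derived

store_schema("ex:marriedTo", "rdf:type", "owl:SymmetricProperty")
p_table["ex:marriedTo"].add(("ex:ann", "ex:bo"))
rule3_result = apply_rule3()
```

The link variable p read from the rule table is used directly as the key into the predicate-keyed instance table, so no scan over the instance data is needed.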
For a complex rule with multiple link variables, the nature of streaming data means that a single <key,value> pair cannot simply store all the schema triples the rule involves. So that no schema triple arriving in any batch is missed, and to simplify later joins, the P of each antecedent schema triple <S,P,O> is used as the key, and S and O are stored in the value as a map of the form <S,<O,0>> and <O,<S,1>>, where 0 indicates that the map key is the subject and 1 that it is the object. Storing the data this way ensures that, whether the link variable is a subject or an object, it can be found by key in O(1) time.
In addition, the table stores, in the form <LinkVar,<a,b,…>>, the link variables contained in the antecedents of the current rule whose schema part has been completely matched. During reasoning, the value under the key LinkVar can thus be looked up first; if the corresponding instance triples are found in the instance tables from the value of LinkVar, the result can be inferred directly. Taking OWL rule 16 (v owl:allValuesFrom u, v owl:onProperty p, w rdf:type v, w p x => x rdf:type u) as an example, where the link variables are v, p and w, the storage is shown in Table 1-4:
Table 1-4 Storage table structure of OWL Horst rule 16
If any of the four row keys in Rule16_Table has an empty value, the schema antecedents required by the rule are incomplete and the rule cannot fire; the time spent looking up instance data is then saved, which improves reasoning efficiency.
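The map layout and the emptiness check can be sketched in Python as follows. Dictionaries stand in for Redis, and the helper names are hypothetical. The sketch checks only the two schema predicates of rule 16 as required keys, because the four row keys of Table 1-4 are not listed in the text; that choice is an assumption.

```python
from collections import defaultdict

def new_rule_table():
    # key P -> map {term: {(other_term, flag)}}
    # flag 0: the map key is the subject; flag 1: the map key is the object
    return defaultdict(lambda: defaultdict(set))

def store_pattern(table, s, p, o):
    """Index a schema triple under P from both its subject and its object."""
    table[p][s].add((o, 0))
    table[p][o].add((s, 1))

def can_activate(table, required_keys):
    """A rule can fire only if every required key already holds a value."""
    return all(table.get(k) for k in required_keys)

# Assumed required keys for rule 16 (its two schema antecedents).
RULE16_KEYS = ("owl:allValuesFrom", "owl:onProperty")

rule16 = new_rule_table()
store_pattern(rule16, "ex:R", "owl:allValuesFrom", "ex:Dog")
ready_before = can_activate(rule16, RULE16_KEYS)   # onProperty not seen yet
store_pattern(rule16, "ex:R", "owl:onProperty", "ex:hasPet")
ready_after = can_activate(rule16, RULE16_KEYS)
```

Because the check touches only the rule table, an incomplete rule is rejected before any instance table is read.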
The table structures of the other rules with link variables are given in Tables 1-5 to 1-14.
Table 1-5 Storage table structure of OWL Horst rule 1
Table name | Row key (key) | Column value (value)
Rule1_Table | owl:FunctionalProperty | p
Table 1-6 Storage table structure of OWL Horst rule 2
Table name | Row key (key) | Column value (value)
Rule2_Table | owl:InverseFunctionalProperty | p
Table 1-7 Storage table structure of OWL Horst rule 4
Table name | Row key (key) | Column value (value)
Rule4_Table | owl:TransitiveProperty | p
Table 1-8 Storage table structure of OWL Horst rule 8a
Table name | Row key (key) | Column value (value)
Rule8a_Table | p | q
Table 1-9 Storage table structure of OWL Horst rule 8b
Table name | Row key (key) | Column value (value)
Rule8b_Table | p | q
Table 1-10 Storage table structure of OWL Horst rule 12c
Table name (Rulem_Table) | Row key (key) | Column value (value)
Rule12c_Table | v | w
Table 1-11 Storage table structure of OWL Horst rule 13c
Table name | Row key (key) | Column value (value)
Rule13c_Table | v | w
Table 1-12 Storage table structure of OWL Horst rule 14a
Table 1-13 Storage table structure of OWL Horst rule 14b
Table 1-14 Storage table structure of OWL Horst rule 15
1.3 Storage design for owl:sameAs-related triples
For triples whose predicate is owl:sameAs, the subject (or object) related by owl:sameAs may be either schema data or instance data, so this section designs separate rule link-variable relation tables for the rules that involve owl:sameAs; examples are given in Tables 1-15 to 1-19:
Table 1-15 Storage table structure of OWL Horst rule 6
Table name | Row key (key) | Column value (value)
Rule6_Table | owl:sameAs | <v,w>
Table 1-16 Storage table structure of OWL Horst rule 7
Table name | Row key (key) | Column value (value)
Rule7_Table | v | w
Table 1-17 Storage table structure of OWL Horst rule 9
Table 1-18 Storage table structure of OWL Horst rule 10
Table 1-19 Storage table structure of OWL Horst rule 11
1.4 Implementation of streaming data storage
This stage classifies and stores the data in parallel. The batch of stream data new_data and the data itr_data generated by the previous round of reasoning are acquired at regular intervals. Each triple in new_data or itr_data is then examined: an instance triple is stored directly according to the design in 1.1; a schema triple is matched against its corresponding inference rules and stored according to the rule link-variable relation table design in 1.2, so that all schema triples are kept.
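The classification step can be sketched in Python as follows. The text does not state the test that separates schema triples from instance triples, so the predicate and type lists below are assumptions chosen to cover the OWL Horst vocabulary, and the function names are hypothetical.

```python
# Assumed schema vocabulary; the exact test used by the algorithm is not given in the text.
SCHEMA_PREDICATES = {
    "rdfs:subClassOf", "rdfs:subPropertyOf", "rdfs:domain", "rdfs:range",
    "owl:equivalentClass", "owl:equivalentProperty", "owl:inverseOf",
    "owl:onProperty", "owl:someValuesFrom", "owl:allValuesFrom", "owl:hasValue",
}
SCHEMA_TYPES = {
    "owl:FunctionalProperty", "owl:InverseFunctionalProperty",
    "owl:SymmetricProperty", "owl:TransitiveProperty",
}

def is_schema(triple):
    """Treat a triple as schema data if it uses schema vocabulary."""
    s, p, o = triple
    return p in SCHEMA_PREDICATES or (p == "rdf:type" and o in SCHEMA_TYPES)

def classify(new_data, itr_data):
    """Split the round's input (new batch plus previous inferences) into two lists."""
    schema, instance = [], []
    for triple in list(new_data) + list(itr_data):
        (schema if is_schema(triple) else instance).append(triple)
    return schema, instance

new_data = [
    ("ex:hasPet", "rdf:type", "owl:SymmetricProperty"),
    ("ex:tom", "ex:hasPet", "ex:rex"),
]
itr_data = [("ex:Car", "rdfs:subClassOf", "ex:Vehicle")]
schema, instance = classify(new_data, itr_data)
```

In a Spark Streaming job the same predicate could be applied as a filter over each micro-batch; that wiring is left out here to keep the sketch self-contained.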
Algorithm 1 Parallel data storage algorithm ParallelStoreForHorst
Input: streaming triple data (new_data), new triple data generated by the previous round of reasoning (itr_data)
Output: none
Taking rule 6 from 1.3 (v owl:sameAs w => w owl:sameAs v) and rule 16 from 1.2 (v owl:allValuesFrom u, v owl:onProperty p, w rdf:type v, w p x => x rdf:type u) as examples, the pseudocode of this stage is as follows:
Because the generation of LinkVar depends on the definition of each specific rule, the LinkVar acquisition algorithms for the rules that need it are given below:
LinkVar acquisition algorithm for rule 11
LinkVar acquisition algorithm for rule 14a
LinkVar acquisition algorithm for rule 15
LinkVar acquisition algorithm for rule 16
2. Parallel reasoning stage
2.1 Map stage: data reasoning
The Map stage performs the data reasoning in the following steps:
Step1. Traverse the rule link-variable relation table and determine which rules can be activated.
Step2. For each rule that can be activated, if its antecedents allow a conclusion to be inferred directly without instance triples, skip to Step3; if instance triples are needed, use the link variable of the instance triples required by the rule as the key and look up the corresponding instance triples in the previously stored instance tables; if they are found, go to Step3, otherwise repeat the check of Step2. When all data have been processed, the algorithm ends.
Step3. Execute the current rule to obtain the inferred conclusion, output each generated triple <Si,Pj,Ok> to the set <Si,(Pj,Ok)>, and return to Step2.
Algorithm 2 Data reasoning algorithm ParallelReasoningForHorst
Input: rule link-variable relation tables (Rulei_Table), instance triple tables (S_Table, P_Table, O_Table)
Output: new triples generated by reasoning
The overall code of the algorithm is described as follows:
Taking rule 6 in 1.3 (v owl:sameAs w => w owl:sameAs v) as an example, the pseudocode is as follows:
The above is the reasoning pseudocode for a rule without link variables. Next, a rule with link variables is described, taking rule 16 in 1.2 (v owl:allValuesFrom u, v owl:onProperty p, w rdf:type v, w p x => x rdf:type u) as an example; the pseudocode is as follows (in the pseudocode, s16 is defined as an object of Set_16):
For rules with multiple link variables such as rule 16, storing the schema triples in <key,value> form in the Redis cluster allows the link variables to be matched quickly; the associated instance triples are found through the values of the link variables using the instance triple storage scheme in Redis. Because a lookup by key in Redis takes O(1) time, reasoning efficiency improves considerably.
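A minimal Python sketch of rule 16 reasoning over these structures follows. It is an illustration under assumptions, not the pseudocode of Algorithm 2: dictionaries stand in for the Redis tables, the rule table uses the map layout of 1.2 (flag 0 marks entries keyed by the subject), and the function name and example data are hypothetical.

```python
def apply_rule16(rule16, s_table, o_table):
    """v allValuesFrom u, v onProperty p, w rdf:type v, w p x  =>  x rdf:type u."""
    derived = set()
    on_property = rule16.get("owl:onProperty", {})
    for v, entries in rule16.get("owl:allValuesFrom", {}).items():
        us = {u for u, flag in entries if flag == 0}          # v is the subject
        ps = {p for p, flag in on_property.get(v, ()) if flag == 0}
        if not us or not ps:
            continue                                          # schema part incomplete for v
        for w, pred in o_table.get(v, ()):                    # instances w with w rdf:type v
            if pred != "rdf:type":
                continue
            for p, x in s_table.get(w, ()):                   # triples w p x
                if p in ps:
                    derived.update((x, "rdf:type", u) for u in us)
    return derived

rule16 = {
    "owl:allValuesFrom": {"ex:R": {("ex:Dog", 0)}, "ex:Dog": {("ex:R", 1)}},
    "owl:onProperty": {"ex:R": {("ex:hasPet", 0)}, "ex:hasPet": {("ex:R", 1)}},
}
s_table = {"ex:tom": {("rdf:type", "ex:R"), ("ex:hasPet", "ex:rex")}}
o_table = {"ex:R": {("ex:tom", "rdf:type")}, "ex:rex": {("ex:tom", "ex:hasPet")}}
rule16_result = apply_rule16(rule16, s_table, o_table)
```

Every step is a lookup by key: v selects its restriction and property from the rule table, v then selects the instances w from the object-keyed table, and w selects the candidate triples from the subject-keyed table.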
2.2 Reduce stage: deduplication and storage
The Reduce stage saves the data generated by reasoning. The triples generated by reasoning are stored in the Redis cluster in a set named "itr_data", duplicate triples are removed, and the "itr_data" set then serves as part of the input data for the next round of reasoning. The data deduplication and storage algorithm proceeds as follows:
Step1. Receive the set of new triples generated by reasoning in the Map stage (including SchemaTriple and InstanceTriple); if the received data are empty, the algorithm ends;
Step2. Traverse the received set of new triples and remove the duplicates;
Step3. Store the deduplicated triple set in the Redis cluster under the set name itr_data, to be read by the next round of reasoning.
Algorithm 3 Reduce algorithm (DuplicateRemovalForHorst)
Input: set <Si,(Pj,Ok)>
Output: itr_data
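The deduplication step can be sketched in Python as follows. The function name and the stored argument are hypothetical, and filtering against triples that are already stored is an assumption of this sketch; the text only states that repeated triples are removed.

```python
def duplicate_removal(derived, stored):
    """Return the derived triples with duplicates removed, in first-seen order."""
    seen = set()
    itr_data = []
    for triple in derived:
        if triple in seen or triple in stored:
            continue            # repeated in this round, or already known
        seen.add(triple)
        itr_data.append(triple)
    return itr_data

stored = {("ex:a", "owl:sameAs", "ex:b")}
derived = [
    ("ex:b", "owl:sameAs", "ex:a"),
    ("ex:b", "owl:sameAs", "ex:a"),   # duplicate produced by another task
    ("ex:a", "owl:sameAs", "ex:b"),   # already stored
]
itr_data = duplicate_removal(derived, stored)
```

When itr_data comes back empty, the next round receives no inferred input, which corresponds to the termination condition in Step1.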
The algorithm of the present invention reduces the number of MapReduce tasks and performs iterative reasoning over streaming data with Spark; its rule link-variable relation table stores both the incoming data and the new data generated during reasoning, which ensures the completeness of the algorithm; and its storage scheme for instance triples exploits the characteristics of Redis to trade space for time, enabling fast reads of instance data.
The above are preferred embodiments of the present invention. Any change made according to the technical solution of the present invention falls within the scope of protection of the present invention, provided that the resulting function does not go beyond the scope of the technical solution.

Claims (6)

1. a kind of streaming RDF data parallel reasoning algorithm based on Spark Streaming, which is characterized in that including walking as follows Suddenly:
Step S1, in conjunction with OWL Horst inference rules, corresponding regular link variable relation table is built;In Iterative Parallel reasoning The data that batch new data in stage timing acquisition Streaming data flow and previous reasoning generate as input data, Mode data and instance data to input sort out processing and store arriving corresponding Redis clusters;
Step S2, according to regular link variable relation table, the rule that this reasoning can activate is judged, in conjunction with corresponding instance number According to generation inference data;
Step S3, the duplicate data and storage, current iteration reasoning for deleting this reasoning generation terminate.
2. the streaming RDF data parallel reasoning algorithm according to claim 1 based on Spark Streaming, feature It is, in step S1, the mode data, that is, pattern triple data, the instance data, that is, example triple data.
3. the streaming RDF data parallel reasoning algorithm according to claim 2 based on Spark Streaming, feature It is, the example triple data storage is to the mode in Redis clusters:The characteristics of according to Redis clusters, uses< Key, value>Form, using in triple subject S, predicate P, object O as key, i.e., respectively with< S, (P,O) >,< P, (S,O) >With<O, (S,P) >Form there are in three tables.
4. The streaming RDF data parallel reasoning algorithm based on Spark Streaming according to claim 3, characterized in that the pattern triple data are stored in the Redis cluster as follows: for each rule in the OWL Horst inference rules, a corresponding table Rulem_Table is generated and stored in Redis; with the rule as the table name, the rules are divided into two classes according to the number of link variables in each rule: rules without link variables and rules with link variables;
Rules without link variables: stored in Redis with P as the key and <S, O> as the value;
Rules with link variables:
(1) For a rule with a single link variable, stored in Redis with P as the key, so that there is only one key value;
(2) For a complex rule with multiple link variables, stored in Redis with P as the key, and with S and O stored in the value as maps of the form <S, <O, 0>> and <O, <S, 1>>, where 0 indicates that the map key is the subject and 1 indicates that the map key is the object.
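A rough sketch of the per-rule pattern tables follows. It rests on several assumptions: dictionaries stand in for Redis, the function names are hypothetical, the value stored for rules with zero or one link variable is taken to be a set of (S, O) pairs (the claim does not spell out the single-variable value), and map entries hold sets so that several triples can share a subject or object.

```python
from collections import defaultdict


def new_rule_tables():
    """Table name -> predicate -> stored value."""
    return defaultdict(dict)


def store_pattern_triple(tables, m, link_vars, s, p, o):
    """Store pattern triple (s, p, o) in the table of rule number `m`.

    `link_vars` is the number of link variables of that rule.
    """
    table = tables[f"Rule{m}_Table"]
    if link_vars <= 1:
        # Zero or one link variable: P is the key, the (S, O) pair is the value.
        table.setdefault(p, set()).add((s, o))
    else:
        # Several link variables: index the value by both S and O.
        value = table.setdefault(p, {})
        value.setdefault(s, set()).add((o, 0))  # flag 0: map key is the subject
        value.setdefault(o, set()).add((s, 1))  # flag 1: map key is the object
```

With the two-sided map, a rule that joins on either end of a pattern triple can look up its partner directly, and the 0/1 flag records which end the lookup key came from.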
5. The streaming RDF data parallel reasoning algorithm based on Spark Streaming according to claim 4, characterized in that step S2 is specifically implemented as follows:
Step S21: Traverse the rule link variable relation table and determine which rules can be activated;
Step S22: For each rule that can be activated, if a conclusion can be inferred directly without instance triple data, skip to step S23; if instance triple data are needed, use the link variable of the instance triple data required by the rule as the key to look up the corresponding instance triple data in the previously stored instance tables; if the corresponding instance triple data are found, proceed to step S23, otherwise repeat the check of step S22; once all data have been processed, terminate the algorithm;
Step S23: Execute the reasoning of the current rule to obtain the inferred conclusion, output each generated triple <Si, Pj, Ok> to the set <Si, (Pj, Ok)>, and return to step S22.
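Steps S21 to S23 can be sketched for two familiar rules: class-membership inheritance (needs instance data) and subclass transitivity (pattern data only). Everything here is an illustrative assumption rather than the patented method: the `RELATION_TABLE` layout, the table names `Rule9_Table` and `Rule11_Table`, the function names, and the use of dictionaries in place of Redis. The "repeat step S22" loop is modelled simply as iteration over the stored pattern pairs.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Hypothetical relation table: one entry per rule.
RELATION_TABLE = [
    {"table": "Rule11_Table", "predicate": SUBCLASS, "needs_instances": False},
    {"table": "Rule9_Table", "predicate": RDF_TYPE, "needs_instances": True},
]


def activated_rules(relation_table, pattern_tables):
    """S21: a rule can fire only if its pattern table holds at least one pair."""
    return [r for r in relation_table if pattern_tables.get(r["table"])]


def apply_rules(relation_table, pattern_tables, instances_by_object):
    """S22/S23: fire every activated rule and collect Si -> {(Pj, Ok)}."""
    output = defaultdict(set)
    for rule in activated_rules(relation_table, pattern_tables):
        pairs = pattern_tables[rule["table"]]  # set of (C, D): C subClassOf D
        for (c, d) in pairs:
            if rule["needs_instances"]:
                # S22: the link variable C is the key into the object-keyed table.
                for (s, p) in instances_by_object.get(c, ()):
                    if p == rule["predicate"]:
                        output[s].add((p, d))  # S23: s rdf:type D
            else:
                # S22 -> S23: conclusion from pattern data alone (transitivity).
                for (d2, e) in pairs:
                    if d2 == d:
                        output[c].add((rule["predicate"], e))
    return output
```

Keying the output by subject mirrors the <Si, (Pj, Ok)> form of step S23, so the results can be written straight back into a subject-keyed table.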
6. The streaming RDF data parallel reasoning algorithm based on Spark Streaming according to claim 5, characterized in that step S3 is specifically implemented as follows:
Step S31: Receive the set of new triples generated by the reasoning of step S2; if the received data are empty, terminate the algorithm;
Step S32: Traverse the received set of new triples and remove the repeated triples;
Step S33: Store the deduplicated triple set in the Redis cluster under the set name itr_data, to be read by the next round of reasoning.
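Steps S31 to S33 reduce to a short routine, sketched here under stated assumptions: a dictionary stands in for the Redis cluster, the function name `dedupe_and_store` is hypothetical, and only the set name `itr_data` comes from the claim.

```python
def dedupe_and_store(new_triples, store, set_name="itr_data"):
    """Deduplicate the triples of one round and store them for the next round.

    Returns the deduplicated list, or an empty list if nothing was received.
    """
    # S31: nothing received, so the algorithm ends without touching the store.
    if not new_triples:
        return []
    # S32: traverse the received triples and keep the first copy of each.
    seen = set()
    unique = []
    for triple in new_triples:
        if triple not in seen:
            seen.add(triple)
            unique.append(triple)
    # S33: store under the agreed set name for the next round to read.
    store[set_name] = unique
    return unique
```

An empty return value doubles as the stop signal of step S31: the caller can end the iteration as soon as a round yields nothing.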
CN201810521793.XA 2018-05-28 2018-05-28 Streaming RDF data parallel reasoning algorithm based on Spark Streaming Active CN108763451B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810521793.XA CN108763451B (en) 2018-05-28 2018-05-28 Streaming RDF data parallel reasoning algorithm based on Spark Streaming

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810521793.XA CN108763451B (en) 2018-05-28 2018-05-28 Streaming RDF data parallel reasoning algorithm based on Spark Streaming

Publications (2)

Publication Number Publication Date
CN108763451A (en) 2018-11-06
CN108763451B (en) 2022-03-11

Family

ID=64006259

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810521793.XA Active CN108763451B (en) 2018-05-28 2018-05-28 Streaming RDF data parallel reasoning algorithm based on Spark Streaming

Country Status (1)

Country Link
CN (1) CN108763451B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120272142A1 (en) * 1999-10-28 2012-10-25 Augme Technologies, Inc. System and method for adding targeted content in a web page
CN104778277A (en) * 2015-04-30 2015-07-15 福州大学 RDF data distributed storage and query method based on Redis
CN105912721A (en) * 2016-05-05 2016-08-31 福州大学 RDF data distributed semantic parallel reasoning method
CN106874425A (en) * 2017-01-23 2017-06-20 福州大学 Real-time keyword approximate search algorithm based on Storm
CN106980901A (en) * 2017-04-15 2017-07-25 福州大学 Streaming RDF data parallel reasoning algorithm

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
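Rong Gu et al.: "Cichlid: Efficient Large Scale RDFS/OWL Reasoning with Spark", 2015 IEEE 29th International Parallel and Distributed Processing Symposium *
Ye Yixin et al.: "Distributed parallel reasoning algorithm based on Spark", Computer Systems & Applications (计算机系统应用) *
Zhao Huihan et al.: "Parallel reasoning algorithm for OWL semantic rules based on Spark", Application Research of Computers (计算机应用研究) *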

Also Published As

Publication number Publication date
CN108763451B (en) 2022-03-11

Similar Documents

Publication Publication Date Title
Li et al. Algebraic approach to dynamics of multivalued networks
Šikšnys et al. Aggregating and disaggregating flexibility objects
CN102073700B (en) Discovery method of complex network community
Liang et al. Express supervision system based on NodeJS and MongoDB
CN106383830B (en) Data retrieval method and equipment
Taloba et al. Developing an efficient spectral clustering algorithm on large scale graphs in spark
Bhardwaj et al. How does topology influence gradient propagation and model performance of deep networks with densenet-type skip connections?
Gao et al. GraphNAS++: Distributed architecture search for graph neural networks
Singh et al. Performance Measure of Similis and FPGrowth Algorithm
Chen et al. Neighborhood convolutional graph neural network
Wang et al. An asynchronous distributed-memory optimization solver for two-stage stochastic programming problems
Liang et al. A novel combined model based on VMD and IMODA for wind speed forecasting
CN108763451A (en) Streaming RDF data parallel reasoning algorithm based on Spark Streaming
CN115001978B (en) Cloud tenant virtual network intelligent mapping method based on reinforcement learning model
Wei et al. Fixed-time passivity of coupled quaternion-valued neural networks with multiple delayed couplings
CN106980901B (en) Streaming RDF data parallel reasoning algorithm
Zhang et al. A new sequential prediction framework with spatial-temporal embedding
Zhiqiang et al. Entity alignment method for power data knowledge graph of semantic and structural information
Xu et al. Graph Neural Ordinary Differential Equations-based method for Collaborative Filtering
Zhao et al. A web service composition method based on merging genetic algorithm and ant colony algorithm
CN111177188A (en) Rapid massive time sequence data processing method based on aggregation edge and time sequence aggregation edge
Wen et al. Parallel naïve Bayes regression model-based collaborative filtering recommendation algorithm and its realisation on Hadoop for big data
Xia et al. Carpooling algorithm with the common departure
Farhan Answering Shortest Path Distance Queries in Large Complex Networks
CN104063516B (en) Based on the social networks rubbish filtering method that distributed matrix characteristics of decomposition is extracted

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant