CN108763451A - Streaming RDF data parallel reasoning algorithm based on Spark Streaming - Google Patents
Streaming RDF data parallel reasoning algorithm based on Spark Streaming
- Publication number
- CN108763451A (application CN201810521793.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- reasoning
- rule
- streaming
- key
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
- G06N5/046—Forward inferencing; Production systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The present invention relates to a streaming RDF data parallel reasoning algorithm based on Spark Streaming. First, a rule link variable relation table is built from the OWL Horst inference rules. In the iterative parallel reasoning stage, each batch of new data in the stream and the data generated by the previous round of reasoning are acquired at regular intervals as input; the schema data and instance data in the input are classified and stored in the corresponding Redis clusters. Then, according to the rule link variable relation table, the rules that can be activated in this round are determined, and inferred data are generated by combining them with the corresponding instance data. Finally, duplicates among the data generated in this round are deleted, the result is stored, and the current iteration ends. The invention reduces the number of MapReduce tasks and performs iterative reasoning over streaming data with Spark; the rule link variable relation table stores the data and the new data generated during reasoning, which ensures the completeness of the algorithm; and the storage scheme for instance triples exploits the characteristics of Redis to trade space for time, giving fast reads of instance data.
Description
Technical field
The invention belongs to the field of reasoning over massive streaming RDF data, and specifically relates to a streaming RDF data parallel reasoning algorithm based on Spark Streaming.
Background
Most existing OWL rule-based inference methods process static data sets of fixed size in a centralized manner. Because of the limits of centralized processing, these algorithms are inefficient when handling massive real-time data. In response to this growing demand, many researchers have proposed their own RDF stream reasoning frameworks. Barbieri D F et al. [1] proposed an incremental reasoning algorithm over streams and rich background knowledge: the algorithm attaches an expiration time to each RDF triple, and when new streaming data arrive it reasons over the new data, clears expired facts, and deletes invalid triples. The IDRM algorithm [2] performs RDFS reasoning over incremental data efficiently and scalably. Chevalier J et al. [3] proposed an efficient incremental reasoner (Slider), which exploits the intrinsic characteristics of semantic data streams to provide a scalable batch reasoner for streaming data. Ye Yixin et al. proposed PRAS [4], a parallel reasoning algorithm for streaming RDF data on the Spark platform that uses a pseudo-bidirectional network.
The main goal of RDF stream reasoning is to store the received streaming data and reason over it. IDRM is modelled specifically on the RDFS rules, so it is inefficient for OWL Horst rule reasoning. Slider is likewise designed only for the RDFS rules and is not suited to the more complex OWL Horst rules. PRAS reasons over streaming data through its pseudo-bidirectional network, but the communication overhead of that network is large, so it is inefficient on large volumes of streaming data.
The streaming RDF data reasoning algorithm proposed by the present invention, built on the Spark platform, addresses exactly these two problems: storing streaming data and reasoning over it. To guarantee the completeness of the reasoning results, the design of the storage scheme for streaming data is the focus. The invention stores large-scale RDF data in a Redis database cluster and, using the MapReduce computation module of the distributed stream computing framework Spark Streaming, designs and implements a distributed parallel reasoning scheme for streaming RDF data. This addresses both the inability to reason quickly over large volumes of streaming data and the incompleteness of the reasoning results, and offers a useful reference for reasoning over massive data.
References:
[1] Barbieri D F, Braga D, Ceri S, et al. Incremental reasoning on streams and rich background knowledge[C]//Extended Semantic Web Conference. Springer Berlin Heidelberg, 2010: 1-15.
[2] Liu B, Wu L, Li J, et al. Exploiting Incremental Reasoning in Healthcare Based on Hadoop and Amazon Cloud[C]//Semantic Cities Workshop at AAAI Conference on Artificial Intelligence (AAAI'14). 2014.
[3] Chevalier J, Subercaze J, Gravier C, et al. Slider: an Efficient Incremental Reasoner[C]//Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 2015: 1081-1086.
[4] Ye Yixin, Wang Jingbin. Distributed parallel reasoning algorithm based on Spark[J]. Computer Systems & Applications, 2017, 26(05): 97-104.
Summary of the invention
The purpose of the present invention is to provide a streaming RDF data parallel reasoning algorithm based on Spark Streaming. The algorithm reduces the number of MapReduce tasks and performs iterative reasoning over streaming data with Spark; its rule link variable relation table stores the data and the new data generated during reasoning, which ensures the completeness of the algorithm; and its storage scheme for instance triples exploits the characteristics of Redis to trade space for time, giving fast reads of instance data.
To achieve the above purpose, the technical solution of the present invention is a streaming RDF data parallel reasoning algorithm based on Spark Streaming, comprising the following steps:
Step S1: build the rule link variable relation table from the OWL Horst inference rules; in the iterative parallel reasoning stage, acquire at regular intervals each batch of new data in the stream together with the data generated by the previous round of reasoning as input, classify the schema data and instance data in the input, and store them in the corresponding Redis clusters;
Step S2: according to the rule link variable relation table, determine the rules that can be activated in this round, and generate inferred data by combining them with the corresponding instance data;
Step S3: delete duplicates among the data generated in this round and store the result; the current iteration then ends.
In an embodiment of the present invention, in step S1 the schema data are schema triple data and the instance data are instance triple data.
In an embodiment of the present invention, the instance triple data are stored in the Redis cluster as follows: following the key-value model of Redis, <key, value> pairs are used in which the subject S, predicate P, and object O of a triple each serve as the key, so that the triple is stored in three tables in the forms <S,(P,O)>, <P,(S,O)>, and <O,(S,P)>.
In an embodiment of the present invention, the schema triple data are stored in the Redis cluster as follows: each rule of the OWL Horst inference rules gets a corresponding table Rulem_Table stored in Redis, named after the rule. The rules fall into two classes according to their number of link variables: rules without link variables and rules with link variables.
Rules without link variables: stored in Redis with P as the key and <S,O> as the value.
Rules with link variables:
(1) for a rule with a single link variable, P is the key, and the key has only one value;
(2) for a complex rule with multiple link variables, P is the key, and S and O are stored in the value as the maps <S,<O,0>> and <O,<S,1>>, where 0 indicates that the map key is the subject and 1 that it is the object.
In an embodiment of the present invention, step S2 is implemented as follows:
Step S21: traverse the rule link variable relation table and determine which rules can be activated;
Step S22: for each rule that can be activated, if its conclusion can be derived directly without instance triple data, go to step S23; if instance triple data are needed, use the link variable of the instance triple that the rule requires as the key and look up the matching instance triples in the previously stored instance tables; if matching instance triples are found, go to step S23, otherwise repeat the check in step S22; when all data have been processed, the algorithm ends;
Step S23: apply the current rule to obtain the conclusion, output each inferred triple <Si,Pj,Ok> to the set <Si,(Pj,Ok)>, and return to step S22.
In an embodiment of the present invention, step S3 is implemented as follows:
Step S31: receive the set of new triples inferred in step S2; if the received data are empty, the algorithm ends;
Step S32: traverse the received set of new triples and remove the duplicates;
Step S33: store the deduplicated set of triples in the Redis cluster under the set name itr_data, to be read by the next round of reasoning.
Compared with the prior art, the invention has the following advantages:
1. It reduces the number of MapReduce tasks and performs iterative reasoning over streaming data with Spark.
2. Its rule link variable relation table stores the data and the new data generated during reasoning, which ensures the completeness of the algorithm.
3. Its storage scheme for instance triples exploits the characteristics of Redis to trade space for time, giving fast reads of instance data.
Description of the drawings
Fig. 1 is the overall framework diagram of the algorithm of the invention.
Fig. 2 shows the OWL Horst rules used by the invention.
Detailed description
The technical solution of the present invention is described in detail below with reference to the drawings.
The present invention provides a streaming RDF data parallel reasoning algorithm based on Spark Streaming, comprising the following steps:
Step S1: build the rule link variable relation table from the OWL Horst inference rules; in the iterative parallel reasoning stage, acquire at regular intervals each batch of new data in the stream together with the data generated by the previous round of reasoning as input, classify the schema data and instance data in the input, and store them in the corresponding Redis clusters;
Step S2: according to the rule link variable relation table, determine the rules that can be activated in this round, and generate inferred data by combining them with the corresponding instance data;
Step S3: delete duplicates among the data generated in this round and store the result; the current iteration then ends.
In step S1, the schema data are schema triple data and the instance data are instance triple data.
The instance triple data are stored in the Redis cluster as follows: following the key-value model of Redis, <key, value> pairs are used in which the subject S, predicate P, and object O of a triple each serve as the key, so that the triple is stored in three tables in the forms <S,(P,O)>, <P,(S,O)>, and <O,(S,P)>. The schema triple data are stored in the Redis cluster as follows: each rule of the OWL Horst inference rules gets a corresponding table Rulem_Table stored in Redis, named after the rule. The rules fall into two classes according to their number of link variables: rules without link variables and rules with link variables.
Rules without link variables: stored in Redis with P as the key and <S,O> as the value.
Rules with link variables:
(1) for a rule with a single link variable, P is the key, and the key has only one value;
(2) for a complex rule with multiple link variables, P is the key, and S and O are stored in the value as the maps <S,<O,0>> and <O,<S,1>>, where 0 indicates that the map key is the subject and 1 that it is the object.
Step S2 is implemented as follows:
Step S21: traverse the rule link variable relation table and determine which rules can be activated;
Step S22: for each rule that can be activated, if its conclusion can be derived directly without instance triple data, go to step S23; if instance triple data are needed, use the link variable of the instance triple that the rule requires as the key and look up the matching instance triples in the previously stored instance tables; if matching instance triples are found, go to step S23, otherwise repeat the check in step S22; when all data have been processed, the algorithm ends;
Step S23: apply the current rule to obtain the conclusion, output each inferred triple <Si,Pj,Ok> to the set <Si,(Pj,Ok)>, and return to step S22.
Step S3 is implemented as follows:
Step S31: receive the set of new triples inferred in step S2; if the received data are empty, the algorithm ends;
Step S32: traverse the received set of new triples and remove the duplicates;
Step S33: store the deduplicated set of triples in the Redis cluster under the set name itr_data, to be read by the next round of reasoning.
The specific implementation of the invention is described below.
The streaming RDF data parallel reasoning algorithm based on Spark Streaming of the present invention (the PSRH algorithm) has two main stages: building the rule link variable relation table, and iterative parallel reasoning, where the latter comprises classification of the streaming data and OWL Horst rule reasoning. The algorithm first builds the rule link variable relation table from the OWL Horst inference rules. In the iterative parallel reasoning stage, it acquires at regular intervals each batch of new data in the stream together with the data generated by the previous round of reasoning as input, classifies the schema data and instance data in the input, and stores them in the corresponding Redis clusters. Then, according to the rule link variable relation table, it determines the rules that can be activated in this round and generates inferred data by combining them with the corresponding instance data. Finally, it deletes duplicates among the data generated in this round and stores the result, and the current iteration ends. The overall framework of the PSRH algorithm is shown in Fig. 1, and the procedure is detailed below.
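The iteration described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions, not the patent's implementation: the names `run_iterations` and `symmetric_rule` are hypothetical, a plain Python set stands in for the Redis cluster, a list of batches stands in for the Spark Streaming source, and subtracting already-known triples from the output is an assumed way of keeping only new data for the next round.

```python
def run_iterations(batches, infer):
    """Feed each batch plus the previous round's output through one reasoning round."""
    known = set()      # stand-in for the triples stored in Redis (assumption)
    itr_data = set()   # data generated by the previous round of reasoning
    for new_data in batches:
        pending = set(new_data) | itr_data  # input = new batch + previous output
        known |= pending                    # step S1: classify and store (simplified)
        produced = infer(known)             # step S2: apply the activated rules
        itr_data = set(produced) - known    # step S3: deduplicate, keep only new triples
    return known, itr_data


def symmetric_rule(triples):
    """Toy rule used only for this example: ex:marriedTo is treated as symmetric."""
    return {(o, p, s) for (s, p, o) in triples if p == "ex:marriedTo"}


# The second, empty batch lets the output of round one be consumed in round two.
known, leftover = run_iterations([[("ex:a", "ex:marriedTo", "ex:b")], []], symmetric_rule)
```

In this toy run the triple inferred in the first round is stored in the second round, after which no new data remain.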
1. RDF stream data storage
Following the characteristics of Redis clusters, the PSRH algorithm builds its inference model from the OWL Horst rules (shown in Fig. 2) and the RDF ontology data. Triple data are then obtained in real time through the Spark Streaming framework, and the schema triples are separated from the instance triples.
1.1 Instance triple storage design
Instance data are very large, and during reasoning the link variable used by a given rule may be any of the subject, predicate, or object of an instance triple, which lowers the query efficiency for instance triples. Following the key-value model of Redis, the algorithm therefore uses <key, value> pairs in which the subject, predicate, and object of a triple each serve as the key, storing the triple in three tables in the forms <S,(P,O)>, <P,(S,O)>, and <O,(S,P)>. In this way, whichever of the subject, predicate, or object is used to search the instance tables, the lookup by the corresponding key takes O(1) time thanks to Redis, trading space for time.
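The three-table layout can be sketched in Python as follows. This is a minimal illustration that assumes in-memory dictionaries in place of the Redis cluster; the class name `InstanceStore` and its method names are hypothetical, while S_Table, P_Table, and O_Table follow the table names used later in the text.

```python
from collections import defaultdict


class InstanceStore:
    """In-memory stand-in for the three instance tables S_Table, P_Table, O_Table."""

    def __init__(self):
        self.s_table = defaultdict(set)  # S -> {(P, O)}
        self.p_table = defaultdict(set)  # P -> {(S, O)}
        self.o_table = defaultdict(set)  # O -> {(S, P)}

    def add(self, s, p, o):
        # Every triple is written three times: extra space buys O(1) lookups.
        self.s_table[s].add((p, o))
        self.p_table[p].add((s, o))
        self.o_table[o].add((s, p))

    def by_subject(self, s):
        return self.s_table.get(s, set())

    def by_predicate(self, p):
        return self.p_table.get(p, set())

    def by_object(self, o):
        return self.o_table.get(o, set())


store = InstanceStore()
store.add("ex:alice", "ex:knows", "ex:bob")
store.add("ex:alice", "rdf:type", "ex:Person")
```

Whichever position a rule's link variable occupies, one dictionary lookup returns the remaining two terms of every matching triple.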
1.2 Schema triple storage design
Schema triples are stored in the rule link variable relation table.
The antecedents of an OWL rule must be joined through shared variables before a new triple can be produced. Because this is a stream processing algorithm, the schema triples cannot all be loaded at once as they would be for static data. The algorithm therefore builds the link variable tables from the schema triples as they arrive, recording the rules whose antecedents have been partially satisfied during reasoning, so that when new data next arrive at a node the unfinished reasoning can be continued from the contents of the table.
Each OWL rule gets a corresponding table (Rulem_Table) stored in Redis, named after the rule. The rules fall into two classes according to their number of link variables: rules without link variables and rules with link variables.
A different table is built for each class:
(1) Rules without link variables
The P in the antecedent <S,P,O> of a rule without link variables mostly expresses a transitive, equivalence, or inverse relation, so the triple is stored in Redis with P as the key and <S,O> as the value. Taking OWL Horst rule 12a (v owl:equivalentClass w => v rdfs:subClassOf w) as an example, the stored form is shown in Table 1-1:
Table 1-1 Storage table structure of OWL Horst rule 12a
Table name | Row key (key) | Column value (value) |
---|---|---|
Rule12a_Table | owl:equivalentClass | <v,w> |
The table structures of the other rules without link variables are given in Table 1-2.
Table 1-2 Storage table structures of OWL Horst rules 12b, 13a, 13b
Table name | Row key (key) | Column value (value) |
---|---|---|
Rule12b_Table | owl:equivalentClass | <v,w> |
Rule13a_Table | owl:equivalentProperty | <v,w> |
Rule13b_Table | owl:equivalentProperty | <v,w> |
(2) Rules with link variables
For a rule with a single link variable, the schema triple is stored in Redis with P as the key, and the key has only one value. Taking OWL rule 3 (p rdf:type owl:SymmetricProperty, v p u => u p v) as an example, the link variable is p, and the stored form is shown in Table 1-3:
Table 1-3 Storage table structure of OWL Horst rule 3
Table name | Row key (key) | Column value (value) |
---|---|---|
Rule3_Table | owl:SymmetricProperty | p |
The example above indicates that the streaming data contain a triple whose link variable is p and whose type is SymmetricProperty.
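How a single-link-variable table of this kind might drive reasoning can be sketched in Python. This is an illustrative sketch only: the function name `apply_rule3` is hypothetical, plain dictionaries stand in for Rule3_Table and for the predicate-keyed instance table, and the table value is modelled as a set of properties.

```python
def apply_rule3(rule3_table, p_table):
    """p rdf:type owl:SymmetricProperty, v p u => u p v."""
    inferred = set()
    # The link variable p is read from the rule table ...
    for p in rule3_table.get("owl:SymmetricProperty", ()):
        # ... and used as the key into the predicate-keyed instance table <P,(S,O)>.
        for (v, u) in p_table.get(p, ()):
            inferred.add((u, p, v))
    return inferred


rule3_table = {"owl:SymmetricProperty": {"ex:marriedTo"}}
p_table = {
    "ex:marriedTo": {("ex:a", "ex:b")},
    "ex:parentOf": {("ex:a", "ex:c")},
}
inferred = apply_rule3(rule3_table, p_table)
```

Only instance triples whose predicate appears in the rule table are touched, so unrelated predicates such as ex:parentOf cost nothing.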
For a complex rule with multiple link variables, the nature of streaming data means that a single <key,value> pair cannot simply hold all the schema triples involved in the rule. So as not to miss the schema triples arriving on any data stream, and to simplify later joins, the P of each antecedent schema triple <S,P,O> is used as the key, and S and O are stored in the value as the maps <S,<O,0>> and <O,<S,1>>, where 0 indicates that the map key is the subject and 1 that it is the object. Storing both directions ensures that, whether the link variable is the subject or the object, it can be found by key in O(1) time.
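The two-way map layout can be sketched in Python as follows. It is a minimal illustration that assumes nested in-memory dictionaries in place of a Redis table; the helper names `new_rule_table` and `store_schema_triple` are hypothetical.

```python
from collections import defaultdict

SUBJECT, OBJECT = 0, 1  # flag stored with the partner term: is the map key S or O?


def new_rule_table():
    # predicate -> {map key -> set of (partner term, flag)}
    return defaultdict(lambda: defaultdict(set))


def store_schema_triple(table, s, p, o):
    table[p][s].add((o, SUBJECT))  # <S,<O,0>>: the map key s is the subject
    table[p][o].add((s, OBJECT))   # <O,<S,1>>: the map key o is the object


rule16 = new_rule_table()
store_schema_triple(rule16, "ex:R", "owl:allValuesFrom", "ex:Dog")
store_schema_triple(rule16, "ex:R", "owl:onProperty", "ex:hasPet")
```

With both directions stored, a join on ex:R (a subject) and a join on ex:hasPet (an object) are each a single dictionary lookup.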
Meanwhile in table with<LinkVar,<a,b,…>>The current rule of form storage in matched completion pattern
The link variable that former piece is included.Thus it is the corresponding values of LinkVar that key can be preferentially searched in reasoning process, if root
Corresponding example triple is found from example table according to the value of LinkVar, then can go out result with immediate reasoning.With the rule of OWL
16(v owl:allValuesFrom u,v owl:onProperty p,w rdf:Type v, wp x=>x rdf:type u)
For, wherein link variable is v, p and w, storage example such as table 1-4:
The storage table structure of table 1-4 OWL Horst rules 16
If it is sky that four line units in Rule16_Table, which have the value of some, then illustrate needed for the rule
The pattern former piece wanted is incomplete, this rule can not make inferences, and can save the inference time for searching instance data, to
Improve Reasoning Efficiency.
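The empty-value check can be sketched as a small Python predicate. The contents of Table 1-4 are not reproduced in this text, so the row keys used below are assumptions chosen for illustration, and `can_activate` is a hypothetical name.

```python
def can_activate(rule_table, required_keys):
    """Return True only if every required row key is present with a non-empty value."""
    return all(rule_table.get(key) for key in required_keys)


# Hypothetical row keys for rule 16; the real Table 1-4 may use different ones.
required = ["owl:allValuesFrom", "owl:onProperty"]

partial = {"owl:allValuesFrom": {"ex:R": {("ex:Dog", 0)}}}
complete = {**partial, "owl:onProperty": {"ex:R": {("ex:hasPet", 0)}}}
emptied = {**complete, "owl:onProperty": {}}
```

A rule that fails this check is skipped before any instance table is queried, which is where the time saving comes from.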
The table structures of the other rules with link variables are given in Tables 1-5 to 1-14.
Table 1-5 Storage table structure of OWL Horst rule 1
Table name | Row key (key) | Column value (value) |
---|---|---|
Rule1_Table | owl:FunctionalProperty | p |
Table 1-6 Storage table structure of OWL Horst rule 2
Table name | Row key (key) | Column value (value) |
---|---|---|
Rule2_Table | owl:InverseFunctionalProperty | p |
Table 1-7 Storage table structure of OWL Horst rule 4
Table name | Row key (key) | Column value (value) |
---|---|---|
Rule4_Table | owl:TransitiveProperty | p |
Table 1-8 Storage table structure of OWL Horst rule 8a
Table name | Row key (key) | Column value (value) |
---|---|---|
Rule8a_Table | p | q |
Table 1-9 Storage table structure of OWL Horst rule 8b
Table name | Row key (key) | Column value (value) |
---|---|---|
Rule8b_Table | p | q |
Table 1-10 Storage table structure of OWL Horst rule 12c
Table name (Rulem_Table) | Row key (key) | Column value (value) |
---|---|---|
Rule12c_Table | v | w |
Table 1-11 Storage table structure of OWL Horst rule 13c
Table name | Row key (key) | Column value (value) |
---|---|---|
Rule13c_Table | v | w |
Table 1-12 Storage table structure of OWL Horst rule 14a
Table 1-13 Storage table structure of OWL Horst rule 14b
Table 1-14 Storage table structure of OWL Horst rule 15
1.3 Storage design for owl:sameAs-related triples
For triples whose predicate is owl:sameAs, the subject (or object) related by owl:sameAs may be either schema data or instance data, so this section designs separate rule link variable relation tables for the rules that involve owl:sameAs; examples are given in Tables 1-15 to 1-19:
Table 1-15 Storage table structure of OWL Horst rule 6
Table name | Row key (key) | Column value (value) |
---|---|---|
Rule6_Table | owl:sameAs | <v,w> |
Table 1-16 Storage table structure of OWL Horst rule 7
Table name | Row key (key) | Column value (value) |
---|---|---|
Rule7_Table | v | w |
Table 1-17 Storage table structure of OWL Horst rule 9
Table 1-18 Storage table structure of OWL Horst rule 10
Table 1-19 Storage table structure of OWL Horst rule 11
1.4 Implementation of streaming data storage
This stage classifies and stores the data in parallel. At regular intervals it acquires the batch of streaming data new_data from the stream and the data itr_data generated by the previous round of reasoning. It then examines each triple in new_data or itr_data: an instance triple is stored directly according to the design in 1.1; a schema triple is matched to its corresponding inference rules, and all schema triples are stored according to the rule link variable relation table design in 1.2.
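The classification step can be sketched in Python. The text does not say how a schema triple is recognised, so the test in `is_schema_triple` (a fixed list of OWL predicates and property classes) is an assumption, as are the function names; dictionaries stand in for the Redis tables, and the schema side is simplified to one table keyed by predicate or property class.

```python
from collections import defaultdict

# Assumption: these predicates and classes mark a triple as schema data.
SCHEMA_PREDICATES = {
    "owl:equivalentClass", "owl:equivalentProperty", "owl:inverseOf",
    "owl:allValuesFrom", "owl:someValuesFrom", "owl:hasValue", "owl:onProperty",
}
SCHEMA_CLASSES = {
    "owl:FunctionalProperty", "owl:InverseFunctionalProperty",
    "owl:SymmetricProperty", "owl:TransitiveProperty",
}


def is_schema_triple(s, p, o):
    return p in SCHEMA_PREDICATES or (p == "rdf:type" and o in SCHEMA_CLASSES)


def classify_and_store(new_data, itr_data):
    schema = defaultdict(set)
    instance = {name: defaultdict(set) for name in ("S_Table", "P_Table", "O_Table")}
    for s, p, o in list(new_data) + list(itr_data):
        if is_schema_triple(s, p, o):
            if p == "rdf:type":
                schema[o].add(s)       # e.g. owl:SymmetricProperty -> p
            else:
                schema[p].add((s, o))  # e.g. owl:equivalentClass -> <v,w>
        else:
            instance["S_Table"][s].add((p, o))
            instance["P_Table"][p].add((s, o))
            instance["O_Table"][o].add((s, p))
    return schema, instance


new_data = [
    ("ex:marriedTo", "rdf:type", "owl:SymmetricProperty"),
    ("ex:a", "ex:marriedTo", "ex:b"),
]
itr_data = [("ex:C", "owl:equivalentClass", "ex:D")]
schema, instance = classify_and_store(new_data, itr_data)
```

In a Spark Streaming job the loop body would run per partition of each batch; here a plain loop keeps the example self-contained.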
Algorithm 1 Parallel data storage algorithm ParallelStoreForHorst
Input: streaming triple data (new_data), new triple data generated by the previous round of reasoning (itr_data)
Output: none
Taking rule 6 from 1.3 (v owl:sameAs w => w owl:sameAs v) and rule 16 from 1.2 (v owl:allValuesFrom u, v owl:onProperty p, w rdf:type v, w p x => x rdf:type u) as examples, the pseudocode of this stage is as follows:
Because the generation of LinkVar depends on the definition of each specific rule, the algorithms for the rules that require LinkVar are given below:
Acquisition algorithm for LinkVar in rule 11
Acquisition algorithm for LinkVar in rule 14a
Acquisition algorithm for LinkVar in rule 15
Acquisition algorithm for LinkVar in rule 16
2. Parallel reasoning stage
2.1 Map stage: data reasoning
The Map stage performs the data reasoning, in the following steps:
Step1. Traverse the rule link variable relation table and determine which rules can be activated.
Step2. For each rule that can be activated, if its antecedents allow the conclusion to be derived directly without instance triples, go to Step3; if instance triples are needed, use the link variable of the instance triple that the rule requires as the key and look up the matching instance triples in the previously stored instance tables; if matching instance triples are found, go to Step3, otherwise repeat the check in Step2. When all data have been processed, the algorithm ends.
Step3. Apply the current rule to obtain the conclusion, output each inferred triple <Si,Pj,Ok> to the set <Si,(Pj,Ok)>, and return to Step2.
Algorithm 2 Data reasoning algorithm ParallelReasoningForHorst
Input: rule link variable relation tables (Rulei_Table), instance triple tables (S_Table, P_Table, O_Table)
Output: new triples generated by reasoning
The overall pseudocode of the algorithm is as follows:
Taking rule 6 from 1.3 (v owl:sameAs w => w owl:sameAs v) as an example, the pseudocode is as follows:
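The patent's own pseudocode is not reproduced in this text. As a stand-in, rule 6 might be sketched in Python as below; the function name `apply_rule6` is hypothetical, a dictionary stands in for Rule6_Table, and the output follows the <Si,(Pj,Ok)> form described in the steps above.

```python
def apply_rule6(rule6_table):
    """v owl:sameAs w => w owl:sameAs v, emitted as (Si, (Pj, Ok)) pairs."""
    pairs = rule6_table.get("owl:sameAs", set())
    # No instance lookup is needed: the stored <v,w> pairs are simply reversed.
    return {(w, ("owl:sameAs", v)) for (v, w) in pairs}


rule6_table = {"owl:sameAs": {("ex:a", "ex:b"), ("ex:c", "ex:d")}}
inferred = apply_rule6(rule6_table)
```

Because rule 6 has no link variable, the conclusion follows from the rule table alone, which corresponds to the branch of Step2 that goes straight to Step3.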
The above is the reasoning pseudocode for a rule without link variables. Next, a rule with link variables is described, taking rule 16 from 1.2 (v owl:allValuesFrom u, v owl:onProperty p, w rdf:type v, w p x => x rdf:type u) as an example; the pseudocode is as follows (in the pseudocode, s16 is defined as the object of Set_16):
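The patent's own pseudocode is not reproduced in this text. As an illustration only, the join performed by rule 16 might be sketched in Python as below. The function name `apply_rule16` and the simplified dictionary layouts are assumptions; they do not reproduce Table 1-4 or the s16 object mentioned above.

```python
def apply_rule16(all_values_from, on_property, s_table, o_table):
    """v owl:allValuesFrom u, v owl:onProperty p, w rdf:type v, w p x => x rdf:type u.

    all_values_from: {v: {u}} and on_property: {v: {p}} hold the schema antecedents.
    s_table: {S: {(P, O)}} and o_table: {O: {(S, P)}} are instance tables.
    """
    inferred = set()
    # Join the two schema antecedents on the link variable v.
    for v in all_values_from.keys() & on_property.keys():
        # Find every w with "w rdf:type v" by using v as the object key.
        for (w, pred) in o_table.get(v, ()):
            if pred != "rdf:type":
                continue
            # Find every "w p x" by using w as the subject key.
            for (p, x) in s_table.get(w, ()):
                if p in on_property[v]:
                    for u in all_values_from[v]:
                        inferred.add((x, ("rdf:type", u)))
    return inferred


all_values_from = {"ex:R": {"ex:Dog"}}
on_property = {"ex:R": {"ex:hasPet"}}
s_table = {"ex:alice": {("rdf:type", "ex:R"), ("ex:hasPet", "ex:rex")}}
o_table = {
    "ex:R": {("ex:alice", "rdf:type")},
    "ex:rex": {("ex:alice", "ex:hasPet")},
}
inferred = apply_rule16(all_values_from, on_property, s_table, o_table)
```

Each step of the join is a keyed lookup, so the cost is driven by the number of matching triples, not by the size of the instance data.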
For multi-link-variable rules such as rule 16, storing the schema triples in the Redis cluster in <key,value> form allows link variables to be matched quickly. The associated instance triples are found by the value of the link variable using the instance triple storage strategy in Redis. Because a Redis lookup by key takes O(1) time, reasoning efficiency improves substantially.
2.2 Reduce stage: deduplication and storage
The Reduce stage saves the data generated by reasoning. The inferred triples are stored in the Redis cluster in a set named "itr_data", duplicate triples are removed, and the "itr_data" set then forms part of the input to the next round of reasoning. The steps of the deduplication and storage algorithm are as follows:
Step1. Receive the set of new triples inferred in the Map stage (including SchemaTriple and InstanceTriple); if the received data are empty, the algorithm ends;
Step2. Traverse the received set of new triples and remove the duplicates;
Step3. Store the deduplicated set of triples in the Redis cluster under the set name itr_data, to be read by the next round of reasoning.
Algorithm 3 Reduce algorithm (DuplicateRemovalForHorst)
Input: the set <Si,(Pj,Ok)>
Output: itr_data
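The three Reduce steps can be sketched in Python as follows. This is a minimal illustration, not the patent's pseudocode: the function name `deduplicate_and_store` is hypothetical, a dictionary of sets stands in for the Redis cluster, and replacing the previous contents of itr_data on each round is an assumption.

```python
def deduplicate_and_store(inferred, redis_sets):
    """inferred: iterable of (Si, (Pj, Ok)); redis_sets: dict standing in for Redis."""
    inferred = list(inferred)
    if not inferred:                 # Step1: nothing received, the algorithm ends
        return False
    unique = set(inferred)           # Step2: a set keeps one copy of each triple
    redis_sets["itr_data"] = unique  # Step3: store under the set name itr_data
    return True


redis_sets = {}
stored = deduplicate_and_store(
    [
        ("ex:b", ("owl:sameAs", "ex:a")),
        ("ex:b", ("owl:sameAs", "ex:a")),
        ("ex:rex", ("rdf:type", "ex:Dog")),
    ],
    redis_sets,
)
```

With a real Redis deployment, adding members to a Redis set (the SADD command) would deduplicate on the server side, since a set holds each member at most once.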
The algorithm of the invention reduces the number of MapReduce tasks and performs iterative reasoning over streaming data with Spark; its rule link variable relation table stores the data and the new data generated during reasoning, which ensures the completeness of the algorithm; and its storage scheme for instance triples exploits the characteristics of Redis to trade space for time, giving fast reads of instance data.
The above are preferred embodiments of the present invention. Any change made according to the technical solution of the present invention whose resulting function does not go beyond the scope of that technical solution falls within the scope of protection of the present invention.
Claims (6)
1. A streaming RDF data parallel reasoning algorithm based on Spark Streaming, characterized by comprising the following steps:
Step S1: build the rule link variable relation table from the OWL Horst inference rules; in the iterative parallel reasoning stage, acquire at regular intervals each batch of new data in the stream together with the data generated by the previous round of reasoning as input, classify the schema data and instance data in the input, and store them in the corresponding Redis clusters;
Step S2: according to the rule link variable relation table, determine the rules that can be activated in this round, and generate inferred data by combining them with the corresponding instance data;
Step S3: delete duplicates among the data generated in this round and store the result; the current iteration then ends.
2. The streaming RDF data parallel reasoning algorithm based on Spark Streaming according to claim 1, characterized in that, in step S1, the schema data are schema triple data and the instance data are instance triple data.
3. The streaming RDF data parallel reasoning algorithm based on Spark Streaming according to claim 2, characterized in that the instance triple data are stored in the Redis cluster as follows: following the key-value model of Redis, <key, value> pairs are used in which the subject S, predicate P, and object O of a triple each serve as the key, so that the triple is stored in three tables in the forms <S,(P,O)>, <P,(S,O)>, and <O,(S,P)>.
4. The streaming RDF data parallel reasoning algorithm based on Spark Streaming according to claim 3, characterized in that the schema triple data are stored in the Redis cluster as follows: each rule of the OWL Horst inference rules gets a corresponding table Rulem_Table stored in Redis, named after the rule; the rules fall into two classes according to their number of link variables: rules without link variables and rules with link variables;
Rules without link variables: stored in Redis with P as the key and <S,O> as the value;
Rules with link variables:
(1) for a rule with a single link variable, P is the key, and the key has only one value;
(2) for a complex rule with multiple link variables, P is the key, and S and O are stored in the value as the maps <S,<O,0>> and <O,<S,1>>, where 0 indicates that the map key is the subject and 1 that it is the object.
5. The streaming RDF data parallel reasoning algorithm based on Spark Streaming according to claim 4, characterized in that step S2 is implemented as follows:
Step S21: traverse the rule link variable relation table and determine which rules can be activated;
Step S22: for each rule that can be activated, if a conclusion can be inferred directly without example triple data, skip to step S23; if example triple data are required, use the link variable of the example triple data that the rule needs as the key to look up the corresponding example triple data in the previously stored example tables; if the corresponding example triple data are found, proceed to step S23, otherwise repeat the judgment of step S22; if all data have been processed, terminate the algorithm;
Step S23: execute the current rule's reasoning to obtain the inference conclusion, output each triple <Si,Pj,Ok> generated by the reasoning to the set <Si, (Pj,Ok)>, and skip to step S22.
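Steps S21 to S23 can be sketched for a single rule. The example below assumes the familiar subclass rule, (s rdf:type x) and (x rdfs:subClassOf y) yield (s rdf:type y), and uses dictionaries in place of the Redis tables; `apply_subclass_rule` and its parameters are hypothetical names, and a full implementation would traverse the rule link variable relation table rather than hard-code one rule.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def apply_subclass_rule(subclass_of, instances_by_object):
    """One pass of: (s rdf:type x), (x rdfs:subClassOf y) => (s rdf:type y)."""
    inferred = defaultdict(set)  # Si -> {(Pj, Ok)}
    # S21: every stored pattern triple is a candidate activation of the rule.
    for x, superclasses in subclass_of.items():
        # S22: use the link variable x as the key into the object-keyed table.
        matches = instances_by_object.get(x, set())
        if not matches:
            continue  # no example triples for this key; try the next one
        for s, p in matches:
            if p != RDF_TYPE:
                continue
            # S23: execute the rule and collect the conclusion under Si.
            for y in superclasses:
                inferred[s].add((RDF_TYPE, y))
    return dict(inferred)


subclass_of = {"Student": ["Person"], "Robot": ["Machine"]}
instances_by_object = {"Student": {("alice", RDF_TYPE), ("bob", "likes")}}
result = apply_subclass_rule(subclass_of, instances_by_object)
```

Here only `alice` yields a conclusion: `bob` matches the key but not the predicate, and no instance has `Robot` as its object.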
6. The streaming RDF data parallel reasoning algorithm based on Spark Streaming according to claim 5, characterized in that step S3 is implemented as follows:
Step S31: receive the new triple set generated by the reasoning of step S2, and terminate the algorithm if the received data are empty;
Step S32: traverse the received new triple set and remove the duplicate triples in it;
Step S33: store the deduplicated triple set in the Redis cluster under the set name itr_data, to be read by the next round of reasoning.
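The three steps of claim 6 reduce to a short routine. This sketch uses a dictionary in place of the Redis cluster; `finish_iteration` is a hypothetical name, and overwriting any earlier `itr_data` set (rather than merging into it) is an assumption, since the claim does not say which applies.

```python
def finish_iteration(new_triples, store):
    """Deduplicate the triples of one reasoning round and store them.

    Returns False when there is nothing to store, signalling termination.
    """
    if not new_triples:          # S31: empty input ends the algorithm
        return False
    unique = set(new_triples)    # S32: drop repeated triples
    store["itr_data"] = unique   # S33: keep them for the next round
    return True


store = {}
produced = [
    ("alice", "rdf:type", "Person"),
    ("alice", "rdf:type", "Person"),
    ("bob", "rdf:type", "Person"),
]
keep_going = finish_iteration(produced, store)
```

With a real Redis deployment, a Redis set would give the same deduplication on insert, because set members are unique.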
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810521793.XA CN108763451B (en) | 2018-05-28 | 2018-05-28 | Streaming RDF data parallel reasoning algorithm based on Spark Streaming |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108763451A (en) | 2018-11-06 |
CN108763451B (en) | 2022-03-11 |
Family
ID=64006259
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810521793.XA Active CN108763451B (en) | 2018-05-28 | 2018-05-28 | Streaming RDF data parallel reasoning algorithm based on Spark Streaming |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108763451B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120272142A1 (en) * | 1999-10-28 | 2012-10-25 | Augme Technologies, Inc. | System and method for adding targeted content in a web page |
CN104778277A (en) * | 2015-04-30 | 2015-07-15 | Fuzhou University | RDF data distributed storage and querying method based on Redis |
CN105912721A (en) * | 2016-05-05 | 2016-08-31 | Fuzhou University | RDF data distributed semantic parallel reasoning method |
CN106874425A (en) * | 2017-01-23 | 2017-06-20 | Fuzhou University | Real-time keyword approximate search algorithm based on Storm |
CN106980901A (en) * | 2017-04-15 | 2017-07-25 | Fuzhou University | Streaming RDF data parallel reasoning algorithm |
Non-Patent Citations (3)
Title |
---|
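Rong Gu et al.: "Cichlid: Efficient Large Scale RDFS/OWL Reasoning with Spark", 2015 IEEE 29th International Parallel and Distributed Processing Symposium * |
Ye Yixin et al.: "Distributed parallel reasoning algorithm based on Spark", Computer Systems & Applications * |
Zhao Huihan et al.: "Parallel reasoning algorithm for OWL semantic rules based on Spark", Application Research of Computers * |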
Also Published As
Publication number | Publication date |
---|---|
CN108763451B (en) | 2022-03-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Li et al. | Algebraic approach to dynamics of multivalued networks | |
Šikšnys et al. | Aggregating and disaggregating flexibility objects | |
CN102073700B (en) | Discovery method of complex network community | |
Liang et al. | Express supervision system based on NodeJS and MongoDB | |
CN106383830B (en) | Data retrieval method and equipment | |
Taloba et al. | Developing an efficient spectral clustering algorithm on large scale graphs in spark | |
Bhardwaj et al. | How does topology influence gradient propagation and model performance of deep networks with densenet-type skip connections? | |
Gao et al. | GraphNAS++: Distributed architecture search for graph neural networks | |
Singh et al. | Performance Measure of Similis and FPGrowth Algorithm | |
Chen et al. | Neighborhood convolutional graph neural network | |
Wang et al. | An asynchronous distributed-memory optimization solver for two-stage stochastic programming problems | |
Liang et al. | A novel combined model based on VMD and IMODA for wind speed forecasting | |
CN108763451A (en) | Streaming RDF data parallel reasoning algorithm based on Spark Streaming | |
CN115001978B (en) | Cloud tenant virtual network intelligent mapping method based on reinforcement learning model | |
Wei et al. | Fixed-time passivity of coupled quaternion-valued neural networks with multiple delayed couplings | |
CN106980901B (en) | Streaming RDF data parallel reasoning algorithm | |
Zhang et al. | A new sequential prediction framework with spatial-temporal embedding | |
Zhiqiang et al. | Entity alignment method for power data knowledge graph of semantic and structural information | |
Xu et al. | Graph Neural Ordinary Differential Equations-based method for Collaborative Filtering | |
Zhao et al. | A web service composition method based on merging genetic algorithm and ant colony algorithm | |
CN111177188A (en) | Rapid massive time sequence data processing method based on aggregation edge and time sequence aggregation edge | |
Wen et al. | Parallel naïve Bayes regression model-based collaborative filtering recommendation algorithm and its realisation on Hadoop for big data | |
Xia et al. | Carpooling algorithm with the common departure | |
Farhan | Answering Shortest Path Distance Queries in Large Complex Networks | |
CN104063516B (en) | Based on the social networks rubbish filtering method that distributed matrix characteristics of decomposition is extracted |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||