CN108959318A - Distributed keyword query method based on RDF graph - Google Patents

Distributed keyword query method based on RDF graph

Info

Publication number
CN108959318A
CN108959318A CN201710376203.4A CN201710376203A CN108959318A CN 108959318 A CN108959318 A CN 108959318A CN 201710376203 A CN201710376203 A CN 201710376203A CN 108959318 A CN108959318 A CN 108959318A
Authority
CN
China
Prior art keywords
rdf
vertex
sentence
subgraph
crt
Prior art date
Legal status
Pending
Application number
CN201710376203.4A
Other languages
Chinese (zh)
Inventor
郑志蕴
丁阳
李钝
张行进
王振飞
Current Assignee
Zhengzhou University
Original Assignee
Zhengzhou University
Priority date
Filing date
Publication date
Application filed by Zhengzhou University
Priority to CN201710376203.4A
Publication of CN108959318A
Legal status: Pending (current)


Abstract

The present invention provides a distributed keyword query method based on RDF graphs and belongs to the field of information retrieval. The method first converts the RDF data graph into an RDF sentence graph. It then partitions the RDF sentence graph with a constrained depth-first search combined with simulated annealing, following the two basic principles of graph partitioning: a minimal edge cut set and balanced data among the resulting subgraphs. Finally, the partitioned RDF sentence graph is refined back into an RDF data graph to obtain the vertex cut set of the RDF graph, and a reverse search algorithm running on the Hadoop distributed computing framework performs fast, efficient keyword search. While preserving the atomicity and semantic integrity of the RDF data, the invention overcomes the limited efficiency of traditional algorithms when partitioning large-scale datasets and substantially improves keyword search efficiency.

Description

Distributed keyword query method based on RDF graph
Technical field
The present invention relates to a distributed keyword query method based on RDF graphs and belongs to the field of information retrieval.
Background art
Graphs are a ubiquitous data structure and are widely used in many fields. Keyword query over RDF graph structures is a current research hotspot: it lets users obtain useful query results without writing queries in a complex structured query language. Most current search algorithms are implemented in a centralized environment, i.e. keyword queries can only be processed on a single machine. As RDF graphs keep growing, keyword query on a single machine becomes very time-consuming, so graph processing and storage in a distributed environment has important theoretical value and practical significance.
The commonly used keyword query technique represents RDF data as a labeled directed graph, in which vertices correspond to the subjects and objects of triples and edges correspond to predicates. The graph representation preserves both the relationships among data items and their semantic information, so query processing over RDF data is usually cast as a graph matching problem: locating, on the RDF data graph, Steiner trees that contain the keywords. Because graph data is inherently connected and graph computation is strongly coupled, efficient parallel graph processing requires minimizing the coupling between the subgraphs processed in a distributed manner, and effective graph partitioning is the main means of achieving this decoupling. Current graph partitioning algorithms follow two main principles: first, increase connectivity within subgraphs and reduce connectivity between subgraphs; second, balance subgraph sizes so that each subgraph holds roughly the same amount of data without significant skew. For example, Kim et al. proposed the SBV-Cut method, which identifies balanced vertices by random walk and partitions the graph at these vertices so that each subgraph contains an approximately equal number of vertices, using the two properties of expansion and modularity to keep the number of cut vertices minimal.
However, although each subgraph produced by this method contains the same (or nearly the same) number of vertices, the number of edges incident to each vertex differs, so the data among subgraphs is unbalanced. Moreover, as graphs keep growing, traditional algorithms (such as KL, DFEP and VSEP) are limited in the graph sizes they can handle and cannot keep up with the explosive growth of data.
Summary of the invention
To address these shortcomings, the present invention proposes a distributed keyword query method based on RDF graphs. While keeping the partitioned subgraphs balanced, the method partitions large data graphs efficiently and supports fast keyword query, meeting users' query needs. The technical solution is implemented in the following steps:
(1) Convert the RDF graph into an RDF sentence graph:
In an RDF dataset, each RDF triple is a basic statement of the RDF data and expresses one complete piece of semantics about a resource on the Web, so the atomicity of every RDF triple must be preserved when the dataset is partitioned. In addition, a blank node only asserts that something exists, without giving it a global identifier, so it can only be used locally: the RDF triples that share the same blank node together express that blank node's context, and separating them would destroy the context they express. An RDF sentence s is therefore a set of RDF triples that satisfies the following conditions:
Condition 1: any two RDF triples in s are connectable, where two RDF triples are connectable when they share the same blank node;
Condition 2: no RDF triple in s is connectable with an RDF triple outside s;
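Read transitively, the two conditions make an RDF sentence a connected component of triples under the "shares a blank node" relation. The following Python sketch is a minimal illustration under that reading, not the patent's own implementation: it groups triples with union-find, assumes triples are plain (subject, predicate, object) string tuples, and assumes blank nodes carry the N-Triples `_:` prefix. The names `is_blank` and `rdf_sentences` are hypothetical.

```python
from collections import defaultdict


def is_blank(term):
    # Assumption: blank nodes are written with the N-Triples "_:" prefix.
    return term.startswith("_:")


def rdf_sentences(triples):
    """Group (s, p, o) triples into RDF sentences.

    Triples that share a blank node, directly or through a chain of
    other triples, end up in the same sentence; a triple with no blank
    node forms a sentence on its own.
    """
    parent = list(range(len(triples)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    first_use = {}  # blank node -> index of the first triple that mentions it
    for idx, (s, _p, o) in enumerate(triples):
        for term in (s, o):
            if is_blank(term):
                if term in first_use:
                    union(first_use[term], idx)
                else:
                    first_use[term] = idx

    groups = defaultdict(list)
    for idx, triple in enumerate(triples):
        groups[find(idx)].append(triple)
    return list(groups.values())
```

Union-find keeps the grouping close to linear in the number of triples, which matters because this conversion runs over the whole dataset before partitioning.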
The invention writes an RDF triple as (s, p, o), abbreviated t, and uses s(t), p(t) and o(t) for the subject, predicate and object of the triple. The RDF directed graph and the RDF sentence graph are defined as follows:
RDF directed graph: let $G = (V, E, L)$ be a labeled RDF directed graph over a triple set T, where $V = \{v \mid v \in s(T) \cup o(T)\}$ is the vertex set formed by the subjects and objects of the triples, $E = \{\langle s(t), o(t) \rangle \mid t \in T\}$ is the set of directed edges formed by the predicates that relate subjects to objects, each edge pointing from the subject vertex to the object vertex, and $L = L_v \cup L_p$ is the set of labels, where $L_v$ denotes the vertex labels and $L_p$ the predicate labels.
RDF sentence graph: let $G_s = (S, E, l, w)$ be an RDF sentence graph, a vertex-weighted graph in which each node corresponds to one RDF sentence; S is the vertex set and E the edge set. If $s_i, s_j \in S$, $s_i \neq s_j$, and there exist $t \in s_i$ and $t' \in s_j$ with $t.s = t'.s$ or $t.o = t'.s$, then $s_i$ and $s_j$ are associated, i.e. $(s_i, s_j) \in E$. $l$ is a labeling function: for every $s_i \in S$, $l(s_i)$ is the set of local labels of the subjects, predicates and objects contained in the sentence. $w$ is the vertex weight: for every $s_i \in S$, $w(s_i)$ equals the number of RDF triples contained in the sentence.
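A minimal sketch of building $G_s$ from a list of sentences, assuming each sentence is a list of (s, p, o) string tuples. The function name `sentence_graph` is hypothetical, the labels $l(s_i)$ are approximated by the raw terms, and the all-pairs test is quadratic; an index on subjects would be the natural choice at the scale of the evaluation dataset.

```python
from itertools import combinations


def sentence_graph(sentences):
    """Build the vertex-weighted RDF sentence graph G_s = (S, E, l, w).

    sentences: list of RDF sentences, each a list of (s, p, o) tuples.
    Vertices are the sentence indices 0..n-1.
    """
    # w(s_i): number of triples in the sentence
    weights = {i: len(sent) for i, sent in enumerate(sentences)}
    # l(s_i): labels of subjects, predicates and objects (raw terms here)
    labels = {i: {term for t in sent for term in t} for i, sent in enumerate(sentences)}

    def linked(a, b):
        # exists t in a, t' in b with t.s == t'.s or t.o == t'.s
        subjects_b = {t[0] for t in b}
        return any(t[0] in subjects_b or t[2] in subjects_b for t in a)

    edges = set()
    for i, j in combinations(range(len(sentences)), 2):
        # the sentence graph is undirected, so test both directions
        if linked(sentences[i], sentences[j]) or linked(sentences[j], sentences[i]):
            edges.add((i, j))
    return weights, edges, labels
```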
(2) Edge partitioning algorithm based on the RDF sentence graph (REC):
Graph partitioning follows two principles. First, to reduce network traffic, the number of cut edges must be kept as small as possible; the invention therefore performs a depth-first traversal that prefers the vertex of smallest degree and, because this approach easily falls into a local optimum, uses simulated annealing to avoid it. Second, to balance data among subgraphs, the edges of the RDF graph are distributed evenly over the subgraphs, i.e. each RDF subgraph receives $e = \frac{1}{k}\sum_{s \in S} w(s)$ edges. When $G_s$ is divided into k subgraphs, a function P gives the subgraph of each vertex, with the subgraphs labeled {1, 2, ..., k}; the label $j = P(s)$ means that sentence s lies in subgraph $S_j$, where the subgraphs satisfy $\bigcup_{j=1}^{k} S_j = S$ and $S_i \cap S_j = \emptyset$ for $i \neq j$. The specific steps of the edge partitioning algorithm on the RDF sentence graph are as follows:
Step A: input the number of edges per RDF subgraph
From the RDF sentence graph $G_s$, compute and input the number e of edges per RDF subgraph;
Step B: set access flags
Find the vertices of minimum degree in the RDF sentence graph, put them into set D, and give every vertex an "access flag" (visited flag);
Step C: traverse the vertices in set D to partition the RDF sentence graph
C1: given an access order for the vertices in D, if the total weight of all unvisited vertices in the RDF sentence graph satisfies $|G_s| > e$, select the next unvisited vertex from D in that order;
C2: add the vertex to set S and mark it as visited;
C3: if the total vertex weight of the current subgraph satisfies $|S| < e$ and the set $N(S, G_s)$ of vertices adjacent to S is not empty, randomly select a vertex v from $N(S, G_s)$;
C4: if vertex v is unvisited and has the smallest number of neighbors outside S, $|N(v, G_s) \setminus S|$ (i.e. the smallest degree), go to step C2;
C5: if the vertex weight of the current set S satisfies $|S| < e$ and $N(S, G_s) = \emptyset$, return to the most recently visited vertex and go to step C4;
C6: if the vertex weight of the current set S now satisfies $|S| > e$, go to step C1;
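The traversal in steps A to C can be sketched in Python as follows. This is an illustrative reading under stated assumptions, not the patent's reference code: `rec_partition` and `cut_edges` are hypothetical names, ties in step C4 are broken alphabetically instead of at random, and the backtracking of step C5 is modeled with an explicit stack. The example graph `ADJ`/`WEIGHT` is the Fig. 3 sentence graph as reconstructed from the worked example in the embodiment (edges S1-S2, S1-S5, S1-S6, S2-S3, S2-S4; weights S1=1, S2=1, S3=3, S4=1, S5=2, S6=1), since the figure itself is not reproduced here.

```python
def rec_partition(adj, weight, order, k):
    """Smallest-degree-first DFS partitioning of an RDF sentence graph.

    adj: vertex -> set of neighbours (undirected sentence graph)
    weight: vertex -> w(s), the number of triples in the sentence
    order: access order of the start vertices in set D
    k: target number of subgraphs
    """
    e = sum(weight.values()) / k            # Step A: edge budget per subgraph
    visited, parts = set(), []
    for start in order:
        remaining = sum(w for v, w in weight.items() if v not in visited)
        if remaining <= e:                   # C1: the rest fits in one subgraph
            break
        if start in visited:
            continue
        part, stack = {start}, [start]       # C2
        visited.add(start)
        load = weight[start]
        while load < e and stack:            # C3 to C6
            cur = stack[-1]
            cands = sorted(v for v in adj[cur] if v not in visited)
            if not cands:                    # C5: dead end, backtrack
                stack.pop()
                continue
            # C4: prefer the neighbour with the fewest neighbours outside the part
            nxt = min(cands, key=lambda v: len(adj[v] - part))
            part.add(nxt)
            visited.add(nxt)
            stack.append(nxt)
            load += weight[nxt]
        parts.append(part)
    rest = set(weight) - visited             # leftover vertices form the last subgraph
    if rest:
        parts.append(rest)
    return parts


def cut_edges(adj, parts):
    """Edges whose endpoints lie in different subgraphs."""
    owner = {v: i for i, p in enumerate(parts) for v in p}
    return {tuple(sorted((u, v))) for u in adj for v in adj[u] if owner[u] != owner[v]}


# Sentence graph reconstructed from the worked example (assumption).
ADJ = {
    "S1": {"S2", "S5", "S6"}, "S2": {"S1", "S3", "S4"},
    "S3": {"S2"}, "S4": {"S2"}, "S5": {"S1"}, "S6": {"S1"},
}
WEIGHT = {"S1": 1, "S2": 1, "S3": 3, "S4": 1, "S5": 2, "S6": 1}

if __name__ == "__main__":
    parts = rec_partition(ADJ, WEIGHT, ["S5", "S6", "S3", "S4"], 2)
    print(parts, cut_edges(ADJ, parts))
```

With the access order [S5, S6, S3, S4] this sketch yields {S1, S2, S5, S6} and {S3, S4} with two cut edges, matching the embodiment; with [S3, S4, S5, S6] it cuts only S1-S2.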
Step D: search for the optimal solution with simulated annealing
D1: give an access order for the vertices in set D;
D2: compute the number of edges cut when the vertices are accessed in this order;
D3: randomly swap the access order of two vertices in D; if the number of cut edges is now smaller than in step D2, the new result is better than the old one, and the new access order replaces the old one;
D4: finally, call the simulated annealing function;
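Steps D1 to D4 describe a search over access orders of D. The sketch below is a generic simulated-annealing loop under assumptions the text does not fix: the move swaps two random positions (as in D3), a worse order is accepted with the Metropolis probability exp(-delta/T), and the temperature follows a geometric cooling schedule with hypothetical parameters `t0`, `cooling` and `steps`. In the method, `cost` would be the number of cut edges produced by the REC traversal for a given order; here it is any callable, and `anneal_order` is a hypothetical name.

```python
import math
import random


def anneal_order(order, cost, t0=2.0, cooling=0.95, steps=200, seed=0):
    """Simulated annealing over access orders of set D.

    order: initial access order (D1); cost: callable giving the number of
    cut edges for an order (D2). Each move swaps two random positions (D3);
    worse moves are accepted with probability exp(-delta / T) (D4).
    """
    rng = random.Random(seed)
    current = list(order)
    cur_cost = cost(current)
    best, best_cost = list(current), cur_cost
    t = t0
    for _ in range(steps):
        if len(current) < 2:
            break
        i, j = rng.sample(range(len(current)), 2)
        cand = list(current)
        cand[i], cand[j] = cand[j], cand[i]
        delta = cost(cand) - cur_cost
        if delta < 0 or rng.random() < math.exp(-delta / t):
            current, cur_cost = cand, cur_cost + delta
            if cur_cost < best_cost:
                best, best_cost = list(current), cur_cost
        t = max(t * cooling, 1e-6)  # geometric cooling with a floor
    return best, best_cost
```

Because the best order seen so far is tracked separately from the current one, the returned cost is never worse than the cost of the initial order, even though uphill moves are accepted along the way.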
(3) Implement keyword query on the Hadoop distributed computing framework
Step I: determine the vertex cut set of the partitioned RDF graph
The vertex cut set of the partitioned RDF graph is obtained by computing the intersection of the RDF subgraphs;
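A minimal sketch of step I, assuming each refined RDF subgraph is available as a list of (s, p, o) tuples; the function name `vertex_cut_set` is hypothetical. For two subgraphs the result is exactly their intersection; for k > 2 the sketch returns every vertex shared by at least two subgraphs, which is one reasonable generalization rather than something the text specifies. The test data are the two subgraphs of the Fig. 7 example in the embodiment.

```python
from collections import defaultdict


def vertex_cut_set(subgraphs):
    """subgraphs: list of refined RDF subgraphs, each a list of (s, p, o) tuples.

    Returns the subjects/objects that occur in more than one subgraph.
    """
    owners = defaultdict(set)  # vertex -> ids of the subgraphs containing it
    for idx, triples in enumerate(subgraphs):
        for s, _p, o in triples:
            owners[s].add(idx)
            owners[o].add(idx)
    return {v for v, ids in owners.items() if len(ids) > 1}
```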
Step II: determine the directed trees
Definition of a query result tree (RT): let the sets of vertices matching the keywords K be $M(K) = \{m_1, m_2, \ldots, m_s\}$; a query result is a directed tree that satisfies the following conditions:
1) the root of the tree is R;
2) for each set $m_i$, there is some vertex $v_i \in m_i$ with a directed path from R to $v_i$.
During keyword query, a query result tree that contains only part of the keywords is called a candidate result tree (CRT).
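The two conditions can be checked by forward reachability from R. The sketch below is an illustration under assumptions (hypothetical names `reachable` and `classify`; triples as (s, p, o) tuples; the sets $m_i$ given as a keyword-to-vertices dict): it reports an RT when every keyword set has a vertex reachable from R and a CRT when only some do, without extracting the minimal tree itself.

```python
from collections import defaultdict, deque


def reachable(triples, root):
    """Vertices reachable from root along subject -> object edges (BFS)."""
    out = defaultdict(list)
    for s, _p, o in triples:
        out[s].append(o)
    seen, queue = {root}, deque([root])
    while queue:
        v = queue.popleft()
        for w in out[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def classify(triples, root, matches):
    """matches: keyword -> set of matching vertices (the sets m_i of M(K)).

    Returns ("RT", covered) if every m_i has a vertex reachable from root,
    ("CRT", covered) if only some do, and (None, covered) otherwise.
    """
    seen = reachable(triples, root)
    covered = {kw for kw, m in matches.items() if m & seen}
    if covered == set(matches):
        return "RT", covered
    return ("CRT" if covered else None), covered
```

On the embodiment's example, rooting at paper-45 reaches all three keywords, while rooting at iswc reaches only paper-13 and OWL, so the latter is a CRT.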
Step III: implement keyword query with the reverse search algorithm
The MapReduce implementation of keyword query is described below; it consists mainly of the following four steps:
MR1: in the map phase, candidate result trees are first found with the reverse search algorithm and represented as 4-tuples $(CRT_i, K, B, S_i)$, where $CRT_i$ is a tree rooted at R, K is the set of keywords contained in $CRT_i$, B is the set of cut vertices in the subgraph, and $S_i$ is the subgraph containing $CRT_i$; each 4-tuple is then packed into the key-value pair $\langle B, CRT_i \rangle$;
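Reverse search is sketched here as a backward breadth-first search from the keyword vertices, in the style of BANKS-like backward search; the text does not give the exact procedure, so this is an assumed reading. `reverse_search` records, for every vertex, which keywords it can reach (making it a candidate root R), and `map_subgraph` emits one $\langle B, CRT_i \rangle$ pair per candidate root and reachable cut vertex. All names are hypothetical; to stay short, a CRT is represented only by (subgraph id, root, covered keywords) rather than its full edge list, and keyword-to-vertex matching (for example finding the literal that contains "OWL") is assumed to be done beforehand.

```python
from collections import defaultdict, deque


def reverse_search(triples, matches):
    """Backward BFS from each keyword's matching vertices.

    Returns vertex -> set of keywords reachable from that vertex, i.e. the
    candidate roots R of (candidate) result trees.
    """
    incoming = defaultdict(set)
    for s, _p, o in triples:
        incoming[o].add(s)
    covers = defaultdict(set)
    for kw, starts in matches.items():
        seen, queue = set(starts), deque(starts)
        while queue:
            v = queue.popleft()
            covers[v].add(kw)
            for u in incoming[v]:
                if u not in seen:
                    seen.add(u)
                    queue.append(u)
    return covers


def forward_reach(triples, root):
    """Vertices reachable from root along subject -> object edges (DFS)."""
    out = defaultdict(set)
    for s, _p, o in triples:
        out[s].add(o)
    seen, stack = {root}, [root]
    while stack:
        for w in out[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def map_subgraph(sub_id, triples, matches, cut_vertices):
    """Emit <B, CRT> pairs; a CRT is (subgraph id, root, covered keywords)."""
    pairs = []
    for root, kws in reverse_search(triples, matches).items():
        for b in sorted(forward_reach(triples, root) & cut_vertices):
            pairs.append((b, (sub_id, root, frozenset(kws))))
    return pairs
```

On the Fig. 7 example this emits one pair per subgraph, both keyed by iswc, corresponding to CRT1 and CRT2 in the embodiment.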
MR2: in the combiner phase, for two candidate result trees $CRT_i$ and $CRT_j$ in different subgraphs, if the cut vertices associated with the two CRTs satisfy $V_i \cap V_j \neq \emptyset$, $CRT_i$ and $CRT_j$ are placed in the same combiner;
MR3: in the reduce phase, the CRTs in the same combiner are merged; if the merged result contains all query keywords, it is output as a query result;
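Outside Hadoop, MR2 and MR3 can be simulated in memory: grouping the map output by key stands in for the combiner stage, and the reducer joins CRTs from different subgraphs that share a cut vertex. This sketch uses hypothetical names (`shuffle`, `reduce_group`), considers only pairwise joins, and does not use the Hadoop Mapper/Reducer API; CRTs are assumed to have the shape (subgraph id, root, keywords).

```python
from collections import defaultdict
from itertools import combinations


def shuffle(pairs):
    """Group emitted <B, CRT> pairs by cut vertex B (stands in for MR2)."""
    groups = defaultdict(list)
    for key, crt in pairs:
        groups[key].append(crt)
    return groups


def reduce_group(cut_vertex, crts, all_keywords):
    """MR3: join CRTs from different subgraphs that share cut_vertex and
    keep the joins whose keywords cover every query keyword."""
    results = []
    for a, b in combinations(crts, 2):
        if a[0] == b[0]:  # same subgraph: nothing to join across the cut
            continue
        merged = a[2] | b[2]
        if merged >= all_keywords:
            results.append((cut_vertex, (a[1], b[1]), merged))
    return results
```

Joining the embodiment's CRT1 (root paper-45) and CRT2 (root iswc) at iswc covers {paper-45, paper-13, OWL}, so the merged tree is output as a result.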
MR4: in the ranking phase, because RDF datasets are very large, one query may have many matching results. However, users are usually interested in only a small fraction of them, so the query results need to be scored with a scoring function. This patent uses the common approach of scoring results by their compactness and returns the top-k query results.
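A minimal sketch of MR4, assuming compactness is measured by the number of edges in a result tree (fewer edges give a higher score). The scoring formula and the names `compactness` and `top_k` are hypothetical, since the text only states that compactness is used.

```python
import heapq


def compactness(tree_edges):
    # Hypothetical score: a tree with fewer edges is more compact.
    return 1.0 / (1 + len(tree_edges))


def top_k(results, k):
    """results: result id -> list of edges; returns the k most compact."""
    return heapq.nlargest(k, results.items(), key=lambda item: compactness(item[1]))
```

`heapq.nlargest` avoids sorting every result when k is much smaller than the number of matches, which fits the observation that users look at only a small fraction of the results.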
Description of the drawings
Fig. 1 Basic framework of the distributed keyword query method based on RDF graphs
Fig. 2 Example RDF graph
Fig. 3 Example of converting an RDF graph into an RDF sentence graph
Fig. 4 Flow chart of the edge partitioning algorithm based on the RDF sentence graph
Fig. 5 Flow chart of simulated annealing
Fig. 6 Flow chart of keyword query implemented with MapReduce
Fig. 7 Partitioned RDF graph
Fig. 8 Comparison of the response times of different partitioning algorithms
Fig. 9 Keyword query examples
Fig. 10 Comparison of response times before and after parallelization
Specific embodiment
The technical solution of the present invention is further described below with reference to the embodiment and the accompanying drawings.
Embodiment: this patent uses the real-world dataset swetodblp (http://lsdis.cs.uga.edu/Projects/SemDis/Swetodblp), which describes publications in computer science. The dataset contains 681,636 triples, occupies 53.6 MB of storage, and has 1,026,375 edges and 373,219 vertices.
Fig. 1 shows the flow of the distributed keyword query method based on RDF graphs of the present invention; it consists mainly of the following 3 stages:
First stage: the RDF graph in Fig. 2 is converted into an RDF sentence graph, as shown in Fig. 3. The unweighted directed RDF graph becomes an undirected, vertex-weighted RDF sentence graph, in which the number on each vertex is the number of RDF triples contained in that sentence.
Second stage: the edge partitioning algorithm based on the RDF sentence graph, whose flow chart is shown in Fig. 4. In essence, the algorithm partitions the graph by depth-first traversal, preferring the vertices of minimum degree. An example of running the algorithm on the RDF sentence graph of Fig. 3 is given below.
Initial conditions: the four vertices in D = {S3, S4, S5, S6} all have degree 1, and visited[Si] = false (i = 3, 4, 5, 6). Assuming the graph is divided into 2 subgraphs, i.e. k = 2, the number of edges per RDF subgraph is e = (2 + 1 + 1 + 1 + 1 + 3)/2 = 4.5, i.e. roughly 5 edges per subgraph.
C1: assume the access order of the vertices in D is {S5, S6, S3, S4}. The total weight of the unvisited vertices in the graph is 9 > 5, so the traversal starts from vertex S5;
C2: add S5 to set S, giving S = {S5}, and set visited[S5] = true;
C3: |S| = 2 < 5 and N(S, Gs) = {S1};
C4: add S1 to S, giving S = {S1, S5}, and set visited[S1] = true. Since |S| = 3 < 5, steps C2 and C3 are repeated. Among the vertices adjacent to S, only S6 and S2 are unvisited, with |N(S6, Gs) \ S| = 0 and |N(S2, Gs) \ S| = |{S1, S3, S4} \ {S1, S5}| = |{S3, S4}| = 2. S6 has fewer neighbors outside S than S2, so S6 is added, giving S = {S1, S5, S6}, and visited[S6] = true;
C5: S6 has no unvisited neighbors and |S| = 4 < 5, so the traversal backtracks to the most recently visited vertex S1. The only remaining vertex adjacent to S1 is S2, so S2 is added, giving S = {S1, S5, S6, S2}, visited[S2] = true and |S| = 5. The edges leaving S, S2-S3 and S2-S4, are then cut. The unvisited vertices S3 and S4 have total weight 3 + 1 = 4 < 5, so the remaining graph needs no further partitioning. The result is two subgraphs, {S1, S5, S2, S6} and {S3, S4}, with 2 cut edges.
Because the algorithm easily settles in a local optimum, simulated annealing is used to avoid this, as shown in Fig. 5. An example is given below.
D1: the access order of set D is {S5, S6, S3, S4}, i.e. the initial solution;
D2: the objective is to minimize the number of cut edges. By the edge partitioning algorithm above, the access order {S5, S6, S3, S4} cuts 2 edges;
D3: a new solution is accepted if adjusting the access order reduces the number of cut edges. For example, with the adjusted order {S3, S4, S5, S6}, the edge partitioning algorithm above cuts only the edge S1-S2 (1 edge), and the two subgraphs are {S2, S3, S4} and {S1, S5, S6}. This partition is better than the one in step D2, so the new access order replaces the old one;
D4: simulated annealing keeps optimizing this process until it converges to a (near-)optimal solution; since simulated annealing is widely used, it is not described in detail here.
Third stage: Fig. 6 shows the process of implementing keyword query with the Hadoop distributed computing framework. Taking the RDF graph of Fig. 2 as an example, distributed keyword query with the reverse search algorithm proceeds as follows.
After the RDF sentence graph has been partitioned, it must be refined back into an RDF graph. The second-stage edge partitioning algorithm yields the two subgraph sets {S2, S3, S4} and {S1, S5, S6}; the refined RDF subgraphs are shown in Fig. 7, and the intersection of the two RDF subgraphs is the vertex {iswc}. Assume the query keywords are K = {paper-45, paper-13, OWL}.
MR1: the partitioned subgraphs are placed on 2 different nodes, and in the map phase the reverse search algorithm finds the candidate result trees. The two CRTs are CRT1 = <{paper-45 -> isPartof -> iswc}, {paper-45}, {iswc}, {(L)}> and CRT2 = <{iswc -> hasPart -> paper-13 -> title -> "Can OWL and Logic Programming Live Together Happily"}, {paper-13, OWL}, {iswc}, {(R)}>, and the query results are stored as <key, value> pairs;
MR2: in the combiner phase, CRTs that have the same key but come from different subgraphs are placed in the same combiner. CRT1 and CRT2 share the cut vertex key = {iswc} and lie in the left and right subgraphs of Fig. 7 respectively, so both CRTs are placed in the same combiner;
MR3: in the reduce phase, the CRTs in the same combiner are joined and merged. The merged result is RT = {paper-45 -> isPartof -> iswc -> hasPart -> paper-13 -> title -> "Can OWL and Logic Programming Live Together Happily"}; since the joined RT contains all query keywords, it is output as a query result;
MR4: in the ranking phase, the query results are sorted and the top-k results are returned to the user.
To show the advantage of this patent's partitioning algorithm REC, it is compared with the VSEP and DVCP algorithms, as shown in Fig. 8. The figure shows that REC has the shortest partitioning response time and DVCP the longest. The advantage of REC is that it compresses the RDF graph: on the one hand this preserves the atomicity and semantic integrity of the RDF data, and on the other hand it reduces the number of vertices in the graph, which shrinks the traversal space and thus improves partitioning efficiency. VSEP reduces the number of cut vertices by swapping the subgraphs in which vertices lie; each iteration must pick two arbitrary vertices from the graph to swap, with time complexity $O(n^2)$ (n is the number of vertices in the graph), and as RDF data grows, the number of vertices in the corresponding RDF graph becomes very large, which severely degrades partitioning efficiency. DVCP mainly swaps the subgraphs in which edges lie, so that the vertices associated with those edges span the fewest partitions, thereby reducing the number of cut vertices; however, a graph has far more edges than vertices, so partitioning by swapping edges is less efficient than partitioning by swapping vertices.
To show the advantage of the search algorithm in a distributed environment, 10 groups of query examples are given, as shown in Fig. 9. The search algorithm was run both on a single machine and on a cluster, and Fig. 10 shows the average response time of the 10 query groups on different numbers of cluster nodes. The figure shows that parallel queries are much more efficient than single-machine queries, and that as the number of nodes increases, the parallel query response time keeps decreasing, but by smaller and smaller amounts; when the number of nodes goes from 40 to 50, the query response time barely changes. The experiments therefore also show that the number of nodes used for parallel queries should be chosen appropriately to obtain the best parallel effect.
The above is only a preferred embodiment of the present invention; it should be noted that those skilled in the art may make modifications within the scope of the technical solution of the present invention.

Claims (4)

1. A distributed keyword query method based on RDF graphs, comprising the following steps:
Step (1): converting the RDF graph into an RDF sentence graph;
Step (2): partitioning the RDF sentence graph with the edge partitioning algorithm based on the RDF sentence graph (REC);
Step (3): implementing keyword query with the Hadoop distributed computing framework.
2. The distributed keyword query method based on RDF graphs according to claim 1, characterized in that in step (1) an RDF sentence s is a set of RDF triples that satisfies the following conditions:
Condition 1: any two RDF triples in s are connectable, where two RDF triples are connectable when they share the same blank node;
Condition 2: no RDF triple in s is connectable with an RDF triple outside s;
An RDF triple is written (s, p, o), abbreviated t, and s(t), p(t) and o(t) denote the subject, predicate and object of the triple. The RDF directed graph and the RDF sentence graph are defined as follows:
RDF directed graph: let $G = (V, E, L)$ be a labeled RDF directed graph over a triple set T, where $V = \{v \mid v \in s(T) \cup o(T)\}$ is the vertex set formed by the subjects and objects of the triples, $E = \{\langle s(t), o(t) \rangle \mid t \in T\}$ is the set of directed edges formed by the predicates that relate subjects to objects, each edge pointing from the subject vertex to the object vertex, and $L = L_v \cup L_p$ is the set of labels, where $L_v$ denotes the vertex labels and $L_p$ the predicate labels;
RDF sentence graph: let $G_s = (S, E, l, w)$ be an RDF sentence graph, a vertex-weighted graph in which each node corresponds to one RDF sentence; S is the vertex set and E the edge set. If $s_i, s_j \in S$, $s_i \neq s_j$, and there exist $t \in s_i$ and $t' \in s_j$ with $t.s = t'.s$ or $t.o = t'.s$, then $s_i$ and $s_j$ are associated, i.e. $(s_i, s_j) \in E$; $l$ is a labeling function: for every $s_i \in S$, $l(s_i)$ is the set of local labels of the subjects, predicates and objects contained in the sentence; $w$ is the vertex weight: for every $s_i \in S$, $w(s_i)$ equals the number of RDF triples contained in the sentence.
3. The distributed keyword query method based on RDF graphs according to claim 1, characterized in that step (2) partitions the graph according to the two basic principles of graph partitioning. First, to reduce network traffic, the number of cut edges must be kept as small as possible; a depth-first traversal that prefers the vertex of smallest degree is therefore performed, and simulated annealing is used to optimize the partitioning result. Second, to balance data among subgraphs, the edges of the RDF graph are distributed evenly over the subgraphs, i.e. each RDF subgraph receives $e = \frac{1}{k}\sum_{s \in S} w(s)$ edges. When $G_s$ is divided into k subgraphs, a function P gives the subgraph of each vertex, with the subgraphs labeled {1, 2, ..., k}; the label $j = P(s)$ means that sentence s lies in subgraph $S_j$, where the subgraphs satisfy $\bigcup_{j=1}^{k} S_j = S$ and $S_i \cap S_j = \emptyset$ for $i \neq j$. The specific steps of the edge partitioning algorithm on the RDF sentence graph are as follows:
Step A: input the number of edges per RDF subgraph
From the RDF sentence graph $G_s$, compute and input the number e of edges per RDF subgraph;
Step B: set access flags
Find the vertices of minimum degree in the RDF sentence graph, put them into set D, and give every vertex an "access flag" (visited flag);
Step C: traverse the vertices in set D to partition the RDF sentence graph
C1: given an access order for the vertices in D, if the total weight of all unvisited vertices in the RDF sentence graph satisfies $|G_s| > e$, select the next unvisited vertex from D in that order;
C2: add the vertex to set S and mark it as visited;
C3: if the total vertex weight of the current subgraph satisfies $|S| < e$ and the set $N(S, G_s)$ of vertices adjacent to S is not empty, randomly select a vertex v from $N(S, G_s)$;
C4: if vertex v is unvisited and has the smallest number of neighbors outside S, $|N(v, G_s) \setminus S|$ (i.e. the smallest degree), go to step C2;
C5: if the vertex weight of the current set S satisfies $|S| < e$ and $N(S, G_s) = \emptyset$, return to the most recently visited vertex and go to step C4;
C6: if the vertex weight of the current set S now satisfies $|S| > e$, go to step C1;
Step D: search for the optimal solution with simulated annealing
D1: give an access order for the vertices in set D;
D2: compute the number of edges cut when the vertices are accessed in this order;
D3: randomly swap the access order of two vertices in D; if the number of cut edges is now smaller than in step D2, the new result is better than the old one, and the new access order replaces the old one;
D4: finally, call the simulated annealing function.
4. The distributed keyword query method based on RDF graphs according to claim 1, characterized in that step (3) is implemented as follows:
Step I: determine the vertex cut set of the partitioned RDF graph
The vertex cut set of the partitioned RDF graph is obtained by computing the intersection of the RDF subgraphs;
Step II: determine the directed trees
Definition of a query result tree (RT): let the sets of vertices matching the keywords K be $M(K) = \{m_1, m_2, \ldots, m_s\}$; a query result is a directed tree that satisfies the following conditions:
1) the root of the tree is R;
2) for each set $m_i$, there is some vertex $v_i \in m_i$ with a directed path from R to $v_i$;
During keyword query, a query result tree that contains only part of the keywords is called a candidate result tree (CRT);
Step III: implement keyword query with the reverse search algorithm
The MapReduce implementation of keyword query consists mainly of the following four steps:
MR1: in the map phase, candidate result trees are first found with the reverse search algorithm and represented as 4-tuples $(CRT_i, K, B, S_i)$, where $CRT_i$ is a tree rooted at R, K is the set of keywords contained in $CRT_i$, B is the set of cut vertices in the subgraph, and $S_i$ is the subgraph containing $CRT_i$; each 4-tuple is then packed into the key-value pair $\langle B, CRT_i \rangle$;
MR2: in the combiner phase, for two candidate result trees $CRT_i$ and $CRT_j$ in different subgraphs, if the cut vertices associated with the two CRTs satisfy $V_i \cap V_j \neq \emptyset$, $CRT_i$ and $CRT_j$ are placed in the same combiner;
MR3: in the reduce phase, the CRTs in the same combiner are merged; if the merged result contains all query keywords, it is output as a query result;
MR4: in the ranking phase, because RDF datasets are very large, one query may have many matching results; however, users are usually interested in only a small fraction of them, so the query results are scored by their compactness and the top-k query results are returned.
CN201710376203.4A 2017-05-25 2017-05-25 Distributed keyword query method based on RDF graph Pending CN108959318A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710376203.4A CN108959318A (en) 2017-05-25 2017-05-25 Distributed keyword query method based on RDF graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710376203.4A CN108959318A (en) 2017-05-25 2017-05-25 Distributed keyword query method based on RDF graph

Publications (1)

Publication Number Publication Date
CN108959318A true CN108959318A (en) 2018-12-07

Family

ID=64493947

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710376203.4A Pending CN108959318A (en) 2017-05-25 2017-05-25 Distributed keyword query method based on RDF graph

Country Status (1)

Country Link
CN (1) CN108959318A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140156633A1 (en) * 2012-11-30 2014-06-05 International Business Machines Corporation Scalable Multi-Query Optimization for SPARQL
CN104462253A (en) * 2014-11-20 2015-03-25 武汉数为科技有限公司 Topic detection or tracking method for network text big data
CN104765875A (en) * 2015-04-24 2015-07-08 海南易建科技股份有限公司 Distributed processing method and system for passenger behavior data
CN106227722A (en) * 2016-09-12 2016-12-14 中山大学 A kind of extraction method based on listed company's bulletin summary

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
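李慧颖 et al.: "Keyword-based query method for RDF data" (基于关键词的RDF数据查询方法), Journal of Southeast University (Natural Science Edition) (《东南大学学报 自然科学版》) *
王振涛: "Research and implementation of a bipartite-graph-based RDF keyword expansion query algorithm" (基于二分图的RDF关键词扩展查询算法研究与实现), China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》) *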

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222240A (en) * 2019-05-24 2019-09-10 华中科技大学 A kind of space RDF data keyword query method based on summary figure
CN110222240B (en) * 2019-05-24 2021-03-26 华中科技大学 Abstract graph-based space RDF data keyword query method

Similar Documents

Publication Publication Date Title
US10706103B2 (en) System and method for hierarchical distributed processing of large bipartite graphs
Rahimian et al. Ja-be-ja: A distributed algorithm for balanced graph partitioning
CN106021457B (en) RDF distributed semantic searching method based on keyword
US10803121B2 (en) System and method for real-time graph-based recommendations
Wang et al. Skyframe: a framework for skyline query processing in peer-to-peer systems
Yu et al. Parallel TID-based frequent pattern mining algorithm on a PC Cluster and grid computing system
Chen et al. MapReduce skyline query processing with a new angular partitioning approach
CN109359115B (en) Distributed storage method, device and system based on graph database
Belesiotis et al. Spatio-textual user matching and clustering based on set similarity joins
Sun et al. Interactive spatial keyword querying with semantics
CN112084781A (en) Standard term determination method, device and storage medium
Kalyvas et al. Skyline and reverse skyline query processing in SpatialHadoop
Moutafis et al. Algorithms for processing the group K nearest-neighbor query on distributed frameworks
Tayal et al. A new MapReduce solution for associative classification to handle scalability and skewness in vertical data structure
CN108959318A (en) Distributed keyword query method based on RDF graph
US10229186B1 (en) Data set discovery engine comprising relativistic retriever
Niu et al. Semi-supervised plsa for document clustering
Li et al. GAP: Genetic algorithm based large-scale graph partition in heterogeneous cluster
CN110209895A (en) Vector index method, apparatus and equipment
Zhang et al. Towards distributed node similarity search on graphs
CN110162580A (en) Data mining and depth analysis method and application based on distributed early warning platform
Zheng et al. User preference-based data partitioning top-k skyline query processing algorithm
Benjamas et al. Impact of I/O and execution scheduling strategies on large scale parallel data mining
Nasridinov et al. A two-phase data space partitioning for efficient skyline computation
Du et al. A novel KNN join algorithms based on Hilbert R-tree in MapReduce

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20181207