CN105550332B - A provenance graph query method based on a two-layer index structure - Google Patents
- Publication number
- CN105550332B (application CN201510969332.5A)
- Authority
- CN
- China
- Prior art keywords
- index
- provenance graph
- data
- query
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention discloses a provenance graph query method based on a two-layer index structure, comprising the following steps. First, a two-layer index structure is proposed for provenance graph queries. Second, a global index based on a dictionary table is designed; the table records the matching relationship between provenance and data, together with the provenance graph ID. Third, a bitmap-based local index is proposed: following the RDF query patterns used on provenance graphs, it provides indexes that serve Triple Pattern queries and three kinds of join query, and the corresponding query algorithms are designed on top of these indexes. Finally, tests demonstrate the feasibility and effectiveness of the method.
Description
Technical field
The present invention relates to provenance data management within the field of big data management, and focuses on the design and implementation of a query scheme for data provenance graphs. Based on the characteristics of data provenance graphs, the invention provides a provenance graph query method built on a two-layer index structure. The method is designed at two levels, global and local. On the one hand, a dictionary table matches data to its provenance, and a global index algorithm based on this dictionary table is proposed. On the other hand, once the cloud computing server node storing the provenance has been located quickly from the provenance graph ID, a bitmap-based local index structure is proposed, comprising six selection indexes and three join indexes, together with the corresponding query algorithms.
Background art
Data provenance is information about the entire processing history of a piece of data, including its sources and all subsequent processes applied to it. With the continuing growth of big data, querying provenance information efficiently in a cloud computing environment has become especially important and is a problem that urgently needs solving.
To address provenance queries in a cloud computing environment, the present invention introduces a two-layer index structure, analyzes it in terms of a global index and a local index, designs a provenance graph query method, and verifies the feasibility and effectiveness of the method.
Summary of the invention
Purpose of the invention: in view of the problems in the prior art, the present invention provides a provenance graph query method based on a two-layer index structure.
Technical solution: a provenance graph query method based on a two-layer index structure. First, a two-layer index structure is proposed for provenance graph queries. Second, a global index based on a dictionary table is designed. The table records the matching relationship between provenance and data along with the provenance graph ID, so that provenance can be associated with its data and the cloud server node storing the provenance can be located quickly, which reduces the query response time. Third, a bitmap-based local index is proposed: following the RDF query patterns used on provenance graphs, it provides indexes that serve the eight kinds of Triple Pattern query and three kinds of join query, with the corresponding query algorithms designed on top of these indexes.
Two-layer index structure for provenance graph queries
In earlier distributed environments, provenance queries relied solely on the master node to dispatch lookup tasks, which usually meant traversing the entire cluster and consumed a great deal of time and resources. Existing distributed provenance storage systems mostly support fast lookup by primary key only; they lack an efficient index structure and cannot serve multi-dimensional queries or join queries. An efficient index structure can improve query efficiency and shorten the response time of user queries.
To improve query efficiency, a two-layer index structure is proposed based on the characteristics of provenance graphs. It consists of a dictionary-table-based global index and a bitmap-based local index. The global index finds the server node on which a provenance graph is stored; the local index then refines the query on that node to retrieve the required provenance data. The global index is deployed on every node in the cloud environment, so when a user request arrives, consulting the global index structure on the local server is enough to obtain the location of the node holding the queried provenance graph. A local index covers only the provenance data stored on its own server, and the local indexes on different nodes do not depend on one another.
Global index and global query algorithm based on the dictionary table
The dictionary table structure is given first; the query flow based on the global index is then built on it.
1. Dictionary table structure
Based on the characteristics of data provenance, the dictionary table HCPTable is designed from two aspects. First, it stores the provenance graph name and the corresponding data items. A data item is the data that the provenance describes; all data in one workflow run corresponds to a single provenance graph, which gives a coarse-grained description of the relationship between provenance and data. Second, it stores the provenance graph name and the corresponding ID. Each execution of a workflow generates one data provenance graph, and the provenance ID is generated during storage by a Hash(key) mapping. In the global index, the provenance graph ID is the input to the consistent-hashing index algorithm, so the server node storing a provenance graph can be computed quickly from its ID.
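The two roles of HCPTable described above can be illustrated with a minimal sketch. It assumes an in-memory table and uses MD5 as a stand-in for the unspecified Hash(key) mapping; the class layout, the method names `register` and `graph_of`, and the sample workflow names are hypothetical and not taken from the patent.

```python
import hashlib
from dataclasses import dataclass, field


def provenance_graph_id(graph_name: str) -> int:
    """Derive a stable integer ID from the graph name (assumed hash: MD5)."""
    digest = hashlib.md5(graph_name.encode("utf-8")).hexdigest()
    return int(digest[:8], 16)  # keep 32 bits so IDs stay readable


@dataclass
class HCPTable:
    """Hypothetical in-memory stand-in for the dictionary table."""
    items_by_graph: dict = field(default_factory=dict)  # graph name -> data items
    id_by_graph: dict = field(default_factory=dict)     # graph name -> graph ID

    def register(self, graph_name: str, data_items: list) -> int:
        """Record one workflow run: its data items and its generated ID."""
        self.items_by_graph[graph_name] = list(data_items)
        self.id_by_graph[graph_name] = provenance_graph_id(graph_name)
        return self.id_by_graph[graph_name]

    def graph_of(self, data_item: str):
        """Coarse-grained lookup: which provenance graph describes this item?"""
        for name, items in self.items_by_graph.items():
            if data_item in items:
                return name
        return None


table = HCPTable()
gid = table.register("workflow_run_1", ["input.csv", "cleaned.csv"])
print(table.graph_of("cleaned.csv"), gid)
```

The ID returned by `register` is what a global lookup would feed into the node-selection step; any deterministic hash would serve the same purpose in this sketch.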
2. Query flow based on the global index
The provenance graph ID is looked up in HCPTable; the tree is then traversed from the root node to a leaf node according to that ID, and the leaf node identifies the server that stores the provenance graph. The global index query flow is as follows:
(1) Look up the dictionary table to obtain the provenance graph ID.
(2) Find the child node that satisfies the query.
(3) Compute and output the number of that node.
Local index and local query algorithm based on bitmaps
To make queries over provenance graph data more efficient and to accommodate the variety of user query statements, composite indexes are added to make up for the limits of the selection indexes Is, Ip and Io when querying a single Triple Pattern: indexes Isp and Ips for triples whose subject and predicate are known, indexes Ipo and Iop for triples whose predicate and object are known, and indexes Iso and Ios for triples whose subject and object are known. Together they form a complete local bitmap index structure consisting of the selection indexes Is, Ip, Io, Isp, Ipo, Iso and the join indexes Is', Io', Iso'.
The local index supports refined queries over the provenance graph data held on a single cloud storage server node. A provenance graph query consists of two parts: single Triple Pattern queries and join queries.
(1) Single Triple Pattern query
The selection indexes Is, Ip, Io, Isp, Ipo and Iso serve single Triple Pattern queries in which the subject, the predicate, the object, the subject and predicate, the predicate and object, or the subject and object are known, respectively.
(2) Join query
The join indexes Is', Io' and Iso' serve join queries in which the shared variable is a subject in both triples, an object in both triples, or a subject in one triple and an object in the other, respectively.
Description of the drawings
Fig. 1 shows the two-layer index structure;
Fig. 2 is the flow chart of provenance queries based on the global index;
Fig. 3 shows the consistent binary tree distribution model;
Fig. 4 shows the RDF triple join types;
Fig. 5 is the index space usage analysis chart;
Fig. 6 is the query performance analysis chart.
Specific embodiments
The present invention is further illustrated below with reference to specific embodiments. These embodiments serve only to illustrate the invention and do not limit its scope; after reading this disclosure, modifications of equivalent form made by those skilled in the art fall within the scope defined by the appended claims of this application.
In the provenance graph query method based on a two-layer index structure, a two-layer index structure is first proposed for provenance graph queries. Second, a global index based on a dictionary table is designed. The table records the matching relationship between provenance and data along with the provenance graph ID, so that provenance can be associated with its data and the cloud server node storing the provenance can be located quickly, which reduces the query response time. Third, a bitmap-based local index is proposed: following the RDF query patterns used on provenance graphs, it provides indexes that serve the eight kinds of Triple Pattern query and three kinds of join query, with the corresponding query algorithms designed on top of these indexes.
Two-layer index structure for provenance graph queries
In earlier distributed environments, provenance queries relied solely on the master node to dispatch lookup tasks, which usually meant traversing the entire cluster and consumed a great deal of time and resources. Existing distributed provenance storage systems mostly support fast lookup by primary key only; they lack an efficient index structure and cannot serve multi-dimensional queries or join queries. An efficient index structure can improve query efficiency and shorten the response time of user queries.
The index structure consists of a dictionary-table-based global index and a bitmap-based local index. The global index finds the server node on which a provenance graph is stored; the local index then refines the query on that node to retrieve the required provenance data. The global index is deployed on every node in the cloud environment, so when a user request arrives, consulting the global index structure on the local server is enough to obtain the location of the node holding the queried provenance graph. A local index covers only the provenance data stored on its own server, and the local indexes on different nodes do not depend on one another. The two-layer index structure is shown in Fig. 1.
Global index and global query algorithm based on the dictionary table
The dictionary table structure is given first; the query flow based on the global index is then built on it.
1. Dictionary table structure
Based on the characteristics of data provenance, the dictionary table HCPTable is designed from two aspects. First, it stores the provenance graph name and the corresponding data items. A data item is the data that the provenance describes; all data in one workflow run corresponds to a single provenance graph, which gives a coarse-grained description of the relationship between provenance and data. Second, it stores the provenance graph name and the corresponding ID. Each execution of a workflow generates one data provenance graph, and the provenance ID is generated during storage by a Hash(key) mapping. In the global index, the provenance graph ID is the input to the consistent-hashing index algorithm, so the server node storing a provenance graph can be computed quickly from its ID. An example of the storage structure of HCPTable is shown in Table 1.
Table 1. Storage structure of the dictionary table HCPTable
2. Query flow for locating the storage node of a provenance graph through the global index
The flow is shown in Fig. 2.
(1) Look up the dictionary table to obtain the provenance graph ID.
(2) Starting from the root node of the tree, use the provenance graph ID to find the server node in the tree on which the provenance is stored. Formula (1) selects the child node, where ID is the provenance graph ID and root.Number is the number of the root node.
$$\mathrm{Nodenum} = \mathrm{ID} \bmod \mathrm{root.Number} \tag{1}$$
Select the child node according to the result and check its node.Isleaf attribute to decide whether it is a leaf node. If it is a leaf node, go to step (4); otherwise go to step (3).
(3) Take this node as the new root node and continue from step (2).
(4) Compute and output the number of this node.
Each execution of the flow selects one leaf node of the tree, proceeding from the root node down to that leaf. The lookup resembles binary search, so its time complexity is O(log n). In addition, the invention uses consistent binary tree distribution storage, a binary tree layout that can be considerably more efficient than other multiway tree layouts.
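Steps (2) to (4) can be sketched as a short loop. This is only an illustration under stated assumptions, not the patent's Match_Node listing: formula (1) does not say precisely what root.Number denotes, so the sketch assumes it is the number of children of the current node. It also divides the ID by that count after each step so that different levels can select different branches; that second step is an assumption added here and does not appear in the source. The `Node` class and the server labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Tree node; a leaf stands for one cloud server (labels are hypothetical)."""
    label: str
    children: list = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def match_node(root: Node, graph_id: int) -> str:
    """Walk from the root to a leaf, choosing a child by modulo at each level."""
    node, remaining = root, graph_id
    while not node.is_leaf:                 # steps (2) and (3)
        fanout = len(node.children)         # assumed meaning of root.Number
        child_index = remaining % fanout    # formula (1)
        remaining //= fanout                # assumption: consume one digit per level
        node = node.children[child_index]
    return node.label                       # step (4)


tree = Node("root", [
    Node("0", [Node("server-A"), Node("server-B")]),
    Node("1", [Node("server-C"), Node("server-D")]),
])
print([match_node(tree, i) for i in range(4)])
# ['server-A', 'server-C', 'server-B', 'server-D']
```

The loop visits one node per level, which is where the O(log n) bound for a balanced tree comes from.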
3. Consistent binary tree distribution storage
The idea of consistent binary tree distribution storage is to group servers into layers and to combine this with a consistent hashing algorithm so that data is spread evenly across the cloud servers. Each leaf node of the consistent binary tree represents one server node in the cloud and is used to store provenance graph data.
The consistent binary tree distribution model is based on a binary tree. At each level the nodes are divided into several mutually disjoint finite sets, each of which is itself a tree, so that all storage nodes are assigned to different groups at different levels. The corresponding server number is stored in each leaf node.
Definition 1 (consistent binary distribution tree): a consistent binary tree is a binary tree formed by a finite set T of n nodes, T = {V, E}, where V is the set of nodes and E is the set of edges.
Each leaf in the finite set T represents the location of a cloud computing server. Every node is identified by a unique digit sequence that lists, from left to right, the branch numbers on the path leading to that node; subtrees are numbered from left to right as 0, 1, 00, .... For example, node D in Fig. 3 has the number 11, which uniquely determines its position in the consistent distribution tree: reaching D means following branch 1 and then branch 1 again. Once the numbers of all leaf nodes in the tree are fixed, the logical structure of the tree is fixed as well, and the two are in one-to-one correspondence. A consistent binary tree can therefore be abstracted as a two-dimensional array, and the tree structure can be maintained with such an array.
Algorithm implementation:
The purpose of the global query is to locate the server node that holds a provenance graph. The invention designs a consistent distribution storage tree so that different provenance graphs are stored evenly across the different leaf nodes of the tree. When a provenance graph is queried, its ID is used to find the server node in the tree on which the provenance is stored.
The provenance node query algorithm Match_Node is as follows:
Local index and local query algorithm based on bitmaps
In the present invention, provenance graph data is stored sequentially with the triple as the unit, and triples are numbered from 1 to n. $pos(t_i) = i$ gives the storage position of triple $t_i$ in the graph, and $pos^{-1}(i) = t_i$ returns the triple stored at position $i$, where $t_i \in G$. $G$ is the set of triples of one RDF graph and $D$ is the set of RDF graphs: $G = \{t_1, t_2, \ldots, t_n\}$, $G_i \in D$, $D = \{G_1, G_2, \ldots, G_n\}$.
$S$, $P$ and $O$ denote the sets of subjects, predicates and objects in the RDF data set, as shown in formula (2), where $G_i \in D$ and $D = \{G_1, G_2, \ldots, G_n\}$:
$$
\begin{aligned}
S &= S_1 \cup S_2 \cup \cdots \cup S_n, & S_i &= \{\, s \mid (s, p, o) \in G_i \,\} \\
P &= P_1 \cup P_2 \cup \cdots \cup P_n, & P_i &= \{\, p \mid (s, p, o) \in G_i \,\} \\
O &= O_1 \cup O_2 \cup \cdots \cup O_n, & O_i &= \{\, o \mid (s, p, o) \in G_i \,\}
\end{aligned}
\tag{2}
$$
1. Queries on a single Triple Pattern
Because the subject, predicate and object of a single Triple Pattern may each be a variable, queries on a single triple require a multi-dimensional index. The main idea of the multi-dimensional index is to use the non-variable terms of the triple as the entry point of the query. In a SPARQL query, the clause expressions for a single Triple Pattern and their semantics are listed in Table 2.
Table 2. Triple Pattern expressions and their meaning
No. | Clause expression | Meaning
---|---|---
1 | (s, p, o) | If the triple exists, return it; otherwise return a null value
2 | (?s, p, o) | Given the predicate and object, return the set of subjects of the matching triples
3 | (s, ?p, o) | Given the subject and object, return the set of predicates of the matching triples
4 | (s, p, ?o) | Given the subject and predicate, return the set of objects of the matching triples
5 | (?s, ?p, o) | Given the object, return the set of subjects and predicates of the matching triples
6 | (s, ?p, ?o) | Given the subject, return the set of predicates and objects of the matching triples
7 | (?s, p, ?o) | Given the predicate, return the set of subjects and objects of the matching triples
8 | (?s, ?p, ?o) | Return all triples
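The semantics of the eight expressions can be checked against a naive reference matcher that simply scans all triples. This is a sketch of the intended results only, not of the bitmap-based algorithm; the helper names and the sample provenance triples are hypothetical.

```python
def is_var(term: str) -> bool:
    """SPARQL-style variables start with '?'."""
    return term.startswith("?")


def match_pattern(pattern, triples):
    """Return every triple that agrees with all bound terms of the pattern."""
    return [
        triple for triple in triples
        if all(is_var(q) or q == v for q, v in zip(pattern, triple))
    ]


# Hypothetical provenance triples used only for illustration.
triples = [
    ("run1", "used", "input.csv"),
    ("run1", "generated", "out.csv"),
    ("run2", "used", "out.csv"),
]

print(match_pattern(("run1", "used", "input.csv"), triples))  # expression 1
print(match_pattern(("?s", "used", "?o"), triples))           # expression 7
print(len(match_pattern(("?s", "?p", "?o"), triples)))        # expression 8
```

Any index-based implementation should return the same triples as this scan; the indexes only change how quickly they are found.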
Definition 2 (bitmap index Is): index Is is the set $\{(s_1, v_1), (s_2, v_2), \ldots, (s_n, v_n)\}$ over all triple subjects in RDF graph $G$, where $s_i \in S$ and $v_i$ is a bit vector over $G$. Position $k$ of $v_i$ is 1 if and only if $G$ contains a triple $t_k = pos^{-1}(k)$, $t_k \in G$, with $t_k.s = s_i$.
The purpose of the Is index is to let a query statement with a known subject find the matching triples in the RDF graph quickly. The size of Is is fixed: it matches the number of RDF triples in the provenance graph it covers.
Indexes Ip and Io are built in the same way, so triples can be found quickly with a predicate or an object as the key. If the subject, predicate and object in a query statement are all known, the three indexes Is, Ip and Io can be combined for the query: Is ∧ Ip ∧ Io.
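A minimal sketch of the single-term bitmap indexes and their conjunction follows. It assumes Python integers as bit vectors, with bit k-1 standing for triple position k; the function names and sample triples are hypothetical.

```python
from collections import defaultdict


def build_index(triples, component):
    """Map each distinct term at `component` (0=s, 1=p, 2=o) to a bit vector.

    Bit k-1 of the integer is set when triple number k carries that term.
    """
    index = defaultdict(int)
    for k, triple in enumerate(triples, start=1):
        index[triple[component]] |= 1 << (k - 1)
    return dict(index)


def positions(bits):
    """Decode a bit vector back into 1-based triple positions."""
    return [k for k in range(1, bits.bit_length() + 1) if (bits >> (k - 1)) & 1]


triples = [
    ("run1", "used", "input.csv"),
    ("run1", "generated", "out.csv"),
    ("run2", "used", "out.csv"),
]
Is, Ip, Io = (build_index(triples, c) for c in range(3))

# Fully bound pattern (run1, used, input.csv): intersect the three vectors.
hit = Is["run1"] & Ip["used"] & Io["input.csv"]
print(positions(hit))  # [1]
```

The conjunction is a single bitwise AND per index, so the cost of a fully bound lookup does not depend on how many triples share each individual term.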
Definition 3 (bitmap index Isp): index Isp is the set $\{(s_1p_1, v_1), (s_2p_2, v_2), \ldots, (s_np_n, v_n)\}$ over all subject-predicate pairs of triples in RDF graph $G$, where $s_i \in S$, $p_i \in P$, and $v_i$ is a bit vector over $G$. Position $k$ of $v_i$ is 1 if and only if $G$ contains a triple $t_k = pos^{-1}(k)$, $t_k \in G$, with $t_k.sp = s_ip_i$.
Indexes Ips, Ipo, Iop, Iso and Ios are built in the same way, so triples can be found quickly with the corresponding pair of terms as the key (predicate-subject, predicate-object, object-predicate, subject-object and object-subject).
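The pair-keyed indexes can be sketched in the same style as the single-term ones. The sketch assumes integer bit vectors with bit k-1 for triple position k, and the small dispatch table that picks an index from the bound positions of a pattern is an illustrative addition rather than part of the patent.

```python
def build_pair_index(triples, first, second):
    """Bitmap index keyed by the pair of terms at components `first`, `second`."""
    index = {}
    for k, triple in enumerate(triples, start=1):
        key = (triple[first], triple[second])
        index[key] = index.get(key, 0) | (1 << (k - 1))
    return index


# Which composite index serves which set of bound components (0=s, 1=p, 2=o).
PAIR_INDEXES = {(0, 1): "Isp", (1, 2): "Ipo", (0, 2): "Iso"}


def choose_index(pattern):
    """Name the composite index for a pattern with exactly two bound terms."""
    bound = tuple(i for i, term in enumerate(pattern) if not term.startswith("?"))
    return PAIR_INDEXES.get(bound)


triples = [
    ("run1", "used", "input.csv"),
    ("run1", "generated", "out.csv"),
    ("run2", "used", "out.csv"),
]
Isp = build_pair_index(triples, 0, 1)
Ipo = build_pair_index(triples, 1, 2)

print(choose_index(("run1", "used", "?o")), bin(Isp[("run1", "used")]))
print(choose_index(("?s", "used", "out.csv")), bin(Ipo[("used", "out.csv")]))
```

A pair index answers a two-term pattern with one dictionary lookup, where the single-term indexes would need two lookups and an AND.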
2. Queries containing joins
Whether two triples are related is determined by whether they share an unbound variable of the same name. According to where the shared variable occurs, the relationship falls into three kinds: Subject-Subject join, Object-Object join and Object-Subject join. The RDF triple join types are shown in Fig. 4.
Definition (bitmap index Is'): index Is' is the set $\{(1, v_1), (2, v_2), \ldots, (n, v_n)\}$ describing all triples in RDF graph $G$ that share a subject, where $n = |G|$ and $1, 2, \ldots, n$ are the consecutive position identifiers in the graph. $v_i$ is a bit vector over $G$; position $k$ of $v_i$ is 1 if and only if $G$ contains triples $t_k = pos^{-1}(k)$, $t_k \in G$, and $t_i = pos^{-1}(i)$ such that $t_k.s = t_i.s$.
Index Io' is built in the same way as Is'. No index Ipp is built for predicates, because a graph contains relatively few distinct predicates and finding triples with the same predicate says nothing useful about their subjects and objects.
Definition 4 (bitmap index Iso'): index Iso' is the set $\{(1, v_1), (2, v_2), \ldots, (n, v_n)\}$ describing all triples in RDF graph $G$ in which the object of one triple equals the subject of another, where $n = |G|$ and $1, 2, \ldots, n$ are the consecutive position identifiers in the graph. $v_i$ is a bit vector over $G$; position $k$ of $v_i$ is 1 if and only if $G$ contains triples $t_k = pos^{-1}(k)$, $t_k \in G$, and $t_i = pos^{-1}(i)$, $t_i \in G$, such that $t_k.o = t_i.s$. Index Ios' is the transpose of Iso': $Ios' = Iso'^{T}$.
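The join indexes can be read as n-by-n bit matrices, one row per triple. The sketch below builds them by direct comparison and checks the transposition relation between Iso' and Ios'. It uses 0-based rows and bits for brevity, whereas the definitions number positions from 1; the function names and sample triples are hypothetical.

```python
def build_join_index(triples, row_component, col_component):
    """Row i has bit k set when triples[k][col_component] == triples[i][row_component]."""
    n = len(triples)
    rows = []
    for i in range(n):
        bits = 0
        for k in range(n):
            if triples[k][col_component] == triples[i][row_component]:
                bits |= 1 << k
        rows.append(bits)
    return rows


def transpose(rows):
    """Transpose a square bit matrix stored as a list of integers."""
    n = len(rows)
    return [sum(((rows[k] >> i) & 1) << k for k in range(n)) for i in range(n)]


S, O = 0, 2
triples = [
    ("run1", "generated", "out.csv"),
    ("out.csv", "wasDerivedFrom", "input.csv"),
    ("run1", "used", "input.csv"),
]

Is_join = build_join_index(triples, S, S)    # t_k.s == t_i.s
Iso_join = build_join_index(triples, S, O)   # t_k.o == t_i.s
Ios_join = transpose(Iso_join)               # t_k.s == t_i.o

print(Is_join, Iso_join, Ios_join)  # [5, 2, 5] [0, 1, 0] [2, 0, 0]
```

Row 1 of `Iso_join` has bit 0 set because the object of the first triple (`out.csv`) is the subject of the second, which is exactly an Object-Subject join candidate.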
Indexes Isp, Ips, Ipo, Iop, Iso and Ios are selection indexes that serve query requests in which the subject and predicate, the predicate and object, or the subject and object are known. Within each pair (Isp and Ips, Ipo and Iop, Iso and Ios), the two indexes describe triples stored at the same positions in the actual graph, so only Isp, Ipo and Iso are needed.
In summary, the invention uses indexes Is, Ip, Io, Isp, Ipo and Iso to query triples whose subject, predicate, object, subject and predicate, predicate and object, or subject and object are known. Indexes Is', Io', Iso' and Ios' serve join query requests with a shared subject variable, a shared object variable, or a variable shared between a subject and an object. The bitmap index storage structure $T_D$ is shown in Table 3.
Table 3. Bitmap index storage structure $T_D$
3. Algorithm implementation
The query algorithm ASI_TP for a single Triple Pattern finds the unknown terms of a triple from its known terms, as follows:
The algorithm AJI_TP for join queries performs fast matching when two triples have the same subject, the same object, the same subject and predicate, or the same predicate and object, and also when their subject and predicate differ, as follows:
The algorithm Match_BGP for BGP queries is as follows:
ASI_TP and AJI_TP are the procedures called while a BGP is queried. Match_BGP first preprocesses, that is, reorders, all triple patterns in the BGP. The specific steps are as follows:
1. Place triple patterns with higher selectivity first. From highest to lowest selectivity, the order of triple patterns is:
(1) Subject is a non-variable;
(2) Subject, Predicate and Object are all non-variables and the predicate is not rdf:type;
(3) Subject is a variable, Predicate and Object are non-variables, and the predicate is rdf:type;
(4) Subject and Predicate are variables, Object is a non-variable;
(5) Subject and Object are variables, Predicate is a non-variable and is not rdf:type;
(6) Subject and Object are variables, Predicate is rdf:type;
(7) Subject, Object and Predicate are all variables.
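The ordering can be sketched as a ranking function used as a sort key. One interpretation is assumed and should be treated as such: rule (2) as worded is already covered by rule (1), so the sketch reads rule (2) as "Subject is a variable, Predicate and Object are non-variables, and the predicate is not rdf:type", by analogy with rule (3). The function names and sample patterns are hypothetical.

```python
RDF_TYPE = "rdf:type"


def is_var(term: str) -> bool:
    """SPARQL-style variables start with '?'."""
    return term.startswith("?")


def selectivity_rank(pattern) -> int:
    """Return 1 (most selective) to 7 (least selective) for a triple pattern."""
    s, p, o = pattern
    if not is_var(s):
        return 1                                  # rule (1)
    if not is_var(p) and not is_var(o):
        return 3 if p == RDF_TYPE else 2          # rules (3) and (2, as assumed)
    if is_var(p) and not is_var(o):
        return 4                                  # rule (4)
    if not is_var(p) and is_var(o):
        return 6 if p == RDF_TYPE else 5          # rules (6) and (5)
    return 7                                      # rule (7)


bgp = [
    ("?x", "?p", "?o"),
    ("?x", "rdf:type", "Process"),
    ("run1", "used", "?f"),
    ("?x", "used", "input.csv"),
]
ordered = sorted(bgp, key=selectivity_rank)
print(ordered[0], ordered[-1])
```

Python's sort is stable, so patterns with the same rank keep the order in which they appear in the query.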
2. Call ASI_TP on the first triple pattern of the sorted RDF triple pattern set bgp and store the returned variables in vseva. Derive the result set S from vseva. If S is empty, return the empty set immediately.
3. Take the next triple pattern. Before querying it, check whether it shares a variable with the preceding triple patterns; if so, call AJI_TP. Record the result set Stpi produced for the current triple pattern and merge it with the result set obtained from vseva.
4. Repeat step 3 until all triple patterns have been queried.
5. Replace the bit vectors in the result set S with the concrete RDF triples obtained from them, and return the query result set.
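The overall control flow of steps 1 to 5 can be sketched without the bitmap machinery. The sketch below is not the patent's Match_BGP: it replaces ASI_TP and AJI_TP with a plain scan over the triples, orders patterns simply by how few variables they contain, and returns variable bindings rather than bit vectors. All names and sample triples are hypothetical.

```python
def is_var(term: str) -> bool:
    """SPARQL-style variables start with '?'."""
    return term.startswith("?")


def match_one(pattern, triples, binding):
    """Yield extended bindings for one pattern under an existing binding."""
    for triple in triples:
        new = dict(binding)
        for q, v in zip(pattern, triple):
            if is_var(q):
                if new.setdefault(q, v) != v:   # a shared variable must agree
                    break
            elif q != v:
                break
        else:
            yield new


def match_bgp(patterns, triples):
    """Evaluate patterns in order of fewest variables, joining on shared ones."""
    ordered = sorted(patterns, key=lambda pat: sum(is_var(t) for t in pat))
    results = [{}]
    for pattern in ordered:
        results = [b for old in results for b in match_one(pattern, triples, old)]
        if not results:          # an empty intermediate result ends the query
            return []
    return results


triples = [
    ("run1", "used", "input.csv"),
    ("run1", "generated", "out.csv"),
    ("run2", "used", "out.csv"),
]
# Which run used a file that run1 generated?
query = [("?r", "used", "?f"), ("run1", "generated", "?f")]
print(match_bgp(query, triples))  # [{'?f': 'out.csv', '?r': 'run2'}]
```

Evaluating the more constrained pattern first keeps the intermediate result small, which is the motivation for the reordering in step 1.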
Experimental verification
1. Space usage analysis
The local index technique described here adds three new indexes to speed up queries, so the storage footprint grows by the space those three indexes occupy.
Among the 400 RDF triples generated by one workflow run, triples with the same subject, predicate or object occur many times, so indexes Is, Ip and Io do not need an index entry for every triple. Triples that contain the same element are marked by bit positions in a bitmap vector. For a repeated subject, for example, an entry is created in the bitmap index only the first time the subject is seen, and later occurrences just set the corresponding position of that bit vector to 1. The position indicates the triple's logical storage location in the database.
Triples with the same subject-predicate, predicate-object or subject-object pair likewise recur many times in the provenance data recorded for a workflow. For repeated entries, only the relevant positions in the vector created the first time need to be set, and only one bitmap index needs to be stored. Consequently, adding three indexes to the original six increases the number of indexes by 50%, while the index storage space grows by only about 25%, as shown in Fig. 5.
2. Query performance analysis
The invention is evaluated on the University of Texas provenance benchmark data sets, using 11 UTPB query statements to test the query performance of the proposed index structure.
The 11 queries were run on a Hadoop cluster against five data sets, D1, D2, D3, D4 and D5. Each query was run five times on each data set and the average time was taken. The query performance analysis is shown in Fig. 6.
Analysis of the results demonstrates the feasibility of the invention. It also shows that, when storing massive provenance data, the proposed two-layer index structure maintains comparatively good storage and query performance as the data volume grows and responds to user queries promptly. Performance remains good under complex query requests and massive data.
Claims (2)
1. A provenance graph query method based on a two-layer index structure, characterized by comprising the following steps: first, a two-layer index structure is proposed for provenance graph queries; second, a global index based on a dictionary table is designed, the table recording the matching relationship between provenance and data together with the provenance graph ID; then, a bitmap-based local index is proposed which, following the RDF query patterns used on provenance graphs, provides indexes that serve Triple Pattern queries and three kinds of join query, with the corresponding query algorithms designed on the basis of the indexes;
the two-layer index structure for provenance graph queries comprises the dictionary-table-based global index and the bitmap-based local index; the global index finds the server node on which a provenance graph is stored, and the local index refines the query on the server node found by the global index to retrieve the required provenance data; the global index is deployed on every node in the cloud environment, so that when a user request arrives, the location of the node holding the queried provenance graph is obtained merely by consulting the global index structure on the local server; a local index is built only over the provenance data stored on its local server, and the local indexes of different nodes have no dependency on one another;
the global index and global query algorithm based on the dictionary table are as follows:
the dictionary table structure is given first, and the query flow based on the global index is completed on that basis;
1) Dictionary table structure
Based on the characteristics of data provenance, the dictionary table HCPTable is designed from two aspects; first, it stores the provenance graph name and the corresponding data items; a data item is the data that the provenance describes, and all data in one workflow run corresponds to a single provenance graph, giving a coarse-grained description of the relationship between provenance and data; second, it stores the provenance graph name and the corresponding ID; each execution of a workflow generates one data provenance graph, and the provenance ID is generated during storage by a Hash(key) mapping; in the global index, the provenance graph ID is the input to the consistent-hashing index algorithm, so that the server node storing a provenance graph can be computed quickly from its ID;
2) Query flow based on the global index
The provenance graph ID is looked up in HCPTable, the tree is traversed from the root node to a leaf node according to that ID, and the provenance graph storage server is obtained from the leaf node; the global index query flow is as follows:
(1) look up the dictionary table to obtain the provenance graph ID;
(2) find the child node that satisfies the query;
(3) compute and output the number of this node.
2. The provenance graph querying method based on the double-deck index structure according to claim 1, characterized in that the bitmap-based local index and local query algorithm are as follows:
A provenance graph query comprises two parts: single triple pattern queries and join queries;
(1) Single triple pattern query
The indexes Is, Ip, Io, Isp, Ipo and Iso are selected to perform single triple pattern queries on the subject, predicate, object, subject-predicate, predicate-object and subject-object, respectively;
(2) Join query
The indexes Is', Io' and Iso' are selected to perform join queries, handling a shared subject variable, a shared object variable and a variable shared between subject and object, respectively.
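The index selection in (1) and (2) can be sketched as below. The sketch only chooses an index name and does not model the bitmap storage itself. Several points are assumptions: variables are written with a leading `?` as in SPARQL, the functions `select_single_index` and `select_join_index` are hypothetical, Iso' is read as covering a variable that is the subject of one pattern and the object of the other, and `None` is returned for cases the claim does not mention (for example, a pattern with all three positions bound or none bound).

```python
from typing import Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Bound positions (subject, predicate, object) -> index named in the claim.
SINGLE_INDEX = {
    (True, False, False): "Is",
    (False, True, False): "Ip",
    (False, False, True): "Io",
    (True, True, False): "Isp",
    (False, True, True): "Ipo",
    (True, False, True): "Iso",
}


def is_var(term: str) -> bool:
    """A term is a variable if it starts with '?' (assumed convention)."""
    return term.startswith("?")


def select_single_index(pattern: Triple) -> Optional[str]:
    """Pick the index matching the bound positions of one triple pattern."""
    bound = tuple(not is_var(term) for term in pattern)
    return SINGLE_INDEX.get(bound)


def select_join_index(left: Triple, right: Triple) -> Optional[str]:
    """Pick the join index from where the shared variable occurs."""
    left_s, _, left_o = left
    right_s, _, right_o = right
    if is_var(left_s) and left_s == right_s:
        return "Is'"    # variable shared as subject of both patterns
    if is_var(left_o) and left_o == right_o:
        return "Io'"    # variable shared as object of both patterns
    if (is_var(left_s) and left_s == right_o) or (is_var(left_o) and left_o == right_s):
        return "Iso'"   # variable is subject in one pattern, object in the other
    return None
```

For instance, the pattern `("?s", "prov:used", "ex:data1")` has a bound predicate and object, so `select_single_index` chooses Ipo; the identifiers `prov:used` and `ex:data1` are illustrative only.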
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510969332.5A CN105550332B (en) | 2015-12-21 | 2015-12-21 | A kind of provenance graph querying method based on the double-deck index structure |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510969332.5A CN105550332B (en) | 2015-12-21 | 2015-12-21 | A kind of provenance graph querying method based on the double-deck index structure |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105550332A (en) | 2016-05-04 |
CN105550332B (en) | 2019-03-29 |
Family
ID=55829521
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510969332.5A Expired - Fee Related CN105550332B (en) | 2015-12-21 | 2015-12-21 | A kind of provenance graph querying method based on the double-deck index structure |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105550332B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106709000B (en) * | 2016-12-22 | 2020-07-14 | 河海大学 | Key view discovery method based on PageRank and origin graph abstraction |
CN107016065A (en) * | 2017-03-16 | 2017-08-04 | 陕西科技大学 | It is customizable to rely on semantic effective origin filter method |
CN108733681B (en) * | 2017-04-14 | 2021-10-22 | 华为技术有限公司 | Information processing method and device |
US10949467B2 (en) * | 2018-03-01 | 2021-03-16 | Huawei Technologies Canada Co., Ltd. | Random draw forest index structure for searching large scale unstructured data |
CN109857743A (en) * | 2019-02-12 | 2019-06-07 | 浙江水利水电学院 | The construction method and device querying method and system of symmetrical canonical multi-dimensional indexing platform |
CN112817538B (en) * | 2021-02-22 | 2022-08-30 | 腾讯科技(深圳)有限公司 | Data processing method, device, equipment and storage medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102831225A (en) * | 2012-08-27 | 2012-12-19 | 南京邮电大学 | Multi-dimensional index structure under cloud environment, construction method thereof and similarity query method |
-
2015
- 2015-12-21 CN CN201510969332.5A patent/CN105550332B/en not_active Expired - Fee Related
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102831225A (en) * | 2012-08-27 | 2012-12-19 | 南京邮电大学 | Multi-dimensional index structure under cloud environment, construction method thereof and similarity query method |
Non-Patent Citations (5)
Title |
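---|
"Matrix "Bit" loaded: a scalable lightweight join query processor for RDF data"; Medha Atre et al.; 《WWW '10: Proceedings of the 19th International Conference on World Wide Web》; 20101231; pp. 41-50 * |
"Storing, Indexing and Querying Large Provenance Data Sets as RDF Graphs in Apache HBase"; Artem Chebotko et al.; 《2013 IEEE Ninth World Congress on Services》; 20131107; pp. 1-8 * |
"Research on the consistent hashing algorithm in distributed storage systems"; 杨彧剑 et al.; 《电脑知识与技术》 (Computer Knowledge and Technology); 20110831; vol. 7, no. 22; pp. 5295-5296 * |
"Partitioned bitmap index: an auxiliary index mechanism suitable for cloud data management"; 孟必平 et al.; 《计算机学报》 (Chinese Journal of Computers); 20121130; vol. 35, no. 11; pp. 2306-2316 * |
"Distributed data storage method based on consistent tree distribution"; 郭栋 et al.; 《计算机应用》 (Journal of Computer Applications); 20131201; vol. 33, no. 12; pp. 3432-3436 * |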
Also Published As
Publication number | Publication date |
---|---|
CN105550332A (en) | 2016-05-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105550332B (en) | A kind of provenance graph querying method based on the double-deck index structure | |
US11120022B2 (en) | Processing a database query using a shared metadata store | |
Özsu | A survey of RDF data management systems | |
US9449115B2 (en) | Method, controller, program and data storage system for performing reconciliation processing | |
US9507875B2 (en) | Symbolic hyper-graph database | |
Junghanns et al. | Gradoop: Scalable graph data management and analytics with hadoop | |
Abraham et al. | Distributed storage and querying techniques for a semantic web of scientific workflow provenance | |
Chen et al. | SparkRDF: elastic discreted RDF graph processing engine with distributed memory | |
Huang et al. | Query optimization of distributed pattern matching | |
US20140324882A1 (en) | Method and system for navigating complex data sets | |
Liagouris et al. | An effective encoding scheme for spatial RDF data | |
Madkour et al. | WORQ: workload-driven RDF query processing | |
Azhir et al. | Query optimization mechanisms in the cloud environments: A systematic study | |
Alaoui | A categorization of RDF triplestores | |
CN108241709A (en) | A kind of data integrating method, device and system | |
Pawar et al. | Keyword search in information retrieval and relational database system: Two class view | |
Svoboda et al. | Linked data indexing methods: A survey | |
Schroeder et al. | A data distribution model for RDF | |
Pandat et al. | Load balanced semantic aware distributed RDF graph | |
JP5464017B2 (en) | Distributed memory database system, database server, data processing method and program thereof | |
Bugiotti et al. | SPARQL Query Processing in the Cloud. | |
Kondylakis et al. | Enabling joins over cassandra NoSQL databases | |
Troullinou et al. | DIAERESIS: RDF data partitioning and query processing on SPARK | |
Jose et al. | Semantic Web Query Join Optimization Using Modified Grey Wolf Optimization Algorithm. | |
Valenta et al. | Distributed evaluation of XPath axes queries over large XML documents stored in MapReduce clusters |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20190329 |