CN114116785A - Distributed SPARQL query optimization method based on minimum attribute cut - Google Patents

Info

Publication number
CN114116785A
Authority
CN
China
Prior art keywords
attribute
graph
partition
distributed
query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111451035.3A
Other languages
Chinese (zh)
Other versions
CN114116785B (en
Inventor
彭鹏
田桢
秦拯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University filed Critical Hunan University
Priority to CN202111451035.3A priority Critical patent/CN114116785B/en
Priority claimed from CN202111451035.3A external-priority patent/CN114116785B/en
Publication of CN114116785A publication Critical patent/CN114116785A/en
Application granted granted Critical
Publication of CN114116785B publication Critical patent/CN114116785B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a distributed SPARQL query optimization method based on minimum attribute cut, which belongs to the field of distributed systems and comprises the following steps: (1) reading an original RDF data graph, and storing an edge attribute set L; (2) calculating the weakly connected components and the corresponding cost of each edge attribute; (3) selecting as many internal attributes as possible to obtain a coarsened graph of the data graph; (4) carrying out vertex partitioning on the coarsened graph, and uncoarsening the result to obtain the final partitions; (5) decomposing the SPARQL query into a set of independently executable sub-queries; (6) executing the decomposed sub-queries in parallel in each partition to obtain the matching results. The invention expands the types of queries that can be executed independently in a distributed RDF system, reduces joins between partitions, reduces data communication time, and improves query efficiency.

Description

Distributed SPARQL query optimization method based on minimum attribute cut
Technical Field
The present invention relates to the field of distributed systems, and more particularly to data partitioning and query processing for distributed RDF systems.
Background
RDF (Resource Description Framework) is a data model standardized by the W3C. It represents the attributes and relationships of web resources as triples <subject, predicate, object> and is currently applied in fields such as knowledge graphs and social network analysis. The RDF data model is flexible: it can be represented as a table in a relational database or as a graph. When RDF is represented as a graph, each triple is a directed edge from the subject to the object; the subject and the object are the two vertices of the edge, and the predicate is the label on the edge. Alongside RDF, the W3C proposed the standard query language SPARQL (SPARQL Protocol and RDF Query Language). Like RDF, a SPARQL query can be represented as a graph. Edges in the query graph are called triple patterns, and the subject, predicate, and object of a triple pattern can each be a variable or a constant. Because both SPARQL and RDF can be represented as graphs, answering a SPARQL query can be transformed into a subgraph matching problem.
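As an illustration of the triple and triple-pattern notions above, the following minimal Python sketch matches one triple pattern against a list of triples. The function names and the convention that variables start with "?" are assumptions made for this example and are not taken from the patent.

```python
def is_var(term):
    """A term is treated as a variable when it starts with '?' (assumed convention)."""
    return term.startswith("?")


def match_pattern(pattern, triples):
    """Return one binding dict for every triple that matches the triple pattern."""
    results = []
    for triple in triples:
        binding = {}
        for q_term, d_term in zip(pattern, triple):
            if is_var(q_term):
                # A variable repeated within the pattern must bind to the same value.
                if binding.setdefault(q_term, d_term) != d_term:
                    break
            elif q_term != d_term:
                break
        else:
            results.append(binding)
    return results


triples = [("Alice", "knows", "Bob"), ("Bob", "knows", "Bob"), ("Alice", "age", "30")]
print(match_pattern(("?x", "knows", "?y"), triples))
```

A full SPARQL query is a set of such patterns whose bindings must agree on shared variables, which is what makes it a subgraph matching problem.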
With the rapid development of the internet, RDF data sets keep growing, and a traditional single-machine system cannot process massive RDF data effectively, which has led to distributed RDF systems. In a distributed system, data partitioning is one of the most basic steps. Specifically, the RDF data graph G is divided into a group of subgraphs {F_1, F_2, …, F_k}; each subgraph is called a partition, and the partitions are distributed among different machines. Currently, the data partitioning methods used in distributed RDF systems partition data by vertex, that is, they assign each vertex to one partition; hash partitioning is a common example. In this type of approach, some edges may be cut between partitions, i.e., the two vertices of an edge are assigned to different partitions. To preserve the integrity of the graph, each cut edge is stored in both partitions, which is called one-hop replication. An edge is called an internal edge if its two vertices are in the same partition; otherwise it is called a crossing edge.
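The vertex-based hash partitioning and one-hop replication described above can be sketched as follows. This is a minimal illustration only; the use of CRC32 as the hash and all function names are assumptions for the example.

```python
import zlib
from collections import defaultdict


def hash_partition(triples, k):
    """Assign every vertex to one of k partitions with a deterministic hash (CRC32 assumed)."""
    part = {}
    for s, _, o in triples:
        for v in (s, o):
            part[v] = zlib.crc32(v.encode("utf-8")) % k
    return part


def split_edges(triples, part):
    """Classify edges as internal (same partition) or crossing (different partitions)."""
    internal, crossing = [], []
    for s, p, o in triples:
        (internal if part[s] == part[o] else crossing).append((s, p, o))
    return internal, crossing


def fragments(triples, part, k):
    """Build the k fragments; a crossing edge is stored in both endpoint partitions."""
    frags = defaultdict(set)
    for s, p, o in triples:
        frags[part[s]].add((s, p, o))
        frags[part[o]].add((s, p, o))  # same set when the edge is internal
    return [frags[i] for i in range(k)]
```

Because the hash ignores edge labels, nothing prevents every attribute from having at least one crossing edge, which is the situation the method below tries to avoid.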
Like edges, the matches of a query fall into two types: internal matches, which are contained entirely in one partition, and crossing matches, which span multiple partitions. A query that has only internal matches only needs to be executed independently in each partition. For a query with crossing matches, most existing methods decompose the query into a set of star queries, execute the star queries independently in each partition, and finally perform inter-partition joins to obtain the final result. However, inter-partition joins involve data communication and extra computational overhead, and have a large impact on query performance. Moreover, under conventional vertex partitioning, only star queries can be executed independently, which is a severe limitation; processing a general query usually requires distributed joins, so query efficiency is low.
Disclosure of Invention
Existing distributed RDF systems judge whether a query can be executed independently only from the structure of the query graph, and consider a query independently executable only when its query graph is a star. By taking the attributes of the edges in the graph data into account, the present invention extends the types of queries that can be executed independently beyond star queries. The first objective of the present invention is to provide a graph data partitioning method based on minimum attribute cut, which reduces the number of crossing attributes, thereby avoiding join operations between partitions and reducing data communication time. The second objective is to provide a query decomposition method that decomposes an original query that cannot be executed independently into a set of sub-queries that can, thereby making full use of the minimum attribute cut partitioning and improving query efficiency.
The invention provides a distributed SPARQL query optimization method based on minimum attribute cut, which comprises the following steps:
Step S1: reading an original RDF data graph G, and storing its edge attributes in a set L;
Step S2: calculating the weakly connected components and the corresponding cost of each edge attribute;
Step S3: selecting as many internal attributes as possible to obtain a coarsened graph of the data graph;
When static graph data is processed, the set of edge attributes is fixed, and every attribute is either an internal attribute or a crossing attribute. A heuristic greedy algorithm is therefore used to select as many internal attributes as possible, which minimizes the number of crossing attributes, i.e., achieves the minimum attribute cut. After the internal attributes are selected, each weakly connected component of the subgraph induced by the internal attributes is taken as a super point to obtain the coarsened graph of the data graph.
Step S4: carrying out vertex partitioning on the coarsened graph, and uncoarsening the result to obtain the final partitions;
When the coarsened graph is partitioned, any vertex partitioning algorithm, such as hash partitioning or METIS, may be used, provided that the number of vertices in each partition does not exceed (1+ε)·|V|/k, so as to achieve load balancing between partitions. Here ε is the user-defined maximum imbalance ratio, |V| is the number of vertices in the data graph, and k is the number of partitions.
Step S5: decomposing the SPARQL query into a set of independently executable sub-queries;
The original SPARQL query is decomposed according to the crossing attributes obtained in step S3. The sub-queries obtained by decomposition can be executed independently within the partitions, and their shape is not limited to star queries.
Step S6: executing the decomposed sub-queries in parallel in each partition to obtain the matching results.
By adopting the invention, the following technical effects can be achieved:
The invention provides a distributed SPARQL query optimization method based on minimum attribute cut. The invention decomposes queries that cannot be executed independently into a set of sub-queries that can be executed independently. Unlike in conventional methods, the independently executable sub-queries are not limited to star queries, so the number of invalid intermediate matching results can be further reduced and the filtering effect is improved.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a schematic diagram illustrating a process of coarsening a data graph according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a query decomposition process according to an embodiment of the present invention.
Detailed Description
The following further description of embodiments of the present invention is provided in conjunction with the accompanying drawings so that those skilled in the art can more easily understand the present invention. It should be noted that the embodiment described below is only one embodiment of the present invention, and not all embodiments. Other embodiments, which can be derived by those skilled in the art from the embodiments of the present invention without making any creative effort, are within the protection scope of the present invention.
For convenience of description and understanding, the symbols and concepts related to the embodiments of the invention are explained as follows:
G: the RDF data graph.
L: the set of edge attributes in the RDF data graph.
q(v): the sub-query to which query vertex v belongs.
G[L']: the subgraph induced by the attribute set L', i.e., the subgraph formed by the edges whose attributes are in L'.
DS(L'): the disjoint-set (union-find) structure corresponding to the attribute set L'.
Internal edge: an edge is called an internal edge if both of its vertices are within the same partition.
Crossing edge: an edge is called a crossing edge if its two vertices are in two different partitions.
Internal attribute: an attribute is called an internal attribute if none of its edges is a crossing edge.
Crossing attribute: an attribute is called a crossing attribute if at least one of its edges is a crossing edge.
Independently executable query: a SPARQL query Q is independently executable over a partitioning F = {F_1, F_2, …, F_k} of the RDF graph G if matching Q requires no inter-partition joins.
Minimum attribute cut partitioning: given an RDF data graph G and a positive integer k, a minimum attribute cut partitioning of G is a partitioning F = {F_1, F_2, …, F_k} that satisfies: (1) the number of crossing attributes |L_cross| is minimum; (2) the number of vertices in each partition does not exceed (1+ε)·|V|/k, where ε is the user-defined maximum imbalance ratio and k is the number of partitions.
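The definitions of internal and crossing attributes and the balance constraint can be made concrete with a short Python sketch. The representation (a list of (subject, attribute, object) triples and a dict mapping every vertex to its partition index) and the function names are assumptions for this example.

```python
def crossing_attributes(triples, part):
    """Attributes that label at least one crossing edge."""
    return {p for s, p, o in triples if part[s] != part[o]}


def internal_attributes(triples, part):
    """Attributes none of whose edges is a crossing edge."""
    return {p for _, p, _ in triples} - crossing_attributes(triples, part)


def is_balanced(part, k, eps):
    """Check that no partition holds more than (1 + eps) * |V| / k vertices."""
    sizes = {}
    for partition in part.values():
        sizes[partition] = sizes.get(partition, 0) + 1
    limit = (1 + eps) * len(part) / k
    return all(size <= limit for size in sizes.values())


triples = [("a", "p", "b"), ("c", "p", "d"), ("b", "q", "c")]
part = {"a": 0, "b": 0, "c": 1, "d": 1}
print(crossing_attributes(triples, part), internal_attributes(triples, part))
```

In this toy graph only the attribute q has a crossing edge, so |L_cross| = 1, and both partitions hold 2 of the 4 vertices.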
The invention provides a distributed SPARQL query optimization method based on minimum attribute cut, the flow of which is shown in FIG. 1 and comprises the following steps:
S1: reading an original RDF data graph G, and storing its edge attributes in a set L;
S2: traversing the set L, and calculating the weakly connected components WCC(G[{p}]) and the corresponding cost Cost({p}) for each edge attribute p;
Different methods may be used to calculate the weakly connected components. In this embodiment, the calculation is optimized with a disjoint-set (union-find) data structure. Step S2 specifically includes:
S2.1: traversing each attribute p in the set L, and executing steps S2.2-S2.4 for p;
S2.2: initializing a disjoint set DS({p}) for attribute p. In the disjoint set, each vertex u is a node of a tree and carries three values: u({p}).parent, u({p}).rank, and u({p}).size. Here u({p}).parent points toward the root of u's tree in DS({p}), with initial value u itself; u({p}).rank is the height of the tree rooted at u, with initial value 0; and u({p}).size is the number of vertices in the tree rooted at u, with initial value 1;
S2.3: for each directed edge from u to u' in the RDF graph, if its attribute is p, merging the trees containing u and u' in the disjoint set DS({p}), i.e., merging the weakly connected components containing u and u'. During merging, the root of the tree with the smaller rank is made to point to the root of the tree with the larger rank. After all edges with attribute p have been processed, two vertices are in the same tree of DS({p}) if and only if they are in the same weakly connected component of the induced subgraph G[{p}];
S2.4: calculating the cost of attribute p as an internal attribute;
Because the method requires that the number of vertices in each partition not exceed (1+ε)·|V|/k to ensure load balancing between partitions, the cost in this embodiment is defined in terms of the size of the weakly connected components. Specifically, for a set of attributes L' ⊆ L, the cost of L' as internal attributes is defined as follows:
Cost(L') = max_{c ∈ WCC(G[L'])} |c|
where c is a weakly connected component in WCC(G[L']) and |c| is the number of vertices in c. With this cost function, the cost corresponding to each attribute can be calculated.
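Step S2 can be sketched in Python as one disjoint set per attribute, with union by rank and a size counter at each root. The sketch assumes that the cost of an attribute is the size of its largest weakly connected component, which is how the comparison against (1+ε)·|V|/k in step S3.6 suggests the cost should be read; the class and function names are hypothetical.

```python
class DisjointSet:
    """Union-find with union by rank, path compression, and per-root sizes."""

    def __init__(self):
        self.parent, self.rank, self.size = {}, {}, {}

    def add(self, u):
        if u not in self.parent:
            self.parent[u], self.rank[u], self.size[u] = u, 0, 1

    def find(self, u):
        root = u
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[u] != root:  # path compression
            self.parent[u], u = root, self.parent[u]
        return root

    def union(self, u, v):
        self.add(u)
        self.add(v)
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return
        if self.rank[ru] < self.rank[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru  # lower-rank root points to higher-rank root
        self.size[ru] += self.size[rv]
        if self.rank[ru] == self.rank[rv]:
            self.rank[ru] += 1

    def max_component(self):
        roots = (r for r in self.parent if self.parent[r] == r)
        return max((self.size[r] for r in roots), default=0)


def attribute_costs(triples):
    """Cost of each attribute, assumed here to be its largest weakly connected component."""
    sets = {}
    for s, p, o in triples:
        sets.setdefault(p, DisjointSet()).union(s, o)
    return {p: ds.max_component() for p, ds in sets.items()}
```

Edge direction is ignored in `union`, which is what makes the resulting components weakly connected.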
S3: selecting from the attribute set L as many internal attributes L_in as possible so as to minimize the number of crossing attributes, and taking each weakly connected component of the subgraph induced by L_in as a super point to obtain a coarsened graph of the data graph. In the coarsened graph, super points may be connected by crossing edges;
Consider a data graph G in which every edge is assigned a unique attribute. In such a graph each attribute has exactly one edge, so computing a minimum attribute cut is the same as computing a minimum edge cut. Because the minimum edge-cut problem is NP-complete, the minimum attribute-cut problem is also NP-complete. For this reason, this embodiment uses a heuristic greedy algorithm to select the internal attributes, which specifically includes the following steps:
S3.1: initializing the internal attribute set L_in to empty;
S3.2: checking whether the attribute set L is empty; if it is, ending the iteration; otherwise, executing steps S3.3-S3.8 and continuing with the next iteration;
S3.3: setting the minimum cost mincost to infinity and the optimal attribute p_opt to null;
S3.4: traversing the attribute set L, and executing steps S3.5 and S3.6 for each attribute p;
S3.5: calculating WCC(G[L_in ∪ {p}]);
In this embodiment, the disjoint-set data structure is used to compute the weakly connected components efficiently. Initially, the disjoint set DS(L_in ∪ {p}) is set to DS(L_in). For each vertex u in DS({p}), the root uRoot of the tree containing u in DS({p}) is obtained recursively. Then the roots of u and uRoot in DS(L_in ∪ {p}) are obtained, and if they differ, the corresponding trees are merged.
S3.6: if Cost(L_in ∪ {p}) is less than (1+ε)·|V|/k and also less than mincost, assigning Cost(L_in ∪ {p}) to mincost and p to p_opt, and then returning to step S3.4; otherwise, keeping mincost and p_opt unchanged and returning directly to step S3.4;
S3.7: if the optimal attribute p_opt is still null after steps S3.4-S3.6, ending step S3 and proceeding to step S4; otherwise, going to step S3.8;
S3.8: deleting the attribute p_opt from the attribute set L, adding p_opt to the internal attribute set L_in, and then returning to step S3.2 to continue selecting internal attributes;
Taking FIG. 2 as an example, the original data graph has 12 vertices and 6 edge attributes. After the processing of step S3, the selected internal attributes are L_in = {starring, residual, producer, spout, found date}. The edges with internal attributes are the thickened edges in FIG. 2, and they form two weakly connected components. In the coarsened graph, each of the two weakly connected components forms a super point, and the super points are connected by an edge with the crossing attribute birthPlace.
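The greedy selection of steps S3.1-S3.8 can be sketched as follows. For brevity this sketch recomputes the weakly connected components of G[L_in ∪ {p}] from scratch for every candidate instead of extending DS(L_in) incrementally as the embodiment describes, and it assumes the cost is the size of the largest component. Function names are hypothetical.

```python
def largest_wcc(triples, attrs):
    """Size of the largest weakly connected component formed by edges labelled in attrs."""
    parent, size = {}, {}

    def find(u):
        parent.setdefault(u, u)
        size.setdefault(u, 1)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    best = 0
    for s, p, o in triples:
        if p in attrs:
            rs, ro = find(s), find(o)
            if rs != ro:
                if size[rs] < size[ro]:
                    rs, ro = ro, rs
                parent[ro] = rs
                size[rs] += size[ro]
            best = max(best, size[rs])
    return best


def select_internal_attributes(triples, k, eps):
    """Greedy selection of internal attributes; returns (L_in, remaining attributes)."""
    vertices = {v for s, _, o in triples for v in (s, o)}
    limit = (1 + eps) * len(vertices) / k
    remaining = {p for _, p, _ in triples}
    l_in = set()
    while remaining:                                   # S3.2
        min_cost, p_opt = float("inf"), None           # S3.3
        for p in sorted(remaining):                    # S3.4
            cost = largest_wcc(triples, l_in | {p})    # S3.5
            if cost < limit and cost < min_cost:       # S3.6 (strict, as in the text)
                min_cost, p_opt = cost, p
        if p_opt is None:                              # S3.7
            break
        remaining.remove(p_opt)                        # S3.8
        l_in.add(p_opt)
    return l_in, remaining
```

The attributes left in `remaining` when the loop stops are the ones that would merge components beyond the balance limit, so they become the crossing attributes.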
S4: partitioning the super points in the coarsened graph with a vertex partitioning algorithm, ensuring that the number of vertices in each partition does not exceed (1+ε)·|V|/k, where ε is the user-defined maximum imbalance ratio and k is the number of partitions;
Because the coarsened graph has far fewer vertices than the original data graph, any vertex partitioning algorithm, such as hash partitioning or METIS, can be applied to it without concern about running time. In this embodiment, S4 specifically includes:
S4.1: taking the number of original vertices inside each super point as the weight of that super point, applying weighted hash partitioning to the coarsened graph, and ensuring that the number of vertices in each final data partition does not exceed (1+ε)·|V|/k;
S4.2: uncoarsening the super points assigned to the same partition in step S4.1 into a final partition, that is, assigning the original vertices contained in those super points to the same partition of the original data graph;
Taking FIG. 2 as an example, if the number of partitions is 2, each of the two super points in FIG. 2 becomes one partition; that is, the original data graph is divided into two partitions along the dashed line in FIG. 2, which gives the final minimum attribute cut partitioning.
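Coarsening, weighted partitioning of the super points, and uncoarsening can be sketched as below. The patent names weighted hash partitioning for step S4.1; this sketch substitutes a simple greedy rule (heaviest super point first, placed in the currently lightest partition) purely for illustration, and it does not by itself enforce the (1+ε)·|V|/k bound. All names are hypothetical.

```python
def coarsen(triples, l_in):
    """Map each vertex to its super point: its weakly connected component in G[L_in]."""
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for s, p, o in triples:
        rs, ro = find(s), find(o)  # registers both endpoints even for crossing edges
        if p in l_in and rs != ro:
            parent[ro] = rs
    return {v: find(v) for v in list(parent)}


def partition_super_points(super_of, k):
    """Greedy stand-in for weighted partitioning: heaviest super point to lightest partition."""
    weights = {}
    for sp in super_of.values():
        weights[sp] = weights.get(sp, 0) + 1  # weight = number of original vertices
    loads = [0] * k
    part_of_super = {}
    for sp in sorted(weights, key=lambda x: (-weights[x], x)):
        i = loads.index(min(loads))
        part_of_super[sp] = i
        loads[i] += weights[sp]
    return part_of_super, loads


def uncoarsen(super_of, part_of_super):
    """Assign every original vertex to the partition of its super point."""
    return {v: part_of_super[sp] for v, sp in super_of.items()}
```

Since all vertices of a super point land in the same partition, every edge with an attribute in L_in stays internal, so only the remaining attributes can be crossing attributes.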
S5: decomposing the SPARQL query to be processed into a group of sub-queries which can be executed independently;
In real SPARQL query workloads, a query often cannot be executed independently. To make full use of the minimum attribute cut partitioning and reduce joins between partitions, the original query needs to be decomposed into a group of sub-queries that can be executed independently. In this embodiment, step S5 specifically includes:
S5.1: initializing an empty set for storing the decomposed sub-queries;
S5.2: deleting from the SPARQL query the edges whose attribute is a variable or a crossing attribute, to obtain a group of weakly connected components WCCs = {q'_1, q'_2, ..., q'_x};
S5.3: traversing each edge from v_1 to v_2 in the SPARQL query whose attribute is a variable or a crossing attribute, and executing steps S5.4-S5.5 for the edge;
S5.4: if v_1 and v_2 belong to the same sub-query, adding the edge to that sub-query and returning to step S5.3 for the next iteration; otherwise, going to step S5.5;
S5.5: if |q(v_1)| is less than or equal to |q(v_2)|, adding the edge to q(v_2); otherwise, adding it to q(v_1); that is, the edge is added to whichever of the sub-queries containing v_1 and v_2 has more vertices. Then returning to step S5.3 for the next iteration;
S5.6: traversing the sub-queries q'_i in WCCs, and adding q'_i to the set initialized in step S5.1 if q'_i has more than one vertex. Sub-queries with a single vertex are not considered because such a query contains only one query vertex, its matches are numerous and uninformative, and that query vertex is already contained in other sub-queries;
Taking FIG. 3 as an example, three sub-queries q'_1, q'_2, and q'_3 are obtained after step S5.2. Because q'_1 has more vertices than q'_2, the crossing-attribute edge between them is added to q'_1. Because q'_2 and q'_3 have the same number of vertices, the edge between them may be added to either one. Assuming this edge is added to q'_2, the final decomposed sub-queries are q_1 and q_2 in FIG. 3.
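The decomposition in steps S5.1-S5.6 can be sketched in Python as follows. The query is assumed to be a list of triple patterns whose variables start with "?", and |q(v)| is taken to be the current vertex count of the sub-query, including vertices added together with earlier cut edges; the patent text does not settle that detail. On a tie the edge goes to q(v_2), as in step S5.5. The function name and data layout are hypothetical.

```python
def decompose(query, crossing):
    """Split a query into sub-queries along variable-labelled and crossing-attribute edges."""

    def is_cut(p):
        return p.startswith("?") or p in crossing

    comp = {}

    def find(u):
        comp.setdefault(u, u)
        while comp[u] != u:
            comp[u] = comp[comp[u]]
            u = comp[u]
        return u

    # S5.2: weakly connected components after removing the cut edges.
    for s, p, o in query:
        rs, ro = find(s), find(o)
        if not is_cut(p) and rs != ro:
            comp[ro] = rs

    subqueries = {}  # component root -> {"vertices": set, "edges": list}
    for v in list(comp):
        entry = subqueries.setdefault(find(v), {"vertices": set(), "edges": []})
        entry["vertices"].add(v)
    for s, p, o in query:
        if not is_cut(p):
            subqueries[find(s)]["edges"].append((s, p, o))

    # S5.3-S5.5: put each cut edge into the larger of the two sub-queries it touches.
    for s, p, o in query:
        if is_cut(p):
            q1, q2 = subqueries[find(s)], subqueries[find(o)]
            target = q2 if len(q1["vertices"]) <= len(q2["vertices"]) else q1
            target["edges"].append((s, p, o))
            target["vertices"].update((s, o))

    # S5.6: drop sub-queries that consist of a single vertex.
    return [q["edges"] for q in subqueries.values() if len(q["vertices"]) > 1]


query = [("?f", "starring", "?a"), ("?f", "producer", "?p"),
         ("?a", "birthPlace", "?c"), ("?c", "?x", "?n")]
for sub in decompose(query, {"birthPlace"}):
    print(sub)
```

In this made-up query the removal of the birthPlace edge and the variable-labelled edge leaves three components, and the procedure returns two sub-queries, one of which is not a star.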
S6: executing the decomposed sub-queries in parallel in each partition to obtain the matching results. In this embodiment, step S6 specifically includes:
S6.1: the master node of the distributed RDF system broadcasts the decomposed sub-queries to all slave nodes; after receiving the sub-queries, the slave nodes execute subgraph matching in parallel inside their partitions to obtain intermediate matching results;
S6.2: the intermediate matching results from the nodes are joined across partitions to obtain the final matching results, which are collected at the master node.
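The result collection in step S6.2 can be illustrated with a hash join over variable bindings. This sketch runs in a single process and does not model the master and slave nodes or their communication; it assumes each intermediate result is a dict from variable to value, that all results of one sub-query bind the same variables, and that duplicate results caused by replicated crossing edges should be removed. Names are hypothetical.

```python
def join(left, right):
    """Hash join of two lists of bindings on the variables they share."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    index = {}
    for b in right:
        index.setdefault(tuple(b[v] for v in shared), []).append(b)
    joined = []
    for a in left:
        for b in index.get(tuple(a[v] for v in shared), []):
            joined.append({**a, **b})
    return joined


def collect_and_join(results):
    """results[i][j] holds the bindings of sub-query j found in partition i."""
    merged = []
    for j in range(len(results[0])):
        seen, rows = set(), []
        for partition_results in results:
            for b in partition_results[j]:
                key = tuple(sorted(b.items()))
                if key not in seen:  # drop duplicates from replicated edges
                    seen.add(key)
                    rows.append(b)
        merged.append(rows)
    final = merged[0]
    for rows in merged[1:]:
        final = join(final, rows)
    return final
```

The fewer sub-queries the decomposition produces, the fewer such joins are needed, which is where the reduction in communication comes from.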
In summary, by taking the edge attributes in the RDF data graph into account, the invention provides a distributed SPARQL query optimization method based on minimum attribute cut, which expands the types of queries that can be executed independently, reduces joins between partitions, reduces data communication time, and improves query efficiency.
The above embodiments are only preferred embodiments of the present invention, and are not intended to limit the present invention, and any modifications, equivalents, improvements, etc. made thereto should be included within the scope of the present invention.

Claims (5)

1. A distributed SPARQL query optimization method based on minimum attribute cut, characterized by comprising the following steps:
(1) reading an original RDF data graph, and storing an edge attribute set L;
(2) calculating the weakly connected components and the corresponding cost of each edge attribute;
(3) selecting as many internal attributes as possible to obtain a coarsened graph of the data graph;
(4) carrying out vertex partitioning on the coarsened graph, and uncoarsening the result to obtain the final partitions;
(5) decomposing the SPARQL query into a set of independently executable sub-queries;
(6) executing the decomposed sub-queries in parallel in each partition to obtain the matching results.
2. The distributed SPARQL query optimization method based on minimum attribute cut as claimed in claim 1, wherein in step 2, when the weakly connected components are calculated, the size of the weakly connected components is used as the cost of the attribute, so that attributes can be compared when internal attributes are selected.
3. The distributed SPARQL query optimization method based on minimum attribute cut as claimed in claim 1, wherein in step 3, when static graph data is processed, the set of edge attributes is fixed and every attribute is either an internal attribute or a crossing attribute; a heuristic greedy algorithm is used to select as many internal attributes as possible, which minimizes the number of crossing attributes, i.e., achieves the minimum attribute cut; and after the internal attributes are selected, each weakly connected component of the subgraph induced by the internal attributes is taken as a super point to obtain a coarsened graph of the data graph.
4. The distributed SPARQL query optimization method based on minimum attribute cut as claimed in claim 1, wherein in step 4, when the coarsened graph is partitioned, any vertex partitioning algorithm, such as hash partitioning or METIS, can be used, provided that the number of vertices in each partition does not exceed (1+ε)·|V|/k, so as to achieve load balancing between the partitions; wherein ε is the user-defined maximum imbalance ratio, and k is the number of partitions.
5. The method of claim 1, wherein in step 5 the original SPARQL query is decomposed according to the crossing attributes obtained in step 3, the decomposed sub-queries can be executed independently within the partitions, and the shape of the sub-queries is not limited to star queries.
CN202111451035.3A 2021-12-01 Distributed SPARQL query optimization method based on minimum attribute cut Active CN114116785B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111451035.3A CN114116785B (en) 2021-12-01 Distributed SPARQL query optimization method based on minimum attribute cut

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111451035.3A CN114116785B (en) 2021-12-01 Distributed SPARQL query optimization method based on minimum attribute cut

Publications (2)

Publication Number Publication Date
CN114116785A true CN114116785A (en) 2022-03-01
CN114116785B CN114116785B (en) 2024-09-24

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114356977A (en) * 2022-03-16 2022-04-15 湖南大学 Distributed RDF graph query method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101854284B1 (en) * 2016-12-26 2018-05-03 충북대학교 산학협력단 Distributed RDF query processing system for reducing join cost and communication cost
CN108520035A (en) * 2018-03-29 2018-09-11 天津大学 SPARQL parent map pattern query processing methods based on star decomposition
CN109710638A (en) * 2019-01-01 2019-05-03 湖南大学 A kind of multi-query optimization method on federation type distribution RDF data library
CN112835920A (en) * 2021-01-22 2021-05-25 河海大学 Distributed SPARQL query optimization method based on hybrid storage mode
CN112883063A (en) * 2021-02-15 2021-06-01 湖南大学 SPARQL query processing method on partition-based distributed RDF system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
杨程: "分布式环境下大规模资源描述框架数据划分方法综述", 《计算机应用》, vol. 40, no. 11, 22 July 2020 (2020-07-22) *

Similar Documents

Publication Publication Date Title
CN108959613B (en) RDF knowledge graph-oriented semantic approximate query method
Kim et al. Taming subgraph isomorphism for RDF query processing
Sun et al. Efficient subgraph matching on billion node graphs
Zeng et al. A distributed graph engine for web scale RDF data
US9292570B2 (en) System and method for optimizing pattern query searches on a graph database
Meimaris et al. Extended characteristic sets: graph indexing for SPARQL query optimization
CN110134714B (en) Distributed computing framework cache index method suitable for big data iterative computation
CA2973356A1 (en) Distributed storage and distributed processing query statement reconstruction in accordance with a policy
Huang et al. Query optimization of distributed pattern matching
CN104298598B (en) The adjustment method of RDFS bodies under distributed environment
CN104699698A (en) Graph query processing method based on massive data
US20130275410A1 (en) Live topological query
US20070078816A1 (en) Common sub-expression elimination for inverse query evaluation
CN108520035A (en) SPARQL parent map pattern query processing methods based on star decomposition
Chaitanya et al. Efficient multicore algorithms for identifying biconnected components
CN111881160A (en) Distributed query optimization method based on equivalent expansion method of relational algebra
CN105550332A (en) Dual-layer index structure based origin graph query method
CN110032676B (en) SPARQL query optimization method and system based on predicate association
CN110245271B (en) Large-scale associated data partitioning method and system based on attribute graph
CN116383247A (en) Large-scale graph data efficient query method
CN114116785A (en) Distributed SPARQL query optimization method based on minimum attribute cut
CN114116785B (en) Distributed SPARQL query optimization method based on minimum attribute cut
Muhammad et al. Multi query optimization algorithm using semantic and heuristic approaches
Wang et al. RDF partitioning for scalable SPARQL query processing
CN109063048A (en) A kind of matched data cleaning method of knowledge based library figure and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant