CN107291807B - SPARQL query optimization method based on graph traversal - Google Patents

Info

Publication number
CN107291807B
Authority
CN
China
Prior art keywords
data
rdf
node
traversal
bigtable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710343003.9A
Other languages
Chinese (zh)
Other versions
CN107291807A (en)
Inventor
李亮
沈志宏
周园春
黎建辉
朱小杰
刘东江
李跃鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Computer Network Information Center of CAS
Original Assignee
Computer Network Information Center of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Computer Network Information Center of CAS
Priority to CN201710343003.9A
Publication of CN107291807A
Application granted
Publication of CN107291807B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2452 Query translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation

Abstract

The invention discloses a SPARQL query optimization method based on graph traversal. The method comprises the following steps: 1) representing the triples in the RDF data as an attribute graph, and then storing the RDF data using a Bigtable model to obtain Bigtable data corresponding to the RDF data; 2) converting the SPARQL query into a traversal of the RDF attribute graph; 3) traversing all nodes in the Bigtable data that satisfy the conditions, following the traversal sequence obtained in step 2), thereby completing the SPARQL query. On the one hand, the method removes the dependence of traditional SPARQL queries on data structures such as hash tables, reduces the generation of intermediate data, and avoids join computation over large-scale RDF data; on the other hand, Bigtable-based big data processing technology can be used effectively to store and manage massive RDF associated knowledge-network data, accelerating the query and analysis of RDF associated data.

Description

SPARQL query optimization method based on graph traversal
Technical Field
The invention relates to a SPARQL query execution method based on graph traversal, and in particular to a storage and query method and system for large-scale associated data.
Background
Graph data mining and analysis is a new field of big data. By establishing association relations among World Wide Web resources, microbial strain resources, scientific research resources and the like, it supports information mining and scientific discovery based on data association. The Resource Description Framework (RDF) is a language for expressing information about World Wide Web resources; it can express information about anything that can be identified on the internet, such as page titles, authors and modification times, as well as associations between different data. The RDF specification provides a basic vocabulary for describing resources and defines the rules that applications in various domains, such as WDCM (World Data Centre for Microorganisms), must follow when describing a resource vocabulary. SPARQL (SPARQL Protocol and RDF Query Language) is a query language and data access protocol developed for RDF. It is defined over the RDF data model recommended by the W3C and can query any information resource that can be represented in RDF. SPARQL became an official W3C Recommendation on January 15, 2008.
Because RDF uses structured XML data, retrieval and query can interpret the precise meaning of the metadata, so search becomes more intelligent and accurate, and the common problem of irrelevant data being returned is effectively avoided. An RDF file comprises a number of resource descriptions; each resource description consists of a number of statements; and each statement is a triple consisting of a resource, an attribute type and an attribute value, indicating that the resource has a certain attribute. The resource corresponds to the subject in natural language, the attribute type to the predicate, and the attribute value to the object; multiple RDF resource files together form a complete resource description and association graph. As associated networks grow larger and the data types they express and process become more numerous, the real-time performance of RDF data storage and SPARQL query is challenged. It is therefore very important to adopt a new, scalable big data architecture that improves the efficiency of RDF data storage and management as well as SPARQL query speed and analysis capability. Graph data storage and management frameworks built on large-scale data processing technologies such as Bigtable are a new direction for knowledge graph networks because of their excellent large-scale data processing capability.
At present, RDF data is mainly stored and managed as RDF triples in relational database tables or key-value (KV) data warehouses. Subgraph matching and SPARQL queries over the subjects, predicates and objects of RDF triples are implemented by self-joins, and fast query and retrieval of local data is supported through hashing or indexes; a typical implementation is the Virtuoso RDF graph database. Distributed versions mainly adopt a federated mode that merges RDF data query and the distributed computation framework into a unified framework. Specifically, the SPARQL query is parsed and distributed to each node, each node runs the subgraph matching computation, and the matching results of all nodes are then aggregated. This framework is simple and easy to implement, supports distributed query and fast return of RDF data, and simplifies design and development for large-scale knowledge association networks.
However, in the distributed query mode based on federation and subgraph matching, each query must be decomposed into subgraph matching tasks that are distributed to multiple nodes, executed there, and have their results returned, which easily causes a large amount of inter-node communication and intermediate data. When the data size is very large, the system faces the following problems:
1) High-overhead self-join operations. In a distributed system, joining data tables causes a large amount of data communication between the nodes of the system. When the data volume is large and there are many machine nodes, the self-join cost is high and query latency increases markedly, which hinders horizontal scaling of the system.
2) A large amount of intermediate data. After the SPARQL query is decomposed, it is distributed to multiple nodes that each run it separately. Each node acts as a query engine, and its operation generates a large amount of intermediate data, which increases the memory consumption of the system and reduces the number of SPARQL queries the system can process in parallel.
3) High requirements on data partitioning. A federated query distributes the SPARQL query to each node to be completed separately, which places high demands on the quality of the data partitions. If a large number of association relations exist between different partitions, SPARQL subgraph matching cannot be executed in parallel on multiple data nodes, and the operating efficiency of the system is reduced.
Because of these problems, SPARQL query based on the federated mode has difficulty coping with the large-scale growth of RDF associated data and meeting the real-time query requirements of knowledge-network association applications, and query time increases as the data size grows. Bigtable-based data processing technology, for its part, is difficult to apply to massive RDF knowledge-network associated data because it lacks a large-scale table join operation (Join).
Disclosure of Invention
In view of the problems of SPARQL query over RDF big data, the invention aims to provide a SPARQL query optimization method based on graph traversal.
The technical scheme of the invention is as follows:
A SPARQL query optimization method based on graph traversal comprises the following steps:
1) representing the triples in the RDF data as an attribute graph, and then storing the RDF data using a Bigtable model to obtain Bigtable data corresponding to the RDF data;
2) converting the SPARQL query into a traversal of the RDF attribute graph;
3) traversing all nodes in the Bigtable data that satisfy the conditions, following the traversal sequence obtained in step 2), thereby completing the SPARQL query.
The method for storing RDF data using the Bigtable model comprises the following steps:
21) for each RDF triple (sub, pre, obj) in the RDF data, storing the subject sub as a node v, as one row in the Bigtable model;
22) determining the type of the object obj: a) if the object obj is an rdf:literal, i.e. the predicate pre is an attribute of the subject sub, taking the object obj as an attribute value of the node v and the predicate pre as the corresponding attribute name, stored as one storage unit (Cell) in the row of the node v; b) if the object obj is an rdf:resource, i.e. the predicate pre is an edge associating the subject sub with another node, taking the object obj as an independent node w, stored as one row in the Bigtable model; then taking the predicate pre as an outgoing edge of the node v pointing to the node w, stored as a Cell in the row of the node v, and taking the predicate pre as an incoming edge of the node w coming from the node v, stored as a Cell in the row of the node w.
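Steps 21) and 22) can be sketched as follows. This is a minimal illustration in Python, not the implementation of the invention: an in-memory dictionary stands in for the Bigtable rows, the cell names ("attr", "out", "in") and the function name store_triples are hypothetical, and each triple is assumed to carry an is_literal flag already produced by an RDF parser.

```python
from collections import defaultdict


def store_triples(triples):
    """Map (sub, pre, obj, is_literal) tuples to rows of a Bigtable-like table.

    The table is a dict: row key -> {cell name -> cell value}.
    """
    table = defaultdict(dict)
    for sub, pre, obj, is_literal in triples:
        row_v = table[sub]  # step 21: subject sub becomes node v with its own row
        if is_literal:
            # step 22a: literal object -> attribute cell in the row of v
            row_v[("attr", pre)] = obj
        else:
            # step 22b: resource object -> independent node w with its own row
            row_w = table[obj]
            row_v[("out", pre, obj)] = True  # outgoing edge of v, pointing to w
            row_w[("in", pre, sub)] = True   # incoming edge of w, coming from v
    return dict(table)


# The two example triples used in the detailed description.
table = store_triples([
    ("tax1", "type", "tax", True),
    ("tax1", "x-taxon", "gene1", False),
])
```

In this sketch, a predicate with several literal values for the same subject would overwrite its cell; a real layout would need a multi-valued cell or a value-qualified cell name.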
Further, in step 2), the SPARQL query is converted into a Gremlin graph traversal, thereby converting the SPARQL query into a traversal of the RDF attribute graph.
Further, the method for converting the SPARQL query into the Gremlin graph traversal is as follows: for each triple (sub, pre, obj) in the where clause of the SPARQL query, if the predicate pre of the triple represents an attribute in the attribute graph, the triple is converted into a filter has(pre, obj) on the node in the attribute graph represented by the subject sub, where obj is the filter condition and pre is the attribute name represented by the predicate pre; if the predicate pre of the triple represents an edge in the attribute graph, the triple is converted into a traversal sub.out(pre) -> obj from the node represented by the subject sub to the node represented by the object obj, where out denotes an outgoing edge and pre is the association label of the edge represented by the predicate pre. Processing all the triples of the where clause yields the traversal sequence.
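The conversion rule above can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the where clause is assumed to be already parsed into (sub, pre, obj) tuples, the set attribute_predicates (which predicates denote attributes rather than edges) is assumed to be known from the stored data, and the function name translate_where and the variable name ?gene are hypothetical.

```python
def translate_where(patterns, attribute_predicates):
    """Turn where-clause triples into a traversal sequence of filters and edges."""
    steps = []
    for sub, pre, obj in patterns:
        if pre in attribute_predicates:
            # Predicate is an attribute: filter has(pre, obj) on the node sub.
            steps.append(("has", sub, pre, obj))
        else:
            # Predicate is an edge: traversal sub.out(pre) -> obj.
            steps.append(("out", sub, pre, obj))
    return steps


patterns = [
    ("?tax", "type", "tax index"),
    ("?tax", "x-taxon", "?gene"),  # "?gene" is a made-up variable name
]
steps = translate_where(patterns, attribute_predicates={"type"})
```

The result keeps the subject and object of every pattern so that a later execution stage can bind variables along the edges and apply the filters to the bound nodes.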
Further, the method for traversing all nodes in the Bigtable data that satisfy the conditions, following the traversal sequence obtained in step 2), is as follows: the traversal sequence consists of filters and associated edges; first, traversal paths are formed from the associated edges, and invalid traversal paths are then eliminated by the filters to obtain the valid traversal paths; all nodes in the Bigtable data that satisfy the conditions are traversed along the valid traversal paths, and the attribute values of the nodes and edges on the valid traversal paths are then organized.
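The two-phase execution described above (paths first, filters second) can be sketched as follows. This is a simplified Python illustration, not the engine of the invention: the table is an in-memory dictionary with hypothetical cell names ("attr", "out", "in"), every subject and object of an edge pattern is assumed to be a variable, and every filtered variable is assumed to appear in at least one edge pattern.

```python
def run_traversal(table, steps):
    """Return the variable bindings of every valid traversal path."""
    edges = [s for s in steps if s[0] == "out"]
    filters = [s for s in steps if s[0] == "has"]

    # Phase 1: form traversal paths by following the associated edges.
    paths = [{}]
    for _, sub, pre, obj in edges:
        extended = []
        for path in paths:
            # Start from the bound node, or scan every row if sub is unbound.
            starts = [path[sub]] if sub in path else list(table)
            for v in starts:
                for cell in table.get(v, {}):
                    if cell[0] == "out" and cell[1] == pre:
                        w = cell[2]
                        if obj in path and path[obj] != w:
                            continue  # conflicts with an earlier binding
                        extended.append({**path, sub: v, obj: w})
        paths = extended

    # Phase 2: eliminate invalid paths with the attribute filters.
    return [
        path for path in paths
        if all(table.get(path.get(sub), {}).get(("attr", pre)) == value
               for _, sub, pre, value in filters)
    ]


table = {
    "tax1": {("attr", "type"): "tax index", ("out", "x-taxon", "gene1"): True},
    "tax2": {("attr", "type"): "other", ("out", "x-taxon", "gene2"): True},
    "gene1": {("in", "x-taxon", "tax1"): True},
    "gene2": {("in", "x-taxon", "tax2"): True},
}
steps = [("has", "?tax", "type", "tax index"),
         ("out", "?tax", "x-taxon", "?gene")]
result = run_traversal(table, steps)
```

Here the edge step produces two candidate paths (through tax1 and tax2), and the filter on type leaves only the path through tax1. Each lookup is a direct row access, so no join over a triple table is involved.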
The method comprises three parts: Bigtable data storage, SPARQL-to-Gremlin traversal conversion, and Gremlin graph traversal execution.
Their functions are described below:
1) Storing RDF data based on the Bigtable model
RDF is an internet-oriented graph data format that represents graph data using <subject, predicate, object> triples, where the subject represents a graph node, the predicate is an attribute name of the subject node, and the corresponding object obj is the attribute value. The object falls into two main cases, rdf:literal and rdf:resource: the former represents an attribute of the subject node, whose attribute name is the predicate; the latter represents another node with which the subject node is associated, in which case the predicate identifies the edge associating the subject with that node and the label of the edge is the predicate. The attribute graph corresponding to the RDF graph is shown in FIG. 1(a), and its Bigtable representation is shown in FIG. 1(b).
The invention uses Bigtable to store RDF graphs. The subject, predicate and object of each RDF triple are analyzed structurally according to their characteristics and the RDF model (i.e. the RDF ontology), as follows:
For an RDF triple <sub pre obj>, the subject sub is stored as a node v, as one row in the Bigtable model.
If the object obj is an rdf:literal, the object obj is used as an attribute value of the node v and the predicate pre as the attribute name, stored as one Cell (storage unit) in the row of v. Otherwise, obj is taken as an independent node w and stored as a row in the Bigtable. The predicate pre is taken as an outgoing edge of the subject node v, pointing to the node w, and is stored as a Cell in the row of the node v; the predicate pre is also taken as an incoming edge of the object node w, coming from the node v (i.e. the starting point of the incoming edge is the node v), and is stored as a Cell in the row of the node w.
Literal is a type of object and is identified by parsing with Apache Jena or another RDF tool. Bigtable is a storage structure for massive data and can efficiently support the storage and management of massive graph data.
2) Converting SPARQL queries into Gremlin graph traversals
As shown in FIG. 2, a SPARQL query expresses subgraph matching using the sequence of RDF triples in its where clause. The invention implements SPARQL subgraph matching by graph traversal, converting the triple sequence in the where clause of the SPARQL query into filtering and traversal of nodes and edges in the graph. Each triple <sub pre obj> is handled as follows. A triple whose predicate pre represents an attribute is converted into a filter has(pre, obj) on the node in the attribute graph represented by sub, where obj is the filter condition and pre is the attribute name represented by the predicate pre. A triple whose predicate pre represents an edge in the attribute graph is converted into a traversal from the node represented by sub to the node represented by obj; the Gremlin code is sub.out(pre) -> obj, where out denotes an outgoing edge and pre is the association label of the edge represented by the predicate pre.
Processing all the triples of the where clause yields a traversal sequence consisting of filters and associated edges.
3) Graph traversal execution
Following the traversal sequence obtained in step 2), all nodes satisfying the conditions are traversed in the Bigtable data obtained in step 1) by storing the RDF data based on the Bigtable model. The associated edges of the sequence form the traversal paths, and the filters eliminate invalid traversal paths. After the traversal is finished, the attribute values of the nodes and edges on the valid traversal paths are organized and returned according to the user's requirements. Because Bigtable is accessed directly, the query process generates essentially no intermediate data. RDF data refers to the graph data representation defined by the W3C, in which data and the association relations between data are represented as triples; subject, predicate and object are the standard data structure of RDF data. Gremlin is a standard language for traversing graph data.
The invention has the following beneficial effects:
To address the poor scalability and low query efficiency of existing large-scale RDF graph data storage, a Bigtable-based RDF graph data storage and query method is provided. Converting RDF graph data into the data format of the Bigtable model supports horizontal scaling of large-scale graph data; converting the SPARQL query into graph-traversal-based Bigtable data access avoids the high cost and poor scalability caused by joining RDF data and reduces the generation of intermediate data during the query. Moreover, because the SPARQL query process is converted into accesses to Bigtable, the number of accesses can be reduced by using a cache.
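The caching remark can be illustrated with a small sketch. The text does not specify the cache, so the following is only an assumption: row reads are wrapped in Python's functools.lru_cache, with a dictionary (TABLE) standing in for the Bigtable backend and a counter (backend_reads) recording how often the backend is actually hit. All names are hypothetical.

```python
from functools import lru_cache

# Stand-in for the Bigtable backend; values are tuples so they are immutable.
TABLE = {"tax1": (("out", "x-taxon", "gene1"),)}
backend_reads = 0


@lru_cache(maxsize=1024)
def read_row(row_key):
    """Fetch one row; only cache misses reach the backend."""
    global backend_reads
    backend_reads += 1
    return TABLE.get(row_key)


# A traversal that visits the same node three times reads the backend once.
for _ in range(3):
    read_row("tax1")
```

Because a graph traversal tends to revisit the same nodes along different paths, repeated row reads are served from the cache in this sketch rather than from the store.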
The method solves the problems of distributed storage and low-latency query of massive RDF data. On the one hand, it removes the dependence of traditional SPARQL queries on data structures such as hash tables, reduces the generation of intermediate data, and avoids join computation over large-scale RDF data; on the other hand, Bigtable-based big data processing technology can be used effectively to store and manage massive RDF associated knowledge-network data, and the query and analysis of RDF associated data are accelerated by caching and indexing.
Drawings
FIG. 1 shows the storage of RDF data based on the Bigtable model;
(a) attribute graph of the RDF data, (b) representation of the RDF data graph in the Bigtable model;
FIG. 2 is a flow diagram of SPARQL query execution based on graph traversal;
FIG. 3 is a comparison graph of the small data set and the large data set.
Detailed Description
In order to make the above and other features and advantages of the invention more comprehensible, embodiments are described in detail below with reference to the figures.
As shown in FIG. 2, the graph-traversal-based SPARQL execution engine consists of Bigtable data storage, SPARQL-to-Gremlin conversion and graph traversal execution. Based on whether the object of an RDF triple is an rdf:literal or an rdf:resource, the association relations and literal values of RDF triples are represented as an attribute graph, the RDF data is stored and managed using the Bigtable data model, and SPARQL query and analysis of the RDF associated data is implemented by graph traversal.
At present, data warehouses for RDF data store and manage RDF knowledge-network data with the triple as the basic unit and rely on table self-joins to implement subgraph matching, and thereby SPARQL query and analysis of RDF data.
According to the invention, Bigtable is used to store and manage RDF data, SPARQL query and analysis is converted into traversal of the RDF attribute graph, and the SPARQL query is completed by accessing Bigtable. Distributed scaling of massive RDF data is thus achieved, and the adverse effect of join operations on SPARQL queries is avoided. The Bigtable storage of RDF data and the SPARQL-to-Gremlin conversion, which are designed to support fast retrieval and analysis of RDF data, are described below with reference to FIG. 1:
Bigtable data storage: as shown in FIG. 1, RDF triple data is represented as the attribute graph shown in FIG. 1(a). For the triple "<tax1> <type> <tax>", the object is an rdf:literal, so the triple is converted into an attribute value of the node represented by the subject tax1, with the predicate as the attribute name. For the triple "<tax1> <x-taxon> <gene1>", the object is an rdf:resource, so the triple is converted into an edge from the node represented by the subject tax1 to the node gene1 represented by the object, with the predicate as the label of the edge. In this way all RDF triples are mapped onto the attribute graph, and the Bigtable data structure is used to store the attributes and edges.
SPARQL to Gremlin: the triples of the where clause in the SPARQL query are converted into a traversal of the attribute graph. SPARQL is a query language for RDF data, and Gremlin is a traversal language for attribute graphs.
For the where-clause triple "?tax <type> <tax index>", since type represents an attribute in the attribute graph, the pattern is converted into the Gremlin filter step has(type, tax index) on the type attribute of the node. It is executed by filtering the Bigtable data on tax index, which narrows the traversal scope and improves traversal efficiency.
For "?tax <x-taxon> ?", since x-taxon is an edge between nodes in the attribute graph, the pattern is converted into the Gremlin traversal step out(x-taxon) over the edges of the node represented by tax, which is implemented by accessing the Bigtable data.
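The two example patterns can be combined into a single chained Gremlin expression. The sketch below, in Python, only builds the query text; the step names has and out come from the description, while starting the chain at g.V(), the helper name to_gremlin, and the single-quoted string literals are assumptions based on common Gremlin usage. Escaping of quotes inside values is omitted.

```python
def to_gremlin(steps):
    """Render (step name, *arguments) tuples as one chained Gremlin expression."""
    parts = ["g.V()"]  # assumed starting point: all vertices of the graph
    for name, *args in steps:
        quoted = ", ".join("'" + arg + "'" for arg in args)
        parts.append(name + "(" + quoted + ")")
    return ".".join(parts)


# Filter on the type attribute first, then follow the x-taxon edge.
query = to_gremlin([
    ("has", "type", "tax index"),
    ("out", "x-taxon"),
])
```

Placing the has step before the out step mirrors the remark above that filtering first narrows the set of nodes from which edges are followed.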
The invention was tested on the microorganism associated data set WDCM using a small data set at the 300 million scale, a large data set at the 3 billion scale, and 16 standard SPARQL queries. The test exercises a concrete implementation of the graph-traversal-based SPARQL execution engine for big data; each query was repeated 10 times, and the maximum and minimum values were discarded.
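The measurement procedure can be sketched as follows. The text states only that each query is repeated 10 times and that the maximum and minimum are discarded; averaging the remaining values is an assumption made here, and the function names trimmed_mean and measure are hypothetical.

```python
import time


def trimmed_mean(timings):
    """Average the timings after discarding one maximum and one minimum."""
    if len(timings) < 3:
        raise ValueError("need at least three timings")
    ordered = sorted(timings)
    kept = ordered[1:-1]
    return sum(kept) / len(kept)


def measure(run_query, repeats=10):
    """Run run_query `repeats` times and return the trimmed mean in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return trimmed_mean(timings)


example = trimmed_mean([5.0, 1.0, 3.0, 2.0, 4.0])  # keeps 2.0, 3.0, 4.0
```

Dropping the extremes reduces the influence of one-off effects such as a cold cache on the first run or a transient network delay.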
The test environment is a 4-node HBase cluster supporting Bigtable; the HBase version is 0.98.23.hadoop1, each node has 32 GB of memory, a 12-core CPU and a 28 TB disk, and the nodes are interconnected through a gigabit switch. The Gremlin query and analysis engine is Titan 1.0.0, and the SparQLToGremlin conversion was developed by the project team.
The query run times obtained using the system of the invention are as follows:
As shown in FIG. 3, for the small data set at the 300 million scale, the time required for a query is around 1 s.
For the large data set of 3 billion triples, because the graph traversal load is effectively spread across multiple nodes, the query time of the 16 query statements is less than 1 s, which is better than the query time on the small data set.
The experimental results show that, for continuously growing RDF data, the method can make effective use of the advantages of Bigtable distributed data storage and of graph traversal query to keep the query time constant, and solves well the current problem that SPARQL query time increases markedly when RDF data grows at large scale.
The above embodiments only illustrate the technical solution of the invention and do not limit it. A person skilled in the art may modify the technical solution of the invention or make equivalent substitutions without departing from its spirit and scope, and the scope of protection of the invention shall be determined by the claims.

Claims (4)

1. A SPARQL query optimization method based on graph traversal, comprising the following steps:
1) representing the triples in the RDF data as an attribute graph, and then storing the RDF data using a Bigtable model to obtain Bigtable data corresponding to the RDF data;
2) converting the SPARQL query into a traversal of the RDF attribute graph;
3) traversing all nodes in the Bigtable data that satisfy the conditions, following the traversal sequence obtained in step 2), thereby completing the SPARQL query;
wherein the method for storing RDF data using the Bigtable model comprises: 21) for each RDF triple (sub, pre, obj) in the RDF data, storing the subject sub as a node v, as one row in the Bigtable model; 22) determining the type of the object obj: a) if the object obj is an rdf:literal, i.e. the predicate pre is an attribute of the subject sub, taking the object obj as an attribute value of the node v and the predicate pre as the corresponding attribute name, stored as one storage unit (Cell) in the row of the node v; b) if the object obj is an rdf:resource, i.e. the predicate pre is an edge associating the subject sub with another node, taking the object obj as an independent node w, stored as one row in the Bigtable model; then taking the predicate pre as an outgoing edge of the node v pointing to the node w, stored as a Cell in the row of the node v, and taking the predicate pre as an incoming edge of the node w coming from the node v, stored as a Cell in the row of the node w.
2. The method of claim 1, wherein in step 2) the SPARQL query is converted into a Gremlin graph traversal, thereby converting the SPARQL query into a traversal of the RDF attribute graph.
3. The method of claim 2, wherein the method of converting the SPARQL query into a Gremlin graph traversal is: for each triple (sub, pre, obj) in the where clause of the SPARQL query, if the predicate pre of the triple represents an attribute in the attribute graph, converting the triple into a filter has(pre, obj) on the node in the attribute graph represented by the subject sub, where obj is the filter condition and pre is the attribute name represented by the predicate pre; if the predicate pre of the triple represents an edge in the attribute graph, converting the triple into a traversal sub.out(pre) -> obj from the node represented by the subject sub to the node represented by the object obj, where out denotes an outgoing edge and pre is the association label of the edge represented by the predicate pre; and processing all the triples of the where clause to obtain the traversal sequence.
4. The method of claim 1, wherein the method of traversing all nodes in the Bigtable data that satisfy the conditions, following the traversal sequence obtained in step 2), is: the traversal sequence consists of filters and associated edges; first, traversal paths are formed from the associated edges, and invalid traversal paths are then eliminated by the filters to obtain the valid traversal paths; all nodes in the Bigtable data that satisfy the conditions are traversed along the valid traversal paths, and the attribute values of the nodes and edges on the valid traversal paths are then organized.
CN201710343003.9A 2017-05-16 2017-05-16 SPARQL query optimization method based on graph traversal Active CN107291807B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710343003.9A CN107291807B (en) 2017-05-16 2017-05-16 SPARQL query optimization method based on graph traversal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710343003.9A CN107291807B (en) 2017-05-16 2017-05-16 SPARQL query optimization method based on graph traversal

Publications (2)

Publication Number Publication Date
CN107291807A CN107291807A (en) 2017-10-24
CN107291807B true CN107291807B (en) 2020-10-16

Family

ID=60095186

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710343003.9A Active CN107291807B (en) 2017-05-16 2017-05-16 SPARQL query optimization method based on graph traversal

Country Status (1)

Country Link
CN (1) CN107291807B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309334B (en) * 2018-04-20 2023-07-18 腾讯科技(深圳)有限公司 Query method, system, computer device and readable storage medium for graph database
EP3794466B1 (en) * 2018-06-15 2023-07-19 Huawei Cloud Computing Technologies Co., Ltd. System for handling concurrent property graph queries
CN109033260B (en) * 2018-07-06 2021-08-31 天津大学 Knowledge graph interactive visual query method based on RDF
CN109271458A (en) * 2018-09-14 2019-01-25 南威软件股份有限公司 A kind of network of personal connections querying method and system based on chart database
CN110110034A (en) * 2019-05-10 2019-08-09 天津大学深圳研究院 A kind of RDF data management method, device and storage medium based on figure
CN110543585B (en) * 2019-08-14 2021-08-31 天津大学 RDF graph and attribute graph unified storage method based on relational model
CN111026747A (en) * 2019-10-25 2020-04-17 广东数果科技有限公司 Distributed graph data management system, method and storage medium
CN110990426B (en) * 2019-12-05 2022-10-14 桂林电子科技大学 RDF query method based on tree search
CN112256927A (en) * 2020-10-21 2021-01-22 网易(杭州)网络有限公司 Method and device for processing knowledge graph data based on attribute graph
CN112559780A (en) * 2020-12-09 2021-03-26 南京航空航天大学 RDF data search method based on graphic database Neo4j
CN113448964B (en) * 2021-06-29 2022-10-21 四川蜀天梦图数据科技有限公司 Hybrid storage method and device based on graph-KV
CN114625899B (en) * 2022-03-14 2023-09-08 北京百度网讯科技有限公司 Information processing method, information processing device, electronic equipment and storage medium
CN114817262B (en) * 2022-04-27 2023-03-28 电子科技大学 Graph traversal algorithm based on distributed graph database

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462609A (en) * 2015-01-06 2015-03-25 福州大学 RDF data storage and query method combined with star figure coding
CN105210058A (en) * 2012-12-14 2015-12-30 微软技术许可有限责任公司 Graph query processing using plurality of engines

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9129039B2 (en) * 2011-10-18 2015-09-08 Ut-Battelle, Llc Scenario driven data modelling: a method for integrating diverse sources of data and data streams
US10387496B2 (en) * 2015-05-21 2019-08-20 International Business Machines Corporation Storing graph data in a relational database

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105210058A (en) * 2012-12-14 2015-12-30 微软技术许可有限责任公司 Graph query processing using plurality of engines
CN104462609A (en) * 2015-01-06 2015-03-25 福州大学 RDF data storage and query method combined with star figure coding

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
基于图数据库的RDF数据分布式存储;项灵辉等;《计算机应用与软件》;20141115;第31卷(第11期);正文第2-3节 *
大规模RDF图数据的正则路径查询研究;姜龙翔;《中国优秀硕士学位论文全文数据库信息科技辑》;20150515;正文第3章 *

Also Published As

Publication number Publication date
CN107291807A (en) 2017-10-24

Similar Documents

Publication Publication Date Title
CN107291807B (en) SPARQL query optimization method based on graph traversal
CN109299102B (en) HBase secondary index system and method based on Elastcissearch
US9798772B2 (en) Using persistent data samples and query-time statistics for query optimization
Das et al. A Tale of Two Graphs: Property Graphs as RDF in Oracle.
US11941034B2 (en) Conversational database analysis
CN109446279A (en) Based on neo4j big data genetic connection management method, system, equipment and storage medium
CN105138661A (en) Hadoop-based k-means clustering analysis system and method of network security log
JP6964384B2 (en) Methods, programs, and systems for the automatic discovery of relationships between fields in a mixed heterogeneous data source environment.
Elzein et al. Managing big RDF data in clouds: Challenges, opportunities, and solutions
CN110909111B (en) Distributed storage and indexing method based on RDF data characteristics of knowledge graph
CN103678550A (en) Mass data real-time query method based on dynamic index structure
Das et al. A study on big data integration with data warehouse
Ghotiya et al. Migration from relational to NoSQL database
CN103226608A (en) Parallel file searching method based on folder-level telescopic Bloom Filter bit diagram
Cuzzocrea et al. MapReduce-based algorithms for managing big RDF graphs: state-of-the-art analysis, paradigms, and future directions
Chawla et al. Research issues in RDF management systems
Mittal et al. Efficient random data accessing in MapReduce
CN106021306A (en) Ontology matching based case search system
KR101955376B1 (en) Processing method for a relational query in distributed stream processing engine based on shared-nothing architecture, recording medium and device for performing the method
Liu et al. Finding smallest k-compact tree set for keyword queries on graphs using mapreduce
Haque et al. Distributed RDF triple store using hbase and hive
Li et al. Research on storage method for fuzzy RDF graph based on Neo4j
Liu et al. PAIRPQ: an efficient path index for regular path queries on knowledge graphs
MahmoudiNasab et al. AdaptRDF: adaptive storage management for RDF databases
CN110321456B (en) Massive uncertain XML approximate query method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant