CN110543585A

CN110543585A - RDF graph and attribute graph unified storage method based on relational model

Info

Publication number: CN110543585A
Application number: CN201910748425.3A
Authority: CN
Inventors: 王鑫; 柳鹏凯; 张然; 郭谢帆
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2019-08-14
Filing date: 2019-08-14
Publication date: 2019-12-06
Anticipated expiration: 2039-08-14
Also published as: CN110543585B

Abstract

The invention discloses a relational model-based method for the unified storage of RDF graphs and attribute graphs. Based on the definitions and characteristics of the two data models of a knowledge graph, namely the RDF graph model and the attribute graph model, and on the storage concept of the relation table, the two logical models are stored at the bottom layer in the physical model of relation tables; the method comprises the bottom-layer storage of the RDF graph and the bottom-layer storage of the attribute graph. All semantic information of the RDF graph and the attribute graph is stored at the bottom layer in the form of relation tables, and, for the ternary hypergraph structure of RDF, one system table is maintained to manage those edges of the RDF graph that are stored twice, once as an edge and once as a point. The invention addresses the difference in semantic expressiveness between the RDF graph and the attribute graph, realizes large-scale storage and management of knowledge graph data, effectively reduces data redundancy, realizes efficient query, and has strong application value.

Description

RDF graph and attribute graph unified storage method based on relational model

Technical Field

The invention relates to the field of knowledge graphs, RDF storage and attribute graph storage.

Background

Knowledge graphs, as the latest result of the development of symbolism, are an important foundation for artificial intelligence. The construction and release of large-scale knowledge graphs in various domains presents new challenges to knowledge graph data management. The knowledge graph data model is based on a graph structure: entities are represented by vertices and the relationships between them by edges, and this general data representation can naturally describe the wide-ranging connections between things in the real world.

Currently, knowledge graphs have two main data models, namely the RDF (Resource Description Framework) model and the attribute graph (property graph) model. The former has been standardized by the W3C (World Wide Web Consortium), and the latter is widely used in graph databases. The RDF graph model has a stronger logical and theoretical background and more complete data model characteristics. The attribute graph model, for its part, has built-in support for the attributes of nodes and edges. Although attribute graphs have not been standardized, they have gained wide acceptance in industry through the use of graph databases. The hypergraph structure of the RDF graph shows that the RDF graph model is more expressive than the attribute graph model, but so far no unified storage scheme is available to store and manage both kinds of knowledge graph effectively.

After decades of development, the relational model has reached a high degree of maturity. The relational data model has a compact and general relational structure and expresses relational operations and constraints using relational algebra expressions with strict mathematical definitions. This provides a solid theoretical foundation for storing RDF graphs and attribute graphs uniformly using the relational data model.

1. Existing RDF graph storage schemes:

Typical management schemes for existing RDF graph data fall into two types: the relation-based approach and the graph-based approach. The relation-based approach maps RDF graph data into relation tables in one of several ways and then executes SPARQL queries on them. The graph-based approach models both the RDF data and the SPARQL queries as graphs, and queries are evaluated by subgraph matching.

(1) Relationship-based storage scheme

Relational databases are currently the most widely used database management systems, and storage based on a relational database is the main storage method for current knowledge graph data. The triple table stores RDF data directly; the horizontal table records all predicates and objects of one subject in each row; the property table builds one data table per class of subject and was proposed to address the query performance problem of the triple table scheme; vertical partitioning builds one table per predicate; sextuple indexing builds 6 tables corresponding to all 6 permutations of a triple. More recently, DB2RDF improves query performance by establishing an entity-oriented storage structure that reduces the Cartesian product operations in a query.
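As a rough illustration of three of the layouts named above, the following minimal Python sketch builds a triple table, a vertically partitioned layout, and the six permutation indexes from a handful of made-up triples. The sample data and all variable names are assumptions introduced for illustration; they are not taken from any of the systems mentioned.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical sample data: (subject, predicate, object).
triples = [
    ("alice", "knows", "bob"),
    ("alice", "age", "30"),
    ("bob", "knows", "carol"),
]

# Triple table: a single relation holding every (s, p, o) row.
triple_table = list(triples)

# Vertical partitioning: one two-column (subject, object) table per predicate.
vertical = defaultdict(list)
for s, p, o in triples:
    vertical[p].append((s, o))

# Sextuple indexing: one sorted copy of the data per permutation of (s, p, o).
six_indexes = {
    order: sorted(tuple(t[i] for i in order) for t in triples)
    for order in permutations(range(3))
}
```

Here the key `(2, 1, 0)` of `six_indexes` is the object-predicate-subject ordering, which favours lookups by object.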

(2) Graph-based storage scheme

The advantage of the graph-based storage scheme is that it preserves the original representation of the RDF data and the intended semantics of SPARQL. For example, both the gStore and chameleon-db systems follow this scheme. Its disadvantage is that subgraph matching is costly, since the graph homomorphism problem is NP-complete.

2. Existing attribute graph storage schemes:

The attribute graph model has built-in support for node attributes and edge attributes. An attribute graph is a directed, labeled, multi-attribute graph. Neo4j is a native graph database supporting transactional applications and graph analytics. It is currently the most popular attribute graph database.

Disclosure of Invention

In view of the prior art, the invention designs a unified storage scheme for knowledge graphs according to the definitions and characteristics of the two knowledge graph data models, namely the RDF graph model and the attribute graph model, and the storage concept of the relation table. The scheme retains all semantic information of the RDF graph and the attribute graph, addresses the difference in semantic expressiveness between the two, reduces the redundancy of large-scale data, and realizes efficient query.

In order to solve the technical problems, the technical scheme of the invention is as follows: the RDF graph and attribute graph unified storage method based on the relational model stores two logic models of the RDF graph and the attribute graph in a physical model of a relational table at the bottom layer, and comprises the bottom layer storage of the RDF graph and the bottom layer storage of the attribute graph.

Further, in the relational model-based RDF graph and attribute graph unified storage method, the bottom-layer storage of the RDF graph comprises the conversion of points in the RDF graph, the conversion of edges in the RDF graph, and the conversion of reification in RDF;

for the conversion of points in the RDF graph, the steps are as follows:

1-1) read in an RDF triple; if the triple has the form <U1> <rdf:type> <U2>, execute 1-2); if the triple has the form <U1> <U2> <L> and U1 is a tuple in a node type relation table, execute 1-4);

1-2) check whether a relation table recording the node type U2 has been created; if it has, execute 1-3); if not, first create a relation table for the node type U2 with two columns: id, properties;

1-3) set an id value for the node U1, insert U1 as a tuple into the node relation table U2, and return to 1-1) to read in the next RDF triple;

1-4) add {U2: L} to the properties of the U1 tuple, and return to 1-1) to read in the next RDF triple;
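Steps 1-1) to 1-4) can be sketched in Python roughly as follows. This is a minimal in-memory illustration, not the patented implementation: each relation table is modelled as a dictionary, and the four-field triple format with an explicit `o_is_literal` flag, the function name `convert_points`, and the sequential id assignment are all assumptions made for the example.

```python
import itertools

def convert_points(triples):
    """Build one (id, properties) table per node type, following steps 1-1 to 1-4."""
    tables = {}      # node type -> {id: properties dict}
    node_ids = {}    # node name -> (node type, id); hypothetical lookup helper
    next_id = itertools.count(1)
    for s, p, o, o_is_literal in triples:          # 1-1: read in a triple
        if p == "rdf:type":                        # form <U1> <rdf:type> <U2>
            table = tables.setdefault(o, {})       # 1-2: create table U2 if missing
            nid = next(next_id)                    # 1-3: assign an id, insert the tuple
            table[nid] = {}
            node_ids[s] = (o, nid)
        elif o_is_literal and s in node_ids:       # form <U1> <U2> <L>
            node_type, nid = node_ids[s]
            tables[node_type][nid][p] = o          # 1-4: add {U2: L} to properties
    return tables, node_ids

# Usage with made-up triples: two nodes of type U3, one literal-valued property.
tables, node_ids = convert_points([
    ("U1", "rdf:type", "U3", False),
    ("U2", "rdf:type", "U3", False),
    ("U1", "U5", "L1", True),
])
```

After this run, the table for type `U3` holds two tuples, and the tuple for `U1` carries the property `{"U5": "L1"}`.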

for the conversion of edges in the RDF graph, the steps are as follows:

2-1) read in an RDF triple; if the triple has the form <U1> <U2> <U3>, execute 2-2); if the triple has the form <U1> <U2> <L> and U1 is a tuple in an edge type relation table, execute 2-4);

2-2) check whether a relation table recording the edge type U2 has been created; if it has, execute 2-3); if not, first create a relation table for the edge type U2 with four columns: id, start, end, properties;

2-3) set an id value for the relation, assign the id of the node U1 to start, assign the id of the node U3 to end, insert the tuple into the relation table of the edge type U2, and return to 2-1) to read in the next RDF triple;

2-4) add {U2: L} to the properties of the U1 tuple, and return to 2-1) to read in the next RDF triple;
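Steps 2-1) to 2-4) can be sketched in the same spirit. The sketch below is a hedged illustration under several assumptions that the text does not spell out: node ids are supplied in a dictionary, a triple carries an `o_is_literal` flag, and an edge that later appears as a subject is looked up by its predicate name, resolving to the most recently inserted edge of that type. The function name `convert_edges` is hypothetical.

```python
import itertools

def convert_edges(triples, node_ids):
    """Build one (id, start, end, properties) table per edge type (steps 2-1 to 2-4)."""
    edge_tables = {}   # edge type -> list of row dicts
    latest = {}        # assumption: predicate name -> its most recently stored row
    next_id = itertools.count(1)
    for s, p, o, o_is_literal in triples:                   # 2-1: read in a triple
        if not o_is_literal and s in node_ids and o in node_ids:
            rows = edge_tables.setdefault(p, [])            # 2-2: create table U2 if missing
            row = {"id": next(next_id),                     # 2-3: insert the edge tuple
                   "start": node_ids[s],
                   "end": node_ids[o],
                   "properties": {}}
            rows.append(row)
            latest[p] = row
        elif o_is_literal and s in latest:                  # 2-4: U1 is a stored edge
            latest[s]["properties"][p] = o
    return edge_tables

# Usage with made-up data: edge U5 from U1 to U2, then a literal property on U5.
edge_tables = convert_edges(
    [("U1", "U5", "U2", False), ("U5", "since", "2019", True)],
    node_ids={"U1": 1, "U2": 2},
)
```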

In the invention, the bottom-layer storage of the RDF graph is as follows: a triple of the form <U1> <rdf:type> <U2> is stored at the bottom layer as one tuple U1 in the relation table U2 of storage points; a triple of the form <U1> <U2> <L> is stored at the bottom layer as the attribute U2 with value L, that is, {U2: L}, in the properties column of the tuple U1 in the relation table of storage points; a triple of the form <U1> <U2> <U3> is stored at the bottom layer as one tuple in the relation table U2 of storage edges, with the id value of the start point U1 and the id value of the end point U3 as attributes.

The invention designs a specific storage form for reification in RDF, i.e., the case in which a predicate in one triple serves as the subject or object in another triple. When the RDF graph data is stored, such a predicate is stored both in a relation table of storage edges and in a relation table of storage points, identified by a unique edge id and a unique point id respectively, and an Edge_Vertex table is additionally maintained in the system to store the one-to-one correspondence between the edge id and the point id under which the predicate is stored in the two relation tables. The conversion for reification in RDF is as follows:

3-1) read in a triple of the form <U1> <U2> <U3> in which U1 is a tuple in an edge type relation table; check whether a relation table recording the edge type U2 has been created; if it has, execute 3-2); if not, first create a relation table for the edge type U2 with four columns: id, start, end, properties;

3-2) set a point id for the edge U1, and insert the point id and the edge id of U1 into the system table Edge_Vertex;

3-3) set an id value for the relation, assign the point id of the edge U1 to start, assign the id of the node U3 to end, insert the tuple into the relation table of the edge type U2, and return to 3-1) to read in the next RDF triple.
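The three steps above can be illustrated with the following minimal Python sketch. It assumes that the edge id of U1 and the node id of U3 are already known and passed in as dictionaries, that point ids and edge ids are drawn from one shared counter, and that Edge_Vertex is modelled as a mapping from edge id to point id; the function name `store_reified_edge` and all parameter names are hypothetical.

```python
import itertools

def store_reified_edge(triple, node_ids, edge_ids, edge_tables, edge_vertex, ids):
    """Store <U1> <U2> <U3> where the subject U1 is itself a stored edge."""
    u1, u2, u3 = triple
    rows = edge_tables.setdefault(u2, [])      # 3-1: create edge table U2 if missing
    edge_id = edge_ids[u1]
    if edge_id not in edge_vertex:             # 3-2: give edge U1 a point id and
        edge_vertex[edge_id] = next(ids)       #      record the pair in Edge_Vertex
    row = {"id": next(ids),                    # 3-3: the new edge starts at the
           "start": edge_vertex[edge_id],      #      point id of edge U1
           "end": node_ids[u3],
           "properties": {}}
    rows.append(row)
    return row

# Usage with made-up ids: edge U5 (edge id 5) points to node U3 through edge U6.
ids = itertools.count(10)
edge_tables, edge_vertex = {}, {}
row = store_reified_edge(("U5", "U6", "U3"), {"U3": 3}, {"U5": 5},
                         edge_tables, edge_vertex, ids)
```

Keeping the mapping in a separate table means the original edge tuple of U5 is left unchanged; only the new edge U6 refers to the point id.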

The invention relates to a relational model-based RDF graph and attribute graph unified storage method, wherein the bottom storage of the attribute graph comprises the following steps: storing the nodes or edges with the same label in the attribute graph by using a relation table named by the label, and maintaining the attributes and attribute values of each point and edge in the attribute graph; wherein, the relation table of the point label has two attributes, including the unique identifier id of the node and the attribute properties of the node; the relation table of the edge label has four attributes, including the unique identifier id of the edge, the start point identifier start of the edge, the end point identifier end of the edge, and the attribute properties of the edge.

In the invention, the bottom-layer storage of the attribute graph is as follows: a node of the attribute graph is stored as one tuple (id, properties) in the relation table of storage points at the bottom layer of the system; an edge of the attribute graph is stored as one tuple (id, start, end, properties) in the relation table of storage edges at the bottom layer of the system.

Compared with the prior art, the invention has the following beneficial effects:

The relational model-based RDF graph and attribute graph unified storage method supports storage and query of both the RDF graph and the attribute graph, and effectively realizes the management of RDF data in a database. The storage method provided by the invention has been verified on the open source database AgensGraph; it realizes storage and query of large-scale RDF data and graph data, and preliminarily realizes interoperation between the RDF graph and the attribute graph.

Drawings

FIG. 1 is a design flow chart of a relational model-based RDF graph and attribute graph unified storage method of the invention;

FIG. 2 is a schematic diagram of a relational table-based RDF graph and attribute graph unified storage method according to the invention;

FIG. 3 is a storage form of the RDF graph in the underlying relationship table in the present invention;

FIG. 4 is a storage form of RDF reification in the underlying relation tables in the present invention;

FIG. 5 is a storage form of the attribute graph in the underlying relation tables in the present invention;

FIG. 6 is a graph showing the variation of storage time with storage amount in the storage method of the present invention;

FIG. 7 is a graph showing the variation trend of storage space with storage amount in the storage method of the present invention;

FIG. 8 is a graph comparing the storage time of the storage method of the present invention with the storage time of importing into AgensGraph after conversion.

Detailed Description

The invention will be further described with reference to the following figures and specific examples, which are not intended to limit the invention in any way.

As shown in FIG. 1, the design idea of the relational model-based RDF graph and attribute graph unified storage method of the present invention is: analyze the structure and semantics of the main data models of knowledge graphs, namely the RDF model and the attribute graph model; store the semantic information they express using the relational model; design a relational model-based unified storage model for knowledge graphs; import and store knowledge graphs in an open source database; test the storage space and storage time; and perform simple update and query operations to verify the validity of the knowledge graph storage scheme.

As shown in FIG. 2, the relational model-based RDF graph and attribute graph unified storage method provided by the invention stores the two logical models of the RDF graph and the attribute graph at the bottom layer in the physical model of relation tables, and comprises the bottom-layer storage of the RDF graph and the bottom-layer storage of the attribute graph. All semantic information of the RDF graph and the attribute graph is stored at the bottom layer in the form of relation tables, and, for the ternary hypergraph structure of RDF, one system table is maintained to manage those edges of the RDF graph that are stored twice, once as an edge and once as a point.

First, the bottom-layer storage of the RDF graph in the invention

Referring to FIG. 3, the bottom-layer storage of the RDF graph in the present invention includes the conversion of points in the RDF graph, the conversion of edges in the RDF graph, and the conversion of reification in RDF; the specific contents are as follows:

For the conversion of points in the RDF graph, the steps are as follows:

1-2) check whether a relation table recording the node type U2 has been created; if it has, execute 1-3); if not, first create a relation table for the node type U2 with two columns (id, properties);

For the conversion of edges in the RDF graph, the steps are as follows:

2-2) check whether a relation table recording the edge type U2 has been created; if it has, execute 2-3); if not, first create a relation table for the edge type U2 with four columns (id, start, end, properties);

In the present invention, the bottom-layer storage of the RDF graph covers the following cases:

A triple of the form <U1> <rdf:type> <U2> is stored at the bottom layer as one tuple U1 in the relation table U2 of storage points.

A triple of the form <U1> <U2> <L> is stored at the bottom layer as the attribute U2 with value L, that is, {U2: L}, in the properties column of the tuple U1 in the relation table of storage points.

A triple of the form <U1> <U2> <U3> is stored at the bottom layer as one tuple in the relation table U2 of storage edges, with the start point (id of U1) and the end point (id of U3) as attributes.

Reification in RDF is the case in which a predicate in one triple serves as the subject or object in another triple. When the RDF graph data is stored, such a predicate is stored both in a relation table of storage edges and in a relation table of storage points, identified by a unique edge id and a unique point id respectively, and an Edge_Vertex table is additionally maintained in the system to store the one-to-one correspondence between the edge id and the point id under which the predicate is stored in the two relation tables.

As shown in FIG. 3, nodes U1 and U2 each have an edge of type rdf:type pointing to U3, so the ids of the two nodes are stored in the relation table U3; node U1 also has an edge of type U5 whose value is the literal L1, so {U5: L1} is added to the properties of the tuple id1 in the relation table U3.

For reification, the system maintains a special relation table (Edge_Vertex); the steps are as follows:

3-1) read in a triple of the form <U1> <U2> <U3> in which U1 is a tuple in an edge type relation table; check whether a relation table recording the edge type U2 has been created; if it has, directly execute 3-2); if not, first create a relation table for the edge type U2 with four columns (id, start, end, properties);

3-2) set a point id for the edge U1, and insert the point id and the edge id of U1 into the system table Edge_Vertex;

3-3) set an id value for the relation, assign the point id of the edge U1 to start, assign the id of the node U3 to end, insert the tuple into the relation table of the edge type U2, and return to 3-1) to read in the next RDF triple.

In FIG. 4, U1, U2 and U3 each have an rdf:type edge pointing to U4, that is, all three are of type U4, and the ids of the three nodes are stored in the relation table U4. The edge U5 between U1 and U2 is reified by inserting the edge id (id5) and the point id (id5') of U5 into the system table Edge_Vertex, and U5 is connected to the node U3 through the edge U6.
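The layout described for FIG. 4 can be reproduced with Python's built-in sqlite3 module, as a sketch only. The concrete id values (1, 2, 3 for the nodes, 5 for id5, 50 standing in for id5', 6 for edge U6), the column names `edge_id` and `point_id`, and the use of SQLite rather than AgensGraph are assumptions made for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE U4 (id INTEGER PRIMARY KEY, properties TEXT);
CREATE TABLE U5 (id INTEGER PRIMARY KEY, "start" INTEGER, "end" INTEGER, properties TEXT);
CREATE TABLE U6 (id INTEGER PRIMARY KEY, "start" INTEGER, "end" INTEGER, properties TEXT);
CREATE TABLE Edge_Vertex (edge_id INTEGER, point_id INTEGER);

INSERT INTO U4 VALUES (1, '{}'), (2, '{}'), (3, '{}');  -- nodes U1, U2, U3 of type U4
INSERT INTO U5 VALUES (5, 1, 2, '{}');                  -- edge U5 from U1 to U2
INSERT INTO Edge_Vertex VALUES (5, 50);                 -- edge id5 paired with point id5'
INSERT INTO U6 VALUES (6, 50, 3, '{}');                 -- edge U6 from U5 (as a point) to U3
""")

# Resolve the start of U6 back to the reified edge U5 through Edge_Vertex.
resolved = con.execute("""
SELECT e5."start", e5."end", e6."end"
FROM U6 AS e6
JOIN Edge_Vertex AS ev ON ev.point_id = e6."start"
JOIN U5 AS e5 ON e5.id = ev.edge_id
""").fetchone()
```

The join returns the two endpoints of U5 together with the target of U6, which shows that the edge-to-point correspondence is enough to recover the statement about the edge.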

The cases in the RDF graph in which a node points to an edge, or an edge points to an edge, are handled similarly to the case above.

Second, the bottom-layer storage of the attribute graph in the invention

The RDF graph has a ternary hypergraph structure while the attribute graph has a directed graph structure, so the expressiveness of the attribute graph is weaker than that of the RDF graph. Therefore, to store the attribute graph in relation tables, it only needs to be converted according to the basic storage scheme.

Relation tables for storing nodes and edges are created for each label type of the nodes and edges in the attribute graph, and every node and edge in the attribute graph, together with its attributes and attribute values, is inserted as a new tuple into the relation table corresponding to its label.

Storing the nodes or edges with the same label in the attribute graph by using a relation table named by the label, and maintaining the attributes and attribute values of each point and edge in the attribute graph; wherein, the relation table of the point label has two attributes, including the unique identifier id of the node and the attribute properties of the node; the relation table of the edge label has four attributes, including the unique identifier id of the edge, the start point identifier start of the edge, the end point identifier end of the edge, and the attribute properties of the edge.

FIG. 5 shows the storage form of the attribute graph in the underlying relation tables: nodes n1 and n2 are connected through edge r3; nodes n1 and n2 correspond to the tuples id1 and id2 of Vlabel0 in the relation table, which store their respective attributes properties1 and properties2; edge r3 corresponds to a tuple of Elabel0 in the relation table, which stores the ids of the start and end nodes of the edge and its attribute values properties3.
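A minimal sketch of this layout, using Python's built-in sqlite3 and json modules, is given below. Serializing the properties column as JSON text, the helper names `add_node` and `add_edge`, and the sample property values are assumptions for illustration; the text does not state how properties are encoded.

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
# One table per label: (id, properties) for nodes, (id, start, end, properties) for edges.
con.execute('CREATE TABLE Vlabel0 (id INTEGER PRIMARY KEY, properties TEXT)')
con.execute('CREATE TABLE Elabel0 (id INTEGER PRIMARY KEY, '
            '"start" INTEGER, "end" INTEGER, properties TEXT)')

def add_node(node_id, props):
    """Insert a node tuple; properties are stored as JSON text (an assumption)."""
    con.execute("INSERT INTO Vlabel0 VALUES (?, ?)", (node_id, json.dumps(props)))

def add_edge(edge_id, start, end, props):
    """Insert an edge tuple holding the ids of its start and end nodes."""
    con.execute("INSERT INTO Elabel0 VALUES (?, ?, ?, ?)",
                (edge_id, start, end, json.dumps(props)))

add_node(1, {"name": "n1"})          # n1 -> id1, properties1
add_node(2, {"name": "n2"})          # n2 -> id2, properties2
add_edge(3, 1, 2, {"since": 2019})   # r3 connects n1 to n2, properties3

edge = con.execute(
    'SELECT "start", "end", properties FROM Elabel0 WHERE id = 3'
).fetchone()
```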

In the invention, the points and edges in the attribute graph are stored as follows: a node of the attribute graph is stored as one tuple (id, properties) in the relation table of storage points at the bottom layer of the system; an edge of the attribute graph is stored as one tuple (id, start, end, properties) in the relation table of storage edges at the bottom layer of the system.

Third, experimental verification

Experimental environment. Hardware configuration: one Lenovo laptop (ThinkPad) with a 2-core Intel i5 CPU at 2.31 GHz, 8 GB of memory, and 512 GB of disk capacity. Software configuration: the operating system is 64-bit CentOS 7.0. The implementation language is C.

The standard synthetic datasets LUBM10, LUBM20, LUBM30, LUBM40 and LUBM50 were used.

The storage method of the invention was verified on the open source database AgensGraph. The storage time (shown in FIG. 6) and the storage space (shown in FIG. 7) of importing LUBM10, LUBM20, LUBM30, LUBM40 and LUBM50 into AgensGraph were measured, together with the data set characteristics (shown in Table 1) and the numbers of stored points and edges (shown in Table 2).

Referring to FIG. 8, the storage time of first converting the RDF dataset into the attribute graph data format supported by AgensGraph is compared with the storage time of directly mapping the RDF dataset into the underlying relation tables; the comparison shows that the present scheme significantly reduces the storage time of the RDF dataset.

Using the storage method, the RDF datasets were imported into the open source graph database AgensGraph. As shown in Table 3, the 14 standard LUBM SPARQL queries were converted into Cypher queries and run on the LUBM50 dataset; correct query results and short query times were obtained, achieving query interoperation between SPARQL and Cypher.

TABLE 1 Characteristics of the experimental data set

TABLE 2 Numbers of points and edges of the RDF graphs in the experimental data set

TABLE 3 Query performance test on the LUBM50 data set

Experiments show that the storage method realizes large-scale knowledge graph data storage and management, effectively reduces data redundancy, and has strong application value.

While the present invention has been described with reference to the accompanying drawings, the present invention is not limited to the above-described embodiments, which are illustrative only and not restrictive, and various modifications which do not depart from the spirit of the present invention and which are intended to be covered by the claims of the present invention may be made by those skilled in the art.

Claims

1. A method for the unified storage of RDF graphs and attribute graphs based on a relational model, characterized in that the two logical models of the RDF graph and the attribute graph are stored at the bottom layer in the physical model of relation tables, the method comprising the bottom-layer storage of the RDF graph and the bottom-layer storage of the attribute graph.

2. The relational model-based RDF graph and attribute graph unified storage method according to claim 1, wherein the bottom-layer storage of the RDF graph comprises the conversion of points in the RDF graph, the conversion of edges in the RDF graph, and the conversion of reification in RDF;

for the conversion of points in the RDF graph, the steps are as follows:

For the conversion of edges in the RDF graph, the steps are as follows:

For the conversion of reification in RDF, the steps are as follows:

3-1) read in a triple of the form <U1> <U2> <U3> in which U1 is a tuple in an edge type relation table; check whether a relation table recording the edge type U2 has been created; if it has, directly execute 3-2); if not, first create a relation table for the edge type U2 with four columns: id, start, end, properties;

3-3) setting an id value for the relation, assigning the point id of the edge U1 to start, assigning the id of the node U3 to end, and inserting the tuple into the relation table of the edge type U2; and executing 3-1) loop to read in the RDF triples.

3. The relational model-based RDF graph and attribute graph unified storage method according to claim 2, wherein a triple of the form <U1> <rdf:type> <U2> is stored at the bottom layer as one tuple U1 in the relation table U2 of storage points.

4. The relational model-based RDF graph and attribute graph unified storage method according to claim 3, wherein a triple of the form <U1> <U2> <L> is stored at the bottom layer as the attribute U2 with value L, that is, {U2: L}, in the properties column of the tuple U1 in the relation table of storage points.

5. The relational model-based RDF graph and attribute graph unified storage method according to claim 4, wherein a triple of the form <U1> <U2> <U3> is stored at the bottom layer as one tuple in the relation table U2 of storage edges, with the id value of the start point U1 and the id value of the end point U3 as attributes.

6. The relational model-based RDF graph and attribute graph unified storage method according to claim 1, wherein the bottom storage of the attribute graph comprises the following steps:

7. The relational model-based RDF graph and attribute graph unified storage method according to claim 6, wherein a node of the attribute graph is stored as one tuple in the relation table of storage points at the bottom layer of the system, and the tuple comprises two columns of attributes: id, properties.

8. The relational model-based RDF graph and attribute graph unified storage method according to claim 6, wherein an edge of the attribute graph is stored as one tuple in the relation table of storage edges at the bottom layer of the system, and the tuple comprises four columns of attributes: id, start, end, properties.