WO2023160137A1

WO2023160137A1 - Graph data storage method and system, and computer device

Info

Publication number: WO2023160137A1
Application number: PCT/CN2022/138771
Authority: WO
Inventors: 秦朝阳
Original assignee: 苏州浪潮智能科技有限公司
Priority date: 2022-02-25
Filing date: 2022-12-13
Publication date: 2023-08-31
Also published as: CN114564620A

Abstract

A graph data storage method and system, and a computer device. The method comprises: decomposing and processing graph data to be stored to acquire vertex identifier data, attribute data and relationship identifier data, wherein the vertex identifier data comprises starting vertex identifier data and ending vertex identifier data (S210); creating relationship bucket metadata including several relationship buckets, wherein each relationship bucket stores, in a KV key form, the graph data to be stored that has the same relationship identifier data (S220); creating a vertex attribute set on the basis of the vertex identifier data and the attribute data, and creating a relationship attribute set on the basis of the attribute data and the relationship identifier data (S230); and storing the vertex attribute set, the relationship attribute set, an attribute inverted index and the relationship bucket metadata in a persistent memory medium in a distributed manner (S240). A persistent memory is used, only one piece of key value data is stored for unique data, and a relationship network is stored in different buckets, thus facilitating horizontal expansion; and key data is directly acquired by means of an identifier during a query, and a query time is not related to the size of a data set, thus processing graph data storage and queries more quickly.

Description

Graph data storage method, system and computer equipment

Cross References to Related Applications

This application claims priority to the Chinese patent application filed with the China Patent Office on February 25, 2022, with application number 202210178094.6 and entitled "Graph Data Storage Method, System, and Computer Equipment", the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the field of graph data processing, in particular to a graph data storage method, system and computer equipment.

Background

With the rapid development of social networking, e-commerce, finance, retail, the Internet of Things and other industries, the real world has woven a huge and complex network of relationships. Traditional databases have difficulty handling relationship operations, and the graph database, a database built for relationship computation, was born against this background. For relationship data of any significant size or value, a graph database is the best way to represent and query it.

Graph storage is a crucial part of a graph database. It usually takes one of two forms: native graph storage, such as an adjacency matrix or adjacency list, and non-native graph storage, such as that used by JanusGraph (an open source distributed graph database). Native graph storage requires special customization and optimization, while non-native graph storage incurs overheads of varying degree, such as read amplification. At present, the relationships among the data processed in the big data industry grow geometrically with the amount of data, and efficiency requirements rise day by day. Existing graph storage methods therefore face the challenge of storing huge volumes of graph data and of breaking through query performance limits.

Summary of the invention

The purpose of this application is to provide a graph data storage method, system and computer equipment that can improve graph data storage and query efficiency.

The technical solution of the present application is: In the first aspect, the present application provides a method for storing graph data, including:

Decompose and process the graph data to be stored to obtain vertex identification data, attribute data and relationship identification data. The vertex identification data includes start vertex identification data and end vertex identification data;

Create relational bucket metadata that includes several relational buckets, and each relational bucket stores graph data to be stored with the same relational identification data in the form of a KV key;

Create a vertex attribute set based on the vertex identification data and attribute data and create a relationship attribute set based on the attribute data and the relationship identification data;

Distributed storage of vertex attribute sets, relational attribute sets, attribute inverted indexes, and relational bucket metadata to persistent memory media.

In some embodiments, before creating the relationship bucket metadata including several relationship buckets, where each relationship bucket stores, in KV key form, the graph data to be stored that has the same relationship identification data, the method further includes:

The KV key form data is constructed with the combination of the start vertex identification data and the relationship identification data as the key, and the end vertex identification data as the value.

In some embodiments, creating the relationship bucket metadata including several relationship buckets, where each relationship bucket stores, in KV key form, the graph data to be stored that has the same relationship identification data, includes:

Create relationship buckets, each storing the graph data to be stored that has the same relationship identification data;

Establish the ID of each relationship bucket and associate the ID of the relationship bucket with the relationship identification data of the graph data to be stored stored in the relationship bucket;

Create relational bucket metadata including all relational buckets and store the ID associated content of the relational bucket in the relational bucket metadata.

In some embodiments, creating a set of vertex attributes based on the vertex identification data and attribute data and creating a set of relationship attributes based on the attribute data and the relationship identification data includes:

Based on the vertex identification data and attribute data, create a vertex attribute set that stores vertex identification data and attribute data in a key-value model. The vertex attribute set uses the vertex identification data as the key and the attribute data as the value;

Based on attribute data and relationship identification data, create a relationship attribute set that uses a key-value model to store attribute data and relationship identification data. The relationship attribute set uses the relationship bucket ID and relationship identification data as keys, and uses attribute data as values.

In some embodiments, the content associated with the relationship bucket ID includes at least: the relationship type, the amount of data corresponding to the relationship type, the IDs of all the relationship buckets associated with the relationship type, and the positions of all the relationship buckets associated with the relationship type.

In some embodiments, after creating the vertex attribute set based on the vertex identification data and the attribute data and creating the relationship attribute set based on the attribute data and the relationship identification data, the method further includes:

Create an attribute inverted index based on the vertex attribute set and the relationship attribute set.

In some embodiments, also include:

Obtain the graph data to be written;

Decomposing the graph data to be written to obtain the start vertex identification data of the graph data to be written, the relationship identification data of the graph data to be written, the attribute data of the graph data to be written and the end vertex identification data of the graph data to be written;

Find the corresponding target relationship bucket based on the relationship identification data;

Construct KV key form data with the combination of the start vertex identification data and the relationship identification data of the graph data to be written as the key, and the end vertex identification data of the graph data to be written as the value, and write it into the target relationship bucket;

Write the attribute data of the graph data to be written into the relationship attribute set.

In some embodiments, also include:

Receive a graph data query request, the graph data query request includes at least query type, target query vertex identification data and target relationship identification data;

Locate the target relationship bucket based on the target query vertex identification data and the target relationship identification data;

Obtain the target query result based on the query type, the target query vertex identification data and the target relationship bucket.

In the second aspect, the present application also provides a graph data storage system, including:

The decomposition processing module is used to decompose and process the graph data to be stored to obtain vertex identification data, attribute data and relationship identification data, and the vertex identification data includes start vertex identification data and end vertex identification data;

The first creation module is used to create relationship bucket metadata that includes several relationship buckets, where each relationship bucket stores, in KV key form, the graph data to be stored that has the same relationship identification data;

The second creation module is used to create a vertex attribute set based on the vertex identification data and attribute data and create a relationship attribute set based on the attribute data and the relationship identification data;

The distributed storage module is used for distributed storage of vertex attribute sets, relational attribute sets, attribute inverted indexes and relational bucket metadata to persistent memory media.

In a third aspect, the present application also provides a computer device, including:

one or more processors; and

a memory associated with the one or more processors, the memory storing program instructions which, when read and executed by the one or more processors, perform any method according to the first aspect.

The advantages of the present application are as follows. It provides a graph data storage method, system and computer equipment. The method includes: decomposing and processing the graph data to be stored to obtain vertex identification data, attribute data and relationship identification data, where the vertex identification data includes start vertex identification data and end vertex identification data; creating relationship bucket metadata including several relationship buckets, where each relationship bucket stores, in KV key form, the graph data to be stored that has the same relationship identification data; creating a vertex attribute set based on the vertex identification data and the attribute data, and creating a relationship attribute set based on the attribute data and the relationship identification data; and storing the vertex attribute set, the relationship attribute set, the attribute inverted index and the relationship bucket metadata in a persistent memory medium in a distributed manner. The DAX feature of persistent memory is used to store data of a specific structure directly; only one copy of key-value data is stored for unique data; and the relationship network is stored in buckets, which facilitates horizontal expansion. The data in a bucket can be iterated recursively, and key data is obtained directly by identifier during a query, so the time required for a complex relationship network query is independent of the size of the data set, and graph data storage and queries are processed faster.

Description of drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that need to be used in the description of the embodiments will be briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application. For those skilled in the art, other drawings can also be obtained based on these drawings without creative effort.

FIG. 1 is an architecture diagram for graph data storage in the present application;

FIG. 2 is a flow chart of a method for storing graph data provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of the data structure stored in the relation bucket in the graph data storage method provided by the embodiment of the present application;

Fig. 4 is a first flow chart of querying target graph data based on the current storage content in the graph data storage method provided by the embodiment of the present application;

Fig. 5 is a second flow chart of querying target graph data based on the current storage content in the graph data storage method provided by the embodiment of the present application;

FIG. 6 is a third flow chart of querying target graph data based on the current storage content in the graph data storage method provided by the embodiment of the present application;

FIG. 7 is an architecture diagram of a graph data storage system provided by an embodiment of the present application;

FIG. 8 is an architecture diagram of a computer device provided by an embodiment of the present application.

Detailed description

In order to make the purpose, technical solutions and advantages of the application clearer, the technical solutions in the embodiments of the application are described clearly and completely below with reference to the drawings in the embodiments of the application. Obviously, the described embodiments are only some, not all, of the embodiments of this application. Based on the embodiments in this application, all other embodiments obtained by persons of ordinary skill in the art without creative effort fall within the scope of protection of this application.

As described in the background, graph data (Graph Data) refers to data that is stored and queried using the data structure of a "graph", not to stored image data; its data model is expressed mainly through nodes and relationships (edges). Current graph data storage usually adopts native graph storage, such as an adjacency matrix or adjacency list, or non-native graph storage, such as JanusGraph. These approaches require a large amount of I/O (computer data read and write operations) during queries and scans, and their efficiency is low.

In order to solve the above problems, this application proposes a graph data storage method, system and computer equipment built on a new graph storage data structure based on relationship buckets. The stored graph data mainly comprises a vertex attribute set, a relationship attribute set and several relationship buckets. Each relationship bucket stores graph relationship data in a key-value structure in which the start vertex identifier and the relationship identifier are combined into the key and the end vertex identifier is the value. Only one copy of key-value data is stored for unique data, and the relationship network is stored in buckets, which facilitates horizontal expansion. The time required for a query is independent of the size of the data set, so graph data storage and queries are processed more quickly. The graph data storage method, system and computer equipment proposed in this application are introduced below in conjunction with some embodiments.

Embodiment 1: Some embodiments of this application introduce the architecture for storing graph data in this application.

Referring to FIG. 1, the architecture includes a persistent memory (PMem) storage medium, in which a vertex attribute set, a relationship attribute set, an attribute inverted index, and relationship bucket metadata are stored in a distributed manner. The relationship bucket metadata covers several relationship buckets. One relationship bucket stores the graph relationship data of one relationship. When the amount of graph relationship data for a certain relationship is particularly large, the bucket can be split into multiple sub-buckets for storage, and the relationship bucket metadata records how the sub-buckets are split and stored. In addition, each relationship bucket has its own unique ID, and the relationship bucket metadata also stores the content associated with that ID. The content associated with the ID of a relationship bucket includes at least: the relationship type, the amount of data corresponding to the relationship type, the IDs of all relationship buckets associated with the relationship type, and the locations of all relationship buckets associated with the relationship type.

Embodiment 2: Based on the architecture for storing graph data described in Embodiment 1 above, some embodiments of the present application provide an introduction to the process of the graph data storage method in this application, as shown in FIG. 2 .

Specifically, referring to FIG. 2, the process of storing graph data in the graph data storage method provided by some embodiments of the present application includes:

S210. Decompose and process the graph data to be stored to obtain vertex identification data, attribute data and relationship identification data, where the vertex identification data includes start vertex identification data and end vertex identification data.

Specifically, the graph data to be stored usually includes two relationship subjects and the relationship between them. For example, given the piece of graph data "Xiao Wang is Xiao Ming's father", the graph data can be decomposed into start vertex identification data (Xiao Wang), end vertex identification data (Xiao Ming), attribute data (the attributes of Xiao Wang and Xiao Ming), and relationship identification data (the identification data of the father-son relationship). Relationship identification data and relationship content are generated in advance and correspond to each other. For example, relationship identification data 0001 represents father and son, 0002 represents friends, and 0003 represents mother and child.
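The decomposition in S210 can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the application: the names `RELATION_IDS`, `Decomposed` and `decompose` are hypothetical, the relation codes are the example codes given above, and the attribute values are placeholders.

```python
from dataclasses import dataclass, field

# Preset mapping from relationship content to relationship identification data,
# using the example codes from the text.
RELATION_IDS = {"father-son": "0001", "friend": "0002", "mother-child": "0003"}


@dataclass
class Decomposed:
    start_vertex_id: str
    end_vertex_id: str
    relation_id: str
    attributes: dict = field(default_factory=dict)


def decompose(start, relation, end, attributes=None):
    """Split one piece of graph data into the four parts named in S210."""
    return Decomposed(
        start_vertex_id=start,
        end_vertex_id=end,
        relation_id=RELATION_IDS[relation],
        attributes=dict(attributes or {}),
    )


# "Xiao Wang is Xiao Ming's father"; the attribute contents are placeholders.
record = decompose(
    "Xiao Wang", "father-son", "Xiao Ming",
    {"Xiao Wang": {"name": "Xiao Wang"}, "Xiao Ming": {"name": "Xiao Ming"}},
)
```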

In some embodiments, before S220, the method includes:

SA1. Construct data in the form of a KV key with the combination of the start vertex identification data and the relationship identification data as the key and the end vertex identification data as the value.

Specifically, a key-value (K-V) structure is constructed. The key (K) is composed of the start vertex identification data and the relationship identification data. The vertex identifier is the unique identification data pointing to a vertex; it can be regarded as the equivalent of an address pointer, through which the vertex attributes can be obtained directly, and conversely the unique identifier can be obtained quickly from the vertex set. The relationship identifier is a fixed-length serial number belonging to the vertex that represents a certain relationship. The value (V) is the end vertex identifier.

In some embodiments, the combination of fixed-length start vertex identifier and fixed-length relationship identifier in the graph data is used as a key, and the end vertex identifier is used as a value to construct KV key form data. Divide data in the form of KV keys by relationships, which facilitates subsequent distributed storage of graph relationship data of the same relationship.
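A fixed-length key of this kind can be sketched with Python's standard `struct` module. The field widths are assumptions made for illustration (an 8-byte vertex identifier and a 2-byte relationship identifier), and the helper names `make_key`, `split_key` and `make_value` are hypothetical.

```python
import struct

# Assumed widths (chosen for illustration): 8-byte vertex ID, 2-byte relationship identifier.
# Big-endian packing keeps all keys that share a start vertex adjacent when sorted.
KEY_FORMAT = ">QH"
VALUE_FORMAT = ">Q"


def make_key(start_vertex_id: int, relation_id: int) -> bytes:
    """Key = fixed-length start vertex identifier + fixed-length relationship identifier."""
    return struct.pack(KEY_FORMAT, start_vertex_id, relation_id)


def split_key(key: bytes):
    """Recover (start vertex identifier, relationship identifier) from a key."""
    return struct.unpack(KEY_FORMAT, key)


def make_value(end_vertex_id: int) -> bytes:
    """Value = end vertex identifier."""
    return struct.pack(VALUE_FORMAT, end_vertex_id)


key = make_key(1001, 0x0001)
value = make_value(1002)
```

Because both parts have a fixed length, a key can be split back into its components without any delimiter.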

S220. Create relational bucket metadata including several relational buckets, and each relational bucket stores graph data to be stored with the same relational identification data in the form of a KV key.

In some embodiments, this step includes:

S221. Create relationship buckets, each storing the graph data to be stored that has the same relationship identification data.

Specifically, the KV key form data is divided by relationship type, and graph data of the same relationship type is stored in one relationship bucket. That is, all graph data in a relationship bucket has the same relationship identification data, and the kind of relationship that this identification data represents is determined by the type of the bucket. The design of the relationship bucket (RB, relations bucket) is as follows:

RB = [K -> V] = [(P1 + Rs) -> P2], where RB represents the relationship bucket, K the key, V the value, P1 the start vertex identifier, Rs the relationship identifier, and P2 the end vertex identifier. Refer to FIG. 3 for the data structure in the relationship bucket.
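The RB = [(P1 + Rs) -> P2] layout can be sketched as a small in-memory class. This is a simplified illustration, not the persistent-memory structure of FIG. 3: the class name `RelationBucket` is hypothetical, and it assumes that one start vertex may point to several end vertices, so each key maps to a list.

```python
from collections import defaultdict


class RelationBucket:
    """RB = [(P1 + Rs) -> P2] for a single relationship type."""

    def __init__(self, bucket_id, relation_id):
        self.bucket_id = bucket_id
        self.relation_id = relation_id
        # Assumption: several end vertices may share one (P1, Rs) key.
        self._entries = defaultdict(list)

    def put(self, p1, p2):
        """Store the edge p1 -> p2 under the key (P1, Rs)."""
        self._entries[(p1, self.relation_id)].append(p2)

    def get(self, p1):
        """Return every end vertex stored for start vertex p1."""
        return list(self._entries.get((p1, self.relation_id), []))


friends = RelationBucket(bucket_id="B02", relation_id="0002")
friends.put("A", "B")
friends.put("A", "C")
```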

S222. Establish the ID of each relation bucket and associate the ID of the relation bucket with the relation identification data of the graph data to be stored stored in the relation bucket.

S223. Create relational bucket metadata including all relational buckets and store ID-associated content of the relational buckets in the relational bucket metadata.

Specifically, the ID association content of the relationship bucket at least includes: the relationship type, the amount of data corresponding to the relationship type, the IDs of all the relationship buckets associated with the relationship type, and the locations of all the relationship buckets associated with the relationship type. That is, the content represented by the ID of the relationship bucket is recorded by the metadata of the relationship bucket, including but not limited to what type of relationship the relationship attribute set key represents, how many items of this type, which relationship buckets this type is distributed in, and the location of the relationship bucket, etc. .
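The ID-associated content listed above can be sketched as a metadata record per relationship type. All class, field and location names here (`RelationTypeRecord`, `BucketMetadata`, "node-1" and so on) are hypothetical and only mirror the four items named in the text.

```python
from dataclasses import dataclass, field


@dataclass
class RelationTypeRecord:
    relation_type: str                                    # e.g. "friend"
    entry_count: int = 0                                  # amount of data of this type
    bucket_ids: list = field(default_factory=list)        # all buckets holding this type
    bucket_locations: dict = field(default_factory=dict)  # bucket ID -> location


class BucketMetadata:
    def __init__(self):
        self._by_relation_id = {}

    def register_bucket(self, relation_id, relation_type, bucket_id, location):
        """Associate a bucket ID with the relationship identification data it stores."""
        rec = self._by_relation_id.setdefault(
            relation_id, RelationTypeRecord(relation_type))
        rec.bucket_ids.append(bucket_id)
        rec.bucket_locations[bucket_id] = location

    def add_entries(self, relation_id, n=1):
        self._by_relation_id[relation_id].entry_count += n

    def lookup(self, relation_id):
        return self._by_relation_id[relation_id]


meta = BucketMetadata()
meta.register_bucket("0002", "friend", "B02", "node-1")
meta.register_bucket("0002", "friend", "B03", "node-2")  # a second bucket after a split
meta.add_entries("0002", 3)
```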

S230. Create a vertex attribute set based on the vertex identification data and the attribute data, and create a relationship attribute set based on the attribute data and the relationship identification data.

Specifically, the vertex identification data uniquely points to the vertex. Exemplarily, the vertex identification data is a vertex ID. The vertex ID and attribute are stored in the vertex attribute set, and the attribute and relation ID are stored in the relation attribute set.

In some embodiments, this step includes:

S231. Based on the vertex identification data and attribute data, create a vertex attribute set using a key-value model to store the vertex identification data and attribute data. The vertex attribute set uses the vertex identification data as a key and the attribute data as a value.

Specifically, the vertex attribute set uses the vertex ID as the key, whose content is the persistent memory ID (PMEMoid) of the data object. The persistent memory ID is usually 64 bits in size and can easily be converted into a memory pointer. The attribute value in the vertex attribute set consists of a separate tag value and opaque binary data of variable size.

S232. Based on the attribute data and the relationship identification data, create a relationship attribute set that uses a key-value model to store the attribute data and the relationship identification data. The relationship attribute set uses the relationship bucket ID and the relationship identification data as keys, and uses the attribute data as values.

Specifically, the key in the relationship attribute set is composed of a 16-bit relationship bucket ID plus a 16-bit relationship identifier, 32 bits in total. The highest bit of the relationship bucket ID, 0 or 1, indicates whether the relationship is unidirectional or bidirectional respectively (for example, a parent-child relationship or a friend relationship), and the remaining 7 digits form a serial number. The data structure of the attribute value in the relationship attribute set is similar to that of the vertex attribute set.
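The 32-bit key layout can be sketched with plain integer arithmetic. The sketch assumes the bucket ID occupies the upper 16 bits and the relationship identifier the lower 16 bits (the text does not fix the byte order), and the function names are hypothetical.

```python
BIDIRECTIONAL_FLAG = 0x8000  # highest bit of the 16-bit relationship bucket ID


def make_attr_key(bucket_id: int, relation_id: int) -> int:
    """Combine a 16-bit bucket ID and a 16-bit relationship identifier into 32 bits."""
    if not (0 <= bucket_id <= 0xFFFF and 0 <= relation_id <= 0xFFFF):
        raise ValueError("both fields must fit in 16 bits")
    return (bucket_id << 16) | relation_id


def split_attr_key(key: int):
    """Return (bucket ID, relationship identifier)."""
    return key >> 16, key & 0xFFFF


def is_bidirectional(key: int) -> bool:
    # Per the text: 0 marks a unidirectional relationship, 1 a bidirectional one.
    return bool((key >> 16) & BIDIRECTIONAL_FLAG)


key = make_attr_key(0x8A01, 0x0002)
```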

In some embodiments, after S230, the method further includes:

SA2. Create an attribute inverted index based on the vertex attribute set and the relationship attribute set.

Specifically, the inverted index of the attribute is generated according to the data stored in the vertex attribute set and the relationship attribute set, which is convenient for locating and searching the data in the vertex attribute set and the relationship attribute set, and for searching the attribute value in the vertex attribute set.
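An attribute inverted index of this kind can be sketched as a mapping from attribute values back to the keys that carry them. The sketch assumes attributes are available as name/value pairs; the function name and the sample attributes are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(attribute_set):
    """Map (attribute name, value) -> set of keys that carry that attribute value."""
    index = defaultdict(set)
    for key, attrs in attribute_set.items():
        for name, value in attrs.items():
            index[(name, value)].add(key)
    return index


# Hypothetical vertex attribute set: vertex ID -> attributes.
vertex_attrs = {
    "v1": {"name": "X", "city": "Suzhou"},
    "v2": {"name": "Y", "city": "Suzhou"},
}
index = build_inverted_index(vertex_attrs)
```

The same function can be applied to the relationship attribute set, so that both sets are searchable by attribute value.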

S240. Distribute and store the vertex attribute set, the relationship attribute set, the attribute inverted index, and the metadata of the relationship bucket into a persistent memory medium.

Persistent memory (PMem, Persistent Memory) is a new generation of storage medium. It is byte-addressable, offers high read and write performance, and has advantages that traditional memory (DRAM, Dynamic Random Access Memory) does not have.

In some embodiments, the method also includes:

SA3. Write the graph data to be written according to the current storage content, including:

SA31. Obtain graph data to be written.

SA32. Decompose and process the graph data to be written to obtain the start vertex identification data, the relationship identification data, the attribute data, and the end vertex identification data of the graph data to be written.

Exemplarily, the graph data to be written is the graph relationship data "Xiao Wang is Xiao Ming's father". Decomposing it yields the start vertex identification data (Xiao Wang's ID), the relationship identification data (the father relationship identifier 00000001), the attribute data (the year of becoming a father: 2020), and the end vertex identification data (Xiao Ming's ID).

For example, for "Xiao Wang is Xiao Ming's father", first find the relationship bucket for "Dad", then write key-value pair data of the form (Xiao Wang ID+"0000 0001")->(Xiao Ming ID), and record the attributes of this relationship in Rc; for example, the attribute of "Dad-0000 0001" is the year of becoming a father: 2020.

SA33. Search for a corresponding target relationship bucket based on the relationship identification data.

The ID of the relationship bucket is associated with the relationship identification data of the graph data stored in the relationship bucket, so the corresponding target relationship bucket can be quickly searched through the relationship identification data.

SA34. Construct KV key form data with the combination of the start vertex identification data and the relationship identification data of the graph data to be written as the key, and the end vertex identification data of the graph data to be written as the value, and write it into the target relationship bucket.

Specifically, following the above example, construct KV key-value pair data of the form (Xiao Wang ID+"0000 0001")->(Xiao Ming ID) and write it into the target relationship bucket whose ID is associated with the relationship identification data "0000 0001".

SA35. Write the attribute data of the graph data to be written into the relationship attribute set.

Specifically, following the above example, record the attribute of the graph data to be written in the relationship attribute set. The attribute of "Dad-0000 0001" is the year of becoming a father: 2020. The relationship attribute set uses the relationship bucket ID and the relationship identification data as the key and the attribute data as the value. If the ID of the target relationship bucket is A01, the content written to the relationship attribute set is (0000 0001+A01)->(2020).
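The write path SA33 to SA35 can be sketched in memory as follows. This is an illustration under assumptions, not the persistent-memory implementation: the class `GraphStore` and its fields are hypothetical, plain dictionaries stand in for the buckets and the relationship attribute set, and the key order (relationship identifier, bucket ID) follows the worked example above.

```python
from collections import defaultdict


class GraphStore:
    def __init__(self):
        # relationship identification data -> relationship bucket ID
        self.bucket_id_by_relation = {}
        # bucket ID -> {(start vertex ID, relationship identifier): [end vertex ID, ...]}
        self.buckets = defaultdict(lambda: defaultdict(list))
        # (relationship identifier, bucket ID) -> attribute data
        self.relation_attrs = {}

    def create_bucket(self, relation_id, bucket_id):
        self.bucket_id_by_relation[relation_id] = bucket_id

    def write(self, start_id, relation_id, end_id, attributes):
        bucket_id = self.bucket_id_by_relation[relation_id]              # SA33: find bucket
        self.buckets[bucket_id][(start_id, relation_id)].append(end_id)  # SA34: write KV data
        self.relation_attrs[(relation_id, bucket_id)] = attributes       # SA35: write attributes
        return bucket_id


store = GraphStore()
store.create_bucket("0000 0001", "A01")
target = store.write("XiaoWangID", "0000 0001", "XiaoMingID", {"year": 2020})
```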

In some embodiments, the method also includes:

SA4. Query the target graph data based on the current storage content, including:

SA41. Receive a graph data query request. The graph data query request includes at least query type, target query vertex identification data, and target relationship identification data.

Exemplarily, in some embodiments, as shown in FIG. 4, the query type is a second-degree query, the target query vertex identification data is the vertex ID of A, and the target relationship identification data is the identification data of the "friend" relationship. That is, the graph data query request asks whether there is a person named X among the friends of A's friends.

SA42. Locate the target relationship bucket based on the target query vertex identification data and the target relationship identification data.

Following the above example, specifically, query and locate the target relationship bucket storing the friend relationship graph data according to the relationship identification data corresponding to the friend relationship.

SA43. Obtain a target query result based on the query type, the target query vertex identification data, and the target relationship bucket.

Continuing the above example, this step specifically includes: in the target relationship bucket, querying the series B of all vertex IDs pointed to by relationships whose key begins with the vertex ID of A; then obtaining from the same bucket the series C of all vertex IDs pointed to by series B; and finally obtaining from the vertex attribute set the name attribute of each vertex in series C.
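The second-degree query can be sketched as two lookups in the same bucket followed by an attribute lookup. In this simplified illustration the bucket is a dictionary keyed by start vertex ID only (the relationship is fixed by the bucket), and the function name, vertex IDs and names are hypothetical.

```python
def has_second_degree_friend_named(bucket, vertex_attrs, start_id, target_name):
    """Return True if a friend of a friend of start_id has the given name."""
    series_b = bucket.get(start_id, [])                          # vertices A points to
    series_c = {c for b in series_b for c in bucket.get(b, [])}  # vertices series B points to
    series_c.discard(start_id)  # assumption: A is not its own second-degree friend
    return any(vertex_attrs[c]["name"] == target_name for c in series_c)


friend_bucket = {"A": ["B1", "B2"], "B1": ["A", "C1"], "B2": ["C2"]}
vertex_attrs = {
    "A": {"name": "Ann"}, "B1": {"name": "Bo"}, "B2": {"name": "Bea"},
    "C1": {"name": "X"}, "C2": {"name": "Cy"},
}
found = has_second_degree_friend_named(friend_bucket, vertex_attrs, "A", "X")
```

Each hop is a direct key lookup, so the work depends on the number of friends reached rather than on the total size of the bucket.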

In some embodiments, as shown in FIG. 5, the query type is a statistical query, the target query vertex identification data is the vertex ID of A, and the target relationship identification data is the identification data of the "friend" relationship; that is, the query asks how many friends A has. The target relationship bucket storing the friend relationship graph data is located according to the identification data of the "friend" relationship, all entries whose key begins with the vertex ID of A are queried in the target relationship bucket, and the number of entries in this series is counted. If there is more than one relationship bucket, the counts from each bucket are added together.
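The statistical query reduces to counting entries per bucket and summing the counts. A minimal sketch, assuming each bucket is a dictionary keyed by start vertex ID; the function name and sample data are hypothetical.

```python
def count_relations(buckets, start_id):
    """Sum the entries for start_id over every bucket of one relationship type."""
    return sum(len(bucket.get(start_id, [])) for bucket in buckets)


friend_buckets = [
    {"A": ["B1", "B2"], "B1": ["A"]},  # first "friend" bucket
    {"A": ["B3"]},                     # second bucket for the same relationship
]
total = count_relations(friend_buckets, "A")
```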

In some embodiments, as shown in FIG. 6, the query type is a range query over phone calls within one month, the target query vertex identification data is the vertex ID of A, and the target relationship identification data is the identification data of the "call" relationship; that is, the query asks whom A has called most in the last month. The relationship bucket ID is obtained from the relationship bucket metadata; from the entries in the relationship attribute set whose key begins with that bucket ID, the series T of relationship entries whose time falls within the last month is filtered out; the series H of all vertices stored under the keys formed by the vertex ID of A plus series T is then taken from the "call" relationship bucket; and the vertex Y that appears most often in series H is the result.
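The range query can be sketched as a filter over the relationship attribute set followed by lookups in the "call" bucket. The sketch rests on several assumptions that the text does not state: each call has its own relationship identifier within the bucket, the attribute data carries a `time` field, and "one month" is taken as 30 days. All names and sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def most_called(call_bucket, call_attrs, bucket_id, caller_id, now,
                window=timedelta(days=30)):
    """Return the vertex that caller_id called most often within the window, or None."""
    # Series T: relationship identifiers under this bucket ID whose time is in range.
    series_t = [
        rel for (bid, rel), attrs in call_attrs.items()
        if bid == bucket_id and timedelta(0) <= now - attrs["time"] <= window
    ]
    # Series H: end vertices stored under the keys (caller ID + each entry of T).
    series_h = [
        call_bucket[(caller_id, rel)] for rel in series_t
        if (caller_id, rel) in call_bucket
    ]
    if not series_h:
        return None
    return Counter(series_h).most_common(1)[0][0]


now = datetime(2022, 2, 25)
call_attrs = {
    ("C01", 1): {"time": datetime(2022, 2, 20)},
    ("C01", 2): {"time": datetime(2022, 2, 21)},
    ("C01", 3): {"time": datetime(2022, 2, 22)},
    ("C01", 4): {"time": datetime(2021, 12, 1)},  # outside the window
}
call_bucket = {("A", 1): "Y", ("A", 2): "Z", ("A", 3): "Y", ("A", 4): "Z"}
result = most_called(call_bucket, call_attrs, "C01", "A", now)
```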

The graph data storage method provided by some embodiments of the present application includes: decomposing and processing the graph data to be stored to obtain vertex identification data, attribute data and relationship identification data, where the vertex identification data includes start vertex identification data and end vertex identification data; creating relationship bucket metadata including several relationship buckets, where each relationship bucket stores, in KV key form, the graph data to be stored that has the same relationship identification data; creating a vertex attribute set based on the vertex identification data and the attribute data, and creating a relationship attribute set based on the attribute data and the relationship identification data; and storing the vertex attribute set, the relationship attribute set, the attribute inverted index and the relationship bucket metadata in a persistent memory medium in a distributed manner. The DAX (Direct Access) feature of persistent memory is used to store data of a specific structure directly; only one copy of key-value data is stored for unique data; and the relationship network is stored in buckets, which facilitates horizontal expansion. The data in a bucket can be iterated recursively, and key data is obtained directly by identifier during a query, so the time required for a complex relationship network query is independent of the size of the data set, enabling faster processing of graph data storage and queries.

Embodiment 3: Corresponding to Embodiment 1 and Embodiment 2 above, the graph data storage system provided by this application is described below with reference to FIG. 7. The system may be implemented in hardware, in software, or in a combination of software and hardware, which is not limited in this application.

In one example, as shown in FIG. 7, the present application provides a graph data storage system, and the system includes:

The decomposition processing module 710 is used to decompose and process the graph data to be stored to obtain vertex identification data, attribute data and relationship identification data, and the vertex identification data includes start vertex identification data and end vertex identification data;

The first creating module 720 is used to create relationship bucket metadata that includes several relationship buckets, where each relationship bucket stores, in KV (key-value) form, the graph data to be stored that has the same relationship identification data;

The second creation module 730 is used to create a vertex attribute set based on the vertex identification data and attribute data and create a relationship attribute set based on the attribute data and the relationship identification data;

The distributed storage module 740 is configured to store the vertex attribute set, the relationship attribute set, the attribute inverted index and the relationship bucket metadata in a persistent memory medium in a distributed manner.

In some embodiments, the system also includes:

The data construction module 750 is configured to construct KV-form data, with the combination of the start vertex identification data and the relationship identification data as the key and the end vertex identification data as the value, before the first creating module 720 creates the relationship bucket metadata including several relationship buckets, each of which stores, in KV form, the graph data to be stored that has the same relationship identification data.
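As a rough illustration of this key construction, the sketch below joins the start vertex identifier and the relationship identifier into one key and uses the end vertex identifier as the value. The function name `make_edge_kv`, the string encoding and the `|` separator are assumptions made for the example; the application does not specify an encoding here.

```python
SEP = "|"  # hypothetical separator; the application does not name one


def make_edge_kv(start_vertex_id: str, relation_id: str, end_vertex_id: str):
    """Build one KV pair: key = start vertex ID + relationship ID, value = end vertex ID."""
    if SEP in start_vertex_id or SEP in relation_id:
        # Reject identifiers that would make the combined key ambiguous.
        raise ValueError("identifiers must not contain the separator")
    key = f"{start_vertex_id}{SEP}{relation_id}"
    return key, end_vertex_id


key, value = make_edge_kv("A", "call", "B")
```

Because the key is fully determined by the two identifiers, a reader that knows both can fetch the end vertex with a single lookup.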

In some embodiments, the first creation module 720 includes:

The first creating unit 721 is configured to create relationship buckets, each of which stores the graph data to be stored that has the same relationship identification data;

The association establishing unit 722 is configured to establish an ID for each relationship bucket and to associate the ID of the relationship bucket with the relationship identification data of the graph data to be stored in that relationship bucket;

The second creating unit 723 is configured to create relationship bucket metadata that includes all relationship buckets and to store the content associated with each relationship bucket ID in the relationship bucket metadata.
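The three units above can be pictured with the following sketch, in which a metadata object creates one bucket per relationship identifier, assigns each bucket an ID, and records the association. The class names `RelationBucket` and `BucketMetadata`, the `bucket-N` ID scheme and the in-memory dictionaries are illustrative assumptions, not structures taken from the application.

```python
from dataclasses import dataclass, field


@dataclass
class RelationBucket:
    bucket_id: str
    relation_id: str                              # relationship identification data held by this bucket
    entries: dict = field(default_factory=dict)   # KV-form graph data


class BucketMetadata:
    """Tracks every bucket and which relationship identifier each bucket ID is tied to."""

    def __init__(self):
        self.buckets = {}       # bucket ID -> RelationBucket
        self.by_relation = {}   # relationship ID -> bucket ID

    def get_or_create(self, relation_id: str) -> RelationBucket:
        bucket_id = self.by_relation.get(relation_id)
        if bucket_id is None:
            # Hypothetical ID scheme: sequential numbering.
            bucket_id = f"bucket-{len(self.buckets)}"
            self.buckets[bucket_id] = RelationBucket(bucket_id, relation_id)
            self.by_relation[relation_id] = bucket_id
        return self.buckets[bucket_id]


meta = BucketMetadata()
call_bucket = meta.get_or_create("call")
```

Calling `get_or_create` again with the same relationship identifier returns the existing bucket, so all graph data sharing that identifier lands in one place.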

In some embodiments, the second creation module 730 includes:

The third creating unit 731 is configured to create a vertex attribute set that stores vertex identification data and attribute data in a key-value model based on the vertex identification data and attribute data, and the vertex attribute set uses the vertex identification data as a key and the attribute data as a value;

The fourth creating unit 732 is configured to create, based on the attribute data and the relationship identification data, a relationship attribute set that stores the attribute data and the relationship identification data in a key-value model, where the relationship attribute set uses the relationship bucket ID and the relationship identification data as the key and the attribute data as the value.
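A minimal sketch of the two attribute sets follows, assuming plain dictionaries in place of the persistent key-value store. The helper names `put_vertex` and `put_relation` and the sample attributes are hypothetical.

```python
# Vertex attribute set: vertex ID -> attribute data
vertex_attrs = {}
# Relationship attribute set: (relationship bucket ID, relationship ID) -> attribute data
relation_attrs = {}


def put_vertex(vertex_id, attrs):
    """Store or merge attribute data under the vertex ID."""
    vertex_attrs.setdefault(vertex_id, {}).update(attrs)


def put_relation(bucket_id, relation_id, attrs):
    """Store or merge attribute data under the (bucket ID, relationship ID) key."""
    relation_attrs.setdefault((bucket_id, relation_id), {}).update(attrs)


put_vertex("A", {"name": "Alice"})
put_vertex("A", {"city": "Suzhou"})  # merged into the single entry for A
put_relation("bucket-0", "call", {"duration_s": 42})
```

Keying vertex attributes by vertex ID means each vertex has exactly one entry, which matches the idea that unique data is stored only once.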

In some embodiments, the system also includes:

The third creating module 760 is configured to create an attribute inverted index based on the vertex attribute set and the relationship attribute set after the second creating module 730 creates the vertex attribute set based on the vertex identification data and the attribute data and creates the relationship attribute set based on the attribute data and the relationship identification data.
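One way to picture such an inverted index is a mapping from each (attribute name, attribute value) pair back to the keys that hold it. The sketch below assumes dictionary-based attribute sets with hashable attribute values; the function name `build_inverted_index` and the `"vertex"` / `"relation"` tags are illustrative only.

```python
from collections import defaultdict


def build_inverted_index(vertex_attrs, relation_attrs):
    """Map (attribute name, value) -> set of ('vertex' | 'relation', key) references."""
    index = defaultdict(set)
    for vertex_id, attrs in vertex_attrs.items():
        for name, value in attrs.items():
            index[(name, value)].add(("vertex", vertex_id))
    for relation_key, attrs in relation_attrs.items():
        for name, value in attrs.items():
            index[(name, value)].add(("relation", relation_key))
    return dict(index)


index = build_inverted_index(
    {"A": {"city": "Suzhou"}, "B": {"city": "Suzhou"}},
    {("bucket-0", "call"): {"city": "Suzhou"}},
)
```

Looking up `("city", "Suzhou")` then returns every vertex and relationship entry carrying that attribute value without scanning either attribute set.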

In some embodiments, the system also includes:

Write module 770, including:

The first acquiring unit 771 is configured to acquire the graph data to be written;

Decomposition unit 772, configured to decompose the graph data to be written to obtain the start vertex identification data of the graph data to be written, the relationship identification data of the graph data to be written, the attribute data of the graph data to be written and the end vertex identification data;

A search unit 773, configured to find the corresponding target relationship bucket based on the relationship identification data;

The first writing unit 774 is configured to construct KV-form data, using the start vertex identification data of the graph data to be written together with the relationship identifier corresponding to the relationship identification data of the graph data to be written as the key, and the end vertex identification data of the graph data to be written as the value, and to write the KV-form data into the target relationship bucket;

The second writing unit 775 is configured to write the attribute data of the graph data to be written into the relationship attribute set.
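The write path carried out by units 771 to 775 might look like the sketch below. It is an illustration under stated assumptions: the class `GraphStore`, the record layout with `start`, `relation`, `end` and `attrs` fields, and the `bucket-N` ID scheme are all hypothetical, and dictionaries stand in for the persistent memory medium.

```python
class GraphStore:
    def __init__(self):
        self.bucket_meta = {}     # relationship ID -> bucket ID
        self.buckets = {}         # bucket ID -> {(start vertex ID, relationship ID): end vertex ID}
        self.relation_attrs = {}  # (bucket ID, relationship ID) -> attribute data

    def write(self, record):
        # Decompose the graph data to be written.
        start, rel, end = record["start"], record["relation"], record["end"]
        attrs = record.get("attrs", {})
        # Find the target bucket by relationship ID (created on first use in this sketch).
        bucket_id = self.bucket_meta.setdefault(rel, f"bucket-{len(self.bucket_meta)}")
        bucket = self.buckets.setdefault(bucket_id, {})
        # Key = start vertex ID + relationship ID, value = end vertex ID.
        # Note: in this literal reading, a later write with the same key overwrites the value.
        bucket[(start, rel)] = end
        # Attribute data goes to the relationship attribute set.
        self.relation_attrs[(bucket_id, rel)] = attrs
        return bucket_id


store = GraphStore()
bucket_id = store.write(
    {"start": "A", "relation": "call", "end": "B", "attrs": {"duration_s": 42}}
)
```

Because the bucket is located from the relationship identifier alone, a write touches only one bucket, which is what makes spreading buckets across nodes straightforward.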

In some embodiments, the system also includes:

Query module 780, including:

The receiving unit 781 is configured to receive a graph data query request, and the graph data query request includes at least query type, target query vertex identification data and target relationship identification data;

A positioning unit 782, configured to locate the target relationship bucket based on the target query vertex identification data and the target relationship identification data;

The second acquiring unit 783 is configured to acquire the target query result based on the query type, the target relationship vertex ID and the target relationship bucket.
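A simplified view of the receive, locate and acquire steps is sketched below. The function name `query`, the query type names `"end_vertex"` and `"exists"`, and the dictionary layout of `store` are assumptions introduced for the example; the application leaves the set of query types open.

```python
def query(store, query_type, vertex_id, relation_id):
    """Locate the target bucket from metadata, then answer by direct key lookup."""
    bucket_id = store["bucket_meta"].get(relation_id)
    if bucket_id is None:
        return None  # no bucket holds this relationship
    bucket = store["buckets"][bucket_id]
    key = (vertex_id, relation_id)
    if query_type == "end_vertex":
        return bucket.get(key)
    if query_type == "exists":
        return key in bucket
    raise ValueError(f"unsupported query type: {query_type}")


store = {
    "bucket_meta": {"call": "bucket-0"},
    "buckets": {"bucket-0": {("A", "call"): "B"}},
}
answer = query(store, "end_vertex", "A", "call")
```

Both supported query types resolve with one metadata lookup and one bucket lookup, so the cost does not grow with the number of stored edges.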

Embodiment 4: Corresponding to Embodiment 1 to Embodiment 3 above, the computer device provided by this application is described below with reference to FIG. 8. In one example, as shown in FIG. 8, the present application provides a computer device, and the computer device includes:

one or more processors; and

A memory associated with one or more processors. The memory is used to store program instructions. When the program instructions are read and executed by one or more processors, the following operations are performed:

FIG. 8 shows an exemplary architecture of the computer device, which may specifically include a processor 810, a video display adapter 811, a disk drive 812, an input/output interface 813, a network interface 814, and a memory 820. The processor 810, the video display adapter 811, the disk drive 812, the input/output interface 813, the network interface 814, and the memory 820 can be connected by a communication bus 830.

The processor 810 can be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided by this application.

The memory 820 can be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, and so on. The memory 820 may store an operating system 821 for controlling the operation of the computer device 800, and a basic input output system (BIOS) 822 for controlling low-level operations of the computer device 800. In addition, a web browser 823, a data storage management 824, an icon font processing system 825, and the like can also be stored. The above-mentioned icon font processing system 825 may be an application program in the embodiment of the present application that specifically implements the operations of the aforementioned steps. In short, when the technical solution provided by this application is implemented through software or firmware, the relevant program code is stored in the memory 820 and is called and executed by the processor 810.

The input/output interface 813 is used to connect an input/output module to realize information input and output. The input/output module can be configured in the device as a component (not shown in the figure), or can be externally connected to the device to provide corresponding functions. The input device may include a keyboard, mouse, touch screen, microphone, various sensors, etc., and the output device may include a display, a speaker, a vibrator, an indicator light, and the like.

The network interface 814 is used to connect a communication module (not shown in the figure), so as to realize communication between the device and other devices. The communication module can communicate through wired means (such as USB or a network cable) or through wireless means (such as a mobile network, Wi-Fi, or Bluetooth).

The bus 830 includes a path for carrying information between the various components of the device (e.g., the processor 810, the video display adapter 811, the disk drive 812, the input/output interface 813, the network interface 814, and the memory 820).

In addition, the computer device 800 can also obtain information about specific claim conditions from the virtual resource object claim condition information database 841 for condition judgment, and so on.

It should be noted that although the above computer device 800 only shows a processor 810, a video display adapter 811, a disk drive 812, an input/output interface 813, a network interface 814, a memory 820, a bus 830, etc., in some embodiments the computer device may also include other components necessary for proper operation. In addition, those skilled in the art can understand that the above-mentioned device may include only the components necessary to realize the solution of the present application, and does not necessarily include all the components shown in the figure.

From the above description of the implementation manners, those skilled in the art can clearly understand that the present application can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the essence of the technical solution of this application, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a cloud server, a network device, or the like) to execute the methods of the various embodiments, or of some parts of the embodiments, of the present application.

Each embodiment in this specification is described in a progressive manner; the same and similar parts of the embodiments can be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiment is basically similar to the method embodiment, its description is relatively brief, and for the related parts, reference can be made to the description of the method embodiment. The above-described system embodiments are only illustrative. The modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules can be selected according to actual needs to achieve the goals of the solutions in some embodiments of the present application, which those skilled in the art can understand and implement without creative effort.

In addition, it should be noted that the terms "first" and "second" in this application are only used for descriptive purposes, and cannot be understood as indicating or implying relative importance or implicitly indicating the quantity of the indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of these features. The above-mentioned embodiments are only intended to illustrate the technical concept and characteristics of the present application, so that those familiar with this technology can understand the content of the present application and implement it accordingly; they do not limit the protection scope of the present application. All modifications made according to the spirit of the main technical solutions of this application shall fall within the scope of protection of this application.

Claims

A graph data storage method, characterized in that the method comprises:

Decomposing and processing the graph data to be stored to obtain vertex identification data, attribute data and relationship identification data, the vertex identification data includes start vertex identification data and end vertex identification data;

Creating relationship bucket metadata including several relationship buckets, wherein each of the relationship buckets stores, in KV (key-value) form, the graph data to be stored that has the same relationship identification data;

creating a vertex attribute set based on the vertex identification data and attribute data and creating a relationship attribute set based on the attribute data and relationship identification data;

The vertex attribute set, the relationship attribute set, the attribute inverted index, and the relationship bucket metadata are stored in a persistent memory medium in a distributed manner.
The graph data storage method according to claim 1, wherein before said creating relationship bucket metadata including several relationship buckets, each of the relationship buckets storing, in KV form, the graph data to be stored that has the same relationship identification data, the method further comprises:

KV key form data is constructed with the combination of the start vertex identification data and the relationship identification data as a key and the end vertex identification data as a value.
The graph data storage method according to claim 2, wherein said constructing KV key form data with the combination of the start vertex identification data and the relationship identification data as a key and the end vertex identification data as a value comprises:

KV key form data is constructed with the combination of the fixed-length start vertex identifier and the fixed-length relationship identifier as a key and the end vertex identifier data as a value.
The graph data storage method according to claim 2, wherein said creating relationship bucket metadata including several relationship buckets, each of the relationship buckets storing, in KV form, the graph data to be stored that has the same relationship identification data, comprises:

Creating relationship buckets, each of which stores the graph data to be stored that has the same relationship identification data;

Establishing the ID of each of the relationship buckets and associating the ID of the relationship bucket with the relationship identification data of the graph data to be stored stored in the relationship bucket;

Creating relationship bucket metadata including all the relationship buckets and storing the content associated with the relationship bucket IDs in the relationship bucket metadata.
The graph data storage method according to claim 4, wherein said creating a vertex attribute set based on said vertex identification data and attribute data and creating a relationship attribute set based on said attribute data and relationship identification data comprises:

Create a vertex attribute set that stores the vertex identification data and attribute data in a key-value model based on the vertex identification data and attribute data, the vertex attribute set uses the vertex identification data as a key and the attribute data as a value;

Creating, based on the attribute data and relationship identification data, a relationship attribute set that stores the attribute data and relationship identification data in a key-value model, wherein the relationship attribute set uses the relationship bucket ID and the relationship identification data as a key and the attribute data as a value.
The graph data storage method according to claim 4, wherein the content associated with the relationship bucket ID includes at least: a relationship type, a data volume corresponding to the relationship type, the IDs of all relationship buckets associated with the relationship type, and the locations of all relationship buckets associated with the relationship type.
The graph data storage method according to claim 1, wherein the graph data to be stored includes two relationship subjects and a relationship between the two relationship subjects.
The graph data storage method according to claim 4, wherein the content indicated by the ID of the relationship bucket is recorded in the relationship bucket metadata.
The graph data storage method according to claim 1, wherein the vertex attribute set stores vertex IDs and attributes, and the relationship attribute set stores attributes and relationship IDs.
The graph data storage method according to claim 1, characterized in that the correspondence between the relationship identification data and the relationship content is preset and is one-to-one.
The graph data storage method according to claim 1, wherein after creating a vertex attribute set based on the vertex identification data and attribute data and creating a relationship attribute set based on the attribute data and relationship identification data, the method further comprises:

An attribute inverted index is created based on the vertex attribute set and the relationship attribute set.
The graph data storage method according to claim 11, wherein said creating an attribute inverted index based on said vertex attribute set and said relationship attribute set comprises:

An attribute inverted index is generated according to the data stored in the vertex attribute set and the relationship attribute set.
The graph data storage method according to claim 12, wherein the attribute inverted index is used to locate and look up the data in the vertex attribute set and the relationship attribute set, and the attribute values in the vertex attribute set.
The graph data storage method according to claim 1, further comprising:

Write the graph data to be written according to the current storage content.
The graph data storage method according to claim 14, wherein said writing the graph data to be written according to the current storage content comprises:

Obtain the graph data to be written;

Decomposing the graph data to be written to obtain the start vertex identification data of the graph data to be written, the relationship identification data of the graph data to be written, the attribute data of the graph data to be written, and the end vertex identification data of the graph data to be written;

Finding a corresponding target relationship bucket based on the relationship identification data;

Constructing KV key form data, using the start vertex identification data of the graph data to be written together with the relationship identifier corresponding to the relationship identification data of the graph data to be written as a key, and the end vertex identification data of the graph data to be written as a value, and writing the KV key form data into the target relationship bucket;

Writing the attribute data of the graph data to be written into the relationship attribute set.
The graph data storage method according to claim 15, wherein said writing the attribute data of the graph data to be written into the relationship attribute set comprises:

Writing into the relationship attribute set with the ID of the target relationship bucket and the relationship identification data of the graph data to be written as a key and the attribute data of the graph data to be written as a value.
The graph data storage method according to claim 1, further comprising:

Query the target graph data based on the current storage content.
The graph data storage method according to claim 17, wherein the querying target graph data based on the current storage content comprises:

Receiving a graph data query request, the graph data query request at least including query type, target query vertex identification data and target relationship identification data;

Locating a target relationship bucket based on the target query vertex identification data and the target relationship identification data;

Obtain a target query result based on the query type, the target relationship vertex ID, and the target relationship bucket.
A graph data storage system, characterized in that the system includes:

The decomposition processing module is used to decompose and process the graph data to be stored to obtain vertex identification data, attribute data and relationship identification data, and the vertex identification data includes start vertex identification data and end vertex identification data;

The first creating module is used to create relationship bucket metadata including several relationship buckets, wherein each of the relationship buckets stores, in KV form, the graph data to be stored that has the same relationship identification data;

The second creating module is used to create a vertex attribute set based on the vertex identification data and attribute data and create a relationship attribute set based on the attribute data and relationship identification data;

A distributed storage module, configured to store the vertex attribute set, the relationship attribute set, the attribute inverted index, and the relationship bucket metadata in a persistent memory medium in a distributed manner.
A computer device, characterized in that it includes:

one or more processors; and

A memory associated with the one or more processors, wherein the memory is used to store program instructions, and when the program instructions are read and executed by the one or more processors, the method according to any one of claims 1 to 18 is performed.