CN112287182B - Graph data storage and processing method and device and computer storage medium - Google Patents


Info

Publication number
CN112287182B
Authority
CN
China
Prior art keywords
vertex
target
data
edge
partition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011192437.1A
Other languages
Chinese (zh)
Other versions
CN112287182A (en
Inventor
余利峰
陈哲嘉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application disclose a graph data storage and processing method, a graph data storage and processing device, and a computer storage medium, belonging to the technical field of graph databases. The method comprises: obtaining a target vertex sequence number of a target vertex stored in a target partition, wherein the target partition stores a plurality of vertices and each vertex corresponds to a vertex sequence number; obtaining, for each of one or more edges associated with the target vertex, the partition number of the partition where the other vertex of the edge is located, the vertex sequence number of that other vertex, and the direction of the edge, to obtain target edge data corresponding to the target vertex sequence number; and writing the target edge data into an edge data file in the target partition, wherein the edge data file stores the edge data corresponding to each vertex sequence number. The embodiments identify a vertex by the partition number of the partition where it is located together with its vertex sequence number in that partition, which reduces the memory required for graph computation and thereby improves the computing performance of the computing node.

Description

Graph data storage and processing method and device and computer storage medium
Technical Field
The embodiment of the application relates to the technical field of graph databases, in particular to a graph data storage and processing method, a graph data storage and processing device and a computer storage medium.
Background
A graph database is a non-relational database that represents entities and the relationships between them as a graph of vertices and edges. A vertex in the graph database indicates an entity, and an edge between two vertices indicates a relationship between the two entities. An entity may represent an object in real life, and a relationship may represent the association between different objects. The data stored in the graph database is referred to as graph data, which includes vertex data and edge data: vertex data indicates information about an entity, and edge data indicates information about a relationship. Because the way graph data is stored affects subsequent graph data processing, how to store graph data in a graph database is an active research topic.
In the related art, vertex data and edge data are typically stored in the graph database as key-value pairs. The key of a vertex indicates the ID (identifier) of the vertex, and the value of the vertex indicates the attributes of the vertex. The key of an edge indicates the IDs of the two vertices at the ends of the edge, and the value of the edge indicates the attributes of the edge.
With the graph data organized in this way, a computing node that subsequently performs graph computation must load all the graph data in the graph database into its memory. In this scenario, if vertex IDs occupy a large amount of storage space, then both the key-value pair of each vertex and the key-value pairs of its edges are large. The graph data loaded into memory therefore occupies a large amount of memory, which degrades the computing performance of the computing node.
Disclosure of Invention
The embodiment of the application provides a graph data storage and processing method, a graph data storage and processing device and a computer storage medium, which can improve the computing performance of a computing node. The technical scheme is as follows:
In one aspect, a graph data storage method is provided, applied to a target node in a storage system storing graph data, where the target node is any node in the storage system, and the method includes:
obtaining a target vertex sequence number of a target vertex stored in a target partition, wherein the target partition is any storage partition on the target node, a plurality of vertices are stored in the target partition, the vertices respectively correspond to vertex sequence numbers, the vertex sequence number of any vertex indicates the ordering of that vertex among the plurality of vertices, and the target vertex is any vertex in the plurality of vertices;
obtaining, for each of one or more edges associated with the target vertex, the partition number of the partition where the other vertex of the edge is located, the vertex sequence number of the other vertex, and the direction of the edge, to obtain target edge data corresponding to the target vertex sequence number;
and writing the target edge data into an edge data file in the target partition, wherein the edge data file is used for storing edge data corresponding to each vertex sequence number.
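The three steps above can be sketched as follows. This is only a minimal illustration: the fixed 9-byte entry layout, the direction codes, and the helper names `encode_edge_data` and `write_target_edge_data` are assumptions made for the sketch and are not specified by the application.

```python
import io
import struct

# Hypothetical entry layout: partition number (uint32), vertex sequence number
# (uint32), direction (uint8). The field widths are assumptions of this sketch.
ENTRY = struct.Struct("<IIB")
OUT, IN = 0, 1  # assumed direction codes


def encode_edge_data(edges):
    """Pack a vertex's edges, given as (partition_no, other_seq, direction) tuples."""
    return b"".join(ENTRY.pack(p, s, d) for p, s, d in edges)


def write_target_edge_data(edge_file, edges):
    """Append the target vertex's edge data to the edge data file; return its length."""
    data = encode_edge_data(edges)
    edge_file.write(data)
    return len(data)


# An in-memory buffer stands in for the edge data file of the target partition.
edge_file = io.BytesIO()
written = write_target_edge_data(edge_file, [(2, 5, OUT), (1, 3, IN)])
print(written)
```

The returned length is the value that an edge index file, where one is used, would record for this vertex.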
Optionally, the edge data in the edge data file are sequentially stored according to the corresponding vertex sequence numbers;
the target partition is also stored with an edge index file, the edge index file stores the length of each edge data in the edge data file, and the lengths of each edge data stored in the edge index file are sequentially stored according to the corresponding vertex sequence numbers.
Optionally, the data type of the length of each edge data stored in the edge index file is an integer data type of 32 bytes or an integer data type of 64 bytes.
Optionally, the method further comprises:
writing the correspondence between the target vertex sequence number and the vertex identifier of the target vertex into a mapping table in the target partition, wherein the mapping table stores the vertex sequence numbers respectively corresponding to the vertex identifiers of the plurality of vertices.
Optionally, the method further comprises:
and writing the vertex data of the target vertex into a vertex data file in the target partition, wherein the vertex data file stores vertex data corresponding to each vertex sequence number.
Optionally, vertex data in the vertex data file are sequentially stored according to corresponding vertex sequence numbers;
and the target partition is also stored with a vertex index file, the vertex index file is stored with the lengths of the vertex data in the vertex data file, and the lengths of the vertex data stored in the vertex index file are sequentially stored according to the vertex sequence numbers corresponding to the vertex data.
Optionally, the vertex data of the target vertex includes a property of the target vertex.
In another aspect, a graph data processing method is provided, applied to a computing node in a storage system storing graph data, and the method includes:
determining edge data to be processed in an edge data file stored in a target partition of a target node;
performing graph data processing according to the edge data to be processed;
the target node is any node in the storage system, the target partition is any storage partition on the target node, the edge data file stores edge data corresponding to vertex sequence numbers of a plurality of vertices stored in the target partition, the edge data comprises partition numbers of partitions where another vertex of one or more edges related to the corresponding vertex is located, vertex sequence numbers of the other vertex, and directions of the edges, and the vertex sequence numbers of any vertex in the target partition indicate the ordering of the any vertex in the plurality of vertices.
Optionally, the edge data in the edge data file are sequentially stored according to the corresponding vertex sequence numbers, the edge index file is also stored in the target partition, the length of each edge data in the edge data file is stored in the edge index file, and the length of each edge data stored in the edge index file is sequentially stored according to the corresponding vertex sequence numbers;
the determining the edge data to be processed in the edge data file stored in the target partition of the target node includes:
and determining the edge data to be processed according to the length of each edge data stored in the edge index file.
Optionally, in the case that the graph data processing is iterative processing, the edge data to be processed is current edge data to be processed which is sequentially iterated according to the lengths of the edge data.
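For the iterative case, a sequential pass can consume the edge data file by advancing an offset by each stored length in turn. The sketch below assumes a 9-byte entry layout, sequence numbers starting at 1, and a direction code of 0 for outgoing edges; none of these details is fixed by the application.

```python
import struct

# Assumed entry layout: partition number, vertex sequence number, direction.
ENTRY = struct.Struct("<IIB")


def iter_edge_data(edge_bytes, lengths):
    """Yield (vertex_seq, entries) in sequence-number order.

    The offset advances by each stored length, so no per-vertex seek is needed.
    """
    offset = 0
    for seq, length in enumerate(lengths, start=1):  # numbering from 1 is assumed
        chunk = edge_bytes[offset:offset + length]
        yield seq, list(ENTRY.iter_unpack(chunk))
        offset += length


# Toy partition: vertex 1 has two edges, vertex 2 has none, vertex 3 has one.
edge_bytes = ENTRY.pack(1, 2, 0) + ENTRY.pack(2, 9, 1) + ENTRY.pack(1, 1, 1)
lengths = [18, 0, 9]

# Example iteration: count outgoing edges (assumed direction code 0) per vertex.
out_degree = {
    seq: sum(1 for _, _, direction in entries if direction == 0)
    for seq, entries in iter_edge_data(edge_bytes, lengths)
}
print(out_degree)
```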
Optionally, in the case that the graph data processing is query processing, the edge data to be processed is edge data positioned according to the length of each edge data and the vertex sequence number of the vertex to be queried currently.
Optionally, the computing node further stores a position of edge data corresponding to a reference vertex sequence number in the edge data file, where the reference vertex sequence number is one or more vertex sequence numbers among vertex sequence numbers of the multiple vertices stored in the target partition;
the current edge data to be processed is the edge data located according to the length of each edge data, the position in the edge data file of the edge data corresponding to the reference vertex sequence number, and the vertex sequence number of the vertex currently to be queried.
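One way to read the use of reference vertex sequence numbers is as cached checkpoints: the lookup starts from the nearest reference at or before the queried vertex and adds only the lengths in between. The following sketch illustrates that reading; the function name `locate_with_references`, the dictionary of cached positions, and the 1-based numbering are assumptions.

```python
import bisect


def locate_with_references(lengths, ref_positions, query_seq):
    """Return (start, length) of the edge data for query_seq.

    lengths[i] is the edge-data length of sequence number i + 1.
    ref_positions maps a reference sequence number to its known start offset.
    """
    ref_seqs = sorted(ref_positions)
    i = bisect.bisect_right(ref_seqs, query_seq) - 1
    if i >= 0:
        base_seq, start = ref_seqs[i], ref_positions[ref_seqs[i]]
    else:
        base_seq, start = 1, 0  # no earlier reference: count from the file start
    # Add only the lengths between the reference vertex and the queried vertex.
    start += sum(lengths[base_seq - 1:query_seq - 1])
    return start, lengths[query_seq - 1]


lengths = [18, 0, 9, 27, 9, 18]
ref_positions = {1: 0, 4: 27}  # hypothetical positions cached on the computing node
print(locate_with_references(lengths, ref_positions, 5))
```

With denser reference positions, fewer lengths need to be summed per query, at the cost of more memory on the computing node.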
Optionally, a vertex data file is stored in the target partition, and vertex data corresponding to vertex sequence numbers of the multiple vertices is stored in the vertex data file;
the method further comprises the steps of:
and obtaining vertex data corresponding to the target vertex sequence number from the vertex data file according to the target vertex sequence number to be queried.
Optionally, vertex data of each of the plurality of vertices is sequentially stored according to corresponding vertex sequence numbers, a vertex index file is also stored in the target partition, lengths of all vertex data in the vertex data file are stored in the vertex index file, and lengths of all vertex data stored in the vertex index file are sequentially stored according to corresponding vertex sequence numbers;
the obtaining vertex data corresponding to the target vertex sequence number from the vertex data file according to the target vertex sequence number includes:
determining the position of the vertex data corresponding to the target vertex sequence number in the vertex data file according to the target vertex sequence number and the vertex index file;
and acquiring the vertex data corresponding to the target vertex sequence number from the vertex data file according to the position of the vertex data corresponding to the target vertex sequence number in the vertex data file.
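The two steps above, locating the position and then reading the vertex data, can be sketched as follows. The JSON encoding of vertex properties, the 1-based numbering, and the name `read_vertex_data` are assumptions for illustration; the application does not specify a serialization format.

```python
import io
import json
from itertools import accumulate


def read_vertex_data(vertex_file, vertex_lengths, target_seq):
    """Seek to the vertex data for target_seq (1-based, assumed) and decode it."""
    # Start offsets are the running totals of all preceding lengths.
    starts = [0] + list(accumulate(vertex_lengths))
    vertex_file.seek(starts[target_seq - 1])
    raw = vertex_file.read(vertex_lengths[target_seq - 1])
    return json.loads(raw.decode("utf-8"))


# Assumption: each vertex's data is its properties serialized as JSON.
records = [
    json.dumps(props).encode("utf-8")
    for props in ({"name": "B"}, {"name": "C", "age": 30}, {"name": "A"})
]
vertex_file = io.BytesIO(b"".join(records))  # stands in for the vertex data file
vertex_lengths = [len(r) for r in records]   # stands in for the vertex index file

print(read_vertex_data(vertex_file, vertex_lengths, 2))
```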
Optionally, a mapping table is stored in the target partition, and a plurality of vertex sequence numbers corresponding to the plurality of vertex identifiers respectively are stored in the mapping table;
after the graph data stored on the target partition is processed according to the edge index file, the method further includes:
determining vertex sequence numbers of vertexes obtained after the graph data processing;
and obtaining, according to the mapping table, the vertex identifiers corresponding to the determined vertex sequence numbers, and taking those vertex identifiers as the graph data processing result.
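Translating results back into vertex identifiers amounts to a reverse lookup in the mapping table. A minimal sketch, assuming the table is held as a dictionary from vertex ID to sequence number and using made-up IDs:

```python
# Mapping table for one partition: vertex ID -> vertex sequence number (toy values).
id_to_seq = {"person:bob": 1, "person:carol": 2, "person:alice": 3}

# Invert once so results expressed as sequence numbers can be translated back.
seq_to_id = {seq: vertex_id for vertex_id, seq in id_to_seq.items()}


def to_vertex_ids(result_seqs):
    """Translate sequence numbers produced by graph processing into vertex IDs."""
    return [seq_to_id[seq] for seq in result_seqs]


result = to_vertex_ids([3, 1])
print(result)
```

Because the computation itself works only on sequence numbers, the comparatively large IDs are touched only for the vertices that appear in the final result.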
In another aspect, a graph data storage device is provided, and the graph data storage device is applied to a target node in a storage system storing graph data, where the target node is any node in the storage system, and the device includes:
an acquisition module, configured to obtain a target vertex sequence number of a target vertex stored in a target partition, wherein the target partition is any storage partition on the target node, a plurality of vertices are stored in the target partition, the vertices respectively correspond to vertex sequence numbers, the vertex sequence number of any vertex indicates the ordering of that vertex among the plurality of vertices, and the target vertex is any vertex in the plurality of vertices;
the acquisition module being further configured to obtain, for each of one or more edges associated with the target vertex, the partition number of the partition where the other vertex of the edge is located, the vertex sequence number of the other vertex, and the direction of the edge, to obtain target edge data corresponding to the target vertex sequence number;
and the writing module is used for writing the target edge data into an edge data file in the target partition, and the edge data file is used for storing the edge data corresponding to each vertex sequence number.
Optionally, the edge data in the edge data file are sequentially stored according to the corresponding vertex sequence numbers;
the target partition is also stored with an edge index file, the edge index file stores the length of each edge data in the edge data file, and the lengths of each edge data stored in the edge index file are sequentially stored according to the corresponding vertex sequence numbers.
Optionally, the data type of the length of each edge data stored in the edge index file is an integer data type of 32 bytes or an integer data type of 64 bytes.
Optionally, the writing module is further configured to write the correspondence between the target vertex sequence number and the vertex identifier of the target vertex into a mapping table in the target partition, wherein the mapping table stores the vertex sequence numbers respectively corresponding to the vertex identifiers of the plurality of vertices.
Optionally, the writing module is further configured to write the vertex data of the target vertex into a vertex data file in the target partition, wherein the vertex data file stores the vertex data corresponding to each vertex sequence number.
Optionally, vertex data in the vertex data file are sequentially stored according to corresponding vertex sequence numbers;
and the target partition is also stored with a vertex index file, the vertex index file is stored with the lengths of the vertex data in the vertex data file, and the lengths of the vertex data stored in the vertex index file are sequentially stored according to the vertex sequence numbers corresponding to the vertex data.
Optionally, the vertex data of the target vertex includes a property of the target vertex.
In another aspect, there is provided a graph data processing apparatus for use with a computing node in a storage system in which graph data is stored, the apparatus comprising:
the determining module is used for determining the edge data to be processed in the edge data file stored in the target partition of the target node;
the processing module is configured to perform graph data processing according to the edge data to be processed;
the target node is any node in the storage system, the target partition is any storage partition on the target node, the edge data file stores edge data corresponding to vertex sequence numbers of a plurality of vertices stored in the target partition, the edge data comprises partition numbers of partitions where another vertex of one or more edges related to the corresponding vertex is located, vertex sequence numbers of the other vertex, and directions of the edges, and the vertex sequence numbers of any vertex in the target partition indicate the ordering of the any vertex in the plurality of vertices.
Optionally, the edge data in the edge data file are sequentially stored according to the corresponding vertex sequence numbers, the edge index file is also stored in the target partition, the length of each edge data in the edge data file is stored in the edge index file, and the length of each edge data stored in the edge index file is sequentially stored according to the corresponding vertex sequence numbers;
the determining module is used for:
and determining the edge data to be processed according to the length of each edge data stored in the edge index file.
Optionally, in the case that the graph data processing is iterative processing, the edge data to be processed is current edge data to be processed which is sequentially iterated according to the lengths of the edge data.
Optionally, in the case that the graph data processing is query processing, the edge data to be processed is edge data located according to the length of each edge data and the vertex sequence number of the vertex to be queried currently.
Optionally, the computing node further stores a position of edge data corresponding to a reference vertex sequence number in the edge data file, where the reference vertex sequence number is one or more vertex sequence numbers among vertex sequence numbers of the multiple vertices stored in the target partition;
the current edge data to be processed is the edge data located according to the length of each edge data, the position in the edge data file of the edge data corresponding to the reference vertex sequence number, and the vertex sequence number of the vertex currently to be queried.
Optionally, a vertex data file is stored in the target partition, and vertex data corresponding to vertex sequence numbers of the multiple vertices is stored in the vertex data file;
the apparatus further comprises:
and the acquisition module is used for acquiring vertex data corresponding to the target vertex sequence number from the vertex data file according to the target vertex sequence number to be queried.
Optionally, vertex data of each of the plurality of vertices is sequentially stored according to corresponding vertex sequence numbers, a vertex index file is also stored in the target partition, lengths of all vertex data in the vertex data file are stored in the vertex index file, and lengths of all vertex data stored in the vertex index file are sequentially stored according to corresponding vertex sequence numbers;
the acquisition module is configured to determine the position of the vertex data corresponding to the target vertex sequence number in the vertex data file according to the target vertex sequence number and the vertex index file;
and to acquire the vertex data corresponding to the target vertex sequence number from the vertex data file according to that position.
Optionally, a mapping table is stored in the target partition, and a plurality of vertex sequence numbers corresponding to the plurality of vertex identifiers respectively are stored in the mapping table;
the determining module is further configured to determine the vertex sequence numbers of the vertices obtained after the graph data processing;
the apparatus further comprises an acquisition module, configured to obtain, according to the mapping table, the vertex identifiers corresponding to the determined vertex sequence numbers, and to take those vertex identifiers as the graph data processing result.
In another aspect, a server is provided, the server comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the steps of any of the graph data storage methods or graph data processing methods described above.
In another aspect, a computer readable storage medium having stored thereon instructions which when executed by a processor perform the steps of any of the graph data storage methods or graph data processing methods described above is provided.
In another aspect, a computer program product is provided comprising instructions which, when run on a computer, cause the computer to perform any of the graph data storage methods or graph data processing methods described above.
The technical scheme provided by the embodiment of the application has the beneficial effects that at least:
(1) Information about a vertex and about the edges associated with it can be stored in terms of partition numbers and vertex sequence numbers. In subsequent graph computation, the amount of data processed when working with partition numbers and vertex sequence numbers is smaller than when working with vertex IDs, which improves the efficiency of the graph computation.
(2) In addition, the edge data of an edge associated with a vertex stores only the partition number and vertex sequence number of the vertex at the other end, plus the direction of the edge. Compared with the related art, which must store the IDs of the vertices at both ends of the edge, the storage method provided by the embodiments also saves storage space in the storage system.
(3) In subsequent graph computation, the partition number and vertex sequence number of a vertex can replace the vertex ID in the vertex information loaded into memory. This reduces the amount of data loaded into memory, avoids graph computations failing because of memory overflow, and improves the computing performance of the computing node.
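To give a rough sense of scale for the savings described above, the sketch below compares the bytes needed per edge when both endpoint IDs are stored against a compact numeric entry. The 36-character UUID-style IDs and the 9-byte entry layout are illustrative assumptions only; actual ID sizes and field widths depend on the deployment.

```python
import struct
import uuid

# Assumption: vertex IDs are 36-character UUID strings (real IDs vary).
src_id = str(uuid.uuid4()).encode("utf-8")
dst_id = str(uuid.uuid4()).encode("utf-8")

# Related-art style edge key: the IDs of both end vertices.
id_based_key = src_id + dst_id

# Compact style: partition number (uint32), vertex sequence number (uint32),
# direction flag (uint8). This layout is an assumption for illustration.
EDGE_ENTRY = struct.Struct("<IIB")
compact_entry = EDGE_ENTRY.pack(7, 12345, 1)

print(len(id_based_key), "bytes per edge with ID-based keys")
print(len(compact_entry), "bytes per edge with partition number and sequence number")
```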
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a storage system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a format of an edge data file according to an embodiment of the present application;
FIG. 3 is a flowchart of a method for storing graph data according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of identifier replacement provided by an embodiment of the present application;
FIG. 5 is a flowchart of a method for processing graph data according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a graph data storage device according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a graph data processing apparatus according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the following detailed description of the embodiments of the present application will be given with reference to the accompanying drawings.
Before explaining the embodiment of the present application in detail, an application scenario of the embodiment of the present application is explained.
In order to increase the data processing speed of the graph database, the current graph database is usually a graph database based on a distributed storage system. The distributed storage system comprises a plurality of nodes, and each node is used for storing part of graph data in the graph database. The distributed storage system may also be referred to as a clustered storage environment.
In addition, in order to facilitate management of graph data stored on the nodes, the storage space of the nodes is divided into different partitions, and then the graph data is placed in the different partitions, respectively. The storage space on any node may include a storage space on a storage medium such as a local disk of the node, or may include a storage space on another storage device hung under the node, or may include a virtual storage space configured on the node.
In addition, many scenarios currently require graph computation on a graph database. Graph computation typically involves determining the distribution of vertices, edges, and so on in the graph database. For example, counting the number of vertices in the graph database whose in-degree is greater than 2 is a graph computation. The in-degree of a vertex is the number of edges pointing to the vertex, and the out-degree of a vertex is the number of edges pointing from the vertex to other vertices.
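The counting example above can be sketched on a toy edge list. The vertex names and edges are made up for illustration, and the graph is held as plain (source, destination) pairs rather than in the storage layout described later.

```python
from collections import Counter

# Directed edges as (source, destination) pairs; the vertex names are illustrative.
edges = [("A", "D"), ("B", "D"), ("C", "D"), ("A", "B"), ("D", "B"), ("C", "A")]

# Number of edges pointing to each vertex, and number pointing away from it.
in_degree = Counter(dst for _, dst in edges)
out_degree = Counter(src for src, _ in edges)

# Vertices with more than 2 incoming edges.
high_in = [vertex for vertex, degree in in_degree.items() if degree > 2]
print(len(high_in), high_in)
```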
The method provided by the embodiment of the application can be applied to the scene of graph calculation on the graph database. In addition, the method provided by the embodiment of the application can be applied to the distributed storage system, and can be applied to a centralized storage system. Wherein the centralized storage system comprises only one node. The embodiments of the present application are not limited to a particular class of storage systems.
To address the large memory footprint that the related-art storage structure causes when graph computation is performed on a large-scale graph database, the embodiments of the application provide a storage system in which each vertex is identified by the partition number of the partition where the vertex is located together with the vertex sequence number of the vertex within that partition. Because the partition number and the vertex sequence number occupy few bytes, loading them into memory as the identifier, instead of the vertex ID, reduces the memory occupied by vertex identifiers and thereby improves the computing performance of the computing node.
It should be noted that the storage system assigns partition numbers uniformly to the partitions on all nodes according to its partitioning policy, so a partition number determines which partition on which node is meant. Therefore, in the embodiments of the present application, the partition number of the partition where a vertex is located together with the vertex sequence number of the vertex in that partition can serve as the unique identifier of the vertex.
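As one possible in-memory representation of such an identifier, the partition number and the vertex sequence number can be combined into a single integer. The 32-bit width of each field and the function names below are assumptions of this sketch; the application does not fix field widths.

```python
def pack_vertex_key(partition_no, vertex_seq):
    """Combine partition number and in-partition sequence number into one integer.

    Assumption: each value fits in 32 bits, giving a 64-bit key.
    """
    if not (0 <= partition_no < 2**32 and 0 <= vertex_seq < 2**32):
        raise ValueError("field out of range for the assumed 32-bit layout")
    return (partition_no << 32) | vertex_seq


def unpack_vertex_key(key):
    """Split a packed key back into (partition_no, vertex_seq)."""
    return key >> 32, key & 0xFFFFFFFF


key = pack_vertex_key(3, 42)
print(key, unpack_vertex_key(key))
```

Two vertices with the same sequence number in different partitions receive different keys, which is what makes the pair usable as a unique identifier.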
For convenience of explanation, the data organization method in the storage system provided by the embodiment of the application is explained first.
FIG. 1 is a schematic diagram of a storage system according to an embodiment of the present application. As shown in FIG. 1, the storage system includes a node i, which is any one of the one or more nodes included in the storage system. The storage system may include one node or a plurality of nodes; only one node is illustrated in FIG. 1. The data organization on the other nodes of the storage system 100 in FIG. 1 follows the data organization on node i.
As shown in FIG. 1, the storage space on node i includes n partitions, labeled P1, P2, P3, …, Pn (FIG. 1 illustrates P1, …, Pn). The organization of data within a partition is described below using the first partition P1 as an example. The data organization in the other partitions is essentially the same as in P1, except that different vertices are stored, and is not described again.
As shown in FIG. 1, the partition P1 includes an edge data file, an edge index file, a mapping table indicating the mapping relationship between vertex IDs and vertex sequence numbers, a vertex data file, and a vertex index file.
It is assumed that the partition P1 stores information about a plurality of vertices, and each of the plurality of vertices corresponds to a vertex sequence number. The vertex sequence numbers may be sequentially set in accordance with the time sequence of importing the related information of the vertices into the partition. For example, 3 vertices, namely, vertex a, vertex B and vertex C, are stored in the partition P1, and the sequence of importing the relevant information of the 3 vertices into the partition P1 is vertex B, vertex C and vertex a in sequence. Thus, the vertex sequence number allocated for vertex B is No. 1, the vertex sequence number allocated for vertex C is No. 2, and the vertex sequence number allocated for vertex a is No. 3.
In order to facilitate the subsequent need to query the vertex ID, after the vertex sequence number of the vertex is determined, the correspondence between the vertex ID and the vertex sequence number of the vertex may also be written into a mapping table for indicating the mapping relationship between the vertex ID and the vertex sequence number. The mapping table may be stored in the partition in a dictionary table manner, or may be stored in the partition in other storage manners, which is not limited in the embodiment of the present application.
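The assignment of vertex sequence numbers by import order, together with the mapping table, can be sketched as follows. This is a minimal illustration only: the function name `assign_sequence_numbers` and the use of an in-memory dictionary as the mapping table are assumptions, not part of the embodiment.

```python
def assign_sequence_numbers(import_order, first=1):
    """Assign vertex sequence numbers in import order.

    Returns the mapping table (vertex ID -> vertex sequence number).
    Numbering from 1 follows the example with vertices B, C and A.
    """
    mapping_table = {}
    for vertex_id in import_order:
        if vertex_id not in mapping_table:
            # The next number is determined by how many vertices are already stored.
            mapping_table[vertex_id] = first + len(mapping_table)
    return mapping_table


# Vertices are imported in the order B, C, A.
mapping_table = assign_sequence_numbers(["B", "C", "A"])
print(mapping_table)  # {'B': 1, 'C': 2, 'A': 3}
```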
The edge data file is used to store edge data for each of the plurality of vertices. The edge data of each vertex includes, for each of the one or more edges associated with the vertex, the vertex sequence number of the vertex at the other end of the edge, the partition number of the partition where that other vertex is located, and the direction of the edge. In addition, the edge data of each vertex in the edge data file corresponds to the vertex sequence number of the vertex. Thus, the edge data of a vertex can be queried based on the vertex sequence number of the vertex.
Corresponding vertex sequence numbers can be added at the initial storage positions of the edge data of all the vertexes in the edge data file, so that the edge data of any vertex can be directly obtained based on the edge data file.
Alternatively, the edge data of each vertex in the edge data file may be stored sequentially according to the corresponding vertex sequence number, and an edge index file may then be configured in the partition P1. The length of each edge data is stored in the edge index file, and these lengths are also stored sequentially according to the corresponding vertex sequence numbers. Therefore, the storage position of the edge data of a given vertex in the edge data file can be quickly located according to the edge index file. For example, when the edge data of the vertex with vertex sequence number 3 needs to be determined, the starting position of that edge data in the edge data file can be located according to the first length and the second length stored in the edge index file.
It should be noted that, which location in the edge index file corresponds to storing the first length and which location corresponds to storing the second length may be predetermined. In one possible implementation, if the space size in which the respective edge data lengths are stored is the same, the respective edge data lengths stored in the edge index file may be identified according to the same space size. In another possible implementation manner, if the size of the space for storing the respective edge data lengths is different, the size of the space occupied by each edge data length may be determined according to a compression algorithm used when the edge data lengths are stored in advance, so that the respective edge data lengths stored in the edge index file may be identified.
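For the case in which every edge data length occupies the same amount of space, reading the edge index file might look like the sketch below. The little-endian unsigned 32-bit encoding and the helper names are assumptions made for illustration; vertex sequence numbers are taken to start at 1, as in the example with vertex sequence number 3.

```python
import struct

LEN_FORMAT = "<I"                       # assumption: little-endian unsigned 32-bit length
LEN_SIZE = struct.calcsize(LEN_FORMAT)  # 4 bytes per stored length


def build_edge_index(lengths):
    """Serialize edge data lengths in vertex sequence number order."""
    return b"".join(struct.pack(LEN_FORMAT, n) for n in lengths)


def read_length(index_bytes, slot):
    """Read the length stored in the given 0-based slot of the index."""
    return struct.unpack_from(LEN_FORMAT, index_bytes, slot * LEN_SIZE)[0]


def start_position(index_bytes, seq_no, first_seq_no=1):
    """Start offset of a vertex's edge data: the sum of all preceding lengths."""
    slots_before = seq_no - first_seq_no
    return sum(read_length(index_bytes, s) for s in range(slots_before))


edge_index = build_edge_index([12, 20, 8])
# Vertex sequence number 3 starts after the first and second lengths: 12 + 20.
print(start_position(edge_index, 3))  # 32
```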
Furthermore, in current graph databases the number of edges associated with a single vertex does not normally exceed tens of millions, so the length of each edge data can be stored in the edge index file using a 32-bit integer data type (INT type). In this case, the size M of the space occupied by the edge index file in the memory can be estimated as follows:
$M = S \times \mathrm{len}(\mathrm{INT})$, where S represents the number of vertices stored in the partition and len(INT) represents the size of the space occupied by each edge data length, here 32 bits (4 bytes).
When the length of each edge data is stored using the INT type in the edge index file, the 32-bit INT type allows at most 2^32 values, so at most 2^32 edge data lengths can be stored in the edge index file. Thus, at most 2^32 vertices are allowed in the current partition.
If the estimated number of vertices stored in a single partition may be greater than 2^32, the INT type may be replaced with a 64-bit integer type (LONG type) to increase the theoretical maximum number of vertices allowed in the partition. Alternatively, the number of partitions on a node may be increased, thereby reducing the number of vertices stored on each partition.
It can be seen that, in the embodiment of the present application, the data type used to store the length of each edge data in the edge index file may be a 32-bit integer data type or a 64-bit integer data type. Which one is used may be configured based on requirements, which is not limited by the embodiments of the present application.
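As a worked example of the size estimate, the sketch below evaluates the product of the vertex count and the per-length size, assuming the INT type is a 4-byte (32-bit) integer and the LONG type an 8-byte (64-bit) integer. The vertex count of 100 million is a hypothetical figure chosen only for illustration.

```python
INT_BYTES = 4   # assumption: INT is a 32-bit integer
LONG_BYTES = 8  # assumption: LONG is a 64-bit integer


def edge_index_size(num_vertices, length_bytes=INT_BYTES):
    """Estimated size in bytes of the edge index file: M = S x len(type)."""
    return num_vertices * length_bytes


S = 100_000_000  # hypothetical number of vertices in one partition
size_int = edge_index_size(S)               # 400,000,000 bytes
size_long = edge_index_size(S, LONG_BYTES)  # 800,000,000 bytes
print(size_int, size_long)
```

Under these assumptions, switching from INT to LONG doubles the memory needed to cache the edge index file, which is one reason to prefer adding partitions when that is practical.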
Fig. 2 is a schematic diagram of a data organization manner in an edge data file according to an embodiment of the present application. As shown in fig. 2, it is assumed that vertices whose vertex sequence numbers are 0, 1, …, k are stored in the partition. In fig. 2, $e_i^j$ represents the j-th edge of the vertex with vertex sequence number i. The edge data of each edge stored in the edge data file includes the direction of the edge, the partition number of the partition where the vertex at the other end of the edge is located, and the vertex sequence number of the vertex at the other end.
$l_i$ represents the length of the edge data of the vertex with vertex sequence number i. As shown in fig. 2, the edge data of each vertex is stored in the edge data file sequentially in the arrow direction in fig. 2, so that the edge data of the vertices in the edge data file are stored sequentially according to the corresponding vertex sequence numbers. In this case, the lengths of the respective edge data in the edge index file are also stored sequentially according to the corresponding vertex sequence numbers, so that the storage position of the edge data of a given vertex in the edge data file can be quickly located later.
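The layout of fig. 2 can be sketched with a fixed-size binary record per edge. The field widths below (1 byte for the direction, 2 bytes for the partition number, 4 bytes for the vertex sequence number) and the direction codes are assumptions for illustration; the embodiment does not prescribe a byte layout.

```python
import struct

# Assumed record layout: direction (1 byte), partition number (2 bytes),
# vertex sequence number of the other endpoint (4 bytes), no padding.
EDGE_FORMAT = "<BHI"
EDGE_SIZE = struct.calcsize(EDGE_FORMAT)  # 7 bytes
OUT, IN = 0, 1                            # assumed direction codes


def encode_vertex_edges(edges):
    """Concatenate the records of all edges of one vertex."""
    return b"".join(struct.pack(EDGE_FORMAT, d, p, n) for d, p, n in edges)


def decode_vertex_edges(blob):
    """Split the edge data of one vertex back into (direction, partition, seq_no)."""
    return [struct.unpack_from(EDGE_FORMAT, blob, off)
            for off in range(0, len(blob), EDGE_SIZE)]


# Edges of the vertices with sequence numbers 0, 1 and 2 (vertex 1 has none).
per_vertex = [[(OUT, 2, 5), (IN, 1, 0)], [], [(OUT, 1, 1)]]
blobs = [encode_vertex_edges(edges) for edges in per_vertex]
edge_data_file = b"".join(blobs)       # stored in vertex sequence number order
edge_index = [len(b) for b in blobs]   # lengths l_0, l_1, l_2
print(edge_index)  # [14, 0, 7]
```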
In addition, if the graph computation involves computation on vertex attributes, a vertex data file may also be configured in the partition P1. The vertex data file is used for storing the vertex data of each of the plurality of vertices, where the vertex data includes the attributes of the vertex. Each vertex data in the vertex data file also corresponds to the vertex sequence number of the corresponding vertex, so that the vertex data of a vertex can be determined based on its vertex sequence number. Optionally, the vertex data file may also be configured when the graph computation does not involve vertex attributes, in which case the vertex data of each vertex in the vertex data file is null.
In the case where the vertex data file is arranged, the vertex data may be sequentially stored in accordance with the vertex sequence number. In this case, the vertex index file may also be configured. The vertex index file is used for storing the lengths of the various vertex data in the vertex data file. Therefore, the storage position of the vertex data of a certain vertex in the vertex data file can be quickly positioned according to the vertex index file. Specific implementations may refer to the functions of the edge index file described above, and will not be described in detail herein.
As with the edge index file, which location in the vertex index file corresponds to the vertex data length for storing that vertex may be predetermined. The specific implementation is also referred to the function of the edge index file, and will not be described in detail here.
The graph data storage method and the graph data processing method according to the embodiments of the present application are explained below based on the storage system shown in fig. 1.
Fig. 3 is a flowchart of a graph data storage method according to an embodiment of the present application. The method shown in fig. 3 may be applied to any node in the storage system shown in fig. 1, and the following embodiments will be described with reference to a target node in the storage system as an example, that is, the target node is any node in the storage system storing graph data. As shown in fig. 3, the method includes the following steps.
Step 301: Obtain the target vertex sequence number of a target vertex stored in a target partition, where the target partition is any storage partition on the target node.
The target partition stores a plurality of vertexes, the vertexes respectively correspond to vertex sequence numbers, the vertex sequence number of any vertex indicates the sequence of the vertex in the plurality of vertexes, and the target vertex is any vertex in the plurality of vertexes.
As can be seen from the storage system shown in fig. 1, the target partition may also have a mapping table stored therein. The mapping table stores vertex sequence numbers corresponding to the vertex identifications, respectively, wherein the vertex identifications are vertex IDs. Thus, in step 301, the implementation manner of obtaining the target vertex sequence number of the target vertex stored in the target partition may be: the vertex sequence number corresponding to the vertex identification of the target vertex is firstly obtained from the mapping table, and if the vertex sequence number corresponding to the vertex identification of the target vertex can be obtained from the mapping table, the obtained vertex sequence number is used as the target vertex sequence number. If the vertex sequence number corresponding to the vertex identification of the target vertex cannot be obtained from the mapping table, configuring a vertex sequence number for the target vertex.
The vertex sequence number for the target vertex may be configured according to a preset vertex sequence number generation rule. For example, the rule may be to add a reference value to the maximum vertex sequence number in the mapping table to obtain the newly generated vertex sequence number. The reference value may be 1, 2, etc. It should be noted that the embodiment of the present application does not limit the specific vertex sequence number generation rule.
In addition, when the target vertex sequence number is a newly configured vertex sequence number, the target node may further write the correspondence between the target vertex sequence number and the vertex identification of the target vertex into the mapping table, so as to complete the update of the mapping table.
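Step 301 with a mapping table can be sketched as a look-up followed by an assignment on a miss. The function name is hypothetical, the mapping table is modeled as a dictionary, and starting from 1 when the table is empty is an assumption.

```python
def get_or_assign_sequence_number(mapping_table, vertex_id, reference_value=1):
    """Return the vertex sequence number for vertex_id, assigning one if absent."""
    if vertex_id in mapping_table:
        return mapping_table[vertex_id]
    # Generation rule from the text: maximum existing number plus a reference value.
    largest = max(mapping_table.values(), default=0)
    new_number = largest + reference_value
    # Write the new correspondence back so the mapping table stays up to date.
    mapping_table[vertex_id] = new_number
    return new_number


table = {"B": 1, "C": 2}
existing = get_or_assign_sequence_number(table, "C")  # found in the table
created = get_or_assign_sequence_number(table, "A")   # newly assigned
print(existing, created, table)
```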
In addition, in the case where there is no mapping table in the target partition, the target node may acquire the vertex sequence number of the target vertex based on other information. For example, the vertex sequence number of the target vertex is generated automatically based on a vertex storage log recorded in the target partition, where the vertex storage log records the time at which each vertex in the target partition was stored to the target partition, so that vertex sequence numbers can be configured for the vertices in the order in which they were stored to the target partition.
Step 302: Obtain, for each of the one or more edges associated with the target vertex, the partition number of the partition where the other vertex of the edge is located, the vertex sequence number of the other vertex, and the direction of the edge, to obtain target edge data corresponding to the target vertex sequence number.
Each of the one or more edges associated with the target vertex in step 302 refers to all edges associated with the target vertex to ensure the integrity of the edge data.
In addition, the edges associated with the target vertex may be the edges pointing to the target vertex (incoming edges), the edges pointing from the target vertex to other vertices (outgoing edges), or both. Where only one of these two kinds is used, the edges associated with every other vertex are determined according to the same rule. For example, the edge data of every vertex in the target partition stores only the edges pointing to that vertex, or the edge data of every vertex in the target partition stores only the edges pointing from that vertex to other vertices.
In an embodiment of the present application, in order to improve the efficiency of the subsequent graph computation, the edges associated with the target vertex may include both the edges pointing to the target vertex and the edges pointing from the target vertex to other vertices. Therefore, once the vertex sequence number of a vertex is obtained, the information of all the edges associated with the vertex can be obtained from the edge data corresponding to that vertex sequence number in the edge data file, without traversing the edge data of other vertices.
Step 303: Write the target edge data into an edge data file in the target partition, where the edge data file is used for storing edge data corresponding to each vertex sequence number.
After the target edge data is obtained, the target edge data can be written into an edge data file so as to carry out subsequent graph calculation based on the edge data file.
In one possible implementation, the target edge data is stored to the edge data file at the position given by the order of the target vertex sequence number among the vertex sequence numbers of all vertices stored in the target partition. Therefore, the edge data in the edge data file are stored sequentially according to the corresponding vertex sequence numbers.
In this scenario, since the edge index file is also stored in the target partition, and the length of each edge data in the edge data file is stored in the edge index file, the lengths of each edge data stored in the edge index file are sequentially stored according to the corresponding vertex sequence numbers. Therefore, after the target edge data is written into the edge data file, the length of the target edge data is further stored into the edge index file according to the order of the vertex sequence numbers of the target vertices in the vertex sequence numbers of all vertices stored in the target partition.
In another possible implementation, if the target partition does not have an edge index file stored therein, in this scenario, when the target edge data is written to the edge data file, an identifier may also be added at a start position or an end position of storage of the target edge data in the edge data file. The identifier is used for indicating that the currently written target edge data is the edge data corresponding to the sequence number of the target vertex, so that an edge index file is not required to be configured in the target partition.
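The identifier-based alternative can be sketched as a small header written at the start position of each edge data. The header layout is an assumption: besides the vertex sequence number named in the text, it also carries the length of the edge data, which this sketch needs in order to find the next record when scanning.

```python
import struct

HEADER = "<II"  # assumption: vertex sequence number, then length of the edge data
HEADER_SIZE = struct.calcsize(HEADER)  # 8 bytes


def append_tagged(edge_data_file, seq_no, edge_data):
    """Append edge data preceded by an identifier naming its vertex sequence number."""
    edge_data_file.extend(struct.pack(HEADER, seq_no, len(edge_data)))
    edge_data_file.extend(edge_data)


def find_edge_data(edge_data_file, seq_no):
    """Scan the identifiers to find the edge data of one vertex; None if absent."""
    offset = 0
    while offset < len(edge_data_file):
        tag, length = struct.unpack_from(HEADER, edge_data_file, offset)
        start = offset + HEADER_SIZE
        if tag == seq_no:
            return bytes(edge_data_file[start:start + length])
        offset = start + length  # skip to the next identifier
    return None


edge_data_file = bytearray()
append_tagged(edge_data_file, 2, b"cc")   # records need not be in sequence order
append_tagged(edge_data_file, 1, b"bbb")
print(find_edge_data(edge_data_file, 1))  # b'bbb'
```

The trade-off relative to an edge index file is that no separate file is maintained, but a look-up has to walk the identifiers instead of summing compact lengths.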
In addition, as shown in FIG. 1, optionally, in the case of a process in which the graph computation may involve attributes of vertices, a vertex data file may also be configured in the target partition. Vertex data corresponding to each vertex sequence number is stored in the vertex data file. In this scenario, the target node may also write vertex data for the target vertex to the vertex data file.
The vertex data of the target vertex may include the attribute of the target vertex, or may be a null value.
In addition, vertex data in the vertex data file can be stored sequentially according to the corresponding vertex sequence numbers. In this scenario, the target partition further stores a vertex index file, the vertex index file stores the lengths of the vertex data in the vertex data file, and the lengths stored in the vertex index file are stored sequentially according to the vertex sequence numbers corresponding to the vertex data. This facilitates subsequently locating the vertex data of a given vertex in the vertex data file quickly based on the vertex index file.
In this scenario, after the vertex data of the target vertex is written into the vertex data file, the length of the vertex data of the target vertex is also written into the vertex index file. The specific implementation manner may refer to writing the target edge data into the edge data file and writing the length of the target edge data into the edge index file, which are not described herein.
As with the edge data file, in the embodiment of the present application a vertex index file need not be configured for the vertex data file. In this case, when the vertex data of the target vertex is written into the vertex data file, an identifier may be added at the start position or the end position at which the vertex data of the target vertex is stored in the vertex data file. The identifier indicates that the currently written vertex data is the vertex data corresponding to the target vertex sequence number, so that no vertex index file needs to be configured in the target partition.
The foregoing steps 301 to 303 explain how to update the edge data file, edge index file, vertex data file, and vertex index file for a given vertex. Optionally, steps 301 to 303 may also be applied in a scenario where these files are generated from a graph database that has already been constructed in the related art. In this scenario, the generation of data in these files may be illustrated by the following steps.
1. It is assumed that the current partition Ps needs to generate the several files described above. First, all raw edge data e<id_s, id_t> of all vertices stored in the current partition Ps are traversed in parallel by distributed computation. Here e represents an edge, id_s represents the ID of one vertex of the edge, and id_t represents the ID of the other vertex of the edge, where id_s is a vertex stored in the current partition Ps.
2. The current partition Ps determines, according to the data partitioning strategy, the partition Pt where vertex id_t is located, and queries the partition Pt for the vertex sequence number of id_t within Pt.
3. After receiving the request for the vertex sequence number of id_t, the partition Pt searches the mapping table map<id_i, N_i>, which stores vertex IDs and vertex sequence numbers. If the vertex sequence number corresponding to id_t is found, it is returned directly to the current partition Ps. Otherwise, a vertex sequence number Nt = Nmax + 1 (where Nmax is the maximum vertex sequence number already allocated in the partition Pt) is allocated to vertex id_t, <id_t, Nt> is added to the mapping table, and the newly allocated vertex sequence number is returned to the current partition Ps.
4. id_t in e<id_s, id_t> is replaced according to the queried Nt and Pt. The query of step 3 is then repeated on the partition Ps for the vertex sequence number Ns of id_s, and id_s in e<id_s, id_t> is replaced with Ns. The replaced edge e<Ns, Pt+Nt> and the direction of edge e are stored in an edge log file.
5. When the IDs of all edges have been replaced, the most recently updated mapping table is stored in the current partition Ps. The records in the edge log file are then sorted by Ns, the edges of the same vertex are merged to obtain the edge data of that vertex, and the edge data of each vertex is written into the edge data file in vertex sequence number order. The length of each merged edge data is written into the edge index file in the same vertex sequence number order.
6. If the subsequent graph calculation process involves vertex attributes, after assigning sequence numbers to vertices, the attributes corresponding to the vertices can be written into the vertex data file as vertex data, and the vertex data length can be written into the vertex index file.
The process of replacing e<id_s, id_t> with e<Ns, Pt+Nt> in steps 1 to 4 above is further illustrated in fig. 4. The specific process in fig. 4 is not described here.
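The file generation steps above can be sketched in a single process as follows. Several simplifications are assumed: the data partitioning strategy is a plain dictionary `partition_of`, each partition's mapping table is an in-memory dictionary, only outgoing edges are recorded, and the "length" stored in the index is the number of edges rather than a byte length. All names are hypothetical.

```python
from collections import defaultdict


def get_or_assign(mapping, vertex_id):
    """Step 3: look up the sequence number, or allocate Nmax + 1 on a miss."""
    if vertex_id not in mapping:
        mapping[vertex_id] = max(mapping.values(), default=0) + 1
    return mapping[vertex_id]


def build_partition_files(raw_edges, ps, partition_of, mapping_tables):
    """Turn raw edges <id_s, id_t> of partition ps into edge data and an edge index."""
    edge_log = []
    for id_s, id_t in raw_edges:                          # step 1: traverse raw edges
        pt = partition_of[id_t]                           # step 2: locate partition Pt
        nt = get_or_assign(mapping_tables[pt], id_t)      # step 3: query Nt on Pt
        ns = get_or_assign(mapping_tables[ps], id_s)      # step 4: query Ns on Ps
        edge_log.append((ns, pt, nt, "out"))              # replaced edge plus direction
    edge_log.sort(key=lambda record: record[0])           # step 5: sort by Ns
    merged = defaultdict(list)
    for ns, pt, nt, direction in edge_log:
        merged[ns].append((pt, nt, direction))            # merge edges of one vertex
    edge_data, edge_index = [], []
    for ns in sorted(mapping_tables[ps].values()):        # write in sequence number order
        edges = merged.get(ns, [])
        edge_data.append(edges)
        edge_index.append(len(edges))
    return edge_data, edge_index


partition_of = {"a": 0, "b": 0, "c": 1}   # hypothetical partitioning result
tables = {0: {}, 1: {}}
edge_data, edge_index = build_partition_files(
    [("a", "b"), ("b", "c"), ("a", "c")], ps=0,
    partition_of=partition_of, mapping_tables=tables)
print(tables, edge_data, edge_index)
```

Because id_t is resolved before id_s for each edge, vertex b receives sequence number 1 and vertex a receives sequence number 2 in partition 0 in this example.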
By the graph data storage method shown in fig. 3, at least the following technical effects can be achieved in the embodiment of the application:
(1) The information of the vertexes and the edges associated with the vertexes can be stored in a partition number and a vertex sequence number, so that when the graph calculation is performed later, the data amount processed by the graph calculation based on the partition number and the vertex sequence number of the vertexes is smaller than that of the graph calculation based on the vertex ID, and the calculation efficiency of the graph calculation is improved.
(2) In addition, in the graph database, the edge data of the edge associated with the vertex is only used for storing the partition number, the vertex sequence number and the edge direction of the vertex at the other end, and compared with the prior art in which the IDs of the vertices at the two ends of the edge are required to be stored, the storage method provided by the embodiment of the application can also save the storage space in a storage system.
(3) When the graph calculation is carried out later, the partition number and the vertex sequence number of the vertex can be used for replacing the ID of the vertex in the related information of the vertex loaded into the memory, so that the amount of data loaded into the memory is reduced, and the problem that the graph calculation cannot be completed due to memory overflow is avoided.
Based on the storage system shown in fig. 1 and the graph data storage method shown in fig. 3, the embodiment of the application also provides a graph data processing method, which explains how graph data is processed during graph computation. Fig. 5 is a flowchart of a graph data processing method according to an embodiment of the present application. The method is applied to a computing node. It should be noted that the computing node may be a node in the storage system shown in fig. 1, in which case the computing node and the target node in the following embodiments are the same node. Alternatively, the computing node may be a node that is separate from the nodes of the storage system and is used solely for graph computation. As shown in fig. 5, the method includes the following steps:
Step 501: Determine the edge data to be processed in the edge data file stored in the target partition of the target node.
In the embodiment of the application, the edge index file stored in the target partition of the target node can be loaded into the memory of the computing node in advance. Based on the above-mentioned edge index file function, the edge data of each vertex in the edge data file in the target partition on the target node can be sequentially queried based on the edge index file loaded in the memory until the edge data to be processed is queried.
That is, in the case where the edge index file is cached in the memory, the edge data to be processed may be determined according to the length of each edge data stored in the edge index file.
Furthermore, the graph data processing involved in current graph computation typically falls into two types: graph data processing for vertices and graph data processing for edges. The following description takes graph data processing for edges as an example. Graph data processing for edges can in turn be divided into two types: one iterates over all edge data to compute certain statistics, and the other queries the edge data of a specified vertex.
Therefore, in the case where the graph data processing is iterative processing, the edge data to be processed is the current edge data reached by iterating sequentially according to the respective edge data lengths. In this scenario, both the edge data and the lengths in the edge index file are read sequentially during the iteration, and the designed storage structure is compact, so the iteration performance is better.
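Iterative processing can be sketched as a single sequential pass that advances through the edge data file by the lengths read from the edge index. The byte strings below are placeholder edge data, vertex sequence numbers are assumed to start at 0, and the statistic computed (the vertex with the longest edge data) is only an example.

```python
def iter_edge_data(edge_data_file, lengths):
    """Yield (vertex sequence number, edge data) in storage order."""
    offset = 0
    for seq_no, length in enumerate(lengths):
        yield seq_no, edge_data_file[offset:offset + length]
        offset += length  # both files are read strictly front to back


edge_data_file = b"aabbbbc"  # placeholder edge data of three vertices
lengths = [2, 4, 1]          # contents of the edge index
records = list(iter_edge_data(edge_data_file, lengths))
# Example statistic: the vertex whose edge data is longest.
largest = max(records, key=lambda item: len(item[1]))
print(largest)  # (1, b'bbbb')
```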
In the case that the graph data processing is query processing, the edge data to be processed is the edge data located according to the length of each edge data and the vertex sequence number of the vertex to be queried currently. According to the vertex sequence number n and the edge index file, the starting position $\mathrm{Pos}_n$ of the edge data of the vertex in the edge data file is calculated, and can be expressed by the following formula:
$\mathrm{Pos}_n = \sum_{i=0}^{n-1} l_i$, where $l_i$ is the length of the edge data corresponding to vertex sequence number i, for each vertex sequence number i preceding n.
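The position formula is a prefix sum over the lengths in the edge index, which can be sketched as below. Vertex sequence numbers are assumed to start at 0, matching the summation range, and the byte strings are placeholder edge data.

```python
from itertools import accumulate


def start_positions(lengths):
    """Prefix sums of the lengths: element n is the start position of vertex n."""
    return [0] + list(accumulate(lengths))


def read_edge_data(edge_data_file, lengths, n):
    """Locate and return the edge data of the vertex with sequence number n."""
    pos = sum(lengths[:n])  # sum of l_0 .. l_(n-1)
    return edge_data_file[pos:pos + lengths[n]]


edge_data_file = b"aabbbbc"
lengths = [2, 4, 1]
print(start_positions(lengths))                    # [0, 2, 6, 7]
print(read_edge_data(edge_data_file, lengths, 1))  # b'bbbb'
```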
In addition, to avoid recomputing from the start of the edge index file on every query, the position information of the edge data at some intermediate positions can be cached in the memory during the first iteration. For example, the position information of the edge data corresponding to vertex sequence numbers that are integer multiples of 1000 is cached in the memory. Then, when the edge data of a nearby vertex sequence number is queried later, only a small amount of data in the edge index file needs to be accumulated starting from the cached position.
That is, the computing node further stores a position of the edge data corresponding to the reference vertex sequence number in the edge data file, where the reference vertex sequence number is one or more vertex sequence numbers among the vertex sequence numbers of the plurality of vertices stored in the target partition. In this case, the current edge data to be processed is edge data located according to the length of each edge data, the position of the edge data corresponding to the sequence number of the reference vertex in the edge data file, and the sequence number of the vertex of the current vertex to be queried.
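Caching the positions of reference vertex sequence numbers can be sketched as follows. The stride of 1000 follows the example in the text; the function names, the dictionary used as the cache, and the synthetic lengths are assumptions for illustration.

```python
def build_checkpoints(lengths, stride=1000):
    """During the first pass, cache the start position of every stride-th vertex."""
    checkpoints = {}
    position = 0
    for seq_no, length in enumerate(lengths):
        if seq_no % stride == 0:
            checkpoints[seq_no] = position
        position += length
    return checkpoints


def locate(lengths, checkpoints, n, stride=1000):
    """Start position of vertex n: nearest cached position plus a short partial sum."""
    base = (n // stride) * stride
    return checkpoints[base] + sum(lengths[base:n])  # at most stride - 1 additions


lengths = [i % 7 for i in range(5000)]  # synthetic edge data lengths
checkpoints = build_checkpoints(lengths)
print(sorted(checkpoints))              # [0, 1000, 2000, 3000, 4000]
print(locate(lengths, checkpoints, 2345) == sum(lengths[:2345]))  # True
```

A smaller stride shortens each partial sum at the cost of a larger cache, so the stride is a tuning choice rather than a fixed value.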
Optionally, the edge index file and the edge data file stored in the target partition of the target node may also be loaded into the memory of the computing node in advance. Under the scene, the query of the edge data to be processed can be realized in the memory of the computing node, so that the efficiency of determining the edge data to be processed is improved.
In addition, when the edge index file is not arranged in the target partition, the edge index file for each edge data of the edge data file in the target partition can be generated in the memory when each edge data of the edge data file in the target partition is traversed in sequence when the graph data processing is performed for the first time. When the graph data processing is carried out again later, the edge data to be processed is positioned based on the edge index file, instead of traversing all the edge data of the edge data file in the target partition again and then positioning the edge data to be processed, so that the computing performance of the computing node is improved.
Step 502: Perform graph data processing on the edge data to be processed.
The specific operation of graph data processing on the edge data to be processed in step 502 depends on the functions that the current graph computation needs to implement. The embodiment of the application is not limited to a specific implementation mode for carrying out graph data processing on the edge data to be processed.
In addition, step 501 and step 502 are described by taking edge data as an example. Optionally, in an embodiment of the present application, the graph data processing may further include an iterative process or a query process on vertex data.
In one possible implementation, a vertex data file is stored in the target partition, where vertex data for each of the vertex sequence numbers of the plurality of vertices is stored. In this scenario, vertex data corresponding to the target vertex sequence number may also be obtained from the vertex data file according to the target vertex sequence number to be queried.
In addition, the vertex data of the plurality of vertices may be stored sequentially according to the corresponding vertex sequence numbers, with a vertex index file also stored in the target partition. The vertex index file stores the lengths of the vertex data in the vertex data file, and these lengths are stored sequentially according to the corresponding vertex sequence numbers. In this scenario, the specific process of obtaining the vertex data corresponding to the target vertex sequence number from the vertex data file may be: determining the position of the vertex data corresponding to the target vertex sequence number in the vertex data file according to the target vertex sequence number and the vertex index file; and acquiring the vertex data corresponding to the target vertex sequence number from the vertex data file according to that position.
In addition, a mapping table is stored in the target partition, and the mapping table stores the vertex sequence numbers corresponding to the respective vertex identifications, where the vertex identifications are vertex IDs. In this scenario, after the graph data processing is performed, the vertex sequence numbers of the vertices obtained from the graph data processing may be determined, the vertex identifications corresponding to the determined vertex sequence numbers may be obtained according to the mapping table and used as the graph data processing result, and the result may be displayed to a user. That is, the vertex ID is used as the identification of a vertex outside the storage system, while the partition number of the partition where the vertex is located and the vertex sequence number are used as the identification of the vertex inside the storage system. Therefore, the computing performance of the computing node and the storage performance of the storage system can be improved while the vertices remain identifiable to users, so that a user can readily analyze the graph data processing result in a next step.
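Translating a processing result from internal vertex sequence numbers back to vertex IDs can be sketched by inverting the mapping table. The function name and the dictionary representation of the mapping table are assumptions.

```python
def to_vertex_ids(mapping_table, result_sequence_numbers):
    """Map vertex sequence numbers in a result back to externally visible vertex IDs."""
    # The mapping table is ID -> sequence number, so invert it for the reverse look-up.
    reverse = {seq_no: vertex_id for vertex_id, seq_no in mapping_table.items()}
    return [reverse[seq_no] for seq_no in result_sequence_numbers]


mapping_table = {"B": 1, "C": 2, "A": 3}
result_ids = to_vertex_ids(mapping_table, [3, 1])
print(result_ids)  # ['A', 'B']
```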
By the graph data processing method shown in fig. 5, at least the following technical effects can be achieved in the embodiment of the present application:
(1) The information of the vertexes and the edges associated with the vertexes can be stored in a partition number and a vertex sequence number, so that when the graph calculation is performed later, the data amount processed by the graph calculation based on the partition number and the vertex sequence number of the vertexes is smaller than that of the graph calculation based on the vertex ID, and the calculation efficiency of the graph calculation is improved.
(2) When the graph calculation is carried out later, the partition number and the vertex sequence number of the vertex can be used for replacing the ID of the vertex in the related information of the vertex loaded into the memory, so that the amount of data loaded into the memory is reduced, and the problem that the graph calculation cannot be completed due to memory overflow is avoided.
All the above optional technical solutions may be combined according to any choice to form an optional embodiment of the present application, and the embodiments of the present application will not be described in detail.
Fig. 6 is a schematic structural diagram of a graph data storage apparatus according to an embodiment of the present application. The apparatus is applied to a target node in a storage system storing graph data, where the target node is any node in the storage system.
As shown in fig. 6, the apparatus 600 includes:
an obtaining module 601, configured to obtain a target vertex sequence number of a target vertex stored in a target partition, where the target partition is any storage partition on the target node, the target partition stores a plurality of vertices, each of the plurality of vertices corresponds to a vertex sequence number, the vertex sequence number of any vertex indicates the position of that vertex in the ordering of the plurality of vertices, and the target vertex is any one of the plurality of vertices;
the obtaining module 601 is further configured to obtain the partition number of the partition where the other vertex of each of one or more edges associated with the target vertex is located, the vertex sequence number of the other vertex, and the direction of each edge, so as to obtain target edge data corresponding to the target vertex sequence number;
a writing module 602, configured to write the target edge data into an edge data file in the target partition, where the edge data file is used to store the edge data corresponding to each vertex sequence number.
Optionally, the edge data in the edge data file are sequentially stored according to the corresponding vertex sequence numbers;
the target partition is also stored with an edge index file, the edge index file stores the length of each edge data in the edge data file, and the length of each edge data stored in the edge index file is sequentially stored according to the corresponding vertex sequence number.
Optionally, the data type used to store the length of each edge data in the edge index file is an integer data type of 32 bytes or an integer data type of 64 bytes.
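The relationship between the edge data file and the edge index file can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the 9-byte edge record layout, the 4-byte width of each index entry, the direction encoding, and the names `EDGE_RECORD`, `LENGTH_FIELD`, `encode_edge_data` and `build_edge_files` are all assumptions chosen for brevity.

```python
import struct

# Hypothetical record layout: partition number (4 bytes), vertex sequence
# number (4 bytes), direction (1 byte: 0 = outgoing, 1 = incoming).
EDGE_RECORD = struct.Struct("<IIB")
# Assumed width of one entry in the edge index file.
LENGTH_FIELD = struct.Struct("<I")


def encode_edge_data(edges):
    """Serialize all edges of one vertex into a single edge-data blob."""
    return b"".join(EDGE_RECORD.pack(p, s, d) for p, s, d in edges)


def build_edge_files(adjacency):
    """Build the contents of the edge data file and the edge index file.

    adjacency[i] lists the edges of the vertex whose sequence number is i,
    so both outputs are ordered by vertex sequence number.
    """
    data = bytearray()
    index = bytearray()
    for edges in adjacency:
        blob = encode_edge_data(edges)
        data += blob
        index += LENGTH_FIELD.pack(len(blob))  # one length per vertex
    return bytes(data), bytes(index)


adjacency = [
    [(0, 1, 0), (2, 7, 0)],  # vertex 0: two outgoing edges
    [(0, 0, 1)],             # vertex 1: one incoming edge
    [],                      # vertex 2: no edges, length 0
]
edge_data_file, edge_index_file = build_edge_files(adjacency)
```

Because both files follow the same vertex order, the index file needs no vertex identifiers of its own: the i-th length belongs to the vertex with sequence number i.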
Optionally, the writing module is further configured to write the correspondence between the target vertex sequence number and the vertex identification of the target vertex into a mapping table in the target partition, where the mapping table stores the vertex sequence numbers corresponding to the vertex identifications of the plurality of vertices.
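A mapping table of this kind might look like the sketch below. The class name `VertexMappingTable` and its methods are hypothetical, and assigning sequence numbers in order of first insertion is an assumption made for the example.

```python
class VertexMappingTable:
    """Maps vertex identifications to vertex sequence numbers in one partition."""

    def __init__(self):
        self._seq_by_id = {}   # vertex identification -> sequence number
        self._id_by_seq = []   # sequence number -> vertex identification

    def add(self, vertex_id):
        """Return the sequence number of vertex_id, assigning one if it is new."""
        seq = self._seq_by_id.get(vertex_id)
        if seq is None:
            seq = len(self._id_by_seq)  # next position in insertion order
            self._seq_by_id[vertex_id] = seq
            self._id_by_seq.append(vertex_id)
        return seq

    def id_of(self, seq):
        """Return the vertex identification stored for a sequence number."""
        return self._id_by_seq[seq]


table = VertexMappingTable()
first = table.add("vertex-a")    # made-up identifications
second = table.add("vertex-b")
repeat = table.add("vertex-a")   # already known, keeps its number
```

Keeping both directions of the mapping lets the same table serve writes (identification to sequence number) and result reporting (sequence number back to identification).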
Optionally, the writing module is further configured to write the vertex data of the target vertex into a vertex data file in the target partition, where the vertex data file stores the vertex data corresponding to each vertex sequence number.
Optionally, vertex data in the vertex data file are sequentially stored according to corresponding vertex sequence numbers;
The target partition is also stored with a vertex index file, the vertex index file stores the length of each vertex data in the vertex data file, and the lengths of each vertex data stored in the vertex index file are sequentially stored according to the vertex sequence numbers corresponding to the vertex data.
Optionally, the vertex data of the target vertex includes attributes of the target vertex.
With the device shown in Fig. 6, the embodiments of the present application achieve at least the following technical effects:
(1) Vertices and the edges associated with them are identified by a partition number and a vertex sequence number rather than by a vertex ID. Graph computation based on partition numbers and vertex sequence numbers therefore processes less data than graph computation based on vertex IDs, which improves the efficiency of the graph computation.
(2) In addition, in the graph database, the edge data of an edge associated with a vertex stores only the partition number and vertex sequence number of the vertex at the other end, together with the direction of the edge. Compared with the prior art, in which the IDs of the vertices at both ends of an edge must be stored, the storage method provided by the embodiments of the present application also saves storage space in the storage system.
(3) During subsequent graph computation, the partition number and vertex sequence number of a vertex can replace the vertex ID in the vertex-related information loaded into memory. This reduces the amount of data loaded into memory and helps avoid the situation in which a graph computation cannot be completed because of memory overflow.
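As a rough illustration of the storage saving, the following sketch compares the size of one edge stored as a pair of vertex IDs with the size of one edge stored as a partition number, a vertex sequence number and a direction. The 32-character IDs and the 9-byte compact layout are made up for the example; the actual saving depends on the ID length and the encoding used.

```python
import struct

# Assumed compact layout: partition number, vertex sequence number, direction.
COMPACT_EDGE = struct.Struct("<IIB")


def id_pair_size(source_id: str, target_id: str) -> int:
    """Bytes needed to store both endpoint IDs as UTF-8 strings."""
    return len(source_id.encode("utf-8")) + len(target_id.encode("utf-8"))


# Two made-up 32-character vertex IDs, purely for illustration.
source_id = "a" * 32
target_id = "b" * 32

id_based = id_pair_size(source_id, target_id)  # bytes per edge with IDs
compact = COMPACT_EDGE.size                    # bytes per edge, compact form
print(id_based, compact)
```

Under these assumptions the ID-based form needs 64 bytes per edge and the compact form needs 9 bytes.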
It should be noted that when the graph data storage device provided in the above embodiment stores graph data, the division into the above functional modules is used only as an example. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above. In addition, the graph data storage device provided in the above embodiment and the graph data storage method embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which is not repeated here.
Fig. 7 is a schematic structural diagram of a graph data processing device according to an embodiment of the present application. The device is applied to a computing node in a storage system storing graph data. As shown in Fig. 7, the apparatus 700 includes:
a determining module 701, configured to determine edge data to be processed in an edge data file stored in a target partition of a target node;
The processing module 702 is configured to perform graph data processing according to edge data to be processed;
the target node is any node in a storage system storing graph data, the target partition is any storage partition on the target node, edge data corresponding to vertex sequence numbers of a plurality of vertexes stored in the target partition are stored in an edge data file, the edge data comprises partition numbers of partitions where the other vertex of one or more edges related to the corresponding vertex is located, vertex sequence numbers of the other vertex and directions of the edges, and the vertex sequence number of any vertex in the target partition indicates the sequence of any vertex in the plurality of vertexes.
Optionally, the edge data in the edge data file is sequentially stored according to the corresponding vertex sequence number, the target partition is further stored with an edge index file, the edge index file stores the length of each edge data in the edge data file, and the length of each edge data stored in the edge index file is sequentially stored according to the corresponding vertex sequence number;
the determining module is configured to determine the edge data to be processed according to the length of each edge data stored in the edge index file.
Optionally, in the case that the graph data processing is iterative processing, the edge data to be processed is current edge data to be processed which is sequentially iterated according to the lengths of the edge data.
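For the iterative case, sequential traversal driven by the stored lengths can be sketched as follows. The function name `iter_edge_data` and the placeholder bytes are assumptions for illustration; decoding of the individual edge records is omitted.

```python
def iter_edge_data(edge_data: bytes, lengths):
    """Yield (vertex_sequence_number, edge_blob) in storage order.

    lengths[i] is the length of the edge data of the vertex whose sequence
    number is i, as read from the edge index file.
    """
    offset = 0
    for seq, length in enumerate(lengths):
        yield seq, edge_data[offset:offset + length]
        offset += length  # the next vertex's edge data starts right after


edge_data = b"AAAABBC"   # placeholder bytes standing in for real edge data
lengths = [4, 2, 0, 1]   # lengths taken from the edge index file
blobs = list(iter_edge_data(edge_data, lengths))
```

Each step only advances a running offset, so a full pass over the partition reads the edge data file once from start to end.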
Optionally, in the case that the graph data processing is query processing, the edge data to be processed is edge data located according to the length of each edge data and the vertex sequence number of the vertex to be queried currently.
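For the query case, the position of one vertex's edge data can be derived by summing the lengths of all edge data stored before it. The sketch below assumes the lengths have already been read from the edge index file into a list; `locate_edge_data` and `read_edge_data` are hypothetical names.

```python
def locate_edge_data(lengths, seq):
    """Return (offset, length) of the edge data for vertex sequence number seq."""
    if not 0 <= seq < len(lengths):
        raise IndexError(f"vertex sequence number {seq} is out of range")
    offset = sum(lengths[:seq])  # total length of all preceding edge data
    return offset, lengths[seq]


def read_edge_data(edge_data: bytes, lengths, seq):
    """Slice the edge data of one vertex out of the edge data file contents."""
    offset, length = locate_edge_data(lengths, seq)
    return edge_data[offset:offset + length]


edge_data = b"AAAABBC"   # placeholder bytes standing in for real edge data
lengths = [4, 2, 0, 1]
position = locate_edge_data(lengths, 3)
blob = read_edge_data(edge_data, lengths, 3)
```

The cost of this lookup grows with the sequence number being queried, since every preceding length is added up.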
Optionally, the computing node further stores a position of the edge data corresponding to a reference vertex sequence number in the edge data file, where the reference vertex sequence number is one or more vertex sequence numbers of the plurality of vertices stored in the target partition;
the current edge data to be processed is the edge data located according to the length of each edge data, the position in the edge data file of the edge data corresponding to the reference vertex sequence number, and the vertex sequence number of the vertex currently to be queried.
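One way to use stored reference positions is to start the summation from the nearest reference vertex at or before the queried vertex instead of from the beginning of the file. The sketch below is an assumption-laden illustration: choosing every `stride`-th vertex as a reference vertex, and the function names, are not taken from the embodiment.

```python
import bisect


def build_reference_positions(lengths, stride):
    """Record the offset of the edge data of every stride-th vertex."""
    ref_seqs, ref_offsets = [], []
    offset = 0
    for seq, length in enumerate(lengths):
        if seq % stride == 0:
            ref_seqs.append(seq)
            ref_offsets.append(offset)
        offset += length
    return ref_seqs, ref_offsets


def locate_with_references(lengths, ref_seqs, ref_offsets, seq):
    """Return (offset, length) for seq, summing only from the nearest reference."""
    if not 0 <= seq < len(lengths):
        raise IndexError(f"vertex sequence number {seq} is out of range")
    i = bisect.bisect_right(ref_seqs, seq) - 1  # last reference at or before seq
    start_seq, offset = ref_seqs[i], ref_offsets[i]
    offset += sum(lengths[start_seq:seq])       # at most stride - 1 additions
    return offset, lengths[seq]


lengths = [4, 2, 0, 1, 3, 5]
ref_seqs, ref_offsets = build_reference_positions(lengths, stride=2)
position = locate_with_references(lengths, ref_seqs, ref_offsets, 5)
```

A smaller stride means fewer additions per query at the price of more reference positions kept on the computing node.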
Optionally, a vertex data file is stored in the target partition, and vertex data corresponding to vertex sequence numbers of the multiple vertices is stored in the vertex data file;
the apparatus further comprises an acquisition module, configured to obtain, according to a target vertex sequence number to be queried, the vertex data corresponding to the target vertex sequence number from the vertex data file.
Optionally, vertex data of each of the plurality of vertices is sequentially stored according to corresponding vertex sequence numbers, a vertex index file is also stored in the target partition, the lengths of the vertex data in the vertex data file are stored in the vertex index file, and the lengths of the vertex data stored in the vertex index file are sequentially stored according to the corresponding vertex sequence numbers;
the acquisition module is configured to determine the position, in the vertex data file, of the vertex data corresponding to the target vertex sequence number according to the target vertex sequence number and the vertex index file;
and to obtain the vertex data corresponding to the target vertex sequence number from the vertex data file according to that position.
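The two steps above (locate the position through the vertex index file, then read from the vertex data file) can be sketched as follows. Serializing vertex attributes as JSON, the 4-byte index entries, the made-up attribute values, and the names `write_vertex_files` and `read_vertex_data` are assumptions for the example only.

```python
import json
import struct

LENGTH_FIELD = struct.Struct("<I")  # assumed width of one vertex index entry


def write_vertex_files(vertex_attrs):
    """Build vertex data file and vertex index file contents, in sequence order."""
    data, index = bytearray(), bytearray()
    for attrs in vertex_attrs:
        blob = json.dumps(attrs, sort_keys=True).encode("utf-8")
        data += blob
        index += LENGTH_FIELD.pack(len(blob))
    return bytes(data), bytes(index)


def read_vertex_data(vertex_data, vertex_index, seq):
    """Locate and decode the vertex data for one vertex sequence number."""
    lengths = [n for (n,) in LENGTH_FIELD.iter_unpack(vertex_index)]
    if not 0 <= seq < len(lengths):
        raise IndexError(f"vertex sequence number {seq} is out of range")
    offset = sum(lengths[:seq])  # position derived from the index file
    blob = vertex_data[offset:offset + lengths[seq]]
    return json.loads(blob.decode("utf-8"))


# Made-up attributes for three vertices with sequence numbers 0, 1 and 2.
vertex_attrs = [
    {"type": "person", "name": "A"},
    {"type": "vehicle", "plate": "X1"},
    {},
]
vertex_data_file, vertex_index_file = write_vertex_files(vertex_attrs)
found = read_vertex_data(vertex_data_file, vertex_index_file, 1)
```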
Optionally, a mapping table is stored in the target partition, and the mapping table stores a plurality of vertex sequence numbers corresponding to the plurality of vertex identifications respectively;
the determining module is further configured to determine the vertex sequence numbers of the vertices obtained after the graph data processing;
the apparatus further comprises an acquisition module, configured to obtain, according to the mapping table, the vertex identifications corresponding to the determined vertex sequence numbers, and to use the vertex identifications as the graph data processing result.
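Translating a processing result from vertex sequence numbers back to vertex identifications can be sketched as below. Representing the mapping table as a list indexed by sequence number, the function name `to_vertex_ids`, and the example identifications are assumptions for illustration.

```python
def to_vertex_ids(result_seqs, id_by_seq):
    """Replace each vertex sequence number in a result with its identification."""
    return [id_by_seq[seq] for seq in result_seqs]


# Hypothetical mapping table for one partition: index = vertex sequence number.
id_by_seq = ["vertex-a", "vertex-b", "vertex-c", "vertex-d"]

# Sequence numbers produced by some graph computation on this partition.
result_seqs = [3, 0, 2]
result_ids = to_vertex_ids(result_seqs, id_by_seq)
```

The computation itself works only with the compact sequence numbers; the longer identifications are touched once, when the final result is reported.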
With the device shown in Fig. 7, the embodiments of the present application achieve at least the following technical effects:
(1) Vertices and the edges associated with them are identified by a partition number and a vertex sequence number rather than by a vertex ID. Graph computation based on partition numbers and vertex sequence numbers therefore processes less data than graph computation based on vertex IDs, which improves the efficiency of the graph computation.
(2) During subsequent graph computation, the partition number and vertex sequence number of a vertex can replace the vertex ID in the vertex-related information loaded into memory. This reduces the amount of data loaded into memory and helps avoid the situation in which a graph computation cannot be completed because of memory overflow.
It should be noted that when the graph data processing apparatus provided in the above embodiment performs graph data processing, the division into the above functional modules is used only as an example. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to perform all or part of the functions described above. In addition, the graph data processing apparatus provided in the above embodiment and the graph data processing method embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which is not repeated here.
Fig. 8 is a schematic structural diagram of a server according to an embodiment of the present application. The node or computing node in any of the foregoing embodiments may be implemented by the server. Specifically:
The server 800 includes a Central Processing Unit (CPU) 801, a system memory 804 including a Random Access Memory (RAM) 802 and a Read Only Memory (ROM) 803, and a system bus 805 connecting the system memory 804 and the central processing unit 801. The server 800 also includes a basic input/output system (I/O system) 806 for facilitating the transfer of information between various devices within the computer, and a mass storage device 807 for storing an operating system 813, application programs 814, and other program modules 815.
The basic input/output system 806 includes a display 808 for displaying information and an input device 809, such as a mouse or keyboard, for a user to input information. Both the display 808 and the input device 809 are connected to the central processing unit 801 via an input/output controller 810 connected to the system bus 805. The basic input/output system 806 may also include the input/output controller 810 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, the input/output controller 810 also provides output to a display screen, a printer, or another type of output device.
The mass storage device 807 is connected to the central processing unit 801 through a mass storage controller (not shown) connected to the system bus 805. The mass storage device 807 and its associated computer-readable media provide non-volatile storage for the server 800. That is, the mass storage device 807 may include a computer readable medium (not shown) such as a hard disk or CD-ROM drive.
Computer readable media may include computer storage media and communication media without loss of generality. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will recognize that computer storage media are not limited to the ones described above. The system memory 804 and mass storage device 807 described above may be collectively referred to as memory.
According to various embodiments of the application, the server 800 may also be operated by a remote computer connected to a network, such as the Internet. That is, the server 800 may be connected to the network 812 through a network interface unit 811 connected to the system bus 805; the network interface unit 811 may also be used to connect to other types of networks or remote computer systems (not shown).
The memory further stores one or more programs configured to be executed by the CPU. The one or more programs include instructions for performing the graph data storage or processing methods provided by the embodiments of the present application.
An embodiment of the present application also provides a non-transitory computer readable storage medium; when instructions in the storage medium are executed by a processor of a server, the server is enabled to perform the graph data storage or processing method provided in the above embodiments.
The embodiment of the application also provides a computer program product containing instructions, which when run on a server, cause the server to execute the graph data storage or processing method provided by the embodiment.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing description of the preferred embodiments of the present application is not intended to limit the embodiments of the present application, but is intended to cover any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the embodiments of the present application.

Claims (19)

1. A graph data storage method, applied to a target node in a storage system storing graph data, where the target node is any node in the storage system, the method comprising:
obtaining a target vertex sequence number of a target vertex stored in a target partition, wherein the target partition is any storage partition on the target node, a plurality of vertices are stored in the target partition, the vertices respectively correspond to the vertex sequence numbers, the vertex sequence number of any vertex indicates the sequence of any vertex in the vertices, the sequence of the vertices indicates the time sequence of the vertices introduced into the target partition, and the target vertex is any vertex in the vertices;
Acquiring target edge data corresponding to the target vertex sequence number, wherein the target edge data comprises a partition number of a partition where another vertex of one or more edges associated with the target vertex is located, the vertex sequence number of the other vertex and the direction of each edge;
and writing the target edge data into an edge data file in the target partition, wherein the edge data file is used for storing edge data corresponding to each vertex sequence number.
2. The method of claim 1, wherein the edge data in the edge data file is stored sequentially according to the corresponding vertex sequence number;
the target partition is also stored with an edge index file, the edge index file stores the length of each edge data in the edge data file, and the lengths of each edge data stored in the edge index file are sequentially stored according to the corresponding vertex sequence numbers.
3. The method of claim 2, wherein the data type storing the length of each edge data in the edge index file is an integer data type of 32 bytes or an integer data type of 64 bytes.
4. The method of claim 1, wherein the method further comprises:
Writing the corresponding relation between the target vertex sequence number and the vertex identifications of the target vertices into a mapping table in the target partition, wherein the mapping table stores vertex sequence numbers respectively corresponding to the vertex identifications of the plurality of vertices.
5. The method of any one of claims 1 to 4, further comprising:
and writing the vertex data of the target vertex into a vertex data file in the target partition, wherein the vertex data file stores vertex data corresponding to each vertex sequence number.
6. The method of claim 5, wherein the vertex data in the vertex data file is stored sequentially according to corresponding vertex sequence numbers;
and the target partition is also stored with a vertex index file, the vertex index file is stored with the lengths of the vertex data in the vertex data file, and the lengths of the vertex data stored in the vertex index file are sequentially stored according to the vertex sequence numbers corresponding to the vertex data.
7. The method of claim 5, wherein the vertex data for the target vertex comprises attributes of the target vertex.
8. A graph data processing method applied to a computing node in a storage system in which graph data is stored, the method comprising:
Determining edge data to be processed in an edge data file stored in a target partition of a target node;
carrying out graph data processing according to the edge data to be processed;
the target node is any node in the storage system, the target partition is any storage partition on the target node, the edge data file stores edge data corresponding to vertex sequence numbers of a plurality of vertices stored in the target partition, the edge data comprises partition numbers of partitions where another vertex of one or more edges related to the corresponding vertex is located, vertex sequence numbers of the other vertex, and directions of the edges, and the vertex sequence numbers of any vertex in the target partition indicate the ordering of the any vertex in the plurality of vertices.
9. The method of claim 8, wherein the edge data in the edge data file is sequentially stored according to the corresponding vertex sequence number, the target partition further stores an edge index file, the edge index file stores the length of each edge data in the edge data file, and the length of each edge data stored in the edge index file is sequentially stored according to the corresponding vertex sequence number;
The determining the edge data to be processed in the edge data file stored in the target partition of the target node includes:
and determining the edge data to be processed according to the length of each edge data stored in the edge index file.
10. The method of claim 9, wherein in the case where the graph data processing is an iterative processing, the edge data to be processed is current edge data to be processed which is sequentially iterated according to the respective edge data lengths.
11. The method of claim 9, wherein in the case where the graph data processing is a query processing, the edge data to be processed is edge data located according to the lengths of the edge data and the vertex sequence numbers of vertices currently to be queried.
12. The method of claim 11, wherein the computing node further stores a location of edge data corresponding to a reference vertex sequence number in the edge data file, the reference vertex sequence number being one or more of vertex sequence numbers of the plurality of vertices stored in the target partition;
the current edge data to be processed is edge data positioned according to the length of each edge data, the position of the edge data corresponding to the sequence number of the reference vertex in the edge data file, and the sequence number of the vertex of the current vertex to be queried.
13. The method of claim 8, wherein a vertex data file is stored in the target partition, and vertex data corresponding to vertex sequence numbers of the plurality of vertices is stored in the vertex data file;
the method further comprises the steps of:
and obtaining vertex data corresponding to the target vertex sequence number from the vertex data file according to the target vertex sequence number to be queried.
14. The method of claim 13, wherein the vertex data of each of the plurality of vertices is sequentially stored according to a corresponding vertex sequence number, a vertex index file is further stored in the target partition, the length of each vertex data in the vertex data file is stored in the vertex index file, and the length of each vertex data stored in the vertex index file is sequentially stored according to a corresponding vertex sequence number;
the obtaining vertex data corresponding to the target vertex sequence number from the vertex data file according to the target vertex sequence number includes:
determining the position of vertex data corresponding to the target vertex sequence number in the vertex data file according to the target vertex sequence number and the index file;
and acquiring the vertex data corresponding to the target vertex sequence number from the vertex data file according to the position of the vertex data corresponding to the target vertex sequence number in the vertex data file.
15. The method according to any one of claims 9 to 12, wherein a mapping table is stored in the target partition, and a plurality of vertex sequence numbers corresponding to the plurality of vertex identifications respectively are stored in the mapping table;
after processing the graph data stored on the target partition according to the edge index file, the method further includes:
determining vertex sequence numbers of vertexes obtained after the graph data processing;
and obtaining vertex identifications corresponding to the determined vertex sequence numbers according to the mapping table, and taking the vertex identifications as graph data processing results.
16. A graph data storage device, applied to a target node in a storage system storing graph data, the target node being any node in the storage system, the device comprising:
an obtaining module, configured to obtain a target vertex sequence number of a target vertex stored in a target partition, wherein the target partition is any storage partition on the target node, a plurality of vertices are stored in the target partition, the plurality of vertices respectively correspond to vertex sequence numbers, the vertex sequence number of any vertex indicates the ordering of the any vertex in the plurality of vertices, and the target vertex is any vertex in the plurality of vertices;
The obtaining module is further configured to obtain target edge data corresponding to the sequence number of the target vertex, where the target edge data includes a partition number of a partition where another vertex of one or more edges associated with the target vertex is located, the sequence number of the vertex of the other vertex, and a direction of each edge;
and the writing module is used for writing the target edge data into an edge data file in the target partition, and the edge data file is used for storing the edge data corresponding to each vertex sequence number.
17. A graph data processing apparatus for use with a computing node in a storage system in which graph data is stored, the apparatus comprising:
the determining module is used for determining the edge data to be processed in the edge data file stored in the target partition of the target node;
the processing module is used for carrying out graph data processing according to the edge data to be processed;
the target node is any node in the storage system, the target partition is any storage partition on the target node, the edge data file stores edge data corresponding to vertex sequence numbers of a plurality of vertices stored in the target partition, the edge data comprises partition numbers of partitions where another vertex of one or more edges related to the corresponding vertex is located, vertex sequence numbers of the other vertex, and directions of the edges, and the vertex sequence numbers of any vertex in the target partition indicate the ordering of the any vertex in the plurality of vertices.
18. A server, the server comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the steps of the method of any of the preceding claims 1 to 7 or claims 8 to 15.
19. A computer readable storage medium having stored thereon instructions which, when executed by a processor, implement the steps of the method of any of the preceding claims 1 to 7 or claims 8 to 15.
CN202011192437.1A 2020-10-30 2020-10-30 Graph data storage and processing method and device and computer storage medium Active CN112287182B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011192437.1A CN112287182B (en) 2020-10-30 2020-10-30 Graph data storage and processing method and device and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011192437.1A CN112287182B (en) 2020-10-30 2020-10-30 Graph data storage and processing method and device and computer storage medium

Publications (2)

Publication Number Publication Date
CN112287182A CN112287182A (en) 2021-01-29
CN112287182B true CN112287182B (en) 2023-09-19

Family

ID=74352978

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011192437.1A Active CN112287182B (en) 2020-10-30 2020-10-30 Graph data storage and processing method and device and computer storage medium

Country Status (1)

Country Link
CN (1) CN112287182B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113449153B (en) * 2021-06-28 2023-09-26 湖南大学 Index construction method, apparatus, computer device and storage medium
CN113468275B (en) * 2021-07-28 2024-07-30 浙江大华技术股份有限公司 Data importing method and device of graph database, storage medium and electronic equipment
CN114186100B (en) * 2021-10-08 2024-05-31 支付宝(杭州)信息技术有限公司 Data storage and query method, device and database system
CN113609318B (en) * 2021-10-09 2022-03-22 北京海致星图科技有限公司 Graph data processing method and device, electronic equipment and storage medium
CN113722520B (en) * 2021-11-02 2022-05-03 支付宝(杭州)信息技术有限公司 Graph data query method and device
CN114095958B (en) * 2021-11-16 2023-09-12 新华三大数据技术有限公司 Cell coverage area determining method, device, equipment and storage medium
CN114077680B (en) * 2022-01-07 2022-05-17 支付宝(杭州)信息技术有限公司 Graph data storage method, system and device
CN114254164B (en) * 2022-03-01 2022-06-28 全球能源互联网研究院有限公司 Graph data storage method and device
CN114282073B (en) * 2022-03-02 2022-07-15 支付宝(杭州)信息技术有限公司 Data storage method and device and data reading method and device
CN114791968A (en) * 2022-06-27 2022-07-26 杭州连通图科技有限公司 Processing method, device and system for graph calculation and computer readable medium
CN115203489B (en) * 2022-09-15 2023-02-03 阿里巴巴(中国)有限公司 Dynamic graph data storage system, reading system and corresponding method
CN117235120B (en) * 2023-11-09 2024-08-16 支付宝(杭州)信息技术有限公司 Hypergraph data storage and query method and device with time sequence characteristics

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109522428A (en) * 2018-09-17 2019-03-26 华中科技大学 A kind of external memory access method of the figure computing system based on index positioning
CN110688055A (en) * 2018-07-04 2020-01-14 清华大学 Data access method and system in large graph calculation
CN110795417A (en) * 2019-10-30 2020-02-14 北京明略软件系统有限公司 System and method for storing knowledge graph
CN111241353A (en) * 2020-01-16 2020-06-05 支付宝(杭州)信息技术有限公司 Method, device and equipment for partitioning graph data
CN111694834A (en) * 2019-03-15 2020-09-22 杭州海康威视数字技术股份有限公司 Method, device and equipment for putting picture data into storage and readable storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10061841B2 (en) * 2015-10-21 2018-08-28 International Business Machines Corporation Fast path traversal in a relational database-based graph structure

Also Published As

Publication number Publication date
CN112287182A (en) 2021-01-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant