CN117892764A - Method, device, computer equipment, medium and product for generating graph neural network - Google Patents


Info

Publication number
CN117892764A
Authority
CN
China
Prior art keywords
graph
vertex
sub
topology data
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311817973.XA
Other languages
Chinese (zh)
Inventor
肖国庆
夏立
李肯立
陈玥丹
陈红阳
唐卓
阳王东
段明星
刘楚波
周旭
金纪勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Research Institute Of Hunan University
Zhejiang Lab
Original Assignee
Shenzhen Research Institute Of Hunan University
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Research Institute Of Hunan University, Zhejiang Lab filed Critical Shenzhen Research Institute Of Hunan University
Priority to CN202311817973.XA priority Critical patent/CN117892764A/en
Publication of CN117892764A publication Critical patent/CN117892764A/en
Pending legal-status Critical Current


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to a graph neural network generation method and apparatus, a computer device, a storage medium, and a computer program product. The method comprises: during training of the graph neural network, obtaining the average degree of the graph topology data through the graph neural network and determining the adjacency matrix of the graph topology data; determining a graph division strength for the graph topology data based on the average degree when the average degree satisfies a degree condition; sorting the vertices based on the number of adjacent points of each vertex, determined by dividing the adjacency matrix according to the graph division strength, to obtain a sorting result for the vertices; based on a dynamic neighbor partition table determined from the sorting result, sequentially aggregating each of a plurality of sub-features of the graph topology data with the adjacency matrix to obtain a sub-aggregation result for each sub-feature; and integrating the sub-aggregation results, outputting an aggregation result corresponding to the graph topology data, and generating a trained graph neural network based on the aggregation result. Adopting the method can accelerate the training of the graph neural network.

Description

Method, device, computer equipment, medium and product for generating graph neural network
Technical Field
The present application relates to the field of artificial intelligence acceleration technology, and in particular, to a method, an apparatus, a computer device, a storage medium, and a computer program product for generating a graph neural network.
Background
In recent years, graph topology data has played an important role in practical applications such as social networks, recommendation systems, and knowledge graphs. With the development of neural networks, models such as graph convolutional networks and graph neural networks have advanced further, and a graph neural network can effectively handle deep learning tasks on various kinds of graph topology data, such as node classification and link prediction.
However, most of the graph topologies used in training state-of-the-art graph neural network methods exhibit great sparsity and irregularity, which makes the optimization operations in the training process difficult and challenging. Existing graph neural network acceleration methods focus on accelerating the whole training process at the algorithm level; although these methods are optimized for the GPU (Graphics Processing Unit) architecture, the optimization effect is limited, so the training efficiency of the graph neural network remains low.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a graph neural network generation method, apparatus, computer device, computer readable storage medium, and computer program product that can accelerate the training of the graph neural network.
In a first aspect, the present application provides a method for generating a graph neural network, the method comprising:
In the training process of the graph neural network, obtaining the average degree of graph topology data through the graph neural network, and determining an adjacency matrix corresponding to the graph topology data; the graph topology data includes a plurality of interconnected vertices;
determining a graph division strength corresponding to the graph topology data based on the average degree when the average degree satisfies a degree condition;
sorting the vertices based on the number of adjacent points of each vertex, determined by dividing the adjacency matrix according to the graph division strength, to obtain a sorting result for the vertices;
based on a dynamic neighbor partition table determined from the sorting result, sequentially aggregating each of a plurality of sub-features of the graph topology data with the adjacency matrix to obtain a sub-aggregation result for each sub-feature;
and integrating the sub-aggregation results, outputting an aggregation result corresponding to the graph topology data, and generating a trained graph neural network based on the aggregation result.
In one embodiment, the obtaining, by the graph neural network, the average degree of the graph topology data includes:
performing graph topology analysis on the graph topology data by using the graph neural network to obtain the number of vertices and the number of edges corresponding to the graph topology data;
and determining an average degree of the graph topology data based on the number of vertices and the number of edges.
In one embodiment, sorting the vertices based on the number of adjacent points of each vertex, determined by dividing the adjacency matrix according to the graph division strength, to obtain a sorting result for the vertices, includes:
dividing the adjacency matrix based on the graph division strength to obtain a plurality of fragment subgraphs;
traversing each fragment subgraph to obtain the number of adjacent points of each vertex in each fragment subgraph;
and sorting the vertices according to the number of adjacent points of each vertex in each fragment subgraph to obtain a sorting result for the vertices.
In one embodiment, before the sequentially aggregating each of the plurality of sub-features of the graph topology data with the adjacency matrix based on the dynamic neighbor partition table determined from the sorting result to obtain the sub-aggregation result for each sub-feature, the method includes:
determining the neighbor group size of each vertex based on the adjacency matrix, the average degree, and the degree of each vertex;
and dividing the neighbors of each vertex into groups based on the neighbor group size of each vertex to obtain the neighbor groups of each vertex.
In one embodiment, after obtaining the neighbor groups of each vertex, the method includes:
sorting the vertices after the neighbor group division according to the sorting result of each vertex in the graph topology data, and obtaining the neighbor number and the offset of each sorted vertex;
and obtaining a dynamic neighbor partition table based on the neighbor number and the offset of each vertex.
In one embodiment, the sequentially aggregating each of the plurality of sub-features of the graph topology data with the adjacency matrix based on the dynamic neighbor partition table determined from the sorting result to obtain the sub-aggregation result for each sub-feature includes:
acquiring feature information obtained by feature analysis of the graph topology data;
performing feature division on the graph topology data based on the feature information to obtain a plurality of sub-features;
and sequentially aggregating each sub-feature with the adjacency matrix according to the ordering information in the dynamic neighbor partition table to obtain the sub-aggregation result corresponding to each sub-feature.
In a second aspect, the present application provides a graph neural network generation apparatus, the apparatus comprising:
a data acquisition module, configured to obtain the average degree of graph topology data through the graph neural network in the training process of the graph neural network, and determine an adjacency matrix corresponding to the graph topology data; the graph topology data includes a plurality of interconnected vertices;
a graph division strength determining module, configured to determine a graph division strength corresponding to the graph topology data based on the average degree when the average degree satisfies a degree condition;
a sorting module, configured to sort the vertices based on the number of adjacent points of each vertex, determined by dividing the adjacency matrix according to the graph division strength, to obtain a sorting result for the vertices;
an aggregation module, configured to sequentially aggregate each of a plurality of sub-features of the graph topology data with the adjacency matrix based on a dynamic neighbor partition table determined from the sorting result, to obtain a sub-aggregation result for each sub-feature;
and an integration module, configured to integrate the sub-aggregation results, output an aggregation result corresponding to the graph topology data, and generate a trained graph neural network based on the aggregation result.
In a third aspect, the present application provides a computer device comprising a memory storing a computer program and a processor implementing the steps of the method described above when the processor executes the computer program.
In a fourth aspect, the present application provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method described above.
In a fifth aspect, the application provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method described above.
According to the graph neural network generation method, apparatus, computer device, storage medium, and computer program product, in the training process of the graph neural network, the average degree of the graph topology data is obtained through the graph neural network and the adjacency matrix corresponding to the graph topology data is determined, so that whether to divide the graph topology data can be judged based on the average degree; when the average degree satisfies the degree condition, the graph division strength corresponding to the graph topology data is determined based on the average degree, so that the adjacency matrix corresponding to the graph topology data can be divided based on the graph division strength; the number of adjacent points of each vertex in the graph topology data is determined from the graph division strength and the adjacency matrix, and the sorting result for the vertices is determined from the number of adjacent points of each vertex, so that the computer's cache can be exploited when accessing memory, reducing the number of memory accesses and accelerating the computation of the graph neural network; and each sub-feature of the graph topology data is sequentially aggregated with the adjacency matrix based on the dynamic neighbor partition table determined from the sorting result to obtain a sub-aggregation result for each sub-feature, after which the sub-aggregation results are integrated into the aggregation result corresponding to the graph topology data, so that data locality is better exploited, the cache hit rate is increased, the time spent accessing computer memory is reduced, the computation of the graph neural network is further accelerated, and the training of the graph neural network is sped up.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the related art, the drawings required for describing the embodiments or the related art are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present application, and those skilled in the art may derive other drawings from them without inventive effort.
FIG. 1 is an application environment diagram of a graph neural network generation method in one embodiment;
FIG. 2 is a flow diagram of a graph neural network generation method in one embodiment;
FIG. 3 is a diagram of a graph topology data structure in one embodiment;
FIG. 4 is a schematic diagram of feature aggregation in one embodiment;
FIG. 5 is a flow diagram of a graph neural network generation method in another embodiment;
FIG. 6 is a vertex rearrangement strategy diagram in one embodiment;
FIG. 7 is a dynamic neighbor partition strategy diagram in one embodiment;
FIG. 8 is a data stream balancing strategy diagram in one embodiment;
FIG. 9 is a dimension partitioning strategy diagram in one embodiment;
FIG. 10 is an overview diagram of a graph neural network generation method in one embodiment;
FIG. 11 is a block diagram of a graph neural network generation apparatus in one embodiment;
FIG. 12 is an internal structure diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The method for generating the graph neural network provided by the embodiments of the application can be applied to the application environment shown in fig. 1, in which the terminal 102 communicates with the server 104 via a network. A data storage system may store the data that the server 104 needs to process; it may be integrated on the server 104 or placed on a cloud or other network server. In the training process of the graph neural network, the server 104 obtains the average degree of the graph topology data through the graph neural network and determines the adjacency matrix corresponding to the graph topology data, where the graph topology data includes a plurality of interconnected vertices. When the average degree satisfies the degree condition, the server 104 determines the graph division strength corresponding to the graph topology data based on the average degree. The server 104 sorts the vertices based on the number of adjacent points of each vertex, determined by dividing the adjacency matrix according to the graph division strength, and obtains a sorting result for the vertices. Based on the dynamic neighbor partition table determined from the sorting result, the server 104 sequentially aggregates each of the plurality of sub-features of the graph topology data with the adjacency matrix to obtain a sub-aggregation result for each sub-feature. The server 104 integrates the sub-aggregation results, outputs the aggregation result corresponding to the graph topology data, and generates a trained graph neural network based on the aggregation result. The terminal 102 may be, but is not limited to, a personal computer, a notebook computer, a smart phone, a tablet computer, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
In an exemplary embodiment, as shown in fig. 2, a graph neural network generation method is provided. The method is described here as applied to the server in fig. 1, and includes the following steps 202 to 210.
Wherein:
Step 202, in the training process of the graph neural network, obtaining the average degree of the graph topology data through the graph neural network, and determining the adjacency matrix corresponding to the graph topology data. The graph topology data includes a plurality of interconnected vertices.
The graph topology data is topology data composed of a plurality of vertices and a plurality of edges connecting the vertices. For example, as shown in fig. 3, there are 4 vertices: vertex 1 is connected to vertex 2 and vertex 4, vertex 2 is connected to vertex 1 and vertex 3, vertex 3 is connected to vertex 2 and vertex 4, and vertex 4 is connected to vertex 1 and vertex 3. The vertices and the edges connecting them form the graph topology data.
The average degree refers to the ratio of the number of edges to the number of vertices. For example, if the number of vertices is 4 and the number of edges is 4, then the average degree is 1. The adjacency matrix is a matrix representing the adjacency relationship between the vertices of the graph topology data. For example, vertex 1 and vertex 2 are connected to each other, so the data at positions (1, 2) and (2, 1) in the matrix is 1; vertex 1 and vertex 3 are not connected to each other, so the data at positions (1, 3) and (3, 1) in the matrix is 0.
Optionally, during training of the graph neural network, the server inputs the graph topology data into the graph neural network to obtain the number of vertices and the number of edges in the graph topology data. The server obtains the average degree of the graph topology data as the ratio between the number of edges and the number of vertices, and determines the adjacency matrix representing the adjacency relationship between the vertices of the graph topology data according to whether each pair of vertices is connected.
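To make the quantities in this step concrete, here is a minimal Python sketch (an illustrative assumption, not code from the application) that builds the adjacency matrix of the fig. 3 example graph from its edge list and computes the average degree as the ratio of edges to vertices:

```python
import numpy as np

# Edge list of the fig. 3 example graph (vertices 1-4, written 0-indexed here).
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
num_vertices = 4

# Adjacency matrix: A[i, j] = 1 when vertex i and vertex j are connected.
A = np.zeros((num_vertices, num_vertices), dtype=np.int8)
for u, v in edges:
    A[u, v] = 1
    A[v, u] = 1

# Average degree as defined in the text: number of edges / number of vertices.
avg_degree = len(edges) / num_vertices
print(avg_degree)  # 1.0 for the fig. 3 graph
```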
Step 204, determining the graph division strength corresponding to the graph topology data based on the average degree when the average degree satisfies the degree condition.
Satisfying the degree condition means that the average degree is not lower than a specific value; for example, an average degree above 32 satisfies the degree condition. The graph division strength is the strength with which the adjacency matrix representing the adjacency relationship between the vertices of the graph topology data is divided. For example, if the graph division strength is 3 and the adjacency matrix has 9 columns, the adjacency matrix is divided into 3 sub-matrices of 3 columns each.
The graph division strength is determined from the ratio between the average degree and the dimension parallelism. For example, if the average degree is 128 and the dimension parallelism is 32, then the graph division strength is 4. The dimension parallelism is the width with which the feature dimension is divided and executed in parallel; it can be set manually, or it can be derived from the model analysis result and the feature dimension information after the graph neural network performs model analysis and feature analysis on the graph topology data.
Optionally, the server judges, from the specific value of the average degree, whether the average degree is greater than the minimum threshold indicated in the degree condition and, if so, determines the graph division strength for dividing the adjacency matrix based on the ratio between the average degree and the dimension parallelism.
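As a concrete reading of this decision logic, the following Python sketch (the threshold of 32, the function name, and the integer division are assumptions drawn from the examples above, not the patented implementation) checks the degree condition and derives the graph division strength as the ratio of average degree to dimension parallelism:

```python
def graph_division_strength(avg_degree: float, dim_parallelism: int,
                            min_degree: float = 32.0):
    """Return the division strength when the degree condition holds, else None."""
    # Degree condition: the average degree must not fall below the threshold.
    if avg_degree < min_degree:
        return None  # graph too sparse; skip graph division
    # Division strength = average degree / dimension parallelism (e.g. 128/32 = 4).
    return int(avg_degree // dim_parallelism)

print(graph_division_strength(128, 32))  # 4, matching the example in the text
```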
Step 206, sorting the vertices based on the number of adjacent points of each vertex, determined by dividing the adjacency matrix according to the graph division strength, to obtain a sorting result for the vertices.
Dividing the adjacency matrix according to the graph division strength yields a plurality of fragment subgraphs. The division does not cut the connection between any pair of vertices; rather, it aims to spread the computation evenly across different GPU threads during calculation.
Adjacent points are the points directly connected to a vertex. For example, if vertex 1 and vertex 2 are connected and vertex 2 and vertex 3 are connected, then the adjacent point of vertex 1 is vertex 2, the adjacent points of vertex 2 are vertex 1 and vertex 3, and the adjacent point of vertex 3 is vertex 2.
The number of adjacent points is the number of points interconnected with a vertex. In the above example, the number of adjacent points of vertex 1 and of vertex 3 is 1, and the number of adjacent points of vertex 2 is 2. The number of adjacent points is obtained by traversing each fragment subgraph. Sorting means ordering the vertices from large to small, or from small to large, by their number of adjacent points.
Optionally, the server divides the adjacency matrix according to the graph division strength to obtain a plurality of fragment subgraphs. The server then traverses each fragment subgraph to obtain the number of adjacent points of each vertex, and sorts the vertices by their number of adjacent points to obtain a sorting result for the vertices.
Step 208, based on the dynamic neighbor partition table determined from the sorting result, sequentially aggregating each of the plurality of sub-features of the graph topology data with the adjacency matrix to obtain a sub-aggregation result for each sub-feature.
The adjacency matrix is stored in the dynamic neighbor partition table in a sparse matrix storage format. Determining the dynamic neighbor partition table from the sorting result means ordering the neighbor groups of each vertex according to the vertex order given by the sorting result and recording the ordered information (the vertex to which each neighbor group belongs, its offset, and its size) in the dynamic neighbor partition table. The offset is the starting position of a vertex's current neighbor group in memory, and the size is the neighbor group size. A neighbor group consists of several adjacent points of a vertex.
The plurality of sub-features of the graph topology data are obtained by performing feature division on the graph topology data using the feature information obtained from feature analysis of the graph topology data.
Aggregation means that each sub-feature is matrix-aggregated with the adjacency matrix in turn. For example, as shown in fig. 4, when the adjacency matrix is A and the feature X is divided into sub-features X1 and X2, A×X1=Z1 is computed first, and A×X2=Z2 is computed only after all vertices have completed the first aggregation operation; Z1 and Z2 are the sub-aggregation results. The advantage is that aggregating multiple sub-features simultaneously lowers the cache hit rate during computation, which increases both the number of memory accesses and the computation time; dividing X into X1 and X2 and computing sequentially exploits the data locality principle well and increases the cache hit rate.
Optionally, the server orders the neighbor groups of the vertices according to the vertex order given by the sorting result and records the ordered information (the vertex to which each neighbor group belongs, its offset, and its size) in the dynamic neighbor partition table. The server then sequentially aggregates each of the plurality of sub-features of the graph topology data with the adjacency matrix stored in the dynamic neighbor partition table in a sparse matrix storage format, obtaining the sub-aggregation result of each sub-feature.
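The sequential aggregation of fig. 4 can be sketched as follows in Python; the use of scipy on the CPU and the 64-column feature width are assumptions made for illustration, not the patented GPU implementation:

```python
import numpy as np
from scipy.sparse import csr_matrix

# A is the adjacency matrix kept in a sparse (CSR) storage format.
A = csr_matrix(np.array([[0, 1, 0, 1],
                         [1, 0, 1, 0],
                         [0, 1, 0, 1],
                         [1, 0, 1, 0]], dtype=np.float32))
X = np.random.rand(4, 64).astype(np.float32)  # vertex feature matrix

# Divide the feature X column-wise into sub-features X1 and X2, then
# aggregate them with A one after another rather than simultaneously,
# so each pass touches a smaller, cache-friendlier block of X.
X1, X2 = X[:, :32], X[:, 32:]
Z1 = A @ X1              # all vertices finish aggregating X1 first
Z2 = A @ X2              # only then is X2 aggregated
Z = np.hstack([Z1, Z2])  # integration: concatenate the sub-aggregation results

assert np.allclose(Z, A @ X)  # same result as aggregating X in one pass
```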
Step 210, integrating the sub-aggregation results, outputting an aggregation result corresponding to the graph topology data, and generating a trained graph neural network based on the aggregation result.
Integration can be understood as concatenating the individual sub-aggregation results. When the aggregation result is output, the accelerated work of the graph neural network is complete. The trained graph neural network can be used for vertex classification and link prediction.
Optionally, the server concatenates the sub-aggregation results corresponding to the sub-features to obtain the aggregation result corresponding to the graph topology data, and obtains the trained graph neural network based on the aggregation result.
In the graph neural network generation method, in the training process of the graph neural network, the average degree of the graph topology data is obtained through the graph neural network and the adjacency matrix corresponding to the graph topology data is determined, so that whether to divide the graph topology data can be judged based on the average degree; when the average degree satisfies the degree condition, the graph division strength corresponding to the graph topology data is determined based on the average degree, so that the adjacency matrix corresponding to the graph topology data can be divided based on the graph division strength; the number of adjacent points of each vertex in the graph topology data is determined from the graph division strength and the adjacency matrix, and the sorting result for the vertices is determined from the number of adjacent points of each vertex, so that the computer's cache can be exploited when accessing memory, reducing the number of memory accesses and accelerating the computation of the graph neural network; and each sub-feature of the graph topology data is sequentially aggregated with the adjacency matrix based on the dynamic neighbor partition table determined from the sorting result to obtain a sub-aggregation result for each sub-feature, after which the sub-aggregation results are integrated into the aggregation result corresponding to the graph topology data, so that data locality is better exploited, the cache hit rate is increased, the time spent accessing computer memory is reduced, the computation of the graph neural network is further accelerated, and the training of the graph neural network is sped up.
In an exemplary embodiment, as shown in FIG. 5, step 202 includes steps 502 through 504.
Wherein:
Step 502, performing graph topology analysis on the graph topology data by using the graph neural network to obtain the number of vertices and the number of edges corresponding to the graph topology data.
The number of vertices is the total number of vertices in the graph topology data. The number of edges is the number of edges connecting two vertices in the graph topology data. For example, as shown in fig. 3, the number of vertices is 4 and the number of edges is also 4. The graph neural network also performs model analysis, feature analysis, parameter selection, and other processing on the graph topology data.
Optionally, the server performs graph topology analysis on the graph topology data by using the graph neural network to obtain a graph topology analysis result, and determines, from that result, the number of vertices of the graph topology data and the number of edges connecting pairs of vertices.
Step 504, determining an average degree of the graph topology data based on the number of vertices and the number of edges.
The average degree is the ratio of the number of edges to the number of vertices.
Optionally, after obtaining the number of vertices and the number of edges of the graph topology data, the server determines the average degree of the graph topology data from the ratio between the number of edges and the number of vertices.
In this embodiment, performing graph topology analysis on the graph topology data with the graph neural network yields the number of vertices and the number of edges of the graph topology data, and the average degree of the graph topology data can be determined from their ratio, so that whether to perform graph division on the graph topology data can be judged based on the average degree.
In an exemplary embodiment, sorting the vertices based on the number of adjacent points of each vertex, determined by dividing the adjacency matrix according to the graph division strength, to obtain a sorting result for the vertices, includes:
dividing the adjacency matrix based on the graph division strength to obtain a plurality of fragment subgraphs; traversing each fragment subgraph to obtain the number of adjacent points of each vertex in each fragment subgraph; and sorting the vertices according to the number of adjacent points of each vertex in each fragment subgraph to obtain a sorting result for the vertices.
The number of fragment subgraphs equals the graph division strength, and the column width of each fragment subgraph is the ratio between the column width of the adjacency matrix and the graph division strength. For example, if the graph division strength is 4 and the adjacency matrix has 12 columns, the adjacency matrix is divided into 4 fragment subgraphs, each with a column width of 12/4 = 3. The division splits the adjacency matrix of the graph topology data into a series of stripe matrices of uniform width.
The traversal process first establishes an information table recording which fragment subgraph each vertex's adjacent points fall in, then executes a loop with step size s_width (the column width of a fragment subgraph), traversing each fragment subgraph in turn and recording the adjacent points of each vertex in the currently traversed fragment subgraph into the information table. After traversing all fragment subgraphs, the server imports the information table into a dynamic two-dimensional workload management module to perform vertex rearrangement, dynamic neighbor division, and data stream balancing, respectively.
Vertex rearrangement means ranking the vertices from large to small by their number of adjacent points, obtaining a rearranged sorting result, and recording it in a rearrangement information table. As shown in fig. 6, the vertices are rearranged in descending order of the number of adjacent points, so the sizes of the neighbor group data blocks are arranged from maximum to minimum along the vertex order. In addition, cooperating with the data stream balancing strategy, the GPU computes the neighbor group data blocks in reverse order: computation starts with the nth neighbor group data block, proceeds to the (n-1)th, and ends with the first. The vertex rearrangement strategy counteracts the potential memory locality disruption of the data stream balancing strategy. Moreover, since the reverse-order access pattern of the neighbor group data blocks is consistent with the basic requirements of the task scheduling policy, efficient load balancing results.
Optionally, the server takes the graph division strength as the number of fragment subgraphs after dividing the adjacency matrix, and divides the adjacency matrix according to its column width and the graph division strength to obtain that many fragment subgraphs. The server then establishes an information table recording which fragment subgraph each vertex's adjacent points fall in, executes a loop with step size s_width, traverses each fragment subgraph in turn, and records the adjacent points of each vertex in the currently traversed fragment subgraph into the information table. After traversing all fragment subgraphs, the server imports the information table into the dynamic two-dimensional workload management module, sorts the vertices from large to small by their number of adjacent points, obtains the rearranged sorting result, and records it in the rearrangement information table.
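A Python sketch of the stripe division, information table, and vertex rearrangement just described; the dense array-based information table and the assumption that the column count is a multiple of the division strength are simplifications for illustration, not the patented data layout:

```python
import numpy as np

def reorder_vertices(A: np.ndarray, division_strength: int):
    """Divide A into uniform-width stripe subgraphs, count each vertex's
    adjacent points per stripe, and rank the vertices by total count from
    large to small. Assumes A's column count is a multiple of
    division_strength."""
    n = A.shape[0]
    s_width = A.shape[1] // division_strength  # column width of one stripe
    # Information table: counts[v, s] = adjacent points of vertex v in stripe s.
    counts = np.zeros((n, division_strength), dtype=np.int64)
    for s in range(division_strength):  # loop over stripes with step s_width
        stripe = A[:, s * s_width:(s + 1) * s_width]
        counts[:, s] = stripe.sum(axis=1)
    # Vertex rearrangement: descending order of the number of adjacent points.
    order = np.argsort(-counts.sum(axis=1), kind="stable")
    return counts, order
```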
In this embodiment, dividing the adjacency matrix of the graph topology data into a plurality of fragment subgraphs allows the computation to be spread across different GPU threads during calculation, reducing computation time. Sorting the vertices allows the cache to be used to hold data while the computer memory is accessed, reducing the number of memory accesses and further accelerating the computation of the graph neural network.
In an exemplary embodiment, before the sequentially aggregating each of the plurality of sub-features of the graph topology data with the adjacency matrix based on the dynamic neighbor partition table determined from the sorting result to obtain the sub-aggregation result for each sub-feature, the method includes:
determining the neighbor group size of each vertex based on the adjacency matrix, the average degree, and the degree of each vertex; and dividing the neighbors of each vertex into groups based on the neighbor group size of each vertex to obtain the neighbor groups of each vertex.
The neighbor group size is the number of adjacent points in each neighbor group. For example, if a group holds 5 adjacent points, the neighbor group size is 5. The degree of a vertex is the number of edges incident to that vertex.
Based on the adjacency matrix, the average degree, and the degree of each vertex, the neighbor group size of each vertex is determined by the following formula:
where ngs(v) denotes the neighbor group size of vertex v, deg(v) denotes the degree of vertex v, max_deg is the maximum degree over all vertices, avg_deg is the average degree of all vertices, and K is a user-selected parameter with K ∈ (1, 2).
Neighbor group division, that is, the dynamic neighbor division strategy, loops over every vertex. When the current vertex is reached, the number s_size given by its neighbor group size is determined; the vertex's adjacent points are then traversed, and every s_size adjacent points encountered are combined into one neighbor group, whose information is recorded in the neighbor group information table, before the next s_size adjacent points are traversed, until all adjacent points of the current vertex have been visited. If the traversal finishes with the number of remaining adjacent points cur_size smaller than s_size, those cur_size adjacent points form a neighbor group and the related information is recorded. More specifically, when a vertex with many adjacent points is processed, more adjacent points are placed in the same neighbor group; conversely, when a vertex with few adjacent points is processed, fewer adjacent points are placed in the same neighbor group. For example, as shown in fig. 7, vertex E has few adjacent points, so few adjacent points are grouped into the same data block (e.g., E1): vertex E is divided into two data blocks, each containing a single adjacent point. On the other hand, although vertex F has more adjacent points, it is divided into two neighbor groups, each containing five neighbors. Vertices A, B, C, and D in fig. 7 likewise show neighbor groups divided according to the neighbor group size formula.
The dynamic neighbor partitioning strategy has two main advantages. First, memory read operations can be reduced, lowering the time overhead of the aggregation operation. Second, dynamic neighbor partitioning avoids extreme load imbalance due to a fixed neighbor group size limit, thereby greatly reducing the resulting delay overhead.
Optionally, the server calculates the number of adjacent points in each neighbor group of each vertex by the formula above, based on the adjacency matrix, the average degree, and the degree of each vertex. The server then loops over every vertex: when the current vertex is reached, the number s_size of its neighbor group size is determined, the vertex's adjacent points are traversed, and every s_size adjacent points are combined into a neighbor group whose information is recorded in the neighbor group information table, continuing with the next s_size adjacent points until all adjacent points of the current vertex have been visited. If the traversal finishes with the number of remaining adjacent points cur_size smaller than s_size, those cur_size adjacent points form a neighbor group and the related information is recorded. The neighbors of each vertex are thus divided into groups, yielding the neighbor groups of each vertex.
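The grouping loop can be sketched as follows in Python; since the ngs(v) formula itself appears in the original publication as an image and is not reproduced above, it is passed in here as a placeholder function, which is an assumption of this sketch:

```python
def partition_neighbors(adj_lists, ngs):
    """Dynamic neighbor division sketch. adj_lists[v] lists the adjacent
    points of vertex v; ngs(v) stands in for the neighbor-group-size
    formula (defined in the original via deg(v), max_deg, avg_deg, and K)
    and is assumed to return an integer of at least 1."""
    group_table = []  # neighbor group information table: (vertex, members)
    for v, neighbors in enumerate(adj_lists):
        s_size = ngs(v)  # neighbor group size for the current vertex
        # Traverse the adjacent points; every s_size of them form one group.
        # The final slice may hold cur_size < s_size remaining points, which
        # still form a group of their own.
        for start in range(0, len(neighbors), s_size):
            group_table.append((v, neighbors[start:start + s_size]))
    return group_table
```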
In this embodiment, the neighbor group size of each vertex is determined based on the adjacency matrix, the average degree, and the degree of each vertex, and the neighbors of each vertex are divided into groups accordingly, yielding the neighbor groups of each vertex. This reduces the number of memory reads and the time cost of the aggregation operation, and the dynamic neighbor division also avoids the load imbalance caused by a fixed neighbor group size limit, thereby reducing the delay overhead.
In one exemplary embodiment, after obtaining the neighbor groups of each vertex, the method includes:
sorting the vertices after the neighbor group division according to the sorting result of each vertex in the graph topology data, and obtaining the neighbor number and the offset of each sorted vertex; and obtaining a dynamic neighbor partition table based on the neighbor number and the offset of each vertex.
Sorting the vertices after neighbor group division, as part of the data stream balancing strategy, means ordering the neighbor groups of the different vertices according to the sorting result in the rearrangement information table. The neighbor number is the number of adjacent points of each vertex, and the offset is the starting position of a vertex's neighbor group in memory. Recording the neighbor number and offset of each vertex in a newly created blank table yields the dynamic neighbor partition table, which the server sends to the GPU to provide the necessary training information.
When the GPU processes the neighbor groups, the data stream balancing policy first assigns the larger neighbor groups (those with more adjacent points) to idle SM (Streaming Multiprocessor) cores of the GPU in turn, and then assigns the smaller neighbor groups to the SMs in a similar manner. The data stream balancing strategy alleviates the load imbalance introduced by large neighbor group data blocks during computation by exploiting the balancing effect of the subsequent computation involving smaller neighbor groups. For example, suppose the GPU has only three SM cores and, under ideal conditions, the same amount of data is processed within the same allocation time. In the scenario shown in fig. 8, each neighbor group flows in turn to an SM core of the GPU according to its number of adjacent points and is processed individually by that core. The height of the stack of adjacent-point data blocks within each SM core can be read as that core's computation time; given the parallel nature of the three SM cores, the maximum time any single core spends processing data corresponds to the total data processing time.
Optionally, the server orders the neighbor groups of the different vertices according to the sorting result in the rearrangement information table, records the neighbor number and the offset of each vertex in the created blank table to obtain the dynamic neighbor partition table, and sends the table to the GPU to provide the necessary training information.
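The data stream balancing strategy resembles greedy largest-first scheduling. The following Python sketch is an illustration under stated assumptions (group cost approximated by adjacent-point count, three simulated SM cores), not the GPU scheduler itself:

```python
import heapq

def balance_data_stream(groups, num_sm: int = 3):
    """Hand neighbor groups to SM cores from largest to smallest, always
    choosing the currently least-loaded core. groups is a list of
    (vertex, members) pairs as produced by partition_neighbors above."""
    # Min-heap of (current load, sm_id); all SM cores start idle.
    sm_heap = [(0, sm) for sm in range(num_sm)]
    heapq.heapify(sm_heap)
    schedule = {sm: [] for sm in range(num_sm)}
    for vertex, members in sorted(groups, key=lambda g: -len(g[1])):
        load, sm = heapq.heappop(sm_heap)  # least-loaded core so far
        schedule[sm].append((vertex, members))
        heapq.heappush(sm_heap, (load + len(members), sm))
    # Total processing time corresponds to the maximum per-core load.
    return schedule, max(load for load, _ in sm_heap)
```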
In this embodiment, applying the data stream balancing strategy can significantly improve the computer's load distribution performance.
In an exemplary embodiment, the sequentially aggregating each of the plurality of sub-features of the graph topology data with the adjacency matrix based on the dynamic neighbor partition table determined from the sorting result to obtain the sub-aggregation result for each sub-feature includes:
acquiring the feature information obtained by feature analysis of the graph topology data; performing feature division on the graph topology data based on the feature information to obtain a plurality of sub-features; and sequentially aggregating each sub-feature with the adjacency matrix according to the ordering information in the dynamic neighbor partition table to obtain the sub-aggregation result corresponding to each sub-feature.
Feature division means dividing the features of the graph topology data according to the acquired feature information; the whole process of dimension division and parallel execution takes place in the GPU kernel. The features are divided at a granularity of multiples of 128 bytes: if each feature element is 4-byte floating point data, the feature dimension is divided into multiples of 32 elements; if each feature element is 1-byte data, it is divided into multiples of 128 elements. As shown in fig. 4, feature X is divided into X1 and X2, which can be aggregated independently: each is aggregated with the adjacency matrix A to obtain the sub-aggregation results Z1 and Z2, which are finally combined into the aggregation result Z. With X divided into X1 and X2, A×X1=Z1 is computed first, and A×X2=Z2 is computed only after all vertices complete the first aggregation. The cache data update process is further illustrated in fig. 9: the first row depicts the cache dynamics when the dimension partitioning policy is not used, the second and third rows depict the cache dynamics when it is used, and X21, X23, X41, X43, etc. represent memory read operations. In the scenario without the dimension partitioning policy, twice as many memory read operations occur; with the policy, the number of memory accesses is halved. Dimension division therefore enhances data locality, greatly reducing memory read operations and saving read time.
Optionally, the server divides the features of the graph topology data according to the acquired feature information to obtain a plurality of sub-features, and then sequentially aggregates each sub-feature with the adjacency matrix to obtain the sub-aggregation result corresponding to each sub-feature.
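The 128-byte dimension division can be sketched as follows; the helper name and the 64-column example feature width are assumptions for illustration:

```python
import numpy as np

def split_features(X: np.ndarray, chunk_bytes: int = 128):
    """Dimension division sketch: split the feature matrix column-wise so
    each chunk covers 128 bytes (32 elements for 4-byte floats, 128 elements
    for 1-byte data); the chunks are then aggregated one after another."""
    chunk_width = chunk_bytes // X.dtype.itemsize
    return [X[:, i:i + chunk_width] for i in range(0, X.shape[1], chunk_width)]

X = np.random.rand(4, 64).astype(np.float32)
subs = split_features(X)        # two 32-column sub-features X1, X2
print([s.shape for s in subs])  # [(4, 32), (4, 32)]
```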
In this embodiment, dividing the features of the graph topology data into several sub-features and performing the aggregation computation sequentially exploits the data locality principle well and increases the cache hit rate, thereby reducing memory access time and further accelerating the whole computation, so that the trained graph neural network can be obtained quickly.
The application also provides an application scenario to which the above graph neural network generation method is applied. Specifically, the method is applied in this scenario as follows:
During training of the graph neural network, the server inputs the graph topology data into the graph neural network to obtain the number of vertices and the number of edges in the graph topology data. The server obtains the average degree of the graph topology data as the ratio between the number of edges and the number of vertices, and determines the adjacency matrix representing the adjacency relationship between the vertices according to whether each pair of vertices is connected. The server judges, from the specific value of the average degree, whether it exceeds the minimum threshold indicated in the degree condition and, if so, determines the graph division strength for dividing the adjacency matrix based on the ratio between the average degree and the dimension parallelism.
The server takes the graph division strength as the number of fragment subgraphs after dividing the adjacency matrix, and divides the adjacency matrix according to its column width and the graph division strength to obtain that many fragment subgraphs. The server then establishes an information table recording which fragment subgraph each vertex's adjacent points fall in, executes a loop with step size s_width, traverses each fragment subgraph in turn, and records the adjacent points of each vertex in the currently traversed fragment subgraph into the information table. After traversing all fragment subgraphs, the server imports the information table into the dynamic two-dimensional workload management module, sorts the vertices from large to small by their number of adjacent points, obtains the rearranged sorting result, and records it in the rearrangement information table.
The server orders the neighbor groups of the different vertices according to the sorting result in the rearrangement information table, records the neighbor number and the offset of each vertex in the created blank table to obtain the dynamic neighbor partition table, and sends the table to the GPU to provide the necessary training information.
Based on the adjacency matrix, the average degree, and the degree of each vertex, the server calculates the number of adjacent points in each neighbor group of each vertex by the formula above.
The server loops over every vertex: when the current vertex is reached, the number s_size of its neighbor group size is determined, the vertex's adjacent points are traversed, and every s_size adjacent points are combined into a neighbor group whose information is recorded in the neighbor group information table, continuing with the next s_size adjacent points until all adjacent points of the current vertex have been visited. If the traversal finishes with the number of remaining adjacent points cur_size smaller than s_size, those cur_size adjacent points form a neighbor group and the related information is recorded, thereby dividing the neighbors of each vertex and obtaining the neighbor groups of each vertex.
The server divides the features of the graph topology data according to the acquired feature information to obtain a plurality of sub-features, and sequentially aggregates each sub-feature with the adjacency matrix to obtain the sub-aggregation result corresponding to each sub-feature. The server concatenates the sub-aggregation results corresponding to the sub-features to obtain the aggregation result corresponding to the graph topology data, and obtains the trained graph neural network based on the aggregation result. An overview of the graph neural network generation method is shown in fig. 10. First, the graph topology data, GPU information, and graph neural network information are loaded and extracted. A decision maker in the graph neural network then performs model analysis, graph topology analysis, parameter selection, and other processing on the graph topology data. After graph division is applied to the graph topology data, vertex rearrangement, dynamic neighbor division, and data stream balancing are performed for each vertex. Finally, feature dimension division and parallel execution are carried out in the GPU kernel function, optimizing the hit rate of the GPU's L2 cache.
In the above embodiment, in the training process of the graph neural network, the average degree of the graph topology data is obtained through the graph neural network and the adjacency matrix corresponding to the graph topology data is determined, so that whether to divide the graph topology data can be judged based on the average degree; when the average degree satisfies the degree condition, the graph division strength corresponding to the graph topology data is determined based on the average degree, so that the adjacency matrix corresponding to the graph topology data can be divided based on the graph division strength; the number of adjacent points of each vertex in the graph topology data is determined from the graph division strength and the adjacency matrix, and the sorting result for the vertices is determined from the number of adjacent points of each vertex, so that the computer's cache can be exploited when accessing memory, reducing the number of memory accesses and accelerating the computation of the graph neural network; and each sub-feature of the graph topology data is sequentially aggregated with the adjacency matrix based on the dynamic neighbor partition table determined from the sorting result to obtain a sub-aggregation result for each sub-feature, after which the sub-aggregation results are integrated into the aggregation result corresponding to the graph topology data, so that data locality is better exploited, the cache hit rate is increased, the time spent accessing computer memory is reduced, the computation of the graph neural network is further accelerated, and the training of the graph neural network is sped up.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown sequentially as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, the execution order of these steps is not strictly limited, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include several sub-steps or stages that are not necessarily executed at the same moment but may be executed at different moments; their execution order is likewise not necessarily sequential, and they may be executed in turn or alternately with at least part of the other steps or of the sub-steps or stages of the other steps.
Based on the same inventive concept, an embodiment of the application further provides a graph neural network generation apparatus for implementing the graph neural network generation method described above. The implementation of the solution provided by the apparatus is similar to that described in the method above, so for the specific limitations in the embodiments of the apparatus below, reference may be made to the limitations of the method above; they are not repeated here.
In an exemplary embodiment, as shown in fig. 11, there is provided a graph neural network generation apparatus including:
The data acquisition module 1102 is configured to obtain the average degree of the graph topology data through the graph neural network during training of the graph neural network, and to determine the adjacency matrix corresponding to the graph topology data. The graph topology data includes a plurality of interconnected vertices.
The graph division strength determining module 1104 is configured to determine the graph division strength corresponding to the graph topology data based on the average degree when the average degree satisfies the degree condition.
The sorting module 1106 is configured to sort the vertices based on the number of adjacent points of each vertex, determined by dividing the adjacency matrix according to the graph division strength, to obtain a sorting result for the vertices.
The aggregation module 1108 is configured to sequentially aggregate each of the plurality of sub-features of the graph topology data with the adjacency matrix based on the dynamic neighbor partition table determined from the sorting result, to obtain a sub-aggregation result for each sub-feature.
The integration module 1110 is configured to integrate the sub-aggregation results, output the aggregation result corresponding to the graph topology data, and generate the trained graph neural network based on the aggregation result.
In the above embodiment, in the training process of the graph neural network, the average degree of the graph topology data is obtained through the graph neural network and the adjacency matrix corresponding to the graph topology data is determined, so that whether to divide the graph topology data can be judged based on the average degree; when the average degree satisfies the degree condition, the graph division strength corresponding to the graph topology data is determined based on the average degree, so that the adjacency matrix corresponding to the graph topology data can be divided based on the graph division strength; the number of adjacent points of each vertex in the graph topology data is determined from the graph division strength and the adjacency matrix, and the sorting result for the vertices is determined from the number of adjacent points of each vertex, so that the computer's cache can be exploited when accessing memory, reducing the number of memory accesses and accelerating the computation of the graph neural network; and each sub-feature of the graph topology data is sequentially aggregated with the adjacency matrix based on the dynamic neighbor partition table determined from the sorting result to obtain a sub-aggregation result for each sub-feature, after which the sub-aggregation results are integrated into the aggregation result corresponding to the graph topology data, so that data locality is better exploited, the cache hit rate is increased, the time spent accessing computer memory is reduced, the computation of the graph neural network is further accelerated, and the training of the graph neural network is sped up.
In one embodiment, the data acquisition module is further configured to: perform graph topology analysis on the graph topology data by using the graph neural network to obtain the number of vertices and the number of edges corresponding to the graph topology data; and determine the average degree of the graph topology data based on the number of vertices and the number of edges.
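As a minimal sketch of this step, assuming the graph is held as a SciPy CSR adjacency matrix and that an undirected edge is stored twice (both conventions are assumptions, not stated in the application):

    import scipy.sparse as sp

    def graph_stats(adj: sp.csr_matrix):
        num_vertices = adj.shape[0]
        num_stored = adj.nnz           # stored non-zero entries of the adjacency matrix
        num_edges = num_stored // 2    # assumed undirected: each edge is stored twice
        avg_degree = num_stored / num_vertices
        return num_vertices, num_edges, avg_degree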
In one embodiment, the sorting module is further configured to: divide the adjacency matrix based on the graph division strength to obtain a plurality of sub-graphs; traverse each sub-graph to obtain the number of adjacent points of each vertex in each sub-graph; and sort the vertices according to the number of adjacent points of each vertex in each sub-graph to obtain a sorting result of the vertices.
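One plausible reading of this step, sketched below, divides the adjacency matrix into equal column tiles (one tile per sub-graph), counts each vertex's neighbors inside every tile, and sorts the vertices by total neighbor count in descending order. The tiling scheme and the descending order are assumptions for illustration:

    import numpy as np
    import scipy.sparse as sp

    def sort_by_tile_neighbor_counts(adj: sp.csr_matrix, strength: int) -> np.ndarray:
        n = adj.shape[0]
        bounds = np.linspace(0, n, strength + 1, dtype=int)
        counts = np.zeros((n, strength), dtype=np.int64)
        for t in range(strength):
            sub = adj[:, bounds[t]:bounds[t + 1]]  # one column tile = one sub-graph
            counts[:, t] = sub.getnnz(axis=1)      # neighbors of each vertex inside the tile
        # Sorting result: vertex ids ordered by total neighbor count, descending.
        return np.argsort(-counts.sum(axis=1), kind="stable")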
In one embodiment, the aggregation module is further configured to: the neighbor set size of each vertex is determined based on the adjacency matrix, the average degree, and the degree of each vertex. And dividing the neighbor group of each vertex based on the neighbor group scale of each vertex to obtain the neighbor group of each vertex.
In one embodiment, the aggregation module is further configured to: and according to the sequencing result of each vertex in the topological data of the graph, sequencing each vertex after dividing the neighbor group, and acquiring the neighbor number and the offset of each vertex after sequencing. And obtaining a dynamic neighbor partition table based on the number of neighbors of each vertex and the offset.
In one embodiment, the aggregation module is further configured to: and obtaining characteristic information obtained by characteristic analysis of the topological data of the graph. And carrying out feature division on the topological data of the graph based on the feature information to obtain a plurality of sub-features. And according to the ordering information in the dynamic neighbor partitioning table, sequentially carrying out aggregation treatment on each sub-feature and the adjacent matrix to obtain sub-aggregation results respectively corresponding to each sub-feature.
Each module in the above graph neural network generation apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware form in, or independent of, a processor in the computer device, or may be stored in software form in a memory in the computer device, so that the processor can invoke and execute the operations corresponding to the above modules.
In one exemplary embodiment, a computer device is provided; the computer device may be a server, and its internal structure may be as shown in FIG. 12. The computer device includes a processor, a memory, an input/output (I/O) interface, and a communication interface. The processor, the memory, and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store graph topology data, average degrees, adjacency matrices, graph division strengths, numbers of adjacent points, sorting results, dynamic neighbor partition tables, sub-aggregation results, and aggregation results. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements a method for generating a graph neural network.
It will be appreciated by those skilled in the art that the structure shown in FIG. 12 is merely a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution of the present application is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of the method embodiments described above when executing the computer program.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that the user information (including but not limited to user equipment information, user personal information, and the like) and data (including but not limited to data used for analysis, stored data, displayed data, and the like) involved in the present application are information and data authorized by the user or fully authorized by all parties, and the collection, use, and processing of the relevant data must comply with the relevant regulations.
Those skilled in the art will appreciate that all or part of the procedures of the above method embodiments may be implemented by a computer program instructing relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium, and when executed, may include the procedures of the above method embodiments. Any reference to memory, database, or other medium used in the embodiments provided in the present application may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided in the present application may include at least one of a relational database and a non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided in the present application may be, but are not limited to, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, and data processing logic devices based on quantum computing.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as the combinations of these technical features are not contradictory, they should be regarded as falling within the scope of this specification.
The foregoing examples represent only a few embodiments of the present application, and their descriptions are specific and detailed, but they should not be construed as limiting the scope of the application. It should be noted that those skilled in the art may make several variations and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (10)

1. A method for generating a graph neural network, the method comprising:
In the training process of the graph neural network, acquiring the average degree of graph topology data through the graph neural network, and determining an adjacency matrix corresponding to the graph topology data; the graph topology data includes a plurality of interconnected vertices;
determining, when the average degree satisfies a degree condition, a graph division strength corresponding to the graph topology data based on the average degree;
sorting the vertices based on the number of adjacent points of each vertex, determined by dividing the adjacency matrix according to the graph division strength, to obtain a sorting result of the vertices;
sequentially performing aggregation processing on a plurality of sub-features of the graph topology data with the adjacency matrix, based on a dynamic neighbor partition table determined by the sorting result, to obtain a sub-aggregation result for each sub-feature; and
integrating the sub-aggregation results, outputting an aggregation result corresponding to the graph topology data, and generating a trained graph neural network based on the aggregation result.
2. The method of claim 1, wherein the obtaining, by the graph neural network, the average degree of graph topology data comprises:
performing graph topology analysis on the graph topology data by using the graph neural network to obtain the number of vertices and the number of edges corresponding to the graph topology data; and
determining the average degree of the graph topology data based on the number of vertices and the number of edges.
3. The method of claim 1, wherein the sorting the vertices based on the number of adjacent points of each vertex determined by dividing the adjacency matrix according to the graph division strength comprises:
dividing the adjacency matrix based on the graph division strength to obtain a plurality of sub-graphs;
traversing each sub-graph to obtain the number of adjacent points of each vertex in each sub-graph; and
sorting the vertices according to the number of adjacent points of each vertex in each sub-graph to obtain a sorting result of the vertices.
4. The method of claim 1, wherein before the sequentially performing aggregation processing on the plurality of sub-features of the graph topology data with the adjacency matrix based on the dynamic neighbor partition table determined by the sorting result to obtain a sub-aggregation result for each sub-feature, the method further comprises:
determining a neighbor group size of each vertex based on the adjacency matrix, the average degree, and the degree of each vertex; and
dividing the neighbors of each vertex based on the neighbor group size of each vertex to obtain neighbor groups of each vertex.
5. The method of claim 4, wherein after the obtaining neighbor groups of each vertex, the method further comprises:
sorting the vertices after neighbor group division according to the sorting result of each vertex in the graph topology data, and obtaining the neighbor count and the offset of each sorted vertex; and
obtaining a dynamic neighbor partition table based on the neighbor count and the offset of each vertex.
6. The method of claim 1, wherein the sequentially performing aggregation processing on the plurality of sub-features of the graph topology data with the adjacency matrix based on the dynamic neighbor partition table determined by the sorting result to obtain a sub-aggregation result for each sub-feature comprises:
obtaining feature information produced by feature analysis of the graph topology data;
performing feature division on the graph topology data based on the feature information to obtain a plurality of sub-features; and
sequentially performing aggregation processing on each sub-feature with the adjacency matrix according to the sorting information in the dynamic neighbor partition table, to obtain a sub-aggregation result corresponding to each sub-feature.
7. A graph neural network generation apparatus, the apparatus comprising:
a data acquisition module, configured to obtain, through the graph neural network, an average degree of graph topology data during a training process of the graph neural network, and to determine an adjacency matrix corresponding to the graph topology data, the graph topology data including a plurality of interconnected vertices;
a graph division strength determination module, configured to determine a graph division strength corresponding to the graph topology data based on the average degree when the average degree satisfies a degree condition;
a sorting module, configured to sort the vertices based on the number of adjacent points of each vertex, determined by dividing the adjacency matrix according to the graph division strength, to obtain a sorting result of the vertices;
an aggregation module, configured to sequentially perform aggregation processing on a plurality of sub-features of the graph topology data with the adjacency matrix, based on a dynamic neighbor partition table determined by the sorting result, to obtain a sub-aggregation result for each sub-feature; and
an integration module, configured to integrate the sub-aggregation results, output an aggregation result corresponding to the graph topology data, and generate a trained graph neural network based on the aggregation result.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 6 when executing the computer program.
9. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 6.
CN202311817973.XA 2023-12-26 2023-12-26 Method, device, computer equipment, medium and product for generating graph neural network Pending CN117892764A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311817973.XA CN117892764A (en) 2023-12-26 2023-12-26 Method, device, computer equipment, medium and product for generating graph neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311817973.XA CN117892764A (en) 2023-12-26 2023-12-26 Method, device, computer equipment, medium and product for generating graph neural network

Publications (1)

Publication Number Publication Date
CN117892764A true CN117892764A (en) 2024-04-16

Family

ID=90640435

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311817973.XA Pending CN117892764A (en) 2023-12-26 2023-12-26 Method, device, computer equipment, medium and product for generating graph neural network

Country Status (1)

Country Link
CN (1) CN117892764A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination