CN112668700B - Width graph convolution network model system based on grouping attention and training method - Google Patents

Width graph convolution network model system based on grouping attention and training method

Info

Publication number
CN112668700B
CN112668700B
Authority
CN
China
Prior art keywords
attention
graph
order
network model
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011610968.8A
Other languages
Chinese (zh)
Other versions
CN112668700A (en)
Inventor
刘勋
宗建华
夏国清
张义宽
赵鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Institute Of Software Engineering Gu
Original Assignee
South China Institute Of Software Engineering Gu
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Institute Of Software Engineering Gu filed Critical South China Institute Of Software Engineering Gu
Priority to CN202011610968.8A priority Critical patent/CN112668700B/en
Publication of CN112668700A publication Critical patent/CN112668700A/en
Application granted granted Critical
Publication of CN112668700B publication Critical patent/CN112668700B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a width graph convolution network model based on grouping attention and a training method thereof. The model sequentially comprises an input layer, a higher-order graph convolution layer that simultaneously captures multi-order neighborhood information and introduces self-connections and a grouping attention mechanism, an SP information fusion pooling layer that fuses the multi-order neighborhood information, and a softmax multi-classification output layer; the corresponding training method is an efficient method in which preprocessing features are obtained first and then input into the model for training. In the embodiments, the grouping-attention higher-order graph convolution layer is adopted to increase the width of the model, reduce its depth, and cut the quantity and complexity of parameters; it gathers richer node information and gives higher weight to the node itself, and a simple grouping attention mechanism combined with attention scores adjusts the classification contributions of different neighborhood nodes. The model thus widens its receptive field and avoids the risk of overfitting, while further improving learning capability, stability, effectiveness and classification accuracy.

Description

Width graph convolution network model system based on grouping attention and training method
Technical Field
The application relates to the technical field of image processing and deep learning, and in particular to a width graph convolution network model based on grouping attention and a training method thereof.
Background
With the continuous deepening of machine learning research on graph convolutional neural networks, higher-order graph convolutional network models and high-and-low-order graph convolutional network models, which can gather more node information, widen the model receptive field and improve classification performance, have successively been proposed by different researchers. The classification performance of these existing models meets researchers' expectations to a certain extent, but still has shortcomings. The higher-order graph convolutional network model designs first-order to P-order graph convolutions, uses different weight parameters for convolutions of different orders, learns the relations between higher-order nodes with two higher-order graph convolution layers, and employs higher-order graph convolutions that gather the information of neighborhood nodes at different distances; after each higher-order graph convolution gathers the neighborhood information of different distances, the information is spliced by column concatenation and finally fused through a fully connected layer. Because this architecture necessarily adopts different weight parameters for convolutions of different orders, stacks multiple higher-order graph convolution layers and fuses the neighborhood information of different distances through full connection, its complexity increases, its parameter quantity multiplies, and the risk of overfitting rises. Although the high-and-low-order graph convolutional network model adopts a weight-sharing mechanism to a certain extent and has fewer parameters, it stacks multiple high-and-low-order graph convolution layers, so the parameter quantity and complexity of the model are not obviously reduced and the risk of overfitting cannot be avoided. In addition, neither model distinguishes the importance of neighborhood nodes at different distances to classification prediction; the contributions of neighborhood nodes at different distances are treated as equally important, which deviates from the real information to a certain extent and thus affects the classification prediction effect.
Therefore, based on the research on the existing higher-order and high-and-low-order graph convolutional networks, it is highly significant to distinguish the importance of neighborhood nodes at different distances to classification prediction, so that the construction and application of the model become closer to reality and more effective while the classification performance of existing models is guaranteed, the computational complexity and parameter quantity are reduced, and the risk of overfitting is avoided.
Disclosure of Invention
The application aims to reduce the computational complexity and parameter quantity of the existing higher-order and high-and-low-order graph convolutional networks and avoid the risk of overfitting, and meanwhile to distinguish the classification contributions of different neighborhood nodes based on the principle that adjacent neighborhood nodes are similar in category and contribution whereas non-adjacent neighborhood nodes differ in both, so that the construction and application of the model are more realistic and effective and the classification performance is further improved.
In order to achieve the above object, it is necessary to provide a width graph convolution network model based on grouping attention and a training method thereof.
In a first aspect, an embodiment of the present application provides a width graph convolution network model based on grouping attention, where the model sequentially includes an input layer, a grouping-attention higher-order graph convolution layer, an information fusion pooling layer, and an output layer;
the input layer is used for receiving the graph characteristics of the training data set;
the grouping-attention higher-order graph convolution layer is used for carrying out zero-order to k-order grouping-attention graph convolution operations according to the graph characteristics to obtain graph convolution data;
the information fusion pooling layer is used for carrying out zero-order to k-order feature fusion according to the graph convolution data to obtain fusion data;
and the output layer is used for outputting a model result according to the fusion data.
Further, the grouping-attention higher-order graph convolution layer is generated by:
grouping the graph convolutions of different orders;
performing attention fusion on the intra-group graph convolutions with an attention mechanism, and adjusting the weights of the inter-group graph convolutions with attention scores.
Further, a new self-connection is introduced at any order of graph convolution of the grouping-attention higher-order graph convolution layer.
Further, the grouping-attention higher-order graph convolution layer includes zero-order to k-order graph convolutions based on weight sharing, expressed when the highest order k is even as

$XW,\ \alpha_1 SA(\hat{A}XW, \hat{A}^2XW),\ \dots,\ \alpha_{k/2} SA(\hat{A}^{k-1}XW, \hat{A}^{k}XW)$

or, when k is odd, as

$XW,\ \alpha_1 SA(\hat{A}XW, \hat{A}^2XW),\ \dots,\ \alpha_{(k-1)/2} SA(\hat{A}^{k-2}XW, \hat{A}^{k-1}XW),\ \alpha_{(k+1)/2} \hat{A}^{k}XW$

wherein X is the input matrix of the graph, W is the parameter matrix, $\hat{A}$ is the regularized adjacency matrix of the graph, k is the highest order of the graph convolution, SA is the simple attention mechanism fusion function, and $\alpha_i$ ($i = 1, \dots, \lceil k/2 \rceil$) are the attention scores of the corresponding different graph convolution groups.
Further, the output $HGCN_{SA}$ of the output layer of the width graph convolution network model is expressed as

$HGCN_{SA} = \mathrm{softmax}\big(f\big(SP(XW,\ \alpha_1 SA(\hat{A}XW, \hat{A}^2XW),\ \dots,\ \alpha_{k/2} SA(\hat{A}^{k-1}XW, \hat{A}^{k}XW))\big)\big)$  (k even)

or

$HGCN_{SA} = \mathrm{softmax}\big(f\big(SP(XW,\ \alpha_1 SA(\hat{A}XW, \hat{A}^2XW),\ \dots,\ \alpha_{(k-1)/2} SA(\hat{A}^{k-2}XW, \hat{A}^{k-1}XW),\ \alpha_{(k+1)/2} \hat{A}^{k}XW)\big)\big)$  (k odd)

wherein f is the activation function, SP is the information fusion function, and softmax is the multi-classification output function.
Further, the attention fusion formula of the simple attention mechanism fusion function SA is

$SA(\hat{A}^{i}XW, \hat{A}^{i+1}XW) = \hat{A}^{i}XW + \hat{A}^{i+1}XW$

wherein the result is the attention-fused output of the i-th order and (i+1)-th order graph convolutions.
Further, the activation function f is a ReLU nonlinear activation function.
Further, the information fusion pooling layer adopts SP summation information fusion pooling, whose calculation formula is

$SP = XW + \sum_{i=1}^{k/2} \alpha_i\, SA(\hat{A}^{2i-1}XW, \hat{A}^{2i}XW)$  (k even)

or

$SP = XW + \sum_{i=1}^{(k-1)/2} \alpha_i\, SA(\hat{A}^{2i-1}XW, \hat{A}^{2i}XW) + \alpha_{(k+1)/2} \hat{A}^{k}XW$  (k odd)
In a second aspect, an embodiment of the present application provides a training method for the width graph convolution network model based on grouping attention, the steps of which include:
acquiring the training data set, and acquiring graph characteristics of the training data set according to the type of the training data set, wherein the graph characteristics comprise an input matrix and a regularized adjacency matrix of a graph;
performing intra-group attention fusion and inter-group weighted summation on the regularized adjacency matrices of all the different orders of the graph to obtain a preprocessed adjacency matrix, and combining the preprocessed adjacency matrix with the input matrix of the graph to obtain preprocessing features;
inputting the preprocessing features into the width graph convolution network model, and performing feature training to obtain a training result.
Further, the step of inputting the preprocessing features into the width graph convolution network model and performing feature training to obtain a training result includes:
randomly initializing the parameter matrix of the width graph convolution network model, and initializing the attention scores to a specific value;
inputting the preprocessing features into the width graph convolution network model, adjusting the attention scores according to learning-rate optimization in combination with the attributes of the training data set, and training with a loss function and gradient descent to obtain a converged parameter matrix.
The application provides a width graph convolution network model based on grouping attention and a training method thereof. The model has an input layer, a grouping-attention higher-order graph convolution layer, an SP summation information fusion pooling layer and a softmax output layer, and is combined with a feature preprocessing method applied before training, thereby achieving accurate classification. Compared with the prior art, in practical classification applications the model and its training method gather richer node information among neighbors of more orders, give higher weight to the node itself by introducing self-connections, and distinguish the classification contributions of different neighborhood nodes through higher-order graph convolution based on a grouping attention mechanism, which improves the learning capability and classification accuracy of the model; moreover, by designing a single grouping-attention higher-order graph convolution layer and adopting a weight-sharing mechanism among the graph convolutions of different orders, they effectively reduce the parameter quantity, lower the complexity and training difficulty of the model, avoid the risk of overfitting and increase stability.
Drawings
FIG. 1 is a schematic diagram of an application scenario of the width graph convolution network model based on grouping attention and its training method in an embodiment of the application;
FIG. 2 is a schematic structural diagram of the width graph convolution network model based on grouping attention;
FIG. 3 is a schematic diagram of the width graph convolution network model based on grouping attention employing the SP information fusion pooling layer (k odd);
FIG. 4 is a schematic flow chart of the training method of the width graph convolution network model based on grouping attention of FIG. 3;
FIG. 5 is a schematic flow chart of step S13 in FIG. 4, in which the preprocessing features are input into the width graph convolution network model based on grouping attention for feature training;
FIG. 6 is an internal structure diagram of a computer device in an embodiment of the application.
Detailed Description
In order to make the objects, technical solutions and advantageous effects of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples, and it is apparent that the examples described below are part of the examples of the present application, which are provided for illustration only and are not intended to limit the scope of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The width graph convolution network model based on grouping attention provided by the application and its training method can be applied to a terminal or a server as shown in FIG. 1. The terminal may be, but is not limited to, a personal computer, a notebook computer, a smartphone, a tablet computer or a portable wearable device, and the server may be implemented by a stand-alone server or a server cluster composed of multiple servers. The server may use the width graph convolution network model based on grouping attention (HGCN_SA) and the corresponding training method to complete the classification tasks that the existing higher-order and high-and-low-order graph convolutional neural network models can complete, and send the classification prediction results of the model to the terminal for its users to view and analyze.
In one embodiment, as shown in FIG. 2, a width graph convolution network model based on grouping attention is provided, which includes an input layer 1, a grouping-attention higher-order graph convolution layer 2, an information fusion pooling layer 3, and an output layer 4;
the input layer 1 is used for receiving the graph characteristics of the training data set;
the grouping-attention higher-order graph convolution layer 2 is used for carrying out zero-order to k-order grouping-attention graph convolution operations according to the graph characteristics to obtain graph convolution data;
the information fusion pooling layer 3 is used for carrying out zero-order to k-order feature fusion according to the graph convolution data to obtain fusion data;
and the output layer 4 is used for outputting a model result according to the fusion data.
There is only one grouping-attention higher-order graph convolution layer 2 and one information fusion pooling layer 3, i.e., the structure of the width graph convolution network model based on grouping attention is: the input layer 1 is connected with the grouping-attention higher-order graph convolution layer 2, the grouping-attention higher-order graph convolution layer 2 is connected with the information fusion pooling layer 3, and the information fusion pooling layer 3 is connected with the output layer 4, which adopts a softmax function for multi-class output.
The mechanism of attention stems from the study of human vision. In cognitive sciences, due to bottlenecks in information processing, humans tend to ignore portions of visible information as needed, focusing on only a specific portion of the information. In order to reasonably utilize limited visual information processing resources, a human needs to select a specific part in a visual area and then concentrate on the specific part, so that valuable information is screened out, and the visual attention mechanism greatly improves the efficiency of human processing visual information. The attention mechanism in deep learning refers to the thinking mode of the human visual attention mechanism, so that high-value information is rapidly screened from a large amount of data.
The grouping-attention higher-order graph convolution layer in this embodiment groups the graph convolutions of different orders: the intra-group graph convolutions are fused with an attention mechanism, the weights of the inter-group graph convolutions are adjusted with attention scores, and a new self-connection is introduced at any order of graph convolution to obtain a new higher-order graph convolution. Introducing a new self-connection means adding, to the regularized adjacency matrix of any order of the graph, an identity matrix of the same dimension, and inputting the new regularized adjacency matrix into the model. It should be noted that if the adjacency matrix contained no self-connections (i.e., no edge from a node to itself), the element values at its diagonal positions would be 0, and the information of the node itself would be ignored when the matrix is input into model training, which may affect the classification effect. In this embodiment, the regularized adjacency matrix of the original graph already contains self-connections, and the self-connection emphasized by the present application is reintroduced on the basis of that matrix to further increase the weight of the node itself, namely

$\tilde{A}_i = \hat{A}^i + I$

wherein $\tilde{A}_i$ is the i-th order regularized adjacency matrix containing the reintroduced self-connection, $\hat{A}$ is the regularized adjacency matrix containing self-connections (with $\hat{A}^0 = I$ for the zero order), and I is the identity matrix of the same dimension as $\hat{A}$. For example, introducing a new self-connection in the second-order graph convolution yields $\hat{A}^2 + I$; self-connections can of course also be introduced at graph convolutions of other orders, which follow by analogy and are not described in detail here. The principle of grouping the graph convolutions of different orders with self-connections introduced is that the categories of adjacent neighborhood nodes tend to be consistent and their contributions may be similar, whereas the categories of non-adjacent neighborhood nodes differ and their contributions may differ; each even-order graph convolution other than order 0 is therefore integrated into one group with its adjacent lower odd order. The groupings when the highest order k is even and odd are then respectively

$(0), (1, 2), (3, 4), \dots, (k-1, k)$  and  $(0), (1, 2), (3, 4), \dots, (k-2, k-1), (k)$
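As an illustration only, the self-connection and the order grouping can be sketched as follows (a minimal NumPy sketch; the helper names add_self_connection and order_groups are ours, not the patent's):

```python
import numpy as np

def add_self_connection(A_hat_power: np.ndarray) -> np.ndarray:
    """Reintroduce a self-connection: add an identity matrix of the
    same dimension to the i-th power of the regularized adjacency."""
    return A_hat_power + np.eye(A_hat_power.shape[0])

def order_groups(k: int):
    """Group orders 1..k in pairs (1,2), (3,4), ...; order 0 stands
    alone, and when k is odd the highest order forms its own group."""
    groups = [(0,)]
    i = 1
    while i + 1 <= k:
        groups.append((i, i + 1))
        i += 2
    if i == k:          # k odd: the highest order is left unpaired
        groups.append((k,))
    return groups

print(order_groups(4))  # [(0,), (1, 2), (3, 4)]
print(order_groups(3))  # [(0,), (1, 2), (3,)]
```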
After the groupings based on the attention mechanism are obtained, the SA simple attention mechanism is used to perform attention fusion on the different graph convolutions within each group, yielding a new fused graph convolution, and a series of attention scores $\alpha_1, \dots, \alpha_{\lceil k/2 \rceil}$ is used to adjust the weights of the different graph convolution groups: the attention scores give higher weights to groups more important to classification and lower weights to less important groups, thereby adjusting the contribution of different groups of neighborhood nodes to the classification of the prediction target. The grouping-attention higher-order graph convolutions with highest order k even and odd are thus respectively expressed as

$XW,\ \alpha_1 SA(\hat{A}XW, \hat{A}^2XW),\ \dots,\ \alpha_{k/2} SA(\hat{A}^{k-1}XW, \hat{A}^{k}XW)$  (k even)

$XW,\ \alpha_1 SA(\hat{A}XW, \hat{A}^2XW),\ \dots,\ \alpha_{(k-1)/2} SA(\hat{A}^{k-2}XW, \hat{A}^{k-1}XW),\ \alpha_{(k+1)/2} \hat{A}^{k}XW$  (k odd)
wherein the attention fusion of the simple attention mechanism fusion function SA selects accumulation pooling, with the formula

$SA(\hat{A}^{i}XW, \hat{A}^{i+1}XW) = \hat{A}^{i}XW + \hat{A}^{i+1}XW$

whose result is the attention-fused output of the i-th order and (i+1)-th order graph convolutions.
The above grouping-attention higher-order graph convolution includes zero-order to k-order graph convolutions based on weight sharing, so that its parameter quantity is consistent with that of a first-order graph convolution. It not only gives higher weight to the node itself by introducing a new self-connection at any order of graph convolution, but also captures the higher-order neighborhood information of nodes through the first- to k-order grouped graph convolutions and distinguishes the contribution differences of the neighborhood nodes of different graph convolution groups. To compensate for the learning capacity of a one-layer architecture, a larger k value, i.e., a larger width, can be selected in practical applications as required; that is, the receptive field of the model is increased by increasing the width (raising the order) instead of the depth (stacking layers), thereby enhancing the learning capability of the model. The order k of the grouping-attention higher-order graph convolution layer can be any single order of two or above, or a combination of any plural orders. Denoting the output of the output layer of the width graph convolution network model based on grouping attention by $HGCN_{SA}$, it is expressed as

$HGCN_{SA} = \mathrm{softmax}\big(f\big(SP(XW,\ \alpha_1 SA(\hat{A}XW, \hat{A}^2XW),\ \dots,\ \alpha_{k/2} SA(\hat{A}^{k-1}XW, \hat{A}^{k}XW))\big)\big)$  (k even)

or

$HGCN_{SA} = \mathrm{softmax}\big(f\big(SP(XW,\ \alpha_1 SA(\hat{A}XW, \hat{A}^2XW),\ \dots,\ \alpha_{(k-1)/2} SA(\hat{A}^{k-2}XW, \hat{A}^{k-1}XW),\ \alpha_{(k+1)/2} \hat{A}^{k}XW)\big)\big)$  (k odd)

where X is the input matrix of the graph, W is the parameter matrix, $\hat{A}$ is the regularized adjacency matrix of the graph, k is the highest order of the graph convolution, $\alpha_i$ are the attention scores of the different graph convolution groups, SA is the simple attention mechanism fusion function, f is the activation function, SP is the information fusion function, and softmax is the multi-classification output function.
When the highest order of the above graph convolution is k = 2, i.e., HGCN_SA-2 employing a mixture of 0-, 1- and 2-order neighborhoods, the formula is as follows:

$HGCN_{SA}\text{-}2 = \mathrm{softmax}\big(f\big(XW + \alpha_1(\hat{A}XW + \hat{A}^2XW)\big)\big)$

When k = 3, i.e., HGCN_SA-3 employing a mixture of 0-, 1-, 2- and 3-order neighborhoods, the formula is as follows:

$HGCN_{SA}\text{-}3 = \mathrm{softmax}\big(f\big(XW + \alpha_1(\hat{A}XW + \hat{A}^2XW) + \alpha_2 \hat{A}^3XW\big)\big)$
the model where k is a higher order can be obtained by referring to the above model and so on. In this embodiment, the same weight parameters are used in each order neighborhood of the same graph convolution layer to realize weight sharing and reduce the number of parameters, and the selection of the parameter W in the model formula is embodied.
The application builds a network architecture with only one higher-order graph convolution layer carrying a grouping attention mechanism and self-connections, which reduces the parameter quantity and computation of the model and effectively improves training efficiency. Considering that the features of the node itself have a larger influence on classification and that different graph convolution groups contribute differently to classification, the model improves the classification effect by introducing self-connections to increase the information weight of the node itself, by first grouping the graph convolutions of different orders and then fusing the intra-group graph convolutions with an attention mechanism, and by adjusting the contribution of different groups of neighborhood nodes to the classification of the prediction target according to the attention-score principle that more important graph convolution groups receive higher weights and less important groups lower weights, thereby making the construction and application of the model more realistic and effective and further improving classification accuracy.
When the model is practically applied to large-scale classification training, $\hat{A}^kXW$ needs to be calculated. Since $\hat{A}$ is usually a sparse matrix with m non-zero elements and the grouping-attention higher-order graph convolutions all use a weight-sharing mechanism, $\hat{A}^kXW$ is calculated by right-to-left multiplication. For example, when k = 2, $\hat{A}^2XW$ is obtained as $\hat{A}(\hat{A}(XW))$; similarly, $\hat{A}^3XW = \hat{A}(\hat{A}^2XW)$, and by analogy the k-order graph convolution is calculated by left-multiplying the (k-1)-order graph convolution by $\hat{A}$, i.e., $\hat{A}^kXW = \hat{A}(\hat{A}^{k-1}XW)$. This calculation method effectively reduces the computational complexity. In addition, since the graph convolutions of different orders adopt the weight-sharing mechanism, the parameter quantity of the grouping-attention higher-order graph convolution is the same as that of a first-order graph convolution, which guarantees the efficiency of the grouping-attention higher-order graph convolution calculation to a certain extent.
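For illustration, the right-to-left multiplication order can be sketched as follows (a minimal sketch assuming a SciPy CSR sparse matrix; the function name propagate_orders is ours):

```python
import numpy as np
from scipy.sparse import csr_matrix

def propagate_orders(A_hat: csr_matrix, XW: np.ndarray, k: int):
    """Return [XW, A_hat@XW, ..., A_hat^k@XW] without ever forming
    A_hat^i explicitly: each order left-multiplies the previous one,
    so every step is one sparse-dense product over the m non-zeros."""
    outs = [XW]
    H = XW
    for _ in range(k):
        H = A_hat @ H          # A_hat^i XW = A_hat (A_hat^{i-1} XW)
        outs.append(H)
    return outs
```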
Preferably, the activation function f is the ReLU nonlinear activation function.
The ReLU activation function is used for nonlinear processing. Since the expressive power of a linear model is insufficient and some data features are not necessarily linearly separable, an activation function is applied after the information fusion pooling layer for nonlinear processing. Commonly used nonlinear activation functions include sigmoid, tanh, ReLU, ELU and PReLU, any of which can be used here, each with its own advantages; this embodiment adopts the ReLU function, the most widely used in neural networks, defined as

$\mathrm{ReLU}(x) = \max(0, x)$

That is, values greater than or equal to 0 are retained, and all values less than 0 are directly rewritten to 0. Mapping the values in the feature map generated after convolution in this way allows uncorrelated data to be discarded directly during feature extraction, making the operation more convenient.
A nonlinear activation function can improve the expressive power of the model, but it is not indispensable for graph classification tasks, especially for the width graph convolution network model of this embodiment with only one grouping-attention higher-order graph convolution layer. The nonlinear activation step can be omitted to further reduce the computational complexity of the model, at the cost of a small loss of accuracy that has little overall influence on classification. Therefore, in practical applications, whether a nonlinear activation function is needed can be decided according to the specific classification task: if the accuracy requirement is relatively high, the nonlinear activation function can be used; if reducing computational complexity and improving model performance matter more, the nonlinear activation step can be omitted.
Preferably, the information fusion pooling layer adopts SP summation information fusion pooling to fuse the node information of the neighborhoods of different orders from zero to k. The calculation formula when k is even is

$SP = XW + \sum_{i=1}^{k/2} \alpha_i\, SA(\hat{A}^{2i-1}XW, \hat{A}^{2i}XW)$

and when k is odd it is

$SP = XW + \sum_{i=1}^{(k-1)/2} \alpha_i\, SA(\hat{A}^{2i-1}XW, \hat{A}^{2i}XW) + \alpha_{(k+1)/2} \hat{A}^{k}XW$

The corresponding grouping-attention higher-order graph convolution adopting SP information fusion can gather more and richer neighborhood information and obtain global graph structure information, while taking into account that the node itself is more important in classification prediction and that different groups of neighborhood nodes contribute differently to the classification of the prediction target. The model expression when k is even is

$HGCN_{SA} = \mathrm{softmax}(H), \quad H = f\big(XW + \sum_{i=1}^{k/2} \alpha_i\, SA(\hat{A}^{2i-1}XW, \hat{A}^{2i}XW)\big)$

and, as shown in FIG. 3, the model expression when k is odd is

$HGCN_{SA} = \mathrm{softmax}(H), \quad H = f\big(XW + \sum_{i=1}^{(k-1)/2} \alpha_i\, SA(\hat{A}^{2i-1}XW, \hat{A}^{2i}XW) + \alpha_{(k+1)/2} \hat{A}^{k}XW\big)$

where H is the output value of the grouping-attention higher-order graph convolution layer, i.e., the input value of the softmax output layer of the model.
The information fusion method in the above embodiment is described with a specific third-order example; higher orders are similar. Let k = 3, let the zero-order neighborhood be $H_0 = XW$, the first-order neighborhood $H_1 = \hat{A}XW$, the second-order neighborhood $H_2 = \hat{A}^2XW$ and the third-order neighborhood $H_3 = \hat{A}^3XW$, and let $\alpha_1, \alpha_2$ be the attention scores of the corresponding graph convolution groups. The SP summation information fusion process is then

$SP = H_0 + \alpha_1\, SA(H_1, H_2) + \alpha_2 H_3 = XW + \alpha_1(\hat{A}XW + \hat{A}^2XW) + \alpha_2 \hat{A}^3XW$
The implementation process of the grouping-attention higher-order graph convolution algorithm adopting SP information fusion in this embodiment is as follows:
Input: the regularized adjacency matrix $\hat{A}$, the input matrix X, the parameter matrix W and the attention scores $\alpha_1, \dots, \alpha_{\lceil k/2 \rceil}$;
Convolution operation: calculate $XW, \hat{A}XW, \hat{A}^2XW, \dots, \hat{A}^kXW$ by right-to-left multiplication, i.e., $\hat{A}^iXW = \hat{A}(\hat{A}^{i-1}XW)$;
Information fusion: $SP = XW + \sum_i \alpha_i\, SA(\hat{A}^{2i-1}XW, \hat{A}^{2i}XW)$, plus $\alpha_{(k+1)/2}\hat{A}^kXW$ when k is odd;
Nonlinear activation: $H = f(SP)$, which is then fed to the softmax output layer.
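A runnable sketch of these four steps under the formulas above (the naming is ours; the activation f is taken as ReLU, and the attention scores are supplied by the caller):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def hgcn_sa_forward(A_hat, X, W, alphas, k, f=lambda h: np.maximum(h, 0)):
    """One grouping-attention higher-order graph convolution layer with SP
    summation fusion, nonlinear activation f (ReLU) and softmax output."""
    H = X @ W                          # zero-order term XW
    powers, prev = [], H
    for _ in range(k):                 # right-to-left: A^i XW = A (A^{i-1} XW)
        prev = A_hat @ prev
        powers.append(prev)
    terms = [H]
    i, g = 1, 0
    while i + 1 <= k:                  # paired groups (i, i+1): SA accumulates
        terms.append(alphas[g] * (powers[i - 1] + powers[i]))
        i, g = i + 2, g + 1
    if i == k:                         # k odd: highest order forms its own group
        terms.append(alphas[g] * powers[k - 1])
    sp = sum(terms)                    # SP summation information fusion
    return softmax(f(sp))

# toy usage: 4 nodes, 3 features, 2 classes, k = 3
A = np.eye(4); X = np.random.rand(4, 3); W = np.random.rand(3, 2)
print(hgcn_sa_forward(A, X, W, alphas=[1.0, 1.0], k=3))
```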
In this embodiment, the graph network is first input into the grouping-attention higher-order graph convolution for the above processing, the SP summation information fusion pooling layer then mixes the zero-order to higher-order features of different neighborhoods, and the result after nonlinear activation is input into the softmax output layer to obtain the classification probabilities. This approach retains more and richer feature information during learning and learns the global graph topology, while considering the more important role of the node itself in prediction and distinguishing the classification contributions of different groups of neighborhood nodes, which improves the effectiveness and learning performance of the model.
In one embodiment, as shown in FIG. 4, the steps of the training method of the above width graph convolution network model based on grouping attention include:
s11, acquiring the training data set, and obtaining graph characteristics of the training data set according to the type of the training data set, wherein the graph characteristics comprise an input matrix and a regularized adjacency matrix of a graph;
the training data set is selected according to actual classification requirements, such as a text classification data set, a semi-supervised classification data set, a multi-view classification 3D data set and the like. The data sets of each classification task are different in content and type, and the corresponding methods for obtaining the graph feature matrix (namely the input matrix of the graph of the model) and the regularized adjacency matrix after preprocessing are also different, for example, when text classification is required, corpus data comprising documents and titles are required to be processed to construct a corresponding corpus text graph network, and the input matrix of the graph used for model training and the regularized adjacency matrix of the graph are obtained according to the corpus text graph network. For other data sets, such as semi-supervised data sets or multi-view classification data sets, which have corresponding types of preprocessing methods, when the model in the example is used for classification, the data set corresponding to the task is converted into an input matrix of the graph and a regularized adjacency matrix of the graph only according to a conventional method corresponding to the classification task type. In the following embodiments of the present application, the semi-supervised data set as shown in table 1 is taken as an example for relevant description.
Table 1 semi-supervised classification classical dataset information table
S12, performing intra-group attention fusion and inter-group weighted summation on the regularized adjacency matrices of all the different orders of the graph to obtain a preprocessed adjacency matrix, and combining the preprocessed adjacency matrix with the input matrix of the graph to obtain preprocessing features;
Since the application is constructed with only one grouping-attention higher-order graph convolution layer and no stacked higher-order graph convolution layers, before model training the zero- to k-order graph convolutions can be grouped, the intra-group graph convolutions fused with SA attention, and the features preprocessed by inter-group attention-score weighting to obtain a preprocessed adjacency matrix. The corresponding SP summation information fusion formulas are optimized as

$SP = \big(I + \sum_{i=1}^{k/2} \alpha_i (\hat{A}^{2i-1} + \hat{A}^{2i})\big) X W = \hat{A}_{SA} X W$  (k even)

$SP = \big(I + \sum_{i=1}^{(k-1)/2} \alpha_i (\hat{A}^{2i-1} + \hat{A}^{2i}) + \alpha_{(k+1)/2} \hat{A}^{k}\big) X W = \hat{A}_{SA} X W$  (k odd)

Since the regularized adjacency matrices $\hat{A}, \hat{A}^2, \dots, \hat{A}^k$ (i.e., the powers of $\hat{A}$) and the attention scores are known before model training, the powers are easily obtained by matrix multiplication, and $\hat{A}_{SA}$ is then easily obtained by scalar multiplication and addition of matrices. $\hat{A}_{SA}$ is obtained by element-wise operations, its spatial positions are identical to those of $\hat{A}$, and it is an operator that retains the graph topology; it is therefore used as the adjacency matrix of the preprocessed graph for subsequent model training. After the preprocessed adjacency matrix $\hat{A}_{SA}$ is obtained, and since the input matrix X is known, the linear transformation $\hat{A}_{SA} X$ can be computed directly as the preprocessing feature matrix and sent into the model for training, which to a certain extent reduces the complexity and difficulty of machine training and thus guarantees the efficiency of model training.
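The preprocessing can be sketched as follows (dense NumPy for clarity; A_sa is our name for the preprocessed adjacency matrix, not the patent's notation):

```python
import numpy as np

def preprocess(A_hat: np.ndarray, X: np.ndarray, alphas, k: int):
    """Fold the grouped attention weighting into one preprocessed
    adjacency matrix A_sa, then return the preprocessed features
    A_sa @ X; training then only needs softmax(f(X_pre @ W))."""
    n = A_hat.shape[0]
    A_sa = np.eye(n)                  # order 0 contributes the identity
    power = np.eye(n)
    powers = []
    for _ in range(k):
        power = power @ A_hat
        powers.append(power)          # powers[i-1] = A_hat^i
    i, g = 1, 0
    while i + 1 <= k:                 # paired groups (2i-1, 2i)
        A_sa += alphas[g] * (powers[i - 1] + powers[i])
        i, g = i + 2, g + 1
    if i == k:                        # k odd: highest order alone
        A_sa += alphas[g] * powers[k - 1]
    return A_sa, A_sa @ X             # preprocessed adjacency and features
```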
S13, inputting the preprocessing features into the width graph convolution network model, and performing feature training to obtain a training result.
As shown in FIG. 5, step S13 of inputting the preprocessing features into the width graph convolution network model and performing feature training to obtain a training result includes:
s131, randomly initializing a parameter matrix of the width graph rolling network model, and initializing the attention score to a specific value;
the method for randomly initializing the model parameter matrix comprises the following steps: gaussian initialization with weights following Gaussian distribution, xavier initialization with weights being uniformly distributed, and MSRA initialization with mean 0 and variance 2/n. When the parameter matrix of the breadth-graph convolutional network model based on the grouping attention is randomly initialized, the three initialization characteristics can be combined according to the actual classification requirements to select, and the application effect of the model is not affected. It should be noted that the initial values of the attention scores of the models are all set to be 1, and the attention scores are adjusted by combining the training data set attributes according to learning rate optimization in the subsequent training process, the maximum graph convolution orders corresponding to different data sets are different, and the attention scores of the graph convolutions of all orders are also different. In this embodiment, after determining the maximum convolution orders corresponding to the Pubmed, cora and Citeser data sets, the attention multipliers are adjusted on the models of the corresponding orders according to the classification accuracy in the training process of the different data sets.
S132, inputting the preprocessing features into the width graph convolution network model, adjusting the attention scores according to learning-rate optimization in combination with the attributes of the training data set, and training with a loss function and gradient descent to obtain a converged parameter matrix.
The process of training the width graph convolution network model based on grouping attention includes: (1) the preprocessing features obtained by preprocessing the effective feature data in the selected training data set are first input into models of different orders, forward propagation is performed with the initialized attention scores, the initialized parameter matrix and the maximum learning rate to obtain classification results, and the model with the highest classification accuracy is selected as the reference model for subsequent training on that data set; as shown in Table 2, the maximum orders of the width graph convolution network model based on grouping attention on the Pubmed, Cora and Citeseer data sets are 21, 8 and 4 respectively; (2) after the width value (highest order) of the model is determined, the attention score values of the neighborhood nodes of different orders are adjusted in turn according to the principle that lower-order neighborhood nodes are more important than higher-order ones; the adjusted attention scores are input into the model for training, a classification result is obtained by forward propagation, the cross entropy is calculated by the loss function, and the parameter matrix is updated by back propagation with a gradient descent algorithm until convergence, giving the converged parameter matrix under the current attention scores, whose classification accuracy is recorded; (3) the operation of step (2) is repeated, continuously adjusting the attention scores for training, until a parameter matrix with higher classification accuracy is obtained, which is used as the converged parameter matrix of the model under the corresponding attention scores for subsequent classification tests; the attention scores corresponding to the HGCN_SA models of maximum order on the Pubmed, Cora and Citeseer data sets are shown in Table 2.
Table 2 HGCN_SA test accuracy comparison based on the Pubmed, Cora and Citeseer data sets
Table 2 notes: k is the maximum order of the graph convolution, the accuracy of the model is expressed as a percentage, and each number is the average of 10 runs.
In this embodiment, during model training, the loss function selected according to the characteristics of the training data set is the cross entropy

$L = -\sum_{l \in y_L} \sum_{m=1}^{M} Y_{lm} \ln P_{lm}$

where $y_L$ is the set of labeled vertices (nodes), M is the number of classes, $Y_{lm}$ denotes the real label of a labeled node, and $P_{lm}$ denotes the predicted probability value between 0 and 1 given by softmax for the labeled node. After the parameter matrix is initialized, an initial loss value is obtained on all the training data; a large loss indicates that the neural network performs poorly, and the gradient descent method is used: by calculating the partial derivatives of the loss with respect to the model parameters, the weight parameters are continuously adjusted, updated and retrained until the loss decreases to an acceptable range, at which point the whole training process ends and the converged parameter matrix is reached.
The embodiment of the application designs a width graph convolution network model based on grouping-attention higher-order graph convolution and a model training method with feature preprocessing. Grouping-attention width graph convolution replaces depth graph convolution: without stacked graph convolution layers, the complexity of the model, its parameter quantity and the training difficulty are reduced, the interaction among multi-order neighborhood nodes can still be learned, the importance of the node's own neighborhood information is highlighted, and the contribution differences of different groups of neighborhood nodes are distinguished. Combined with the feature preprocessing method used during training, this makes the construction and application of the model more realistic, widens the receptive field, improves training efficiency and classification accuracy, avoids the risk of overfitting, and also improves the stability of the model.
In the embodiment of the application, the model was trained for classification on the semi-supervised classification data sets and compared with the test results of existing graph convolutional neural models; the results are shown in Table 3 below:
Table 3 Test accuracy comparison of HGCN_SA and existing graph convolution models on the same semi-supervised data sets
Table 3 notes: the accuracy in the table is expressed as a percentage and each number is the average of 10 runs.
Based on the experimental results in Table 3, this embodiment provides a width graph convolution network model HGCN_SA based on grouping attention that has only one information fusion layer, which aggregates neighborhood nodes of different orders simultaneously, considers the important role of the node itself in classification prediction, distinguishes the classification contributions of different groups of neighborhood nodes, and mixes the zero-order to higher-order features of different neighbors. It retains more and richer neighborhood feature information in classification learning and learns the global graph topology, widening the receptive field; it simplifies the existing higher-order graph convolution models, reducing the complexity of the model, its parameter quantity and the training difficulty, further improving model training efficiency and avoiding the risk of overfitting; it increases the weight of the node itself by introducing a new self-connection, and distinguishes the contribution differences of different groups of neighborhood nodes by introducing a grouping attention mechanism into the graph convolutions of different orders, further improving the expressive and learning capability of the model. The experimental results on the three benchmark semi-supervised classification data sets indicate that, compared with the other existing baseline methods, the width graph convolution network model based on grouping attention has great advantages in classification accuracy, parameter quantity, complexity, stability and other aspects.
In order to determine the importance of the node itself to classification prediction and the significance of the grouping attention mechanism, the application also ran two sets of comparison experiments: whether self-connection is introduced with other conditions unchanged, and whether the grouping attention mechanism is introduced with other conditions unchanged. The results, shown in Tables 4-5, indicate that the HGCN_SA introducing both self-connection and the grouping attention mechanism has the best stability and classification accuracy.
Table 4 Classification accuracy comparison of HGCN_SA without self-connection and HGCN_SA on the semi-supervised data sets
Table 4 notes: the accuracy in the table is expressed as a percentage and each number is the average of 10 runs; on the Citeseer, Pubmed and Cora data sets, the classification accuracy of the HGCN_SA model with self-connection is higher than that of the HGCN_SA model without self-connection by 1.7%, 1.1% and 0.7% respectively.
Table 5 Classification accuracy comparison of HGCN_SA without the grouping attention mechanism and HGCN_SA on the semi-supervised data sets
Table 5 notes: the accuracy in the table is expressed as a percentage and each number is the average of 10 runs.
In practical applications of the above embodiments, a reasonable choice can be made between the grouping attention mechanism and the self-connection of the model according to actual requirements. If only the grouping attention mechanism is to be introduced, i.e., the classification contributions of neighborhood nodes of different orders are distinguished by fusing the intra-group graph convolutions with the SA simple attention mechanism and setting attention scores to adjust the classification contributions of different groups of neighborhood nodes, and there is no need to introduce a self-connection to further increase the weight of the node itself, the reintroduced self-connection part in the above embodiments of the application can be removed while keeping the other technical solutions and implementations, which are not repeated here.
Although the steps in the flowcharts described above are shown in order as indicated by arrows, these steps are not necessarily executed in order as indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described above may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, and the order of execution of the sub-steps or stages is not necessarily sequential, but may be performed alternately or alternately with at least a part of the sub-steps or stages of other steps or other steps.
Fig. 6 shows an internal structural diagram of a computer device, which may be a terminal or a server in particular, in one embodiment. As shown in fig. 6, the computer device includes a processor, a memory, a network interface, a display, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a training method for a breadth-diagram convolutional network model based on packet attention. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those of ordinary skill in the art that the architecture shown in FIG. 6 is merely a block diagram of some of the structures associated with the present inventive arrangements and does not limit the computer devices to which the present inventive arrangements may be applied; a particular computing device may include more or fewer components than shown, combine some of the components, or have a different arrangement of components.
In one embodiment, a computer device is provided that includes a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor executing the computer program to perform the steps of the above-described training method based on a breadth-diagram convolutional network model of packet attention.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon which, when executed by a processor, implements the steps of the training method of the packet attention based breadth-graph convolutional network model described above.
In summary, the embodiments of the application provide a width graph convolution network model based on grouping attention and a training method thereof. Fully considering the problems of the existing higher-order and high-and-low-order graph convolutional neural network models (excessive parameters, high complexity, low training efficiency, risk of overfitting, and failure to distinguish the contributions of neighborhood nodes at different distances to the classification of the prediction target), the application proposes a width graph convolution network model comprising a higher-order graph convolution layer that captures multi-order neighborhood information, increases the weight of the node itself and introduces a grouping attention mechanism, an SP information fusion layer that mixes neighborhood features of different orders, and a softmax classification output layer, together with a corresponding efficient model training method that performs feature preprocessing before training. When the model and its training method are applied to actual classification tests, the grouping-attention higher-order graph convolution layer increases the width of the model, reduces its depth and parameter quantity, gathers multi-order neighborhood information simultaneously and gives higher weight to the node itself, and the simple grouping attention mechanism combined with attention scores adjusts the classification contributions of different neighborhood nodes, so that the receptive field of the model is widened, the risk of overfitting is avoided, the construction and application of the model are more realistic, and the learning capability, stability, effectiveness and classification accuracy of the model are further improved.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above.
In this specification, the embodiments are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment mainly describes its differences from the others. It should be noted that the technical features of the foregoing embodiments may be combined arbitrarily; for brevity, not all possible combinations are described, but as long as a combination of technical features contains no contradiction, it should be considered within the scope of this description.
The foregoing examples represent only a few preferred embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the application. It should be noted that modifications and substitutions can be made by those skilled in the art without departing from the technical principles of the present application, and such modifications and substitutions should also be considered to be within the scope of the present application. Therefore, the protection scope of the patent of the application is subject to the protection scope of the claims.

Claims (7)

1. A width graph convolution network model system based on grouping attention, the system comprising a width graph convolution network model based on grouping attention; the width graph convolution network model sequentially comprises an input layer, a grouping-attention higher-order graph convolution layer, an information fusion pooling layer and an output layer;
the input layer is used for receiving the graph characteristics of the training data set;
the grouping-attention higher-order graph convolution layer is used for carrying out zero-order to k-order grouping-attention graph convolution operations according to the graph characteristics to obtain graph convolution data;
the information fusion pooling layer is used for carrying out zero-order to k-order feature fusion according to the graph convolution data to obtain fusion data;
the output layer is used for outputting a model result according to the fusion data;
wherein the grouping-attention higher-order graph convolution layer is generated by:
grouping the graph convolutions of different orders;
performing attention fusion on the intra-group graph convolutions with an attention mechanism, and adjusting the weights of the inter-group graph convolutions with attention scores;
the grouping-attention higher-order graph convolution layer includes zero-order to k-order graph convolutions based on weight sharing, expressed when the highest order k is even as
$XW,\ \alpha_1 SA(\hat{A}XW, \hat{A}^2XW),\ \dots,\ \alpha_{k/2} SA(\hat{A}^{k-1}XW, \hat{A}^{k}XW)$
or, when k is odd, as
$XW,\ \alpha_1 SA(\hat{A}XW, \hat{A}^2XW),\ \dots,\ \alpha_{(k-1)/2} SA(\hat{A}^{k-2}XW, \hat{A}^{k-1}XW),\ \alpha_{(k+1)/2} \hat{A}^{k}XW$
wherein X is the input matrix of the graph, W is the parameter matrix, $\hat{A}$ is the regularized adjacency matrix of the graph, k is the highest order of the graph convolution, SA is the simple attention mechanism fusion function, and $\alpha_i$ are the attention scores of the corresponding different graph convolution groups;
the output $HGCN_{SA}$ of the output layer of the width graph convolution network model is expressed as
$HGCN_{SA} = \mathrm{softmax}\big(f\big(SP(XW,\ \alpha_1 SA(\hat{A}XW, \hat{A}^2XW),\ \dots,\ \alpha_{k/2} SA(\hat{A}^{k-1}XW, \hat{A}^{k}XW))\big)\big)$  (k even)
or
$HGCN_{SA} = \mathrm{softmax}\big(f\big(SP(XW,\ \alpha_1 SA(\hat{A}XW, \hat{A}^2XW),\ \dots,\ \alpha_{(k-1)/2} SA(\hat{A}^{k-2}XW, \hat{A}^{k-1}XW),\ \alpha_{(k+1)/2} \hat{A}^{k}XW)\big)\big)$  (k odd)
wherein f is the activation function, SP is the information fusion function, and softmax is the multi-classification output function.
2. The width graph convolution network model system based on grouping attention of claim 1, wherein a new self-connection is introduced at any order of graph convolution of the grouping-attention higher-order graph convolution layer.
3. The width graph convolution network model system based on grouping attention of claim 1, wherein the attention fusion formula of the simple attention mechanism fusion function is
$SA(\hat{A}^{i}XW, \hat{A}^{i+1}XW) = \hat{A}^{i}XW + \hat{A}^{i+1}XW$
wherein the result is the attention-fused output of the i-th order and (i+1)-th order graph convolutions.
4. The width graph convolution network model system based on grouping attention of claim 1, wherein the activation function is a ReLU nonlinear activation function.
5. The width graph convolution network model system based on grouping attention of claim 1, wherein the information fusion pooling layer adopts SP summation information fusion pooling, whose calculation formula is
$SP = XW + \sum_{i=1}^{k/2} \alpha_i\, SA(\hat{A}^{2i-1}XW, \hat{A}^{2i}XW)$  (k even)
or
$SP = XW + \sum_{i=1}^{(k-1)/2} \alpha_i\, SA(\hat{A}^{2i-1}XW, \hat{A}^{2i}XW) + \alpha_{(k+1)/2} \hat{A}^{k}XW$  (k odd)
6. A method of training the width graph convolution network model based on grouping attention in the width graph convolution network model system based on grouping attention of any one of claims 1-5, wherein the steps of the method include:
acquiring the training data set, and acquiring graph characteristics of the training data set according to the type of the training data set, wherein the graph characteristics comprise an input matrix and a regularized adjacency matrix of a graph;
performing intra-group attention fusion and inter-group weighted summation on the regularized adjacency matrices of all the different orders of the graph to obtain a preprocessed adjacency matrix, and combining the preprocessed adjacency matrix with the input matrix of the graph to obtain preprocessing features;
inputting the preprocessing features into the width graph convolution network model, and performing feature training to obtain a training result.
7. The method for training the width graph convolution network model based on grouping attention of claim 6, wherein the step of inputting the preprocessing features into the width graph convolution network model and performing feature training to obtain a training result includes:
randomly initializing a parameter matrix of the width graph convolution network model, and initializing the attention score to a specific value;
and inputting the preprocessing features into the width graph convolution network model, adjusting the attention scores of the graph convolutions of different groups according to learning-rate optimization in combination with the attributes of the training data set, and training with a loss function and gradient descent to obtain a converged parameter matrix.
CN202011610968.8A 2020-12-30 2020-12-30 Width graph convolution network model system based on grouping attention and training method Active CN112668700B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011610968.8A CN112668700B (en) 2020-12-30 2020-12-30 Width graph convolution network model system based on grouping attention and training method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011610968.8A CN112668700B (en) 2020-12-30 2020-12-30 Width graph convolution network model system based on grouping attention and training method

Publications (2)

Publication Number Publication Date
CN112668700A CN112668700A (en) 2021-04-16
CN112668700B true CN112668700B (en) 2023-11-28

Family

ID=75411116

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011610968.8A Active CN112668700B (en) 2020-12-30 2020-12-30 Width graph convolution network model system based on grouping attention and training method

Country Status (1)

Country Link
CN (1) CN112668700B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113487856B (en) * 2021-06-04 2022-10-14 兰州理工大学 Traffic flow combination prediction model based on graph convolution network and attention mechanism
CN115033400B (en) * 2022-06-15 2023-05-02 北京智源人工智能研究院 Intermediate data transmission method, dendritic module, neural network model and related method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110473592A (en) * 2019-07-31 2019-11-19 广东工业大学 The multi-angle of view mankind for having supervision based on figure convolutional network cooperate with lethal gene prediction technique
CN111159425A (en) * 2019-12-30 2020-05-15 浙江大学 Temporal knowledge graph representation method based on historical relationship and double-graph convolution network
CN111863244A (en) * 2020-07-28 2020-10-30 中国人民解放军国防科技大学 Functional connection mental disease classification method and system based on sparse pooling graph convolution

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190236440A1 (en) * 2018-01-31 2019-08-01 Pin-Han Ho Deep convolutional neural network architecture and system and method for building the deep convolutional neural network architecture

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110473592A (en) * 2019-07-31 2019-11-19 广东工业大学 The multi-angle of view mankind for having supervision based on figure convolutional network cooperate with lethal gene prediction technique
CN111159425A (en) * 2019-12-30 2020-05-15 浙江大学 Temporal knowledge graph representation method based on historical relationship and double-graph convolution network
CN111863244A (en) * 2020-07-28 2020-10-30 中国人民解放军国防科技大学 Functional connection mental disease classification method and system based on sparse pooling graph convolution

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Attention pooling-based convolutional neural network for sentence modelling; Meng Joo Er et al.; Elsevier Science; pp. 1-16 *
Research on a node classification model based on multi-level graph attention convolutional neural networks; Zhou Hengsheng; Wanfang Data dissertation database; pp. 1-68 *

Also Published As

Publication number Publication date
CN112668700A (en) 2021-04-16

Similar Documents

Publication Publication Date Title
CN112598080B (en) Attention-based width graph convolutional neural network model system and training method
Sun et al. What and how: generalized lifelong spectral clustering via dual memory
Abu-El-Haija et al. Mixhop: Higher-order graph convolutional architectures via sparsified neighborhood mixing
CN112633481B (en) Multi-jump graph convolution neural network model system and training method
CN112529168B (en) GCN-based attribute multilayer network representation learning method
Qu et al. An end-to-end neighborhood-based interaction model for knowledge-enhanced recommendation
Cui et al. Geometric attentional dynamic graph convolutional neural networks for point cloud analysis
CN112668700B (en) Width graph convolution network model system based on grouping attention and training method
CN112633482B (en) Efficient width graph convolution neural network model system and training method
CN112529069B (en) Semi-supervised node classification method, system, computer equipment and storage medium
Lin et al. Deep graph learning for semi-supervised classification
CN114637923A (en) Data information recommendation method and device based on hierarchical attention-graph neural network
CN112529068B (en) Multi-view image classification method, system, computer equipment and storage medium
Li et al. A deep graph structured clustering network
Jiang et al. Learning consensus representation for weak style classification
Shen et al. RGBT tracking based on cooperative low-rank graph model
Bökman et al. Zz-net: A universal rotation equivariant architecture for 2d point clouds
Zhang et al. A novel hierarchical clustering approach based on universal gravitation
Feng et al. A survey of visual neural networks: current trends, challenges and opportunities
Cruickshank Multi-view Clustering of Social-based Data.
Wu et al. Incomplete multi-view clustering via structured graph learning
Zheng et al. Multi-class indoor semantic segmentation with deep structured model
Zhang et al. Knowledge graph driven recommendation model of graph neural network
CN112651492B (en) Self-connection width graph convolution neural network model system and training method
Li et al. Instance-wise multi-view representation learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant