CN114330672A - Multi-information aggregated graph residual generation model, classification method, electronic device and storage medium

Publication number: CN114330672A
Authority: CN (China)
Prior art keywords: graph, module, initial residual, residual error, nodes
Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Application number: CN202210011116.XA
Other languages: Chinese (zh)
Other versions: CN114330672B (en)
Inventors: Jia Xiaofen (贾晓芬), Feng Zhu (冯铸), Guo Yongcun (郭永存), Zhao Baiting (赵佰亭), Huang Yourui (黄友锐), Ma Tianbing (马天兵)
Current Assignee: Anhui University of Science and Technology (the listed assignees may be inaccurate)
Original Assignee: Anhui University of Science and Technology
Application filed by Anhui University of Science and Technology
Priority to CN202210011116.XA
Publication of CN114330672A
Application granted; publication of CN114330672B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multi-information aggregated graph residual generation model, a classification method, an electronic device and a storage medium. The graph residual generation model comprises a deep initial residual graph convolution module, a random graph generation module and a supervision loss module. The deep initial residual graph convolution module is used to deeply extract the feature information contained in the nodes of the original topological graph, classify the categories of the nodes in the original topological graph, and output prediction labels. The random graph generation module is used to simulate a noise graph of the original topological graph and, as a parallel structure to the deep initial residual graph convolution module, obtain the relationship information between nodes. The supervision loss module is used to add the additional loss function generated by the random graph generation module to the deep initial residual graph convolution module and jointly constrain model training. The invention avoids the over-smoothing problem while preserving the structural locality of the topological graph, fully learns the feature information of the topological graph, improves model performance and classification effect, and runs fast.

Description

Multi-information aggregated graph residual generation model, classification method, electronic device and storage medium
Technical Field
The invention belongs to the technical field of topological graph node classification, and relates to a multi-information aggregated graph residual generation model, a classification method, an electronic device and a storage medium.
Background
Data from many real-world scenes, such as social networks, citation networks and protein molecular structures, have non-Euclidean structure. For a given node in non-Euclidean data, the number of its neighbor nodes and the edge relations with those neighbors are uncertain, so traditional neural networks have difficulty extracting features from such data.
The advent of graph convolution networks solved this problem well: they learn new node representations from node features and graph topology information to classify nodes, but such networks achieve optimal performance within two-hop neighborhoods and cannot escape the constraints of shallow structures. On the one hand, a shallow structure considers only nearby neighbors, ignores global properties, and cannot acquire effective information from distant neighbors; when the graph convolution network is deepened, the feature vectors of the nodes converge to the same value after repeated propagation, different types of nodes become difficult to distinguish, and the performance of the model drops sharply, which is the over-smoothing problem. On the other hand, a too-shallow network structure makes the receptive field too small, and it is difficult to obtain the dependency relationships between nodes and distant nodes and the global information of the graph structure.
In recent years, many efforts have emerged to optimize graph convolution networks. For example, Klicpera et al. add a probability of returning to the root node during the random walk and retain the local information and locality of nodes through a personalized propagation mechanism, but the high computation cost deprives the method of universality; see "KLICPERA J, BOJCHEVSKI A, GÜNNEMANN S. Predict then propagate: Graph neural networks meet personalized PageRank [J]. arXiv preprint, arXiv:1810.05997, 2018". Liu et al. extract neighborhood information with a graph convolution network, maintain the community structure through a modularity constraint, and finally integrate neighborhood and community information in the graph convolution network to learn node representations, but the effect on semi-supervised tasks is poor; see "LIU Y, WANG Q, WANG X, et al. Community enhanced graph convolutional networks [J]. Pattern Recognition Letters, 2020, 138: 462-468". Furthermore, conventional machine learning methods typically treat the data samples as independent and the graph structure as a fixed quantity, disregarding that the graph structure itself may come from noisy data or modeling assumptions, and disregarding the relationship information contained between the data samples and the graph structure. Qu et al. model data and labels with conditional random fields to learn the dependencies between data and approximate the posterior distribution of object labels using graph neural networks, but the method is limited to graph convolutional networks whose constraints do not make good use of graph structure, node features and the relations to known labels; see "QU M, BENGIO Y, TANG J. GMNN: Graph Markov neural networks [C]//International Conference on Machine Learning. Long Beach: ICML, 2019: 5241-5250".
The inventors have found that existing deep-learning-based graph convolution network classification methods have poor classification performance, with the following defects: (1) the over-smoothing problem of graph convolution networks: when the network is too deep, graph convolution operations flood each node with a large amount of redundant information, and the node features lose diversity and locality; (2) only the structural information and node feature information of the topological graph are utilized, and the interrelations among features, graph and labels are not fully exploited, so the feature information learned by the model is limited and the classification effect is poor.
Therefore, methods based on graph convolution networks are limited by the tendency of graph convolution to over-smooth: simply stacking convolution operations makes different types of nodes hard to distinguish and greatly degrades model performance, while a shallow graph convolution network can hardly extract the deeper feature information of the topological graph needed to improve node classification. Alleviating or solving the over-smoothing problem and exploiting the relevant information among graph structure, node features and labels can effectively improve the classification performance of graph convolution networks.
Disclosure of Invention
In order to solve the above problems, the invention provides a multi-information aggregated graph residual generation model, which avoids the over-smoothing problem while preserving the structural locality of the topological graph, fully learns the feature information of the topological graph, improves model performance and classification effect, and runs fast.
The second purpose of the invention is to provide a method for classifying topological graph nodes with the multi-information aggregated graph residual generation model.
A third object of the present invention is to provide an electronic device.
A fourth object of the present invention is to provide a computer storage medium.
The technical scheme adopted by the invention is a multi-information aggregated graph residual generation model, comprising:
a deep initial residual graph convolution module, used to deeply extract the feature information contained in the nodes of the original topological graph, classify the categories of the nodes in the original topological graph, and output prediction labels;
a random graph generation module, used to simulate a noise graph of the original topological graph, serve as a parallel structure to the deep initial residual graph convolution module, provide modeling assumptions for it, and acquire the relationship information between nodes;
and a supervision loss module, used to add the additional loss function generated by the random graph generation module to the deep initial residual graph convolution module and jointly constrain model training.
Furthermore, the deep initial residual graph convolution module is formed by sequentially connecting a linear input module, a multilayer initial residual graph convolution module and a linear output module in series.
The linear input module ignores 30%-60% of the neurons in the original feature input through a dropout unit according to a set probability, linearly transforms the dropout-processed original feature input in a fully connected manner through a linear input unit to reduce dimensionality, performs nonlinear mapping through a nonlinear activation function, and outputs the initial residual term to the first-layer initial residual graph convolution module.
The first-layer initial residual graph convolution module ignores 30%-60% of the neurons in the initial residual term through a dropout unit and sends the residual term to a graph convolution unit, which generates a feature representation by convolving the residual term with the symmetric normalized Laplacian matrix; the initial residual term and the feature representation output by the current graph convolution unit are skip-connected in a set proportion for information fusion, yielding a feature output whose shape and size are consistent with the initial residual term; the feature output is sent to a nonlinear activation unit for nonlinear mapping, the result serves as the input of the next initial residual graph convolution module, the operation is repeated, and the output of the N-th layer initial residual graph convolution module is sent to the linear output module.
The linear output module randomly discards neurons through a dropout unit, obtains the feature output in a fully connected manner after dimensionality reduction by the linear output unit, classifies the nodes of the topological graph, and finally normalizes the feature output through a softmax normalization unit to output the prediction label matrix.
Further, the random graph generation module comprises a two-layer perceptron module, a graph edge sampling module and a negative edge sampling module.
The two-layer perceptron module consists of a linear input module and a linear output module; the linear input module linearly transforms and classifies the feature input, and the output of the linear output module serves as the input of the graph edge sampling module and the negative edge sampling module for random edge sampling.
The graph edge sampling module samples the edges between node pairs according to the probability, determined by the relationships between nodes, of an edge being generated between every pair of nodes on the graph; this probability depends only on the edges between the nodes.
The negative edge sampling module randomly samples non-existent edges between nodes on the graph according to a set negative-edge probability.
The edges sampled by the graph edge sampling module and the negative edge sampling module, combined with the nodes, construct the noise graph.
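A minimal sketch of how such a noise graph could be sampled, assuming Bernoulli draws over the true edges and uniform random node pairs for the negative edges; the function and parameter names are hypothetical, and collisions of negative samples with true edges are ignored for brevity:

import torch

def sample_noise_graph(edge_index, edge_prob, n_nodes, neg_rate=1.0):
    # edge_index: (2, E) true edges; edge_prob: (E,) keep probability per edge
    keep = torch.bernoulli(edge_prob).bool()       # graph-edge sampling
    pos = edge_index[:, keep]
    n_neg = int(neg_rate * n_nodes)                # lambda * n negative samples
    neg = torch.randint(0, n_nodes, (2, n_neg))    # random, assumed non-existent edges
    return torch.cat([pos, neg], dim=1)            # edge list of the noise graph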
Further, the loss function is composed of the deep initial residual graph convolution module loss and the random graph generation module loss.
The deep initial residual graph convolution module loss is the degree of inconsistency between the prediction labels output by the deep initial residual graph convolution module and the true sample labels, computed with a log-likelihood cost function.
The random graph generation module loss is composed of the loss of the two-layer perceptron module and the losses formed by the joint distribution of graph structure, feature information and labels, and is used to add disturbance to the training process of the deep initial residual graph convolution module and correct its parameters.
A method for classifying topological graph nodes with the multi-information aggregated graph residual generation model is carried out according to the following steps:
Step S1, the node information of the original topological graph forms the original feature input, which is input into the deep initial residual graph convolution module and the random graph generation module, respectively;
Step S2, features are extracted from the original feature input by the deep initial residual graph convolution module, the categories of the nodes in the original topological graph are classified, and the prediction labels of nodes with known categories are output;
Step S3, features are extracted from the original feature input by the random graph generation module, the noise graph is constructed, and disturbance is added to the training process of the deep initial residual graph convolution module;
Step S4, model training is constrained through the supervision loss module;
Step S5, when the model achieves its best effect, the unknown-category nodes are inferred and their prediction labels are output, i.e., the nodes of the topological graph are classified.
Further, the output of the k-th layer initial residual graph convolution module is calculated by the following formula:
X^(k) = ReLU((1-α)·L̂_sym·dropout(X^(k-1)) + α·X^(0))
wherein X^(k-1) denotes the input of the k-th layer initial residual graph convolution module and is the initial residual term when k = 1; α denotes the proportional parameter of the residual structure, and the initial residual term and the feature representation output by the current graph convolution unit are added according to the proportional parameter α for information fusion; X^(k) denotes the output of the k-th layer initial residual graph convolution module; dropout denotes a dropout unit; ReLU is a nonlinear activation unit; L̂_sym denotes the symmetric normalized Laplacian matrix; and X^(0) denotes the output of the linear input module, serving as the initial residual term of the multilayer initial residual graph convolution module.
The symmetric normalized Laplacian matrix L̂_sym is the convolution kernel of the graph convolution unit and is calculated according to the following formula:
L̂_sym = D̃^(-1/2) Ã D̃^(-1/2), with Ã = A + I_n and D̃ = D + I_n
wherein D is the diagonal matrix constructed from D_j, and D_j denotes the degree of node j in the original topological graph; D̃ denotes the degree matrix with self-loops; j ∈ {1, …, n} denotes a node in the original topological graph structure; A is the adjacency matrix of the original topological graph structure; Ã denotes the adjacency matrix with self-loops; and I_n is an identity matrix. Left-multiplying Ã by D̃^(-1/2) normalizes the feature distribution of the input matrix X, and right-multiplying by D̃^(-1/2) normalizes Ã, ensuring information transfer.
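The symmetric normalized Laplacian with self-loops defined above can be computed directly; a minimal sketch assuming a dense adjacency matrix:

import torch

def sym_norm_laplacian(A):
    # L_sym = D~^(-1/2) (A + I) D~^(-1/2), with self-loops added
    n = A.shape[0]
    A_tilde = A + torch.eye(n)             # adjacency with self-loops
    d = A_tilde.sum(dim=1)                 # node degrees with self-loops
    d_inv_sqrt = d.pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * A_tilde * d_inv_sqrt.unsqueeze(0)

Because the self-loops make the diagonal nonzero, each node's own features survive the weighted averaging, which is the property the description relies on.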
Further, in step S3, the method for constructing the noise graph specifically comprises:
S31, sending the original feature input X into the two-layer perceptron module of the random graph generation module, which linearly transforms and classifies X to obtain the dimension-reduced representation of X;
S32, dimension-splicing the result of the linear transformation and the dimension-reduced representation of step S31 with the row index matrix and the column index matrix of the symmetric normalized Laplacian matrix, respectively, and constructing the random noise graph through the graph-edge sampling probability and the negative-edge sampling probability.
The graph-edge sampling probability p(G|X,Lk) is calculated as:
p(G|X,Lk) = ∏ p(ei,j|X,Lk), the product running over the edges ei,j that truly exist in the graph structure;
the negative-edge sampling probability p(Ḡ|X,Lk) is calculated as:
p(Ḡ|X,Lk) = ∏ p(ēi,j|X,Lk), the product running over the λn collected negative-edge samples;
and the dimension-reduced graph representations X̃_G and X̃_Ḡ generated by graph-edge sampling and negative-edge sampling are obtained by retrieving the word embeddings of the sampled node pairs (Embed), dimension-splicing them (Concat), and reducing dimensionality through a linear layer.
Wherein Lk ∈ L is the known label matrix and L is the label matrix; the row index matrix and the column index matrix of the symmetric normalized Laplacian matrix serve as the edge indices; Embed denotes the operation of obtaining node word-embedding representations, and Concat denotes the dimension-splicing operation; n denotes the number of nodes, λ is the negative-edge sampling rate, and ēi,j is a collected negative-edge sample. The graph-edge sampling probability p(ei,j|X,Lk) is the probability of sampling a random edge ei,j between a node pair i, j whose edge relation truly exists in the graph structure, and the negative-edge sampling probability p(ēi,j|X,Lk) is the probability of sampling a random ēi,j between a node pair i, j having no edge relation.
Further, the loss function of the supervision loss module in step S4 is as follows:
ζ = μ1ζ1 + μ2(ζ2 - LogSigmoid(pθ(G|X,Lk)) - pθ(Lk|X,G) - pθ(G|X,Lk,Lm));
wherein ζ denotes the total loss function of the multi-information aggregated graph residual generation model; ζ1 denotes the loss function of the deep initial residual graph convolution module; ζ2 denotes the loss function of the two-layer perceptron module; LogSigmoid(pθ(G|X,Lk)), pθ(Lk|X,G) and pθ(G|X,Lk,Lm) are the loss terms formed by the joint distribution of graph structure, feature information and labels;
ζ2, LogSigmoid(pθ(G|X,Lk)), pθ(Lk|X,G) and pθ(G|X,Lk,Lm) jointly form the loss function of the random graph generation module; μ1 and μ2 denote the hyperparameters of the loss functions of the deep initial residual graph convolution module and the random graph generation module, respectively; Lm ∈ L is the unknown label matrix; and LogSigmoid denotes a nonlinear function.
An electronic device realizes the classification of topological graph nodes by adopting the above method.
A computer storage medium stores at least one program instruction, and the at least one program instruction is loaded and executed by a processor to implement the above method for classifying topological graph nodes.
The invention has the beneficial effects that:
the invention effectively utilizes the information of graph structure, node characteristics and labels to construct a graph convolution network which can avoid the problem of over-smoothness; a deep initial residual error graph convolution structure is introduced, a plurality of same initial residual error graph convolution structures are used, initial residual error items are connected with feature representation residual errors output by a current graph convolution unit, feature information of nodes in a topological graph is extracted, partial information of the initial features is reserved for each layer of nodes, the problem of over-smoothness caused by overlarge aggregation radius or multiple transmission is avoided while the locality of the graph structure is guaranteed, and feature information can be fully learned.
According to the invention, the noise map of the original topological graph is randomly generated by the random map generation module, and a modeling hypothesis is provided for the convolution structure of the depth initial residual map, so that the model generates disturbance. In addition, a supervision loss module is adopted to supervise and correct model parameters of the label-free data posterior distribution errors, and the model training is jointly constrained by using an additional loss function generated by a random graph generation module, so that the network can acquire more internal characteristics of a graph structure, the model performance and the model classification capability are improved, and the problems of over-smoothness and poor classification effect of the conventional node classification method based on the graph convolution network are solved. In addition, the model designed by the invention has small parameter quantity, high running speed and strong generalization, and can be implemented more conveniently in practical application and product deployment; the application range is wide, and the application method is flexible and various.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a graph residual generation model for multi-information aggregation according to an embodiment of the present invention.
Fig. 2 is a flowchart of a method for classifying nodes of a topological graph by using a graph residual generation model for multi-information aggregation according to an embodiment of the present invention.
Fig. 3a is the Cora data set node aggregation graph of the KIPF method.
Fig. 3b is the Cora data set node aggregation graph of the VELIČKOVIĆ method.
Fig. 3c is the Cora data set node aggregation graph of an embodiment of the present invention.
Fig. 4a is the Citeseer data set node aggregation graph of the KIPF method.
Fig. 4b is the Citeseer data set node aggregation graph of the VELIČKOVIĆ method.
Fig. 4c is the Citeseer data set node aggregation graph of an embodiment of the present invention.
Fig. 5a is the Pubmed data set node aggregation graph of the KIPF method.
Fig. 5b is the Pubmed data set node aggregation graph of the VELIČKOVIĆ method.
Fig. 5c is the Pubmed data set node aggregation graph of an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them; all other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Embodiment 1
a graph residual generation model for multi-information aggregation of node classification is structurally shown in FIG. 1 and comprises a depth initial residual graph convolution module, a random graph generation module and a supervision loss module.
The deep initial residual error map convolution module is used for extracting characteristic information contained in the nodes in the original topological map in a deep layer, classifying the classes of the nodes in the original topological map and outputting a prediction label.
And the random graph generation module is used for simulating a noise graph of the original topological graph, serving as a parallel structure of the depth initial residual graph convolution module, providing modeling assumption for the depth initial residual graph convolution module and extracting relationship information between nodes.
And the supervision loss module is used for adding an additional loss function generated by the random map generation module to the depth initial residual map convolution module and jointly constraining model training.
The node classification in the embodiment of the invention is to extract the characteristics of an original topological graph through a deep initial residual error graph convolution module to obtain a prediction label of an unknown label node, generate a noise graph through a random graph generation model by utilizing the randomness of graph edge sampling and negative edge sampling, add disturbance to the training process of the deep initial residual error graph convolution module, promote model learning through error supervision of a supervision loss module on the label-free data posterior distribution, improve the performance of the model, obtain a node classification model with better effect, classify the node of the unknown label when the model achieves the best effect, and finally output the prediction label of the unknown node, namely deduces the label approximation posterior distribution of the unknown node, wherein the label approximation posterior distribution is a group of vectors, and the prediction label is a result value converted according to the vectors; where the category is equivalent to the label.
The deep initial residual error map convolution module in the embodiment of the invention is formed by sequentially connecting a linear input module, a multilayer initial residual error map convolution module and a linear output module in series, characteristics are input into the linear input module, characteristic extraction is carried out by the multilayer initial residual error map convolution module, and characteristic output is obtained by the linear output module to be used as a prediction tag.
The linear input module is used for carrying out dimensionality reduction processing on the characteristic input and carrying out linear transformation on the characteristic input in a full-connection mode; the multilayer initial residual error graph convolution module is used for extracting the characteristic information contained in the nodes in the original topological graph to obtain the characteristic representation of the graph structure; and the linear output module is used for carrying out dimensionality reduction processing on the feature output and realizing classification of the feature output in a full-connection mode. The feature representation of each initial residual map convolution module is connected to the output residual of that module via the linear input module.
Original topological graphs such as social networks and citation networks are complex and abstract, and it is difficult to extract useful node features with only a linear network. A shallow graph convolution network can extract effective low-level feature information from the topological graph with simple training and achieve a good node classification function, but complex abstract node feature information, such as the dependency relationships of distant neighbor nodes and the global information of the graph structure, can only be fully learned by a deep network, which optimizes the abstract understanding and expression of the topological graph information; yet the over-smoothing problem makes the graph convolution network difficult to deepen further. To solve this problem, a residual structure is introduced to deepen the network structure so that the model can learn more complex high-level features. The embodiment of the invention adopts a linear input module to perform dimensionality reduction on the feature input and collect the node features; the multilayer initial residual graph convolution module extracts the feature information contained in the nodes of the original topological graph structure, and with the initial residual structure each layer of nodes retains part of the initial feature information, guaranteeing the locality of the graph structure; and the linear output module realizes classification of the feature output in a fully connected manner.
In some embodiments, the linear input module is composed of a dropout unit, a linear input unit and a nonlinear activation unit; the neuron deactivation probability of the dropout unit is 0.3-0.6, ignoring 30%-60% of the neurons in the original feature input according to the set probability; the input and output dimensions of the linear input unit are the second dimension of the feature input and the hidden-layer dimension, respectively, with the hidden-layer dimension generally set to 16, 64 or 256; and the nonlinear activation function of the nonlinear activation unit is the ReLU function. First, the feature input passes through the dropout unit, which ignores a certain number of neurons according to a neuron deactivation probability obeying a Bernoulli distribution; the network then only needs to train the non-deactivated nodes of the original topological graph, which alleviates model overfitting to some extent. Then the linear input unit reduces the dimensionality of the feature input by linear transformation, decreasing the number of features and thus the number of model parameters. Because the expressive power of a linear module is insufficient, the nonlinear activation unit performs nonlinear mapping on the transformed feature input and passes it to the multilayer initial residual graph convolution module. The activation function is the factor that adds nonlinearity to a linear module; Sigmoid, Tanh and ReLU are common choices. The Sigmoid function maps the hidden-layer output into (0, 1); its sensitive region lies where the slope is large, the slopes on both sides are flat so gradients easily vanish during propagation and differentiation, and since the function is not centered at 0, weight updates are slow and the influence of positive and negative samples on the network and its gradients is hard to see. The Tanh function is a hyperbolic tangent whose curve is similar to Sigmoid and which also suffers from flat slopes at both ends and vanishing gradients, but it is centered at 0 with output interval (-1, 1), which accelerates weight convergence. The ReLU function is piecewise linear; compared with the first two it exhibits no gradient saturation and alleviates the vanishing-gradient problem to some extent. The nonlinear activation unit in the embodiment of the invention adopts the ReLU function, mainly to apply a nonlinear effect to the output of the linear input unit and enhance the expressive power of the model. The linear input module reduces the features of the original topological graph to a small feature matrix through a linear layer; this feature matrix contains the feature outline of the nodes and the overall structure of the original topological graph, and a feature extraction structure is needed to further extract features from the output of this unit.
The multilayer initial residual graph convolution module is composed of N sequentially connected initial residual graph convolution modules; N is generally 16 or 64 according to the characteristics of the nodes and the topological graph, and each initial residual graph convolution module has the same structure. An initial residual graph convolution module comprises a dropout unit, a graph convolution unit, an initial residual structure and a nonlinear activation unit, and the output of the linear input module serves as the initial residual term. The first-layer initial residual graph convolution module ignores 30%-60% of the neurons in the initial residual term through the dropout unit (adjusted and optimized according to the network characteristics), then sends the residual term to the first graph convolution unit, which generates a feature representation by convolving the residual term with the symmetric normalized Laplacian matrix; the initial residual term and the feature representation output by the current graph convolution unit are skip-connected in a set proportion for information fusion, yielding a feature output whose shape and size are consistent with the initial residual term. The feature output is sent to the nonlinear activation unit for nonlinear mapping, and the result is passed as input to the next initial residual graph convolution module; this is repeated N times up to the N-th layer initial residual graph convolution module, whose output is sent to the linear output module.
The feature information carried by each node of the topological graph differs, and the local structure formed between nodes is also unique. When graph convolution is used to extract features, neighbor nodes aggregate their features to the central node, and the central node propagates the information of its neighbors and itself to the next layer; as graph convolution operations are stacked, the feature information of different nodes becomes similar and hard to distinguish. A residual mechanism is therefore introduced to suppress the over-smoothing problem: node features are extracted by combining the initial residual structure with the graph convolution operation, which deepens the graph convolution network while avoiding overfitting and preserves the diversity and locality of node feature information. When the initial residual term passes through the first graph convolution unit, the edge relations between node pairs of the topological graph form a Laplacian matrix, which after symmetric normalization serves as the convolution kernel of the graph convolution unit; since the diagonal of the symmetric normalized Laplacian matrix is not 0, a node's own feature information and the features of its neighbors can be propagated after weighted averaging. As the number of layers increases, each node can aggregate features from farther away, and the receptive field grows.
The linear output module is composed of a dropout unit, a linear output unit and a softmax normalization unit. The output of the multilayer initial residual graph convolution module passes through the dropout unit, which randomly discards neurons; the linear output unit performs dimensionality reduction and obtains the feature output in a fully connected manner, realizing the classification of the topological graph nodes; finally, the softmax normalization unit normalizes the feature output to obtain the output prediction value. The linear output unit is similar in structure to the linear input unit, but its output dimension is the number of node classes, while the output dimension of the linear input unit is the number of hidden-layer neuron units. The softmax normalization unit adopts the normalized exponential function: to express multi-class prediction scores ranging from negative to positive infinity in probability form, the output of the linear output unit is first transformed with the exponential function to guarantee non-negativity, and the transformed results are then normalized so that the probabilities of the prediction results sum to 1.
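The exponential-then-normalize behavior described here can be checked with a small example; the score vector below is made up for illustration:

import torch

scores = torch.tensor([2.0, -1.0, 0.5])              # hypothetical linear-output scores
probs = torch.exp(scores) / torch.exp(scores).sum()  # softmax normalization
print(probs, probs.sum())                            # non-negative entries summing to 1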
Traditional machine learning methods generally treat data samples as independent and the graph structure as a fixed quantity, ignoring that the graph structure may come from noisy data or modeling assumptions, and ignoring the relationship information contained between the data samples and the graph structure; generating a random graph from the graph structure, node features and known labels can effectively assist inference-model training and help the network predict unknown labels. The random graph generation module in the embodiment of the invention takes the features of the original topological graph as input, comprises a two-layer perceptron module, a graph edge sampling module and a negative edge sampling module, is used to simulate a noise graph of the original topological graph, serves as a parallel structure to the deep initial residual graph convolution module, provides modeling assumptions for it, and extracts the relationship information between nodes.
The two-layer perceptron module consists of a linear input module and a linear output module and is used to linearly transform and classify the feature input; the feature output serves as the input of the graph edge sampling module and the negative edge sampling module for random edge sampling. The graph edge sampling module randomly samples the edges between node pairs according to the probability, determined by the relationships between nodes, of an edge being generated between every pair of nodes on the graph, this probability depending only on the edges between the nodes. The negative edge sampling module randomly samples non-existent edges between nodes on the graph according to the set negative-edge probability, and the edges sampled by the graph edge sampling module and the negative edge sampling module, combined with the nodes, construct the noise graph.
In the embodiment of the invention, the two-layer perceptron module consists of a linear input module and a linear output module; compared with an initial residual graph convolution module, its structure lacks the graph convolution unit and the initial residual structure. It linearly transforms and classifies the feature input to obtain a dimension-reduced representation of the original topological graph, and its output, together with the feature input, serves as the input of the graph edge sampling module and the negative edge sampling module for random edge sampling to generate the random graph.
The graph edge sampling module combines the training-set labels with the prediction labels of the two-layer perceptron module on the original topological graph, and samples edges from the symmetric normalized Laplacian matrix through the label merge matrix and the feature input, according to the probability of every possible edge between nodes on the original topological graph, to obtain the random graph of the graph edge sampling module. Specifically, the graph edge sampling module sparsifies the node label matrix in one-hot encoding, then masks the non-training part of the label matrix to obtain the training mask label matrix; the one-hot encoding matrix and the dimension-reduced representation of the two-layer perceptron module are merged according to this matrix to obtain the label merge matrix of training-set node labels and unknown-node prediction labels: when an element in the matrix is not 0, the value in the one-hot encoding matrix is selected; otherwise the value in the dimension-reduced representation of the two-layer perceptron module is selected. This process adds prediction labels to the generated random-graph nodes, as in the sketch below. With the label merge matrix and the feature input as indices, word embeddings are retrieved from the symmetric normalized Laplacian matrix representing the edge relations between node pairs, and the dimension-reduced representation of the graph-edge-sampled random graph is obtained through linear-layer dimensionality reduction.
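A minimal sketch of this label-merge step, assuming integer class labels, a boolean training mask, and softmax outputs from the two-layer perceptron; all names are hypothetical:

import torch
import torch.nn.functional as F

def label_merge(labels, train_mask, mlp_probs, n_classes):
    # labels: (n,) class ids; train_mask: (n,) bool; mlp_probs: (n, c) soft predictions
    one_hot = F.one_hot(labels, n_classes).float()
    one_hot[~train_mask] = 0.0                       # mask the non-training part
    known = one_hot.sum(dim=1, keepdim=True) > 0     # rows carrying a training label
    return torch.where(known, one_hot, mlp_probs)    # known label where available, else prediction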
The negative edge sampling module performs random edge sampling on a random incidence matrix constructed from the symmetric normalized Laplacian matrix, through the label merge matrix and the feature input, according to the negative-edge sample rate, to obtain the random graph of the negative edge sampling module. Taking one dimension of the symmetric normalized Laplacian matrix as a prototype, a random incidence matrix representing edge relations between node pairs is randomly generated according to the negative-edge sample rate; with the label merge matrix and the feature input as indices, word embeddings are retrieved from the random incidence matrix, and the dimension-reduced representation of the negative-edge-sampled random graph is obtained through linear-layer dimensionality reduction.
Considering that the constraint condition of the deep initial residual graph convolution module is too singular, to improve the model's learning ability and let the network acquire more intrinsic characteristics of the graph structure, it is desirable to obtain beneficial constraint conditions from outside the module. The loss function in the embodiment of the invention consists of the deep initial residual graph convolution module loss and the random graph generation module loss, which jointly constrain model training. The deep initial residual graph convolution module loss is the degree of inconsistency between the output prediction values generated by the module and the true sample values, computed with a log-likelihood cost function; the random graph generation module loss consists of the two-layer perceptron module loss and the losses formed by the joint distribution of graph structure, feature information and labels, and is used to add disturbance to the model and correct the parameters of the deep initial residual graph convolution module.
In the embodiment of the invention, the deep initial residual graph convolution module loss adopts the maximum-likelihood loss function commonly used for multi-classification; since softmax normalization is adopted at the output of the module, a cross-entropy loss function is actually used. The inputs of this function are the prediction label vector of the deep initial residual graph convolution module and the true label vector. During computation the true label vector is converted to one-hot encoding; the softmax-normalized prediction label vector and the one-hot-encoded true label vector are multiplied pointwise, the value of the prediction label vector at each position where the true label vector is 1 is taken as the error, and the errors are summed to obtain the total loss value. The better the prediction labels match the true labels, the smaller the total loss value; the more mismatches, the larger the total loss value.
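As a small numeric illustration of this computation (the probabilities below are made up), the one-hot dot product reduces to the negative log-likelihood of the true class:

import torch
import torch.nn.functional as F

probs = torch.tensor([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1]])       # softmax-normalized predictions
labels = torch.tensor([0, 1])                 # true classes of the two nodes
loss = F.nll_loss(torch.log(probs), labels)   # mean of -log 0.7 and -log 0.8
print(loss)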
The random graph generation module loss consists of the two-layer perceptron module loss and the losses formed by the joint distribution of graph structure, feature information and labels. The two-layer perceptron module loss is the degree of inconsistency between the output prediction values generated by the two-layer perceptron module and the true sample values, computed with a log-likelihood cost function. The losses formed by the joint distribution of graph structure, feature information and labels are, respectively, the LogSigmoid error generated by the outputs of the graph edge sampling module and the negative edge sampling module, and the prediction-label errors of the deep initial residual graph convolution module and the two-layer perceptron module on the unknown-label node part. The deep initial residual graph convolution module learns an empirical error from labeled data and generates posterior labels for unlabeled data; in the random graph sampled by the random graph generation module, the unlabeled data are assigned class memberships according to the distribution of the labeled data, and back-propagation supervises and corrects the posterior-distribution errors of the two modules on unlabeled data, promoting model learning and improving model performance.
The goal of node classification is to realize the nonlinear mapping from the abstract, complex topological graph to the node features according to the node information and graph structure information of the original topological graph, extracting the feature information carried by nodes and their neighbors through graph convolution to complete the classification. Nodes in the same cluster are often densely connected; during graph convolution, the characteristics of the Laplacian matrix smooth the output, and this smoothing makes the output features of connected nodes similar, which eases the classification task but causes the over-smoothing problem. From a spatial perspective, the essence of graph convolution is to aggregate neighbor-node information using the graph structure to generate new node features; when graph convolution operations are stacked too many times, the aggregation floods each node with a large amount of redundant information, and the diversity and locality of node features are lost. The deep initial residual module proposed in the embodiment of the invention adds the initial residual structure to the graph convolution process, which greatly alleviates the over-smoothing problem and fully extracts the feature information contained in the nodes. The random graph generation module performs random edge sampling on the original topological graph through graph-edge sampling and negative-edge sampling to generate a noise graph of the original topological graph, adding effective disturbance in the training process of the model to achieve data enhancement. The supervision loss module further assists the deep initial residual module in model learning by utilizing the relationship information among graph structure, node features and labels, so that the model can acquire more intrinsic characteristics of the topological graph and better complete the node classification task.
Embodiment 2
the method for classifying the topological graph nodes by using the graph residual error generation model for multi-information aggregation is carried out according to the following steps as shown in FIG. 2:
step S1, inputting characteristic input formed by node information on the topological graph into the depth initial residual error graph convolution module and the random graph generation module respectively;
step S2, the depth initial residual error map convolution module performs feature extraction on the feature input to obtain a dimensionality reduction representation of the original feature input as an output predicted value;
the calculation formula output by the depth initial residual error map convolution module is as follows:
X(0)=ReLU(Lin(dropout(X)));
Figure BDA0003457387980000121
Figure BDA0003457387980000122
Figure BDA0003457387980000131
wherein, X is the characteristic input (namely the original characteristic input) of the linear input module; dropout represents a dropout unit, and 30% -60% of neurons in the original characteristic input X are ignored according to a given probability; lin represents a linear input unit for reducing dimensionality for linear transformation of feature input; the ReLU is a nonlinear activation unit, and a nonlinear function is provided by adopting a ReLU nonlinear activation function; x(0)Representing the output of the linear input module as the initial residual item of the multi-layer initial residual graph convolution module;
Figure BDA0003457387980000132
representing a symmetric normalized laplacian matrix as a "convolution kernel" for the graph convolution unit; d is DjA constructed diagonal matrix; djRepresenting the degree of a node j in the original topological graph G;
Figure BDA0003457387980000133
representing a degree matrix with self-loops; j ∈ {1, …, n } represents a node index in the graph structure; a is a contiguous matrix of the graph structure,
Figure BDA0003457387980000134
representing an adjacency matrix with self-loops; i isnIs an identity matrix;
Figure BDA0003457387980000135
is shown to pass through
Figure BDA0003457387980000136
To pair
Figure BDA0003457387980000137
Performing a normalization operation on the characteristic distribution of the input matrix X by
Figure BDA0003457387980000138
To pair
Figure BDA0003457387980000139
Normalization is performed to ensure information transfer.
X(k-1)The input of the convolution module of the k-th layer initial residual map is shown, and when k is 1, the k is an initial residual term; alpha represents a proportional parameter of a residual error structure, and an initial residual error item and the feature representation output by the current graph convolution unit are added according to the proportional parameter alpha for information fusion; x(k)Representing the output of the k-th layer initial residual map convolution module; lout represents a linear output unit, which is used for classifying and dividing the output of the multilayer initial residual error map convolution module to obtain characteristic output; softmax represents a softmax normalization unit, and the characteristic output normalization processing is carried out to obtain an output predicted value
Figure BDA00034573879800001310
The prediction label of the unknown label node is finally obtained through a multilayer initial residual image convolution module formed by sequentially connecting a plurality of initial residual image convolution modules consisting of image convolution units and residual image structures, namely the characteristic input X of dimension reduction of the topological graph(0)In-process acquisition prediction tag matrix
Figure BDA00034573879800001311
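A usage sketch tying steps S1-S2 together, reusing the DeepInitialResidualGCN and sym_norm_laplacian sketches from embodiment 1; the sizes are Cora-like, but the data are random placeholders, not values from the patent:

import torch

n, f, c = 2708, 1433, 7                       # illustrative Cora-like sizes
X = torch.rand(n, f)                          # placeholder feature input
A = (torch.rand(n, n) < 0.001).float()
A = ((A + A.t()) > 0).float()                 # symmetric placeholder adjacency
model = DeepInitialResidualGCN(f, 64, c).eval()
Y_hat = model(X, sym_norm_laplacian(A))       # prediction label matrix (n x c)
pred = Y_hat.argmax(dim=1)                    # predicted class per node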
Step S3, the feature input is sent to the random graph generation module; the two-layer perceptron module performs feature extraction on the feature input, and the feature input together with the dimension-reduced feature output is sent to the graph edge sampling module and the negative edge sampling module for simultaneous random sampling to construct the noise graph, adding disturbance to the model. The specific implementation process is as follows.
First, the feature input X is sent to the random graph generation module, and the two-layer perceptron module linearly transforms and classifies the feature input to obtain the dimension-reduced representation X̄ of X:
X^(0) = ReLU(Lin(dropout(X)));
X̄ = softmax(Lout(dropout(X^(0))));
with X^(0) formed as part of the two-layer perceptron module.
Second, X^(0) and X̄ are dimension-spliced with the row index matrix and the column index matrix of the symmetric normalized Laplacian matrix, respectively, and a random noise graph is constructed according to the graph-edge sampling probability p(G|X,Lk) and the negative-edge sampling probability p(Ḡ|X,Lk), which are calculated as:
p(G|X,Lk) = ∏ p(ei,j|X,Lk), the product running over the edges ei,j that truly exist in the graph structure;
p(Ḡ|X,Lk) = ∏ p(ēi,j|X,Lk), the product running over the λn collected negative-edge samples;
and the dimension-reduced graph representations X̃_G and X̃_Ḡ generated by graph-edge sampling and negative-edge sampling are obtained by retrieving the word embeddings of the sampled node pairs (Embed) and dimension-splicing them (Concat) before linear-layer dimensionality reduction.
Wherein p(G|X,Lk) and p(Ḡ|X,Lk) denote the graph-edge sampling probability and the negative-edge sampling probability, respectively; Lk ∈ L is the known label matrix, Lm ∈ L is the unknown label matrix, and L is the label matrix; the row index matrix and the column index matrix of the symmetric normalized Laplacian matrix serve as the edge indices; Embed denotes the operation of obtaining node word-embedding representations, and Concat denotes the dimension-splicing operation; n denotes the number of nodes, λ is the negative-edge sampling rate, and ēi,j is a collected negative-edge sample; X̃_G and X̃_Ḡ denote the dimension-reduced graph representations generated by graph-edge sampling and negative-edge sampling, respectively. The graph-edge sampling probability p(ei,j|X,Lk) is the probability of sampling a random edge ei,j between a node pair i, j whose edge relation truly exists in the graph structure, and the negative-edge sampling probability p(ēi,j|X,Lk) is the probability of sampling a random ēi,j between a node pair i, j having no edge relation.
Step S4, model training is constrained by the supervision loss module to improve model performance; the loss function in the supervision loss module is:
ζ = μ1ζ1 + μ2(ζ2 - LogSigmoid(pθ(G|X,Lk)) - pθ(Lk|X,G) - pθ(G|X,Lk,Lm));
wherein ζ denotes the total loss function of the multi-information aggregated graph residual generation model; ζ1 denotes the loss function of the deep initial residual graph convolution module; ζ2 denotes the loss function of the two-layer perceptron module; LogSigmoid(pθ(G|X,Lk)), pθ(Lk|X,G) and pθ(G|X,Lk,Lm) are the loss terms formed by the joint distribution of graph structure, feature information and labels; ζ2, LogSigmoid(pθ(G|X,Lk)), pθ(Lk|X,G) and pθ(G|X,Lk,Lm) jointly form the loss function of the random graph generation module; μ1 and μ2 denote the hyperparameters of the loss functions of the deep initial residual graph convolution module and the random graph generation module, respectively; and LogSigmoid denotes a nonlinear function. A sketch of this combined loss is given after step S5 below.
Step S5, the trained model is used to classify the nodes on the topological graph.
The core formula of the depth initial residual graph convolution module simplifies the model form, while the random graph generation module and the supervision loss module are added for data enhancement. On top of the loss function of the depth initial residual graph convolution module, the loss function of the random graph generation module is additionally imposed, making full use of the relationship information among the graph structure, the feature information and the labels; the error supervision produced by the posterior distributions of the unlabeled data in the two modules improves the generalization ability of the model, so that the network can capture more of the intrinsic characteristics of the graph structure. The embodiment of the invention can be applied to text classification and knowledge reasoning in natural language processing, to protein molecule classification and chemical function prediction in biochemistry, and also to recommendation systems, traffic prediction and the like. The invention can be used on its own as the main program of an application to process non-Euclidean data such as topological graphs and word vectors, extracting the relationship information of nodes or graphs and performing classification and prediction, or as an auxiliary program of an application to extract the semantic relationships among image labels and thereby improve the performance of the main model.
To verify the effectiveness of the node classification method of the embodiment of the invention, Cora, Citeseer and Pubmed are selected as test data sets. The topological graph of each data set is a citation network with documents as nodes and citations as edges; the features of each node correspond to the bag-of-words representation of a document, and the label indicates the research field to which the document belongs. The algorithm of KIPF (KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[J]. arXiv Preprint, arXiv:1609.02907, 2016), the algorithm of VELIČKOVIĆ (VELIČKOVIĆ P, CUCURULL G, CASANOVA A, et al. Graph attention networks[C]//International Conference on Learning Representations. Vancouver: ICLR, 2018: 1-12) and the algorithm of LUO (LUO Y, JI R, GUAN T, et al. Every node counts: Self-ensembling graph convolutional networks for semi-supervised learning[J]. Pattern Recognition, 2020, 106:107451) are compared and analyzed against the experimental results of the invention from both subjective and objective aspects.
Figs. 3a to 3c show the node aggregation graphs on the Cora data set produced, in the semi-supervised task, by the classification method of the multi-information aggregated graph residual generation model of the embodiment of the invention and by the other algorithms. It can be observed that Ours (the embodiment of the invention) performs best among the compared methods. For the Cora data set, although the graph structure is small, the feature vectors and categories are numerous: the KIPF method cannot extract enough feature information to learn the global structure information and needs a deeper network to mine potentially effective information, while the VELIČKOVIĆ method learns only from node features and ignores graph structure information. The method of the embodiment therefore effectively improves the graph convolution network, avoids over-smoothing, fully extracts node feature information, and improves the classification effect.
Figs. 4a to 4c show the node aggregation graphs on the Citeseer data set produced, in the semi-supervised task, by the classification method of the multi-information aggregated graph residual generation model of the embodiment of the invention and by the other algorithms. For the Citeseer data set, the average number of nodes in each node's second-order neighborhood is the lowest of the three data sets, so the graph is relatively sparse. The KIPF method shows that the graph convolution network cannot extract the information in the graph structure well, and the VELIČKOVIĆ method, which neglects graph structure information, is clearly deficient. Conversely, when the graph structure has more categories or the graph is relatively sparse, the random graph generation method provides greater help, and the node aggregation graph of the embodiment of the invention can be observed to achieve the best classification effect. The method therefore effectively improves the graph convolution network, avoids over-smoothing, fully extracts node feature information, and improves the classification effect.
Figs. 5a to 5c show the node aggregation graphs on the Pubmed data set produced, in the semi-supervised task, by the classification method of the multi-information aggregated graph residual generation model of the embodiment of the invention and by the other algorithms. For the Pubmed data set, the graph has few categories and is relatively dense; the classification effects of the KIPF and VELIČKOVIĆ methods are similar, since the graph convolution network can propagate feature information well from the neighbors. The method of the embodiment of the invention has a more complete structure and more thorough feature extraction, and achieves a better classification effect. The method therefore effectively improves the graph convolution network, avoids over-smoothing, fully extracts node feature information, and improves the classification effect.
In the embodiment of the invention, to avoid the bias of purely qualitative analysis, the experimental results of the different methods are also compared using the semi-supervised classification accuracy as the evaluation index; the results are shown in Table 1:
Table 1. Comparison of semi-supervised classification accuracy (%) of the different methods
(the table body appears as an image in the original document)
Higher classification accuracy indicates a better semi-supervised node classification effect. As the data in Table 1 show, the accuracy of the embodiment of the invention (Ours) on the Cora and Citeseer data sets is better than that of KIPF, VELIČKOVIĆ and LUO; on the Pubmed data set its accuracy is only slightly lower than that of LUO, consistent with the node aggregation graphs of KIPF, VELIČKOVIĆ and Ours. Table 1 explicitly reports the semi-supervised classification accuracy on the different benchmark data sets, so the method of the embodiment of the invention is superior to the other classification methods in node classification effect.
Comparing various deep models at different network depths shows that the embodiment of the invention obtains the best experimental results and maintains model performance at greater depth, effectively alleviating the over-smoothing problem. Ablation comparisons confirm that every module designed in the embodiment of the invention plays a positive role in the model learning process, helping the model learn more feature information and improving its performance.
If implemented in the form of a software functional module and sold or used as an independent product, the classification method of topological graph nodes of the embodiment of the invention may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the invention may be embodied as a software product stored in a storage medium and including several instructions that cause a computer device (which may be a personal computer, a server or a network device) to execute all or part of the steps of the classification method of topological graph nodes according to the embodiment of the invention. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk or an optical disk.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A multi-information aggregated graph residual error generation model is characterized by comprising
The depth initial residual error map convolution module is used for extracting characteristic information contained in the nodes in the original topological map in a deep layer, classifying the classes of the nodes in the original topological map and outputting a prediction label;
the random graph generating module is used for simulating a noise graph of an original topological graph, serving as a parallel structure of the depth initial residual graph convolution module, providing modeling assumption for the depth initial residual graph convolution module and acquiring relationship information between nodes;
and the supervision loss module is used for adding an additional loss function generated by the random map generation module to the depth initial residual map convolution module and jointly constraining model training.
2. The model for generating the residual error of the multi-information aggregated graph according to claim 1, wherein the depth initial residual error graph convolution module is formed by sequentially connecting a linear input module, a multi-layer initial residual error graph convolution module and a linear output module in series;
the linear input module ignores 30% -60% of neurons in original characteristic input through a dropout unit according to a set probability, performs linear transformation on the original characteristic input processed by the dropout unit in a full-connection mode through the linear input unit, reduces dimensionality, performs nonlinear mapping through a nonlinear activation function, and outputs an initial residual error item to a first layer of initial residual error graph convolution module;
the first layer of initial residual error graph convolution module ignores 30% -60% of neurons in initial residual error items through a dropout unit, sends the residual error items to a graph convolution unit of the first layer, generates characteristic representation through convolution of a symmetrical normalized Laplace matrix on the residual error items, jump-connects the initial residual error items and the characteristic representation output by the current graph convolution unit according to a set proportion, performs information fusion to obtain characteristic output with the shape and size consistent with the initial residual error items, sends the characteristic output to a nonlinear activation unit for nonlinear mapping, takes the result as the input of the next initial residual error graph convolution module, and repeats until the output of the Nth layer of initial residual error graph convolution module is sent to a linear output module;
the linear output module discards neurons randomly through a dropout unit, obtains characteristic output through a full-connection mode after dimensionality reduction processing of the linear output unit, classifies nodes of the topological graph, and finally normalizes the characteristic output through a softmax normalization unit to output a prediction tag matrix.
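A compact PyTorch sketch of the module described in this claim is given below, assuming a dense normalized Laplacian L̂ and illustrative values for the depth K, the residual proportion α and the dropout rate:

import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepInitialResidualGCN(nn.Module):
    # Linear input -> K initial-residual graph convolutions -> linear output
    # with softmax normalization, mirroring the pipeline of claim 2.
    def __init__(self, in_dim, hidden, n_cls, K=8, alpha=0.1, p=0.5):
        super().__init__()
        self.lin_in = nn.Linear(in_dim, hidden)
        self.lin_out = nn.Linear(hidden, n_cls)
        self.K, self.alpha, self.p = K, alpha, p

    def forward(self, x, l_hat):  # l_hat: n x n symmetric normalized Laplacian
        x0 = F.relu(self.lin_in(F.dropout(x, self.p, self.training)))
        h = x0                    # initial residual term
        for _ in range(self.K):
            conv = l_hat @ F.dropout(h, self.p, self.training)
            # jump-connect the initial residual term in proportion alpha
            h = F.relu((1 - self.alpha) * conv + self.alpha * x0)
        out = self.lin_out(F.dropout(h, self.p, self.training))
        return F.softmax(out, dim=-1)   # prediction label matrix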
3. The model of claim 1, wherein the stochastic graph generation module comprises a two-layer perceptron module, a graph edge sampling module, and a negative edge sampling module;
the two-layer perceptron module consists of a linear input module and a linear output module, wherein the linear input module is used for carrying out linear transformation and classification division on characteristic input, and the linear output module is used as the input of the graph edge sampling module and the negative edge sampling module for carrying out edge random sampling;
the graph edge sampling module is used for sampling edges between node pairs according to the probability of generating edges between all nodes on the graph through the relationship between the nodes, and the probability only depends on the edges between the nodes;
the negative edge sampling module is used for randomly sampling edges among nodes on the graph which do not exist according to the set negative edge probability;
and the edge combination nodes sampled by the graph edge sampling module and the negative edge sampling module construct a noise graph.
4. The multi-information aggregated graph residual generation model according to claim 1, wherein the loss function is composed of the depth initial residual graph convolution module loss and the random graph generation module loss;
the depth initial residual graph convolution module loss is the degree of inconsistency obtained by evaluating a log-likelihood cost function on the prediction label output by the depth initial residual graph convolution module and the true label of the sample;
the random graph generation module loss is composed of the loss of the two-layer perceptron module and the loss formed by the joint distribution of the graph structure, the feature information and the labels, and is used for adding perturbation to the training process of the depth initial residual graph convolution module and correcting its parameters.
5. The method for classifying the topological graph nodes by using the multi-information aggregated graph residual error generation model according to any one of claims 1 to 4, is characterized by comprising the following steps:
step S1, the node information on the original topological graph forms original characteristic input, and the original characteristic input is respectively input into the depth initial residual error graph convolution module and the random graph generation module;
step S2, extracting the characteristics of the original characteristic input through a depth initial residual error graph convolution module, classifying the types of the nodes in the original topological graph, and outputting the prediction labels of the nodes with known types;
step S3, extracting features from the original feature input through the random graph generation module, constructing a noise graph, and adding perturbation to the training process of the depth initial residual graph convolution module;
s4, training a constraint model through a supervision loss module;
and step S5, when the model achieves the best effect, reasoning the unknown class nodes and outputting the prediction labels of the unknown class nodes, namely classifying the nodes on the topological graph.
6. The method for classifying topological graph nodes by using a multi-information aggregated graph residual generation model according to claim 5, wherein the output of the k-th layer initial residual graph convolution module is calculated by the following formula:

X^(k) = ReLU((1 − α)·L̂·dropout(X^(k−1)) + α·X^(0))

wherein X^(k−1) denotes the input of the k-th layer initial residual graph convolution module and, when k = 1, is the initial residual term; α denotes the proportion parameter of the residual structure, according to which the initial residual term and the feature representation output by the current graph convolution unit are added for information fusion; X^(k) denotes the output of the k-th layer initial residual graph convolution module; dropout denotes a dropout unit; ReLU is a nonlinear activation unit; L̂ denotes the symmetric normalized Laplacian matrix; X^(0) denotes the output of the linear input module, serving as the initial residual term of the multi-layer initial residual graph convolution modules;

the symmetric normalized Laplacian matrix L̂ is the convolution kernel of the graph convolution unit and is calculated according to the following formula:

L̂ = D̃^(−1/2)·Ã·D̃^(−1/2)

wherein D is the diagonal matrix constructed from the D_j, D_j denoting the degree of node j in the original topological graph; D̃ denotes the degree matrix with self-loops; j ∈ {1, …, n} denotes a node in the original topological graph structure; A is the adjacency matrix of the original topological graph structure; Ã = A + I_n denotes the adjacency matrix with self-loops, I_n being the identity matrix; the left factor D̃^(−1/2) normalizes the feature distribution of the input matrix X as it passes through Ã, and the right factor D̃^(−1/2) completes the normalization of Ã so as to ensure information propagation.
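For illustration, a short numpy sketch of this convolution kernel, assuming a dense adjacency matrix, is:

import numpy as np

def sym_norm_laplacian(adj):
    # L_hat = D_tilde^(-1/2) (A + I_n) D_tilde^(-1/2)
    n = adj.shape[0]
    a_tilde = adj + np.eye(n)              # adjacency matrix with self-loops
    d_tilde = a_tilde.sum(axis=1)          # degree vector with self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d_tilde))
    return d_inv_sqrt @ a_tilde @ d_inv_sqrt

A = np.array([[0.0, 1.0], [1.0, 0.0]])    # two mutually linked nodes
print(sym_norm_laplacian(A))              # [[0.5, 0.5], [0.5, 0.5]]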
7. The method for classifying topological graph nodes by using a multi-information aggregated graph residual generation model according to claim 5, wherein in step S3 the method for constructing the noise graph specifically comprises:

S31, sending the original feature input X into the two-layer perceptron module of the random graph generation module, and performing a linear transformation and class partition on X to obtain the dimension-reduced representation of X;

S32, dimension-splicing the result of the linear transformation and the dimension-reduced representation from step S31 with the row index matrix and the column index matrix of the symmetric normalized Laplacian matrix, respectively, and constructing a random noise graph through the graph edge sampling probability and the negative edge sampling probability;

the graph edge sampling probability p_pos(G|X,L_k), the graph dimension-reduced representation generated by graph edge sampling, the negative edge sampling probability p_neg(G|X,L_k), and the graph dimension-reduced representation generated by negative edge sampling are computed by four formulas that appear as equation images in the original document;

wherein L_k ∈ L is the known label matrix and L is the label matrix; the two spliced index matrices are the row index matrix and the column index matrix of the symmetric normalized Laplacian matrix; Embed denotes the word-embedding operation that obtains a node's representation, and Concat denotes the dimension splicing operation; n denotes the number of nodes, λ is the negative edge sampling rate, and the negative edge samples are those collected at that rate; the edge sampling probability p_pos(e_{i,j}|X,L_k) is the probability of sampling a random edge e_{i,j} between a node pair (i,j) that truly has an edge relationship in the graph structure, while the negative edge sampling probability p_neg(e_{i,j}|X,L_k) is the probability of sampling a random edge e_{i,j} between a node pair (i,j) that has no edge relationship.
8. The method for classifying topological graph nodes by using a multi-information aggregated graph residual generation model according to claim 5, wherein the loss function in the supervision loss module in step S4 is:

ζ = μ_1·ζ_1 + μ_2·(ζ_2 − LogSigmoid(p_θ(G|X,L_k) + p_θ(L_k|X,G) + p_θ(G|X,L_k,L_m)));

where ζ denotes the total loss function of the multi-information aggregated graph residual generation model; ζ_1 denotes the loss function of the depth initial residual graph convolution module; ζ_2 denotes the loss function of the two-layer perceptron module; LogSigmoid(p_θ(G|X,L_k)), p_θ(L_k|X,G) and p_θ(G|X,L_k,L_m) are loss terms formed by the joint distribution of the graph structure, the feature information and the labels; ζ_2, LogSigmoid(p_θ(G|X,L_k)), p_θ(L_k|X,G) and p_θ(G|X,L_k,L_m) together constitute the loss function of the random graph generation module; μ_1 and μ_2 denote the hyperparameters weighting the losses of the depth initial residual graph convolution module and the random graph generation module, respectively; L_m ∈ L is the unknown label matrix; LogSigmoid denotes a nonlinear function.
9. An electronic device, characterized in that the classification of the nodes of the topology graph is implemented using the method according to any of claims 5 to 8.
10. A computer storage medium having stored therein at least one program instruction which is loaded and executed by a processor to implement the method of classification of a topology graph node according to any of claims 5 to 8.
CN202210011116.XA 2022-01-05 2022-01-05 Multi-information aggregated graph residual error generation model, classification method, electronic device and storage medium Active CN114330672B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210011116.XA CN114330672B (en) 2022-01-05 2022-01-05 Multi-information aggregated graph residual error generation model, classification method, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN114330672A true CN114330672A (en) 2022-04-12
CN114330672B CN114330672B (en) 2024-06-14

Family

ID=81024053

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210011116.XA Active CN114330672B (en) 2022-01-05 2022-01-05 Multi-information aggregated graph residual error generation model, classification method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN114330672B (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112861722A (en) * 2021-02-09 2021-05-28 中国科学院地理科学与资源研究所 Remote sensing land utilization semantic segmentation method based on semi-supervised depth map convolution
US20210248812A1 (en) * 2021-03-05 2021-08-12 University Of Electronic Science And Technology Of China Method for reconstructing a 3d object based on dynamic graph network
CN113192559A (en) * 2021-05-08 2021-07-30 中山大学 Protein-protein interaction site prediction method based on deep map convolution network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BAI Bo; LIU Yuting; MA Chicheng; WANG Guanghui; YAN Guiying; YAN Kai; ZHANG Ming; ZHOU Zhiheng: "Graph Neural Networks", SCIENTIA SINICA Mathematica, no. 03, 25 February 2020 (2020-02-25) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116319110A (en) * 2023-05-24 2023-06-23 保定思齐智科信息科技有限公司 Data acquisition and management method for industrial multi-source heterogeneous time sequence data
CN116319110B (en) * 2023-05-24 2023-08-11 保定思齐智科信息科技有限公司 Data acquisition and management method for industrial multi-source heterogeneous time sequence data

Also Published As

Publication number Publication date
CN114330672B (en) 2024-06-14


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant