CN116108917A - Non-marked sample enhanced semi-supervised graph neural network method - Google Patents
Non-marked sample enhanced semi-supervised graph neural network method
- Publication number: CN116108917A (application CN202310140620.4A)
- Authority
- CN
- China
- Prior art keywords
- node
- graph
- view
- representation
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06N3/02 Neural networks; G06N3/08 Learning methods (computing arrangements based on biological models)
- G06V10/806 Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level (image or video recognition or understanding using pattern recognition or machine learning)
- G06V10/82 Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
Abstract
The invention discloses an unlabeled-sample-enhanced semi-supervised graph neural network method comprising the following steps: collecting original graph-structured data and preprocessing it; generating a corresponding enhanced view; maximizing the mutual information between the node representations and graph representations of the original view and the enhanced view; performing feature fusion on the node representations of the positive-sample views to obtain node representations with more comprehensive information; and training and optimizing the model by minimizing an objective function, thereby fully mining the self-supervision information of unlabeled samples and improving node classification performance. The method can fully utilize unlabeled samples, mine self-supervision information, provide additional supervision signals for the model, and improve node classification performance, and it can be widely applied to technical fields such as chemical molecular property prediction, biological protein function prediction, and social network analysis.
Description
Technical Field
The invention relates to neural network technology, and in particular to a semi-supervised graph neural network method enhanced by unlabeled samples.
Background
Graphs are a ubiquitous data structure and are widely used in many fields. For example, in electronic commerce, graph-based learning systems can make highly accurate recommendations by exploiting the interactions between users and products; in bioinformatics, proteins or enzymes can be represented as graphs and further classified according to their chemical properties.
As graph data becomes more common in real-world scenarios, learning effective representations of graph data becomes increasingly important. With the successful application of deep learning to graph data, graph neural networks (GNNs) have become the most popular graph representation learning technique and have attracted extensive research interest in recent years.
However, many GNN models are still built in a supervised manner and are constrained by the semi-supervised learning framework: their supervision mainly comes from a small amount of labeled data, so unlabeled data cannot be fully exploited and the supervision signal is incomplete. Therefore, fully utilizing unlabeled samples to mine self-supervision signals is important for improving the classification performance of GNNs.
Currently, methods based on self-supervised graph contrastive learning are a viable approach and achieve good results in extracting supervision information from graph data. However, because they do not incorporate the valuable label information available in semi-supervised learning, contrastive learning cannot be directly applied to semi-supervised learning.
Disclosure of Invention
To solve the above technical problems, the invention provides a semi-supervised graph neural network method enhanced by unlabeled samples, which can fully utilize unlabeled samples to mine self-supervision information, provide additional supervision signals for the model, and improve node classification performance, and which can be widely applied to technical fields such as chemical molecular property prediction, biological protein function prediction, and social network analysis.
To achieve the above purpose, the present invention provides an unlabeled-sample-enhanced semi-supervised graph neural network method comprising the following steps:
s1, collecting original graph structure data, and preprocessing a data set;
s2, generating a corresponding enhanced view for the collected graph data by using a graph diffusion method;
s3, inputting the original view and the enhanced view into a parameter-sharing graph neural network encoder together, learning respective node representation and graph representation, and adopting a cross-view contrast learning strategy to maximize mutual information between the node representation and the graph representation of the original view and the enhanced view;
s4, performing feature fusion on the node representations of the positive-sample views to obtain node representations with more comprehensive information, which are used for the node classification task;
s5, training and optimizing the model by minimizing an objective function, and fully mining self-supervision information of the unmarked sample to improve node classification performance.
Preferably, in step S1, the preprocessing mainly includes dividing a training set, a verification set, a test set, and normalization processing of an adjacency matrix and a feature matrix;
and the graph dataset used is denoted G = (V, E), where V is the set of nodes and E is the set of edges, with |V| = n being the number of nodes and |E| = m the number of edges; the adjacency matrix is A ∈ R^{n×n}; the node feature matrix carrying the node attributes is denoted X ∈ R^{n×d}, where d represents the dimension of the node features.
Preferably, in step S2, the original graph data is subjected to data enhancement by using a graph diffusion method to obtain an enhanced view;
the graph diffusion method is PPR or Heat Kernel.
Preferably, in step S2, considering that graph contrastive methods rely on contrasting node representations across different views, and that different datasets have different feature dimensions and numbers of nodes, the enhancement is performed with the graph diffusion method: the adjacency matrix is converted into a diffusion matrix, the two matrices are regarded as two congruent views of the same graph structure, and the diffusion matrix is taken as the enhanced view;
wherein the generalized graph diffusion is defined as:
S = Σ_{k=0}^{∞} θ_k T^k ∈ R^{n×n}
where θ_k is the weighting coefficient that determines the ratio of global to local information, and T ∈ R^{n×n} is the generalized transition matrix;
PPR graph diffusion:
PPR sets the parameters to T = AD^{-1} and θ_k = α(1-α)^k, where α is the transmission probability with value range α ∈ [0,1];
alternatively, Heat Kernel diffusion is used; the closed-form diffusion matrices of the two schemes are:
S^{PPR} = α(I_n - (1-α)D^{-1/2}AD^{-1/2})^{-1}
S^{heat} = exp(tAD^{-1} - t), where t is the diffusion time.
preferably, in step S3, a graph representation is obtained through a Pooling function for subsequent cross-view contrast learning.
Preferably, in step S3, mutual information between the node representation of one view and the graph representation of the other view is maximized using cross-view contrast learning, and the model is optimized by maximizing the probability scores calculated by the discriminator D(·,·).
Preferably, the step S3 specifically includes the following steps:
S31, randomly shuffling the feature matrix X of the original graph G with a random corruption function C to obtain a negative-sample feature matrix X̃;
S32, respectively inputting the two views into the parameter-sharing encoder to finally obtain the node representations H^α, H^β of the positive-sample views and the node representations H̃^α, H̃^β of the negative-sample views, where α and β denote the original view and the enhanced view, respectively;
S33, modeling the mutual information with a discriminator D(·,·); the mutual information maximization module is written as D(h_i^α, s^β), which scores the representation h_i^α of each node in one view against the graph representation s^β of the other view; the greater this value, the higher the similarity between the node representation and the graph representation, and vice versa;
S34, calculating the discriminator score by applying a bilinear scoring function:
D(h_i^α, s^β) = σ((h_i^α)^T W s^β)
where W is a learnable scoring matrix and σ is the Sigmoid nonlinear activation function used to convert the consistency score into a probability value;
S35, adopting a standard binary cross-entropy loss function:
L_MI = -(1/(2n)) Σ_{i=1}^{n} [ log D(h_i^α, s^β) + log D(h_i^β, s^α) + log(1 - D(h̃_i^α, s^β)) + log(1 - D(h̃_i^β, s^α)) ]
where E_G = {H^α, H^β} denotes the positive samples and E_G̃ = {H̃^α, H̃^β} the negative samples; by minimizing this loss, the mutual information between the two views is effectively maximized, so that the supervision signals in the graph data are mined and the lack of supervision for the model is alleviated.
Preferably, in step S4, a feature fusion mechanism is used to fuse the node representations of the positive-sample views by matrix addition or concatenation, and the classification loss of the labeled nodes is optimized with a cross-entropy function.
Preferably, the step S4 specifically includes the following steps:
S41, the feature fusion process is as follows:
H^δ = H^α + H^β  or  H^δ = H^α || H^β
where H^δ is obtained by fusing the positive-sample node representations H^α and H^β; each row vector h_i^δ ∈ R^C, where C represents the number of categories;
S42, converting each element into a probability value by using the Softmax function:
S_{i,c} = exp(h_{i,c}^δ) / Σ_{j=1}^{C} exp(h_{i,j}^δ)
for each node, the above formula gives a predicted probability value for each class (S_1, S_2, S_3, ..., S_C), and the class index with the highest score represents the label predicted by the model;
S43, optimizing the cross-entropy loss function:
L_CE = - Σ_{i∈Y_L} Σ_{c=1}^{C} y_{i,c} ln S_{i,c}
where y_{i,c} is the binary indicator of the class label of node i and Y_L denotes the set of labeled nodes.
Preferably, in step S5, cross-view contrast learning and feature fusion are combined to obtain the final objective function, and the model is optimized by minimizing this loss function.
The beneficial effects of the invention are as follows:
1. A self-consistency constraint term is proposed: by maximizing the mutual information between the two views, self-supervision information is extracted from a large number of unlabeled nodes, so that the structural information and feature information of the unlabeled data can be effectively utilized and supervision signals can be fully mined.
2. A feature fusion mechanism is proposed: comprehensive node embeddings are obtained by fusing the node representations of the two positive views, which helps to fully extract supervision information from the few labeled samples, helps the model learn informative and comprehensive node representations, effectively improves classification performance, and avoids the loss of label information.
3. Combining the self-consistency constraint term with the feature fusion mechanism allows supervision signals to be mined from labeled and unlabeled data simultaneously, rather than relying only on a small amount of labeled data; the method can be used in scenarios such as graph data analysis and graph representation learning, and can help people analyze and use graph data better.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a block diagram of an algorithm of the present invention;
FIG. 3 is a visualization of the node representations in the experimental example of the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings, and it should be noted that, while the present embodiment provides a detailed implementation and a specific operation process on the premise of the present technical solution, the protection scope of the present invention is not limited to the present embodiment.
A label-free sample-enhanced semi-supervised graph neural network method, comprising the steps of:
s1, collecting original graph structure data, and preprocessing a data set;
preferably, in step S1, the preprocessing mainly includes dividing a training set, a verification set, a test set, and normalization processing of an adjacency matrix and a feature matrix;
and the graph dataset used is denoted G = (V, E), where V is the set of nodes and E is the set of edges, with |V| = n being the number of nodes and |E| = m the number of edges; the adjacency matrix is A ∈ R^{n×n}; the node feature matrix carrying the node attributes is denoted X ∈ R^{n×d}, where d represents the dimension of the node features.
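For illustration only, the following is a minimal Python sketch of the preprocessing in step S1, assuming dense matrices and the symmetric normalization D^{-1/2}(A+I)D^{-1/2} commonly used before GCN-style encoders; the embodiment does not prescribe a specific normalization scheme.

```python
import numpy as np

def normalize_adjacency(A: np.ndarray) -> np.ndarray:
    """Symmetrically normalize a dense adjacency matrix with self-loops."""
    A_hat = A + np.eye(A.shape[0])             # add self-loops
    d_inv_sqrt = np.power(A_hat.sum(axis=1), -0.5)
    D_inv_sqrt = np.diag(d_inv_sqrt)
    return D_inv_sqrt @ A_hat @ D_inv_sqrt     # D^{-1/2} (A + I) D^{-1/2}

def normalize_features(X: np.ndarray) -> np.ndarray:
    """Row-normalize the node feature matrix X of shape (n, d)."""
    row_sum = X.sum(axis=1, keepdims=True)
    row_sum[row_sum == 0.0] = 1.0              # avoid division by zero for empty rows
    return X / row_sum
```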
S2, generating a corresponding enhanced view for the collected graph data by using a graph diffusion method;
Preferably, in step S2, the original graph data is subjected to data enhancement by using a graph diffusion method to obtain an enhanced view;
the graph diffusion method is PPR or Heat Kernel.
Preferably, in step S2, considering that graph contrastive methods rely on contrasting node representations across different views, and that different datasets have different feature dimensions and numbers of nodes, the enhancement is performed with the graph diffusion method: the adjacency matrix is converted into a diffusion matrix, the two matrices are regarded as two congruent views of the same graph structure, and the diffusion matrix is taken as the enhanced view;
wherein the generalized graph diffusion is defined as:
S = Σ_{k=0}^{∞} θ_k T^k ∈ R^{n×n}
where θ_k is the weighting coefficient that determines the ratio of global to local information, and T ∈ R^{n×n} is the generalized transition matrix;
PPR graph diffusion:
PPR sets the parameters to T = AD^{-1} and θ_k = α(1-α)^k, where α is the transmission probability with value range α ∈ [0,1];
alternatively, Heat Kernel diffusion is used; the closed-form diffusion matrices of the two schemes are:
S^{PPR} = α(I_n - (1-α)D^{-1/2}AD^{-1/2})^{-1}
S^{heat} = exp(tAD^{-1} - t), where t is the diffusion time.
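As an illustration, the two diffusion options named above can be computed directly from their closed forms; the following Python sketch assumes dense matrices, default values for α and t, and uses scipy.linalg.expm for the matrix exponential.

```python
import numpy as np
from scipy.linalg import expm

def ppr_diffusion(A: np.ndarray, alpha: float = 0.2) -> np.ndarray:
    """S_PPR = alpha * (I - (1 - alpha) * D^{-1/2} A D^{-1/2})^{-1}."""
    n = A.shape[0]
    d_inv_sqrt = np.diag(np.power(np.maximum(A.sum(axis=1), 1e-12), -0.5))
    A_sym = d_inv_sqrt @ A @ d_inv_sqrt
    return alpha * np.linalg.inv(np.eye(n) - (1.0 - alpha) * A_sym)

def heat_diffusion(A: np.ndarray, t: float = 5.0) -> np.ndarray:
    """S_heat = exp(t * A D^{-1} - t) = expm(t * (A D^{-1} - I))."""
    n = A.shape[0]
    d_inv = np.diag(1.0 / np.maximum(A.sum(axis=1), 1e-12))
    return expm(t * (A @ d_inv - np.eye(n)))
```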
s3, jointly inputting the original view and the enhanced view into a parameter-sharing graph neural network encoder, learning the node representations and graph representations of each view, and adopting a cross-view contrast learning strategy to maximize the mutual information between the node representations (local features) and the graph representations (global features) of the original view and the enhanced view;
preferably, in step S3, a graph representation is obtained through a Pooling function for subsequent cross-view contrast learning.
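A minimal sketch of one possible encoder and readout follows, assuming a single-layer GCN-style encoder and mean pooling as the Pooling function; the patent leaves the concrete encoder architecture open, so this is only one way to realize it.

```python
import torch
import torch.nn as nn

class GCNEncoder(nn.Module):
    """One-layer GCN-style encoder shared (parameter sharing) across both views."""
    def __init__(self, in_dim: int, hid_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, hid_dim, bias=False)
        self.act = nn.PReLU()

    def forward(self, X: torch.Tensor, P: torch.Tensor) -> torch.Tensor:
        # P is the propagation matrix of the view: the normalized adjacency
        # for the original view or the diffusion matrix for the enhanced view.
        return self.act(P @ self.lin(X))

def readout(H: torch.Tensor) -> torch.Tensor:
    """Mean-pool the node representations into a graph representation s."""
    return torch.sigmoid(H.mean(dim=0))
```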
Preferably, in step S3, mutual information between the node representation of one view and the graph representation of the other view is maximized using cross-view contrast learning, and the model is optimized by maximizing the probability scores calculated by the discriminator D(·,·).
Preferably, the step S3 specifically includes the following steps:
S31, randomly shuffling the feature matrix X of the original graph G with a random corruption function C to obtain a negative-sample feature matrix X̃;
S32, respectively inputting the two views into the parameter-sharing encoder to finally obtain the node representations H^α, H^β of the positive-sample views and the node representations H̃^α, H̃^β of the negative-sample views, where α and β denote the original view and the enhanced view, respectively;
S33, modeling the mutual information with a discriminator D(·,·); the mutual information maximization module is written as D(h_i^α, s^β), which scores the representation h_i^α of each node in one view against the graph representation s^β of the other view; the greater this value, the higher the similarity between the node representation and the graph representation, and vice versa;
S34, calculating the discriminator score by applying a bilinear scoring function:
D(h_i^α, s^β) = σ((h_i^α)^T W s^β)
where W is a learnable scoring matrix and σ is the Sigmoid nonlinear activation function used to convert the consistency score into a probability value;
S35, adopting a standard binary cross-entropy loss function:
L_MI = -(1/(2n)) Σ_{i=1}^{n} [ log D(h_i^α, s^β) + log D(h_i^β, s^α) + log(1 - D(h̃_i^α, s^β)) + log(1 - D(h̃_i^β, s^α)) ]
where E_G = {H^α, H^β} denotes the positive samples and E_G̃ = {H̃^α, H̃^β} the negative samples; by minimizing this loss, the mutual information between the two views is effectively maximized, so that the supervision signals in the graph data are mined and the lack of supervision for the model is alleviated.
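The following PyTorch sketch walks through steps S31-S35 under the assumptions of the encoder and readout sketched above: the negative sample is obtained by row-shuffling the features, a bilinear discriminator scores node representations of one view against the graph representation of the other, and a binary cross-entropy loss is minimized. The equal weighting of the four terms is an assumption rather than something the patent specifies.

```python
import torch
import torch.nn as nn

def encode_views(encoder, X, A_norm, S_diff):
    """S31-S32: encode the positive views and a feature-shuffled negative sample."""
    H_a, H_b = encoder(X, A_norm), encoder(X, S_diff)        # positive node representations
    s_a, s_b = readout(H_a), readout(H_b)                    # graph representations
    X_neg = X[torch.randperm(X.size(0))]                     # corrupt: shuffle rows of X
    H_a_neg, H_b_neg = encoder(X_neg, A_norm), encoder(X_neg, S_diff)
    return H_a, H_b, s_a, s_b, H_a_neg, H_b_neg

class BilinearDiscriminator(nn.Module):
    """S33-S34: D(h_i, s) = sigmoid(h_i^T W s) with a learnable matrix W."""
    def __init__(self, dim: int):
        super().__init__()
        self.W = nn.Parameter(torch.empty(dim, dim))
        nn.init.xavier_uniform_(self.W)

    def forward(self, H: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(H @ self.W @ s)                 # one probability score per node

def mi_loss(disc, H_a, H_b, s_a, s_b, H_a_neg, H_b_neg) -> torch.Tensor:
    """S35: binary cross-entropy over cross-view positive and negative pairs."""
    bce = nn.BCELoss()
    pos = [disc(H_a, s_b), disc(H_b, s_a)]                   # true (view, summary) pairs
    neg = [disc(H_a_neg, s_b), disc(H_b_neg, s_a)]           # corrupted pairs
    return (sum(bce(p, torch.ones_like(p)) for p in pos)
            + sum(bce(q, torch.zeros_like(q)) for q in neg)) / 4.0
```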
S4, performing feature fusion on the node representations of the positive-sample views to obtain node representations with more comprehensive information, which are used for the node classification task;
Preferably, in step S4, a feature fusion mechanism is used to fuse the node representations of the positive-sample views by matrix addition or concatenation, and the classification loss of the labeled nodes is optimized with a cross-entropy function.
Preferably, the step S4 specifically includes the following steps:
S41, feature fusion is performed by adding or concatenating the node representation matrices, which avoids the loss of feature information, yields comprehensive node representations, and improves subsequent node classification performance; the feature fusion process is:
H^δ = H^α + H^β  or  H^δ = H^α || H^β
where H^δ is obtained by fusing the positive-sample node representations H^α and H^β; each row vector h_i^δ ∈ R^C, where C represents the number of categories;
S42, converting each element into a probability value by using the Softmax function:
S_{i,c} = exp(h_{i,c}^δ) / Σ_{j=1}^{C} exp(h_{i,j}^δ)
for each node, the above formula gives a predicted probability value for each class (S_1, S_2, S_3, ..., S_C), and the class index with the highest score represents the label predicted by the model;
S43, optimizing the cross-entropy loss function:
L_CE = - Σ_{i∈Y_L} Σ_{c=1}^{C} y_{i,c} ln S_{i,c}
where y_{i,c} is the binary indicator of the class label of node i and Y_L denotes the set of labeled nodes. In a general classification task, cross-entropy loss is a well-behaved standard loss function, so it is used in this embodiment to compute the classification loss of H^δ.
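A sketch of S41-S43 in PyTorch: the positive-view node representations are fused by addition (concatenation is the alternative), mapped to C class scores, and trained with cross-entropy on the labelled nodes only. The linear head mapping the fused representation to the C class scores is an assumption, and F.cross_entropy applies the Softmax internally.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionClassifier(nn.Module):
    """Fuse the two positive-view node representations and predict C classes."""
    def __init__(self, hid_dim: int, num_classes: int, mode: str = "add"):
        super().__init__()
        self.mode = mode
        in_dim = hid_dim if mode == "add" else 2 * hid_dim
        self.head = nn.Linear(in_dim, num_classes)

    def forward(self, H_a: torch.Tensor, H_b: torch.Tensor) -> torch.Tensor:
        H_d = H_a + H_b if self.mode == "add" else torch.cat([H_a, H_b], dim=1)
        return self.head(H_d)                  # logits Z of shape (n, C)

def classification_loss(logits: torch.Tensor, labels: torch.Tensor,
                        train_mask: torch.Tensor) -> torch.Tensor:
    """Cross-entropy over labelled training nodes (Softmax applied internally)."""
    return F.cross_entropy(logits[train_mask], labels[train_mask])
```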
S5, training and optimizing the model by minimizing an objective function, and fully mining self-supervision information of the unmarked sample to improve node classification performance.
Preferably, in step S5, cross-view contrast learning and feature fusion are combined to obtain the final objective function, and the model is optimized by minimizing this loss function.
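Putting the pieces together, a single training step might look like the sketch below, which reuses the helper functions sketched above and assumes the final objective is a weighted sum L = L_CE + λ·L_MI; the weight λ is a hypothetical hyperparameter, since the patent only states that the two losses are combined.

```python
import torch

def train_step(encoder, disc, classifier, optimizer,
               X, A_norm, S_diff, labels, train_mask, lam: float = 1.0) -> float:
    """One optimization step over the combined objective (sketch only)."""
    optimizer.zero_grad()
    H_a, H_b, s_a, s_b, H_a_neg, H_b_neg = encode_views(encoder, X, A_norm, S_diff)
    loss_mi = mi_loss(disc, H_a, H_b, s_a, s_b, H_a_neg, H_b_neg)   # self-consistency term
    logits = classifier(H_a, H_b)                                    # fused classification
    loss_ce = classification_loss(logits, labels, train_mask)
    loss = loss_ce + lam * loss_mi
    loss.backward()
    optimizer.step()
    return loss.item()
```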
Experimental example
The experiments were conducted on six graph datasets, including three citation network datasets, one social network dataset, one disease dataset, and one commodity purchase graph dataset.
Table 1 is a table of statistical information of the number of nodes, the number of edges, and the number of categories of the dataset
For the dataset division in the experiments, 20 labeled nodes per class are selected as the training set according to the number of classes of each dataset, and 500 and 1000 nodes are selected as the validation set and the test set, respectively. Since the Disease dataset has only 1044 nodes, no validation set is used for it, and it is divided into a training set and a test set according to the above strategy. In addition, the experiments further evaluate the performance of the model under different label rates (5%, 10%, 15%, 20%) on the three citation network datasets (Cora, Citeseer, Pubmed).
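For reference, the split described above could be constructed as in the following sketch (an assumed helper, not taken from the patent), which draws 20 labeled nodes per class for training and then 500 validation and 1000 test nodes from the remainder.

```python
import numpy as np

def per_class_split(labels: np.ndarray, per_class: int = 20,
                    n_val: int = 500, n_test: int = 1000, seed: int = 0):
    """Return train/val/test index arrays; assumes each class has >= per_class nodes."""
    rng = np.random.default_rng(seed)
    train_idx = np.concatenate([
        rng.choice(np.flatnonzero(labels == c), size=per_class, replace=False)
        for c in np.unique(labels)
    ])
    rest = rng.permutation(np.setdiff1d(np.arange(labels.shape[0]), train_idx))
    return train_idx, rest[:n_val], rest[n_val:n_val + n_test]
```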
The method of the invention was compared in experiments with the following methods:
deep walk: the method uses random walk to obtain context information, and uses a lattice algorithm to learn a network embedding method of network representation.
ChebNet: a graph-roll network based method uses ChebNet filters.
GCN: the method is a spectrum-based graph-rolling network model that learns node representations by aggregating information from neighbors.
TAGCN: the method is a novel graph convolution network defined in the vertex domain, which designs a set of learner filters of fixed size to convolve the graph.
GraphSAGE: the method is a generic generalized framework that instead of training a different representation vector for each node, trains a set of aggregation functions that generate node representations for previously unseen data.
GAT: the method provides an attention mechanism for allocating different weights to neighbor nodes for feature aggregation, considers the importance of neighbors and improves the aggregation process.
MixHop: the method is based on a graph-rolling network that can learn these relationships by repeatedly blending feature representations of neighbors of different distances.
ARMA: inspired by an autoregressive moving average (Auto-Regressive Moving Average, ARMA) filter, the method proposes a new graph roll stacking, improves training efficiency, and enhances modeling capability of GNN.
LGCN: the method is a new lorentz graph convolution network (Lorentzian Graph Convolutional Network, LGCN) that designs a unified graph manipulation framework on hyperbolic models of hyperbolic spaces.
Table 2 is a table showing the node classification accuracy results on the six datasets for the experimental example
The best-performing algorithm is SCGNN (the method of the invention). SCGNN consistently outperforms all comparison baselines on five of the datasets, indicating that the method is effective for classification. SCGNN achieves a clear improvement over the best baseline, especially on the Disease and BlogCatalog datasets, with 85.3% vs. 79.5% and 81.4% vs. 79.2%, respectively. Furthermore, on the Amazon Computers dataset, the performance of SCGNN is also improved by 1.6% over the best baseline, demonstrating the effectiveness of the invention.
Table 3 is a table of node classification results for different label rates
As shown in Table 3, SCGNN (the invention) performs best across all datasets and label rates compared with all baselines. In particular, in terms of accuracy, SCGNN improves over the best baseline by 1.1% and 1.8% on Cora and Citeseer, respectively. In terms of F1 score, SCGNN performs 2% better than the best baseline on the Pubmed dataset. In addition, the method always performs better than GCN on all datasets, because SCGNN can extract more supervision information from the graph data than GCN, which demonstrates the effectiveness of the feature fusion mechanism and the self-consistency constraint. Moreover, the results in the table show that the performance of SCGNN improves markedly as the label rate increases, indicating that the method has a promising application prospect in semi-supervised node classification tasks, especially when labeled data is difficult to obtain.
As can also be seen from FIG. 3, the visualization of DeepWalk is the worst on both the Cora and Pubmed datasets, since samples of different classes are mixed together without clear boundaries. The GCN visualization is improved but still not ideal; on the Cora data in particular the classification result is very confusing. Compared with GCN, the visualization of GAT is slightly better, but the boundaries between different classes remain blurred. Clearly, the method of the invention yields the best visualization: the learned representations are more compact, the intra-class similarity is highest, and the boundaries between different classes are clear, indicating that the method helps extract more comprehensive node representations for downstream tasks.
Therefore, the unlabeled-sample-enhanced semi-supervised graph neural network method can fully utilize unlabeled samples, mine self-supervision information, provide additional supervision signals for the model, and improve node classification performance, and it can be widely applied to technical fields such as chemical molecular property prediction, biological protein function prediction, and social network analysis.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that the technical solution of the invention may be modified or equivalently replaced without departing from the spirit and scope of the technical solution of the invention.
Claims (10)
1. The unmarked sample enhanced semi-supervised graph neural network method is characterized in that: the method comprises the following steps:
s1, collecting original graph structure data, and preprocessing a data set;
s2, generating a corresponding enhanced view for the collected graph data by using a graph diffusion method;
s3, inputting the original view and the enhanced view into a parameter-sharing graph neural network encoder together, learning respective node representation and graph representation, and adopting a cross-view contrast learning strategy to maximize mutual information between the node representation and the graph representation of the original view and the enhanced view;
s4, performing feature fusion on the node representations of the positive-sample views to obtain node representations with more comprehensive information, which are used for the node classification task;
s5, training and optimizing the model by minimizing an objective function, and fully mining self-supervision information of the unmarked sample to improve node classification performance.
2. A method of a label-free sample-enhanced semi-supervised graph neural network as recited in claim 1, wherein: in step S1, preprocessing mainly comprises dividing a training set, a verification set, a test set, and normalization processing of an adjacent matrix and a feature matrix;
and the graph dataset used is denoted G = (V, E), where V is the set of nodes and E is the set of edges, with |V| = n being the number of nodes and |E| = m the number of edges; the adjacency matrix is A ∈ R^{n×n}; the node feature matrix carrying the node attributes is denoted X ∈ R^{n×d}, where d represents the dimension of the node features.
3. A method of a label-free sample-enhanced semi-supervised graph neural network as recited in claim 1, wherein: in step S2, data enhancement is performed on the original graph data by using a graph diffusion method to obtain an enhanced view;
the graph diffusion method is PPR or Heat Kernel.
4. A method of a label-free sample-enhanced semi-supervised graph neural network as recited in claim 3, wherein: in step S2, considering that graph contrastive methods rely on contrasting node representations across different views, and that different datasets have different feature dimensions and numbers of nodes, the enhancement is performed with the graph diffusion method: the adjacency matrix is converted into a diffusion matrix, the two matrices are regarded as two congruent views of the same graph structure, and the diffusion matrix is taken as the enhanced view;
wherein the generalized graph diffusion is defined as:
S = Σ_{k=0}^{∞} θ_k T^k ∈ R^{n×n}
where θ_k is the weighting coefficient that determines the ratio of global to local information, and T ∈ R^{n×n} is the generalized transition matrix;
PPR graph diffusion:
PPR sets the parameters to T = AD^{-1} and θ_k = α(1-α)^k, where α is the transmission probability with value range α ∈ [0,1];
alternatively, Heat Kernel diffusion is used; the closed-form diffusion matrices of the two schemes are:
S^{PPR} = α(I_n - (1-α)D^{-1/2}AD^{-1/2})^{-1}
S^{heat} = exp(tAD^{-1} - t), where t is the diffusion time.
5. a method of a label-free sample-enhanced semi-supervised graph neural network as recited in claim 1, wherein: in step S3, a graph representation is obtained by a Pooling function for subsequent cross-view contrast learning.
6. The method of unmarked sample enhanced semi-supervised graph neural network as set forth in claim 5, wherein: in step S3, mutual information between the node representation of one view and the graph representation of the other view is maximized using cross-view contrast learning, and the model is optimized by maximizing the probability scores calculated by the discriminator D(·,·).
7. The method of unmarked sample enhanced semi-supervised graph neural network as set forth in claim 6, wherein: the step S3 specifically comprises the following steps:
S31, randomly shuffling the feature matrix X of the original graph G with a random corruption function C to obtain a negative-sample feature matrix X̃;
S32, respectively inputting the two views into the parameter-sharing encoder to finally obtain the node representations H^α, H^β of the positive-sample views and the node representations H̃^α, H̃^β of the negative-sample views, where α and β denote the original view and the enhanced view, respectively;
S33, modeling the mutual information with a discriminator D(·,·); the mutual information maximization module is written as D(h_i^α, s^β), which scores the representation h_i^α of each node in one view against the graph representation s^β of the other view; the greater this value, the higher the similarity between the node representation and the graph representation, and vice versa;
S34, calculating the discriminator score by applying a bilinear scoring function:
D(h_i^α, s^β) = σ((h_i^α)^T W s^β)
where W is a learnable scoring matrix and σ is the Sigmoid nonlinear activation function used to convert the consistency score into a probability value;
S35, adopting a standard binary cross-entropy loss function:
L_MI = -(1/(2n)) Σ_{i=1}^{n} [ log D(h_i^α, s^β) + log D(h_i^β, s^α) + log(1 - D(h̃_i^α, s^β)) + log(1 - D(h̃_i^β, s^α)) ].
8. A method of a label-free sample-enhanced semi-supervised graph neural network as recited in claim 1, wherein: in step S4, a feature fusion mechanism is used to fuse the node representations of the positive-sample views by matrix addition or concatenation, and a cross-entropy function is used to optimize the classification loss of the labeled nodes.
9. The label-free sample-enhanced semi-supervised graph neural network method of claim 8, wherein: the step S4 specifically comprises the following steps:
S41, the feature fusion process is as follows:
H^δ = H^α + H^β  or  H^δ = H^α || H^β
where H^δ is obtained by fusing the positive-sample node representations H^α and H^β; each row vector h_i^δ ∈ R^C, where C represents the number of categories;
S42, converting each element into a probability value by using the Softmax function:
S_{i,c} = exp(h_{i,c}^δ) / Σ_{j=1}^{C} exp(h_{i,j}^δ)
for each node, the above formula gives a predicted probability value for each class (S_1, S_2, S_3, ..., S_C), and the class index with the highest score represents the label predicted by the model;
S43, optimizing the cross-entropy loss function:
L_CE = - Σ_{i∈Y_L} Σ_{c=1}^{C} y_{i,c} ln S_{i,c}
where y_{i,c} is the binary indicator of the class label of node i and Y_L denotes the set of labeled nodes.
10. A method of a label-free sample-enhanced semi-supervised graph neural network as recited in claim 1, wherein: in step S5, cross-view contrast learning and feature fusion are combined to obtain the final objective function, and the model is optimized by minimizing this loss function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310140620.4A CN116108917B (en) | 2023-02-21 | 2023-02-21 | Non-marked sample enhanced semi-supervised graph neural network method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310140620.4A CN116108917B (en) | 2023-02-21 | 2023-02-21 | Non-marked sample enhanced semi-supervised graph neural network method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116108917A (en) | 2023-05-12
CN116108917B (en) | 2023-08-29
Family
ID=86254104
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310140620.4A Active CN116108917B (en) | 2023-02-21 | 2023-02-21 | Non-marked sample enhanced semi-supervised graph neural network method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116108917B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111859010A (en) * | 2020-07-10 | 2020-10-30 | 浙江树人学院(浙江树人大学) | Semi-supervised audio event identification method based on depth mutual information maximization |
CN113255734A (en) * | 2021-04-29 | 2021-08-13 | 浙江工业大学 | Depression classification method based on self-supervision learning and transfer learning |
CN113627463A (en) * | 2021-06-24 | 2021-11-09 | 浙江师范大学 | Citation network diagram representation learning system and method based on multi-view comparison learning |
CN115410026A (en) * | 2022-07-14 | 2022-11-29 | 扬州大学 | Image classification method and system based on label propagation contrast semi-supervised learning |
CN115526246A (en) * | 2022-09-21 | 2022-12-27 | 吉林大学 | Self-supervision molecular classification method based on deep learning model |
Non-Patent Citations (2)
Title |
---|
YANBEI LIU et al.: "Self-consistent Graph Neural Networks for Semi-supervised Node Classification", JOURNAL OF LATEX CLASS FILES, vol. 14, no. 8, pages 3-6 *
XU Shuliang: "Research on Unsupervised Representation Learning Methods for Complex Data and Their Applications", Wanfang Data Knowledge Service Platform *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116737934A (en) * | 2023-06-20 | 2023-09-12 | 合肥工业大学 | Naval false comment detection algorithm based on semi-supervised graph neural network |
CN116737934B (en) * | 2023-06-20 | 2024-03-22 | 合肥工业大学 | Naval false comment detection algorithm based on semi-supervised graph neural network |
CN117633699A (en) * | 2023-11-24 | 2024-03-01 | 成都理工大学 | Network node classification algorithm based on ternary mutual information graph comparison learning |
CN117633699B (en) * | 2023-11-24 | 2024-06-07 | 成都理工大学 | Network node classification algorithm based on ternary mutual information graph comparison learning |
CN117593215A (en) * | 2024-01-19 | 2024-02-23 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Large-scale vision pre-training method and system for generating model enhancement |
CN117593215B (en) * | 2024-01-19 | 2024-03-29 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | Large-scale vision pre-training method and system for generating model enhancement |
CN117933341A (en) * | 2024-03-20 | 2024-04-26 | 东北大学 | Graphic neural network method based on homography enhancement |
CN117933341B (en) * | 2024-03-20 | 2024-07-23 | 东北大学 | Graphic neural network method based on homography enhancement |
Also Published As
Publication number | Publication date |
---|---|
CN116108917B (en) | 2023-08-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN116108917B (en) | Non-marked sample enhanced semi-supervised graph neural network method | |
Liu et al. | Visual listening in: Extracting brand image portrayed on social media | |
Verdonck et al. | Special issue on feature engineering editorial | |
CN111950594B (en) | Unsupervised graph representation learning method and device on large-scale attribute graph based on sub-sampling | |
CN109636658B (en) | Graph convolution-based social network alignment method | |
Liu et al. | Improved deep belief networks and multi-feature fusion for leaf identification | |
Li et al. | Improving convolutional neural network for text classification by recursive data pruning | |
CN113065974B (en) | Link prediction method based on dynamic network representation learning | |
CN113947161B (en) | Attention mechanism-based multi-label text classification method and system | |
Yang et al. | Triplet Enhanced AutoEncoder: Model-free Discriminative Network Embedding. | |
Qian et al. | A survey on multi-label feature selection from perspectives of label fusion | |
CN112800229A (en) | Knowledge graph embedding-based semi-supervised aspect-level emotion analysis method for case-involved field | |
CN112784118A (en) | Community discovery method and device in graph sensitive to triangle structure | |
CN115631008A (en) | Commodity recommendation method, commodity recommendation device, commodity recommendation equipment and commodity recommendation medium | |
CN115687760A (en) | User learning interest label prediction method based on graph neural network | |
Li et al. | A novel method for credit scoring based on feature transformation and ensemble model | |
CN116467666A (en) | Graph anomaly detection method and system based on integrated learning and active learning | |
CN115310589A (en) | Group identification method and system based on depth map self-supervision learning | |
Wang et al. | R2-trans: Fine-grained visual categorization with redundancy reduction | |
Cao et al. | Unsupervised trademark retrieval method based on attention mechanism | |
Baptista et al. | Zoo guide to network embedding | |
Liu | E‐Commerce Precision Marketing Model Based on Convolutional Neural Network | |
Poelmans et al. | Text mining with emergent self organizing maps and multi-dimensional scaling: A comparative study on domestic violence | |
Wang et al. | Study on library management system based on data mining and clustering algorithm | |
CN114265954B (en) | Graph representation learning method based on position and structure information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |