CN112364983B

CN112364983B - Protein interaction network node classification method based on multichannel graph convolutional neural network

Info

Publication number: CN112364983B
Application number: CN202011260336.3A
Authority: CN
Inventors: 杨旭华; 马钢峰; 徐新黎
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2020-11-12
Filing date: 2020-11-12
Publication date: 2024-03-22
Anticipated expiration: 2040-11-12
Also published as: CN112364983A

Abstract

A protein interaction network node classification method based on a multichannel graph convolutional neural network improves classification by incorporating high-order information. A protein interaction network is constructed from protein interaction data, and a multichannel graph convolutional neural network model with a two-layer structure is built. Using different combinations of graph convolution kernels, the model performs semi-supervised classification from the data of a small number of labeled proteins and obtains the classes of the unlabeled proteins. By combining multichannel high-order neighborhood graph convolutions, the invention extracts the high-order information of the protein interaction network and improves protein classification accuracy at a lower computational cost.

Description

Protein interaction network node classification method based on multichannel graph convolutional neural network

Technical Field

The invention relates to the field of protein classification, in particular to a protein interaction network node classification method based on a multichannel graph convolutional neural network.

Background

Proteins are the material basis of life. Almost every component of the human body depends on proteins, which have long been a focus of research. Proteins typically take part in vital processes such as cellular metabolism and the regulation of gene expression through their interactions, and these interactions form protein interaction networks. A protein interaction network represents the relations between proteins as a network, which facilitates research and analysis and plays a very important role in understanding biological composition and the causes of some diseases at the molecular level.

Graph convolutional networks perform convolution on irregular, complex network data. In semi-supervised learning, graph convolution achieves good classification performance from a small labeled training set and trains quickly, so it is widely applied to network-structured data sets. However, aggregating high-order neighborhood information can over-smooth the features, so a conventional graph convolutional network can aggregate feature information from only the 2nd- to 3rd-order neighborhood. Proteins in a protein interaction network are relatively tightly related, so aggregating only low-order information is insufficient. Moreover, protein interaction network data are often large and complex. It is therefore desirable to capture higher-order neighborhood information while controlling the network depth, that is, with fewer parameters, and thereby obtain better protein classification performance.
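The over-smoothing effect described above can be illustrated with a minimal sketch. It assumes a row-normalized propagation matrix with self-loops, a toy six-node path graph, and a hypothetical `feature_spread` measure; none of these come from the patent, and they serve only to show that repeated neighborhood aggregation pulls node features together.

```python
import numpy as np

def random_walk_propagation(A):
    """Row-normalized adjacency with self-loops, D^-1 (A + I) (an assumption)."""
    A_tilde = A + np.eye(A.shape[0])
    return A_tilde / A_tilde.sum(axis=1, keepdims=True)

def feature_spread(H):
    """Largest gap between any two nodes in any single feature column."""
    return float((H.max(axis=0) - H.min(axis=0)).max())

def spread_by_order(A, orders):
    """Spread of one-hot features after aggregating t-order neighborhoods."""
    P = random_walk_propagation(A)
    X = np.eye(A.shape[0])  # one-hot initial features
    return [feature_spread(np.linalg.matrix_power(P, t) @ X) for t in orders]

# Toy path graph 0-1-2-3-4-5, purely illustrative.
n = 6
A_toy = np.zeros((n, n))
for u in range(n - 1):
    A_toy[u, u + 1] = A_toy[u + 1, u] = 1.0

ORDERS = [1, 2, 4, 8, 16, 32, 64]
SPREADS = spread_by_order(A_toy, ORDERS)
for t, s in zip(ORDERS, SPREADS):
    print(f"order {t:2d}: spread {s:.4f}")
```

The spread shrinks as the aggregation order grows, so at high orders the nodes become hard to tell apart. This is the reason a plain deep stack of graph convolutions is avoided.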

Disclosure of Invention

To address the large deviation in existing protein interaction network classification results, the invention provides a protein interaction network node classification method based on a multichannel graph convolutional neural network that achieves higher accuracy.

The technical solution adopted by the invention to solve this technical problem is as follows:

A protein interaction network node classification method based on a multichannel graph convolutional neural network comprises the following steps:

step one: constructing a protein interaction network model $G(V, E)$ from protein interaction data, where $V$ is the node set, $E$ is the edge set, and the adjacency matrix is denoted $A$; each node represents one protein, and the node set $V = \{v_1, v_2, \ldots, v_N\}$ represents the collection of proteins; if two proteins interact, an edge connects the two corresponding nodes; $N$ is the number of proteins; the initial feature vector of each protein is a one-hot vector, and the identity matrix $X$ is the combination of the initial feature vectors of all proteins; $C$ is the number of protein classes; a small fraction of the proteins have known class labels, and most of the proteins have no class label;

step two: constructing a multichannel graph convolutional neural network model with a two-layer structure, where the first layer contains $k$ channels and the $i$-order convolution kernel $\mathrm{SGC}_i$ is used on the $i$-th channel, $i \in \{1, 2, \ldots, k\}$; the second layer likewise contains $k$ convolution kernels, and the $(k+1-j)$-order convolution kernel $\mathrm{SGC}_{k+1-j}$ is used on the $j$-th channel, $j \in \{1, 2, \ldots, k\}$; the $i$-th channel of the network model consists of the $i$-th channel of the first layer and the $i$-th channel of the second layer, the output of the former being the input of the latter;

step three: computing an i-order convolution kernel

where GCN denotes a graph convolutional neural network without an activation function, and $1 \le i \le k$;

step four: computing the output of the $i$-th channel of the multichannel graph convolutional neural network model

$y(i) = \mathrm{SGC}_{k+1-i}\big(f(\mathrm{SGC}_i(X, A)),\, A\big),$

where $1 \le i \le k$ and $f$ is the ReLU function;

step five: computing the model output of the multichannel graph convolutional neural network model

where $g$ is the softmax activation function;

step six: calculating the loss value of the semi-supervised classification

where $\mu$ is the set of labeled nodes and $Y_{ij}$ denotes the class label of a labeled node;

step seven: repeating steps three to six until the loss value converges, and taking the resulting $Q$ as the classification result of the protein interaction network.
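Steps one to six can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the $i$-order kernel is taken to be $i$ propagation steps with the symmetrically normalized adjacency matrix followed by one linear map, the channel outputs are summed before the softmax, and the loss is a cross-entropy over the labeled nodes. The function names, the hidden size, the random weights and the toy graph are all hypothetical.

```python
import numpy as np

def build_adjacency(n, edges):
    """Step one: symmetric 0/1 adjacency matrix from interacting protein pairs."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    return A

def normalize_adjacency(A):
    """Assumed propagation matrix D^-1/2 (A + I) D^-1/2, as in a standard GCN."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def sgc(H, A_hat, W, order):
    """Assumed i-order kernel: `order` propagation steps, then one linear map."""
    for _ in range(order):
        H = A_hat @ H
    return H @ W

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # subtract the row max for stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def forward(X, A, first_layer, second_layer):
    """Steps four and five: channel outputs y(i), accumulated, then softmax."""
    k = len(first_layer)
    A_hat = normalize_adjacency(A)
    total = np.zeros((X.shape[0], second_layer[0].shape[1]))
    for i in range(1, k + 1):
        hidden = np.maximum(sgc(X, A_hat, first_layer[i - 1], i), 0.0)  # f = ReLU
        total += sgc(hidden, A_hat, second_layer[i - 1], k + 1 - i)     # y(i)
    return softmax(total)

def labeled_cross_entropy(Q, Y, labeled):
    """Step six (assumed form): cross-entropy summed over labeled nodes only."""
    return float(-(Y[labeled] * np.log(Q[labeled] + 1e-12)).sum())

# Toy data, purely illustrative: 5 proteins, 2 classes, k = 3 channels.
N, C, K, HIDDEN = 5, 2, 3, 4
A = build_adjacency(N, [(0, 1), (1, 2), (2, 3), (3, 4)])
X = np.eye(N)  # one-hot initial features, i.e. the identity matrix
rng = np.random.default_rng(0)
W1 = [rng.normal(scale=0.5, size=(N, HIDDEN)) for _ in range(K)]
W2 = [rng.normal(scale=0.5, size=(HIDDEN, C)) for _ in range(K)]
Y = np.zeros((N, C))
Y[0, 0] = 1.0  # protein 0 is labeled with class 0
Y[4, 1] = 1.0  # protein 4 is labeled with class 1
LABELED = [0, 4]

Q = forward(X, A, W1, W2)
LOSS = labeled_cross_entropy(Q, Y, LABELED)
print(Q.shape, LOSS)
```

The sketch stops at a single forward pass. Step seven would wrap `forward` and `labeled_cross_entropy` in a gradient-based training loop, for which an automatic differentiation library would normally be used.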

The technical concept of the invention is as follows: building on a shallow neural network, the invention aggregates high-order information while using multiple channels to combine different arrangements of convolution kernels, which effectively improves the classification performance and accuracy for proteins in the protein interaction network.

The beneficial effects of the invention are as follows: the protein interaction network is processed by combining multichannel high-order neighborhood graph convolution information, which improves protein classification accuracy at a lower computational cost.

Drawings

Fig. 1 is a schematic diagram of the neural network model. For ease of understanding, $k = 3$: the input features are fed into the different channels for convolution, the results of the two-layer graph convolution are accumulated and activated, and the output result is finally obtained.
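The pairing of kernel orders across the two layers for $k = 3$ can be listed with a short sketch. The helper name `channel_orders` is hypothetical; the pairing itself follows from the channel output $y(i)$, which applies an $i$-order kernel in the first layer and a $(k+1-i)$-order kernel in the second.

```python
def channel_orders(k):
    """Return (first-layer order, second-layer order) for channels 1..k."""
    return [(i, k + 1 - i) for i in range(1, k + 1)]

PAIRS = channel_orders(3)
print(PAIRS)  # [(1, 3), (2, 2), (3, 1)]
```

In every channel the two orders add up to $k + 1$, so all channels reach the same total neighborhood order and differ only in how that order is split around the ReLU between the two layers.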

Detailed Description

The invention is further described below with reference to the accompanying drawings.

Referring to Fig. 1, a protein interaction network node classification method based on a multichannel graph convolutional neural network includes the following steps:

step three: computing an i-order convolution kernel

$y(i) = \mathrm{SGC}_{k+1-i}\big(f(\mathrm{SGC}_i(X, A)),\, A\big),$

where $g$ is the softmax activation function, and the model is shown in Fig. 1;

step six: calculating the loss value of the semi-supervised classification

where $\mu$ is the set of labeled nodes and $Y_{ij}$ denotes the class label of a labeled node;
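One reading of the kernel, model output and loss that is consistent with the definitions above can be written in LaTeX as follows. It is an assumption, not the patent's own notation: it takes GCN without an activation function to be a single normalized propagation step followed by a linear map, so that $i$ stacked steps collapse to one matrix power, takes "accumulate and activate" to mean a sum followed by softmax, and takes the loss to be the usual cross-entropy over labeled nodes. The symbols $\hat{A}$, $\tilde{D}$, $W_i$ and $L$ are introduced here for illustration only.

```latex
% Assumed forms only; \hat{A}, \tilde{D}, W_i and L are illustrative symbols.
\begin{align}
\hat{A} &= \tilde{D}^{-1/2} (A + I)\, \tilde{D}^{-1/2},
  \qquad \tilde{D}_{uu} = \sum_{v} (A + I)_{uv} \\
\mathrm{SGC}_i(X, A) &= \hat{A}^{\,i} X W_i, \qquad 1 \le i \le k \\
Q &= g\Big( \sum_{i=1}^{k} y(i) \Big) \\
L &= -\sum_{i \in \mu} \sum_{j=1}^{C} Y_{ij} \ln Q_{ij}
\end{align}
```

In the last line $i$ ranges over the labeled nodes in $\mu$ and $j$ over the $C$ classes, matching the indices of $Y_{ij}$, whereas in the preceding lines $i$ is the channel index.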

As described above, the specific implementation steps set out in this patent make the invention clearer. Any modification or change made to the invention within its spirit and the scope of protection of the appended claims falls within the scope of protection of the invention.

Claims

1. A protein interaction network node classification method based on a multichannel graph convolutional neural network, characterized in that the method comprises the following steps:

step one: constructing a protein interaction network model $G(V, E)$ from protein interaction data, where $V$ is the node set, $E$ is the edge set, and the adjacency matrix is denoted $A$; each node represents one protein, and the node set $V = \{v_1, v_2, \ldots, v_N\}$ represents the collection of proteins; if two proteins interact, an edge connects the two corresponding nodes; $N$ is the number of proteins; the initial feature vector of each protein is a one-hot vector, and the identity matrix $X$ is the combination of the initial feature vectors of all proteins; $C$ is the number of protein classes; a small fraction of the proteins have known class labels, and most of the proteins have no class label;

step three: computing an i-order convolution kernel

$y(i) = \mathrm{SGC}_{k+1-i}\big(f(\mathrm{SGC}_i(X, A)),\, A\big),$

where $g$ is the softmax activation function;

step six: calculating the loss value of the semi-supervised classification

where $\mu$ is the set of labeled nodes and $Y_{ij}$ denotes the class label of a labeled node;