CN112364983A - Protein interaction network node classification method based on multichannel graph convolutional neural network

Info

Publication number: CN112364983A
Application number: CN202011260336.3A
Authority: CN
Inventors: 杨旭华; 马钢峰; 徐新黎
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2020-11-12
Filing date: 2020-11-12
Publication date: 2021-02-12
Anticipated expiration: 2040-11-12
Also published as: CN112364983B

Abstract

A protein interaction network node classification method based on a multi-channel graph convolutional neural network improves classification performance by incorporating high-order neighborhood information. A protein interaction network is constructed from protein interaction data, and a multi-channel graph convolutional neural network model is built. The model has a two-layer structure and uses different combinations of graph convolution kernels. Semi-supervised classification is carried out on the basis of a small number of labeled proteins, yielding the categories of the unlabeled proteins. By combining multi-channel, high-order neighborhood graph convolutions, the method extracts high-order information from the protein interaction network and improves protein classification accuracy at a low computational cost.

Description

Protein interaction network node classification method based on multichannel graph convolutional neural network

Technical Field

The invention relates to the field of protein classification, in particular to a protein interaction network node classification method based on a multi-channel graph convolutional neural network.

Background

Proteins are the material basis of life, and almost every component of the human body depends on them, so proteins have long been a focus of research. Proteins often interact with one another to take part in cell metabolism, gene expression regulation, and other life processes, and these interactions form a protein interaction network. Such a network describes and visualizes the relationships among proteins, aids research and analysis, and plays an important role in understanding biological composition and the causes of some diseases at the molecular level.

Graph convolutional networks perform convolution on irregular, complex network data. In semi-supervised learning, graph convolution achieves good classification performance from a small labeled training set and trains quickly, so it is widely applied to network-structured data sets. However, aggregating high-order neighborhood information makes node features overly smooth, so an ordinary graph convolutional network can aggregate feature information only from 2nd- to 3rd-order neighborhoods. Proteins in a protein interaction network are relatively tightly connected, so aggregating low-order information alone is not enough. Moreover, protein interaction network data are often large and complex. It is therefore necessary to capture high-order neighborhood information and obtain better protein classification performance while limiting network depth, that is, with fewer parameters.

Disclosure of Invention

To address the large deviation in the classification results obtained for existing protein interaction networks, the invention provides a high-accuracy protein interaction network node classification method based on a multi-channel graph convolutional neural network.

The technical solution adopted by the invention to solve this technical problem is as follows:

A protein interaction network node classification method based on a multi-channel graph convolutional neural network comprises the following steps:

Step one: construct a protein interaction network model G(V, E) from the protein interaction data, where V is the set of nodes, E is the set of connecting edges, and A denotes the adjacency matrix. Each node represents one protein, and the node set V = {v₁, v₂, ..., v_N} denotes the set of proteins; if two proteins interact, a connecting edge is placed between the corresponding two nodes. N denotes the number of proteins. The initial feature vector of each protein is a one-hot vector, and the identity matrix X is the combination of all the protein initial feature vectors. C is the number of protein classes; a small portion of the proteins have known class labels, and most of the proteins have no class labels.
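As a minimal sketch of step one, the following Python builds the adjacency matrix A and the identity feature matrix X from a list of interacting protein pairs. The function name `build_network` and the protein identifiers `P1` to `P3` are hypothetical and do not appear in the patent; treating the network as undirected is an assumption based on the description of interactions as connecting edges.

```python
def build_network(interactions):
    """Build the node list, adjacency matrix A and one-hot feature matrix X.

    `interactions` is a list of (protein_a, protein_b) pairs (hypothetical input format).
    """
    # One node per distinct protein, in a stable (sorted) order.
    proteins = sorted({name for pair in interactions for name in pair})
    index = {name: pos for pos, name in enumerate(proteins)}
    size = len(proteins)

    # Assumed undirected network: an interaction sets both A[u][v] and A[v][u].
    A = [[0] * size for _ in range(size)]
    for u, v in interactions:
        A[index[u]][index[v]] = 1
        A[index[v]][index[u]] = 1

    # One-hot initial features stacked row by row give the N x N identity matrix.
    X = [[1 if row == col else 0 for col in range(size)] for row in range(size)]
    return proteins, A, X


proteins, A, X = build_network([("P1", "P2"), ("P2", "P3")])
```

With one-hot features, the first weight matrix of the model effectively acts as a learned embedding table with one row per protein.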

Step two: construct a multi-channel graph convolutional neural network model with a two-layer structure. The first layer has k channels, and the i-th order convolution kernel SGC_i is used on the i-th channel, i ∈ {1, 2, ..., k}. The second layer contains k three-dimensional convolution kernels, and the (k+1-j)-th order convolution kernel SGC_(k+1-j) is used on the j-th channel, j ∈ {1, 2, ..., k}. The i-th channel of the network model consists of the i-th channel of the first layer and the i-th channel of the second layer, and the output of the i-th channel of the first layer is the input of the i-th channel of the second layer.
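The pairing of kernel orders across the two layers can be illustrated with a short Python sketch. The helper name `channel_orders` is hypothetical; it only lists, for each channel, the order used in the first layer and the order used in the second layer as stated in step two.

```python
def channel_orders(k):
    """Return (first_layer_order, second_layer_order) for channels 1..k."""
    return [(i, k + 1 - i) for i in range(1, k + 1)]


orders_for_k3 = channel_orders(3)
print(orders_for_k3)  # [(1, 3), (2, 2), (3, 1)]
```

In every channel the two orders add up to k + 1, so the channels differ in how the orders are split between the two layers rather than in their total.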

Step three: compute the i-th order convolution kernel SGC_i,

where GCN denotes a graph convolutional neural network without an activation function and 1 ≤ i ≤ k.
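The formula for SGC_i is not reproduced in this text, so the following Python is only a sketch under stated assumptions. It assumes that stacking i activation-free GCN layers collapses into i propagation steps with a normalized adjacency matrix followed by a single weight matrix W, and that the normalization is the symmetric form D^(-1/2)(A + I)D^(-1/2) commonly used with graph convolutional networks. The names `matmul`, `normalized_adjacency`, `sgc` and the parameter `W` are illustrative and do not come from the patent.

```python
import math


def matmul(P, Q):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def normalized_adjacency(A):
    """Assumed normalization: D^(-1/2) (A + I) D^(-1/2)."""
    n = len(A)
    A_tilde = [[A[r][c] + (1 if r == c else 0) for c in range(n)] for r in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in A_tilde]
    return [[inv_sqrt_deg[r] * A_tilde[r][c] * inv_sqrt_deg[c] for c in range(n)]
            for r in range(n)]


def sgc(order, H, A, W):
    """Assumed order-th kernel: propagate H `order` times, then apply W once."""
    A_hat = normalized_adjacency(A)
    for _ in range(order):
        H = matmul(A_hat, H)
    return matmul(H, W)


# Three proteins in a chain: 0 - 1 - 2, identity features, identity weights.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
first_order = sgc(1, I3, A, I3)
second_order = sgc(2, I3, A, I3)
```

Under these assumptions, node 0 receives nothing from node 2 in `first_order` but does in `second_order`, which is the sense in which a higher-order kernel reaches a larger neighborhood without adding layers or parameters.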

Step four: compute the output of the i-th channel of the multi-channel graph convolutional neural network model

y(i) = SGC_(k+1-i)(f(SGC_i(X, A)), A),

where 1 ≤ i ≤ k and f is the ReLU function.
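The composition in step four can be sketched in Python as follows. The kernel is passed in as a callable `sgc(order, H, A)` so that the sketch does not commit to any particular definition of SGC_i; `toy_sgc` is a hypothetical stand-in that only records which order was requested and returns its input unchanged.

```python
def relu(H):
    """Element-wise ReLU on a matrix stored as a list of rows."""
    return [[max(0.0, value) for value in row] for row in H]


def channel_output(i, k, X, A, sgc):
    """y(i) = SGC_(k+1-i)(f(SGC_i(X, A)), A) with f = ReLU."""
    hidden = relu(sgc(i, X, A))        # first layer: order i
    return sgc(k + 1 - i, hidden, A)   # second layer: order k + 1 - i


# Hypothetical stand-in kernel: logs the requested order and passes H through.
orders_used = []


def toy_sgc(order, H, A):
    orders_used.append(order)
    return H


y1 = channel_output(1, 3, [[1.0, -2.0]], [[0]], toy_sgc)
```

For k = 3 and i = 1 the call uses order 1 in the first layer and order 3 in the second, and the negative entry is clipped by the ReLU between the two layers.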

Step five: compute the model output of the multi-channel graph convolutional neural network model,

where g is the softmax activation function.
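The formula for the model output is not reproduced in this text. The following Python sketch assumes that the k channel outputs are summed element-wise and then passed through a row-wise softmax, which matches the description of Fig. 1 (results accumulated and activated) but is an assumption rather than the patent's stated formula. The function names are illustrative.

```python
import math


def softmax_rows(Z):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    result = []
    for row in Z:
        largest = max(row)
        exps = [math.exp(value - largest) for value in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


def model_output(channel_outputs):
    """Assumed output: softmax of the element-wise sum of all channel outputs.

    `channel_outputs` is a list of k matrices, each with N rows and C columns.
    """
    summed = [[sum(values) for values in zip(*rows)] for rows in zip(*channel_outputs)]
    return softmax_rows(summed)


# Two channels, one node, two classes.
Q = model_output([[[1.0, 0.0]], [[0.0, 1.0]]])
```

Each row of the result is a probability distribution over the C classes for one protein.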

Step six: compute the loss value for semi-supervised classification,

where μ is the set of labeled nodes and Y_ij is the classification label of a labeled node.
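The loss formula is not reproduced in this text. A common choice for semi-supervised node classification is the cross-entropy summed over the labeled nodes only, and the following Python sketch assumes that form: the negative sum of Y_ij · ln(Q_ij) over i in μ and j over the C classes. The function name and argument layout are hypothetical.

```python
import math


def semi_supervised_loss(Q, Y, labeled):
    """Assumed loss: cross-entropy restricted to the labeled node set.

    Q: N x C predicted probabilities, Y: N x C one-hot labels
    (rows of unlabeled nodes are ignored), labeled: indices of labeled nodes.
    """
    loss = 0.0
    for node in labeled:
        for prob, target in zip(Q[node], Y[node]):
            if target:
                loss -= target * math.log(prob)
    return loss


Q = [[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]]
Y = [[1, 0], [0, 0], [0, 1]]
loss = semi_supervised_loss(Q, Y, labeled=[0, 2])
```

Node 1 has no label here, so its prediction does not contribute to the loss; only the labeled nodes drive the parameter updates.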

Step seven: repeat steps three to six until the loss value converges, and take the resulting Q as the classification result for the protein interaction network.
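Two small pieces of step seven can be sketched in Python: a convergence test on the recorded loss values and the conversion of Q into one predicted class per protein. The patent does not state a convergence criterion, so the tolerance `tol` and the comparison of consecutive loss values are assumptions, and both function names are hypothetical.

```python
def has_converged(losses, tol=1e-4):
    """Assumed criterion: the last two loss values differ by less than `tol`."""
    return len(losses) >= 2 and abs(losses[-1] - losses[-2]) < tol


def predict_classes(Q):
    """Pick, for every protein, the class index with the highest value in Q."""
    return [max(range(len(row)), key=row.__getitem__) for row in Q]


predicted = predict_classes([[0.1, 0.9], [0.7, 0.3]])
print(predicted)  # [1, 0]
```

The unlabeled proteins obtain their categories from the rows of Q in the same way as the labeled ones.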

The technical concept of the invention is as follows: the invention is based on a shallow neural network and, while aggregating high-order information, uses multiple channels to combine different convolution arrangements, thereby improving the classification performance and accuracy for proteins in the protein interaction network.

The beneficial effects of the invention are as follows: the protein interaction network is processed by combining multi-channel, high-order neighborhood graph convolution information, so that the classification accuracy of proteins is improved at a low computational cost.

Drawings

Fig. 1 is a schematic diagram of the neural network model. For ease of understanding, k is set to 3: the features are fed into the different channels for convolution, and after two layers of graph convolution the results are accumulated and activated to give the final output.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

Referring to Fig. 1, a protein interaction network node classification method based on a multi-channel graph convolutional neural network includes the following steps:

Step one: construct a protein interaction network model G(V, E) from the protein interaction data, where V is the set of nodes, E is the set of connecting edges, and A denotes the adjacency matrix. Each node represents one protein, and the node set V = {v₁, v₂, ..., v_N} denotes the set of proteins; if two proteins interact, a connecting edge is placed between the corresponding two nodes. N denotes the number of proteins. The initial feature vector of each protein is a one-hot vector, and the identity matrix X is the combination of all the protein initial feature vectors. C is the number of protein classes; a small portion of the proteins have known class labels, and most of the proteins have no class labels.

Step two: construct a multi-channel graph convolutional neural network model with a two-layer structure. The first layer has k channels, and the i-th order convolution kernel SGC_i is used on the i-th channel, i ∈ {1, 2, ..., k}. The second layer contains k three-dimensional convolution kernels, and the (k+1-j)-th order convolution kernel SGC_(k+1-j) is used on the j-th channel, j ∈ {1, 2, ..., k}. The i-th channel of the network model consists of the i-th channel of the first layer and the i-th channel of the second layer, and the output of the i-th channel of the first layer is the input of the i-th channel of the second layer.

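Step three: compute the i-th order convolution kernel SGC_i,

where GCN denotes a graph convolutional neural network without an activation function and 1 ≤ i ≤ k.

Step four: compute the output of the i-th channel of the multi-channel graph convolutional neural network model

y(i) = SGC_(k+1-i)(f(SGC_i(X, A)), A),

where 1 ≤ i ≤ k and f is the ReLU function.

Step five: compute the model output of the multi-channel graph convolutional neural network model,

where g is the softmax activation function; the model is shown in Fig. 1.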

Step six: compute the loss value for semi-supervised classification,

where μ is the set of labeled nodes and Y_ij is the classification label of a labeled node.

As described above, the specific implementation steps set out in this patent make the invention clearer. Any modification or variation made within the spirit of the invention and the scope of the claims falls within the scope of protection of the invention.

Claims

1. A protein interaction network node classification method based on a multi-channel graph convolutional neural network, characterized in that the method comprises the following steps:

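Step three: compute the i-th order convolution kernel SGC_i,

where GCN denotes a graph convolutional neural network without an activation function and 1 ≤ i ≤ k.

Step four: compute the output of the i-th channel of the multi-channel graph convolutional neural network model

y(i) = SGC_(k+1-i)(f(SGC_i(X, A)), A),

where 1 ≤ i ≤ k and f is the ReLU function.

Step five: compute the model output of the multi-channel graph convolutional neural network model,

where g is the softmax activation function.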

Step six: compute the loss value for semi-supervised classification,

where μ is the set of labeled nodes and Y_ij is the classification label of a labeled node.