CN112364983A - Protein interaction network node classification method based on multichannel graph convolutional neural network - Google Patents
Protein interaction network node classification method based on multichannel graph convolutional neural network
- Publication number
- CN112364983A
- Authority
- CN
- China
- Prior art keywords
- channel
- protein
- neural network
- protein interaction
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/20—Protein or domain folding
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Molecular Biology (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Biomedical Technology (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Computing Systems (AREA)
- Crystallography & Structural Chemistry (AREA)
- Biotechnology (AREA)
- Chemical & Material Sciences (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A protein interaction network node classification method based on a multi-channel graph convolutional neural network improves classification performance by incorporating high-order neighborhood information. A protein interaction network is constructed from protein interaction data, and a multi-channel graph convolutional neural network model is built. The model has a two-layer structure and uses a different combination of graph convolution kernels on each channel. Semi-supervised classification is performed using a small number of labeled proteins, yielding the classes of the unlabeled proteins. By combining multi-channel, high-order neighborhood graph convolutions, the method extracts high-order information from the protein interaction network and improves protein classification accuracy at a low computational cost.
Description
Technical Field
The invention relates to the field of protein classification, in particular to a protein interaction network node classification method based on a multi-channel graph convolutional neural network.
Background
Proteins are the material basis of life, and almost every component of the human body depends on them, so proteins have long been a focus of research. Proteins often interact with one another to take part in cell metabolism, regulation of gene expression, and other life processes, and these interactions form a protein interaction network. Such a network represents the relationships among proteins as a graph, which aids research and analysis and plays an important role in understanding biological composition and the causes of some diseases at the molecular level.
A graph convolutional network performs convolution on irregular, complex network data. In semi-supervised learning, graph convolution can achieve good classification performance from a small labeled training set and trains quickly, so it is widely applied to network-structured data sets. However, aggregating high-order neighborhood information over-smooths the features, so an ordinary graph convolutional network can aggregate feature information from only 2nd- to 3rd-order neighborhoods. Proteins in a protein interaction network are relatively densely connected, so aggregating low-order information alone is insufficient. Moreover, protein interaction network data are often large and complex. It is therefore necessary to capture high-order neighborhood information and achieve better protein classification performance while keeping the network shallow, that is, with few parameters.
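The over-smoothing effect described above can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration: the toy 5-node path graph, the helper names `random_walk_adjacency` and `feature_spread`, and the use of a row-normalized (random-walk) adjacency with self-loops, which is chosen here only because it makes the convergence of node features easy to measure. It is not taken from the patent.

```python
import numpy as np

def random_walk_adjacency(A):
    """Row-normalized adjacency with self-loops: D^{-1} (A + I). Illustrative choice."""
    A_hat = A + np.eye(A.shape[0])
    return A_hat / A_hat.sum(axis=1, keepdims=True)

def feature_spread(H):
    """Largest gap between any two nodes in any single feature column."""
    return float((H.max(axis=0) - H.min(axis=0)).max())

# Hypothetical toy graph: a 5-node path standing in for a tiny interaction network.
A = np.zeros((5, 5))
for u, v in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A[u, v] = A[v, u] = 1.0

P = random_walk_adjacency(A)
H = np.eye(5)  # one-hot initial features, one row per node

spreads = []
for order in range(1, 17):
    H = P @ H  # one more activation-free propagation step
    spreads.append(feature_spread(H))

# The spread never grows and shrinks toward zero as the order increases,
# so high-order aggregation makes the nodes harder to tell apart.
print([round(s, 3) for s in spreads])
```

Because each propagation step replaces a node's features with a weighted average of its neighbors' features, the gap between nodes can only shrink, which is one way to see why simply stacking many layers stops helping.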
Disclosure of Invention
To address the large deviation in the classification results of existing protein interaction network methods, the invention provides a high-accuracy protein interaction network node classification method based on a multi-channel graph convolutional neural network.
The technical solution adopted by the invention is as follows:
A protein interaction network node classification method based on a multi-channel graph convolutional neural network comprises the following steps:
Step one: construct a protein interaction network model $G(V, E)$ from the protein interaction data, where $V$ is the set of nodes, $E$ is the set of connecting edges, and the adjacency matrix is denoted by $A$. Each node represents one protein, and the node set $V = \{v_1, v_2, \ldots, v_N\}$ denotes the set of proteins. If two proteins interact, a connecting edge is placed between the corresponding two nodes. $N$ is the number of proteins. The initial feature vector of each protein is a one-hot vector, so the matrix $X$ that combines all initial feature vectors is the identity matrix. $C$ is the number of protein classes. A small fraction of the proteins have known class labels, and most do not;
Step two: construct a multi-channel graph convolutional neural network model with a two-layer structure. The first layer has $k$ channels, and the $i$-th order convolution kernel $\mathrm{SGC}_i$ is used on the $i$-th channel, $i \in \{1, 2, \ldots, k\}$. The second layer contains $k$ three-dimensional convolution kernels, with the $(k+1-j)$-th order convolution kernel $\mathrm{SGC}_{k+1-j}$ used on the $j$-th channel, $j \in \{1, 2, \ldots, k\}$. The $i$-th channel of the network model consists of the $i$-th channel of the first layer and the $i$-th channel of the second layer, and the output of the $i$-th channel of the first layer is the input of the $i$-th channel of the second layer;
Step three: compute the $i$-th order convolution kernel
where GCN denotes a graph convolutional neural network without an activation function and $1 \le i \le k$;
Step four: compute the output of the $i$-th channel of the multi-channel graph convolutional neural network model
$$y^{(i)} = \mathrm{SGC}_{k+1-i}\big(f(\mathrm{SGC}_i(X, A)),\, A\big),$$
where $1 \le i \le k$ and $f$ is the ReLU function;
Step five: compute the model output of the multi-channel graph convolutional neural network model
where $g$ is the softmax activation function;
Step six: compute the loss value for semi-supervised classification
where $\mu$ is the set of labeled nodes and $Y_{ij}$ corresponds to a node that has a classification label;
Step seven: repeat steps three to six until the loss value converges, and take the resulting $Q$ as the classification result of the protein interaction network.
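The forward pass in steps one to five can be sketched as follows. The formulas for steps three and five are not present in this text, so the sketch rests on clearly labeled assumptions: the $i$-th order kernel is taken to be $i$ activation-free propagation steps with a symmetrically normalized adjacency followed by one linear map, and the channel outputs are taken to be summed before the softmax. The function names, the per-channel weight lists `W1` and `W2`, the hidden width, and the toy graph are all hypothetical.

```python
import numpy as np

def normalized_adjacency(A):
    """Assumed propagation matrix: D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def sgc(H, S, W, order):
    """Assumed order-`order` kernel: `order` propagation steps, then a linear map."""
    for _ in range(order):
        H = S @ H
    return H @ W

def relu(Z):
    return np.maximum(Z, 0.0)

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def forward(X, A, W1, W2):
    """Channel i applies order i in layer one and order k+1-i in layer two."""
    k = len(W1)
    S = normalized_adjacency(A)
    total = 0.0
    for i in range(1, k + 1):
        hidden = relu(sgc(X, S, W1[i - 1], order=i))
        total = total + sgc(hidden, S, W2[i - 1], order=k + 1 - i)
    return softmax(total)  # assumption: channel outputs are summed, then softmax

# Hypothetical toy problem: N = 6 proteins, C = 3 classes, k = 3 channels.
N, C, k, hidden_dim = 6, 3, 3, 4
A = np.zeros((N, N))
for u, v in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]:
    A[u, v] = A[v, u] = 1.0
X = np.eye(N)  # one-hot features, so X is the identity matrix

rng = np.random.default_rng(0)
W1 = [rng.normal(size=(N, hidden_dim)) for _ in range(k)]
W2 = [rng.normal(size=(hidden_dim, C)) for _ in range(k)]

Q = forward(X, A, W1, W2)
print(Q.shape, Q.argmax(axis=1))
```

Each row of `Q` is a probability distribution over the `C` classes for one protein. With random, untrained weights the predicted classes are meaningless; the sketch only shows how the channels are wired.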
The technical concept of the invention is as follows: based on a shallow neural network, the invention aggregates high-order information while using multiple channels to combine different arrangements of convolutions, which effectively improves protein classification performance and accuracy in the protein interaction network.
The beneficial effect of the invention is as follows: processing the protein interaction network with a combination of multi-channel, high-order neighborhood graph convolutions improves protein classification accuracy at a low computational cost.
Drawings
Fig. 1 is a schematic diagram of the neural network model. For ease of understanding, k is set to 3. The features are input to the different channels, each channel applies two layers of graph convolution, and the results are accumulated and activated to obtain the final output.
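For the k = 3 case used in the figure, the pairing of kernel orders per channel follows directly from step two (order i in the first layer, order k+1-i in the second). The short sketch below enumerates it; the function name `channel_orders` is hypothetical.

```python
def channel_orders(k):
    """Return (first-layer order, second-layer order) for channels 1..k."""
    return [(i, k + 1 - i) for i in range(1, k + 1)]

pairs = channel_orders(3)
combined = [first + second for first, second in pairs]

print(pairs)     # [(1, 3), (2, 2), (3, 1)]
print(combined)  # [4, 4, 4]
```

Every channel has the same combined order k+1, but the ReLU sits at a different position in each one, so the channels mix low-order and high-order neighborhood information differently.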
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to Fig. 1, a protein interaction network node classification method based on a multi-channel graph convolutional neural network includes the following steps:
Step one: construct a protein interaction network model $G(V, E)$ from the protein interaction data, where $V$ is the set of nodes, $E$ is the set of connecting edges, and the adjacency matrix is denoted by $A$. Each node represents one protein, and the node set $V = \{v_1, v_2, \ldots, v_N\}$ denotes the set of proteins. If two proteins interact, a connecting edge is placed between the corresponding two nodes. $N$ is the number of proteins. The initial feature vector of each protein is a one-hot vector, so the matrix $X$ that combines all initial feature vectors is the identity matrix. $C$ is the number of protein classes. A small fraction of the proteins have known class labels, and most do not;
Step two: construct a multi-channel graph convolutional neural network model with a two-layer structure. The first layer has $k$ channels, and the $i$-th order convolution kernel $\mathrm{SGC}_i$ is used on the $i$-th channel, $i \in \{1, 2, \ldots, k\}$. The second layer contains $k$ three-dimensional convolution kernels, with the $(k+1-j)$-th order convolution kernel $\mathrm{SGC}_{k+1-j}$ used on the $j$-th channel, $j \in \{1, 2, \ldots, k\}$. The $i$-th channel of the network model consists of the $i$-th channel of the first layer and the $i$-th channel of the second layer, and the output of the $i$-th channel of the first layer is the input of the $i$-th channel of the second layer;
Step three: compute the $i$-th order convolution kernel
where GCN denotes a graph convolutional neural network without an activation function and $1 \le i \le k$;
Step four: compute the output of the $i$-th channel of the multi-channel graph convolutional neural network model
$$y^{(i)} = \mathrm{SGC}_{k+1-i}\big(f(\mathrm{SGC}_i(X, A)),\, A\big),$$
where $1 \le i \le k$ and $f$ is the ReLU function;
Step five: compute the model output of the multi-channel graph convolutional neural network model
where $g$ is the softmax activation function, and the model is shown in Fig. 1;
Step six: compute the loss value for semi-supervised classification
where $\mu$ is the set of labeled nodes and $Y_{ij}$ corresponds to a node that has a classification label;
Step seven: repeat steps three to six until the loss value converges, and take the resulting $Q$ as the classification result of the protein interaction network.
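Steps six and seven can be sketched as a masked loss plus a stopping test. The loss formula is not present in this text, so the sketch assumes the cross-entropy commonly used for semi-supervised node classification, summed over the labeled nodes only, with `Y` holding one-hot labels. The helper names, the tolerance, and the toy values are hypothetical.

```python
import numpy as np

def masked_cross_entropy(Q, Y, labeled):
    """Assumed loss: -sum over labeled nodes l and classes j of Y[l, j] * ln Q[l, j]."""
    eps = 1e-12  # guards against log(0)
    return float(-np.sum(Y[labeled] * np.log(Q[labeled] + eps)))

def has_converged(losses, tol=1e-4):
    """Stop when the last two loss values differ by less than `tol`."""
    return len(losses) >= 2 and abs(losses[-2] - losses[-1]) < tol

# Hypothetical toy values: 4 proteins, 3 classes, nodes 0 and 2 are labeled.
Q = np.full((4, 3), 1.0 / 3.0)  # an uninformative, uniform model output
Y = np.zeros((4, 3))
Y[0, 1] = 1.0
Y[2, 0] = 1.0
labeled = np.array([0, 2])

loss = masked_cross_entropy(Q, Y, labeled)
print(round(loss, 4))  # 2.1972, i.e. 2 * ln(3)
```

Unlabeled rows of `Q` do not contribute to the loss; they receive predictions only because the graph convolutions propagate information from labeled neighbors. In a full training loop, an optimizer would update the channel weights after each loss evaluation until `has_converged` returns true.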
The specific implementation steps above describe the invention more clearly. Any modification or variation made within the spirit of the invention and the scope of the claims falls within the scope of protection of the invention.
Claims (1)
1. A protein interaction network node classification method based on a multi-channel graph convolutional neural network, characterized in that the method comprises the following steps:
Step one: construct a protein interaction network model $G(V, E)$ from the protein interaction data, where $V$ is the set of nodes, $E$ is the set of connecting edges, and the adjacency matrix is denoted by $A$. Each node represents one protein, and the node set $V = \{v_1, v_2, \ldots, v_N\}$ denotes the set of proteins. If two proteins interact, a connecting edge is placed between the corresponding two nodes. $N$ is the number of proteins. The initial feature vector of each protein is a one-hot vector, so the matrix $X$ that combines all initial feature vectors is the identity matrix. $C$ is the number of protein classes. A small fraction of the proteins have known class labels, and most do not;
Step two: construct a multi-channel graph convolutional neural network model with a two-layer structure. The first layer has $k$ channels, and the $i$-th order convolution kernel $\mathrm{SGC}_i$ is used on the $i$-th channel, $i \in \{1, 2, \ldots, k\}$. The second layer contains $k$ three-dimensional convolution kernels, with the $(k+1-j)$-th order convolution kernel $\mathrm{SGC}_{k+1-j}$ used on the $j$-th channel, $j \in \{1, 2, \ldots, k\}$. The $i$-th channel of the network model consists of the $i$-th channel of the first layer and the $i$-th channel of the second layer, and the output of the $i$-th channel of the first layer is the input of the $i$-th channel of the second layer;
Step three: compute the $i$-th order convolution kernel
where GCN denotes a graph convolutional neural network without an activation function and $1 \le i \le k$;
Step four: compute the output of the $i$-th channel of the multi-channel graph convolutional neural network model
$$y^{(i)} = \mathrm{SGC}_{k+1-i}\big(f(\mathrm{SGC}_i(X, A)),\, A\big),$$
where $1 \le i \le k$ and $f$ is the ReLU function;
Step five: compute the model output of the multi-channel graph convolutional neural network model
where $g$ is the softmax activation function;
Step six: compute the loss value for semi-supervised classification
where $\mu$ is the set of labeled nodes and $Y_{ij}$ corresponds to a node that has a classification label;
Step seven: repeat steps three to six until the loss value converges, and take the resulting $Q$ as the classification result of the protein interaction network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011260336.3A CN112364983B (en) | 2020-11-12 | 2020-11-12 | Protein interaction network node classification method based on multichannel graph convolutional neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011260336.3A CN112364983B (en) | 2020-11-12 | 2020-11-12 | Protein interaction network node classification method based on multichannel graph convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112364983A (en) | 2021-02-12 |
CN112364983B (en) | 2024-03-22 |
Family
ID=74515357
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011260336.3A (Active, CN112364983B) | Protein interaction network node classification method based on multichannel graph convolutional neural network | 2020-11-12 | 2020-11-12 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112364983B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109522953A (en) * | 2018-11-13 | 2019-03-26 | 北京师范大学 | The method classified based on internet startup disk algorithm and CNN to graph structure data |
CN109977232A (en) * | 2019-03-06 | 2019-07-05 | 中南大学 | A kind of figure neural network visual analysis method for leading figure based on power |
CN110889015A (en) * | 2019-10-31 | 2020-03-17 | 天津工业大学 | Independent decoupling convolutional neural network characterization algorithm for graph data |
CN111563533A (en) * | 2020-04-08 | 2020-08-21 | 华南理工大学 | Test subject classification method based on graph convolution neural network fusion of multiple human brain maps |
CN111916144A (en) * | 2020-07-27 | 2020-11-10 | 西安电子科技大学 | Protein classification method based on self-attention neural network and coarsening algorithm |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113241114A (en) * | 2021-03-24 | 2021-08-10 | 辽宁大学 | LncRNA-protein interaction prediction method based on graph convolution neural network |
CN113053457A (en) * | 2021-03-25 | 2021-06-29 | 湖南大学 | Drug target prediction method based on multi-pass graph convolution neural network |
CN113539381A (en) * | 2021-07-16 | 2021-10-22 | 中国海洋大学 | Molecular dynamics result analysis method based on residue interaction and PEN |
CN113539381B (en) * | 2021-07-16 | 2023-09-05 | 中国海洋大学 | Molecular dynamics result analysis method based on residue interaction and PEN |
CN115312119A (en) * | 2022-10-09 | 2022-11-08 | 之江实验室 | Method and system for identifying protein structural domain based on protein three-dimensional structure image |
CN115312119B (en) * | 2022-10-09 | 2023-04-07 | 之江实验室 | Method and system for identifying protein structural domain based on protein three-dimensional structure image |
US11908140B1 (en) | 2022-10-09 | 2024-02-20 | Zhejiang Lab | Method and system for identifying protein domain based on protein three-dimensional structure image |
Also Published As
Publication number | Publication date |
---|---|
CN112364983B (en) | 2024-03-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |