CN112364983A - Protein interaction network node classification method based on multichannel graph convolutional neural network

Publication number
CN112364983A
Authority
CN
China
Prior art keywords
channel
protein
neural network
protein interaction
layer
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011260336.3A
Other languages
Chinese (zh)
Other versions
CN112364983B (en)
Inventor
杨旭华
马钢峰
徐新黎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/20 Protein or domain folding

Abstract

A protein interaction network node classification method based on a multi-channel graph convolutional neural network improves classification performance by incorporating high-order information. A protein interaction network is constructed from protein interaction data, and a multi-channel graph convolutional neural network model is built. The model has a two-layer structure and uses different combinations of graph convolution kernels. Semi-supervised classification is carried out on the basis of a small number of labeled proteins to obtain the classes of the unlabeled proteins. By combining multi-channel high-order neighborhood graph convolutions, the method extracts high-order information from the protein interaction network and improves protein classification accuracy at a lower computational cost.

Description

Protein interaction network node classification method based on multichannel graph convolutional neural network
Technical Field
The invention relates to the field of protein classification, in particular to a protein interaction network node classification method based on a multi-channel graph convolutional neural network.
Background
Proteins are the material basis of life, and almost every component of the human body depends on them, so proteins have long been a focus of research. Proteins often interact with one another to take part in cell metabolism, the regulation of gene expression, and other life processes, thereby forming a protein interaction network. Such a network describes and visualizes the relationships between proteins, which aids research and analysis and plays an important role in understanding biological composition and the causes of some diseases at the molecular level.
Graph convolutional networks are designed to perform convolution on irregular, complex network data. In semi-supervised learning, graph convolution achieves good classification performance with only a small labeled training set and trains quickly, so it is widely applied to data sets with network structure. However, aggregating high-order neighborhood information makes the features over-smoothed, so an ordinary graph convolutional network can only aggregate feature information from 2nd- to 3rd-order neighborhoods. Proteins in a protein interaction network are relatively tightly connected, so aggregating only low-order information is not enough. Moreover, protein interaction network data are often large and complex. A method is therefore needed that captures high-order neighborhood information and achieves better protein classification performance while keeping the network shallow, that is, with few parameters.
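The over-smoothing effect described in this background can be illustrated with a small sketch. It assumes the symmetric normalized propagation matrix D^{-1/2}(A + I)D^{-1/2} commonly used in graph convolutional networks, which the text does not spell out, and a hypothetical five-protein toy network; `normalized_adjacency` and `min_pairwise_cosine` are illustrative helper names, not part of the patent.

```python
import numpy as np

def normalized_adjacency(A):
    """Assumed propagation matrix D^{-1/2} (A + I) D^{-1/2} of a standard GCN."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def min_pairwise_cosine(H):
    """Smallest cosine similarity between any two rows (node feature vectors) of H."""
    unit = H / np.linalg.norm(H, axis=1, keepdims=True)
    return float((unit @ unit.T).min())

# Hypothetical toy network: 5 proteins, 5 interactions.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]
A = np.zeros((5, 5))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

A_hat = normalized_adjacency(A)
X = np.eye(5)  # one-hot initial features, as in the method

# Propagate the features over neighborhoods of increasing order and
# record how similar the two least similar nodes have become.
similarity_by_order = {
    order: min_pairwise_cosine(np.linalg.matrix_power(A_hat, order) @ X)
    for order in (1, 2, 3, 10, 100)
}
for order, sim in similarity_by_order.items():
    print(f"order {order:3d}: min pairwise cosine similarity = {sim:.4f}")
```

At order 1 the two most distant proteins share no neighbors, so their features are orthogonal; after many propagation steps all node features point in nearly the same direction, which is why simply raising the order of a single kernel stops helping.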
Disclosure of Invention
To address the large deviations in the classification results of existing methods for protein interaction networks, the invention provides a highly accurate protein interaction network node classification method based on a multi-channel graph convolutional neural network.
The technical solution adopted by the invention to solve this problem is as follows:
A protein interaction network node classification method based on a multi-channel graph convolutional neural network comprises the following steps:
Step one: construct a protein interaction network model $G(V, E)$ from the protein interaction data, where V is the set of nodes, E is the set of connecting edges, and A denotes the adjacency matrix. Each node represents one protein, and the node set $V = \{v_1, v_2, \ldots, v_N\}$ denotes the set of proteins; if two proteins interact, there is a connecting edge between the two corresponding nodes. N is the number of proteins. The initial feature vector of each protein is a one-hot vector, and the identity matrix X is the combination of the initial feature vectors of all proteins. C is the number of protein classes; a small fraction of the proteins have known class labels, and most proteins have no class label;
Step two: construct a multi-channel graph convolutional neural network model with a two-layer structure. The first layer has k channels, and the i-th order convolution kernel $\mathrm{SGC}_i$, $i \in \{1, 2, \ldots, k\}$, is used on the i-th channel. The second layer contains k three-dimensional convolution kernels, and the (k+1-j)-th order convolution kernel $\mathrm{SGC}_{k+1-j}$, $j \in \{1, 2, \ldots, k\}$, is used on the j-th channel. The i-th channel of the network model consists of the i-th channel of the first layer and the i-th channel of the second layer, and the output of the i-th channel of the first layer is the input of the i-th channel of the second layer;
Step three: compute the i-th order convolution kernel
Figure BDA0002774427720000021
where GCN denotes a graph convolutional neural network without an activation function, and $1 \le i \le k$;
Step four: compute the output of the i-th channel of the multi-channel graph convolutional neural network model
$y^{(i)} = \mathrm{SGC}_{k+1-i}\big(f(\mathrm{SGC}_i(X, A)),\, A\big)$,
where $1 \le i \le k$ and f is the ReLU function;
Step five: compute the model output of the multi-channel graph convolutional neural network model
Figure BDA0002774427720000022
where g is the softmax activation function;
Step six: compute the loss value for semi-supervised classification
Figure BDA0002774427720000023
where μ is the set of labeled nodes and $Y_{ij}$ is the classification label of a labeled node;
Step seven: repeat steps three to six until the loss value converges, and take the resulting Q as the classification result for the protein interaction network.
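Steps one to six can be sketched as a single forward pass. This is a minimal illustration under stated assumptions, not the patented implementation: the formulas for steps three, five and six are only available as images, so the sketch assumes that (a) the i-th order kernel stacks i activation-free GCN layers, which collapse to A_hat^i H W with a single weight matrix W, (b) the model output Q is the softmax of the sum of the channel outputs, as suggested by the description of Fig. 1, and (c) the loss is the cross-entropy over the labeled nodes. The toy network, the hidden width and all function names are hypothetical.

```python
import numpy as np

def normalized_adjacency(A):
    """Assumed propagation matrix D^{-1/2} (A + I) D^{-1/2} of a standard GCN."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def sgc(H, A_hat, W, order):
    """Assumed SGC_order: `order` activation-free GCN layers, collapsed to A_hat^order @ H @ W."""
    return np.linalg.matrix_power(A_hat, order) @ H @ W

def softmax(Z):
    E = np.exp(Z - Z.max(axis=1, keepdims=True))  # shift rows for numerical stability
    return E / E.sum(axis=1, keepdims=True)

def forward(X, A_hat, W1, W2):
    """Channel i applies SGC_i, ReLU, then SGC_{k+1-i}; channel outputs are summed (assumed)."""
    k = len(W1)
    Z = 0.0
    for i in range(1, k + 1):
        hidden = np.maximum(sgc(X, A_hat, W1[i - 1], i), 0.0)  # f = ReLU
        Z = Z + sgc(hidden, A_hat, W2[i - 1], k + 1 - i)        # y^(i) from step four
    return softmax(Z)                                            # g = softmax

def semi_supervised_loss(Q, Y, labeled):
    """Assumed cross-entropy restricted to the labeled node set."""
    return float(-(Y[labeled] * np.log(Q[labeled] + 1e-12)).sum())

# Step one on a hypothetical toy network: two triangles of proteins joined by one edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
N, C, k, hidden_dim = 6, 2, 3, 4
A = np.zeros((N, N))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
X = np.eye(N)                        # one-hot features, so X is the identity matrix
Y = np.zeros((N, C))
Y[0, 0] = 1.0                        # protein 0 is labeled as class 0
Y[5, 1] = 1.0                        # protein 5 is labeled as class 1
labeled = [0, 5]

rng = np.random.default_rng(0)
W1 = [rng.normal(scale=0.1, size=(N, hidden_dim)) for _ in range(k)]
W2 = [rng.normal(scale=0.1, size=(hidden_dim, C)) for _ in range(k)]

Q = forward(X, normalized_adjacency(A), W1, W2)
loss = semi_supervised_loss(Q, Y, labeled)
print("class probabilities:\n", Q)
print("loss on labeled proteins:", loss)
```

Each row of Q holds the predicted class probabilities for one protein; only the two labeled rows contribute to the loss.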
The technical concept of the invention is as follows: building on a shallow neural network, the invention aggregates high-order information while using multiple channels to combine different arrangements of convolutions, which effectively improves the classification performance and accuracy for proteins in a protein interaction network.
The beneficial effect of the invention is as follows: the protein interaction network is processed by combining multi-channel high-order neighborhood graph convolution information, so that protein classification accuracy is improved at a lower computational cost.
Drawings
Fig. 1 is a schematic diagram of the neural network model. For ease of understanding, k is set to 3: the features are fed into the different channels for convolution, and after the two layers of graph convolution the results are summed and activated to give the final output.
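For the k = 3 case in Fig. 1, the pairing of kernel orders per channel can be listed directly from the rule that channel i uses order i in the first layer and order k + 1 - i in the second. The helper name `channel_orders` is hypothetical.

```python
def channel_orders(k):
    """Return (first-layer order, second-layer order) for channels 1..k."""
    return [(i, k + 1 - i) for i in range(1, k + 1)]

pairs = channel_orders(3)
for channel, (first, second) in enumerate(pairs, start=1):
    print(f"channel {channel}: SGC_{first} -> ReLU -> SGC_{second} "
          f"(total order {first + second})")
```

Every channel has the same total order k + 1, but the ReLU sits at a different depth in each one. Under the assumption that an i-th order kernel propagates over i hops, all channels cover the same neighborhood range while mixing it differently.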
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to Fig. 1, a protein interaction network node classification method based on a multi-channel graph convolutional neural network comprises the following steps:
Step one: construct a protein interaction network model $G(V, E)$ from the protein interaction data, where V is the set of nodes, E is the set of connecting edges, and A denotes the adjacency matrix. Each node represents one protein, and the node set $V = \{v_1, v_2, \ldots, v_N\}$ denotes the set of proteins; if two proteins interact, there is a connecting edge between the two corresponding nodes. N is the number of proteins. The initial feature vector of each protein is a one-hot vector, and the identity matrix X is the combination of the initial feature vectors of all proteins. C is the number of protein classes; a small fraction of the proteins have known class labels, and most proteins have no class label;
Step two: construct a multi-channel graph convolutional neural network model with a two-layer structure. The first layer has k channels, and the i-th order convolution kernel $\mathrm{SGC}_i$, $i \in \{1, 2, \ldots, k\}$, is used on the i-th channel. The second layer contains k three-dimensional convolution kernels, and the (k+1-j)-th order convolution kernel $\mathrm{SGC}_{k+1-j}$, $j \in \{1, 2, \ldots, k\}$, is used on the j-th channel. The i-th channel of the network model consists of the i-th channel of the first layer and the i-th channel of the second layer, and the output of the i-th channel of the first layer is the input of the i-th channel of the second layer;
Step three: compute the i-th order convolution kernel
Figure BDA0002774427720000041
where GCN denotes a graph convolutional neural network without an activation function, and $1 \le i \le k$;
Step four: compute the output of the i-th channel of the multi-channel graph convolutional neural network model
$y^{(i)} = \mathrm{SGC}_{k+1-i}\big(f(\mathrm{SGC}_i(X, A)),\, A\big)$,
where $1 \le i \le k$ and f is the ReLU function;
Step five: compute the model output of the multi-channel graph convolutional neural network model
Figure BDA0002774427720000042
where g is the softmax activation function; the model is shown in Fig. 1;
Step six: compute the loss value for semi-supervised classification
Figure BDA0002774427720000043
where μ is the set of labeled nodes and $Y_{ij}$ is the classification label of a labeled node;
Step seven: repeat steps three to six until the loss value converges, and take the resulting Q as the classification result for the protein interaction network.
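The iteration in step seven can be sketched as a plain gradient-descent loop. The text does not name an optimizer, so full-batch gradient descent with hand-derived gradients is an assumption here, as are the collapsed kernel form A_hat^i H W, the summed channel outputs, the cross-entropy loss, the toy network, and every hyperparameter (`hidden`, `lr`, `max_epochs`, `tol`).

```python
import numpy as np

def normalized_adjacency(A):
    """Assumed propagation matrix D^{-1/2} (A + I) D^{-1/2} of a standard GCN."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(Z):
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def train(A, Y, labeled, k=3, hidden=8, lr=0.2, max_epochs=500, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    N, C = Y.shape
    X = np.eye(N)                                   # one-hot initial features
    A_hat = normalized_adjacency(A)
    P = [np.linalg.matrix_power(A_hat, i) for i in range(k + 1)]  # P[i] = A_hat^i
    W1 = [rng.normal(scale=0.1, size=(N, hidden)) for _ in range(k)]
    W2 = [rng.normal(scale=0.1, size=(hidden, C)) for _ in range(k)]
    mask = np.zeros((N, 1))
    mask[labeled] = 1.0                             # only labeled proteins enter the loss
    losses = []
    for epoch in range(max_epochs):
        # Steps three to five: forward pass through the k channels.
        pre, prop = [], []
        Z = np.zeros((N, C))
        for c in range(k):
            i = c + 1
            pre_c = P[i] @ X @ W1[c]                          # SGC_i(X, A)
            prop_c = P[k + 1 - i] @ np.maximum(pre_c, 0.0)    # propagate ReLU output
            Z += prop_c @ W2[c]                               # add y^(i)
            pre.append(pre_c)
            prop.append(prop_c)
        Q = softmax(Z)
        # Step six: cross-entropy over labeled nodes (assumed form).
        losses.append(float(-(mask * Y * np.log(Q + 1e-12)).sum()))
        # Step seven: stop once the loss has converged.
        if epoch > 0 and abs(losses[-2] - losses[-1]) < tol:
            break
        # Backward pass: gradient of the loss with respect to Z is Q - Y on labeled rows.
        G = mask * (Q - Y)
        for c in range(k):
            i = c + 1
            grad_W2 = prop[c].T @ G
            grad_hidden = P[k + 1 - i].T @ G @ W2[c].T
            grad_pre = grad_hidden * (pre[c] > 0)             # ReLU derivative
            grad_W1 = (P[i] @ X).T @ grad_pre
            W1[c] -= lr * grad_W1
            W2[c] -= lr * grad_W2
    return Q, losses

# Hypothetical toy network: two triangles of proteins joined by one edge.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
Y = np.zeros((6, 2))
Y[0, 0] = 1.0                                       # protein 0: class 0
Y[5, 1] = 1.0                                       # protein 5: class 1
labeled = [0, 5]

Q, losses = train(A, Y, labeled)
print("predicted classes:", Q.argmax(axis=1))
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f} after {len(losses)} epochs")
```

A practical implementation would more likely rely on an automatic-differentiation framework and a held-out validation set; the manual gradients are used only to keep the sketch self-contained.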
The specific implementation steps described above make the invention clearer. Any modification or variation of the invention made within its spirit and the scope of the claims falls within the scope of protection of the invention.

Claims (1)

1. A protein interaction network node classification method based on a multi-channel graph convolutional neural network, characterized by comprising the following steps:
Step one: construct a protein interaction network model $G(V, E)$ from the protein interaction data, where V is the set of nodes, E is the set of connecting edges, and A denotes the adjacency matrix. Each node represents one protein, and the node set $V = \{v_1, v_2, \ldots, v_N\}$ denotes the set of proteins; if two proteins interact, there is a connecting edge between the two corresponding nodes. N is the number of proteins. The initial feature vector of each protein is a one-hot vector, and the identity matrix X is the combination of the initial feature vectors of all proteins. C is the number of protein classes; a small fraction of the proteins have known class labels, and most proteins have no class label;
Step two: construct a multi-channel graph convolutional neural network model with a two-layer structure. The first layer has k channels, and the i-th order convolution kernel $\mathrm{SGC}_i$, $i \in \{1, 2, \ldots, k\}$, is used on the i-th channel. The second layer contains k three-dimensional convolution kernels, and the (k+1-j)-th order convolution kernel $\mathrm{SGC}_{k+1-j}$, $j \in \{1, 2, \ldots, k\}$, is used on the j-th channel. The i-th channel of the network model consists of the i-th channel of the first layer and the i-th channel of the second layer, and the output of the i-th channel of the first layer is the input of the i-th channel of the second layer;
Step three: compute the i-th order convolution kernel
Figure FDA0002774427710000011
where GCN denotes a graph convolutional neural network without an activation function, and $1 \le i \le k$;
Step four: compute the output of the i-th channel of the multi-channel graph convolutional neural network model
$y^{(i)} = \mathrm{SGC}_{k+1-i}\big(f(\mathrm{SGC}_i(X, A)),\, A\big)$,
where $1 \le i \le k$ and f is the ReLU function;
Step five: compute the model output of the multi-channel graph convolutional neural network model
Figure FDA0002774427710000012
where g is the softmax activation function;
Step six: compute the loss value for semi-supervised classification
Figure FDA0002774427710000013
where μ is the set of labeled nodes and $Y_{ij}$ is the classification label of a labeled node;
Step seven: repeat steps three to six until the loss value converges, and take the resulting Q as the classification result for the protein interaction network.
CN202011260336.3A 2020-11-12 2020-11-12 Protein interaction network node classification method based on multichannel graph convolutional neural network Active CN112364983B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011260336.3A CN112364983B (en) 2020-11-12 2020-11-12 Protein interaction network node classification method based on multichannel graph convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011260336.3A CN112364983B (en) 2020-11-12 2020-11-12 Protein interaction network node classification method based on multichannel graph convolutional neural network

Publications (2)

Publication Number Publication Date
CN112364983A true CN112364983A (en) 2021-02-12
CN112364983B CN112364983B (en) 2024-03-22

Family

ID=74515357

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011260336.3A Active CN112364983B (en) 2020-11-12 2020-11-12 Protein interaction network node classification method based on multichannel graph convolutional neural network

Country Status (1)

Country Link
CN (1) CN112364983B (en)

Also Published As

Publication number Publication date
CN112364983B (en) 2024-03-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant