CN116416478A - Bioinformatics classification model based on graph structure data characteristics - Google Patents


Info

Publication number
CN116416478A
Authority
CN
China
Prior art keywords
graph
pooling
nodes
channel
node
Legal status
Granted
Application number
CN202310659097.6A
Other languages
Chinese (zh)
Other versions
CN116416478B (en)
Inventor
魏玉锌
王翔
Current Assignee
Fujian University of Technology
Original Assignee
Fujian University of Technology
Application filed by Fujian University of Technology
Priority to CN202310659097.6A
Publication of CN116416478A
Application granted
Publication of CN116416478B
Status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioethics (AREA)
  • Public Health (AREA)
  • Epidemiology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a bioinformatics classification model based on graph structure data characteristics, comprising a plurality of feature extraction layers stacked stage by stage. Each feature extraction layer comprises a graph convolution layer and a graph pooling layer. The graph pooling layer comprises a three-channel pooling module and a feature fusion module; the three-channel pooling module comprises a graph convolution pooling channel, a differential pooling channel, and a Transformer pooling channel, which respectively learn the local topology information, the global topology information, and the inter-node feature dependency information of graph structure data, and these kinds of information are then fused. The graph feature representation that a readout layer extracts from the pooled graph of any earlier-stage feature extraction layer forms a residual connection with the graph feature representation extracted through the readout layer of the last-stage feature extraction layer, after which a fully connected layer outputs the bioinformatics classification prediction. By fusing several kinds of graph feature information, the invention generates a better representation of the whole graph and therefore classifies more accurately.

Description

Bioinformatics classification model based on graph structure data characteristics
Technical Field
The invention relates to the technical field of data processing, in particular to a bioinformatics classification model based on graph structure data characteristics.
Background
In real life there is a large amount of complex network data, such as social networks, knowledge graphs, proteins, viruses, shopping networks, and molecular compounds, each of which can be abstracted as a graph. Graph structure data has a more complex structure and higher dimensionality than traditional data types, so analysing and processing it is more challenging. Deep learning has shown strong learning ability on graph structure data as well, and in recent years more and more researchers have applied it to graph data analysis tasks such as recommendation systems, link prediction, graph classification, and node classification.
Graph classification is applied mainly to bioinformatics classification, in fields including drug discovery, virus analysis, protein analysis, and molecular compound analysis. Unlike images, these complex network data contain a large amount of topology information, which strongly affects the graph-level representation of the whole graph. When modelling a graph classification task, however, how to capture the feature information of graph data and generate a graph-level representation at the same time remains the core research problem. Existing graph classification models concentrate either on the topology information of the graph or on its feature information, largely neglecting fused modelling of the different kinds of information in graph structure data; as a result they cannot obtain better graph feature representations, which limits the accuracy of bioinformatics classification.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a bioinformatics classification model based on graph structure data characteristics: a feature-fusion graph neural network that simultaneously captures the local topology information, the global topology information, and the dependency information of long-distance nodes of a graph, fuses these kinds of graph feature information, and thereby generates a better representation of the whole graph.
In a first aspect, the invention provides a bioinformatics classification model based on graph structure data characteristics, which comprises a plurality of feature extraction layers stacked stage by stage, a plurality of readout layers, and a fully connected layer. Each feature extraction layer comprises a graph convolution layer and a graph pooling layer; the graph convolution layer is connected to a corresponding readout layer through the graph pooling layer, and the readout layers are connected to the fully connected layer;
the graph pooling layer comprises a three-channel pooling module and a feature fusion module. The three-channel pooling module comprises a graph convolution pooling channel, a differential pooling channel, and a Transformer pooling channel, which respectively learn the local topology information, the global topology information, and the inter-node feature dependency information of graph structure data; the feature fusion module fuses these three kinds of information to obtain a pooled graph;
the pooled graph obtained by each feature extraction layer is input into the graph convolution layer of the next-stage feature extraction layer; the graph feature representation that the corresponding readout layer extracts from the pooled graph of any earlier-stage feature extraction layer forms a residual connection with the graph feature representation extracted by the readout layer of the last-stage feature extraction layer; and the fully connected layer outputs the bioinformatics classification prediction.
One or more technical solutions provided in the embodiments of the present invention have at least the following technical effects or advantages: the model comprises a plurality of feature extraction layers stacked stage by stage, a plurality of readout layers, and a fully connected layer; each feature extraction layer comprises a graph convolution layer and a graph pooling layer; each graph pooling layer consists of a three-channel pooling module and a feature fusion module, whose graph convolution pooling channel, differential pooling channel, and Transformer pooling channel respectively learn the local topology information, the global topology information, and the inter-node feature dependency information of graph structure data. The resulting model performs better on graph classification tasks and classifies bioinformatics data represented as graph structure data more accurately.
The foregoing is only an overview of the technical solution of the invention, provided so that the technical means of the invention can be understood more clearly and implemented according to the description, and so that the above and other objects, features, and advantages of the invention are more readily apparent.
Drawings
The invention is further described below with reference to embodiments and the accompanying drawings.
FIG. 1 is a schematic diagram of the structure of a bioinformatics classification model according to the present invention;
FIG. 2 is a flow chart of the processing principle of the pooling layer of the present invention.
Detailed Description
The embodiments of the present application provide a bioinformatics classification model based on graph structure data characteristics: a feature-fusion graph neural network that simultaneously captures the local topology information, the global topology information, and the dependency information of long-distance nodes of a graph, fuses these kinds of graph feature information, and generates a better representation of the whole graph.
The general idea of the technical solution in the embodiments is as follows: the model comprises a plurality of feature extraction layers stacked stage by stage, a plurality of readout layers, and a fully connected layer; each feature extraction layer comprises a graph convolution layer and a graph pooling layer; each graph pooling layer consists of a three-channel pooling module and a feature fusion module, whose graph convolution pooling channel, differential pooling channel, and Transformer pooling channel respectively learn the local topology information, the global topology information, and the inter-node feature dependency information of graph structure data. The constructed model therefore performs fused modelling of the different kinds of information in graph structure data, performs better on graph classification tasks, and improves the accuracy of bioinformatics classification.
In fields such as drug discovery, virus analysis, protein analysis, and molecular compound analysis, the graph structure data corresponds to the molecular structure under study, which consists of atoms and the chemical bonds between them: the atoms are the nodes and the chemical bonds are the edges. The feature information of graph structure data therefore includes the feature information of the nodes and the dependency information of the features between nodes.
As shown in FIG. 1, this embodiment provides a bioinformatics classification model based on graph structure data characteristics, comprising a plurality of feature extraction layers stacked stage by stage, a plurality of readout layers, and a fully connected layer; each feature extraction layer comprises a graph convolution layer and a graph pooling layer, the graph convolution layer is connected to a corresponding readout layer through the graph pooling layer, and the readout layers are connected to the fully connected layer;
the graph pooling layer comprises a three-channel pooling module and a feature fusion module. The three-channel pooling module comprises a graph convolution pooling channel, a differential pooling channel, and a Transformer pooling channel, which respectively learn the local topology information, the global topology information, and the inter-node feature dependency information of graph structure data; the feature fusion module fuses these three kinds of information to obtain a pooled graph;
the pooled graph obtained by each feature extraction layer is input into the graph convolution layer of the next-stage feature extraction layer; the graph feature representation that the corresponding readout layer extracts from the pooled graph of any earlier-stage feature extraction layer forms a residual connection with the graph feature representation extracted by the readout layer of the last-stage feature extraction layer; and the fully connected layer outputs the bioinformatics classification prediction.
Each component of the bioinformatics classification model based on graph structure data characteristics is described in detail below.
Feature extraction layer: extracts the feature information and topology information of graph structure data and performs feature fusion. The model has several feature extraction layers stacked stage by stage, each comprising a graph convolution layer and a graph pooling layer; the pooling layer of each stage is connected to the graph convolution layer of the next stage.
Graph convolution layer: aggregates the feature information of each node and its neighbouring nodes. Each node is considered to be influenced by itself and by all of its neighbours, so a graph convolutional network aggregates the features of the node and of its neighbours. The propagation formula is:
[Formula SMS_1]
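The propagation formula survives only as an image (SMS_1). As a hedged illustration, the sketch below assumes the widely used symmetric-normalised GCN rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W); the patent's exact formula may differ. The function name `gcn_layer` and the plain-list matrix representation are hypothetical choices made to keep the example dependency-free.

```python
import math
from typing import List

Matrix = List[List[float]]


def gcn_layer(adj: Matrix, x: Matrix, weight: Matrix) -> Matrix:
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    Assumed rule (standard GCN), not confirmed by the patent text.
    adj: n x n adjacency, x: n x f node features, weight: f x h parameters.
    """
    n = len(adj)
    # Self-loops let every node aggregate its own features as well.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # >= 1 because of the self-loop
    # Symmetric degree normalisation.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    f = len(x[0])
    # Aggregate neighbour features: norm @ x.
    agg = [[sum(norm[i][k] * x[k][c] for k in range(n)) for c in range(f)] for i in range(n)]
    # Linear transform followed by ReLU.
    h = len(weight[0])
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(f))) for o in range(h)]
            for i in range(n)]


if __name__ == "__main__":
    # Two connected nodes: each ends up with (1 + 3) / 2.
    print(gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]]))  # [[2.0], [2.0]]
```

The self-loop guarantees every degree is at least 1, so the normalisation never divides by zero.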
Pooling layer: captures graph feature information, which is key to the graph classification task. The pooling layer effectively captures the topology, deep node features, and other information of graph structure data. As shown in FIG. 2, the pooling layer of the invention is an end-to-end three-channel pooling graph neural network consisting mainly of a three-channel pooling module and a feature fusion module. The three-channel pooling module comprises a graph convolution pooling channel, a differential pooling channel, and a Transformer pooling channel; the feature fusion module consists of a cross-channel convolution module and an aggregation module.
Three-channel pooling module:
The first channel, the Transformer pooling channel, is based on a TOP-K pooling model and captures the dependency information of the features among nodes, i.e. the long-distance dependency information between nodes. The TOP-K node scores are computed from the graph after Transformer transformation:
[Formula SMS_2]
where [SMS_3] is the feature matrix obtained by passing the node features X through a Transformer module, and is the feature used in the calculation; [SMS_4] is a learnable parameter matrix that learns how much each feature dimension of a node contributes to the node's overall features, with [SMS_5] denoting a real-valued matrix and [SMS_6] the node feature dimension; [SMS_7] is the vector of scores of all nodes of the graph structure data and [SMS_8] is the score of each node, with [SMS_9] denoting a real-valued matrix and [SMS_10] the number of original nodes.
[Formula SMS_11]
The nodes are sorted by these scores, which are computed from the long-distance dependency information, and the [SMS_12] highest-scoring nodes [SMS_13] are kept as the reserved nodes [SMS_14]. The reserved nodes are regarded as the important nodes of the graph structure data, and the remaining nodes are discarded.
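The score formula (SMS_2) and the selection rule (SMS_11) are likewise images. The sketch below is a minimal illustration under two stated assumptions: a single self-attention head with identity query/key/value projections stands in for the Transformer module, and the score is a gPool-style projection y = X̃p / ‖p‖ with k = ⌈ratio · N⌉ nodes kept. The names `self_attention`, `topk_select`, and `ratio` are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def self_attention(x: Matrix) -> Matrix:
    """Single-head scaled dot-product attention with identity Q/K/V projections.

    Simplified stand-in for the Transformer module: every node attends to every
    other node, so long-range dependencies enter each transformed feature row.
    """
    n, f = len(x), len(x[0])
    out = []
    for i in range(n):
        logits = [sum(x[i][c] * x[j][c] for c in range(f)) / math.sqrt(f) for j in range(n)]
        m = max(logits)  # subtract the max for numerical stability
        w = [math.exp(v - m) for v in logits]
        z = sum(w)
        out.append([sum(w[j] / z * x[j][c] for j in range(n)) for c in range(f)])
    return out


def topk_select(features: Matrix, p: List[float], ratio: float) -> Tuple[List[int], List[float]]:
    """Score each node by projecting its features onto p; keep ceil(ratio * N) nodes.

    Returns the kept node indices (highest score first) and all node scores.
    """
    norm_p = math.sqrt(sum(v * v for v in p))
    scores = [sum(a * b for a, b in zip(row, p)) / norm_p for row in features]
    k = max(1, math.ceil(ratio * len(features)))
    order = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)
    return order[:k], scores


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 3.0]]
    kept, scores = topk_select(self_attention(x), [1.0, 0.0], 0.5)
    print(kept, [round(s, 3) for s in scores])
```

Because Python's sort is stable even with `reverse=True`, tied scores keep their original node order, which makes the selection deterministic.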
After the remaining nodes are discarded, their feature information is aggregated into the reserved nodes in a certain proportion:
[Formula SMS_15]
where [SMS_16] denotes the discarded nodes [SMS_17]; [SMS_18] is the aggregation matrix for the feature information of the discarded nodes, which carries that information along the edges of the graph structure data onto the features of the reserved nodes, with [SMS_19] denoting a real-valued matrix, [SMS_20] the number of reserved nodes, and [SMS_21] the number of original nodes; [SMS_22] is the node feature matrix generated by the Transformer pooling channel.
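Formula SMS_15 is also an image; the text only says that discarded-node features flow along graph edges into the reserved nodes through an aggregation matrix whose shape is (reserved nodes) × (original nodes). One plausible reading, sketched below purely as an assumption, takes row r of that matrix to be the adjacency row of reserved node r with the reserved columns zeroed, giving X_kept' = X_kept + M X. The name `absorb_dropped` is hypothetical, and the patent's "certain proportion" may be a learned or normalised weighting rather than raw edge weights.

```python
from typing import List

Matrix = List[List[float]]


def absorb_dropped(adj: Matrix, x: Matrix, kept: List[int]) -> Matrix:
    """Return features of the kept nodes after absorbing dropped neighbours.

    Row r of the assumed aggregation matrix M (k x n) is adj[kept[r]] with the
    columns of kept nodes zeroed, so only edges to dropped nodes contribute.
    """
    kept_set = set(kept)
    n, f = len(x), len(x[0])
    out = []
    for i in kept:
        m_row = [0.0 if j in kept_set else adj[i][j] for j in range(n)]
        out.append([x[i][c] + sum(m_row[j] * x[j][c] for j in range(n)) for c in range(f)])
    return out


if __name__ == "__main__":
    path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # 0 - 1 - 2
    feats = [[1.0], [10.0], [100.0]]
    print(absorb_dropped(path, feats, [1]))     # [[111.0]]
    print(absorb_dropped(path, feats, [0, 2]))  # [[11.0], [110.0]]
```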
The second channel, the differential pooling channel, captures global topology information and generates a coarsened subgraph. The invention designs a graph clustering algorithm that uses a graph convolutional neural network to learn a soft assignment matrix for generating the coarsened graph [SMS_23]. The assignment matrix [SMS_24] is generated by:
[Formula SMS_25]
where [SMS_26] denotes transposition.
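The assignment formula (SMS_25) is an image, but the description (a GCN-learned soft assignment matrix, a transpose, a coarsened graph) matches the standard DiffPool construction. The sketch assumes the DiffPool equations S = softmax(logits) row-wise, X' = SᵀZ, and A' = SᵀAS, where the assignment logits and node embeddings Z are assumed to have been produced by graph convolution layers beforehand. Helper names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax: each node's assignment over clusters sums to 1."""
    out = []
    for row in m:
        top = max(row)
        e = [math.exp(v - top) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def diff_pool(adj: Matrix, z: Matrix, assign_logits: Matrix) -> Tuple[Matrix, Matrix, Matrix]:
    """Coarsen a graph with a soft assignment matrix S (n x c).

    adj: n x n adjacency, z: n x f node embeddings, assign_logits: n x c scores
    (assumed to come from a GCN). Returns (S^T Z, S^T A S, S).
    """
    s = softmax_rows(assign_logits)
    s_t = transpose(s)
    x_coarse = matmul(s_t, z)               # c x f cluster features
    a_coarse = matmul(matmul(s_t, adj), s)  # c x c cluster connectivity
    return x_coarse, a_coarse, s


if __name__ == "__main__":
    adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]  # path 0-1-2-3
    z = [[1.0], [2.0], [3.0], [4.0]]
    logits = [[20.0, 0.0], [20.0, 0.0], [0.0, 20.0], [0.0, 20.0]]  # near-hard split
    x_c, a_c, _ = diff_pool(adj, z, logits)
    print([[round(v, 3) for v in r] for r in x_c])  # [[3.0], [7.0]]
    print([[round(v, 3) for v in r] for r in a_c])  # [[2.0, 1.0], [1.0, 2.0]]
```

With a near-hard split, each cluster's feature is the sum of its members' features, and the off-diagonal entry of A' counts the single edge (1, 2) that crosses the two clusters; the assignment is "soft" because every node still carries a small weight in the other cluster.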
The third channel, the graph convolution pooling channel, captures local topology information in the graph. It is a node-voting graph pooling method based on a graph convolutional neural network, which uses the graph convolutional network to capture the local topology among the nodes of the graph structure data. Node scores are computed as:
[Formula SMS_27]
where [SMS_28] is a learnable parameter that learns how much each feature of a node contributes to the node's overall features, with [SMS_29] denoting a real-valued matrix and [SMS_30] the node feature dimension; [SMS_31] is the vector of scores of all nodes of the graph structure data; [SMS_32] [SMS_33] [SMS_34] is the score of each node in the graph convolution pooling channel, with [SMS_35] denoting a real-valued matrix and [SMS_36] the number of original nodes.
The nodes are sorted by these scores, and the [SMS_37] highest-scoring nodes [SMS_38] [SMS_39] are kept as the reserved nodes [SMS_40]. The reserved nodes are regarded as the important nodes of the graph structure data, and the remaining nodes are discarded.
After the nodes are discarded, their feature information is aggregated into the reserved nodes in a certain proportion:
[Formula SMS_41]
where [SMS_42] denotes the discarded nodes [SMS_43]; [SMS_44] is the aggregation matrix for the feature information of the discarded nodes, with [SMS_45] denoting a real-valued matrix, [SMS_46] the number of reserved nodes, and [SMS_47] the number of original nodes; [SMS_48] is the node feature matrix generated by the graph convolution pooling channel.
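Formula SMS_27 is an image too. The sketch below assumes a SAGPool-style score, y = D^-1/2 (A + I) D^-1/2 X p, i.e. a graph convolution with a single output, so that each node's score depends only on its own features and those of its 1-hop neighbours (local topology); the same top-k retention then follows. `gcn_scores`, `keep_top`, and `ratio` are hypothetical names.

```python
import math
from typing import List

Matrix = List[List[float]]


def gcn_scores(adj: Matrix, x: Matrix, p: List[float]) -> List[float]:
    """Score nodes with a one-output graph convolution: D^-1/2 (A + I) D^-1/2 X p."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    proj = [sum(a * b for a, b in zip(row, p)) for row in x]  # X p, one value per node
    return [sum(a_hat[i][j] / math.sqrt(deg[i] * deg[j]) * proj[j] for j in range(n))
            for i in range(n)]


def keep_top(scores: List[float], ratio: float) -> List[int]:
    """Indices of the ceil(ratio * N) highest-scoring nodes, best first."""
    k = max(1, math.ceil(ratio * len(scores)))
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # 0 - 1 - 2
    s = gcn_scores(path, [[0.0], [0.0], [6.0]], [1.0])
    print([round(v, 3) for v in s], keep_top(s, 0.5))  # [0.0, 2.449, 3.0] [2, 1]
```

Node 0 scores zero even though node 2 has a large feature, because node 2 lies two hops away: this is the locality that distinguishes this channel from the attention-based Transformer channel.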
Feature fusion module: consists of a cross-channel convolution module and an aggregation module.
The cross-channel convolution module uses a cross-channel convolution method to fuse the inter-node feature dependency information of the Transformer pooling channel with the global topology information of the differential pooling channel, and to fuse the local topology information of the graph convolution pooling channel with the global topology information of the differential pooling channel, yielding two cross-channel aggregated pooled graphs. The terms of the cross-channel convolution formula are:
[SMS_49] comprises [SMS_50] and [SMS_51]: [SMS_52] is the node feature matrix of the reserved nodes of the Transformer pooling channel after cross-channel convolution, and [SMS_53] is the node feature matrix of the reserved nodes of the graph convolution pooling channel after cross-channel convolution;
[SMS_54] comprises [SMS_55] and [SMS_56]: [SMS_57] is the node feature matrix generated by the Transformer pooling channel, and [SMS_58] is the node feature matrix generated by the graph convolution pooling channel;
[SMS_59] is the node feature matrix generated by the differential pooling channel;
[SMS_60] is the conversion matrix that maps [SMS_61] to [SMS_62], where [SMS_63] denotes a real-valued matrix, [SMS_64] is the number of nodes of the graph structure data produced by the Transformer pooling channel or the graph convolution pooling channel, and [SMS_65] is the number of nodes produced by the differential pooling channel;
[Formula SMS_66]
where [SMS_67] is the soft assignment matrix learned by the graph neural network in the differential pooling channel.
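The cross-channel formula and the conversion matrix (SMS_66) are images as well. One possible reading, stated here purely as an assumption, is that the conversion matrix consists of the rows of the soft assignment matrix S that belong to a channel's reserved nodes, so each reserved node receives the differential-pooling cluster features it is assigned to: X' = X_channel + S[idx] X_diff. `cross_channel_fuse` is a hypothetical name, and the patent's actual fusion may differ (for example, by applying learnable weights).

```python
from typing import List

Matrix = List[List[float]]


def cross_channel_fuse(x_channel: Matrix, kept: List[int], s: Matrix, x_diff: Matrix) -> Matrix:
    """Fuse a Top-K style channel with the differential pooling channel (assumed form).

    x_channel: k x f features of the channel's reserved nodes
    kept:      original indices of those k nodes
    s:         n x c soft assignment matrix from the differential pooling channel
    x_diff:    c x f coarsened node features from the differential pooling channel
    Returns x_channel + S[kept] @ x_diff, a k x f matrix.
    """
    f, c = len(x_diff[0]), len(x_diff)
    fused = []
    for r, node in enumerate(kept):
        conv_row = s[node]  # row of the assumed k x c conversion matrix
        fused.append([x_channel[r][d] + sum(conv_row[q] * x_diff[q][d] for q in range(c))
                      for d in range(f)])
    return fused


if __name__ == "__main__":
    s = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 nodes, 2 clusters
    x_diff = [[10.0], [20.0]]
    print(cross_channel_fuse([[1.0], [2.0]], [2, 0], s, x_diff))  # [[16.0], [12.0]]
```

Under this reading the function would be called once for the Transformer channel and once for the graph convolution channel, producing the two cross-channel aggregated pooled graphs.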
These operations yield two cross-channel aggregated pooled graphs; to aggregate their information, the invention designs an aggregation module.
The aggregation module denotes the indices of the reserved nodes in the Transformer pooling channel as [SMS_68] and those in the graph convolution pooling channel as [SMS_69]. For a node present in both channels, the mean of its two feature vectors is taken as the feature of the new node; for a node present in only one of the two channels, its feature in that channel is taken as the feature of the new node. The new nodes are the nodes of the graph structure data after processing by the aggregation module:
[Formula SMS_70]
The subgraph consisting of the most representative nodes of the original graph structure data is then extracted by these indices, and its adjacency matrix is:
[Formula SMS_71]
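The aggregation rule is described fully in words, so it can be sketched directly. The only assumptions are that the new nodes are ordered by original index, and that formula SMS_71 takes the rows and columns of the original adjacency matrix indexed by the merged node set (an induced subgraph). Function names are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def merge_channels(idx_t: List[int], x_t: Matrix,
                   idx_g: List[int], x_g: Matrix) -> Tuple[List[int], Matrix]:
    """Union of reserved nodes; average features of nodes kept by both channels."""
    feats: Dict[int, List[float]] = {i: list(row) for i, row in zip(idx_t, x_t)}
    for i, row in zip(idx_g, x_g):
        if i in feats:  # kept by both channels: mean of the two feature rows
            feats[i] = [(a + b) / 2 for a, b in zip(feats[i], row)]
        else:           # kept by the graph convolution channel only
            feats[i] = list(row)
    idx = sorted(feats)
    return idx, [feats[i] for i in idx]


def induced_subgraph(adj: Matrix, idx: List[int]) -> Matrix:
    """Adjacency restricted to the selected nodes: A[idx, idx]."""
    return [[adj[i][j] for j in idx] for i in idx]


if __name__ == "__main__":
    idx, x = merge_channels([0, 2], [[1.0], [3.0]], [2, 3], [[5.0], [7.0]])
    print(idx, x)  # [0, 2, 3] [[1.0], [4.0], [7.0]]
    path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
    print(induced_subgraph(path, idx))  # [[0, 0, 0], [0, 0, 1], [0, 1, 0]]
```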
Readout layer: extracts the graph feature representation of each pooled graph with a readout function:
[Formula SMS_72]
where [SMS_73] is the feature representation of the pooled graph, [SMS_74] is the node feature dimension, and [SMS_75] is the number of nodes of the pooled graph.
The pooling layer of each feature extraction layer is connected to a readout layer. The graph feature representation that a readout layer extracts from the pooled graph of any earlier-stage feature extraction layer forms a residual connection with the graph feature representation extracted from the pooled graph of the last-stage feature extraction layer, which helps alleviate over-smoothing and over-fitting of the model.
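The readout function (SMS_72) is an image. The sketch below assumes a common choice, the concatenation of mean pooling and max pooling over the nodes of each pooled graph, and models the residual connection as an element-wise sum of the readouts from all stages. Both choices are assumptions, and `readout` and `combine_readouts` are hypothetical names.

```python
from typing import List

Matrix = List[List[float]]


def readout(x: Matrix) -> List[float]:
    """Graph-level vector: [mean over nodes || max over nodes], length 2f."""
    n, f = len(x), len(x[0])
    mean = [sum(row[c] for row in x) / n for c in range(f)]
    mx = [max(row[c] for row in x) for c in range(f)]
    return mean + mx


def combine_readouts(pooled_graphs: List[Matrix]) -> List[float]:
    """Residual-style combination: sum the readouts of every stage's pooled graph."""
    reps = [readout(x) for x in pooled_graphs]
    return [sum(vals) for vals in zip(*reps)]


if __name__ == "__main__":
    stage1 = [[1.0, 4.0], [3.0, 2.0]]
    stage2 = [[2.0, 2.0]]
    print(readout(stage1))                     # [2.0, 3.0, 3.0, 4.0]
    print(combine_readouts([stage1, stage2]))  # [4.0, 5.0, 5.0, 6.0]
```

The element-wise sum requires every stage to keep the same node feature dimension; the readout itself is independent of the number of nodes, which is what lets pooled graphs of different sizes be combined.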
Fully connected layer: comprises fully connected layers and an activation function, and uses a multi-layer perceptron as the classifier for the input graph feature representation:
[Formula SMS_76]
where [SMS_77] is the predicted bioinformatics category of the graph structure data.
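Formula SMS_76 is an image. A minimal sketch of an MLP classifier, assuming one ReLU hidden layer and a softmax over the classes, with the predicted category taken as the arg-max; the layer sizes, weights, and the names `dense` and `mlp_predict` are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def dense(v: List[float], w: Matrix, b: List[float]) -> List[float]:
    """Affine layer: w is (out x in); returns w @ v + b."""
    return [sum(wi * vi for wi, vi in zip(row, v)) + bi for row, bi in zip(w, b)]


def mlp_predict(h: List[float], w1: Matrix, b1: List[float],
                w2: Matrix, b2: List[float]) -> Tuple[int, List[float]]:
    """Return (predicted class index, class probabilities) for a graph vector h."""
    hidden = [max(0.0, v) for v in dense(h, w1, b1)]  # ReLU activation
    logits = dense(hidden, w2, b2)
    top = max(logits)
    e = [math.exp(v - top) for v in logits]  # numerically stable softmax
    z = sum(e)
    probs = [v / z for v in e]
    return max(range(len(probs)), key=probs.__getitem__), probs


if __name__ == "__main__":
    pred, probs = mlp_predict([1.0, -1.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],
                              [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]], [0.0, 0.0, 0.0])
    print(pred, [round(p, 3) for p in probs])
```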
Finally, the bioinformatics classification model based on graph structure data characteristics outputs the final prediction through the fully connected layer.
The following steps illustrate an implementation of the bioinformatics classification model based on graph structure data characteristics:
S1, graph data preprocessing: the public protein dataset DD and the biomedical dataset NCI1 are selected for model validation. Each dataset is divided into a training set, a validation set, and a test set in the ratio 7:1.5:1.5, and all data are uniformly standardized.
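Step S1 can be sketched as follows, assuming a random shuffle with a fixed seed before the 7:1.5:1.5 split and z-score standardisation of node features using training-set statistics. The patent does not state how the split is randomised or which standardisation is applied, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def split_dataset(items: Sequence, ratios=(0.7, 0.15, 0.15), seed: int = 0) -> Tuple[list, list, list]:
    """Shuffle, then cut into train / validation / test parts by the given ratios."""
    data = list(items)
    random.Random(seed).shuffle(data)
    n = len(data)
    n_train = round(ratios[0] * n)
    n_val = round(ratios[1] * n)
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]


def fit_standardizer(rows: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature mean and (population) standard deviation of training rows."""
    n, f = len(rows), len(rows[0])
    mean = [sum(r[c] for r in rows) / n for c in range(f)]
    std = [math.sqrt(sum((r[c] - mean[c]) ** 2 for r in rows) / n) for c in range(f)]
    return mean, [s if s > 0 else 1.0 for s in std]  # avoid division by zero


def standardize(rows: List[List[float]], mean: List[float], std: List[float]) -> List[List[float]]:
    """Apply z-score scaling with previously fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(r, mean, std)] for r in rows]


if __name__ == "__main__":
    train, val, test = split_dataset(range(100))
    print(len(train), len(val), len(test))  # 70 15 15
    mean, std = fit_standardizer([[1.0], [3.0]])
    print(standardize([[1.0], [3.0]], mean, std))  # [[-1.0], [1.0]]
```

Fitting the statistics on the training split only, then reusing them for validation and test data, avoids leaking information from the held-out sets into preprocessing.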
S2, building the graph neural network model: to extract the features and topology information of the graph data and perform the protein attribute prediction task, the model uses a hierarchical pooling structure. The graph structure data is sampled layer by layer, the number of nodes is reduced at each layer, the node features of each layer are aggregated, and an overall feature vector representation is finally obtained.
The FIPool model is formed by stacking several feature extraction layers, each comprising a graph convolution layer and a graph pooling layer, to extract and fuse the features of the graph data. The three-channel graph pooling layer extracts feature information and topology information from the graph structure data, so that the local topology information, the global topology information, and the inter-node feature dependency information can be learned effectively; the pooling layer also contains a feature fusion module that aggregates the feature information of the different channels through cross-channel convolution.
Every feature extraction layer except the last is connected to the final output through its readout layer, forming residual connections, and the fully connected layer outputs the classification prediction.
S3, model training and parameter tuning: an input sequence of graph structure data [SMS_78] is processed as in S1 and fed into the model constructed in S2, and the last fully connected layer of the model outputs the classification prediction [SMS_79]. Throughout training, the loss function, the optimizer, and the tunable hyperparameters are adjusted to find the hyperparameter combination with which the model performs best on the test data set, and the final model is built.
S4, performance analysis and evaluation of the graph neural network model: using the chosen evaluation metrics as the criteria, the built model is compared against several baseline models to verify its performance.
The method, apparatus, system, device, and medium provided by the embodiments of this application have at least the following technical effects or advantages. A bioinformatics classification model based on graph structure data features is provided. It comprises a plurality of progressively stacked feature extraction layers, a plurality of readout layers, and a fully connected layer. Each feature extraction layer comprises a graph convolution layer and a graph pooling layer. Each graph pooling layer consists of a three-channel pooling module and a feature fusion module. The three-channel pooling module comprises a graph convolution pooling channel, a differential pooling channel, and a Transformer pooling channel. These learn, respectively, the local topological structure information, the global topological structure information, and the dependency information between node features of the graph structure data. As a result, the model performs better on graph classification tasks and classifies graph-structured bioinformatics data more accurately.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that the specific embodiments described are illustrative only and not intended to limit the scope of the invention, and that equivalent modifications and variations of the invention in light of the spirit of the invention will be covered by the claims of the present invention.

Claims (7)

1. A bioinformatics classification model based on graph structure data features, characterized in that it comprises a plurality of progressively stacked feature extraction layers, a plurality of readout layers, and a fully connected layer; each feature extraction layer comprises a graph convolution layer and a graph pooling layer, the graph convolution layer is connected to a corresponding readout layer through the graph pooling layer, and the readout layers are connected to the fully connected layer;
the graph pooling layer comprises a three-channel pooling module and a feature fusion module; the three-channel pooling module comprises a graph convolution pooling channel, a differential pooling channel, and a Transformer pooling channel, which learn, respectively, the local topological structure information, the global topological structure information, and the dependency information between node features of the graph structure data; the feature fusion module fuses these three kinds of information to obtain a pooled graph;
the pooled graph obtained by the feature extraction layer of each stage is input into the graph convolution layer of the feature extraction layer of the next stage; the readout layer corresponding to each earlier stage extracts the graph features of that stage's pooled graph and forms a residual connection with the graph feature representation extracted from the feature extraction layer of the last stage; and the fully connected layer outputs the bioinformatics classification prediction.
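Claim 1 fixes the data flow. Each stage's pooled graph feeds the next stage, each stage is read out, and the readouts are combined with the last stage's representation before the fully connected layer. The sketch below shows only that wiring, under three assumptions. First, the residual connection is taken to be an element-wise sum of the per-stage readout vectors. Second, the readout concatenates mean and max pooling. Third, `halve` is a hypothetical stand-in for a real graph convolution plus three-channel pooling stage.

```python
def readout(x):
    """Graph-level vector: per-feature mean followed by per-feature max over nodes."""
    cols = list(zip(*x))
    return [sum(c) / len(c) for c in cols] + [max(c) for c in cols]


def forward(adj, x, stages, classifier):
    """Run the stacked feature-extraction stages and classify the summed readouts.

    Each stage maps (adj, x) to its pooled graph (adj', x'). Assumption: the
    residual connection is an element-wise sum of every stage's readout.
    """
    summary = None
    for stage in stages:
        adj, x = stage(adj, x)  # graph convolution + graph pooling
        r = readout(x)          # readout layer for this stage
        summary = r if summary is None else [a + b for a, b in zip(summary, r)]
    return classifier(summary)  # fully connected layer


def halve(adj, x):
    """Hypothetical stand-in stage: keep the first half of the nodes."""
    k = max(1, len(x) // 2)
    return [row[:k] for row in adj[:k]], x[:k]
```

Take a four-node graph with features `[[1.0], [2.0], [3.0], [4.0]]`, two `halve` stages, and an identity classifier. Stage one reads out `[1.5, 2.0]` and stage two reads out `[1.0, 1.0]`, so `forward` returns `[2.5, 3.0]`.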
2. The bioinformatics classification model based on graph structure data features of claim 1, wherein: the Transformer pooling channel captures the dependency information between node features based on a TOP-K pooling model, and the scores of the TOP-K pooling model are computed on the graph after it has been transformed by a Transformer, according to the following formula:
Figure QLYQS_1
wherein:
the input is the feature matrix obtained by passing the node features X through a Transformer module, and it serves as the features used in the calculation;
a learnable real-valued parameter matrix, sized by the node feature dimension, learns the influence of each feature dimension of a node on that node's overall features;
the output is the real-valued vector of scores of all nodes of the graph structure data, with one score for each original node;
the nodes are sorted by these scores, which are computed from the long-range dependency information between nodes; the highest-scoring nodes are taken, and their indices form the set of retained nodes; the retained nodes are regarded as the important nodes of the graph structure data, and the remaining nodes are discarded;
after the remaining nodes are discarded, the feature information of the discarded nodes is aggregated into the retained nodes in a certain proportion, according to the following formula:
Figure QLYQS_14
wherein:
the discarded nodes are all nodes other than the retained nodes;
a real-valued aggregation matrix, sized by the number of retained nodes and the number of original nodes, aggregates the feature information of the discarded nodes into the retained nodes along the edges of the graph structure data;
the result is the node feature matrix produced by the Transformer pooling channel.
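Claim 2 gives its score and aggregation formulas only as figures. The sketch below is one plausible reading, and every choice in it is an assumption:

- a single-head self-attention layer stands in for the Transformer module;
- each node's score is the projection of its transformed features onto a learnable vector `p`, divided by the norm of `p` (the gPool-style TOP-K score);
- each discarded node's features are split equally among its retained neighbours.

The weight names `wq`, `wk`, `wv`, and `p` are illustrative.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m):
    """Row-wise softmax with max subtraction for numerical stability."""
    out = []
    for row in m:
        top = max(row)
        e = [math.exp(v - top) for v in row]
        total = sum(e)
        out.append([v / total for v in e])
    return out


def self_attention(x, wq, wk, wv):
    """Assumption: single-head scaled dot-product attention as the Transformer module."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(wq[0]))
    logits = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    return matmul(softmax_rows(logits), v)


def transformer_topk_pool(adj, x, wq, wk, wv, p, ratio=0.5):
    """Return (kept indices, scores, pooled features) for the Transformer channel."""
    xt = self_attention(x, wq, wk, wv)
    # Assumption: gPool-style score, projection onto p divided by |p|.
    p_norm = math.sqrt(sum(w * w for w in p))
    scores = [sum(a * b for a, b in zip(row, p)) / p_norm for row in xt]
    n = len(x)
    k = max(1, math.ceil(ratio * n))
    idx = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    pos = {node: r for r, node in enumerate(idx)}
    pooled = [list(xt[i]) for i in idx]
    # Assumption: each discarded node is split equally over its retained neighbours.
    for j in range(n):
        if j in pos:
            continue
        targets = [i for i in idx if adj[i][j]]
        for i in targets:
            pooled[pos[i]] = [a + b / len(targets) for a, b in zip(pooled[pos[i]], xt[j])]
    return idx, scores, pooled
```

Under this equal-split rule, no feature mass is lost when every discarded node has at least one retained neighbour. The pooled features then sum to the same total as the transformed features.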
3. The bioinformatics classification model based on graph structure data features of claim 1, wherein: the differential pooling channel uses a graph convolutional neural network to learn a soft assignment matrix for generating a coarsened graph; the assignment matrix is generated by:
Figure QLYQS_28
wherein the inputs are the adjacency matrix and the node feature matrix; after the assignment matrix is obtained, the feature matrix and the adjacency matrix of the coarsened graph are generated by:
Figure QLYQS_29
wherein the transpose of the assignment matrix is used.
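Claim 3 has the structure of differentiable pooling (DiffPool). A graph network produces a soft assignment matrix S, and the coarsened graph is built with the transpose of S. The sketch assumes the common DiffPool forms X' = S^T X and A' = S^T A S. It also assumes S = softmax((A + I) X W) is computed by a single propagation step. Standard DiffPool additionally runs a separate embedding GNN on X before multiplying by S^T, which this sketch omits. `w_assign` is an illustrative weight matrix with one column per cluster.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Row-wise softmax with max subtraction for numerical stability."""
    out = []
    for row in m:
        top = max(row)
        e = [math.exp(v - top) for v in row]
        total = sum(e)
        out.append([v / total for v in e])
    return out


def diff_pool(adj, x, w_assign):
    """DiffPool-style coarsening sketch; returns (S, X_coarse, A_coarse), S is N x K."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Assumption: one propagation step, then softmax, gives each node's cluster distribution.
    s = softmax_rows(matmul(matmul(a_hat, x), w_assign))
    st = transpose(s)
    x_coarse = matmul(st, x)                 # K x d cluster features
    adj_coarse = matmul(matmul(st, adj), s)  # K x K cluster connectivity
    return s, x_coarse, adj_coarse
```

Each row of S sums to one, so coarsening preserves both the column sums of X and the total edge weight of A. This holds because 1^T S^T A S 1 = 1^T A 1.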
4. The bioinformatics classification model based on graph structure data features of claim 1, wherein: the graph convolution pooling channel is a node-voting graph pooling method based on a graph convolutional neural network; the method uses the graph convolutional network to capture the local topological structure information between nodes of the graph structure data, and the node scores are computed as:
Figure QLYQS_33
wherein:
a learnable real-valued parameter, sized by the node feature dimension, learns the influence of each feature of a node of the graph structure data on that node's overall features;
the output is the real-valued vector of scores of all nodes of the graph structure data in the graph convolution pooling channel, with one score for each original node;
the nodes are sorted by these scores; the highest-scoring nodes are taken, and their indices form the set of retained nodes; the retained nodes are regarded as the important nodes of the graph structure data, and the remaining nodes are discarded;
after the nodes are discarded, the feature information of the discarded nodes is aggregated into the retained nodes in a certain proportion, according to the following formula:
Figure QLYQS_46
wherein:
the discarded nodes are all nodes other than the retained nodes;
a real-valued aggregation matrix, sized by the number of original nodes and the number of retained nodes, aggregates the feature information of the discarded nodes;
the result is the node feature matrix produced by the graph convolution pooling channel.
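Claim 4's score formula is likewise given only as a figure. A common scoring rule for GCN-based TOP-K pooling, used for example by SAGPool, scores each node with a one-output graph convolution: tanh(D^-1/2 (A + I) D^-1/2 X p). The sketch assumes that rule. It also spreads each discarded node's features equally over its retained neighbours, which is a further assumption. Unlike SAGPool, it does not gate the retained features by their scores, because the claim describes aggregation rather than gating.

```python
import math


def gcn_scores(adj, x, p):
    """Assumed SAGPool-style score: tanh(D^-1/2 (A + I) D^-1/2 X p), one per node."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    xp = [sum(a * b for a, b in zip(row, p)) for row in x]  # project features onto p
    return [
        math.tanh(sum(a_hat[i][j] * xp[j] / math.sqrt(deg[i] * deg[j]) for j in range(n)))
        for i in range(n)
    ]


def gcn_topk_pool(adj, x, p, ratio=0.5):
    """Return (kept indices, scores, pooled features) for the graph convolution channel."""
    scores = gcn_scores(adj, x, p)
    n = len(x)
    k = max(1, math.ceil(ratio * n))
    idx = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    pos = {node: r for r, node in enumerate(idx)}
    pooled = [list(x[i]) for i in idx]
    # Assumption: each discarded node is split equally over its retained neighbours.
    for j in range(n):
        if j in pos:
            continue
        targets = [i for i in idx if adj[i][j]]
        for i in targets:
            pooled[pos[i]] = [a + b / len(targets) for a, b in zip(pooled[pos[i]], x[j])]
    return idx, scores, pooled
```

On the path graph 0-1-2 with features `[[1.0], [2.0], [3.0]]` and `p = [1.0]`, nodes 2 and 1 score highest. Discarded node 0 passes its feature to its only retained neighbour, node 1, giving pooled features `[[3.0], [3.0]]`.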
5. The bioinformatics classification model based on graph structure data features of claim 1, wherein: the feature fusion module comprises a cross-channel convolution module and an aggregation module;
the cross-channel convolution module uses a cross-channel convolution method to fuse the dependency information between node features from the Transformer pooling channel with the global topology information from the differential pooling channel, and to fuse the local topology information from the graph convolution pooling channel with the global topology information from the differential pooling channel, thereby obtaining two cross-channel aggregated pooled graphs; the cross-channel convolution is:
Figure QLYQS_54
wherein:
the output comprises the node feature matrix of the retained nodes of the Transformer pooling channel after cross-channel convolution and the node feature matrix of the retained nodes of the graph convolution pooling channel after cross-channel convolution;
an activation function is applied;
the first input comprises the node feature matrix produced by the Transformer pooling channel and the node feature matrix produced by the graph convolution pooling channel;
the second input is the node feature matrix produced by the differential pooling channel;
a real-valued conversion matrix maps between the nodes of the differential pooling channel and the nodes of the Transformer or graph convolution pooling channel; it is sized by the number of nodes of the graph structure data generated in the Transformer or graph convolution pooling channel and the number of nodes generated by the differential pooling channel, and is given by:
Figure QLYQS_73
wherein the soft assignment matrix learned by the graph neural network in the differential pooling channel is used;
the aggregation module denotes the indices of the retained nodes in the Transformer pooling channel and in the graph convolution pooling channel as two separate index sets; for a node present in both channels, the mean of its two feature vectors is taken as the feature of the new node, and for a node present in only one of the two channels, its feature in that channel is taken as the feature of the new node, according to:
Figure QLYQS_77
the sub-graph consisting of the most representative nodes of the original graph structure data is extracted by these indices, with the adjacency matrix:
Figure QLYQS_78
wherein the result is the adjacency matrix of the sub-graph extracted by indexing, and the formula involves the number of nodes to be retained and the total number of nodes in the original graph structure data;
the pooled graph is then generated by the following two formulas:
Figure QLYQS_82
Figure QLYQS_83
wherein the outputs are the node features and the adjacency matrix of the aggregated graph structure data.
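Claim 5 combines two steps, and the sketch below covers both.

- Cross-channel convolution (assumed form): the conversion matrix is taken as the rows of the soft assignment matrix belonging to a channel's retained nodes, giving one row per retained node and one column per cluster. The differential-pooling features mapped through it are added to the retained nodes' features, followed by ReLU. The exact formula is in the unreproduced figures.
- Aggregation (closer to the claim text): nodes retained by both channels take the mean of their two feature vectors, and nodes retained by only one channel keep that channel's features. The adjacency matrix is the original one restricted to the union of retained indices. Ordering the union by node index is an assumption.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def cross_channel_fuse(x_kept, idx, s, x_diff):
    """Assumed form: ReLU(X_kept + T X_diff), with T = rows of S for the kept nodes."""
    t = [s[i] for i in idx]     # k x K conversion matrix (assumption)
    mapped = matmul(t, x_diff)  # cluster features mapped onto the kept nodes
    return [[max(0.0, a + b) for a, b in zip(r1, r2)] for r1, r2 in zip(x_kept, mapped)]


def aggregate(adj, idx_t, x_t, idx_g, x_g):
    """Merge the Transformer- and GCN-channel results into one pooled graph."""
    feats_t, feats_g = dict(zip(idx_t, x_t)), dict(zip(idx_g, x_g))
    union = sorted(set(idx_t) | set(idx_g))  # assumption: order by node index
    x_new = []
    for node in union:
        if node in feats_t and node in feats_g:
            # Retained by both channels: average the two feature vectors.
            x_new.append([(a + b) / 2 for a, b in zip(feats_t[node], feats_g[node])])
        else:
            # Retained by one channel only: keep that channel's features.
            x_new.append(list(feats_t[node] if node in feats_t else feats_g[node]))
    adj_new = [[adj[i][j] for j in union] for i in union]  # induced sub-graph
    return union, x_new, adj_new
```

In the model, `aggregate` would receive the outputs of `cross_channel_fuse` for the two channels. Keeping the induced sub-graph means that edges between retained nodes survive unchanged.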
6. The bioinformatics classification model based on graph structure data features of claim 1, wherein: the readout layer extracts the graph feature representation of each pooled graph using the following readout function:
Figure QLYQS_86
wherein the output is the feature representation of the pooled graph, computed from node features of the node feature dimension over all nodes of the pooled graph.
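The readout formula itself is not reproduced. A common choice in hierarchical graph pooling models is to concatenate the mean and the maximum of the node features. This yields a vector of twice the node feature dimension, however many nodes the pooled graph has. The sketch below assumes that choice.

```python
def readout(x):
    """Assumed readout: per-feature mean concatenated with per-feature max.

    x: list of node feature rows of the pooled graph (all rows the same length).
    """
    cols = list(zip(*x))
    mean_part = [sum(c) / len(c) for c in cols]
    max_part = [max(c) for c in cols]
    return mean_part + max_part
```

Because the output size does not depend on the node count, readouts from pooled graphs of different sizes can be summed or compared directly.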
7. The bioinformatics classification model based on graph structure data features of claim 1, wherein: the fully connected layer uses a multilayer perceptron as the classifier to classify the input graph feature representation, according to:
Figure QLYQS_90
wherein the output is the bioinformatics category predicted for the graph structure data.
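Claim 7 specifies only that a multilayer perceptron classifies the graph representation. Below is a minimal sketch. It assumes one ReLU hidden layer, a softmax over classes, and the most probable class as the predicted category. The weight shapes are illustrative.

```python
import math


def mlp_classify(h, w1, b1, w2, b2):
    """Return (predicted class index, class probabilities) for graph vector h.

    Assumption: one ReLU hidden layer, then softmax.
    w1: d x hidden weights, b1: hidden biases; w2: hidden x classes weights, b2: class biases.
    """
    hidden = [max(0.0, sum(x * w for x, w in zip(h, col)) + b) for col, b in zip(zip(*w1), b1)]
    logits = [sum(x * w for x, w in zip(hidden, col)) + b for col, b in zip(zip(*w2), b2)]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs.index(max(probs)), probs
```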
CN202310659097.6A 2023-06-06 2023-06-06 Bioinformatics classification model based on graph structure data characteristics Active CN116416478B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310659097.6A CN116416478B (en) 2023-06-06 2023-06-06 Bioinformatics classification model based on graph structure data characteristics

Publications (2)

Publication Number Publication Date
CN116416478A true CN116416478A (en) 2023-07-11
CN116416478B CN116416478B (en) 2023-09-26

Family

ID=87059631

Country Status (1)

Country Link
CN (1) CN116416478B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110211685A (en) * 2019-06-10 2019-09-06 珠海上工医信科技有限公司 Sugar network screening network structure model based on complete attention mechanism
CN110993037A (en) * 2019-10-28 2020-04-10 浙江工业大学 Protein activity prediction device based on multi-view classification model
US20220101040A1 (en) * 2020-09-30 2022-03-31 Fujitsu Limited Device and method for classification using classification model and computer readable storage medium
CN114693971A (en) * 2022-03-29 2022-07-01 深圳市大数据研究院 Classification prediction model generation method, classification prediction method, system and platform
CN115618927A (en) * 2022-11-17 2023-01-17 中国人民解放军陆军防化学院 Gas type identification method based on time sequence-graph fusion neural network
CN116127353A (en) * 2022-12-28 2023-05-16 马上消费金融股份有限公司 Classification method, classification model training method, equipment and medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116821452A (en) * 2023-08-28 2023-09-29 南京邮电大学 Graph node classification model training method and graph node classification method
CN116821452B (en) * 2023-08-28 2023-11-14 南京邮电大学 Graph node classification model training method and graph node classification method
CN117688425A (en) * 2023-12-07 2024-03-12 重庆大学 Multi-task graph classification model construction method and system for Non-IID graph data

Similar Documents

Publication Publication Date Title
Xinyi et al. Capsule graph neural network
CN116416478B (en) Bioinformatics classification model based on graph structure data characteristics
CN112508085B (en) Social network link prediction method based on perceptual neural network
CN110084151B (en) Video abnormal behavior discrimination method based on non-local network deep learning
CN112861967B (en) Social network abnormal user detection method and device based on heterogeneous graph neural network
CN109063649B (en) Pedestrian re-identification method based on twin pedestrian alignment residual error network
CN110569814B (en) Video category identification method, device, computer equipment and computer storage medium
CN112199536A (en) Cross-modality-based rapid multi-label image classification method and system
CN116628597B (en) Heterogeneous graph node classification method based on relationship path attention
CN111178319A (en) Video behavior identification method based on compression reward and punishment mechanism
CN113283902B (en) Multichannel blockchain phishing node detection method based on graphic neural network
CN112801063B (en) Neural network system and image crowd counting method based on neural network system
CN112381179A (en) Heterogeneous graph classification method based on double-layer attention mechanism
CN112036445A (en) Cross-social-network user identity recognition method based on neural tensor network
CN113255895A (en) Graph neural network representation learning-based structure graph alignment method and multi-graph joint data mining method
CN112311608B (en) Multilayer heterogeneous network space node characterization method
CN115983984A (en) Multi-model fusion client risk rating method
CN113095948A (en) Multi-source heterogeneous network user alignment method based on graph neural network
CN113628059A (en) Associated user identification method and device based on multilayer graph attention network
CN116206327A (en) Image classification method based on online knowledge distillation
CN115858919A (en) Learning resource recommendation method and system based on project field knowledge and user comments
CN110796182A (en) Bill classification method and system for small amount of samples
CN114265954B (en) Graph representation learning method based on position and structure information
CN110633394A (en) Graph compression method based on feature enhancement
CN115601745A (en) Multi-view three-dimensional object identification method facing application end

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant