CN116416478B - Bioinformatics classification model based on graph structure data characteristics - Google Patents

Bioinformatics classification model based on graph structure data characteristics

Info

Publication number
CN116416478B
Authority
CN
China
Prior art keywords
graph
pooling
nodes
channel
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310659097.6A
Other languages
Chinese (zh)
Other versions
CN116416478A (en)
Inventor
魏玉锌
王翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian University of Technology
Original Assignee
Fujian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian University of Technology
Priority to CN202310659097.6A
Publication of CN116416478A
Application granted
Publication of CN116416478B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioethics (AREA)
  • Public Health (AREA)
  • Epidemiology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a bioinformatics classification model based on graph structure data characteristics, comprising a plurality of feature extraction layers stacked stage by stage. Each feature extraction layer comprises a graph convolution layer and a graph pooling layer. The graph pooling layer comprises a three-channel pooling module and a feature fusion module; the three-channel pooling module comprises a graph convolution pooling channel, a differential pooling channel and a Transformer pooling channel, which respectively learn the local topological structure information, the global topological structure information and the inter-node feature dependency information of the graph structure data, and the feature fusion module fuses them. The graph feature representation that the corresponding readout layer extracts from the pooled graph of each preceding-stage feature extraction layer forms a residual connection with the graph feature representation that the last-stage feature extraction layer yields through its readout layer, after which the fully connected layer outputs the prediction result of the bioinformatics classification. The application fuses several kinds of graph feature information and thereby generates a better feature representation of the whole graph, making the classification more accurate.

Description

Bioinformatics classification model based on graph structure data characteristics
Technical Field
The application relates to the technical field of data processing, in particular to a bioinformatics classification model based on graph structure data characteristics.
Background
In real life there is a large amount of complex network data, such as social networks, knowledge graphs, proteins, viruses, shopping networks and molecular compounds, each of which can be abstracted as a graph. Graph structure data has a more complex structure and higher dimensionality than traditional data types, so analyzing and processing it is more challenging. Deep learning has shown strong learning ability on graph structure data, so in recent years more and more researchers have applied it to graph structure data analysis and processing, in areas such as recommendation systems, link prediction, graph classification and node classification.
The graph classification task is mainly applied to bioinformatics classification, including drug discovery, virus analysis, protein analysis and molecular compound analysis. Unlike images, such complex network data carries a large amount of topological structure information, which strongly influences the graph-level representation generated for the whole graph. How to capture the feature information of graph data and generate a graph-level representation at the same time remains a core problem in modeling graph classification tasks. Existing graph classification models concentrate either on modeling the topological structure information of the graph or on modeling its feature information, and largely neglect fused modeling of the several kinds of information in graph structure data. As a result they cannot obtain a better graph feature representation, which limits the accuracy of bioinformatics classification.
Disclosure of Invention
The technical problem to be solved by the application is to provide a bioinformatics classification model based on graph structure data characteristics. The model is a graph neural network based on feature fusion that simultaneously captures the local topological structure information, the global topological structure information and the long-distance node dependency information of a graph, fuses these kinds of graph feature information, and thereby generates a better feature representation of the whole graph.
In a first aspect, the application provides a bioinformatics classification model based on graph structure data characteristics, comprising a plurality of feature extraction layers stacked stage by stage, a plurality of readout layers and a fully connected layer; each feature extraction layer comprises a graph convolution layer and a graph pooling layer, the graph convolution layer is connected through the graph pooling layer to a corresponding readout layer, and the readout layers are connected to the fully connected layer;
the graph pooling layer comprises a three-channel pooling module and a feature fusion module, wherein the three-channel pooling module comprises a graph convolution pooling channel, a differential pooling channel and a Transformer pooling channel, which are respectively used for learning the local topological structure information, the global topological structure information and the inter-node feature dependency information of the graph structure data, and the feature fusion module fuses the local topological structure information, the global topological structure information and the inter-node feature dependency information to obtain a pooled graph;
the pooled graph obtained by a preceding-stage feature extraction layer is input to the graph convolution layer of the next-stage feature extraction layer; the graph feature representation extracted by the corresponding readout layer from the pooled graph of each preceding-stage feature extraction layer forms a residual connection with the graph feature representation extracted by the corresponding readout layer from the last-stage feature extraction layer, and the fully connected layer then outputs the prediction result of the bioinformatics classification.
One or more technical solutions provided in the embodiments of the application have at least the following technical effects or advantages: a bioinformatics classification model based on graph structure data characteristics is provided, comprising a plurality of feature extraction layers stacked stage by stage, a plurality of readout layers and a fully connected layer; each feature extraction layer comprises a graph convolution layer and a graph pooling layer, and each graph pooling layer consists of a three-channel pooling module and a feature fusion module; the three-channel pooling module comprises a graph convolution pooling channel, a differential pooling channel and a Transformer pooling channel, which respectively learn the local topological structure information, the global topological structure information and the inter-node feature dependency information of the graph structure data. The constructed model therefore performs better on graph classification tasks and classifies bioinformatics data represented as graph structure data more accurately.
The foregoing is only an overview of the technical solution of the application. So that the technical means of the application can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features and advantages of the application are more readily apparent, specific embodiments are set out below.
Drawings
The application is further described below through embodiments, with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of the structure of a bioinformatics classification model according to the present application;
FIG. 2 is a flow chart of the processing principle of the pooling layer of the present application.
Detailed Description
The embodiment of the application provides a bioinformatics classification model based on graph structure data characteristics. The model is a graph neural network based on feature fusion that simultaneously captures the local topological structure information, the global topological structure information and the long-distance node dependency information of a graph, fuses these kinds of graph feature information, and thereby generates a better feature representation of the whole graph.
The overall idea of the technical solution in the embodiment of the application is as follows: a bioinformatics classification model based on graph structure data characteristics is provided, comprising a plurality of feature extraction layers stacked stage by stage, a plurality of readout layers and a fully connected layer; each feature extraction layer comprises a graph convolution layer and a graph pooling layer, and each graph pooling layer consists of a three-channel pooling module and a feature fusion module; the three-channel pooling module comprises a graph convolution pooling channel, a differential pooling channel and a Transformer pooling channel, which respectively learn the local topological structure information, the global topological structure information and the inter-node feature dependency information of the graph structure data. The constructed model therefore performs better on graph classification tasks; by modeling the several kinds of information in the graph structure data jointly, it improves the accuracy of bioinformatics classification.
Regarding the graph structure data: in fields such as drug discovery, virus analysis, protein analysis and molecular compound analysis, the graph structure data corresponds to a molecular structure made up of atoms and the chemical bonds between them, where the atoms are the nodes and the chemical bonds are the edges. The feature information of the graph structure data therefore includes the feature information of the nodes and the dependency information of features between nodes.
As shown in fig. 1, the present embodiment provides a bioinformatics classification model based on graph structure data characteristics, comprising a plurality of feature extraction layers stacked stage by stage, a plurality of readout layers and a fully connected layer; each feature extraction layer comprises a graph convolution layer and a graph pooling layer, the graph convolution layer is connected through the graph pooling layer to a corresponding readout layer, and the readout layers are connected to the fully connected layer;
the graph pooling layer comprises a three-channel pooling module and a feature fusion module, wherein the three-channel pooling module comprises a graph convolution pooling channel, a differential pooling channel and a Transformer pooling channel, which are respectively used for learning the local topological structure information, the global topological structure information and the inter-node feature dependency information of the graph structure data, and the feature fusion module fuses the local topological structure information, the global topological structure information and the inter-node feature dependency information to obtain a pooled graph;
the pooled graph obtained by a preceding-stage feature extraction layer is input to the graph convolution layer of the next-stage feature extraction layer; the graph feature representation extracted by the corresponding readout layer from the pooled graph of each preceding-stage feature extraction layer forms a residual connection with the graph feature representation extracted by the corresponding readout layer from the last-stage feature extraction layer, and the fully connected layer then outputs the prediction result of the bioinformatics classification.
The components of the bioinformatics classification model based on graph structure data characteristics are described in detail below.
The feature extraction layer extracts the feature information and the topological structure information of the graph structure data and performs feature fusion. The bioinformatics classification model has a plurality of feature extraction layers stacked stage by stage, each comprising a graph convolution layer and a graph pooling layer. The graph pooling layer in a preceding-stage feature extraction layer is connected to the graph convolution layer in the next-stage feature extraction layer.
Graph convolution layer: aggregates the feature information of a node itself and of its surrounding neighbor nodes. Each node is considered to be influenced by all of its surrounding neighbor nodes and by itself. The graph convolutional neural network aggregates the feature information of the node and its surrounding neighbor nodes, and the propagation formula is as follows:
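The propagation formula itself is not reproduced in this text. As a hedged illustration only, the widely used graph convolution rule of Kipf and Welling fits the description above, since each node aggregates its own features and those of its neighbors. That the patent uses exactly this form, and every symbol below, are assumptions:

```latex
H^{(l+1)} = \sigma\!\left(\tilde{D}^{-\frac{1}{2}}\,\tilde{A}\,\tilde{D}^{-\frac{1}{2}}\,H^{(l)}\,W^{(l)}\right),
\qquad \tilde{A} = A + I,
\qquad \tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}
```

Here A would be the adjacency matrix, I the identity that adds a self-loop to every node, H^{(l)} the node feature matrix at layer l, W^{(l)} a learnable weight matrix and σ a nonlinear activation.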
Graph pooling layer: captures graph feature information, which is key to the graph classification task. The pooling layer can effectively capture the topological structure, deep node features and other information of the graph structure data. As shown in fig. 2, the pooling layer of the application is an end-to-end three-channel pooling graph neural network model consisting mainly of a three-channel pooling module and a feature fusion module; the three-channel pooling module comprises a graph convolution pooling channel, a differential pooling channel and a Transformer pooling channel, and the feature fusion module consists of a cross-channel convolution module and an aggregation module.
Three-channel pooling module:
The first channel, the Transformer pooling channel, uses a Top-K pooling model to capture the dependency information of features between nodes, that is, long-distance dependency information between nodes. The Top-K score is computed on the graph after it has been transformed by a Transformer module, and the calculation formula is as follows, wherein:
the feature matrix used in the calculation is obtained by passing the node features X through a Transformer module;
the learnable parameter matrix, a real-valued matrix whose size is set by the node feature dimension, learns how much each feature dimension of a node contributes to the overall characteristics of the node;
the score vector holds the scores of all nodes of the graph structure data, one score per node, and is a real-valued matrix whose size is set by the number of original nodes;
the nodes are sorted by these scores, which are computed from the long-distance dependency information between nodes, and the highest-scoring nodes are kept as retained nodes; the retained nodes are regarded as the important nodes of the graph structure data, and the remaining nodes are discarded;
after the remaining nodes are discarded, their feature information is aggregated onto the retained nodes in a certain proportion according to a further formula, in which one term denotes the discarded nodes and another is the aggregation matrix for the feature information of the discarded nodes; this feature information is aggregated onto the feature information of the retained nodes along the edges of the graph structure data, and the aggregation matrix is a real-valued matrix whose dimensions are the number of retained nodes and the number of original nodes;
the result is the node feature matrix generated by the Transformer pooling channel.
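The scoring and selection step of this channel can be sketched as follows. This is a minimal illustration, assuming the projection-style score of the standard Top-K pooling model (node features projected onto a learnable vector and divided by its norm); the function name `topk_pool`, the variable names and the toy values are hypothetical, and the Transformer module is represented only by its assumed output `x_t`.

```python
import math

def topk_pool(features, p, k):
    """Score each node by projecting its features onto p, then keep the k best."""
    norm = math.sqrt(sum(w * w for w in p))
    # One scalar score per node: (x . p) / ||p||
    scores = [sum(x * w for x, w in zip(row, p)) / norm for row in features]
    # Node indices in descending score order; ties keep the lower index first.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    kept = sorted(order[:k])
    dropped = sorted(order[k:])
    return scores, kept, dropped

# Hypothetical Transformer-output features: 4 nodes, 2 feature dimensions.
x_t = [[1.0, 0.0], [0.0, 2.0], [3.0, 4.0], [-1.0, 0.5]]
p = [3.0, 4.0]  # hypothetical learnable projection vector, ||p|| = 5
scores, kept, dropped = topk_pool(x_t, p, k=2)
print(kept, dropped)
```

The sketch stops at selecting retained and discarded nodes; how the discarded nodes' features are folded back into the retained ones depends on the aggregation formula, which is not available here.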
The second channel, the differential pooling channel, captures global topological structure information and generates a coarsened sub-graph. The application designs a graph clustering algorithm that uses a graph convolutional neural network to learn a soft assignment matrix (soft distribution matrix) from which the coarsened graph is generated. The assignment matrix is generated by the following formula:
wherein the superscript symbol denotes the transpose.
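The formula is not reproduced in this text. For orientation only, the well-known DiffPool formulation has the same ingredients (a graph neural network that outputs a soft assignment matrix, and a transpose in the coarsening step). Whether the patent's formula matches it term by term is an assumption, as are all symbols below:

```latex
S = \operatorname{softmax}\bigl(\mathrm{GNN}_{\text{pool}}(A, X)\bigr) \in \mathbb{R}^{n \times m},
\qquad X' = S^{\top} X \in \mathbb{R}^{m \times d},
\qquad A' = S^{\top} A\, S \in \mathbb{R}^{m \times m}
```

Here n would be the number of original nodes, m the number of clusters in the coarsened graph, d the node feature dimension, and the softmax is taken row-wise so that each node's assignments over the clusters sum to one.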
The third channel, the graph convolution pooling channel, captures local topological structure information in the graph. It is a node-voting graph pooling method based on a graph convolutional neural network: the graph convolutional neural network captures the local topological structure information among the nodes of the graph structure data, and the node scores are calculated as follows, wherein:
the learnable parameter, a real-valued matrix whose size is set by the node feature dimension, learns how much each feature of a node of the graph structure data contributes to the overall feature of the node;
the score vector holds the scores of all nodes of the graph structure data in the graph convolution pooling channel, one score per node, and is a real-valued matrix whose size is set by the number of original nodes;
the nodes are sorted by these scores, which are computed from the local topological structure information, and the highest-scoring nodes are kept as retained nodes; the retained nodes are regarded as the important nodes of the graph structure data, and the remaining nodes are discarded;
after the nodes are discarded, their feature information is aggregated onto the retained nodes in a certain proportion according to a further formula, in which one term denotes the discarded nodes and another is the aggregation matrix for the feature information of the discarded nodes, a real-valued matrix whose dimensions, as in the first channel, are the number of retained nodes and the number of original nodes;
the result is the node feature matrix generated by the graph convolution pooling channel.
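The score formula for this channel is not reproduced in this text. One common way to obtain node scores from a graph convolution, used for example by self-attention graph pooling, is shown below purely as an assumption about the general shape of such a score; the symbols are illustrative and not taken from the patent:

```latex
y = \sigma\!\left(\tilde{D}^{-\frac{1}{2}}\,\tilde{A}\,\tilde{D}^{-\frac{1}{2}}\,X\,\theta\right) \in \mathbb{R}^{n \times 1},
\qquad \theta \in \mathbb{R}^{d \times 1},
\qquad \mathrm{idx} = \operatorname{top-}k(y)
```

Because the score of a node is computed from the features of its neighbors as well as its own, it reflects local topology, which is the role the text assigns to this channel.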
Feature fusion module: comprises a cross-channel convolution module and an aggregation module.
The cross-channel convolution module uses a cross-channel convolution method to fuse the inter-node feature dependency information of the Transformer pooling channel with the global topology information of the differential pooling channel, and to fuse the local topology information of the graph convolution pooling channel with the global topology information of the differential pooling channel, which yields two cross-channel aggregated pooled graphs. The cross-channel convolution formula is as follows, wherein:
the output comprises two node feature matrices: the one generated by cross-channel convolution for the retained nodes of the Transformer pooling channel, and the one generated by cross-channel convolution for the retained nodes of the graph convolution pooling channel;
the input comprises two node feature matrices: the one generated by the Transformer pooling channel and the one generated by the graph convolution pooling channel;
a further input is the node feature matrix generated by the differential pooling channel;
the conversion matrix converts between the node set of the differential pooling channel and the node set of the other channel; it is a real-valued matrix whose dimensions are the number of nodes of the graph structure data generated in the Transformer pooling channel or the graph convolution pooling channel and the number of nodes of the graph structure data generated by the differential pooling channel;
the soft assignment (soft distribution) matrix that appears in the formula is the one learned by the graph neural network in the differential pooling channel.
This operation yields two cross-channel aggregated pooled graphs. To aggregate the information of these two graphs, the application designs an aggregation module.
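Since the cross-channel convolution formula is not available here, the following is only one plausible reading of the description, offered as a sketch under stated assumptions: the conversion matrix is taken to be the rows of the soft assignment matrix at the retained nodes, the cluster features are routed through it, added to the channel's own features, multiplied by a weight matrix and passed through ReLU. The function names, the use of ReLU and the toy values are all hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def cross_channel_conv(x_chan, kept, s, x_diff, w):
    """Assumed form: relu((x_chan + C @ x_diff) @ w), with C = rows of s at kept nodes."""
    c = [s[i] for i in kept]            # conversion matrix, n_kept x n_clusters
    routed = matmul(c, x_diff)          # cluster features routed to the kept nodes
    summed = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x_chan, routed)]
    return [[max(0.0, v) for v in row] for row in matmul(summed, w)]

# Hypothetical toy values: 3 original nodes, 2 clusters, feature dimension 2.
s = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]   # soft assignment matrix, rows sum to 1
x_diff = [[2.0, 0.0], [0.0, 4.0]]          # features from the differential pooling channel
kept = [0, 2]                              # nodes retained by the other channel
x_chan = [[1.0, 1.0], [1.0, -6.0]]         # that channel's pooled node features
w = [[1.0, 0.0], [0.0, 1.0]]               # identity weights, for readability
z = cross_channel_conv(x_chan, kept, s, x_diff, w)
print(z)
```

The shapes agree with the text: the conversion matrix has one row per node retained in the Transformer or graph convolution pooling channel and one column per node produced by the differential pooling channel.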
The aggregation module works from two index sets: the indices of the retained nodes in the Transformer pooling channel and the indices of the retained nodes in the graph convolution pooling channel. For a node that exists in both the Transformer pooling channel and the graph convolution pooling channel, the mean of its node features in the two channels is taken as the feature of the new node; for a node that exists in only one of the two channels, its feature in that channel is taken as the feature of the new node. The new nodes are the nodes of the graph structure data after processing by the aggregation module. The specific formula is as follows:
A sub-graph consisting of the most representative nodes of the original graph structure data is then extracted by index, and its adjacency matrix is expressed as follows:
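The rule stated above (average the features of nodes retained by both channels, copy the features of nodes retained by only one) can be sketched directly. Taking the sub-graph adjacency as the rows and columns of the original adjacency matrix at the merged indices is an assumption, as are the function names and the toy graph.

```python
def merge_channels(idx_a, feat_a, idx_b, feat_b):
    """Union the two retained-node sets; average the features of shared nodes."""
    a = dict(zip(idx_a, feat_a))
    b = dict(zip(idx_b, feat_b))
    merged_idx = sorted(set(a) | set(b))
    merged = []
    for i in merged_idx:
        if i in a and i in b:
            merged.append([(u + v) / 2 for u, v in zip(a[i], b[i])])
        else:
            merged.append(list(a[i] if i in a else b[i]))
    return merged_idx, merged

def induced_adjacency(adj, idx):
    """Adjacency of the sub-graph on idx: keep only rows and columns in idx."""
    return [[adj[i][j] for j in idx] for i in idx]

# Hypothetical 4-node path graph 0-1-2-3 and toy outputs of the two channels.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
idx, feats = merge_channels([0, 1], [[1.0, 1.0], [2.0, 0.0]],
                            [1, 3], [[4.0, 2.0], [5.0, 5.0]])
sub_adj = induced_adjacency(adj, idx)
print(idx, feats, sub_adj)
```

In this toy case node 1 is retained by both channels and receives the averaged feature, while nodes 0 and 3 keep the feature from the single channel that retained them.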
Readout layer: extracts a graph feature representation from each pooled graph using a readout function. The readout function is:
wherein the result is the feature representation of the pooled graph, and the remaining symbols denote the node feature dimension and the number of nodes of the pooled graph.
The graph pooling layer of each feature extraction layer is connected to a readout layer. The graph feature representation extracted by the corresponding readout layer from the pooled graph of each preceding-stage feature extraction layer forms a residual connection with the graph feature representation extracted from the last-stage feature extraction layer through its readout layer, which alleviates over-smoothing and over-fitting of the model.
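The readout function is not reproduced in this text. A minimal sketch, assuming the common choice of concatenating the mean and the maximum of the node features, and assuming that the residual connection sums the per-layer readout vectors, could look like this; both choices and all names are assumptions rather than the patent's stated formula.

```python
def readout(features):
    """Concatenate the per-dimension mean and max over all nodes of a pooled graph."""
    n = len(features)
    columns = list(zip(*features))
    mean = [sum(col) / n for col in columns]
    peak = [max(col) for col in columns]
    return mean + peak  # vector of length 2 * d

def residual_sum(readouts):
    """Add the per-layer readout vectors element-wise."""
    return [sum(vals) for vals in zip(*readouts)]

# Hypothetical pooled graphs from two stacked feature extraction layers.
layer1 = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
layer2 = [[2.0, 2.0], [4.0, 6.0]]
r1, r2 = readout(layer1), readout(layer2)
graph_vec = residual_sum([r1, r2])
print(r1, r2, graph_vec)
```

A readout of this kind yields a fixed-length vector whatever the number of nodes, which is what allows the representations of pooled graphs of different sizes to be combined.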
The fully connected layer comprises a fully connected layer and an activation function. A multi-layer perceptron serves as the classifier and classifies the input graph feature representation, according to the following formula:
wherein the output is the bioinformatics category predicted for the graph structure data.
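The classifier formula is not reproduced in this text. A typical form for a multi-layer perceptron classifier over per-layer graph representations is given below as an assumption only; the summation over layers and the softmax output are illustrative choices, and the symbols are not the patent's:

```latex
\hat{y} = \operatorname{softmax}\!\Bigl(\mathrm{MLP}\Bigl(\sum_{l=1}^{L} r^{(l)}\Bigr)\Bigr)
```

Here r^{(l)} would be the readout vector of the l-th feature extraction layer, L the number of such layers, and \hat{y} a probability distribution over the bioinformatics categories.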
Finally, the bioinformatics classification model based on graph structure data characteristics produces the final prediction result through the fully connected layer.
The following illustrates an implementation of the bioinformatics classification model based on graph structure data characteristics, which includes the following steps:
S1, graph data preprocessing: the public protein dataset DD and the biomedical dataset NCI1 are selected for model validation. Each dataset is divided into a training set, a validation set and a test set in the ratio 7:1.5:1.5, and standardized uniformly.
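The 7:1.5:1.5 split in S1 can be sketched as below. The shuffling, the fixed seed and the function name `split_dataset` are assumptions added for illustration; the text specifies only the ratio.

```python
import random

def split_dataset(items, ratios=(7, 1.5, 1.5), seed=0):
    """Shuffle items and cut them into train/validation/test parts by ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    total = sum(ratios)
    n_train = int(len(shuffled) * ratios[0] / total)
    n_val = int(len(shuffled) * ratios[1] / total)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder, so no item is lost to rounding
    return train, val, test

# Hypothetical stand-in for a graph dataset: 200 graph identifiers.
graphs = list(range(200))
train, val, test = split_dataset(graphs)
print(len(train), len(val), len(test))
```

Giving the remainder to the test set keeps the three parts disjoint and exhaustive even when the dataset size is not divisible by the ratio.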
S2, establishing the graph neural network model: to extract the feature and topological structure information of the graph data and carry out the protein attribute prediction task, the model adopts a hierarchical pooling structure. The graph structure data is sampled layer by layer, the number of nodes is reduced at each layer, the node features of each layer are aggregated, and an overall feature vector representation is finally obtained.
The FIPool model is formed by stacking a plurality of feature extraction layers, each comprising a graph convolution layer and a graph pooling layer, to extract and fuse the features of the graph data. The three-channel graph pooling layer is designed to extract the feature information and topological structure information of the graph structure data, so that the local topological structure information, the global topological structure information and the inter-node feature dependency information can be learned effectively. The pooling layer also contains a feature fusion module, which aggregates the feature information of the different channels by means of convolution between channels.
Every feature extraction layer except the last is connected to the final output through its readout layer, forming residual connections, and the fully connected layer outputs the classification prediction result.
S3, model training and parameter tuning: the input sequence of graph structure data is preprocessed as in S1 and fed into the model constructed in S2, and the classification prediction result is output by the final fully connected layer of the model. Throughout training, the loss function, the optimizer and the hyperparameters are adjusted to find the hyperparameter combination with which the model performs best on the test dataset, and the model is thereby established.
S4, performance analysis and evaluation of the graph neural network model: taking the chosen evaluation metrics as the standard, the established model is compared with several baseline models to verify and evaluate its performance.
The technical solution provided by the embodiments of the application has at least the following technical effects or advantages: a bioinformatics classification model based on graph structure data characteristics is provided, comprising a plurality of feature extraction layers stacked stage by stage, a plurality of readout layers and a fully connected layer; each feature extraction layer comprises a graph convolution layer and a graph pooling layer, and each graph pooling layer consists of a three-channel pooling module and a feature fusion module; the three-channel pooling module comprises a graph convolution pooling channel, a differential pooling channel and a Transformer pooling channel, which respectively learn the local topological structure information, the global topological structure information and the inter-node feature dependency information of the graph structure data. The constructed model therefore performs better on graph classification tasks and classifies bioinformatics data represented as graph structure data more accurately.
While specific embodiments of the application have been described above, it will be appreciated by those skilled in the art that the specific embodiments described are illustrative only and not intended to limit the scope of the application, and that equivalent modifications and variations of the application in light of the spirit of the application will be covered by the claims of the present application.

Claims (7)

1. A bioinformatics classification model based on graph structure data features, characterized in that: the model comprises a plurality of feature extraction layers stacked stage by stage, a plurality of readout layers, and a fully connected layer; each feature extraction layer comprises a graph convolution layer and a graph pooling layer, the graph convolution layer is connected through the graph pooling layer to a corresponding readout layer, and the readout layers are connected to the fully connected layer;
the graph pooling layer comprises a three-channel pooling module and a feature fusion module, wherein the three-channel pooling module comprises a graph convolution pooling channel, a differential pooling channel, and a Transformer pooling channel, which learn, respectively, the local topology information, the global topology information, and the dependency information between node features of the graph structure data, and the feature fusion module fuses the local topology information, the global topology information, and the dependency information between node features to obtain a pooled graph;
the pooled graph obtained by the feature extraction layer of one stage is input into the graph convolution layer of the feature extraction layer of the next stage; the graph feature representations that the corresponding readout layers extract from the pooled graphs of every earlier stage form a residual connection with the graph feature representation that the corresponding readout layer extracts from the last-stage feature extraction layer, and the fully connected layer outputs the prediction result of the bioinformatics classification.
2. The bioinformatics classification model based on graph structure data features of claim 1, wherein: the Transformer pooling channel captures the dependency information between node features based on a TOP-K pooling model, the scores of the TOP-K pooling model are computed from the graph after it is transformed by the Transformer, and the calculation formulas are as follows:
$X_T = \mathrm{Transformer}(X)$;
$S_T = X_T W_1$;
$id_1 = \mathrm{TOP}(S_T, k)$;
wherein:
$X_T$ is the feature matrix obtained by passing the node features $X$ through the Transformer module, and is used as the feature for the score calculation;
$W_1 \in R^{d \times 1}$ is a learnable parameter matrix used to learn the influence of each feature dimension of a node on the node's overall feature, where $R$ denotes the real numbers and $d$ the feature dimension of the nodes;
$S_T \in R^{n \times 1}$ holds the score of every node of the graph structure data, where $R$ denotes the real numbers and $n$ the number of original nodes;
$\mathrm{TOP}(S_T, k)$ sorts the nodes by the scores calculated from the long-range dependency information of the nodes and takes the ids of the $k$ highest-scoring nodes as the retained node ids $id_1$; the retained nodes are regarded as the important nodes in the graph structure data, and the remaining nodes are discarded;
after the remaining nodes are discarded, their feature information is aggregated onto the retained nodes in a certain proportion; the specific formula is as follows:
wherein:
is the id of the discarded nodes;
is the matrix that aggregates the feature information of the discarded nodes onto the retained nodes along the edges of the graph structure data, where $R$ denotes the real numbers, $k$ the number of retained nodes, and $n$ the number of original nodes;
is the node feature matrix generated by the Transformer pooling channel.
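The scoring and selection steps of claim 2 can be illustrated with a minimal sketch in plain Python. It assumes the Transformer output $X_T$ is already available as a list of rows; the function name `topk_pool` and the toy values are hypothetical and do not come from the patent. The aggregation of discarded-node features is not shown, because its formula is not reproduced in this text.

```python
def topk_pool(x_t, w1, k):
    """Score every node as S_T = X_T W_1 and keep the k highest-scoring ids.

    x_t: n rows of d features (assumed to be the Transformer output X_T).
    w1:  d weights, standing in for the learnable matrix W_1 of shape d x 1.
    """
    # One scalar score per node: the dot product of its features with W_1.
    scores = [sum(f * w for f, w in zip(row, w1)) for row in x_t]
    # Sort node ids by descending score (stable, so ties keep id order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = sorted(order[:k])      # plays the role of id_1
    dropped = sorted(order[k:])   # ids of the discarded nodes
    return scores, kept, dropped


# Toy graph with n = 4 nodes and d = 2 features (illustrative values only).
x_t = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [0.5, 0.5]]
w1 = [1.0, -1.0]
scores, kept, dropped = topk_pool(x_t, w1, k=2)
print(scores, kept, dropped)
```

In a trained model `w1` would be learned jointly with the rest of the network; here it is fixed only so that the selection is easy to follow by hand.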
3. The bioinformatics classification model based on graph structure data features of claim 1, wherein: the differential pooling channel uses a graph convolutional neural network to learn a soft assignment matrix, which is used to generate a coarsened graph $G_C$; the assignment matrix $C$ is generated by the following formula:
$C = \mathrm{GCN}(A, X)$;
wherein: $A$ denotes the adjacency matrix and $X$ the node feature matrix;
after the assignment matrix $C$ is obtained, the feature matrix $X_C$ and the adjacency matrix $A_C$ of the coarsened graph $G_C$ are generated by the following formulas:
$X_C = C^T X$;
$A_C = C^T A C$;
wherein: $T$ denotes the matrix transpose.
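The coarsening formulas of claim 3 can be checked by hand on a small graph. The sketch below is plain Python and assumes the assignment matrix $C$ is already given; in the claim it is produced by a GCN, which is not modelled here. A hard 0/1 assignment is used purely so the arithmetic is easy to verify, and the helper names `matmul`, `transpose`, and `coarsen` are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def coarsen(a, x, c):
    """Compute X_C = C^T X and A_C = C^T A C for an n x c assignment matrix C."""
    ct = transpose(c)
    x_c = matmul(ct, x)
    a_c = matmul(matmul(ct, a), c)
    return x_c, a_c


# Path graph 0-1-2-3; nodes {0, 1} go to cluster 0 and {2, 3} to cluster 1.
a = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
x = [[1.0], [2.0], [3.0], [4.0]]
c = [[1, 0], [1, 0], [0, 1], [0, 1]]
x_c, a_c = coarsen(a, x, c)
print(x_c)  # cluster features are sums of member features
print(a_c)  # diagonal counts intra-cluster edges twice, off-diagonal counts cross edges
```

With a soft assignment each row of `c` would hold fractional cluster memberships instead of a single 1, and the same two products apply unchanged.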
4. The bioinformatics classification model based on graph structure data features of claim 1, wherein: the graph convolution pooling channel is a node-voting graph pooling method based on a graph convolutional neural network, which uses the graph convolutional neural network to capture local topology information between nodes in the graph structure data; the node scores are calculated as follows:
$S_G = \mathrm{GCN}(A, X, W_3)$;
$id_3 = \mathrm{TOP}(S_G, k)$;
wherein:
$W_3 \in R^{d \times 1}$ is a learnable parameter used to learn the influence of each feature of a node of the graph structure data on the node's overall feature, where $R$ denotes the real numbers and $d$ the feature dimension of the nodes;
$S_G \in R^{n \times 1}$ holds the score of every node of the graph structure data in the graph convolution pooling channel, where $R$ denotes the real numbers and $n$ the number of original nodes;
$\mathrm{TOP}(S_G, k)$ sorts the nodes by the scores calculated from the long-range dependency information of the nodes and takes the ids of the $k$ highest-scoring nodes as the retained node ids $id_3$; the retained nodes are regarded as the important nodes in the graph structure data, and the remaining nodes are discarded;
after the nodes are discarded, their feature information is aggregated onto the retained nodes in a certain proportion; the specific formula is as follows:
wherein:
is the id of the discarded nodes;
is the aggregation matrix of the feature information of the discarded nodes, where $R$ denotes the real numbers, $n$ the number of original nodes, and $k$ the number of retained nodes;
is the node feature matrix generated by the graph convolution pooling channel.
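The claim does not spell out the internal form of $\mathrm{GCN}(A, X, W_3)$. As an illustration only, the sketch below assumes the widely used symmetric normalisation with self-loops, $\hat{D}^{-1/2}(A + I)\hat{D}^{-1/2} X W_3$, with no activation; that choice, the function name `gcn_scores`, and the toy graph are assumptions rather than details taken from the patent.

```python
import math


def gcn_scores(a, x, w3):
    """Return one score per node from a single assumed GCN propagation step.

    Assumption: scores = D^-1/2 (A + I) D^-1/2 X W_3, where D is the degree
    matrix of A + I. The patent only states S_G = GCN(A, X, W_3).
    """
    n = len(a)
    # Add self-loops so each node also votes for itself.
    a_hat = [[a[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Project every node's d features to a scalar with W_3 (d x 1).
    xw = [sum(f * w for f, w in zip(row, w3)) for row in x]
    # Propagate the scalars over the symmetrically normalised adjacency.
    return [sum(a_hat[i][j] * xw[j] / math.sqrt(deg[i] * deg[j])
                for j in range(n)) for i in range(n)]


# Path graph 0-1-2 with identical features: the centre node scores highest.
a = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [[1.0], [1.0], [1.0]]
s_g = gcn_scores(a, x, w3=[1.0])
k = 1
id_3 = sorted(sorted(range(len(s_g)), key=lambda i: s_g[i], reverse=True)[:k])
print(s_g, id_3)
```

Because the score of a node mixes in its neighbours' projected features, nodes in denser neighbourhoods tend to rank higher under this assumed propagation, which is how the example selects the centre node.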
5. The bioinformatics classification model based on graph structure data features of claim 1, wherein: the feature fusion module comprises a cross-channel convolution module and an aggregation module;
the cross-channel convolution module uses a cross-channel convolution method to fuse the dependency information between node features from the Transformer pooling channel with the global topology information from the differential pooling channel, and to fuse the local topology information from the graph convolution pooling channel with the global topology information from the differential pooling channel, thereby obtaining two cross-channel aggregated pooled graphs; the cross-channel convolution method has the following formula:
$X = \sigma(X_F + A_{cross} X_C)$;
wherein:
$X$ comprises $X_1$ and $X_2$: $X_1$ is the node feature matrix generated by the cross-channel convolution for the nodes retained in the Transformer pooling channel, and $X_2$ is the node feature matrix generated by the cross-channel convolution for the nodes retained in the graph convolution pooling channel;
$\sigma$ denotes an activation function;
$X_F$ comprises $X_{F1}$ and $X_{F3}$: $X_{F1}$ is the node feature matrix generated by the Transformer pooling channel, and $X_{F3}$ is the node feature matrix generated by the graph convolution pooling channel;
$X_C$ is the node feature matrix generated by the differential pooling channel;
$A_{cross} \in R^{k \times c}$ is the transformation matrix from $X_C$ to $X_F$, where $R$ denotes the real numbers, $k$ the number of nodes of the graph structure data generated in the Transformer pooling channel or the graph convolution pooling channel, and $c$ the number of nodes of the graph structure data generated in the differential pooling channel; $A_{cross}$ is obtained by the following formula:
$A_{cross}[i] = C[i], \; i \in id_1 \text{ or } id_3$;
wherein $C$ is the soft assignment matrix learned by the graph neural network in the differential pooling channel;
the aggregation module denotes the indexes of the nodes retained in the Transformer pooling channel as $id_1$ and the indexes of the nodes retained in the graph convolution pooling channel as $id_2$; for a node present in both the Transformer pooling channel and the graph convolution pooling channel, the mean of its features in the two channels is taken as the feature of the new node, and for a node present in only one of the two channels, its feature in that channel is taken as the feature of the new node; the specific formula is as follows:
$id = id_1 \cup id_2$;
a sub-graph consisting of the most representative nodes of the original graph structure data is extracted by index, and its adjacency matrix is expressed as follows:
wherein:
denotes the adjacency matrix of the sub-graph, extracted by index, that consists of the most representative nodes of the original graph structure data;
$K = |id|$ denotes the number of nodes to be retained;
$n$ is the total number of nodes in the original graph structure data;
the pooled graph is then generated using the following two formulas:
wherein $X_P \in R^{K \times d}$ is the node feature matrix of the aggregated graph structure data, and $A_P \in \{0,1\}^{K \times K}$ is the adjacency matrix of the aggregated graph structure data.
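The two fusion steps of claim 5 can be sketched in plain Python. The sketch assumes ReLU for the activation $\sigma$, which the claim leaves unspecified, and it covers only the cross-channel convolution and the merging of node features; the adjacency formulas of the pooled graph are not reproduced in this text and are therefore omitted. The function names and toy values are hypothetical.

```python
def cross_channel(x_f, ids, c, x_c):
    """Compute X = relu(X_F + A_cross X_C), where A_cross[i] = C[i] for kept ids.

    x_f: features of the kept nodes (k x d), ids: their original node ids,
    c:   soft assignment matrix (n x c), x_c: coarsened features (c x d).
    """
    out = []
    for row, i in zip(x_f, ids):
        # Row i of C mixes the coarsened cluster features back onto node i.
        mixed = [sum(c[i][m] * x_c[m][j] for m in range(len(x_c)))
                 for j in range(len(row))]
        out.append([max(0.0, f + g) for f, g in zip(row, mixed)])
    return out


def merge_channels(ids_a, x_a, ids_b, x_b):
    """Union the kept ids; average features of nodes kept by both channels."""
    feats_a = dict(zip(ids_a, x_a))
    feats_b = dict(zip(ids_b, x_b))
    merged = {}
    for i in sorted(set(ids_a) | set(ids_b)):
        if i in feats_a and i in feats_b:
            merged[i] = [(p + q) / 2 for p, q in zip(feats_a[i], feats_b[i])]
        else:
            merged[i] = feats_a.get(i, feats_b.get(i))
    return merged


c = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]   # 4 nodes, 2 clusters
x_c = [[3.0], [7.0]]                                    # coarsened features
x_1 = cross_channel([[1.0], [-10.0]], [0, 2], c, x_c)   # Transformer channel
x_2 = cross_channel([[1.0], [4.0]], [2, 3], c, x_c)     # graph convolution channel
x_p = merge_channels([0, 2], x_1, [2, 3], x_2)
print(x_1, x_2, x_p)
```

Node 2 is kept by both channels, so its merged feature is the mean of its two cross-channel features, while nodes 0 and 3 each carry over the feature from the single channel that kept them.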
6. The bioinformatics classification model based on graph structure data features of claim 1, wherein: the readout layer extracts a graph feature representation of each pooled graph using a readout function, the readout function being:
wherein $Read \in R^{d}$ is the feature representation of the pooled graph, $d$ denotes the feature dimension of the nodes, and $n$ denotes the number of nodes of the pooled graph.
7. The bioinformatics classification model based on graph structure data features of claim 1, wherein: the fully connected layer uses a multi-layer perceptron as a classifier to classify the input graph feature representation, with the following formula:
$P = \mathrm{SOFTMAX}(\mathrm{Linear}(\mathrm{RELU}(\mathrm{Linear}(Read))))$;
where $P$ is the predicted bioinformatics class of the graph structure data.
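The classifier formula of claim 7 is a two-layer perceptron followed by a softmax. A minimal sketch in plain Python follows; the weights are fixed illustrative values rather than trained parameters, and the function names are hypothetical.

```python
import math


def linear(v, weight, bias):
    """Affine map: one output per weight row, plus the matching bias."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weight, bias)]


def relu(v):
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in v]


def softmax(v):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def classify(read, w1, b1, w2, b2):
    """P = softmax(Linear(ReLU(Linear(Read))))."""
    return softmax(linear(relu(linear(read, w1, b1)), w2, b2))


# Toy readout vector with d = 2 and two output classes (illustrative values).
read = [1.0, -1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
zero_bias = [0.0, 0.0]
p = classify(read, identity, zero_bias, identity, zero_bias)
print(p)  # class probabilities that sum to 1
```

The predicted class is the index of the largest entry of `p`; in this toy case the ReLU zeroes the negative hidden unit, so class 0 receives the larger probability.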
CN202310659097.6A 2023-06-06 2023-06-06 Bioinformatics classification model based on graph structure data characteristics Active CN116416478B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310659097.6A CN116416478B (en) 2023-06-06 2023-06-06 Bioinformatics classification model based on graph structure data characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310659097.6A CN116416478B (en) 2023-06-06 2023-06-06 Bioinformatics classification model based on graph structure data characteristics

Publications (2)

Publication Number Publication Date
CN116416478A CN116416478A (en) 2023-07-11
CN116416478B true CN116416478B (en) 2023-09-26

Family

ID=87059631

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310659097.6A Active CN116416478B (en) 2023-06-06 2023-06-06 Bioinformatics classification model based on graph structure data characteristics

Country Status (1)

Country Link
CN (1) CN116416478B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116821452B (en) * 2023-08-28 2023-11-14 南京邮电大学 Graph node classification model training method and graph node classification method
CN117688425B (en) * 2023-12-07 2024-07-16 重庆大学 Multi-task graph classification model construction method and system for Non-IID graph data

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110211685A (en) * 2019-06-10 2019-09-06 珠海上工医信科技有限公司 Sugar network screening network structure model based on complete attention mechanism
CN110993037A (en) * 2019-10-28 2020-04-10 浙江工业大学 Protein activity prediction device based on multi-view classification model
CN114693971A (en) * 2022-03-29 2022-07-01 深圳市大数据研究院 Classification prediction model generation method, classification prediction method, system and platform
CN115618927A (en) * 2022-11-17 2023-01-17 中国人民解放军陆军防化学院 Gas type identification method based on time sequence-graph fusion neural network
CN116127353A (en) * 2022-12-28 2023-05-16 马上消费金融股份有限公司 Classification method, classification model training method, equipment and medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114332523A (en) * 2020-09-30 2022-04-12 富士通株式会社 Apparatus and method for classification using classification model and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant