CN115631847A - Early lung cancer diagnosis system based on multiple mathematical characteristics, storage medium and equipment - Google Patents
- Publication number
- CN115631847A CN202211280689.9A
- Authority
- CN
- China
- Prior art keywords
- layer
- matrix
- lung cancer
- classifier
- gene
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Engineering & Computer Science (AREA)
- Medical Informatics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Physics & Mathematics (AREA)
- Public Health (AREA)
- Primary Health Care (AREA)
- Epidemiology (AREA)
- Pathology (AREA)
- Multimedia (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Image Analysis (AREA)
Abstract
An early lung cancer diagnosis system, storage medium and equipment based on multi-omics features, belonging to the technical field of cancer diagnosis. The invention addresses the low accuracy of early lung cancer screening that relies on clinical images alone. The system comprises a neural network prediction unit, an omics feature processing unit and a classifier prediction unit. The neural network prediction unit predicts on imaging data converted into matrix form, using a convolutional neural network (CNN) and a deep neural network (DNN) respectively. For each omics feature, the omics feature processing unit uses a graph convolutional network to obtain a weighted feature matrix of the global gene relationship matrix, and thereby the multi-omics features. The classifier prediction unit predicts on the multi-omics data using multiple classifiers respectively. The invention is suitable for early lung cancer diagnosis.
Description
Technical Field
The invention belongs to the technical field of cancer diagnosis, and particularly relates to an early lung cancer diagnosis system, a storage medium and equipment.
Background
Early-stage lung cancer lacks typical symptoms, and a definitive diagnosis is usually made by a pathologist on the basis of clinical symptoms, signs, imaging examinations and histopathological examinations. Early diagnosis is of great significance: lung cancer diagnosed and treated at an early stage of the lesion responds best to treatment, and the cure rate of early-stage lung cancer is far higher than that of middle- and late-stage lung cancer. The National Health Commission recommends an annual chest X-ray or CT examination for people over 40 years of age, which helps detect early lung cancer.
However, early lung cancer screening that uses clinical images alone has limitations and low accuracy, and the diagnosis depends on the judgment of a pathologist. For central lung cancer, sputum cytology can effectively detect exfoliated cancer cells and has the advantages of easy sampling and low cost. In recent years, advances in sequencing technology have greatly reduced the cost of genetic testing and made multi-omics features such as transcriptomics, genomics and proteomics available for early lung cancer detection. This creates a need for machine learning models that can process multi-omics data, and clinical practice requires an auxiliary early lung cancer diagnosis system that is efficient, accurate, comprehensive and highly biologically interpretable.
Disclosure of Invention
The invention aims to solve the problem of low accuracy in early lung cancer screening that uses clinical images alone.
An early lung cancer diagnosis system based on multi-omics features, comprising: a neural network prediction unit, an omics feature processing unit, a classifier prediction unit, an ensemble learning automatic weight balancing unit and an early lung cancer diagnosis machine learning model unit;
a neural network prediction unit: predicts on the imaging data converted into matrix form, using a convolutional neural network (CNN) and a deep neural network (DNN) respectively;
omics feature processing unit: for each omics feature, a weighted feature matrix of the global gene relationship matrix is obtained using a graph convolutional network, thereby obtaining the multi-omics features;
the process of obtaining the weighted feature matrix of the global gene relationship matrix using the graph convolutional network, and thereby the multi-omics features, comprises the following steps:
the gene relationships in the gene regulatory network, the protein relationship network and the gene sets of the biological signaling pathway network are represented by the matrices $M_g$, $M_p$ and $M_s$ respectively; the union of $M_g$, $M_p$ and $M_s$ gives the global gene relationship matrix $M$; the corresponding adjacency matrix $A$ is obtained from the global gene relationship matrix $M$; element $A_{ij} = 1$ indicates that gene i regulates gene j, and $A_{ij} = 0$ indicates that gene i does not regulate gene j;
the weighted feature matrix is obtained by multi-layer graph convolution, where "multi-layer" refers to the computation depth of graph convolution, i.e. the number of iterations; information propagation between the layers of the multi-layer graph convolutional network is expressed as:
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\right) \quad (1)$$
where $\tilde{A} = A + I$ and $I$ denotes the identity matrix; $\tilde{D}$ is the degree matrix of $\tilde{A}$, a diagonal matrix whose diagonal elements are computed as $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$; $\sigma(\cdot)$ is the activation function; $W$ is the gene weight matrix, initialized with all gene weights equal; $H^{(l)}$ is the feature of layer l, and for the input layer $H^{(l)}$ is the omics feature $F_{om}$ to be analyzed, om denoting the omics;
for each omics feature, the adjacency matrix $A$ of the global gene relationship matrix $M$ and the initial weight $W$ are input into the graph convolutional network as original features, and a weighted feature matrix is obtained through 3 layers of iteration, where om denotes the different omics;
if there are multiple omics features, they are linearly combined as in formula (5), and the combined features are then input into the classifier unit;
where each term of formula (5) is the weighted feature matrix corresponding to one omics in the multi-omics data, and || denotes the linear combination of the multi-omics features;
a classifier prediction unit: predicts on the multi-omics data using the SVM-rbf, SVM-poly and RF classifiers respectively;
an ensemble learning automatic weight balancing unit: assigns weights to CNN, DNN, SVM-rbf, SVM-poly and RF;
an early lung cancer diagnosis machine learning model unit: votes on the early lung cancer diagnosis result using the weighted CNN, DNN, SVM-rbf, SVM-poly and RF, and thereby determines the final early lung cancer diagnosis result.
Further, the system also comprises a digital image processing unit; the digital image processing unit converts the imaging data into matrix-form imaging data.
Further, the convolutional neural network CNN comprises: an input layer, a first convolutional layer, a first pooling layer, a first residual module, a second residual module, a third residual module, a fourth residual module, a fifth residual module, a second pooling layer, a first fully connected layer and an output layer; each residual module comprises two convolutional layers.
Further, the convolutional neural network CNN is trained on lung cancer imaging data obtained from a public database.
Further, the deep neural network DNN comprises: an input layer, a first hidden layer, a second hidden layer, a first convolutional layer, a second convolutional layer, a third convolutional layer, a first pooling layer, a first fully connected layer, a second fully connected layer, a third fully connected layer and an output layer;
where Hidden denotes a hidden layer.
Further, in the information propagation between the layers of the multi-layer graph convolutional network, the degree-normalization term is replaced by $D_{ij}$ as given by formula (2),
where $D_{in}$ denotes the in-degree matrix of the nodes; the in-degree of a node counts the edges pointing to that node.
Further, the process by which the ensemble learning automatic weight balancing unit assigns weights to CNN, DNN, SVM-rbf, SVM-poly and RF comprises the following steps:
the weights of the joint multi-classifier are derived by Bayesian modeling; the prediction loss of each single classifier's prediction task is computed by cross entropy, and the single-classifier prediction losses are then summed with weights, as in formula (7);
where the summed terms are the prediction losses of the individual classifiers; $w_\omega$ is the weight of the corresponding classifier, $\omega$ is the number of classifiers, and $w_{1:\omega}$ denotes the weights of all classifiers; $pred_\omega$ denotes the prediction result of each classifier and $pred_{1:\omega}$ the prediction results of all classifiers; $2\log(w_1 \cdot \ldots \cdot w_\omega)$ is a penalty term;
during training, formula (7) is solved for the task weights $w_{1:\omega}$ that minimize the total loss given $pred_{1:\omega}$; formula (7) is solved by gradient descent, which automatically generates the task weights $w_{1:\omega}$.
Further, the parameters of the SVM-rbf, SVM-poly and RF classifiers in the classifier prediction unit are determined by grid search.
A computer storage medium storing at least one instruction, the at least one instruction being loaded and executed by a processor to implement the early lung cancer diagnosis system based on multi-omics features.
An early lung cancer diagnosis device based on multi-omics features, comprising a processor and a memory, the memory storing at least one instruction that is loaded and executed by the processor to implement the early lung cancer diagnosis system based on multi-omics features.
Advantageous effects:
The core of the invention is ensemble learning over image data obtained from imaging and radiology together with genome, transcriptome and other omics data, which effectively improves the accuracy of early lung cancer diagnosis and can greatly improve prediction accuracy in early lung cancer screening.
Drawings
FIG. 1 is a schematic view of the overall process of the present invention.
FIG. 2 is a schematic diagram of the multi-omics features in FIG. 1.
Fig. 3 is a schematic structural diagram of a convolutional neural network CNN.
Fig. 4 is a schematic structural diagram of a deep neural network DNN.
Detailed Description
The first embodiment: this embodiment is described with reference to FIG. 1 and FIG. 2.
The early lung cancer diagnosis system based on multi-omics features according to this embodiment comprises:
a digital image processing unit: converts the imaging data into matrix-form imaging data;
for imaging data such as X-ray and CT (computed tomography) images and for radiology data, the image is first converted into matrix form, where each element of the matrix corresponds to the pixel at the same position in the image and the value of the element is the gray value of that pixel;
for a color image, the gray value is obtained by superimposing the three primary color channels.
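The conversion of a color image into a gray-value matrix can be sketched as follows. This is a minimal illustration only: the text does not state how the three channels are weighted when they are superimposed, so an equal-weight average is assumed here, and the function name `to_gray_matrix` and the sample image are hypothetical.

```python
def to_gray_matrix(rgb_image):
    """Convert rows of (r, g, b) pixels into a matrix of gray values."""
    # Assumption: the three primary color channels are combined with
    # equal weights; the source does not specify the weighting.
    return [[(r + g + b) / 3 for (r, g, b) in row] for row in rgb_image]


# A hypothetical 2 x 2 color image, one (r, g, b) tuple per pixel.
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(30, 60, 90), (10, 20, 30)],
]

# Each matrix element holds the gray value of the pixel at that position.
gray = to_gray_matrix(image)
```

The resulting matrix has the same number of rows and columns as the image, which is the form the CNN and DNN take as input.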
A digital image has translation invariance: local information in the image is processed in the same way regardless of where it lies in the image. For example, when a convolutional neural network (CNN) model identifies cancer cells in a tissue section, it discriminates on features such as cell morphology, regardless of where the cancer cells are located.
A neural network prediction unit: predicts on the imaging data converted into matrix form, using a convolutional neural network (CNN) and a deep neural network (DNN) respectively;
the image converted into matrix form is input into the trained convolutional neural network CNN; the CNN structure is shown in FIG. 3 and comprises: an input layer, a first convolutional layer, a first pooling layer, a first residual module, a second residual module, a third residual module, a fourth residual module, a fifth residual module, a second pooling layer, a first fully connected layer and an output layer; each residual module comprises two convolutional layers;
the first convolutional layer is (7 × 7, conv, 64) and the first pooling layer is (0.5); every convolutional layer in the first to third residual modules is (3 × 3, conv, 64); the convolutional layers of the fifth residual module are all (3 × 3, conv, 128); and, so that the matrix sizes before and after the addition are consistent, the convolutional layers of the fourth residual module are (3 × 3, conv, 128, 0.5). The 0.5 in the convolutional layer parameters of the fourth residual module is the inter-layer scaling parameter, 64 = 128 × 0.5.
The layers in FIG. 3 have the following meanings: conv is a convolutional layer, the number before it is the receptive field and the number after it is the batch size; FC is a fully connected layer, which acts as the classifier of the whole convolutional neural network; "+" is a residual operation; "+'" is a residual operation with convolution, i.e. a residual module in which a convolution makes the matrix sizes before and after the addition consistent;
the invention trains the convolutional neural network CNN on lung cancer imaging data obtained from a public database (https://www.cancerimagingarchive.net/collections/).
Inputting the image converted into the matrix form into a trained deep neural network DNN; the DNN structure is shown in fig. 4, and includes an input layer, a first Hidden layer, a second Hidden layer, a first convolutional layer, a second convolutional layer, a third convolutional layer, a first pooling layer, a first fully-connected layer, a second fully-connected layer, a third fully-connected layer, and an output layer;
where Hidden is a hidden layer; the Pool layer reduces the spatial size of the data volume, which reduces the number of parameters in the network and the consumption of computing resources and helps control overfitting; FC is a fully connected layer, which acts as the classifier of the whole neural network, and the prediction result is output through the FC layer.
Omics feature processing unit: for each omics feature, a weighted feature matrix of the global gene relationship matrix is obtained using a graph convolutional network, thereby obtaining the multi-omics features.
Knowledge networks used in cancer research, such as gene-gene regulatory networks, protein-protein interaction networks (PPI) and biomolecular signaling pathways (pathway), are topological graphs.
A topological graph consists of nodes and edges; an edge connecting two nodes represents the relationship between them, and edges may be directed. When all edges are undirected the graph is called an undirected graph, and when the edges have directions it is called a directed graph. A topological graph can also be converted into matrix form, usually a square matrix in which the value at (i, j) is the relationship coefficient from node i to node j. The relationship matrix of an undirected graph is symmetric, whereas the relationship matrix of a directed graph is generally asymmetric.
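The matrix form of a topological graph can be illustrated with a short sketch. The helper names `adjacency` and `is_symmetric` and the three-node example are hypothetical; relationship coefficients are simplified to 0/1 here.

```python
def adjacency(n, edges, directed):
    """Build an n x n 0/1 relationship matrix from a list of (i, j) edges."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = 1          # relationship from node i to node j
        if not directed:
            a[j][i] = 1      # an undirected edge holds in both directions
    return a


def is_symmetric(a):
    """Return True if the square matrix equals its transpose."""
    n = len(a)
    return all(a[i][j] == a[j][i] for i in range(n) for j in range(n))


edges = [(0, 1), (1, 2)]
undirected_matrix = adjacency(3, edges, directed=False)
directed_matrix = adjacency(3, edges, directed=True)
```

With the same edge list, the undirected matrix equals its transpose while the directed one does not, because an edge from node 0 to node 1 sets only entry (0, 1).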
A topological graph has an irregular data structure: the topology around each node differs, and the graph has no translation invariance, so a traditional convolutional neural network cannot be applied to it directly; a graph convolutional network (GCN) is used instead.
A graph convolutional network is in essence a first-order local approximation of spectral convolution, based on information propagation over the topology. In a multi-layer graph convolutional network ("multi-layer" refers to the computation depth of graph convolution, i.e. the number of iterations), each layer processes only the first-order neighborhood information of a node, and information propagation over multi-order neighborhoods is achieved by stacking several layers. The information propagation rule between layers is:
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\right) \quad (1)$$
where $\tilde{A} = A + I$ and $I$ is the identity matrix; $A$ is the adjacency matrix of the gene relationship network, with $A_{ij} = 1$ indicating that gene i regulates gene j and $A_{ij} = 0$ indicating that gene i does not regulate gene j; $W$ is the gene weight matrix, initialized with all gene weights equal; $H^{(l)}$ is the feature of layer l, and for the input layer $H^{(l)}$ is the omics feature $F_{om}$ to be analyzed, om denoting the omics; $\sigma(\cdot)$ is the activation function; $\tilde{D}$ is the degree matrix of $\tilde{A}$, a diagonal matrix whose diagonal elements are computed as $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$.
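One layer of this propagation can be sketched in plain Python. The sketch assumes the standard first-order graph convolution rule, i.e. self-loops added to A, symmetric normalization by the degree matrix, and a ReLU activation; the activation and the function names `matmul` and `gcn_layer` are assumptions for illustration and are not taken from the patent.

```python
import math


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def gcn_layer(adj, h, w):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops: A~ = A + I.
    a_tilde = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    # Diagonal of the degree matrix: row sums of A~ (always >= 1 due to self-loops).
    deg = [sum(row) for row in a_tilde]
    # Symmetric normalization D^-1/2 A~ D^-1/2.
    norm = [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    z = matmul(matmul(norm, h), w)
    # Assumption: ReLU as the activation function.
    return [[max(0.0, v) for v in row] for row in z]


# Hypothetical two-gene network in which each gene regulates the other.
adj = [[0, 1], [1, 0]]
features = [[1.0], [3.0]]   # one omics value per gene
weights = [[1.0]]           # equal initial weights

h = features
for _ in range(3):          # three layers of iteration, as in the text
    h = gcn_layer(adj, h, weights)
```

In this toy case each gene ends up with the average of its own value and its neighbor's, which shows how a layer mixes first-order neighborhood information into each node's feature.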
Because the degree-normalization operation in formula (1) is relatively complicated, the following improvement is made for directed graphs to simplify information propagation: the normalization term in formula (1) is replaced by $D_{ij}$ of formula (2), where $D_{in}$ denotes the in-degree matrix of the nodes; the in-degree of a node counts the edges pointing to that node (the corresponding out-degree counts the edges leaving the node and is not used here):
The specific steps are as follows.
First, the matrix A (adjacency matrix) in formula (1) is obtained by modeling prior knowledge. The gene regulatory network is represented by the topology $N_{g-g}$(Gene, Regulation), in which nodes are genes and edges are gene regulation relationships; it is a directed graph. The protein relationship network is represented by the topology $N_{p-p}$(Protein, Interaction), in which nodes are proteins and edges are protein interaction relationships; it is an undirected graph. The biological signaling pathway is represented by the topology $N_s$(Signature, definition), in which nodes are biological signals and edges are signal interactions; it is a directed graph. The three kinds of prior knowledge are related as follows: each protein in the protein interaction network is produced by transcription and translation of the corresponding gene, and the nodes of a biological signaling pathway include, but are not limited to, gene and protein signals. The prior knowledge is expressed as the matrices $M_g$, $M_p$ and $M_s$, and the global gene relationship matrix $M$ is obtained as the union of $M_g$, $M_p$ and $M_s$.
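Taking the union of the three relationship matrices can be sketched as below. The sketch assumes that all three matrices are 0/1 matrices indexed over the same ordered gene set, which the text does not state explicitly; the function name `union` and the three-gene example matrices are hypothetical.

```python
def union(*matrices):
    """Element-wise union of 0/1 relationship matrices of equal size."""
    n = len(matrices[0])
    return [
        [1 if any(m[i][j] for m in matrices) else 0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical relationships among three genes.
m_g = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]  # gene regulation (directed)
m_p = [[0, 0, 0], [0, 0, 1], [0, 1, 0]]  # protein interaction (undirected)
m_s = [[0, 1, 0], [0, 0, 0], [1, 0, 0]]  # signaling pathway (directed)

# An entry of M is 1 if any of the three sources records the relationship.
m = union(m_g, m_p, m_s)
```

A relationship recorded in more than one source, such as entry (0, 1) here, still appears once in M.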
For each omics feature, the adjacency matrix $A$ of the global gene relationship matrix $M$ and the initial weight $W$ are input into the graph convolutional network as original features, and a weighted feature matrix is obtained through 3 layers of iteration (too many layers degrade GCN performance), where om denotes the different omics; $W^{(3)}$ is the feature weight obtained after 3 layers of GCN iteration.
If there are multiple omics features, they are linearly combined as in formula (5), and the combined features are then input into the classifier unit,
where each term of formula (5) is the weighted feature matrix corresponding to one omics in the multi-omics data, and || denotes the linear combination of the multi-omics features;
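The combination step can be sketched as follows, under the assumption that the || operator denotes sample-wise concatenation of the per-omics feature columns. The text itself only says "linearly combined", so this reading, the function name `concat_features` and the sample values are all assumptions.

```python
def concat_features(*feature_matrices):
    """Concatenate per-omics feature matrices column-wise, sample by sample."""
    n_samples = len(feature_matrices[0])
    if any(len(fm) != n_samples for fm in feature_matrices):
        raise ValueError("all omics must describe the same samples")
    return [[v for fm in feature_matrices for v in fm[i]] for i in range(n_samples)]


# Hypothetical weighted features for two samples.
genomics = [[0.1, 0.2], [0.3, 0.4]]   # two genomic features per sample
transcriptomics = [[1.0], [2.0]]      # one transcriptomic feature per sample

combined = concat_features(genomics, transcriptomics)
```

Each row of the combined matrix describes one sample across all omics and is what the SVM and RF classifiers would receive.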
A classifier prediction unit: predicts on the multi-omics data using the SVM-rbf, SVM-poly and RF classifiers respectively;
the classifiers are taken from the scikit-learn package (https://scikit-learn.org/stable/), and their parameters are tuned by grid search. Grid search (GridSearch) is used to select the optimal hyperparameters of a model. An optimal hyperparameter can also be found by plotting a validation curve, but a validation curve yields only one optimal hyperparameter at a time. When several hyperparameters have many possible combinations, grid search can be used to find the optimal combination.
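The idea of grid search can be shown with a dependency-free sketch. In practice scikit-learn's `GridSearchCV` would score each combination by cross-validation; here `toy_score` is a hypothetical stand-in for that score, and the grid values for `C` and `gamma` are illustrative only.

```python
from itertools import product


def grid_search(score_fn, param_grid):
    """Evaluate every combination in param_grid and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(C, gamma):
    """Hypothetical stand-in for cross-validated accuracy; peaks at C=10, gamma=0.1."""
    return -((C - 10) ** 2) - 100 * (gamma - 0.1) ** 2


best_params, best_score = grid_search(
    toy_score, {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]}
)
```

Unlike a validation curve, which varies one hyperparameter at a time, the loop above scores all nine combinations of the two hyperparameters jointly.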
An ensemble learning automatic weight balancing unit: distributing weights for CNN, DNN, SVM-rbf, SVM-poly and RF;
Ensemble learning integrates multiple classification tasks. A common approach is to form a weighted sum of the classifier results, coordinating the tasks by assigning them different weights. The simplest choice sets all task weights equal, but equal weighting works only when the tasks do not compete; in practice the tasks are naturally imbalanced for reasons such as differing classifier accuracy and differing ability to use prior knowledge. The invention therefore uses Bayesian modeling to derive the weights of the joint multi-classifier, and a Bayesian task weight learner balances the tasks automatically.
The Bayesian multi-task weight learning method is briefly described as follows; it is embodied in a multi-task loss function.
The multi-task loss is defined as the total loss, which is the sum of the prediction losses of the individual classifiers weighted by $w_\omega$, where $\omega$ is the number of classifiers; the prediction loss of a single classifier is given by formula (6),
where n is the total number of instances used to train the classifier weights.
Formula (6) is a general expression into which the predictions of any one classifier can be substituted.
Formula (7) is the weighted-sum form of formula (6) and contains the predictions of all classifiers used,
where $y_i$ is the true label (the TCGA instances are labeled instances whose label is the lung cancer diagnosis, such as "no lung cancer", "early lung cancer", "intermediate lung cancer" and "late lung cancer"; the label is 1 for lung cancer and 0 for no lung cancer), and the prediction term is the prediction of a single task on the features (the prediction is a number in the interval [0, 1]; the closer it is to 1, the more likely lung cancer is present, and the closer it is to 0, the more likely it is absent). Although CNN and DNN do not operate on the multi-omics features, for convenience of presentation their predictions are written with the same notation;
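The cross-entropy prediction loss of a single classifier can be sketched as follows. The sketch assumes the usual binary cross entropy averaged over the n instances, with labels 1 (lung cancer) and 0 (no lung cancer) and predictions in [0, 1]; the exact form of formula (6) is not reproduced in the text, so the averaging and the clipping constant `eps` are assumptions.

```python
import math


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        # Clip to avoid log(0) for predictions of exactly 0 or 1.
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


labels = [1, 0]                                  # lung cancer, no lung cancer
uncertain = cross_entropy(labels, [0.5, 0.5])    # classifier that cannot decide
confident = cross_entropy(labels, [0.9, 0.1])    # classifier close to the labels
```

A classifier whose predictions lie closer to the true labels obtains a smaller loss, which is the quantity the weight learning step later balances across classifiers.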
It should also be noted that, to keep the weighting coefficients of the multi-task loss nonzero and positive and to simplify computation, a weighting form with the square of the weight in the denominator is adopted.
It should be noted that the labels "early lung cancer", "intermediate lung cancer" and "late lung cancer" are present during training, and in practical application the invention likewise outputs the results "early lung cancer", "intermediate lung cancer" and "late lung cancer". That is, the invention can in fact also predict intermediate-stage and late-stage lung cancer. In clinical practice, intermediate- and late-stage lung cancer are relatively easy to identify because their features in the image are very obvious, whereas early lung cancer is hard to identify and is recognized with low accuracy. The invention therefore also includes the labels "intermediate lung cancer" and "late lung cancer" in training. Using the invention greatly improves accuracy when predicting early lung cancer.
The purpose of training is to find, given $pred_{1:\omega}$, the task weights $w_{1:\omega}$ that minimize formula (7), in which $2\log(w_1 \cdot \ldots \cdot w_\omega)$ is a penalty term; formula (7) is solved by gradient descent, which automatically generates the task weights $w_{1:\omega}$.
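Learning the task weights by gradient descent can be sketched as below. The sketch assumes that the total loss has the form of each classifier's loss divided by its squared weight, plus the penalty 2 log(w_1 * ... * w_ω); this reads the "square as denominator" remark and the stated penalty term literally and is not a reproduction of formula (7). Optimizing s_k = log w_k instead of w_k is a design choice made here to keep every weight positive; the function name and the loss values are hypothetical.

```python
import math


def learn_weights(losses, lr=0.1, steps=500):
    """Minimize sum_k(loss_k / w_k**2 + 2 * log(w_k)) by gradient descent."""
    s = [0.0] * len(losses)  # s_k = log(w_k), so every w_k starts at 1
    for _ in range(steps):
        for k, loss_k in enumerate(losses):
            # Derivative of loss_k * exp(-2 * s_k) + 2 * s_k with respect to s_k.
            grad = -2.0 * loss_k * math.exp(-2.0 * s[k]) + 2.0
            s[k] -= lr * grad
    return [math.exp(v) for v in s]


# Hypothetical cross-entropy losses of three classifiers.
losses = [0.25, 0.64, 0.09]
task_weights = learn_weights(losses)
```

Under this assumed form, setting the gradient to zero gives w_k squared equal to loss_k, so a classifier with a smaller loss gets a smaller w_k and therefore a larger coefficient 1 / w_k**2 in the weighted sum, while the logarithmic penalty stops the weights from growing without bound.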
The ensemble result is given by formula (8),
where Vote[·] is the voting operation: the prediction results of the classifiers are weighted and then voted on; if more of the weighted models predict lung cancer, the output is lung cancer, otherwise the output is no lung cancer.
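A weighted vote over the five models can be sketched as follows. Formula (8) is not reproduced in the text, so the details are assumptions: each model's prediction in [0, 1] is turned into a vote with a 0.5 threshold, and each vote counts with a nonnegative vote weight in which a larger value means more influence. The function name, the predictions and the vote weights are hypothetical.

```python
def weighted_vote(predictions, vote_weights, threshold=0.5):
    """Return 1 (lung cancer) if the weighted votes for it outweigh those against."""
    votes_for = sum(w for p, w in zip(predictions, vote_weights) if p >= threshold)
    votes_against = sum(w for p, w in zip(predictions, vote_weights) if p < threshold)
    return 1 if votes_for > votes_against else 0


# Hypothetical outputs in the order CNN, DNN, SVM-rbf, SVM-poly, RF.
predictions = [0.8, 0.6, 0.4, 0.3, 0.7]
vote_weights = [2.0, 1.0, 1.0, 1.0, 1.5]

diagnosis = weighted_vote(predictions, vote_weights)
```

Here three models vote for lung cancer with a combined weight of 4.5 against 2.0, so the ensemble outputs lung cancer; with equal weights the same vote would be a simple majority.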
Early lung cancer diagnosis machine learning model unit: votes on the early lung cancer diagnosis result using the weighted CNN, DNN, SVM-rbf, SVM-poly and RF, and thereby determines the final early lung cancer diagnosis result.
The second embodiment:
this embodiment is a computer storage medium storing at least one instruction, which is loaded and executed by a processor to implement the early lung cancer diagnosis system based on multi-omics features.
It should be understood that any method described herein may accordingly be provided as a computer program product, software or computerized method, which may include a non-transitory machine-readable medium storing instructions that can be used to program a computer system or other electronic device. Storage media may include, but are not limited to: magnetic storage media; optical storage media; magneto-optical storage media; read-only memory (ROM); random access memory (RAM); erasable programmable memory (e.g., EPROM and EEPROM); flash memory; or other types of media suitable for storing electronic instructions.
The third embodiment:
this embodiment is an early lung cancer diagnosis device based on multi-omics features, comprising a processor and a memory. It should be understood that this covers any device described in the invention that comprises a processor and a memory, and the device may also comprise other units and modules that perform display, interaction, processing, control and other functions through signals or instructions;
the memory stores at least one instruction, which is loaded and executed by the processor to implement the early lung cancer diagnosis system based on multi-omics features.
The above examples merely describe the computational model and computational flow of the invention in detail and are not intended to limit its embodiments. Those skilled in the art can make other variations and modifications on the basis of the above description; the description is not intended to be exhaustive or to limit the invention to the precise form disclosed, and all such variations and modifications are contemplated as falling within the scope of the invention.
Claims (10)
1. An early lung cancer diagnosis system based on multi-omics features, comprising: a neural network prediction unit, an omics feature processing unit, a classifier prediction unit, an ensemble learning automatic weight balancing unit and an early lung cancer diagnosis machine learning model unit;
a neural network prediction unit: predicts on the imaging data converted into matrix form, using a convolutional neural network (CNN) and a deep neural network (DNN) respectively;
omics feature processing unit: aiming at each omics feature, a weighted feature matrix of a global gene relation matrix is obtained by utilizing a graph convolution network, and then a plurality of groups of omics features are obtained
Obtaining a weighted feature matrix of the global genetic relationship matrix by using a graph convolution network, and further obtaining multigroup characteristicsComprises the following steps:
using matrix M to separate gene relationships in gene regulation and control network, protein relationship network and gene set in biological signal path network g 、M p 、M s Represent, then M g 、M p 、M s Taking a union set to obtain a global gene relation matrix M; obtaining a corresponding adjacency matrix A according to the global gene relation matrix M; element A in A ij =1 represents Gene i regulatory Gene j, A ij =0 represents gene i without regulatory gene j;
obtaining a weighted feature matrix by utilizing multilayer graph convolution, wherein the multilayer graph convolution refers to the calculation depth of graph convolution, namely iteration times; information propagation between each layer in the multi-layer graph convolution network is represented as follows:
whereinI denotes a unit matrix of the cell,is thatIs a diagonal matrix with diagonal elements passing throughCalculating to obtain; w is a gene weight matrix, and initialization is that all gene weights are equal; h (l) Is a feature of each layer, for input layer H (l) Is the omics feature to be analyzed F om Om represents omics;
aiming at each omics feature, inputting an adjacent matrix A and an initial weight W of a global genetic relationship matrix M into a graph convolution network as original features, and obtaining a weighted feature matrix through 3-layer iterationom represents different omics;
if there are multiple omics features, the omics features are linearly combined as in formula (5), and the integrated feature is then input into the classifier unit;
where each term of the combination is the weighted feature matrix corresponding to one omics type in the multi-omics data, and i indexes the omics features being linearly combined;
a classifier prediction unit: for the multi-omics data, performs prediction with the SVM-rbf, SVM-poly, and RF classifiers, respectively;
an ensemble learning automatic weight balancing unit: assigns weights to the CNN, DNN, SVM-rbf, SVM-poly, and RF;
an early lung cancer diagnosis machine learning model unit: votes on the early lung cancer diagnosis result according to the weighted CNN, DNN, SVM-rbf, SVM-poly, and RF, and thereby determines the final early lung cancer diagnosis result.
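The final voting step of claim 1 can be illustrated with a minimal Python sketch. It assumes that each of the five models outputs a single class label and that the vote is a weighted plurality; the claim does not say whether labels or probabilities are combined, so this is one possible reading. The function name `weighted_vote` and all weight values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict


def weighted_vote(predictions: Dict[str, int], weights: Dict[str, float]) -> int:
    """Return the label whose supporting models carry the largest total weight."""
    tally = defaultdict(float)
    for name, label in predictions.items():
        tally[label] += weights[name]
    # On a tie, the label that was encountered first wins.
    return max(tally, key=tally.get)


# Hypothetical per-model outputs (1 = early lung cancer, 0 = not) and weights.
predictions = {"CNN": 1, "DNN": 0, "SVM-rbf": 1, "SVM-poly": 0, "RF": 0}
weights = {"CNN": 0.35, "DNN": 0.15, "SVM-rbf": 0.25, "SVM-poly": 0.10, "RF": 0.15}

diagnosis = weighted_vote(predictions, weights)
print(diagnosis)
```

With these illustrative numbers, three of the five models vote 0, yet label 1 wins because the CNN and SVM-rbf together hold more weight. This is the effect the weight balancing unit is meant to make possible.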
2. The early lung cancer diagnosis system based on multi-omics features according to claim 1, wherein the system further comprises a digital image processing unit; the digital image processing unit converts the imaging data into imaging data in matrix form.
3. The system according to claim 2, wherein the convolutional neural network CNN comprises: an input layer, a first convolutional layer, a first pooling layer, a first residual module, a second residual module, a third residual module, a fourth residual module, a fifth residual module, a second pooling layer, a first fully connected layer, and an output layer; each residual module includes two convolutional layers.
4. The system of claim 3, wherein the convolutional neural network CNN is trained by using lung cancer imaging data obtained from a public database.
5. The system according to claim 4, wherein the deep neural network DNN comprises: an input layer, a first Hidden layer, a second Hidden layer, a first convolutional layer, a second convolutional layer, a third convolutional layer, a first pooling layer, a first fully connected layer, a second fully connected layer, a third fully connected layer, and an output layer;
where Hidden denotes a hidden layer.
6. The early lung cancer diagnosis system based on multi-omics features according to claim 1, 2, 3, 4 or 5, wherein a term in the information propagation between the layers of the multi-layer graph convolution network is replaced by the following D_ij:
where D_in denotes the in-degree matrix of the nodes, the in-degree of a node being the number of edges pointing to that node.
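The graph-convolution steps described in claims 1 and 6 can be sketched in plain Python. The propagation formula itself is not legible in this text, so the sketch assumes the widely used rule H^(l+1) = σ(D̃^(-1/2) Ã D̃^(-1/2) H^(l) W), with Ã = A + I and D̃ the diagonal degree matrix of Ã. The choice of ReLU for σ, the shape of W (features × features), and every function name below are assumptions made for illustration. How claim 6 uses the in-degree matrix inside D_ij is not recoverable from the text, so only the in-degree matrix itself is computed.

```python
import math
from typing import List

Matrix = List[List[float]]


def union(*mats: Matrix) -> Matrix:
    """Element-wise union (logical OR) of equally sized 0/1 relationship matrices."""
    n = len(mats[0])
    return [[float(any(m[i][j] for m in mats)) for j in range(n)] for i in range(n)]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def sym_normalize(a: Matrix) -> Matrix:
    """Return D~^(-1/2) (A + I) D~^(-1/2), with D~_ii the row sums of A + I."""
    n = len(a)
    a_tilde = [[a[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]  # always >= 1 because of the self-loops
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def in_degree_matrix(a: Matrix) -> Matrix:
    """Diagonal in-degree matrix; A[i][j] = 1 is an edge i -> j, so columns are summed."""
    n = len(a)
    in_deg = [sum(a[i][j] for i in range(n)) for j in range(n)]
    return [[in_deg[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def propagate(a_hat: Matrix, features: Matrix, weight: Matrix, layers: int = 3) -> Matrix:
    """Apply H <- ReLU(a_hat @ H @ W) `layers` times (ReLU is an assumption)."""
    h = features
    for _ in range(layers):
        h = [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), weight)]
    return h


# Hypothetical 3-gene toy networks (not taken from the patent).
M_g = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]  # gene regulatory network
M_p = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]  # protein relationship network
M_s = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]  # signaling pathway gene sets

A = union(M_g, M_p, M_s)      # assumed: binary M doubles as the adjacency matrix
D_in = in_degree_matrix(A)    # in-degree matrix referred to in claim 6
F_om = [[1.0], [2.0], [3.0]]  # one omics feature per gene
W = [[1.0]]                   # equal initial weights
weighted = propagate(sym_normalize(A), F_om, W, layers=3)
```

The three iterations in `propagate` mirror the "3 layers of iteration" in claim 1; reusing one W across all layers is a simplification of this sketch, not something the claims state.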
7. The system of claim 6, wherein the ensemble learning automatic weight balancing unit assigns weights to the CNN, DNN, SVM-rbf, SVM-poly, and RF through the following steps:
deriving the weights of the joint multi-classifier by Bayesian modeling; computing the prediction loss of each single classifier's prediction task by cross entropy; and then forming the weighted sum of the single classifiers' prediction losses:
where the loss term (·) is the prediction loss of each individual classifier; w_ω is the weight of the corresponding individual classifier, ω indexes the ω classifiers, and w_1:ω denotes the weights of all classifiers; pred_ω denotes the prediction result of each classifier, and pred_1:ω denotes the prediction results of all classifiers; 2log(w_1*…*w_ω) is a penalty term.
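The weighted-sum formula of claim 7 is not legible in this text. As a hedged illustration only, the sketch below computes a binary cross-entropy loss per classifier and combines the losses with the penalty term 2log(w_1*…*w_ω) that the claim names. The 1/w² scaling of each loss is an assumption borrowed from uncertainty-weighted multi-task losses, not something the source confirms. The function names `cross_entropy` and `joint_loss` are hypothetical.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean binary cross-entropy of predicted positive-class probabilities."""
    eps = 1e-12  # keeps log() away from 0
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def joint_loss(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of per-classifier losses plus the penalty 2*log(w_1*...*w_n).

    ASSUMPTION: each loss is scaled by 1 / w**2; the source only states that
    the losses are weighted and that the penalty term has this form.
    """
    weighted = sum(loss / (w * w) for loss, w in zip(losses, weights))
    penalty = 2.0 * math.log(math.prod(weights))
    return weighted + penalty


# Hypothetical predictions of two classifiers on three samples.
labels = [1, 0, 1]
losses = [
    cross_entropy([0.9, 0.2, 0.8], labels),
    cross_entropy([0.6, 0.4, 0.5], labels),
]
total = joint_loss(losses, weights=[1.0, 2.0])
print(losses, total)
```

Under this assumed form, a larger w shrinks the contribution of a classifier's loss, while the logarithmic penalty discourages making every w arbitrarily large.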
8. The early lung cancer diagnosis system based on multi-omics features according to claim 7, wherein the parameters of the SVM-rbf, SVM-poly, and RF classifiers in the classifier prediction unit are determined by grid search.
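Grid search, as named in claim 8, evaluates every combination of candidate parameter values and keeps the best-scoring one. The following is a minimal stdlib-only sketch; the parameter names `C` and `gamma`, the candidate values, and the `toy_score` function (a stand-in for cross-validated accuracy of an SVM-rbf) are all hypothetical and not taken from the patent.

```python
import math
from itertools import product
from typing import Callable, Dict, Sequence, Tuple


def grid_search(grid: Dict[str, Sequence], score: Callable[..., float]) -> Tuple[dict, float]:
    """Exhaustively score every parameter combination and return the best one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(**params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_score(C: float, gamma: float) -> float:
    """Hypothetical validation score that peaks near C = 10, gamma = 0.01."""
    return -abs(math.log10(C) - 1.0) - abs(math.log10(gamma) + 2.0)


grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params)
```

In practice, `toy_score` would be replaced by a function that trains the classifier with the given parameters and returns a held-out or cross-validated metric; libraries such as scikit-learn provide this loop as `GridSearchCV`.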
9. A computer storage medium having stored therein at least one instruction that is loaded and executed by a processor to implement the early lung cancer diagnosis system based on multi-omics features according to any one of claims 1 to 8.
10. An early lung cancer diagnosis apparatus based on multi-omics features, the apparatus comprising a processor and a memory, the memory having stored therein at least one instruction, the at least one instruction being loaded and executed by the processor to implement the early lung cancer diagnosis system based on multi-omics features according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211280689.9A CN115631847B (en) | 2022-10-19 | 2022-10-19 | Early lung cancer diagnosis system, storage medium and equipment based on multiple groups of chemical characteristics |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211280689.9A CN115631847B (en) | 2022-10-19 | 2022-10-19 | Early lung cancer diagnosis system, storage medium and equipment based on multiple groups of chemical characteristics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115631847A (en) | 2023-01-20 |
CN115631847B (en) | 2023-07-14 |
Family
ID=84906468
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211280689.9A Active CN115631847B (en) | 2022-10-19 | 2022-10-19 | Early lung cancer diagnosis system, storage medium and equipment based on multiple groups of chemical characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115631847B (en) |
Patent Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103839412A (en) * | 2014-03-27 | 2014-06-04 | 北京建筑大学 | Combined estimation method for road junction dynamic steering proportion based on Bayes weighting |
CA2974199A1 (en) * | 2015-01-20 | 2016-07-28 | Nantomics, Llc | Systems and methods for response prediction to chemotherapy in high grade bladder cancer |
WO2020113673A1 (en) * | 2018-12-07 | 2020-06-11 | 深圳先进技术研究院 | Cancer subtype classification method employing multiomics integration |
US20200381083A1 (en) * | 2019-05-31 | 2020-12-03 | 410 Ai, Llc | Estimating predisposition for disease based on classification of artificial image objects created from omics data |
US20220222931A1 (en) * | 2019-06-06 | 2022-07-14 | NEC Laboratories Europe GmbH | Diversity-aware weighted majority vote classifier for imbalanced datasets |
CN111028939A (en) * | 2019-11-15 | 2020-04-17 | 华南理工大学 | Multigroup intelligent diagnosis system based on deep learning |
WO2021226778A1 (en) * | 2020-05-11 | 2021-11-18 | 浙江大学 | Epileptic electroencephalogram recognition system based on hierarchical graph convolutional neural network, terminal, and storage medium |
CA3131843A1 (en) * | 2020-09-25 | 2022-03-25 | Royal Bank Of Canada | System and method for structure learning for graph neural networks |
CN112201346A (en) * | 2020-10-12 | 2021-01-08 | 哈尔滨工业大学(深圳) | Cancer survival prediction method, apparatus, computing device and computer-readable storage medium |
AU2020103613A4 (en) * | 2020-11-23 | 2021-02-04 | Agricultural Information and Rural Economic Research Institute of Sichuan Academy of Agricultural Sciences | Cnn and transfer learning based disease intelligent identification method and system |
CN112925984A (en) * | 2021-04-02 | 2021-06-08 | 吉林大学 | GCN recommendation-based sample density aggregation method |
CN114154557A (en) * | 2021-11-08 | 2022-03-08 | 中央财经大学 | Cancer tissue classification method, apparatus, electronic device, and storage medium |
CN114418174A (en) * | 2021-12-13 | 2022-04-29 | 国网陕西省电力公司电力科学研究院 | Electric vehicle charging load prediction method |
CN114530222A (en) * | 2022-01-13 | 2022-05-24 | 华南理工大学 | Cancer patient classification system based on multiomics and image data fusion |
CN114927162A (en) * | 2022-05-19 | 2022-08-19 | 大连理工大学 | Multi-set correlation phenotype prediction method based on hypergraph representation and Dirichlet distribution |
CN115171779A (en) * | 2022-07-13 | 2022-10-11 | 浙江大学 | Cancer driver gene prediction device based on graph attention network and multigroup chemical fusion |
Non-Patent Citations (3)
Title |
---|
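仝宗和; 袁立宁; 王洋: "Theory and Applications of Graph Convolutional Neural Networks" (图卷积神经网络理论与应用), 信息技术与信息化 (Information Technology and Informatization), no. 02, pages 193-198 * |
李昊天; 盛益强: "Fusion Prediction Method Based on Single Time-Series Feature Graph Convolutional Networks" (单时序特征图卷积网络融合预测方法), 计算机与现代化 (Computer and Modernization), no. 09, pages 36-40 * |
杨博雄 * |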
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115938592A (en) * | 2023-03-09 | 2023-04-07 | 成都信息工程大学 | Cancer prognosis prediction method based on local enhancement graph convolution network |
CN115938592B (en) * | 2023-03-09 | 2023-05-05 | 成都信息工程大学 | Cancer prognosis prediction method based on local enhancement graph convolution network |
Also Published As
Publication number | Publication date |
---|---|
CN115631847B (en) | 2023-07-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
He et al. | Ensemble transfer CNNs driven by multi-channel signals for fault diagnosis of rotating machinery cross working conditions | |
CN111739075B (en) | Deep network lung texture recognition method combining multi-scale attention | |
US20220367053A1 (en) | Multimodal fusion for diagnosis, prognosis, and therapeutic response prediction | |
Li et al. | Stacked-autoencoder-based model for COVID-19 diagnosis on CT images | |
CN113990495A (en) | Disease diagnosis prediction system based on graph neural network | |
CN110660478A (en) | Cancer image prediction and discrimination method and system based on transfer learning | |
Widiyanto et al. | Implementation of convolutional neural network method for classification of diseases in tomato leaves | |
CN111274903A (en) | Cervical cell image classification method based on graph convolution neural network | |
CN114927162A (en) | Multi-set correlation phenotype prediction method based on hypergraph representation and Dirichlet distribution | |
CN112132818A (en) | Image processing method for constructing three stages based on graph convolution neural network | |
Savino et al. | Automated classification of civil structure defects based on convolutional neural network | |
CN115631847A (en) | Early lung cancer diagnosis system based on multiple mathematical characteristics, storage medium and equipment | |
CN115564114A (en) | Short-term prediction method and system for airspace carbon emission based on graph neural network | |
CN115879607A (en) | Electric energy meter state prediction method, system, equipment and storage medium | |
CN117153268A (en) | Cell category determining method and system | |
Liu et al. | Research on cassava disease classification using the multi-scale fusion model based on EfficientNet and attention mechanism | |
CN112966770B (en) | Fault prediction method and device based on integrated hybrid model and related equipment | |
Tyagi et al. | LCSCNet: A multi-level approach for lung cancer stage classification using 3D dense convolutional neural networks with concurrent squeeze-and-excitation module | |
CN114445356A (en) | Multi-resolution-based full-field pathological section image tumor rapid positioning method | |
CN112733724B (en) | Relativity relationship verification method and device based on discrimination sample meta-digger | |
CN117408167A (en) | Debris flow disaster vulnerability prediction method based on deep neural network | |
Patra et al. | Deep learning methods for scientific and industrial research | |
CN108846327B (en) | Intelligent system and method for distinguishing pigmented nevus and melanoma | |
CN115907079A (en) | Airspace traffic flow prediction method based on attention space-time diagram convolution network | |
CN113476065B (en) | Multiclass pneumonia diagnostic system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||