CN115631847A - Early lung cancer diagnosis system based on multi-omics features, storage medium and equipment - Google Patents

Early lung cancer diagnosis system based on multi-omics features, storage medium and equipment

Info

Publication number
CN115631847A
Authority
CN
China
Prior art keywords
layer
matrix
lung cancer
classifier
gene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211280689.9A
Other languages
Chinese (zh)
Other versions
CN115631847B (en
Inventor
赵天意
许伊宁
刘博�
王亚东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology
Priority to CN202211280689.9A
Publication of CN115631847A
Application granted
Publication of CN115631847B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Physics & Mathematics (AREA)
  • Public Health (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

An early lung cancer diagnosis system based on multi-omics features, together with a storage medium and equipment, belonging to the technical field of cancer diagnosis. The invention aims to solve the problem that early lung cancer screening using clinical images alone has low accuracy. The system comprises a neural network prediction unit, an omics feature processing unit and a classifier prediction unit. The neural network prediction unit makes predictions on imaging data converted into matrix form, using a convolutional neural network (CNN) and a deep neural network (DNN) respectively. The omics feature processing unit, for each omics feature, uses a graph convolutional network to obtain a weighted feature matrix of a global gene relationship matrix and thereby obtains multi-omics features. The classifier prediction unit makes predictions on the multi-omics data using several classifiers respectively. The invention is suitable for early lung cancer diagnosis.

Description

Early lung cancer diagnosis system based on multi-omics features, storage medium and equipment
Technical Field
The invention belongs to the technical field of cancer diagnosis, and in particular relates to an early lung cancer diagnosis system, a storage medium and equipment.
Background
Early-stage lung cancer lacks typical symptoms, and a definitive diagnosis is usually made by a pathologist on the basis of clinical symptoms, signs, imaging examinations and histopathological examinations. Early diagnosis of lung cancer is of great significance: when lung cancer is diagnosed and treated at an early stage of the lesion, the best therapeutic effect can be obtained, and the cure rate of early-stage lung cancer is far higher than that of intermediate- and late-stage lung cancer. The National Health Commission recommends an annual chest X-ray or CT examination for people over 40 years of age, which helps to detect early lung cancer.
However, early lung cancer screening using clinical images alone has limitations and low accuracy, and the diagnosis depends on the judgment of a pathologist. For central lung cancer, sputum cytology can effectively detect exfoliated cancer cells, with the advantages of easy sampling and low cost. In recent years, with the development of sequencing technology, the cost of genetic testing has fallen sharply, and multi-omics features from transcriptomics, genomics, proteomics and the like have become available for the detection of early lung cancer. This creates a need for machine learning models that can process multi-omics data, and clinical practice requires an auxiliary system for early lung cancer diagnosis that is efficient, accurate, comprehensive and highly biologically interpretable.
Disclosure of Invention
The invention aims to solve the problem that early lung cancer screening using clinical images alone has low accuracy.
An early lung cancer diagnosis system based on multi-omics features, comprising: a neural network prediction unit, an omics feature processing unit, a classifier prediction unit, an ensemble learning automatic weight balancing unit and an early lung cancer diagnosis machine learning model unit;
a neural network prediction unit: makes predictions on the imaging data converted into matrix form, using a convolutional neural network (CNN) and a deep neural network (DNN) respectively;
omics feature processing unit: for each omics feature, obtains a weighted feature matrix of the global gene relationship matrix using a graph convolutional network, and thereby obtains the multi-omics features [formula image BDA0003897845710000011];
obtaining the weighted feature matrix of the global gene relationship matrix using the graph convolutional network, and thereby the multi-omics features [formula image BDA0003897845710000012], comprises the following steps:
the gene relationships in the gene regulatory network, the protein relationship network and the gene sets of the biological signaling pathway network are represented by the matrices $M_g$, $M_p$ and $M_s$ respectively, and the union of $M_g$, $M_p$ and $M_s$ is taken to obtain the global gene relationship matrix M; the corresponding adjacency matrix A is obtained from the global gene relationship matrix M; an element $A_{ij} = 1$ of A indicates that gene i regulates gene j, and $A_{ij} = 0$ indicates that gene i does not regulate gene j;
a weighted feature matrix is obtained using multi-layer graph convolution, where "multi-layer" refers to the computation depth of the graph convolution, i.e. the number of iterations; information propagation between the layers of the multi-layer graph convolutional network is expressed as:
[formula image BDA0003897845710000021]
where [formula image BDA0003897845710000022], I denotes the identity matrix, [formula image BDA0003897845710000023] is the degree matrix of [formula image BDA0003897845710000024] and is a diagonal matrix whose diagonal elements are calculated by [formula image BDA0003897845710000025]; W is the gene weight matrix, initialized with equal weights for all genes; $H^{(l)}$ is the feature of layer l, and for the input layer $H^{(l)}$ is the omics feature $F_{om}$ to be analyzed, where om denotes the omics type;
for each omics feature, the adjacency matrix A of the global gene relationship matrix M and the initial weights W are input into the graph convolutional network as original features, and a weighted feature matrix [formula image BDA0003897845710000026] is obtained through 3 layers of iteration, where om denotes the different omics;
if there are multiple omics features, they are linearly combined as in formula (5), and the combined features are then input into the classifier unit;
[formula image BDA0003897845710000027]
where [formula image BDA0003897845710000028] denotes the weighted feature matrix corresponding to each omics in the multi-omics data, and || denotes the linear combination of the multi-omics features;
in the single-omics case, [formula image BDA0003897845710000029] is the corresponding [formula image BDA00038978457100000210];
a classifier prediction unit: makes predictions on the multi-omics data [formula image BDA00038978457100000211] using the SVM-rbf, SVM-poly and RF classifiers respectively;
an ensemble learning automatic weight balancing unit: assigns weights to the CNN, DNN, SVM-rbf, SVM-poly and RF;
early lung cancer diagnosis machine learning model unit: the weighted CNN, DNN, SVM-rbf, SVM-poly and RF vote on the early lung cancer diagnosis, and the final early lung cancer diagnosis result is determined.
Further, the system further comprises a digital image processing unit; the digital image processing unit converts the imaging data into matrix-form imaging data.
Further, the convolutional neural network CNN comprises: an input layer, a first convolutional layer, a first pooling layer, a first residual module, a second residual module, a third residual module, a fourth residual module, a fifth residual module, a second pooling layer, a first fully connected layer and an output layer; each residual module comprises two convolutional layers;
further, the convolutional neural network CNN is obtained by training on lung cancer imaging data acquired from a public database.
Further, the deep neural network DNN comprises: an input layer, a first hidden layer, a second hidden layer, a first convolutional layer, a second convolutional layer, a third convolutional layer, a first pooling layer, a first fully connected layer, a second fully connected layer, a third fully connected layer and an output layer;
where Hidden denotes a hidden layer.
Further, in the information propagation between the layers of the multi-layer graph convolutional network, [formula image BDA00038978457100000212] is replaced by the following $D_{ij}$:
[formula image BDA0003897845710000031]
where Din denotes the in-degree matrix of the nodes, and the in-degree of a node counts the edges pointing to that node.
Further, the process by which the ensemble learning automatic weight balancing unit assigns weights to the CNN, DNN, SVM-rbf, SVM-poly and RF comprises the following steps:
the weights of the joint multi-classifier are derived using Bayesian modeling; the prediction loss [formula image BDA0003897845710000032] of each single classifier's prediction task is calculated by cross entropy, and the prediction losses [formula image BDA0003897845710000033] of the single classifiers are then summed with weights:
[formula image BDA0003897845710000034]
where [formula image BDA0003897845710000035] is the prediction loss of each single classifier, $w_\omega$ is the weight corresponding to a single classifier, ω indicates that there are ω classifiers, and $w_{1:\omega}$ denotes the weights of all classifiers used; $pred_\omega$ denotes the prediction result of each classifier, and $pred_{1:\omega}$ the prediction results of all classifiers; $2\log(w_1 * \dots * w_\omega)$ is a penalty term;
during training, formula (7) is solved for the task weights $w_{1:\omega}$ that minimize [formula image BDA0003897845710000036] given $pred_{1:\omega}$; formula (7) is solved by gradient descent, which generates the task weights $w_{1:\omega}$ automatically.
Further, the parameters of the SVM-rbf, SVM-poly and RF classifiers in the classifier prediction unit are determined by grid search.
A computer storage medium having stored therein at least one instruction that is loaded and executed by a processor to implement the early lung cancer diagnosis system based on multi-omics features.
An early lung cancer diagnosis device based on multi-omics features, comprising a processor and a memory, wherein at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor to implement the early lung cancer diagnosis system based on multi-omics features.
Advantageous effects:
The core of the invention is ensemble learning over image data obtained by imaging and radiology together with genomic, transcriptomic and other omics data, which effectively improves the accuracy of early lung cancer diagnosis. The method can greatly improve prediction accuracy in early lung cancer screening.
Drawings
FIG. 1 is a schematic view of the overall process of the present invention.
FIG. 2 is a schematic diagram of the multi-omics features in FIG. 1.
Fig. 3 is a schematic structural diagram of a convolutional neural network CNN.
Fig. 4 is a schematic structural diagram of a deep neural network DNN.
Detailed Description
Embodiment 1: this embodiment is described with reference to FIGS. 1 and 2.
The early lung cancer diagnosis system based on multi-omics features according to this embodiment comprises:
a digital image processing unit: converts the imaging data into matrix-form imaging data;
for imaging data such as X-ray and CT (computed tomography) images, radiology data and the like, the image is first converted into matrix form, where each element of the matrix corresponds to the pixel at the corresponding position in the image and the value of the element is the gray value of that pixel;
for a color image, the corresponding gray values are obtained by superposing the three primary color channels.
A digital image has translation invariance: local information in the image is processed in the same way regardless of where it is located in the image. For example, when a convolutional neural network (CNN) model is used to identify cancer cells in a tissue section, the model decides according to features such as cell morphology, regardless of where the cancer cells are located.
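The conversion of an image into a gray-value matrix described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `to_gray_matrix` is hypothetical, and the channel weights are the common ITU-R BT.601 luma coefficients, which are an assumption because the text only says the three primary color channels are superposed.

```python
import numpy as np

def to_gray_matrix(image: np.ndarray) -> np.ndarray:
    """Return a 2-D matrix of gray values for a grayscale or RGB image array."""
    image = np.asarray(image, dtype=np.float64)
    if image.ndim == 2:
        # Already one gray value per pixel.
        return image
    if image.ndim == 3 and image.shape[2] == 3:
        # Assumed channel weights (ITU-R BT.601); the source does not give them.
        weights = np.array([0.299, 0.587, 0.114])
        return image @ weights  # weighted superposition of R, G and B
    raise ValueError("expected an HxW or HxWx3 array")

# Hypothetical 2x2 color image: one white pixel, the rest black.
rgb = np.zeros((2, 2, 3))
rgb[0, 0] = [255, 255, 255]
gray = to_gray_matrix(rgb)
```

Each entry of `gray` is then the gray value of the pixel at the same position, which is the matrix form the neural networks consume.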
A neural network prediction unit: makes predictions on the imaging data converted into matrix form, using a convolutional neural network (CNN) and a deep neural network (DNN) respectively;
the image converted into matrix form is input into the trained convolutional neural network CNN; the CNN structure is shown in FIG. 3 and comprises: an input layer, a first convolutional layer, a first pooling layer, a first residual module, a second residual module, a third residual module, a fourth residual module, a fifth residual module, a second pooling layer, a first fully connected layer and an output layer; each residual module comprises two convolutional layers;
the first convolutional layer is (7 × 7, conv, 64) and the first pooling layer is (0.5); each convolutional layer of the first to third residual modules is (3 × 3, conv, 64); the convolutional layers of the fifth residual module are all (3 × 3, conv, 128), and, to keep the matrix sizes consistent before and after the addition, the convolutional layers of the fourth residual module are (3 × 3, conv, 128, 0.5). The 0.5 in the convolutional layer parameters of the fourth residual module is the inter-layer scaling parameter, 64 = 128 × 0.5.
The layers in FIG. 2 have the following meanings: Conv is a convolutional layer, where the number before it is the receptive field and the number after it is the batch size; FC is a fully connected layer, which acts as the classifier of the whole convolutional neural network; "+" is a residual operation; "+'" is a residual operation with convolution, i.e. a residual module that keeps the matrix sizes consistent before and after the addition;
the invention trains the convolutional neural network CNN on lung cancer imaging data obtained from a public database (https://www.cancerimagingarchive.net/collections/).
The image converted into matrix form is input into the trained deep neural network DNN; the DNN structure is shown in FIG. 4 and comprises an input layer, a first hidden layer, a second hidden layer, a first convolutional layer, a second convolutional layer, a third convolutional layer, a first pooling layer, a first fully connected layer, a second fully connected layer, a third fully connected layer and an output layer;
where Hidden is a hidden layer; the Pool layer reduces the spatial size of the data volume, which reduces the number of parameters in the network and the consumption of computing resources and effectively controls overfitting; FC is a fully connected layer, which acts as the classifier of the whole neural network, and the prediction result is output through the FC layer.
Omics feature processing unit: for each omics feature, obtains a weighted feature matrix of the global gene relationship matrix using a graph convolutional network, and thereby obtains the multi-omics features [formula image BDA0003897845710000059].
Knowledge networks used in cancer research, such as gene-gene regulatory networks, protein-protein interaction (PPI) networks and biomolecular signaling pathways, are topological graph structures.
A topological graph consists of nodes and edges. An edge connecting two nodes represents the relationship between them, and edges may have directions: when all edges are undirected the graph is called an undirected graph, and when the edges have directions it is called a directed graph. A graph can be converted into matrix form, usually a square matrix, in which the value at (i, j) is the relationship coefficient from node i to node j. The relationship matrix of an undirected graph is symmetric, whereas the relationship matrix of a directed graph is usually asymmetric.
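As a small illustration of the matrix form of a graph, the sketch below builds 0/1 relationship matrices for the same hypothetical edge list, once as a directed and once as an undirected graph, and checks their symmetry. The helper name `adjacency` and the three-node example are assumptions made for illustration only.

```python
import numpy as np

def adjacency(n_nodes, edges, directed):
    """Build a 0/1 relationship matrix; entry (i, j) is the i -> j relation."""
    a = np.zeros((n_nodes, n_nodes), dtype=int)
    for i, j in edges:
        a[i, j] = 1
        if not directed:
            a[j, i] = 1  # an undirected edge is stored in both directions
    return a

edges = [(0, 1), (1, 2)]  # hypothetical relations among three nodes
a_directed = adjacency(3, edges, directed=True)
a_undirected = adjacency(3, edges, directed=False)

is_sym_directed = bool((a_directed == a_directed.T).all())
is_sym_undirected = bool((a_undirected == a_undirected.T).all())
```

For this edge list the undirected matrix equals its transpose, while the directed one does not.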
A topological graph has an irregular data structure: the topology around each node differs from node to node, and the graph has no translation invariance, so a traditional convolutional neural network cannot be applied to such a graph.
In essence, a graph convolutional network is a first-order local approximation of spectral convolution, based on information propagation over the topology. In a multi-layer graph convolutional network (where "multi-layer" refers to the computation depth of the graph convolution, i.e. the number of iterations), each layer processes only the first-order neighborhood information of a node, and information propagation over multi-order neighborhoods is achieved by stacking several layers. The information propagation rule between layers is:
[formula image BDA0003897845710000051]
where [formula image BDA0003897845710000052]; A is the adjacency matrix of the gene relationship network, where $A_{ij} = 1$ indicates that gene i regulates gene j and $A_{ij} = 0$ indicates that gene i does not regulate gene j; W is the gene weight matrix, initialized with equal weights for all genes; $H^{(l)}$ is the feature of layer l, and for the input layer $H^{(l)}$ is the omics feature $F_{om}$ to be analyzed, where om denotes the omics type;
[formula image BDA0003897845710000053] is the degree matrix of [formula image BDA00038978457100000510]; it is a diagonal matrix whose diagonal elements are calculated by [formula image BDA0003897845710000055].
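The formula images for the propagation rule did not survive extraction. Assuming the rule is the standard first-order graph convolution that the surrounding description matches (identity matrix I added to A, a diagonal degree matrix, layer features H and weights W), it would read as below. This is a hedged reading, not a transcription: the activation $\sigma$ and the layer index on W are assumptions, since the text names neither.

```latex
H^{(l+1)} = \sigma\!\left( \tilde{D}^{-\frac{1}{2}} \, \tilde{A} \, \tilde{D}^{-\frac{1}{2}} \, H^{(l)} \, W^{(l)} \right),
\qquad
\tilde{A} = A + I,
\qquad
\tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}
```

Under this reading, adding I gives every gene a self-loop, so each layer mixes a gene's own features with those of its first-order neighbors.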
Because the operation on [formula image BDA0003897845710000056] is relatively complicated, the information propagation is simplified for directed graphs as follows: $D_{ij}$ in formula (2) replaces [formula image BDA0003897845710000057] in formula (1). In formula (2), Din denotes the in-degree matrix of the nodes, where the in-degree of a node counts the edges pointing to it (the corresponding out-degree counts the edges leaving the node and is not used here):
[formula image BDA0003897845710000058]
The specific steps are as follows:
First, the matrix A (adjacency matrix) in formula (1) is obtained by modeling prior knowledge. The gene regulatory network is represented by the topology $N_{g-g}$(Gene, Regulation), in which nodes are genes and edges are gene regulation relationships; it is a directed graph. The protein relationship network is represented by the topology $N_{p-p}$(Protein, Interaction), in which nodes are proteins and edges are protein interaction relationships; it is an undirected graph. The biological signaling pathway is represented by the topology $N_s$(Signature, definition), in which nodes are biological signals and edges are signal interactions; it is a directed graph. The three kinds of prior knowledge are related as follows: each protein in the protein interaction network is produced by transcription and translation of the corresponding gene, and the nodes in the biological signaling pathway include, but are not limited to, gene and protein signals. The prior knowledge is expressed as the matrices $M_g$, $M_p$ and $M_s$, and the global gene relationship matrix M is obtained as the union of $M_g$, $M_p$ and $M_s$.
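One way to read the union step is as an element-wise OR over relation matrices that share the same gene index. The sketch below assumes exactly that; the three-gene matrices are hypothetical, and mapping proteins and pathway signals onto a common gene index is taken as already done.

```python
import numpy as np

# Hypothetical 3-gene relation matrices (1 = relation present).
m_g = np.array([[0, 1, 0], [0, 0, 0], [0, 0, 0]])  # gene regulation (directed)
m_p = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0]])  # protein interaction (undirected)
m_s = np.array([[0, 1, 0], [0, 0, 0], [1, 0, 0]])  # signaling pathway (directed)

# Union: a relation is kept if any of the three sources reports it.
m_global = m_g | m_p | m_s

# Adjacency matrix A of the global gene relationship matrix M.
adjacency = (m_global > 0).astype(int)
```

A relation reported by more than one source, such as the 0 -> 1 edge here, appears only once in the result.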
For each omics feature, the adjacency matrix A of the global gene relationship matrix M and the initial weights W are input into the graph convolutional network as original features, and a weighted feature matrix [formula image BDA0003897845710000061] is obtained through 3 layers of iteration (too many layers degrade GCN performance), where om denotes the different omics and $W^{(3)}$ is the feature weights obtained after 3 layers of GCN iteration.
If there are multiple omics features, they are linearly combined as in formula (5), and the combined features are then input into the classifier unit.
[formula image BDA0003897845710000062]
where [formula image BDA0003897845710000063] denotes the weighted feature matrix corresponding to each omics in the multi-omics data, and || denotes the linear combination of the multi-omics features;
in the single-omics case, [formula image BDA0003897845710000064] is the corresponding [formula image BDA0003897845710000065].
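If the || operator is read as concatenation of the per-omics weighted feature matrices along the feature axis (an assumption; the formula image is not recoverable), the combination step could look like this. The matrix names and shapes are hypothetical.

```python
import numpy as np

# Hypothetical weighted feature matrices for 3 samples.
f_transcriptome = np.ones((3, 4))   # 4 transcriptomic features per sample
f_genome = np.zeros((3, 2))         # 2 genomic features per sample

# Combine the omics column-wise so each sample keeps one feature row.
f_multi = np.concatenate([f_transcriptome, f_genome], axis=1)
```

With a single omics type, the combined matrix is simply that omics type's weighted feature matrix.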
A classifier prediction unit: makes predictions on the multi-omics data [formula image BDA0003897845710000066] using the SVM-rbf, SVM-poly and RF classifiers respectively;
the classifiers are taken from the scikit-learn package (https://scikit-learn.org/stable/). The classifier parameters are tuned by grid search (GridSearch), which selects the optimal hyper-parameters of the model. Optimal hyper-parameters can also be found by plotting a validation curve, but a validation curve yields only one optimal hyper-parameter at a time. When several hyper-parameters give many possible combinations, grid search can be used to find the optimal combination.
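A grid search over the three classifiers might be set up as follows with scikit-learn. The data are synthetic stand-ins for the multi-omics feature matrix, and the parameter grids are hypothetical, since the patent does not list the ranges it searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the multi-omics feature matrix and binary labels.
X, y = make_classification(n_samples=120, n_features=10, random_state=0)

# Hypothetical search grids for SVM-rbf, SVM-poly and RF.
searches = {
    "SVM-rbf": (SVC(kernel="rbf"),
                {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}),
    "SVM-poly": (SVC(kernel="poly"),
                 {"C": [0.1, 1], "degree": [2, 3]}),
    "RF": (RandomForestClassifier(random_state=0),
           {"n_estimators": [50, 100], "max_depth": [None, 5]}),
}

best = {}
for name, (estimator, grid) in searches.items():
    search = GridSearchCV(estimator, grid, cv=3)  # 3-fold cross-validation
    search.fit(X, y)
    best[name] = search.best_params_              # best combination per classifier
```

Each entry of `best` holds a full combination of hyper-parameters, which is the advantage over a validation curve that varies one hyper-parameter at a time.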
An ensemble learning automatic weight balancing unit: assigns weights to the CNN, DNN, SVM-rbf, SVM-poly and RF;
ensemble learning integrates multiple classification tasks. A common method is to form a weighted sum of the classifier results and to coordinate the tasks by giving them different weights. The simplest choice is to set all task weights equal; however, equal weighting is effective only when there is no competition among the tasks, and in practice the tasks are inherently unbalanced for reasons such as differing classifier precision and differing ability to exploit prior knowledge. The invention therefore uses Bayesian modeling to derive the weights of the joint multi-classifier, and a Bayesian task weight learner balances the tasks automatically.
The Bayesian multi-task weight learning method is briefly described as follows and is embodied in a multi-task loss function:
the multi-task loss is defined as [formula image BDA0003897845710000067]; the total loss is the sum of the prediction losses [formula image BDA0003897845710000068] of the single classifiers weighted by ($w_\omega$), where ω indicates that there are ω classifiers.
The prediction loss [formula image BDA0003897845710000071] of a single classifier's prediction task is calculated by formula (6):
[formula image BDA0003897845710000072]
where n is the total number of instances used to train the classifier weights;
formula (6) is a general expression into which the predictions of any one classifier can be substituted.
Formula (7) is the weighted sum of formula (6) over the predictions of all classifiers used:
[formula image BDA0003897845710000073]
where $y_i$ is the true label (the TCGA instances are labeled instances whose labels are lung cancer diagnoses such as "no lung cancer", "early lung cancer", "intermediate lung cancer" and "late lung cancer"; the label is 1 for lung cancer and 0 for no lung cancer);
[formula image BDA0003897845710000074] is the prediction of a single task for the feature [formula image BDA0003897845710000075] (the prediction is a number in the interval [0, 1]; the closer to 1, the more likely lung cancer is, and the closer to 0, the more likely its absence). Although the CNN and DNN do not take the feature [formula image BDA0003897845710000079] as input, for convenience of description their predictions are also denoted by [formula image BDA0003897845710000076];
it should also be noted that, to ensure that the weighting coefficients in the multi-task loss [formula image BDA0003897845710000077] are positive and not equal to 0, and to simplify the computation, a weighting form with the square in the denominator is adopted.
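The images for formulas (6) and (7) are not recoverable, but the surrounding text constrains them: a cross-entropy loss over n instances with labels $y_i \in \{0, 1\}$ and predictions in [0, 1], a weighting with the square of the weight in the denominator, and the penalty term $2\log(w_1 * \dots * w_\omega)$. One form consistent with those statements is given below as a hedged reading. The loss symbol $\mathcal{L}$, the index k, and the exact normalization are assumptions.

```latex
% Assumed form of formula (6): cross-entropy loss of classifier k
\mathcal{L}_k = -\frac{1}{n} \sum_{i=1}^{n}
  \left[ y_i \log pred_{k,i} + (1 - y_i) \log\left(1 - pred_{k,i}\right) \right]

% Assumed form of formula (7): weighted sum over all \omega classifiers plus penalty
\mathcal{L}\left(w_{1:\omega}, pred_{1:\omega}\right)
  = \sum_{k=1}^{\omega} \frac{1}{w_k^{2}} \, \mathcal{L}_k
  + 2 \log\left(w_1 * \dots * w_\omega\right)
```

Under this reading the penalty term stops the weights from growing without bound, which would otherwise drive every $1/w_k^2$ factor toward zero.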
It should be noted that the labels "early lung cancer", "intermediate lung cancer" and "late lung cancer" are present during training, and in practical use the invention also outputs the results "early lung cancer", "intermediate lung cancer" and "late lung cancer". That is, the invention can in fact also predict intermediate and late lung cancer. In clinical practice, intermediate and late lung cancer are generally relatively easy to identify (their features in the images are very obvious); only early lung cancer is hard to identify, with low accuracy. The labels "intermediate lung cancer" and "late lung cancer" are therefore also prepared for training, and the accuracy of early lung cancer prediction with the invention can be greatly improved.
The purpose of training is to find the task weights $w_{1:\omega}$ that minimize [formula image BDA00038978457100000710] in formula (7) given $pred_{1:\omega}$, where $2\log(w_1 * \dots * w_\omega)$ is a penalty term; formula (7) is solved by gradient descent, which generates the task weights $w_{1:\omega}$ automatically.
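The gradient descent step can be illustrated on an assumed objective of the form sum(L_k / w_k^2) + 2 log(w_1 * ... * w_ω), which matches the squared-denominator weighting and penalty term described in the text but is not confirmed by the unrecoverable formula image. The per-classifier losses, the learning rate and the iteration count below are hypothetical.

```python
import numpy as np

def objective_grad(w, losses):
    """Gradient of sum(losses / w**2) + 2 * log(prod(w)) with respect to w."""
    return -2.0 * losses / w**3 + 2.0 / w

losses = np.array([0.2, 0.5, 0.7])   # hypothetical cross-entropy losses per classifier
w = np.ones_like(losses)             # start from equal weights
learning_rate = 0.01                 # assumed step size

for _ in range(5000):
    w -= learning_rate * objective_grad(w, losses)

factors = 1.0 / w**2                 # multiplier each classifier's loss receives
```

For this assumed objective the minimum lies at w_k = sqrt(L_k), so a classifier with a smaller loss ends up with a larger factor 1 / w_k^2, which is the automatic balancing effect the text describes.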
The classifier result is given by formula (8):
[formula image BDA0003897845710000078]
where vote[·] is the voting operation: the prediction results of the classifiers are weighted and then voted on; if more models predict lung cancer, the lung cancer result is output, otherwise the no-lung-cancer result is output.
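A weighted vote over the five models could be sketched as below. Formula (8) itself is not recoverable, so the details are assumptions: each prediction is thresholded at 0.5 to give a 0/1 vote, and the vote strength of a classifier is taken as 1 / w^2. The function name `weighted_vote` and the example numbers are hypothetical.

```python
import numpy as np

def weighted_vote(preds, w, threshold=0.5):
    """Return 1 (lung cancer) if weighted votes for class 1 exceed half the total."""
    votes = (np.asarray(preds) >= threshold).astype(float)  # per-classifier 0/1 vote
    strength = 1.0 / np.asarray(w, dtype=float) ** 2        # assumed vote strength
    return int(votes @ strength > 0.5 * strength.sum())

# Hypothetical outputs of CNN, DNN, SVM-rbf, SVM-poly and RF for one patient.
preds = [0.9, 0.4, 0.8, 0.3, 0.7]
w = [0.5, 1.0, 0.6, 1.2, 0.8]
label = weighted_vote(preds, w)
```

Here three of the five models vote for lung cancer and they also carry most of the vote strength, so the combined label is 1.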
Early lung cancer diagnosis machine learning model unit: the weighted CNN, DNN, SVM-rbf, SVM-poly and RF vote on the early lung cancer diagnosis, and the final early lung cancer diagnosis result is determined.
Embodiment 2:
This embodiment is a computer storage medium having at least one instruction stored therein, the at least one instruction being loaded and executed by a processor to implement the early lung cancer diagnosis system based on multi-omics features.
It should be understood that any method described herein may accordingly be provided as a computer program product, software, or computerized method, which may include a non-transitory machine-readable medium having stored thereon instructions that may be used to program a computer system or other electronic device. Storage media may include, but are not limited to, magnetic storage media; optical storage media; magneto-optical storage media; read-only memory (ROM); random access memory (RAM); erasable programmable memory (e.g., EPROM and EEPROM); flash memory; or other types of media suitable for storing electronic instructions.
Embodiment 3:
This embodiment is an early lung cancer diagnosis device based on multi-omics features. The device comprises a processor and a memory; it should be understood that this covers any device described in the invention that comprises a processor and a memory, and the device may further comprise other units and modules that perform display, interaction, processing, control and other functions through signals or instructions;
the memory has stored therein at least one instruction that is loaded and executed by the processor to implement the early lung cancer diagnosis system based on multi-omics features.
The above examples merely describe the computational model and computation flow of the invention in detail and are not intended to limit its embodiments. Those skilled in the art can make other variations and modifications on the basis of the above description; the description is not intended to be exhaustive or to limit the invention to the precise form disclosed, and all such variations and modifications are contemplated as falling within the scope of the invention.

Claims (10)

1. An early lung cancer diagnosis system based on multi-omics features, comprising: a neural network prediction unit, an omics feature processing unit, a classifier prediction unit, an ensemble learning automatic weight balancing unit, and an early lung cancer diagnosis machine learning model unit;
a neural network prediction unit: performing prediction on the imaging data converted into matrix form, using a convolutional neural network (CNN) and a deep neural network (DNN) respectively;
an omics feature processing unit: for each omics feature, obtaining a weighted feature matrix of the global gene relationship matrix by using a graph convolution network, and thereby obtaining the multi-omics features
Figure FDA0003897845700000011
the process of obtaining the weighted feature matrix of the global gene relationship matrix by using the graph convolution network, and thereby obtaining the multi-omics features
Figure FDA00038978457000000113
comprises the following steps:
the gene relationships in the gene regulatory network, in the protein relationship network, and in the gene sets of the biological signaling pathway network are represented by the matrices $M_g$, $M_p$ and $M_s$ respectively; the union of $M_g$, $M_p$ and $M_s$ is then taken to obtain the global gene relationship matrix M; the corresponding adjacency matrix A is obtained from the global gene relationship matrix M; an element $A_{ij} = 1$ of A indicates that gene i regulates gene j, and $A_{ij} = 0$ indicates that gene i does not regulate gene j;
the weighted feature matrix is obtained by multi-layer graph convolution, where "multi-layer" refers to the computation depth of the graph convolution, i.e., the number of iterations; the information propagation between the layers of the multi-layer graph convolution network is expressed as:
$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W\right)$$
where $\tilde{A} = A + I$, I denotes the identity matrix, $\sigma$ denotes the activation function, and $\tilde{D}$ is the degree matrix of $\tilde{A}$, a diagonal matrix whose diagonal elements are calculated as $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$; W is the gene weight matrix, initialized so that all gene weights are equal; $H^{(l)}$ is the feature of each layer, and for the input layer $H^{(l)}$ is the omics feature to be analyzed, $F_{om}$, where om denotes the omics;
for each omics feature, the adjacency matrix A of the global gene relationship matrix M and the initial weight W are input into the graph convolution network as the original features, and the weighted feature matrix is obtained through 3 layers of iteration
Figure FDA0003897845700000017
where om denotes the different omics;
if there are multiple omics features, the omics features are linearly combined as in formula (5), and the integrated features are then input into the classifier prediction unit;
Figure FDA0003897845700000018
where
Figure FDA0003897845700000019
denotes the weighted feature matrix corresponding to each omics in the multi-omics data; i indicates that the multiple omics features are linearly combined;
in the case of a single omics,
Figure FDA00038978457000000110
is the corresponding
Figure FDA00038978457000000111
a classifier prediction unit: for the multi-omics features
Figure FDA00038978457000000112
prediction is performed using the SVM-rbf, SVM-poly and RF classifiers respectively;
an ensemble learning automatic weight balancing unit: distributing weights for CNN, DNN, SVM-rbf, SVM-poly and RF;
an early lung cancer diagnosis machine learning model unit: the weighted CNN, DNN, SVM-rbf, SVM-poly and RF vote on the early lung cancer diagnosis result, and the final early lung cancer diagnosis result is thereby determined.
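The graph convolution steps recited in claim 1 (union of the three relationship matrices, adjacency matrix with self-loops, degree normalization, layer-wise propagation) can be sketched in Python as follows. This is an illustrative sketch only: it assumes the standard symmetric-normalized propagation rule with a ReLU activation, row-sum degrees, a feature matrix with one row per gene, and a d × d weight matrix; the helper names `union_adjacency`, `normalized_adjacency` and `gcn_forward` are hypothetical, and plain Python lists are used to keep the example dependency-free.

```python
import math


def union_adjacency(*relation_matrices):
    """Element-wise union of 0/1 relation matrices (e.g. M_g, M_p, M_s)."""
    n = len(relation_matrices[0])
    return [[1 if any(m[i][j] for m in relation_matrices) else 0
             for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def normalized_adjacency(adjacency):
    """Return D~^(-1/2) (A + I) D~^(-1/2), with D~ built from row sums of A + I."""
    n = len(adjacency)
    a_tilde = [[adjacency[i][j] + (1 if i == j else 0) for j in range(n)]
               for i in range(n)]
    degree = [sum(row) for row in a_tilde]  # always >= 1 thanks to the self-loop
    return [[a_tilde[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]


def gcn_forward(adjacency, features, layer_weights):
    """Apply one propagation step per weight matrix (3 matrices = 3 layers)."""
    a_hat = normalized_adjacency(adjacency)
    hidden = features
    for weight in layer_weights:
        propagated = matmul(matmul(a_hat, hidden), weight)
        hidden = [[max(0.0, v) for v in row] for row in propagated]  # assumed ReLU
    return hidden


# Toy example with 3 genes: gene 0 regulates gene 1, gene 1 regulates gene 2.
M_g = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
M_p = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
M_s = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
A = union_adjacency(M_g, M_p, M_s)

F_om = [[1.0, 0.5], [2.0, 0.1], [3.0, 0.7]]   # hypothetical omics features per gene
W_init = [[0.5, 0.5], [0.5, 0.5]]             # equal initial weights (assumed shape)
weighted_features = gcn_forward(A, F_om, [W_init, W_init, W_init])
```

In practice the weight matrices would be learned rather than fixed, and a numerical library would replace the list-based matrix product; the sketch only shows how the adjacency matrix, the self-loops, and the degree normalization fit together across three layers.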
2. The early lung cancer diagnosis system based on multi-omics features according to claim 1, wherein the system further comprises a digital image processing unit; the digital image processing unit converts the imaging data into imaging data in matrix form.
3. The early lung cancer diagnosis system based on multi-omics features according to claim 2, wherein the convolutional neural network CNN comprises: an input layer, a first convolutional layer, a first pooling layer, a first residual module, a second residual module, a third residual module, a fourth residual module, a fifth residual module, a second pooling layer, a first fully connected layer, and an output layer; each residual module includes two convolutional layers.
4. The system of claim 3, wherein the convolutional neural network CNN is trained by using lung cancer imaging data obtained from a public database.
5. The system according to claim 4, wherein the deep neural network DNN comprises: an input layer, a first Hidden layer, a second Hidden layer, a first convolutional layer, a second convolutional layer, a third convolutional layer, a first pooling layer, a first fully connected layer, a second fully connected layer, a third fully connected layer, and an output layer;
where Hidden denotes a hidden layer.
6. The early lung cancer diagnosis system based on multi-omics features according to claim 1, 2, 3, 4 or 5, wherein, in the information propagation between the layers of the multi-layer graph convolution network,
Figure FDA0003897845700000026
is replaced by the following $D_{ij}$:
Figure FDA0003897845700000021
where $D_{in}$ denotes the in-degree matrix of the nodes, and the in-degree of a node is the number of edges pointing to that node.
7. The system of claim 6, wherein the process by which the ensemble learning automatic weight balancing unit assigns weights to the CNN, DNN, SVM-rbf, SVM-poly and RF comprises the following steps:
the weights of the joint multi-classifier are derived using Bayesian modeling; the prediction loss of the prediction task of each single classifier is calculated by cross entropy
Figure FDA0003897845700000022
and the prediction losses of the prediction tasks of the single classifiers
Figure FDA0003897845700000023
are then weighted and summed:
Figure FDA0003897845700000024
where
Figure FDA0003897845700000025
(·) is defined for each individual classifier; $w_\omega$ is the weight corresponding to each individual classifier; $\omega$ denotes the $\omega$ classifiers; $w_{1:\omega}$ denotes the weights used by all classifiers; $pred_\omega$ denotes the prediction result of each classifier; $pred_{1:\omega}$ denotes the prediction results of all classifiers; $2\log(w_1 \cdot \ldots \cdot w_\omega)$ is the penalty term;
during training, formula (7) is evaluated to find, given $pred_{1:\omega}$, the task weights $w_{1:\omega}$ for which
Figure FDA0003897845700000031
takes its minimum value; formula (7) is solved by the gradient descent method to automatically generate the task weights $w_{1:\omega}$.
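The weight-fitting procedure of claim 7 can be illustrated with a small Python sketch. Formula (7) is only available as an image, so the objective below is an assumed form chosen for illustration: each cross-entropy loss is scaled by 1/w², and the penalty term 2·log(w_1·…·w_ω) named in the claim is added. The functions `binary_cross_entropy`, `joint_loss` and `fit_weights`, the learning rate, the step count, and the example loss values are all hypothetical.

```python
import math


def binary_cross_entropy(labels, probabilities, eps=1e-12):
    """Mean binary cross-entropy of predicted probabilities against 0/1 labels."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def joint_loss(losses, weights):
    """Assumed objective: sum_i L_i / w_i^2 + 2 * log(w_1 * ... * w_n)."""
    weighted = sum(loss / w ** 2 for loss, w in zip(losses, weights))
    penalty = 2.0 * sum(math.log(w) for w in weights)
    return weighted + penalty


def fit_weights(losses, learning_rate=0.01, steps=5000):
    """Minimize joint_loss over the weights by plain gradient descent."""
    weights = [1.0] * len(losses)  # all classifiers start with equal weight
    for _ in range(steps):
        # d/dw_i of (L_i / w_i^2 + 2 log w_i) = -2 L_i / w_i^3 + 2 / w_i
        gradients = [-2.0 * loss / w ** 3 + 2.0 / w
                     for loss, w in zip(losses, weights)]
        weights = [w - learning_rate * g for w, g in zip(weights, gradients)]
    return weights


# Hypothetical per-classifier cross-entropy losses (e.g. CNN, DNN, RF).
example_losses = [0.2, 0.5, 0.9]
fitted = fit_weights(example_losses)
```

Under this assumed form the minimum can also be found in closed form (w_i equals the square root of L_i), which gives a convenient check on the gradient descent result. Without the penalty term the objective could be driven down simply by letting every w_i grow without bound, which is why a term of this kind is needed.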
8. The early lung cancer diagnosis system based on multi-omics features according to claim 7, wherein the parameters of the SVM-rbf, SVM-poly and RF classifiers in the classifier prediction unit are determined by a grid search method.
9. A computer storage medium having stored therein at least one instruction that is loaded and executed by a processor to implement the early lung cancer diagnosis system based on multi-omics features according to any one of claims 1 to 8.
10. An early lung cancer diagnosis device based on multi-omics features, the device comprising a processor and a memory, the memory having stored therein at least one instruction, the at least one instruction being loaded and executed by the processor to implement the early lung cancer diagnosis system based on multi-omics features according to any one of claims 1 to 8.
CN202211280689.9A 2022-10-19 2022-10-19 Early lung cancer diagnosis system, storage medium and equipment based on multiple groups of chemical characteristics Active CN115631847B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211280689.9A CN115631847B (en) 2022-10-19 2022-10-19 Early lung cancer diagnosis system, storage medium and equipment based on multiple groups of chemical characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211280689.9A CN115631847B (en) 2022-10-19 2022-10-19 Early lung cancer diagnosis system, storage medium and equipment based on multiple groups of chemical characteristics

Publications (2)

Publication Number Publication Date
CN115631847A true CN115631847A (en) 2023-01-20
CN115631847B CN115631847B (en) 2023-07-14

Family

ID=84906468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211280689.9A Active CN115631847B (en) 2022-10-19 2022-10-19 Early lung cancer diagnosis system, storage medium and equipment based on multiple groups of chemical characteristics

Country Status (1)

Country Link
CN (1) CN115631847B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115938592A (en) * 2023-03-09 2023-04-07 成都信息工程大学 Cancer prognosis prediction method based on local enhancement graph convolution network

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103839412A (en) * 2014-03-27 2014-06-04 北京建筑大学 Combined estimation method for road junction dynamic steering proportion based on Bayes weighting
CA2974199A1 (en) * 2015-01-20 2016-07-28 Nantomics, Llc Systems and methods for response prediction to chemotherapy in high grade bladder cancer
CN111028939A (en) * 2019-11-15 2020-04-17 华南理工大学 Multigroup intelligent diagnosis system based on deep learning
WO2020113673A1 (en) * 2018-12-07 2020-06-11 深圳先进技术研究院 Cancer subtype classification method employing multiomics integration
US20200381083A1 (en) * 2019-05-31 2020-12-03 410 Ai, Llc Estimating predisposition for disease based on classification of artificial image objects created from omics data
CN112201346A (en) * 2020-10-12 2021-01-08 哈尔滨工业大学(深圳) Cancer survival prediction method, apparatus, computing device and computer-readable storage medium
AU2020103613A4 (en) * 2020-11-23 2021-02-04 Agricultural Information and Rural Economic Research Institute of Sichuan Academy of Agricultural Sciences Cnn and transfer learning based disease intelligent identification method and system
CN112925984A (en) * 2021-04-02 2021-06-08 吉林大学 GCN recommendation-based sample density aggregation method
WO2021226778A1 (en) * 2020-05-11 2021-11-18 浙江大学 Epileptic electroencephalogram recognition system based on hierarchical graph convolutional neural network, terminal, and storage medium
CN114154557A (en) * 2021-11-08 2022-03-08 中央财经大学 Cancer tissue classification method, apparatus, electronic device, and storage medium
CA3131843A1 (en) * 2020-09-25 2022-03-25 Royal Bank Of Canada System and method for structure learning for graph neural networks
CN114418174A (en) * 2021-12-13 2022-04-29 国网陕西省电力公司电力科学研究院 Electric vehicle charging load prediction method
CN114530222A (en) * 2022-01-13 2022-05-24 华南理工大学 Cancer patient classification system based on multiomics and image data fusion
US20220222931A1 (en) * 2019-06-06 2022-07-14 NEC Laboratories Europe GmbH Diversity-aware weighted majority vote classifier for imbalanced datasets
CN114927162A (en) * 2022-05-19 2022-08-19 大连理工大学 Multi-set correlation phenotype prediction method based on hypergraph representation and Dirichlet distribution
CN115171779A (en) * 2022-07-13 2022-10-11 浙江大学 Cancer driver gene prediction device based on graph attention network and multigroup chemical fusion

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103839412A (en) * 2014-03-27 2014-06-04 北京建筑大学 Combined estimation method for road junction dynamic steering proportion based on Bayes weighting
CA2974199A1 (en) * 2015-01-20 2016-07-28 Nantomics, Llc Systems and methods for response prediction to chemotherapy in high grade bladder cancer
WO2020113673A1 (en) * 2018-12-07 2020-06-11 深圳先进技术研究院 Cancer subtype classification method employing multiomics integration
US20200381083A1 (en) * 2019-05-31 2020-12-03 410 Ai, Llc Estimating predisposition for disease based on classification of artificial image objects created from omics data
US20220222931A1 (en) * 2019-06-06 2022-07-14 NEC Laboratories Europe GmbH Diversity-aware weighted majority vote classifier for imbalanced datasets
CN111028939A (en) * 2019-11-15 2020-04-17 华南理工大学 Multigroup intelligent diagnosis system based on deep learning
WO2021226778A1 (en) * 2020-05-11 2021-11-18 浙江大学 Epileptic electroencephalogram recognition system based on hierarchical graph convolutional neural network, terminal, and storage medium
CA3131843A1 (en) * 2020-09-25 2022-03-25 Royal Bank Of Canada System and method for structure learning for graph neural networks
CN112201346A (en) * 2020-10-12 2021-01-08 哈尔滨工业大学(深圳) Cancer survival prediction method, apparatus, computing device and computer-readable storage medium
AU2020103613A4 (en) * 2020-11-23 2021-02-04 Agricultural Information and Rural Economic Research Institute of Sichuan Academy of Agricultural Sciences Cnn and transfer learning based disease intelligent identification method and system
CN112925984A (en) * 2021-04-02 2021-06-08 吉林大学 GCN recommendation-based sample density aggregation method
CN114154557A (en) * 2021-11-08 2022-03-08 中央财经大学 Cancer tissue classification method, apparatus, electronic device, and storage medium
CN114418174A (en) * 2021-12-13 2022-04-29 国网陕西省电力公司电力科学研究院 Electric vehicle charging load prediction method
CN114530222A (en) * 2022-01-13 2022-05-24 华南理工大学 Cancer patient classification system based on multiomics and image data fusion
CN114927162A (en) * 2022-05-19 2022-08-19 大连理工大学 Multi-set correlation phenotype prediction method based on hypergraph representation and Dirichlet distribution
CN115171779A (en) * 2022-07-13 2022-10-11 浙江大学 Cancer driver gene prediction device based on graph attention network and multigroup chemical fusion

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
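仝宗和;袁立宁;王洋;: "图卷积神经网络理论与应用" [Theory and Applications of Graph Convolutional Neural Networks], 信息技术与信息化, no. 02, pages 193 - 198 *
李昊天;盛益强;: "单时序特征图卷积网络融合预测方法" [Fusion Prediction Method Based on Single-Time-Series Feature Graph Convolutional Networks], 计算机与现代化, no. 09, pages 36 - 40 *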
杨博雄 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115938592A (en) * 2023-03-09 2023-04-07 成都信息工程大学 Cancer prognosis prediction method based on local enhancement graph convolution network
CN115938592B (en) * 2023-03-09 2023-05-05 成都信息工程大学 Cancer prognosis prediction method based on local enhancement graph convolution network

Also Published As

Publication number Publication date
CN115631847B (en) 2023-07-14

Similar Documents

Publication Publication Date Title
He et al. Ensemble transfer CNNs driven by multi-channel signals for fault diagnosis of rotating machinery cross working conditions
CN111739075B (en) Deep network lung texture recognition method combining multi-scale attention
US20220367053A1 (en) Multimodal fusion for diagnosis, prognosis, and therapeutic response prediction
Li et al. Stacked-autoencoder-based model for COVID-19 diagnosis on CT images
CN113990495A (en) Disease diagnosis prediction system based on graph neural network
CN110660478A (en) Cancer image prediction and discrimination method and system based on transfer learning
Widiyanto et al. Implementation of convolutional neural network method for classification of diseases in tomato leaves
CN111274903A (en) Cervical cell image classification method based on graph convolution neural network
CN114927162A (en) Multi-set correlation phenotype prediction method based on hypergraph representation and Dirichlet distribution
CN112132818A (en) Image processing method for constructing three stages based on graph convolution neural network
Savino et al. Automated classification of civil structure defects based on convolutional neural network
CN115631847A (en) Early lung cancer diagnosis system based on multiple mathematical characteristics, storage medium and equipment
CN115564114A (en) Short-term prediction method and system for airspace carbon emission based on graph neural network
CN115879607A (en) Electric energy meter state prediction method, system, equipment and storage medium
CN117153268A (en) Cell category determining method and system
Liu et al. Research on cassava disease classification using the multi-scale fusion model based on EfficientNet and attention mechanism
CN112966770B (en) Fault prediction method and device based on integrated hybrid model and related equipment
Tyagi et al. LCSCNet: A multi-level approach for lung cancer stage classification using 3D dense convolutional neural networks with concurrent squeeze-and-excitation module
CN114445356A (en) Multi-resolution-based full-field pathological section image tumor rapid positioning method
CN112733724B (en) Relativity relationship verification method and device based on discrimination sample meta-digger
CN117408167A (en) Debris flow disaster vulnerability prediction method based on deep neural network
Patra et al. Deep learning methods for scientific and industrial research
CN108846327B (en) Intelligent system and method for distinguishing pigmented nevus and melanoma
CN115907079A (en) Airspace traffic flow prediction method based on attention space-time diagram convolution network
CN113476065B (en) Multiclass pneumonia diagnostic system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant