CN115631847A

CN115631847A - Early lung cancer diagnosis system based on multi-omics features, storage medium and device

Info

Publication number: CN115631847A
Application number: CN202211280689.9A
Authority: CN
Inventors: 赵天意; 许伊宁; 刘博�; 王亚东
Original assignee: Harbin Institute of Technology
Current assignee: Harbin Institute of Technology
Priority date: 2022-10-19
Filing date: 2022-10-19
Publication date: 2023-01-20
Anticipated expiration: 2042-10-19
Also published as: CN115631847B

Abstract

An early lung cancer diagnosis system, storage medium and device based on multi-omics features, belonging to the technical field of cancer diagnosis. The invention aims to solve the problem that early lung cancer screening using clinical images alone has low accuracy. The system comprises a neural network prediction unit, an omics feature processing unit and a classifier prediction unit, wherein the neural network prediction unit makes predictions on imaging data converted into matrix form using a convolutional neural network (CNN) and a deep neural network (DNN) respectively; the omics feature processing unit, for each omics feature, obtains a weighted feature matrix of a global gene relation matrix using a graph convolutional network and thereby obtains multi-omics features; and the classifier prediction unit makes predictions on the multi-omics data using multiple classifiers respectively. The invention is suitable for early lung cancer diagnosis.

Description

Early lung cancer diagnosis system based on multi-omics features, storage medium and device

Technical Field

The invention belongs to the technical field of cancer diagnosis, and particularly relates to an early lung cancer diagnosis system, a storage medium and equipment.

Background

Early-stage lung cancer lacks typical symptoms, and a definitive diagnosis is usually made by a pathologist on the basis of clinical symptoms, signs, imaging examinations and histopathological examinations. Early diagnosis is of great significance: when lung cancer is diagnosed and treated at an early stage of the lesion, the best curative effect can be obtained, and the cure rate of early-stage lung cancer is far higher than that of middle- and late-stage lung cancer. The National Health Commission recommends an annual chest X-ray or CT examination for people over 40 years of age, which helps to detect early lung cancer.

However, early lung cancer screening that uses clinical images alone has limitations and low accuracy, and the diagnosis depends on the judgment of a pathologist. For central lung cancer, sputum cytology can effectively detect exfoliated cancer cells, and sampling is easy and inexpensive. In recent years, with the development of sequencing technology, the cost of genetic testing has fallen sharply, and multiple kinds of omics features, such as transcriptomic, genomic and proteomic features, have become available for detecting early lung cancer. This creates a need for machine learning models that can process multi-omics data, and clinical practice requires an auxiliary early lung cancer diagnosis system that is efficient, accurate, comprehensive and highly interpretable in biological terms.

Disclosure of Invention

The invention aims to solve the problem of low accuracy in early lung cancer screening only by using clinical images.

An early lung cancer diagnosis system based on multi-omics features, comprising: a neural network prediction unit, an omics feature processing unit, a classifier prediction unit, an ensemble learning automatic weight balancing unit and an early lung cancer diagnosis machine learning model unit;

a neural network prediction unit: respectively predicting by using a Convolutional Neural Network (CNN) and a Deep Neural Network (DNN) aiming at the imaging data converted into the matrix form;

omics feature processing unit: for each omics feature, a weighted feature matrix of a global gene relation matrix is obtained using a graph convolution network, and multi-omics features are thereby obtained;

obtaining the weighted feature matrix of the global gene relation matrix using the graph convolution network, and thereby the multi-omics features, comprises the following steps:

the gene relationships in the gene regulatory network, the protein relationship network and the gene sets in the biological signaling pathway network are represented by matrices $M_g$, $M_p$ and $M_s$ respectively; the union of $M_g$, $M_p$ and $M_s$ is then taken to obtain a global gene relation matrix $M$; a corresponding adjacency matrix $A$ is obtained from the global gene relation matrix $M$; an element $A_{ij} = 1$ of $A$ indicates that gene $i$ regulates gene $j$, and $A_{ij} = 0$ indicates that gene $i$ does not regulate gene $j$;

a weighted feature matrix is obtained using multilayer graph convolution, where the number of layers refers to the computation depth of the graph convolution, i.e. the number of iterations; information propagation between the layers of the multilayer graph convolution network is expressed as:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right) \quad (1)$$

where $\tilde{A} = A + I$ and $I$ denotes the identity matrix; $\tilde{D}$ is the degree matrix of $\tilde{A}$, a diagonal matrix whose diagonal elements are calculated as $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$; $\sigma(\cdot)$ is the activation function; $W$ is the gene weight matrix, initialized with all gene weights equal; $H^{(l)}$ is the feature of layer $l$, and for the input layer $H^{(l)}$ is the omics feature $F^{om}$ to be analyzed, where om denotes the omics type;

for each omics feature, the adjacency matrix $A$ of the global gene relation matrix $M$ and the initial weight $W$ are input into the graph convolution network as original features, and a weighted feature matrix is obtained through 3 layers of iteration, where om denotes the different omics types;

if there are multiple omics features, the omics features are linearly combined as in formula (5), and the combined features are then input into the classifier unit;

where each combined term is the weighted feature matrix corresponding to one omics type in the multi-omics data, and $\|$ denotes that the multi-omics features are linearly combined;

in the single-omics case, the multi-omics feature is the corresponding weighted feature matrix;

a classifier prediction unit: for the multi-omics data, predictions are made using SVM-rbf, SVM-poly and RF classifiers respectively;

an ensemble learning automatic weight balancing unit: distributing weights for CNN, DNN, SVM-rbf, SVM-poly and RF;

early lung cancer diagnosis machine learning model unit: the weighted CNN, DNN, SVM-rbf, SVM-poly and RF vote on the early lung cancer diagnosis, and the vote determines the final early lung cancer diagnosis result.

Further, the system further comprises a digital image processing unit; the digital image processing unit converts imaging data into imaging data in matrix form.

Further, the convolutional neural network CNN includes: an input layer, a first convolution layer, a first pooling layer, a first residual module, a second residual module, a third residual module, a fourth residual module, a fifth residual module, a second pooling layer, a first fully connected layer and an output layer; each residual module comprises two convolution layers;

further, the convolutional neural network CNN is obtained by training lung cancer imaging data acquired by using a public database.

Further, the deep neural network DNN includes: an input layer, a first hidden layer, a second hidden layer, a first convolution layer, a second convolution layer, a third convolution layer, a first pooling layer, a first fully connected layer, a second fully connected layer, a third fully connected layer and an output layer;

where Hidden denotes a hidden layer.

Further, in the information propagation between the layers of the multilayer graph convolution network, the normalization term is replaced by $D_{ij}$ of formula (2),

where $D_{in}$ denotes the in-degree matrix of the nodes, and the in-degree of a node is the number of edges pointing to that node.

Further, the process by which the ensemble learning automatic weight balancing unit assigns weights to CNN, DNN, SVM-rbf, SVM-poly and RF comprises the following steps:

the weights of the joint multi-classifier are derived using Bayesian modeling; the prediction loss of each single classifier's prediction task is calculated by cross entropy, and the prediction losses of the single classifiers are then combined by weighted summation according to formula (7);

where the summed terms are the prediction losses of the individual classifiers, $w_\omega$ is the weight corresponding to an individual classifier, $\omega$ indicates that there are $\omega$ classifiers, and $w_{1:\omega}$ denotes the weights of all classifiers; $pred_\omega$ denotes the prediction result of each classifier and $pred_{1:\omega}$ the prediction results of all classifiers; $2\log(w_1 \cdots w_\omega)$ is a penalty term;

during training, formula (7) is solved for the task weights $w_{1:\omega}$ that minimize the total loss given $pred_{1:\omega}$; formula (7) is solved by gradient descent, so that the task weights $w_{1:\omega}$ are generated automatically.

Further, SVM-rbf, SVM-poly and RF classifiers in the classifier prediction unit determine the parameters of the classifier through a grid search method.

A computer storage medium having stored therein at least one instruction, the at least one instruction being loaded and executed by a processor to implement the early lung cancer diagnosis system based on multi-omics features.

An early lung cancer diagnosis device based on multi-omics features comprises a processor and a memory, wherein at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor to implement the early lung cancer diagnosis system based on multi-omics features.

Advantageous effects:

The core of the invention is to perform ensemble learning on image data obtained by imaging and radiology together with genomic, transcriptomic and other omics data, which effectively improves the accuracy of early lung cancer diagnosis and can greatly improve prediction accuracy in early lung cancer screening.

Drawings

FIG. 1 is a schematic view of the overall process of the present invention.

FIG. 2 is a schematic diagram of the multi-omics feature part of FIG. 1.

Fig. 3 is a schematic structural diagram of a convolutional neural network CNN.

Fig. 4 is a schematic structural diagram of a deep neural network DNN.

Detailed Description

The first embodiment is as follows: the present embodiment is described with reference to FIGS. 1 and 2.

The early lung cancer diagnosis system based on multi-omics features according to this embodiment includes:

a digital image processing unit: converting imaging data into imaging data in matrix form;

for imaging data such as X-ray and CT (computed tomography) images and other radiology data, the image is first converted into matrix form, where each element of the matrix corresponds to the pixel at the corresponding position in the image and the value of each element is the gray value of that pixel;

for a color image, the corresponding gray value is obtained by superposing the three primary color channels.
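By way of illustration only, the conversion of a color image into a gray-value matrix can be sketched as follows. The function name `rgb_to_gray_matrix` is hypothetical, and the equal-weight average of the three channels is an assumption: the description only states that the channels are superposed and does not give the coefficients.

```python
def rgb_to_gray_matrix(rgb_image):
    """Convert an H x W image of (R, G, B) tuples into an H x W gray-value matrix."""
    gray = []
    for row in rgb_image:
        # Assumption: the three primary color channels are combined by an
        # equal-weight average; the source does not specify the coefficients.
        gray.append([(r + g + b) / 3.0 for (r, g, b) in row])
    return gray


# A tiny 2 x 2 toy image (illustrative values only).
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(30, 60, 90), (10, 20, 30)],
]
gray_matrix = rgb_to_gray_matrix(image)
```

Each element of `gray_matrix` then plays the role of one matrix element fed to the CNN and DNN.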

A digital image has translation invariance: local information in the image is processed in the same way regardless of where it is located in the image. For example, when a convolutional neural network (CNN) model is used to identify cancer cells in a tissue section, the model makes its judgment from features such as cell morphology, regardless of where the cancer cells are located.

A neural network prediction unit: predicting by using a Convolutional Neural Network (CNN) and a Deep Neural Network (DNN) respectively aiming at the imaging data converted into the matrix form;

The image converted into matrix form is input into a trained convolutional neural network CNN; the CNN structure is shown in FIG. 3 and includes: an input layer, a first convolution layer, a first pooling layer, a first residual module, a second residual module, a third residual module, a fourth residual module, a fifth residual module, a second pooling layer, a first fully connected layer and an output layer; each residual module comprises two convolution layers;

the first convolution layer is (7 × 7, conv, 64) and the first pooling layer is (0.5); each convolution layer of the first to third residual modules is (3 × 3, conv, 64); the convolution layers of the fifth residual module are both (3 × 3, conv, 128); and, so that the matrix sizes before and after the addition are consistent, the convolution layers of the fourth residual module are (3 × 3, conv, 128, 0.5). The 0.5 in the convolution layer parameters of the fourth residual module is the inter-layer scaling parameter, 64 = 128 × 0.5.

The symbols in FIG. 3 have the following meanings: conv is a convolution layer, the number before it is the receptive field (kernel size) and the number after it is the number of convolution kernels (output channels); FC is a fully connected layer, which acts as the classifier of the whole convolutional neural network; "+" is a residual operation; "+'" is a residual operation with convolution, used in a residual module so that the matrix sizes before and after the addition are consistent;

the invention trains the convolutional neural network CNN using lung cancer imaging data obtained from a public database (https://www.cancerimagingarchive.net/collections/).

The image converted into matrix form is input into a trained deep neural network DNN; the DNN structure is shown in FIG. 4 and includes an input layer, a first hidden layer, a second hidden layer, a first convolution layer, a second convolution layer, a third convolution layer, a first pooling layer, a first fully connected layer, a second fully connected layer, a third fully connected layer and an output layer;

where Hidden denotes a hidden layer; the Pool layer reduces the spatial size of the data volume, which reduces the number of parameters in the network and the consumption of computing resources and effectively controls overfitting; FC is a fully connected layer, which acts as the classifier of the whole neural network, and the prediction result is output through the FC layer.

Omics feature processing unit: for each omics feature, a weighted feature matrix of the global gene relation matrix is obtained using a graph convolution network, and multi-omics features are thereby obtained.

Knowledge networks used in cancer research, such as gene-gene regulatory networks, protein-protein interaction (PPI) networks and biomolecular signaling pathways, are topological structures.

A topological graph is composed of nodes and edges; an edge connecting two nodes represents the relationship between them, and edges may have directions. When all edges are undirected the graph is called an undirected graph, and when the edges have directions it is called a directed graph. A topological graph can also be converted into matrix form, usually a square matrix in which the value at (i, j) is the relationship coefficient from node i to node j. The relation matrix of an undirected graph is a symmetric matrix, whereas the relation matrix of a directed graph is usually asymmetric.

A topological graph has an irregular data structure: the topology around each node differs from that around every other node, and the graph has no translation invariance, so a traditional convolutional neural network cannot be applied to it; a graph convolutional network (GCN) is used instead.

The essence of a graph convolutional network is a first-order local approximation of spectral convolution, based on information propagation over the topological structure. In a multilayer graph convolution network (the number of layers refers to the computation depth of the graph convolution, i.e. the number of iterations), each layer processes only the first-order neighborhood information of a node, and information propagation over multi-order neighborhoods is achieved by stacking several layers. The information propagation rule between layers is:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right) \quad (1)$$

where $\tilde{A} = A + I$ and $I$ denotes the identity matrix; $A$ is the adjacency matrix of the gene relation network, with $A_{ij} = 1$ indicating that gene $i$ regulates gene $j$ and $A_{ij} = 0$ indicating that gene $i$ does not regulate gene $j$; $\sigma(\cdot)$ is the activation function; $W$ is the gene weight matrix, initialized with all gene weights equal; $H^{(l)}$ is the feature of layer $l$, and for the input layer $H^{(l)}$ is the omics feature $F^{om}$ to be analyzed, where om denotes the omics type;

$\tilde{D}$ is the degree matrix of $\tilde{A}$, a diagonal matrix whose diagonal elements are calculated as $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$.
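The layer-wise propagation described above can be illustrated with a minimal plain-Python sketch. It assumes the standard symmetric-normalized graph convolution (self-loops added to the adjacency matrix, degree normalization on both sides) and a ReLU activation; the function names, the toy two-node graph and the 1 × 1 weight matrix are hypothetical and are not taken from the patent.

```python
import math


def matmul(X, Y):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def gcn_layer(A, H, W):
    """One graph-convolution layer: ReLU(D~^-1/2 (A + I) D~^-1/2 H W)."""
    n = len(A)
    # Add self-loops: A~ = A + I.
    A_tilde = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    # Diagonal of the degree matrix: D~_ii = sum_j A~_ij.
    d = [sum(row) for row in A_tilde]
    # Symmetric normalization of A~.
    A_hat = [[A_tilde[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]
    Z = matmul(matmul(A_hat, H), W)
    # Assumption: ReLU is used as the activation function.
    return [[max(0.0, z) for z in row] for row in Z]


def gcn_forward(A, F_om, weights):
    """Stack one layer per weight matrix; the input-layer feature is F_om."""
    H = F_om
    for W in weights:
        H = gcn_layer(A, H, W)
    return H


# Toy example: two mutually connected nodes, one feature per node, three layers.
A = [[0, 1], [1, 0]]
F_om = [[1.0], [3.0]]
W = [[1.0]]
H3 = gcn_forward(A, F_om, [W, W, W])
```

In this toy graph every normalized entry equals 0.5, so each layer averages the two node features; stacking three layers mirrors the 3-layer iteration used in the embodiment.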

Because the computation of the normalization term in formula (1) is relatively complicated, the method is adapted to directed graphs as follows in order to simplify the information propagation operation: $D_{ij}$ in formula (2) replaces the normalization term in formula (1), where $D_{in}$ in formula (2) denotes the in-degree matrix of the nodes; the in-degree of a node is the number of edges pointing to that node (the corresponding out-degree, the number of edges leaving the node, is not used here).

The specific steps are as follows:

First, the matrix $A$ (adjacency matrix) in formula (1) is obtained by modeling prior knowledge. The gene regulatory network is represented by the topology $N_{g-g}$(Gene, Regulation), a directed graph in which the nodes are genes and the edges are gene regulation relationships; the protein relationship network is represented by the topology $N_{p-p}$(Protein, Interaction), an undirected graph in which the nodes are proteins and the edges are protein interaction relationships; the biological signaling pathway is represented by the topology $N_s$(Signature, definition), a directed graph in which the nodes are biological signals and the edges are signal interactions. The three kinds of prior knowledge are related as follows: each protein in the protein interaction network is produced by transcription and translation of its corresponding gene, and the nodes in the biological signaling pathways include, but are not limited to, gene and protein signals. The prior knowledge is expressed as matrices $M_g$, $M_p$ and $M_s$, and the global gene relation matrix $M$ is obtained as the union of $M_g$, $M_p$ and $M_s$.
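As a rough illustration of this step, the union of the three relation sets and the resulting 0/1 adjacency matrix can be sketched as follows. The function name `build_adjacency`, the placeholder gene identifiers `g1` to `g3` and the edge lists are all hypothetical; representing each relation matrix as a set of directed (source, target) pairs, with an undirected protein interaction stored as two directed edges, is an assumption of the sketch.

```python
def build_adjacency(genes, *edge_sets):
    """Take the union of several directed edge sets and return a 0/1 adjacency matrix."""
    index = {gene: k for k, gene in enumerate(genes)}
    union = set().union(*edge_sets)  # M = M_g U M_p U M_s
    A = [[0] * len(genes) for _ in genes]
    for src, dst in union:
        A[index[src]][index[dst]] = 1  # A_ij = 1: gene i regulates gene j
    return A


genes = ["g1", "g2", "g3"]                  # placeholder gene identifiers
M_g = {("g1", "g2")}                        # gene regulation (directed)
M_p = {("g2", "g3"), ("g3", "g2")}          # protein interaction (undirected, stored both ways)
M_s = {("g1", "g2"), ("g3", "g1")}          # signaling pathway (directed)
A = build_adjacency(genes, M_g, M_p, M_s)
```

An edge that appears in more than one prior-knowledge source, such as `("g1", "g2")` here, contributes a single entry to the union.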

For each omics feature, the adjacency matrix $A$ of the global gene relation matrix $M$ and the initial weight $W$ are input into the graph convolution network as original features, and a weighted feature matrix is obtained through 3 layers of iteration (too many layers degrade GCN performance), where om denotes the different omics types and $W^{(3)}$ is the feature weight obtained after 3 layers of GCN iteration.

If there are multiple omics features, the features are linearly combined as in formula (5) and then input into the classifier unit,

where each combined term is the weighted feature matrix corresponding to one omics type in the multi-omics data, and $\|$ denotes that the multi-omics features are linearly combined.

In the single-omics case, the multi-omics feature is simply the corresponding weighted feature matrix.
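A minimal sketch of the combination step is given below. It assumes that the combination operator amounts to column-wise concatenation of the per-omics weighted feature matrices, with one row per sample; this reading, the function name `combine_omics` and the toy values are assumptions rather than details stated in the description.

```python
def combine_omics(weighted_features):
    """Concatenate per-omics weighted feature matrices column-wise (one row per sample)."""
    names = sorted(weighted_features)  # fixed omics order for reproducibility
    n_samples = len(weighted_features[names[0]])
    combined = []
    for s in range(n_samples):
        row = []
        for name in names:
            row.extend(weighted_features[name][s])
        combined.append(row)
    return combined


# Two samples; two genomic features and one transcriptomic feature (toy values).
multi = combine_omics({
    "genomics": [[1.0, 2.0], [3.0, 4.0]],
    "transcriptomics": [[5.0], [6.0]],
})
# Single-omics case: the combined feature is the weighted feature matrix itself.
single = combine_omics({"genomics": [[1.0, 2.0], [3.0, 4.0]]})
```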

A classifier prediction unit: for the multi-omics data, predictions are made using SVM-rbf, SVM-poly and RF classifiers respectively;

The classifiers are taken from the scikit-learn package (https://scikit-learn.org/stable/). The classifier parameters are tuned by grid search (GridSearch), which is used to select the optimal hyperparameters of a model. Optimal hyperparameters can also be found by plotting a validation curve, but a validation curve yields only one optimal hyperparameter at a time. When several hyperparameters have many possible combinations, grid search can be used to find the optimal combination.
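The idea of grid search can be illustrated in plain Python as an exhaustive loop over every combination of candidate values. In practice scikit-learn's `GridSearchCV` performs this search with cross-validation; the sketch below replaces the cross-validated score with a hypothetical `toy_score` function, and the candidate values for `C` and `gamma` are illustrative only.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Hypothetical stand-in for a cross-validated accuracy; peaks at C=10, gamma=0.1.
    return -((params["C"] - 10) ** 2) - (params["gamma"] - 0.1) ** 2


param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
best_params, best_score = grid_search(param_grid, toy_score)
```

Unlike a validation curve, which varies one hyperparameter at a time, the loop scores all 12 combinations of the two hyperparameters jointly.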

Ensemble learning involves integrating multiple classification tasks. A common approach is to take a weighted sum of the classifier results and to coordinate the tasks by assigning them different weights. The simplest approach is to give all tasks equal weight; however, equal weighting is effective only when the tasks do not compete with one another, and in practice the tasks are naturally unbalanced for reasons such as differing classifier accuracy and differing ability to exploit prior knowledge. The invention therefore uses Bayesian modeling to derive the weights of the joint multi-classifier, and a Bayesian task weight learner balances the tasks automatically.

The Bayesian multi-task weight learning method is briefly described as follows; it is embodied in a multi-task loss function:

The multi-task loss is defined as the total loss, i.e. the sum of the prediction losses of the individual classifiers, each weighted by its weight $w_\omega$, where $\omega$ indicates that there are $\omega$ classifiers.

The prediction loss of a single classifier's prediction task is calculated by formula (6), where n is the total number of instances used to train the classifier weights.

Formula (6) is a general expression into which the predictions of any one classifier can be substituted.

Formula (7) is the weighted summation of formula (6) over the predictions of all the classifiers used:

where $y_i$ is the true label (the TCGA instances are labeled instances; the label is the lung cancer diagnosis, such as "no lung cancer", "early lung cancer", "intermediate lung cancer" or "late lung cancer", encoded as 1 for lung cancer and 0 for no lung cancer); the prediction of a single task for a feature is a number in the interval [0, 1], where a value closer to 1 indicates a higher likelihood of lung cancer and a value closer to 0 a higher likelihood of no lung cancer. Although the inputs of the CNN and DNN are not omics features, for convenience of description the prediction results of the CNN and DNN are represented in the same notation.

It should also be noted that, so that the weighting coefficients of the multi-task loss are non-zero and positive, and for ease of computation, a weighting form with the square of the weight as the denominator is adopted.

It should be noted that the labels "early lung cancer", "intermediate lung cancer" and "late lung cancer" are all present during training, so in practical application the invention also outputs "early lung cancer", "intermediate lung cancer" and "late lung cancer" results; that is, the invention can in fact also predict intermediate-stage and late-stage lung cancer. In clinical practice, intermediate-stage and late-stage lung cancer are generally relatively easy to identify (their features in the image are very obvious), whereas early-stage lung cancer is difficult to identify and is diagnosed with low accuracy. The invention therefore also provides "intermediate lung cancer" and "late lung cancer" labels during training, and the accuracy of early lung cancer prediction using the invention can be greatly improved.

The purpose of training is to find the task weights $w_{1:\omega}$ that minimize formula (7) given $pred_{1:\omega}$, where $2\log(w_1 \cdots w_\omega)$ is the penalty term; formula (7) is solved by gradient descent, so that the task weights $w_{1:\omega}$ are generated automatically.
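The weight-learning step can be sketched as follows. The sketch assumes that the total loss takes the form of each classifier's cross-entropy loss divided by the square of its weight, plus the penalty term 2·log(w_1·…·w_ω), which matches the "square as denominator" weighting and the penalty term described above; the exact form of formulas (6) and (7), the learning rate, the step count, and the toy labels and predictions are all assumptions.

```python
import math


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy between labels in {0, 1} and predictions in [0, 1]."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


def total_loss(weights, losses):
    """Assumed form: sum_k L_k / w_k^2 + 2 * log(w_1 * ... * w_k)."""
    weighted = sum(L / w ** 2 for w, L in zip(weights, losses))
    penalty = 2.0 * sum(math.log(w) for w in weights)
    return weighted + penalty


def learn_weights(losses, lr=0.01, steps=5000):
    """Plain gradient descent on the classifier weights, starting from equal weights."""
    weights = [1.0] * len(losses)
    for _ in range(steps):
        # d/dw_k [L_k / w_k^2 + 2 log w_k] = -2 L_k / w_k^3 + 2 / w_k
        grads = [-2.0 * L / w ** 3 + 2.0 / w for w, L in zip(weights, losses)]
        weights = [max(w - lr * g, 1e-6) for w, g in zip(weights, grads)]
    return weights


# Toy labels and predictions for two hypothetical classifiers.
y_true = [1, 0, 1, 0]
predictions = {
    "clf_a": [0.9, 0.1, 0.8, 0.2],  # more accurate
    "clf_b": [0.6, 0.4, 0.7, 0.5],  # less accurate
}
losses = [cross_entropy(y_true, p) for p in predictions.values()]
weights = learn_weights(losses)
```

Under this assumed form the minimum lies at w_k = sqrt(L_k), so a classifier with a smaller loss receives a smaller w_k and therefore a larger effective coefficient 1/w_k².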

The classifier results are combined as in formula (8):

where vote[·] is the voting operation: the prediction result of each classifier is weighted and then the classifiers vote; if the weighted majority of the models predicts lung cancer, a lung cancer result is output, and otherwise a no-lung-cancer result is output.
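A weighted vote of this kind might be sketched as follows. The voting strength 1/w² per classifier, the 0.5 decision threshold, the function name `weighted_vote` and the example predictions and weights are assumptions made for illustration; the description does not state the exact form of formula (8).

```python
def weighted_vote(preds, weights, threshold=0.5):
    """Return 1 (lung cancer) if the weighted majority of classifiers predicts it, else 0."""
    # Assumption: a classifier with learned weight w votes with strength 1 / w^2.
    strengths = [1.0 / w ** 2 for w in weights]
    votes_for = sum(s for p, s in zip(preds, strengths) if p >= threshold)
    return 1 if votes_for > sum(strengths) / 2.0 else 0


# Order: CNN, DNN, SVM-rbf, SVM-poly, RF (toy predictions and weights).
weights = [0.5, 1.0, 0.5, 1.0, 1.0]
result_a = weighted_vote([0.8, 0.4, 0.7, 0.3, 0.6], weights)
# Three of five classifiers predict lung cancer, but they carry less weight.
result_b = weighted_vote([0.2, 0.9, 0.1, 0.9, 0.9], weights)
```

The second call shows why the weighting matters: a simple head count would output lung cancer, whereas the two more heavily weighted classifiers decide the weighted vote.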

The second embodiment is as follows:

The present embodiment is a computer storage medium having at least one instruction stored therein, the at least one instruction being loaded and executed by a processor to implement the early lung cancer diagnosis system based on multi-omics features.

It should be understood that any method described herein may accordingly be provided as a computer program product, software or computerized method, which may include a non-transitory machine-readable medium having instructions stored thereon that may be used to program a computer system or other electronic device. Storage media may include, but are not limited to, magnetic storage media; optical storage media; magneto-optical storage media; read-only memory (ROM); random access memory (RAM); erasable programmable memory (e.g., EPROM and EEPROM); flash memory; or other types of media suitable for storing electronic instructions.

The third embodiment is as follows:

The present embodiment is an early lung cancer diagnosis device based on multi-omics features, the device comprising a processor and a memory. It should be understood that this covers any device described in the invention that comprises a processor and a memory, and that the device may also comprise other units and modules that perform display, interaction, processing, control and other functions by means of signals or instructions;

the memory has stored therein at least one instruction that is loaded and executed by the processor to implement the early lung cancer diagnosis system based on multi-omics features.

The above examples merely describe the computation model and computation flow of the invention in detail and are not intended to limit its embodiments. Those skilled in the art can make other variations and modifications on the basis of the above description; it is not possible to enumerate all embodiments here, and all obvious variations and modifications derived from the technical solution of the invention fall within its scope of protection.

Claims

1. An early lung cancer diagnosis system based on multi-omics features, comprising: a neural network prediction unit, an omics feature processing unit, a classifier prediction unit, an ensemble learning automatic weight balancing unit and an early lung cancer diagnosis machine learning model unit;

omics feature processing unit: for each omics feature, a weighted feature matrix of a global gene relation matrix is obtained using a graph convolution network, and multi-omics features are thereby obtained;

obtaining the weighted feature matrix of the global gene relation matrix using the graph convolution network, and thereby the multi-omics features, comprises the following steps:

the gene relationships in the gene regulatory network, the protein relationship network and the gene sets in the biological signaling pathway network are represented by matrices $M_g$, $M_p$ and $M_s$ respectively; the union of $M_g$, $M_p$ and $M_s$ is then taken to obtain a global gene relation matrix $M$; a corresponding adjacency matrix $A$ is obtained from the global gene relation matrix $M$; an element $A_{ij} = 1$ of $A$ indicates that gene $i$ regulates gene $j$, and $A_{ij} = 0$ indicates that gene $i$ does not regulate gene $j$;

wherein $\tilde{A} = A + I$ and $I$ denotes the identity matrix; $\tilde{D}$ is the degree matrix of $\tilde{A}$, a diagonal matrix whose diagonal elements are calculated as $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$; $W$ is the gene weight matrix, initialized with all gene weights equal; $H^{(l)}$ is the feature of each layer, and for the input layer $H^{(l)}$ is the omics feature $F^{om}$ to be analyzed, where om denotes the different omics types;

wherein each combined term is the weighted feature matrix corresponding to one omics type in the multi-omics data, and $\|$ denotes that the multi-omics features are linearly combined;

in the single-omics case, the multi-omics feature is the corresponding weighted feature matrix;

a classifier prediction unit: for the multi-omics data, predictions are made using SVM-rbf, SVM-poly and RF classifiers respectively;

2. The early lung cancer diagnosis system based on multi-omics features according to claim 1, wherein the system further comprises a digital image processing unit; the digital image processing unit converts imaging data into imaging data in matrix form.

3. The system according to claim 2, wherein the convolutional neural network CNN comprises: an input layer, a first convolution layer, a first pooling layer, a first residual module, a second residual module, a third residual module, a fourth residual module, a fifth residual module, a second pooling layer, a first fully connected layer and an output layer; each residual module includes two convolution layers.

4. The system of claim 3, wherein the convolutional neural network CNN is trained by using lung cancer imaging data obtained from a public database.

5. The system according to claim 4, wherein the deep neural network DNN comprises: an input layer, a first hidden layer, a second hidden layer, a first convolution layer, a second convolution layer, a third convolution layer, a first pooling layer, a first fully connected layer, a second fully connected layer, a third fully connected layer and an output layer;

wherein Hidden denotes a hidden layer.

6. The early lung cancer diagnosis system based on multi-omics features of claim 1, 2, 3, 4 or 5, wherein, in the information propagation between the layers of the multilayer graph convolution network, the normalization term is replaced by $D_{ij}$ of formula (2),

where $D_{in}$ denotes the in-degree matrix of the nodes, and the in-degree of a node is the number of edges pointing to that node.

7. The system of claim 6, wherein the process by which the ensemble learning automatic weight balancing unit assigns weights to CNN, DNN, SVM-rbf, SVM-poly and RF comprises the following steps:

the prediction losses of the single classifiers' prediction tasks are combined by weighted summation according to formula (7);

wherein the summed terms are the prediction losses of the individual classifiers, $w_\omega$ is the weight corresponding to an individual classifier, $\omega$ indicates that there are $\omega$ classifiers, and $w_{1:\omega}$ denotes the weights of all classifiers; $pred_\omega$ denotes the prediction result of each classifier and $pred_{1:\omega}$ the prediction results of all classifiers; $2\log(w_1 \cdots w_\omega)$ is a penalty term;

during training, formula (7) is solved for the task weights $w_{1:\omega}$ that minimize the total loss given $pred_{1:\omega}$; formula (7) is solved by gradient descent, so that the task weights $w_{1:\omega}$ are generated automatically.

8. The early lung cancer diagnosis system based on multi-omics features of claim 7, wherein the SVM-rbf, SVM-poly and RF classifiers in the classifier prediction unit determine the parameters of the classifiers by a grid search method.

9. A computer storage medium having stored therein at least one instruction that is loaded and executed by a processor to implement the early lung cancer diagnosis system based on multi-omics features according to any one of claims 1 to 8.

10. An early lung cancer diagnosis device based on multi-omics features, the device comprising a processor and a memory, the memory having stored therein at least one instruction, the at least one instruction being loaded and executed by the processor to implement the early lung cancer diagnosis system based on multi-omics features according to any one of claims 1 to 8.