CN113469954B

CN113469954B - Method and device for evaluating benign and malignant lung nodule

Info

Publication number: CN113469954B
Application number: CN202110660081.8A
Authority: CN
Inventors: 周振; 李一鸣; 俞益洲; 乔昕
Original assignee: Beijing Shenrui Bolian Technology Co Ltd; Shenzhen Deepwise Bolian Technology Co Ltd
Current assignee: Beijing Shenrui Bolian Technology Co Ltd; Shenzhen Deepwise Bolian Technology Co Ltd
Priority date: 2021-06-15
Filing date: 2021-06-15
Publication date: 2024-04-09
Anticipated expiration: 2041-06-15
Also published as: CN113469954A

Abstract

The invention provides a method and a device for evaluating whether a lung nodule is benign or malignant. The method comprises the following steps: constructing an evaluation model comprising a feature extraction module, an estimation module and a data fusion module; training the parameters of the feature extraction module and of each estimator by using a training data set consisting of lung nodule pairs and labels annotated by different doctors, wherein each label indicates which of the two lung nodules in a pair is relatively more malignant, and the parameters are trained by using, for each estimator, a loss function obtained from the output of that estimator and its labels; and inputting the CT image of the lung nodule to be evaluated into the trained evaluation model to obtain a benign-malignant score for the lung nodule. Because the training data set uses labels that express the relative degree of malignancy of two lung nodules, and such labels are highly consistent across different doctors, the accuracy of the benign-malignant evaluation is improved; the training method of the invention also reduces time complexity and speeds up model training.

Description

Method and device for evaluating benign and malignant lung nodule

Technical Field

The invention relates to the technical field of image processing by utilizing a neural network, in particular to a method and a device for evaluating benign and malignant lung nodule.

Background

In the prior art, benign-malignant identification of lung nodules means automatically identifying, with a deep neural network, whether the lung nodules in a given lung CT image are benign or malignant. The accuracy of the neural network depends heavily on the accuracy of the benign and malignant labels of the lung CT images in the training data. Labels for CT images of lung nodules come from two sources: judgment by an imaging physician, and microscopic pathological analysis. Labeling by imaging physicians is inexpensive, with an accuracy of about 75%. Labeling by pathological analysis is expensive but close to 100% accurate, so pathological labels are generally regarded as the true benign-malignant labels of lung nodules. In a typical imaging-physician labeling scheme, several imaging physicians each score a lung CT image from 1 to 5, where 1 represents the lowest degree of malignancy and 5 the highest. If the average score of the image is higher than 3.5, the image is labeled malignant; if the average score is lower than 2.5, it is labeled benign; images with an average score of 2.5 to 3.5 are discarded. Most of the basic lung nodule CT image data sets currently in use are labeled by imaging physicians. However, the accuracy of such labeling is strongly affected by human factors: different imaging physicians apply inconsistent scoring standards, some diagnosing more conservatively and others more aggressively, so the scores that different physicians give to the same lung nodule CT image can differ greatly. This lowers the quality of the data set and degrades neural network training.
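The averaged-score labeling rule described above can be sketched as follows. This is a minimal Python illustration rather than code from the patent; the function name `label_from_scores` and the string return values are assumptions chosen for this example.

```python
from statistics import mean
from typing import Optional, Sequence


def label_from_scores(scores: Sequence[float]) -> Optional[str]:
    """Prior-art rule: average several physicians' 1-5 scores for one image.

    Returns "malignant" if the mean is above 3.5, "benign" if it is below 2.5,
    and None when the image falls in the 2.5-3.5 band and is discarded.
    """
    if not scores:
        raise ValueError("at least one physician score is required")
    if any(s < 1 or s > 5 for s in scores):
        raise ValueError("scores must lie in the range 1 to 5")
    average = mean(scores)
    if average > 3.5:
        return "malignant"
    if average < 2.5:
        return "benign"
    return None  # ambiguous image, dropped from the data set


if __name__ == "__main__":
    print(label_from_scores([4, 4, 5]))
    print(label_from_scores([1, 2, 2]))
    print(label_from_scores([3, 3, 4]))
```

The discarded middle band illustrates one cost of this scheme: every ambiguous image is lost to training, in addition to the inconsistency between physicians noted above.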

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides a method and a device for evaluating benign and malignant lung nodule.

In order to achieve the above object, the present invention adopts the following technical scheme.

In a first aspect, the present invention provides a method for evaluating benign and malignant lung nodule, comprising the steps of:

constructing an evaluation model comprising a feature extraction module, an estimation module and a data fusion module, wherein each estimator of the estimation module is used for scoring according to the lung nodule features output by the feature extraction module, and the data fusion module is used for calculating the weighted average of the scoring;

training the parameters of the feature extraction module and of each estimator by using a training data set consisting of lung nodule pairs and labels annotated by different doctors, wherein each label indicates the relative degree of malignancy of the two lung nodules in a pair, and the parameters of the feature extraction module and of each estimator are trained by using a loss function obtained from the output of each estimator and the labels corresponding to that estimator;

and inputting the CT image of the lung nodule to be evaluated into a trained evaluation model to obtain the benign and malignant scoring of the lung nodule.

Further, the method of determining the output scoring weight of each estimator comprises the following steps for each estimator:

selecting N lung nodule CT images as input of an evaluation model, wherein the number of malignant lung nodules is A, the number of benign lung nodules is B, and N=A+B;

as a statistical threshold T is varied from the minimum score to the maximum score in fixed steps, counting, among the scores output by the estimator, the number TP of malignant lung nodules whose score exceeds T and the number FP of benign lung nodules whose score exceeds T, and calculating Y=TP/A and X=FP/B;

and drawing a curve by taking X as an abscissa and Y as an ordinate, and calculating the area under the curve to obtain the weight of the output scoring of the estimator.
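The threshold sweep and area calculation in the steps above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patent's implementation: the function name `auc_weight`, the default step of 0.5, the trapezoidal rule for the area, and the explicit (0, 0) and (1, 1) end points that close the curve are all choices made for this example.

```python
from typing import Sequence


def auc_weight(malignant_scores: Sequence[float],
               benign_scores: Sequence[float],
               t_min: float = 1.0,
               t_max: float = 5.0,
               step: float = 0.5) -> float:
    """Sweep threshold T from t_min to t_max and return the area under (X, Y)."""
    a, b = len(malignant_scores), len(benign_scores)
    if a == 0 or b == 0:
        raise ValueError("need at least one malignant and one benign nodule")
    n_steps = int(round((t_max - t_min) / step))
    # Assumption: add the trivial end points so the curve spans X in [0, 1].
    points = {(0.0, 0.0), (1.0, 1.0)}
    for k in range(n_steps + 1):
        t = t_min + k * step
        tp = sum(1 for s in malignant_scores if s > t)  # malignant above T
        fp = sum(1 for s in benign_scores if s > t)     # benign above T
        points.add((fp / b, tp / a))                    # (X, Y) = (FP/B, TP/A)
    curve = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0             # trapezoidal rule
    return area


if __name__ == "__main__":
    # Hypothetical estimator scores for two malignant and two benign nodules.
    print(auc_weight([4.0, 4.5], [1.5, 2.0]))
```

A smaller step gives a finer curve at the cost of more counting passes; with scores on a 1 to 5 scale the cost is negligible either way.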

Further, the estimator output score has a minimum value of 1 and a maximum value of 5.

Further, the label is: if the malignancy of the first lung nodule is more severe, the label is 1; if the malignancy of the second lung nodule is more severe, the label is -1; if the malignancy of the two lung nodules is comparable, the label is 0.

Further, the loss function is:

[Loss function formula not reproduced in this text.]

where L_r is the loss function of the r-th estimator; the label is the one assigned by the r-th doctor to the i-th lung nodule pair, indicates the relative degree of malignancy of the two lung nodules, and takes a value c ∈ {-1, 0, 1}; i = 1, 2, …, n, where n is the number of lung nodule pairs; r = 1, 2, …, R, where R is the number of doctors, which is also the number of estimators; ε is a random error following a sigmoid distribution; the quantized relative degree of malignancy of a pair is obtained from the estimator's score for the first lung nodule in the pair and its score for the second lung nodule; λ is a preset boundary threshold; the indicator function takes the value 1 when its condition holds and 0 otherwise; and the probabilities that the quantized value equals the label, for the label values 1, 0 and -1 respectively, are given by:

[Probability formulas not reproduced in this text.]

in a second aspect, the present invention provides a pulmonary nodule benign and malignant evaluation apparatus comprising:

the modeling module is used for constructing an evaluation model comprising a feature extraction module, an estimation module and a data fusion module, wherein each estimator of the estimation module is used for scoring according to the lung nodule features output by the feature extraction module, and the data fusion module is used for calculating the weighted mean value of the scoring;

the training module is used for training the parameters of the feature extraction module and of each estimator by using a training data set consisting of lung nodule pairs and labels, annotated by different doctors, that indicate the relative degree of malignancy of the two lung nodules in each pair, and by adopting a loss function obtained from the output of each estimator and its labels;

and the evaluation module is used for inputting the CT image of the lung nodule to be evaluated into the trained evaluation model to obtain the benign and malignant scoring of the lung nodule.

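Further, the method of determining the output scoring weight of each estimator comprises, for each estimator: selecting CT images of N lung nodules as input of the evaluation model, wherein the number of malignant lung nodules is A, the number of benign lung nodules is B, and N=A+B; the remaining steps are the same as in the first aspect.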

Further, the loss function is:

[Loss function formula not reproduced in this text.]

where L_r is the loss function of the r-th estimator; the label is the one assigned by the r-th doctor to the i-th lung nodule pair, indicates the relative degree of malignancy of the two lung nodules, and takes a value c ∈ {-1, 0, 1}; i = 1, 2, …, n, where n is the number of lung nodule pairs; r = 1, 2, …, R, where R is the number of doctors, which is also the number of estimators; ε is a random error following a sigmoid distribution; the quantized relative degree of malignancy of a pair is obtained from the estimator's score for the first lung nodule in the pair and its score for the second lung nodule; λ is a preset boundary threshold; the indicator function takes the value 1 when its condition holds and 0 otherwise; and the probabilities that the quantized value equals the label, for the label values 1, 0 and -1 respectively, are given by:

[Probability formulas not reproduced in this text.]

compared with the prior art, the invention has the following beneficial effects.

The invention constructs an evaluation model comprising a feature extraction module, an estimation module and a data fusion module, and trains it with a training data set consisting of lung nodule pairs and labels, annotated by different doctors, that indicate the relative degree of malignancy of the two lung nodules in each pair. The parameters of the feature extraction module and of each estimator are trained with a loss function obtained from the output of each estimator and its labels. The lung nodule to be evaluated is then input into the trained evaluation model to obtain its benign-malignant score, which realizes a quantitative evaluation of whether the lung nodule is benign or malignant. Because the training data set uses labels expressing the relative degree of malignancy of two lung nodules, and such labels are highly consistent across different doctors (that is, label precision is high), the accuracy of the benign-malignant evaluation is improved. Meanwhile, training the parameters of the feature extraction module and of each estimator with a loss function obtained from each estimator's output and its labels reduces time complexity and increases model training speed.

Drawings

Fig. 1 is a flowchart of a method for evaluating benign and malignant lung nodules according to an embodiment of the present invention.

Fig. 2 is a schematic structural diagram of the evaluation model.

Fig. 3 is a block diagram of a pulmonary nodule benign and malignant evaluation apparatus according to an embodiment of the present invention.

Detailed Description

The present invention will be further described with reference to the drawings and the detailed description below, in order to make the objects, technical solutions and advantages of the present invention more apparent. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Fig. 1 is a flowchart of a method for evaluating benign and malignant lung nodule according to an embodiment of the present invention, including the steps of:

step 101, constructing an evaluation model comprising a feature extraction module, an estimation module and a data fusion module, wherein each estimator of the estimation module is used for scoring according to the lung nodule features output by the feature extraction module, and the data fusion module is used for calculating the weighted mean value of the scoring;

step 102, training the parameters of the feature extraction module and of each estimator by using a training data set consisting of lung nodule pairs and labels, annotated by different doctors, that indicate the relative degree of malignancy of the two lung nodules in each pair, and by adopting a loss function obtained from the output of each estimator and its labels;

and step 103, inputting the CT image of the lung nodule to be evaluated into a trained evaluation model to obtain a benign and malignant score of the lung nodule.

In this embodiment, an artificial neural network (such as convolutional neural network CNN) is used to construct an evaluation model of benign and malignant lung nodules. The input of the evaluation model is a lung nodule CT image, and the output is a quantitative evaluation of the benign and malignant properties of the input lung nodule, namely a comprehensive score. The benign and malignant scoring can be automatically output only by inputting CT images of lung nodules to be evaluated into a trained evaluation model.

In this embodiment, step 101 is mainly used for constructing the benign-malignant evaluation model of lung nodules. The evaluation model mainly comprises a feature extraction module, an estimation module and a data fusion module, as shown in Fig. 2. The feature extraction module extracts features from the input lung nodule CT image. The estimation module consists of a plurality of estimators; all estimators share the output of the feature extraction module and score the nodule according to the lung nodule features it outputs. For example, the larger the lung nodule and the more branches it has, the higher the likelihood of malignancy and the higher the score. The data fusion module fuses the scores output by all the estimators into a comprehensive score. The fusion method adopted in this embodiment is to compute the weighted mean of the estimators' scores, using the following formula:

where S is the final comprehensive score, S_r is the score output by the r-th estimator, k_r is the weight of the r-th estimator's score, and R is the number of estimators. The simplest weighting strategy sets every weight to 1, in which case data fusion reduces to a plain average. To improve the fusion, however, different weights are generally assigned according to the reliability of each estimator: the more reliable the estimator, the larger its weight. A specific embodiment for determining the estimator weights is given later.
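The structure just described, with one shared feature extractor feeding R estimators whose scores are fused by a weighted mean, can be sketched in Python as follows. This is an illustrative sketch only: the names `fuse_scores` and `evaluate` are hypothetical, the callables stand in for trained network components, and dividing by the sum of the weights is an assumption, since the fusion formula itself is not reproduced in this text.

```python
from typing import Callable, Optional, Sequence


def fuse_scores(scores: Sequence[float],
                weights: Optional[Sequence[float]] = None) -> float:
    """Weighted mean of estimator scores S_r with weights k_r.

    Assumption: the result is normalized by the sum of the weights, so that
    equal weights reduce to a plain average.
    """
    if not scores:
        raise ValueError("at least one estimator score is required")
    if weights is None:
        weights = [1.0] * len(scores)  # simplest strategy: every k_r = 1
    if len(weights) != len(scores):
        raise ValueError("one weight per estimator is required")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(k * s for k, s in zip(weights, scores)) / total_weight


def evaluate(image,
             feature_extractor: Callable,
             estimators: Sequence[Callable],
             weights: Optional[Sequence[float]] = None) -> float:
    """Run the shared extractor once, score with every estimator, then fuse."""
    features = feature_extractor(image)                   # shared by all estimators
    scores = [estimator(features) for estimator in estimators]
    return fuse_scores(scores, weights)


if __name__ == "__main__":
    # Toy stand-ins for trained components (hypothetical).
    extractor = lambda image: sum(image) / len(image)
    heads = [lambda f: 1 + 4 * f, lambda f: 1 + 3 * f]
    print(evaluate([0.2, 0.4, 0.6], extractor, heads, weights=[0.9, 0.7]))
```

Computing the features once and reusing them for every estimator mirrors the statement that all estimators share the output of the feature extraction module.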

In this embodiment, step 102 is mainly used for training the evaluation model. Model training in this embodiment mainly optimizes the parameters of each estimator and of the feature extraction module. A prior-art training data set generally consists of individual lung nodule CT image samples with benign or malignant labels assigned by doctors; because different doctors apply inconsistent scoring standards, label accuracy is low, and so is the prediction accuracy of the trained model. By contrast, different doctors agree well when judging the relative degree of malignancy of two different lung nodules (that is, which one is more severe). This is comparable to height: it is hard to state a person's exact height, but easy to judge which of two people is taller. For this reason, in this embodiment the lung nodule CT image samples are grouped into pairs, different doctors then assess the relative degree of malignancy of the two nodules in the same pair, and that assessment is used as the label of the pair, yielding a training data set consisting of lung nodule pairs and their labels. The loss function used in training also differs from the prior art: whereas the prior art generally trains the model with a single overall loss function, this embodiment provides each estimator with its own loss function, determined by the difference between that estimator's output and its labels, and uses it to optimize the parameters of each estimator and of the feature extraction module. This approach not only improves the prediction accuracy of the model but also significantly reduces the time complexity of training, shortening training time.
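One way to organize such a pairwise training set is sketched below in Python. The sketch is hypothetical throughout: the `NodulePair` record, the `build_pair_dataset` helper and the doctor callables are illustrative names, and forming every unordered pair of nodules is an assumption, since the text only says that samples are grouped into pairs. The label values 1, 0 and -1 follow the convention given elsewhere in this document.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List, Sequence


@dataclass
class NodulePair:
    first: str         # identifier of the first nodule's CT image
    second: str        # identifier of the second nodule's CT image
    labels: List[int]  # labels[r] is doctor r's label: 1, 0 or -1


def build_pair_dataset(nodule_ids: Sequence[str],
                       doctors: Sequence[Callable[[str, str], int]]) -> List[NodulePair]:
    """Pair nodules and collect one relative label per doctor for each pair."""
    dataset = []
    for first, second in combinations(nodule_ids, 2):  # assumption: all pairs
        labels = [doctor(first, second) for doctor in doctors]
        if any(label not in (1, 0, -1) for label in labels):
            raise ValueError("labels must be 1, 0 or -1")
        dataset.append(NodulePair(first, second, labels))
    return dataset


if __name__ == "__main__":
    # Toy stand-in for a doctor's judgment, driven by a made-up severity table.
    severity = {"a": 1, "b": 3, "c": 3}

    def doctor(x: str, y: str) -> int:
        return (severity[x] > severity[y]) - (severity[x] < severity[y])

    for pair in build_pair_dataset(["a", "b", "c"], [doctor, doctor]):
        print(pair)
```

Keeping one label per doctor in each record matches the training scheme described above, in which the r-th estimator is trained against the labels of the r-th doctor.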

In this embodiment, step 103 is mainly used to obtain a score of the benign or malignant lung nodule to be evaluated. And inputting the CT image of the lung nodule to be evaluated into a trained evaluation model, so that the benign and malignant scoring of the lung nodule can be conveniently obtained.

As an alternative embodiment, the method of determining the output scoring weight of each estimator comprises the following steps for each estimator:

The present embodiment provides a technical solution for determining the output scoring weight of an estimator, using the ROC (Receiver Operating Characteristic) curve. ROC curves were first applied in radar signal detection to distinguish signal from noise, and were later adopted to evaluate the predictive ability of models; the ROC curve is derived from the confusion matrix. The threshold of a classification model can be set higher or lower, and each threshold setting yields a different false positive rate (FPR) and true positive rate (TPR). Plotting the (FPR, TPR) coordinates for every threshold of the same model in ROC space forms the ROC curve of that model, with FPR on the abscissa and TPR on the ordinate. AUC (Area Under the Curve) is the area under the ROC curve; when different classification models are compared, the ROC curve of each model can be drawn and the area under it used as an index of model quality. AUC has the following characteristics: its value range is [0,1]; and, taking scores above the threshold as positive and scores below it as negative, the AUC equals the probability that the classifier assigns a positive sample a higher value than a negative sample. Thus, the higher the AUC of a classifier, the higher its accuracy. Accordingly, the larger the AUC of an estimator, the more reliable the estimator and the larger its scoring weight. It is therefore sufficient to draw the ROC curve of each estimator and calculate the area under it to obtain the scoring weight.
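The probabilistic reading of AUC mentioned above can be checked directly by comparing every malignant score with every benign score. The following Python sketch illustrates that interpretation; the function name `pairwise_auc` is an assumption, and counting ties as one half is the usual convention for this statistic rather than something stated in the patent.

```python
from typing import Sequence


def pairwise_auc(malignant_scores: Sequence[float],
                 benign_scores: Sequence[float]) -> float:
    """Fraction of (malignant, benign) pairs in which the malignant nodule scores higher.

    Ties contribute one half (a common convention, assumed here).
    """
    if not malignant_scores or not benign_scores:
        raise ValueError("need at least one malignant and one benign score")
    wins = 0.0
    for m in malignant_scores:
        for b in benign_scores:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant_scores) * len(benign_scores))


if __name__ == "__main__":
    # Hypothetical scores: three pairs are ordered correctly and one is tied.
    print(pairwise_auc([4.0, 3.0], [2.0, 3.0]))
```

An estimator that always ranks malignant nodules above benign ones reaches 1.0, while one that scores at random hovers around 0.5, which is why a larger value is taken to justify a larger weight.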

As an alternative embodiment, the minimum value of the estimator output score is 1 and the maximum value is 5.

The present embodiment gives the value range of the estimator's output score, with a minimum of 1 and a maximum of 5. The higher the score, the more severe the malignancy of the lung nodule. It should be noted that this is only a preferred embodiment and does not negate or exclude other possible embodiments; for example, a ten-point or percentage scale may also be used.

As an alternative embodiment, the label is: if the malignancy of the first lung nodule is more severe, the label is 1; if the malignancy of the second lung nodule is more severe, the label is -1; if the malignancy of the two lung nodules is comparable, the label is 0.

This embodiment gives the specific values of the label that expresses the relative degree of malignancy of the two lung nodules in a pair: 1, 0 and -1. Labels 1 and -1 indicate that the first or the second lung nodule, respectively, is more severe than the other, and 0 indicates that the two lung nodules have the same or a similar degree of malignancy. Likewise, this is only a preferred embodiment and does not negate or exclude other possible embodiments: any three different integers may be used as labels, but the values chosen in this embodiment are more intuitive and concise.

As an alternative embodiment, the loss function is:

wherein,

This embodiment gives a technical solution for the loss function. As described above, this embodiment sets a loss function for each estimator, used to optimize the parameters of that estimator, so the estimator index r is used as the subscript of the loss function. Because the label takes three values, 1, 0 and -1, the probability of agreeing with the label is the sum of the probabilities for the label being 1, 0 and -1 respectively, so the loss function includes a summation over the label variable c. In addition, because the loss is averaged over all lung nodule pairs in the training data set, the loss function also includes a summation (average) over the pair index i. The quantized relative degree of malignancy of the two lung nodules, estimated from the scores actually output by the estimator, is a piecewise function taking three values, 1, 0 and -1; -λ and λ are the demarcation points of the three intervals, also called boundary thresholds (see the expression for the quantized value). The expressions above also give the probability that the quantized value equals the label when the label is 1, 0 and -1 respectively; this probability follows a sigmoid distribution.
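Because the formulas themselves are not reproduced in this text, the following Python sketch shows only one plausible reading of the description, and every modeling choice in it is an assumption: the quantized value is taken from the score difference between the first and second nodule, the random error ε is treated as logistic noise added to that difference (so the three probabilities are differences of sigmoid values), and the per-estimator loss is taken to be the average negative log-probability of the doctor's label. The function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def quantize(score_first: float, score_second: float, lam: float) -> int:
    """Piecewise value in {1, 0, -1} with demarcation points -lam and lam."""
    diff = score_first - score_second  # assumption: plain score difference
    if diff > lam:
        return 1
    if diff < -lam:
        return -1
    return 0


def label_probabilities(score_first: float, score_second: float,
                        lam: float) -> Dict[int, float]:
    """Assumed probabilities of the quantized value being 1, 0 or -1."""
    if lam <= 0:
        raise ValueError("lam must be positive so every probability is positive")
    diff = score_first - score_second
    p_pos = sigmoid(diff - lam)    # P(diff + eps > lam)
    p_neg = sigmoid(-diff - lam)   # P(diff + eps < -lam)
    return {1: p_pos, 0: 1.0 - p_pos - p_neg, -1: p_neg}


def estimator_loss(score_pairs: Sequence[Tuple[float, float]],
                   labels: Sequence[int], lam: float) -> float:
    """Average over pairs i of the sum over c of indicator(label == c) * -log P(c)."""
    if not labels or len(labels) != len(score_pairs):
        raise ValueError("need one label per scored pair")
    total = 0.0
    for (s1, s2), label in zip(score_pairs, labels):
        probs = label_probabilities(s1, s2, lam)
        for c in (1, 0, -1):
            indicator = 1.0 if label == c else 0.0
            total -= indicator * math.log(probs[c])
    return total / len(labels)


if __name__ == "__main__":
    print(quantize(4.0, 2.0, 0.5))
    print(estimator_loss([(4.0, 2.0), (3.0, 3.2)], [1, 0], lam=0.5))
```

Under these assumptions the loss is small when the estimator's score difference agrees with the doctor's relative label and grows when it contradicts the label, which is the behavior the description attributes to the per-estimator loss.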

Fig. 3 is a schematic diagram of the composition of a device for evaluating benign and malignant lung nodules according to an embodiment of the present invention, the device comprising:

the modeling module 11 is configured to construct an evaluation model including a feature extraction module, an estimation module, and a data fusion module, where each estimator of the estimation module is configured to score according to the lung nodule feature output by the feature extraction module, and the data fusion module is configured to calculate a weighted average of the scores;

a training module 12 for training the parameters of the feature extraction module and of each estimator by using a training data set consisting of lung nodule pairs and labels, annotated by different doctors, that indicate the relative degree of malignancy of the two lung nodules in each pair, and by adopting a loss function obtained from the output of each estimator and its labels;

and the evaluation module 13 is used for inputting the CT image of the lung nodule to be evaluated into the trained evaluation model to obtain the benign and malignant score of the lung nodule.

The device of this embodiment may be used to implement the technical solution of the method embodiment shown in Fig. 1; its implementation principle and technical effects are similar and are not repeated here. The same applies to the embodiments that follow.

As an alternative embodiment, the loss function is:

wherein,

the foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims

1. A method for evaluating benign and malignant lung nodules, comprising the steps of:

inputting CT images of the lung nodules to be evaluated into a trained evaluation model to obtain benign and malignant scores of the lung nodules;

the label is as follows: if the malignancy of the first lung nodule is more severe, the label is 1; if the malignancy of the second lung nodule is more severe, the label is -1; if the malignancy of the two lung nodules is comparable, the label is 0;

the loss function is:

wherein,

2. The method of assessing the malignancy of pulmonary nodules of claim 1 wherein the method of determining the output scoring weight for each estimator comprises the steps of, for each estimator:

3. The method of evaluating benign and malignant lung nodule according to claim 2, wherein the minimum value of the estimator output score is 1 and the maximum value is 5.

4. A pulmonary nodule benign and malignant evaluation apparatus, comprising:

the evaluation module is used for inputting CT images of the lung nodules to be evaluated into the trained evaluation model to obtain benign and malignant scoring of the lung nodules;

the loss function is:

wherein,

5. The pulmonary nodule benign and malignant evaluation apparatus of claim 4, wherein the method of determining the output scoring weight for each estimator comprises the steps of, for each estimator:

6. The pulmonary nodule benign and malignant evaluation apparatus of claim 5, wherein the minimum value of the estimator output score is 1 and the maximum value is 5.