CN112802023A

CN112802023A - Knowledge distillation method and device for pleural lesion segmentation based on lifelong learning

Info

Publication number: CN112802023A
Application number: CN202110398753.2A
Authority: CN
Inventors: 杜强; 欧阳金鹏; 郭雨晨; 聂方兴
Original assignee: Beijing Xbentury Network Technology Co ltd
Current assignee: Beijing Xbentury Network Technology Co ltd
Priority date: 2021-04-14
Filing date: 2021-04-14
Publication date: 2021-05-14

Abstract

The invention discloses a knowledge distillation method and a knowledge distillation device for pleural lesion segmentation based on lifelong learning. The method comprises the following steps: distilling the softmax distribution knowledge of a plurality of trained original models, one per pleural lesion type, to obtain a new model; preprocessing the CT images in the pleural lesion picture set; inputting each preprocessed CT image into the plurality of trained original models to obtain a plurality of segmentation results, stacking the segmentation results, and calculating a first loss value against the label of the original CT image; calculating the loss between the prediction image of the new model and the label of the original CT image to obtain a second loss value; computing a weighted sum of the first loss value and the second loss value and using a gradient descent algorithm to obtain a gradient; back-propagating the gradient and updating the weight parameters of the new model; and, when a new disease type is added, adding the trained model for the new disease type to the original models, adding the CT images of the new disease type to the original pleural lesion picture set, and executing the above steps again to carry out lifelong learning. The resulting segmentation model has the advantage of strong adaptability.

Description

Knowledge distillation method and device for pleural lesion segmentation based on lifelong learning

Technical Field

The invention relates to the field of image segmentation, in particular to a knowledge distillation method and a knowledge distillation device for pleural lesion segmentation based on lifelong learning.

Background

With the rapid development of artificial intelligence technology in recent years, more attention has been paid to how to apply the most advanced technology effectively in the clinical field. Four major factors (data, algorithms, computing power, and domain expertise) drive the development of medical artificial intelligence.

Deep learning has brought the performance of many computer vision tasks to an unprecedented level. Complex models can significantly improve the performance of deep learning tasks, but they also consume large amounts of storage space and computing resources. Model compression methods greatly alleviate this shortage of computing resources and storage space, and knowledge distillation is one specific model compression method.

First, as information technology advances, data of all kinds is growing explosively. Second, traditional machine learning algorithms are unsuitable for many application problems in a big data environment, because most of them focus only on tasks such as classification over small samples and lack adaptability to big data. Lifelong machine learning emerged in this context.

Disclosure of Invention

The invention aims to provide a knowledge distillation method and a knowledge distillation device for pleural lesion segmentation based on lifelong learning, and aims to solve the problem that the traditional machine learning lacks adaptability.

The invention provides a knowledge distillation method for pleural lesion segmentation based on lifelong learning, which comprises the following steps:

S1, distilling softmax distribution knowledge of a plurality of original models of the trained pleural lesion disease types to obtain original model knowledge, and endowing the original model knowledge to the softmax distribution of a model to be established to obtain a new model;

S2, preprocessing the CT image in the pleural lesion picture set;

S3, inputting the preprocessed CT image into the plurality of trained original models respectively to obtain a plurality of segmentation results;

S4, stacking the segmentation results obtained in S3 and calculating a first loss value with the label of the original CT image;

S5, inputting the preprocessed CT image of S3 into the new model to obtain a prediction image, and calculating the loss between the prediction image and the label of the original CT image to obtain a second loss value;

S6, carrying out weighted summation on the first loss value and the second loss value by using a gradient descent algorithm to obtain a gradient for updating the new model;

S7, back-propagating the gradient, and updating the weight parameters of the new model;

S8, under the condition of adding a new disease, adding a trained new disease model into the original models, adding the CT images of the new disease into the original pleural lesion picture set, and executing steps S1-S7 to carry out lifelong learning.
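Steps S6 and S7 can be summarized, as a sketch, by the update rule below. Here $L_1$ and $L_2$ stand for the first and second loss values; the weights $w_1$, $w_2$, the learning rate $\eta$, and the parameter symbol $\theta$ are notation introduced for illustration only, and the source does not state their values.

```latex
L = w_1 L_1 + w_2 L_2,
\qquad
\theta \leftarrow \theta - \eta \, \nabla_{\theta} L
```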

The invention also provides a knowledge distillation device for pleural lesion segmentation based on lifelong learning, which comprises:

the distillation module is used for distilling softmax distribution knowledge of a plurality of original models of the trained pleural lesion disease types to obtain original model knowledge, and endowing the original model knowledge to softmax distribution of a model to be established to obtain a new model;

the preprocessing module is used for preprocessing the CT image in the pleural lesion picture set;

the segmentation module is used for respectively inputting a preprocessed CT image into a plurality of trained original models to obtain a plurality of segmentation results;

the first loss value module is used for stacking the plurality of segmentation results obtained by the segmentation module and calculating a first loss value with the label of the original CT image;

the second loss value module is used for inputting the CT image preprocessed by the segmentation module into the new model to obtain a prediction image, and calculating the loss of the label of the prediction image and the label of the original CT image to obtain a second loss value;

the gradient module is used for carrying out weighted summation on the first loss value and the second loss value by using a gradient descent algorithm to obtain a gradient used for updating a new model;

the updating module is used for carrying out back propagation on the gradient and updating the weight parameter of the new model;

and the lifelong learning module is used for adding a trained new disease model into the original models under the condition of adding a new disease, adding the CT images of the new disease into the original pleural lesion picture set, and calling the above modules to perform lifelong learning.

The embodiment of the invention also provides a knowledge distillation device for pleural lesion segmentation based on lifelong learning, which comprises: a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the above-described method of knowledge distillation based on lifelong learning of pleural lesion segmentation.

Embodiments of the present invention further provide a computer-readable storage medium, on which an implementation program for information transmission is stored, and when the program is executed by a processor, the method implements the steps of the knowledge distillation method based on the pleural lesion segmentation of lifetime learning.

By adopting the embodiment of the invention, knowledge distillation and integration are carried out on the trained single-lesion models, so that the resulting model is more extensible, and the lifelong learning method lets the model adapt to newly added data and newly added lesions.

The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a flow chart of a method of knowledge distillation for lifelong learning-based segmentation of pleural lesions in accordance with an embodiment of the present invention;

FIG. 2 is a schematic diagram of a knowledge distillation principle framework of lifelong learning-based pleural lesion segmentation according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a prior art Unet network according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a prior art Unet network with ConvLSTM embedded therein according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a Wnet2D (3D) network in the prior art according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a knowledge distillation apparatus module for lifelong learning-based pleural lesion segmentation, according to an embodiment of the present invention;

FIG. 7 is a schematic view of a knowledge distillation apparatus for lifelong learning-based pleural lesion segmentation in accordance with an embodiment of the present invention.

Description of reference numerals:

610: a distillation module; 620: a preprocessing module; 630: a segmentation module; 640: a first loss value module; 650: a second loss value module; 660: a gradient module; 670: an update module; 680: a lifelong learning module.

Detailed Description

The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", and the like, indicate orientations and positional relationships based on those shown in the drawings, and are used only for convenience of description and simplicity of description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be considered as limiting the present invention.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, features defined as "first" or "second" may explicitly or implicitly include one or more of the described features. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise. Furthermore, the terms "mounted," "coupled," and "connected" are to be construed broadly: a connection may, for example, be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through intervening media, or it may be an internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific case.

Method embodiment

According to an embodiment of the present invention, a knowledge distillation method for pleural lesion segmentation based on lifelong learning is provided. FIG. 1 is a flowchart of this method according to an embodiment of the present invention. As shown in FIG. 1, the method specifically includes:

s1 specifically includes:

establishing a generalized softmax function according to formula 1:

$$q_i = \frac{\exp(Z_i / T)}{\sum_j \exp(Z_j / T)}$$

formula 1;

wherein T represents the temperature, Z represents a vector, and $Z_i$ and $Z_j$ are elements thereof;

distilling the original model according to formula 2:

$$C = -\sum_i p_i \log q_i$$

formula 2;

wherein q is the distribution of the new model generated by using formula 1, p is the distribution of the original model generated by using formula 1, and C is the loss value;

and extracting the knowledge in the original model and endowing it to the model to be established to obtain the new model.
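As a minimal sketch of step S1, the two formulas can be written with NumPy as follows. The function names `softmax_t` and `distill_loss`, the temperature value, and the small constant added inside the logarithm are illustrative assumptions and do not come from the source.

```python
import numpy as np

def softmax_t(z, T=1.0):
    """Generalized softmax (formula 1) along the last axis."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(new_logits, original_logits, T=2.0):
    """Cross-entropy C = -sum_i p_i log q_i (formula 2), averaged over samples."""
    p = softmax_t(original_logits, T)  # distribution of the original model
    q = softmax_t(new_logits, T)       # distribution of the new model
    return float(-(p * np.log(q + 1e-12)).sum(axis=-1).mean())

original = np.array([[4.0, 1.0, 0.5]])
matched = distill_loss(original, original)                       # new model agrees
mismatched = distill_loss(np.array([[0.5, 1.0, 4.0]]), original)  # new model disagrees
print(matched, mismatched)
```

The loss is smallest when the new model reproduces the distribution of the original model, which is the behaviour the distillation step relies on.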

S2, preprocessing the CT image in the pleural lesion picture set;

S8 further includes:

if no model has been trained separately on the new type of data, preprocessing the CT image of the new disease, inputting the preprocessed CT image into the new model to obtain a prediction image, calculating the loss between the prediction image and the original image label to obtain a loss value, and back-propagating the loss value to update the new model.

The preprocessing of the images in S2 and S8 specifically includes:

step one, converting the X-ray absorption coefficient of the CT image into a CT value according to formula 3:

$$\mathrm{HU} = \mathrm{pixel} \times \mathrm{slope} + \mathrm{intercept}$$

formula 3;

wherein HU is the unit in which the CT value is expressed; slope is the scaling slope of the linear transformation from the pixel value as stored on disk to the pixel value as represented in memory; intercept is the intercept of that linear transformation; and pixel is a pixel of the CT image;

step two, adjusting the window width and window level of the CT image so that the pleura is displayed optimally;

step three, for pictures in which the lesion occupies only a small part, adopting a center cropping method, namely selecting only the region containing the target so that the effective region is relatively enlarged;

step four, selecting all slice indexes that contain a target region, randomly selecting one of these indexes, randomly offsetting the selected slice center by up to plus or minus N indexes, and taking the N slices before and the N slices after it to form 2N slices as a whole 3D image;

step five, adding a certain proportion of non-effective regions into the whole 3D image to obtain a complete 3D image, thereby finishing the image preprocessing.
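The first two preprocessing steps can be sketched as follows. The function names `to_hu` and `apply_window` are hypothetical, and the window level and width used in the example are illustrative values only; the source does not state which window it uses for the pleura.

```python
import numpy as np

def to_hu(pixels, slope, intercept):
    """Formula 3: convert stored pixel values to CT values in HU."""
    return pixels.astype(np.float32) * slope + intercept

def apply_window(hu, level, width):
    """Clip HU values to [level - width/2, level + width/2] and scale to [0, 1]."""
    lo, hi = level - width / 2.0, level + width / 2.0
    return (np.clip(hu, lo, hi) - lo) / (hi - lo)

raw = np.array([[0, 1024], [2048, 4095]], dtype=np.uint16)
hu = to_hu(raw, slope=1.0, intercept=-1024.0)
# Assumed window (level 40 HU, width 400 HU), chosen only for the demonstration.
img = apply_window(hu, level=40.0, width=400.0)
print(hu)
print(img)
```

In DICOM files the slope and intercept are normally read from the image header rather than hard-coded as they are here.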

FIG. 2 is a schematic diagram of the knowledge distillation principle framework for pleural lesion segmentation based on lifelong learning, in which the model produced by knowledge distillation is called the new model and the large models participating in training are called the original models. Based on the currently known pleural lesions, the lesions are provisionally divided into pleural thickening with calcification and adhesion, pneumothorax, pleural effusion, and pleural mesothelioma. A targeted segmentation network is trained for each of them using separate data. Finally, a new model is trained by knowledge distillation, and data and new lesions are continually added to it for lifelong learning of the model. The final model output has (number of lesion types + 1) channels, and the channel with the highest average probability gives the segmentation result for the lesion.
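The channel layout described above can be illustrated with a small sketch. It assumes that the extra channel is a background channel at index 0 and that the result is read out as a per-voxel argmax over channels; neither detail is stated in the source, and the channel names are hypothetical.

```python
import numpy as np

# Assumed channel order; index 0 is taken to be background (not stated in the source).
CHANNELS = ["background", "thickening_calcification_adhesion",
            "pneumothorax", "effusion", "mesothelioma"]

def label_map(probs):
    """probs has shape (C, ...) with C = lesion types + 1; returns channel indices."""
    return np.argmax(probs, axis=0)

probs = np.zeros((len(CHANNELS), 2, 2))
probs[0] = 0.2          # low background probability everywhere
probs[3, 0, 0] = 0.9    # effusion dominates at voxel (0, 0)
probs[2, 1, 1] = 0.7    # pneumothorax dominates at voxel (1, 1)
labels = label_map(probs)
print(labels)
```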

A gradient descent algorithm (GDA) is used: the mean square error between the output of each single model and the output of the new model is computed, the errors are summed, and the sum is back-propagated to update the weight parameters of the model. After multiple rounds of iteration, the trained weight parameters give the final integrated model.
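The summed mean-square-error objective can be sketched as below. To keep the example self-contained, the output array of the new model is updated directly as if it were the trainable parameter; in the described method the gradient would instead be back-propagated into the network weights. The function names, learning rate, and array shapes are assumptions made for illustration.

```python
import numpy as np

def summed_mse(new_out, original_outs):
    """Sum over original models of the MSE between their output and the new model's."""
    return sum(float(np.mean((new_out - t) ** 2)) for t in original_outs)

def summed_mse_grad(new_out, original_outs):
    """Gradient of summed_mse with respect to new_out."""
    return sum(2.0 * (new_out - t) / new_out.size for t in original_outs)

rng = np.random.default_rng(0)
originals = [rng.random((4, 4)) for _ in range(3)]  # stand-ins for single-model outputs
new_out = rng.random((4, 4))                        # stand-in for the new model output

before = summed_mse(new_out, originals)
for _ in range(200):
    new_out -= 0.5 * summed_mse_grad(new_out, originals)  # plain gradient descent step
after = summed_mse(new_out, originals)
print(before, after)
```

With this objective the minimizer is the element-wise mean of the original outputs, which is where the iteration converges.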

The specific implementation method comprises the following steps:

S1 specifically includes:

establishing a generalized softmax function according to formula 1:

$$q_i = \frac{\exp(Z_i / T)}{\sum_j \exp(Z_j / T)}$$

formula 1;

distilling the original model according to formula 2:

$$C = -\sum_i p_i \log q_i$$

formula 2;

The following is an explanation of the trained model:

According to actual needs, pleural lesions can at present be provisionally classified into pleural thickening with calcification and adhesion, pneumothorax, effusion, and mesothelioma.

A targeted independent segmentation model is built for each related lesion according to actual requirements. The targeted independent segmentation models are Unet2D (3D), Wnet2D (3D), and Unet2D + ConvLSTM networks. For CT images, suppose we observe a dynamic system over a spatial region represented by an M × N grid consisting of M rows and N columns. Inside each cell of the grid there are P measurements, which vary with time. Thus, the observation at any time can be represented by a tensor $X \in \mathbb{R}^{P \times M \times N}$, where $\mathbb{R}$ represents the domain of the observed features. If a CT image is recorded at regular intervals, we obtain a sequence of tensors $X_1, X_2, \ldots, X_t$. The optimal model file is selected according to the experimental results.

As shown in FIG. 3, the network is a Unet network, composed of a contracting path and an expanding path. The contracting path is used to capture context information, the expanding path is used for precise localization, and the two paths are symmetric to each other. With the help of data augmentation, the available annotated data can be used more effectively even when there are few training images. The main idea is to supplement the usual contracting network with successive layers in which the pooling operations are replaced by upsampling operations. These layers therefore increase the resolution of the output. For localization, high-resolution features from the contracting path are combined with the upsampled output. Subsequent successive convolutional layers can learn from this information, resulting in a more accurate output.

As shown in FIG. 4, for some lesion features the ConvLSTM idea is additionally embedded in the Unet network. LSTM is very capable of processing time-series data, but when the time-series data are images, adding convolution operations to the LSTM makes the feature extraction of the images more effective.

In essence, the core of ConvLSTM is the same as that of LSTM: the output of the previous layer is used as the input of the next layer. The difference is that, once the convolution operation is added, not only can the temporal relationship be captured, but features can also be extracted as in a convolutional layer. The invention embeds the ConvLSTM idea in the Unet network, which has strong feature extraction capability, so that stronger semantic features can be extracted while the temporal and spatial features of the CT images are well preserved.
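For reference, a commonly used form of the ConvLSTM cell is sketched below. It is an assumption that the network described here uses this variant; the source gives no equations, and the peephole connections of some formulations are omitted. Here $*$ denotes convolution, $\circ$ the element-wise product, $\sigma$ the logistic sigmoid, $X_t$ the input at time step t, $H_t$ the hidden state, and $C_t$ the cell state.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
H_t &= o_t \circ \tanh\left(C_t\right)
\end{aligned}
```

Replacing the matrix products of an ordinary LSTM with convolutions is what lets the cell keep the spatial layout of each slice while carrying state along the slice sequence.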

As shown in FIG. 5, models for pleura-related lesions are also built with a Wnet2D (3D) network. The Wnet network is a multilayer, multi-scale parallel convolutional network: features of different sizes from each layer of modules are passed to the next layer of the network by means such as convolutional downsampling, and each convolution is followed by a Batch Normalization (BN) layer and a rectified linear unit (ReLU) module. Features of different scales can be extracted by the multilayer network, so that lesion areas of different sizes are better identified. The Identity module ensures that gradients do not vanish during back propagation. Finally, the results of each stage of the network are integrated to obtain the final segmentation result.

In S1, knowledge is distilled. In deep learning, there are usually two schemes for achieving better predictions: 1. using an over-parameterized deep neural network, which has very strong learning capacity and is therefore often combined with a regularization strategy (such as dropout); 2. using an ensemble model, which integrates many weak models and tends to achieve better predictions. Both schemes are undoubtedly expensive: they require large amounts of computation and computing resources and are very unfavorable for deployment. It is therefore desirable to have a smaller model that achieves the same or comparable results as a larger model. We transfer the knowledge of the multiple trained targeted segmentation models to a small model.

The small model to be trained is called the new model, and the large, already trained model is called the original model. Originally we would need to match the softmax distribution of the new model with the real labels; now we only need to match the softmax distribution of the new model with that of the original model for a given input. Learning a new model from scratch amounts to approximating an unknown function from limited data. If the new model is instead made to approximate the original model, we can use a large amount of pseudo-data from outside the training set to train the new model, which is clearly more feasible because the function of the original model is known. Intuitively, the latter has an advantage over the former: the softmax distribution of the trained original model contains a certain amount of knowledge. For knowledge distillation, our goal is to make the softmax output distribution of the new model sufficiently close to that of the original model. Doing so directly is problematic: in the ordinary softmax function, the exponential first amplifies the differences between logits and the result is then normalized, so the resulting distribution is an approximation of argmax. In this case, very little of the knowledge is expressed. We prefer a softer output to a hard, one-hot-like output.

One approach that avoids this problem is to compare logits directly. Specifically, for each piece of data, denote a logit generated by the original model as $V_i$ and the corresponding logit generated by the new model as $Z_i$; we need to minimize:

$$\frac{1}{2}\left(Z_i - V_i\right)^2$$

formula 4;

A more general approach has also been proposed. Consider the generalized softmax function:

$$q_i = \frac{\exp(Z_i / T)}{\sum_j \exp(Z_j / T)}$$

formula 1;

where T is temperature, which is a concept borrowed from the Boltzmann distribution in statistical mechanics. It is easy to prove that when the temperature goes to 0, the softmax output will converge to a one-hot vector, and when the temperature T goes to infinity, the softmax output is softer. Therefore, when training a new model, a higher T can be used to make the softmax-generated distribution soft enough to let the softmax output of the new model approximate the original model.
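The two temperature limits can be checked numerically with a short sketch. The logits and temperature values below are arbitrary illustrative choices, and `softmax_t` is a hypothetical helper name.

```python
import numpy as np

def softmax_t(z, T):
    """Generalized softmax with temperature T for a 1-D logit vector."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

logits = [2.0, 1.0, 0.1]
sharp = softmax_t(logits, T=0.05)   # low temperature: close to a one-hot vector
soft = softmax_t(logits, T=100.0)   # high temperature: close to uniform
print(sharp)
print(soft)
```

At the low temperature almost all probability mass sits on the largest logit, while at the high temperature the three probabilities are nearly equal.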

Specifically, during training we need to minimize the cross-entropy of the two distributions. Denoting the distribution that the new model generates with formula 1 as q and the distribution that the original model generates as p, we need to minimize:

$$C = -\sum_i p_i \log q_i$$

formula 2;

We extract the knowledge in the original model by first raising the temperature and then returning to a low temperature in the test phase, hence the name distillation. The distillation term is multiplied by $T^2$ so that the gradients of the two terms of the loss function are of roughly the same order of magnitude.
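The remark about gradient magnitudes can be made concrete with the standard gradient of the cross-entropy between two temperature-softened distributions. The derivation below is a sketch in the notation used here, with $V_i$ the logits of the original model, $Z_i$ the logits of the new model, and $N$ the number of output channels; it assumes the logits are zero-mean and that $T$ is large compared with their magnitude.

```latex
\frac{\partial C}{\partial Z_i}
  = \frac{1}{T}\left(q_i - p_i\right)
  = \frac{1}{T}\left(
      \frac{e^{Z_i/T}}{\sum_j e^{Z_j/T}} - \frac{e^{V_i/T}}{\sum_j e^{V_j/T}}
    \right)
  \approx \frac{1}{T}\left(
      \frac{1 + Z_i/T}{N} - \frac{1 + V_i/T}{N}
    \right)
  = \frac{1}{N T^{2}}\left(Z_i - V_i\right)
```

The gradient of the softened term therefore shrinks as $1/T^2$, which is why a compensating factor is needed when it is combined with a term computed at temperature 1. The last expression is also proportional to the gradient of the squared difference of logits, so direct logit matching appears as the high-temperature limit of this approach.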

S2, preprocessing the CT image in the pleural lesion picture set;

S8 further includes:

The preprocessing of the images in S2 and S8 specifically includes:

$$\mathrm{HU} = \mathrm{pixel} \times \mathrm{slope} + \mathrm{intercept}$$

formula 3;

step three, for pictures in which the lesion occupies only a small part, cropping a 128 × 128 region at the center, namely selecting only the region containing the target so that the effective region is relatively enlarged;

step four, selecting all slice indexes that contain a target region, randomly selecting one of these indexes, randomly offsetting the selected slice center by up to plus or minus 8 indexes, and taking the 8 slices before and the 8 slices after it to form a whole 3D image of 16 slices;

step five, adding a certain proportion of non-effective regions into the whole 3D image to obtain a complete 3D image, thereby finishing the image preprocessing.
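Steps three and four can be sketched as follows, under several assumptions that the source does not spell out: the volume is stored as (slices, height, width), a binary lesion mask is available, the in-plane crop is centered on the centroid of the mask, the volume has at least 16 slices, and the random offset is kept strictly inside ±8 so that the selected slice stays within the 16-slice window. The function name `crop_around_target` is hypothetical.

```python
import numpy as np

def crop_around_target(volume, mask, size=128, n=8, rng=None):
    """Return a (2n, size, size) crop of volume and mask around a random target slice."""
    rng = np.random.default_rng() if rng is None else rng
    depth, height, width = volume.shape
    # Indexes of all slices that contain part of the target region.
    target_slices = np.flatnonzero(mask.any(axis=(1, 2)))
    # Random target slice plus a random offset (assumed range: -(n-1) .. n-1).
    center = int(rng.choice(target_slices)) + int(rng.integers(-(n - 1), n))
    start = int(np.clip(center - n, 0, max(depth - 2 * n, 0)))
    # In-plane crop centered on the mask centroid (an assumption).
    _, ys, xs = np.nonzero(mask)
    y0 = int(np.clip(int(ys.mean()) - size // 2, 0, max(height - size, 0)))
    x0 = int(np.clip(int(xs.mean()) - size // 2, 0, max(width - size, 0)))
    window = (slice(start, start + 2 * n), slice(y0, y0 + size), slice(x0, x0 + size))
    return volume[window], mask[window]

vol = np.zeros((40, 256, 256), dtype=np.float32)
msk = np.zeros(vol.shape, dtype=bool)
msk[18:22, 100:120, 150:170] = True   # synthetic lesion for the demonstration
sub_vol, sub_msk = crop_around_target(vol, msk, rng=np.random.default_rng(0))
print(sub_vol.shape, bool(sub_msk.any()))
```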

In the complete 3D image, not every slice contains an effective region. When the effective regions are selected, a certain proportion of non-effective regions is retained, which sets the false positive rate. The invention retains non-effective regions amounting to 10 percent of the number of slices of the complete 3D image as false positive images, so as to train the generalization capability of the model.
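The 10 percent rule can be sketched as a slice-selection helper. The interpretation used here is an assumption: non-effective slices are sampled so that they make up 10 percent of the final selection, that is, one non-effective slice for every nine effective ones. The function name `select_slices` and the per-slice flag input are hypothetical.

```python
import numpy as np

def select_slices(has_target, neg_fraction=0.10, rng=None):
    """Keep all slices with a target plus enough others to reach neg_fraction of the total."""
    rng = np.random.default_rng() if rng is None else rng
    has_target = np.asarray(has_target, dtype=bool)
    pos = np.flatnonzero(has_target)    # slices with an effective region
    neg = np.flatnonzero(~has_target)   # slices without one
    # Solve n_neg / (n_pos + n_neg) = neg_fraction for n_neg.
    n_neg = int(round(len(pos) * neg_fraction / (1.0 - neg_fraction)))
    n_neg = min(n_neg, len(neg))
    picked = rng.choice(neg, size=n_neg, replace=False)
    return np.sort(np.concatenate([pos, picked]))

flags = [True] * 90 + [False] * 60
chosen = select_slices(flags, rng=np.random.default_rng(0))
print(len(chosen))
```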

In actual application scenarios, the new model must be able to adapt to new data. Data growth falls roughly into two cases: new pleural lesion types, and new training pictures of known types.

The most important point of lifelong learning is the transfer of knowledge, namely how old knowledge helps the learning of new knowledge. Knowledge transfer, or transfer learning, is the basis of lifelong machine learning.

For newly added pleural lesion types, the invention directly adds the training set to the new model, which increases the adaptation of the new model to the new types. For new data of known types, results are output directly by the trained targeted independent segmentation models, and the new knowledge is continuously distilled into the new model over repeated iterations. The resulting model is therefore highly adaptable.

Device embodiment I

According to an embodiment of the present invention, a knowledge distillation apparatus for pleural lesion segmentation based on lifelong learning is provided. FIG. 6 is a schematic diagram of the modules of this apparatus. As shown in FIG. 6, the apparatus specifically includes:

the distillation module is specifically configured to: establish a generalized softmax function according to formula 1:

$$q_i = \frac{\exp(Z_i / T)}{\sum_j \exp(Z_j / T)}$$

formula 1;

and distill the original model according to formula 2:

$$C = -\sum_i p_i \log q_i$$

formula 2;

The lifelong learning module is further configured to: if no model has been trained separately on the new type of data, preprocess the CT image of the new disease, input the preprocessed CT image into the new model to obtain a prediction image, calculate the loss between the prediction image and the original image label to obtain a loss value, and back-propagate the loss value to update the new model.

The preprocessing module and the lifelong learning module are specifically configured to:

convert the X-ray absorption coefficient of the CT image into a CT value according to formula 3:

$$\mathrm{HU} = \mathrm{pixel} \times \mathrm{slope} + \mathrm{intercept}$$

formula 3;

adjust the window width and the window level of the CT image so that the pleura is displayed optimally;

for pictures in which the lesion occupies only a small part, adopt a center cropping method, namely select only the region containing the target so that the effective region is relatively enlarged;

select all slice indexes that contain a target region, randomly select one of these indexes, randomly offset the selected slice center by up to plus or minus N indexes, and take the N slices before and the N slices after it to form 2N slices as a whole 3D image;

and add a certain proportion of non-effective regions into the whole 3D image to obtain a complete 3D image, thereby finishing the image preprocessing.

The embodiment of the present invention is an apparatus embodiment corresponding to the above method embodiment, and specific operations of each module may be understood with reference to the description of the method embodiment, which is not described herein again.

Device embodiment II

An embodiment of the present invention provides a knowledge distillation apparatus for pleural lesion segmentation based on lifelong learning, as shown in fig. 7, including: a memory 70, a processor 72 and a computer program stored on the memory 70 and executable on the processor 72, the computer program, when executed by the processor, implementing the steps of the above-described method embodiments.

Device embodiment III

The embodiment of the present invention provides a computer-readable storage medium, on which an implementation program for information transmission is stored, and when the program is executed by the processor 72, the steps in the above method embodiments are implemented.

The computer-readable storage medium of this embodiment includes, but is not limited to: ROM, RAM, magnetic or optical disks, and the like.

It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; however, these modifications or alternatives are not intended to depart from the scope of the corresponding technical solutions.

Claims

1. A knowledge distillation method for pleural lesion segmentation based on lifelong learning, characterized by comprising the following steps:

S2, preprocessing the CT image in the pleural lesion picture set;

2. The method according to claim 1, wherein the S1 specifically includes:

establishing a generalized softmax function according to formula 1:

$$q_i = \frac{\exp(Z_i / T)}{\sum_j \exp(Z_j / T)}$$

formula 1;

distilling the original model according to formula 2:

$$C = -\sum_i p_i \log q_i$$

formula 2;

3. The method according to claim 1, wherein the S8 further comprises:

4. The method according to claim 3, wherein said S2 and said pre-processing of CT images of the new patient specifically comprises:

$$\mathrm{HU} = \mathrm{pixel} \times \mathrm{slope} + \mathrm{intercept}$$

formula 3;

5. A knowledge distillation apparatus for pleural lesion segmentation based on lifelong learning, comprising:

6. The apparatus according to claim 5, wherein the distillation module is specifically configured to: establish a generalized softmax function according to formula 1:

$$q_i = \frac{\exp(Z_i / T)}{\sum_j \exp(Z_j / T)}$$

formula 1;

and distill the original model according to formula 2:

$$C = -\sum_i p_i \log q_i$$

formula 2;

7. The apparatus of claim 5, wherein the lifelong learning module is further configured to: if no model has been trained separately on the new type of data, preprocess the CT image of the new disease, input the preprocessed CT image into the new model to obtain a prediction image, calculate the loss between the prediction image and the original image label to obtain a loss value, and back-propagate the loss value to update the new model.

8. The apparatus of claim 5 or 7, wherein: the preprocessing module and the lifelong learning module are specifically configured to:

$$\mathrm{HU} = \mathrm{pixel} \times \mathrm{slope} + \mathrm{intercept}$$

formula 3;

9. A knowledge distillation apparatus for pleural lesion segmentation based on lifelong learning, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program when executed by the processor implementing the steps of the method of knowledge distillation based on lifelong learning of a pleural lesion segmentation according to any one of claims 1 to 4.

10. A computer-readable storage medium, characterized in that it has stored thereon a program for implementing information transfer, which program, when being executed by a processor, implements the steps of a method for knowledge distillation based on lifelong learning of pleural lesion segmentation according to any one of claims 1 to 4.