CN114743000A

CN114743000A - Multitask pneumothorax medical image semantic segmentation model system and method based on Unet

Info

Publication number: CN114743000A
Application number: CN202210312808.8A
Authority: CN
Inventors: 沈旭东; 吴湘莲; 楼平; 雷英栋; 陶冶博
Original assignee: Jiaxing Vocational and Technical College
Current assignee: Jiaxing Vocational and Technical College
Priority date: 2022-03-28
Filing date: 2022-03-28
Publication date: 2022-07-12

Abstract

The invention discloses a Unet-based multitask pneumothorax medical image semantic segmentation model system and method. The method comprises step S1: acquiring a pneumothorax CT image and encoding it through a first model and a second model to obtain a binary classification of the image. The invention improves on traditional Unet semantic segmentation: Resnet34 is used as the backbone in the encoding stage, an SCSE module is introduced to correct the image feature information, global information of the image is acquired along both the spatial and channel dimensions, the loss function is learned with a multitask strategy that combines classification and segmentation, and semantic segmentation is performed on the pneumothorax medical image.

Description

Multitask pneumothorax medical image semantic segmentation model system and method based on Unet

Technical Field

The invention belongs to the technical field of pneumothorax image analysis, and particularly relates to a multitask pneumothorax medical image semantic segmentation model system based on Unet and a multitask pneumothorax medical image semantic segmentation model method based on Unet.

Background

Pneumothorax is a condition in which gas enters the pleural cavity and accumulates there. It poses a serious risk to human life, and the accuracy of general diagnosis is 50%. The usual method is diagnosis from an X-ray film, but only doctors with abundant experience can read such films reliably. With the continuing development of deep learning technology, a deep learning model can quickly identify whether a patient suffers from pneumothorax and locate the lesion. Predicting pneumothorax with a network model greatly relieves the burden on doctors and provides an effective auxiliary means.

When a traditional Unet model is used for semantic segmentation, the first problem is that the recall rate for positive samples is low: an X-ray film contains a pneumothorax lesion, but the semantic segmentation model fails to identify it during prediction. The second problem is whether a better network structure can be designed on the basis of the original structure to further improve the precision of semantic segmentation.

Therefore, further improvement is needed to address the above problems.

Disclosure of Invention

The main object of the invention is to provide a Unet-based multitask pneumothorax medical image semantic segmentation model system and method that improve on traditional Unet semantic segmentation. Resnet34 is used as the backbone in the encoding stage, an SCSE module is introduced to correct the image feature information, and global information of the image is obtained along both the spatial and channel dimensions. The loss function is learned with a multitask strategy that combines classification and segmentation, and semantic segmentation is performed on the pneumothorax medical image, so that the method has certain advantages in segmentation accuracy over the traditional Unet semantic segmentation method.

In order to achieve the above purposes, the invention provides a multitask pneumothorax medical image semantic segmentation model method based on Unet, which is used for performing semantic segmentation on a pneumothorax medical image and comprises the following steps:

step S1: acquiring a pneumothorax CT image, and encoding the pneumothorax CT image through a first model (a Unet model with Resnet34 as its backbone and pre-trained parameters) and a second model (an SCSE module structure) to obtain a binary classification of the image;

step S2: during decoding, fusing the feature information of each layer of the encoding stage, performing up-sampling, and using the second model to continue fusing the up-sampled information of each layer, to obtain a semantic segmentation image;

step S3: fusing the loss function of the binary image classification with the loss function of the semantic segmentation image, learning the loss function with a multitask strategy that combines classification and segmentation so that the classification of local pixels and the classification of the global image are fused, and obtaining the global loss function after adjusting the weight relation, so as to realize semantic segmentation of the pneumothorax CT image.

As a further preferable embodiment of the above technical means, step S1 is specifically implemented as the following steps:

step S1.1: the first model serves as the basic feature extraction network for the pneumothorax CT image and extracts basic features by adding a residual network to the network;

step S1.2: the second model corrects the extracted basic features through a spatial compression module and a channel compression module, thereby obtaining the correction data of each layer corresponding to the first model;

step S1.3: the pneumothorax CT image is encoded through the first model and the second model to obtain a binary classification of the image.

As a further preferable embodiment of the above-mentioned technical means, step S2 is specifically implemented as the following steps:

step S2.1: a first module (64, 32, 32) of the decoding process up-samples the output of the module (Bn1 + Relu + Maxpool) in which the max pooling of the encoding process is located to obtain a first feature map, and a first correction is applied to the first feature map by the second model to obtain first correction data;

step S2.2: a second module (64, 64, 64) of the decoding process up-samples the first correction data obtained by the first module and fuses it with the correction data corresponding to the module in which the fifth layer of the first model in the encoding process is located to obtain a second feature map, and a second correction is applied to the second feature map by the second model to obtain second correction data;

step S2.3: a third module (64, 128, 128) of the decoding process up-samples the second correction data obtained by the second module and fuses it with the correction data corresponding to the module in which the fourth layer of the first model in the encoding process is located to obtain a third feature map, and a third correction is applied to the third feature map by the second model to obtain third correction data;

step S2.4: a fourth module (64, 256, 256) of the decoding process up-samples the third correction data obtained by the third module and fuses it with the correction data corresponding to the module in which the third layer of the first model in the encoding process is located to obtain a fourth feature map, and a fourth correction is applied to the fourth feature map by the second model to obtain fourth correction data;

step S2.5: a fifth module (64, 512, 512) of the decoding process up-samples the fourth correction data obtained by the fourth module and fuses it with the correction data corresponding to the module in which the second layer of the first model in the encoding process is located to obtain a fifth feature map, and a fifth correction is applied to the fifth feature map by the second model to obtain fifth correction data;

step S2.6: fusing the correction data obtained by each module of the decoding process so as to complete the decoding process and obtain the semantic segmentation image.
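The up-sample-then-fuse pattern shared by steps S2.2 to S2.5 can be illustrated with a minimal sketch. It rests on several assumptions not stated in the text: up-sampling is nearest-neighbour with a factor of 2, "fusing" is channel concatenation, and feature maps are plain nested lists indexed as [channel][row][col]. The names upsample2x, fuse and decoder_step are hypothetical, and the correction by the second model is omitted here.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def upsample2x(fmap: FeatureMap) -> FeatureMap:
    """Nearest-neighbour 2x up-sampling of every channel (an assumed choice)."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [value for value in row for _ in range(2)]  # repeat each column
            rows.append(wide)
            rows.append(list(wide))                            # repeat each row
        out.append(rows)
    return out


def fuse(decoder_fmap: FeatureMap, encoder_fmap: FeatureMap) -> FeatureMap:
    """Fuse by channel concatenation (assumed; the text only says 'fuse')."""
    same_height = len(decoder_fmap[0]) == len(encoder_fmap[0])
    same_width = len(decoder_fmap[0][0]) == len(encoder_fmap[0][0])
    if not (same_height and same_width):
        raise ValueError("spatial sizes must match before fusion")
    return decoder_fmap + encoder_fmap


def decoder_step(previous: FeatureMap, skip: FeatureMap) -> FeatureMap:
    """Up-sample the previous decoder output, then fuse the encoder features."""
    return fuse(upsample2x(previous), skip)


previous = [[[1.0, 2.0], [3.0, 4.0]]]          # 1 channel, 2 x 2
skip = [[[0.0] * 4 for _ in range(4)]]         # 1 channel, 4 x 4
fused = decoder_step(previous, skip)
print(len(fused), len(fused[0]), len(fused[0][0]))  # 2 4 4
```

If the tuples in steps S2.1 to S2.5 are read as (channels, height, width), which is itself an assumption, each module doubles the spatial size, from 32 up to 512.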

As a more preferable embodiment of the above solution, the processing of performing the correction by the space compression module and the channel compression module of the second model in step S1 and step S2 is:

Suppose the input pneumothorax CT image is $X \in \mathbb{R}^{H \times W \times C}$, and suppose the basic feature obtained in the encoding process or the feature map obtained in the decoding process (collectively referred to as the feature map for ease of description) is $U \in \mathbb{R}^{H \times W \times C'}$, where $H$ and $W$ represent the spatial height and width of the image, and $C$ and $C'$ represent the numbers of input and output channels. After the convolution and nonlinear transformation operations, $U$ contains the spatial and channel information in $X$. After each encoding or decoding process, the second model is introduced to correct $U$ into $U'$.

For the spatial compression module, assume the input feature map $U = [u_1, u_2, \ldots, u_C]$, where $u_i \in \mathbb{R}^{H \times W}$ represents the feature map of each channel. The vector $z \in \mathbb{R}^{1 \times 1 \times C}$ is obtained through the global average pooling (GAP) operation. For the $k$-th channel feature map, the transformation is as follows:

$$z_k = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_k(i, j)$$

The transformation introduces global spatial feature information into $z$. Then $z' = W_1(\delta(W_2 z))$ is computed, where $W_1$ and $W_2$ are the weights of two fully connected networks and $\delta(\cdot)$ is the ReLU operation. A normalized correction coefficient $\sigma(z') \in [0, 1]$ is then obtained through the sigmoid transformation. Finally, the correction coefficient is multiplied by the input feature map $U$ to obtain $U'_{cSE}$:

$$U'_{cSE} = [\sigma(z'_1)u_1, \sigma(z'_2)u_2, \ldots, \sigma(z'_C)u_C]$$

Through the spatial compression module, information in unimportant channels is reduced and suppressed, while information in important channels is kept almost unchanged, so that the important channels are in effect enhanced.

An index measuring the importance of each channel is obtained by compressing the spatial information, and an index measuring the importance of each spatial position is obtained by compressing the channel information.

For the channel compression module, assume the input feature map $U = [u_1, u_2, \ldots, u_C]$ is rewritten as $U = [u^{1,1}, u^{1,2}, \ldots, u^{i,j}, \ldots, u^{H,W}]$, where $u^{i,j} \in \mathbb{R}^{1 \times 1 \times C}$ corresponds to the spatial position $(i, j)$. The spatial squeeze operation is realized by the convolution $q = W_{sq} \star U$, and the normalized correction coefficient $\sigma(q) \in [0, 1]$ is then obtained through the sigmoid transformation. Finally, the correction coefficient is multiplied by the input feature map $U$ to obtain $U'_{sSE}$:

$$U'_{sSE} = [\sigma(q^{1,1})u^{1,1}, \sigma(q^{1,2})u^{1,2}, \ldots, \sigma(q^{H,W})u^{H,W}]$$

Finally, the two are fused by addition, which weights the importance of the spatial information and the channel information, giving the output:

$$U'_{sCSE} = U'_{cSE} + U'_{sSE}$$
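The correction performed by the second model can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: feature maps are nested lists indexed as [channel][row][col], the fully connected layers and the 1 × 1 convolution carry no bias terms, and the function names cse, sse and scse as well as the weight shapes are hypothetical.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(weights: List[List[float]], vector: List[float]) -> List[float]:
    """Bias-free fully connected layer (the absence of a bias is an assumption)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def cse(u: FeatureMap, w1, w2) -> FeatureMap:
    """Spatial compression module: U'_cSE = [sigma(z'_k) * u_k]."""
    height, width = len(u[0]), len(u[0][0])
    z = [sum(map(sum, channel)) / (height * width) for channel in u]  # GAP
    hidden = [max(0.0, h) for h in matvec(w2, z)]                     # delta(W2 z)
    z_prime = matvec(w1, hidden)                                      # W1(delta(W2 z))
    return [[[sigmoid(zk) * value for value in row] for row in channel]
            for channel, zk in zip(u, z_prime)]


def sse(u: FeatureMap, w_sq: List[float]) -> FeatureMap:
    """Channel compression module: U'_sSE = [sigma(q^{i,j}) * u^{i,j}]."""
    height, width = len(u[0]), len(u[0][0])
    # 1 x 1 convolution across channels: one scalar q^{i,j} per position.
    q = [[sum(w * channel[i][j] for w, channel in zip(w_sq, u))
          for j in range(width)] for i in range(height)]
    return [[[sigmoid(q[i][j]) * channel[i][j] for j in range(width)]
             for i in range(height)] for channel in u]


def scse(u: FeatureMap, w1, w2, w_sq) -> FeatureMap:
    """Fuse both corrections by addition: U'_sCSE = U'_cSE + U'_sSE."""
    a, b = cse(u, w1, w2), sse(u, w_sq)
    return [[[x + y for x, y in zip(row_a, row_b)]
             for row_a, row_b in zip(ch_a, ch_b)] for ch_a, ch_b in zip(a, b)]
```

Here w2 has one row per hidden unit and one column per channel, w1 has one row per channel and one column per hidden unit, and w_sq holds one weight per channel. With all weights set to zero, every sigmoid evaluates to 0.5, so the two halves add back to the input feature map, which gives a convenient sanity check.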

In order to achieve the above object, the present invention further provides a Unet-based multitask pneumothorax medical image semantic segmentation model system, which includes a memory, a processor and a computer program stored in the memory and operable on the processor, wherein the processor implements the steps of the Unet-based multitask pneumothorax medical image semantic segmentation model method when executing the computer program.

Drawings

FIG. 1 is a schematic diagram of a multitask pneumothorax medical image semantic segmentation model system based on Unet and a method thereof.

FIG. 2 is a block diagram of a second model of the Unet-based multitask pneumothorax medical image semantic segmentation model system and method thereof of the present invention.

FIG. 3 is a diagram of a multitask learning structure of the multitask pneumothorax medical image semantic segmentation model system and method thereof based on the Unet of the present invention.

Detailed Description

The following description is provided to disclose the invention so as to enable any person skilled in the art to practice the invention. The preferred embodiments in the following description are given by way of example only, and other obvious variations will occur to those skilled in the art. The basic principles of the invention, as defined in the following description, may be applied to other embodiments, variations, modifications, equivalents, and other technical solutions without departing from the spirit and scope of the invention.

In a preferred embodiment of the present invention, those skilled in the art should note that the pneumothorax CT image and the like to which the present invention relates may be regarded as prior art.

Preferred embodiments.

The invention discloses a multitask pneumothorax medical image semantic segmentation model method based on Unet, which is used for performing semantic segmentation on a pneumothorax medical image and comprises the following steps of:

step S1: acquiring a pneumothorax CT image and encoding the pneumothorax CT image through a first model (a Unet model with Resnet34 as its backbone and pre-trained parameters) and a second model (an SCSE module structure), thereby obtaining a binary classification of the image;

Specifically, step S1 is implemented as the following steps:

More specifically, step S2 is specifically implemented as the following steps:

step S2.1: a first module (64, 32, 32) of the decoding process up-samples the output of the module (Bn1 + Relu + Maxpool) in which the max pooling of the encoding process is located to obtain a first feature map, and a first correction is applied to the first feature map by the second model to obtain first correction data;

step S2.5: a fifth module (64, 512, 512) of the decoding process up-samples the fourth correction data obtained by the fourth module and fuses it with the correction data corresponding to the module in which the second layer of the first model in the encoding process is located to obtain a fifth feature map, and a fifth correction is applied to the fifth feature map by the second model to obtain fifth correction data;

step S2.6: fusing the correction data obtained by each module of the decoding process so as to complete the decoding process and obtain the semantic segmentation image.

Further, the processing of performing the correction by the space compression module and the channel compression module of the second model in step S1 and step S2 is:

For the spatial compression module, assume the input feature map $U = [u_1, u_2, \ldots, u_C]$, where $u_i \in \mathbb{R}^{H \times W}$ represents the feature map of each channel. The vector $z \in \mathbb{R}^{1 \times 1 \times C}$ is obtained through the global average pooling (GAP) operation. For the $k$-th channel feature map, the transformation is as follows:

$$z_k = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_k(i, j)$$

Then $z' = W_1(\delta(W_2 z))$ is computed, where $W_1$ and $W_2$ are the weights of two fully connected networks and $\delta(\cdot)$ is the ReLU operation. A normalized correction coefficient $\sigma(z') \in [0, 1]$ is then obtained through the sigmoid transformation. Finally, the correction coefficient is multiplied by the input feature map $U$ to obtain $U'_{cSE}$:

$$U'_{cSE} = [\sigma(z'_1)u_1, \sigma(z'_2)u_2, \ldots, \sigma(z'_C)u_C]$$

Through the spatial compression module, information in unimportant channels is reduced and suppressed, while information in important channels remains almost unchanged, so that the important channels are in effect enhanced.

For the channel compression module, assume the input feature map $U = [u_1, u_2, \ldots, u_C]$ is rewritten as $U = [u^{1,1}, u^{1,2}, \ldots, u^{i,j}, \ldots, u^{H,W}]$, where $u^{i,j} \in \mathbb{R}^{1 \times 1 \times C}$ corresponds to the spatial position $(i, j)$. The spatial squeeze operation is realized by the convolution $q = W_{sq} \star U$, and the normalized correction coefficient $\sigma(q) \in [0, 1]$ is then obtained through the sigmoid transformation. Finally, the correction coefficient is multiplied by the input feature map $U$ to obtain $U'_{sSE}$:

$$U'_{sSE} = [\sigma(q^{1,1})u^{1,1}, \sigma(q^{1,2})u^{1,2}, \ldots, \sigma(q^{H,W})u^{H,W}]$$

Finally, the two are fused by addition, which weights the importance of the spatial information and the channel information:

$$U'_{sCSE} = U'_{cSE} + U'_{sSE}$$

The invention also discloses a multitask pneumothorax medical image semantic segmentation model system based on Unet, which comprises a memory, a processor and a computer program which is stored in the memory and can run on the processor, wherein the processor realizes the steps of the multitask pneumothorax medical image semantic segmentation model method based on Unet when executing the computer program.

Preferably, the first problem in the background arises because Unet is a point-to-point, pixel-level semantic segmentation model and cannot judge from a global perspective whether a lesion exists; for a small lesion, segmentation with the Unet network alone therefore cannot yield a fine prediction result. For the second problem, the traditional solution is to enlarge the network, but designing a deeper network structure with more neurons occupies very large hardware resources and also makes training very difficult.

Preferably, as shown in FIG. 1, the invention discloses an end-to-end semantic segmentation model for pneumothorax CT images that adopts an encoder-decoder network structure. The basic feature extraction network in the encoding stage is an important part of the design of both the binary classification network and the semantic segmentation network, and pneumothorax CT images are difficult to obtain in large quantities. The invention therefore adopts a Unet model with pre-trained parameters and Resnet34 as its backbone as the feature extraction network, because adding a residual network to the network solves the problems that arise once the network depth reaches a certain degree: the error increases, the effect deteriorates, the vanishing-gradient phenomenon becomes more obvious, and it is difficult for back-propagation to minimize the loss. In the encoding stage, an SCSE module structure is introduced and the binary image classification loss is obtained. During decoding, the feature information of each layer of the encoding stage is fused and up-sampled, and each up-sampled layer of feature information is fused again to obtain the semantic segmentation image. The loss functions of binary image classification and image semantic segmentation are then fused, the weight relation is adjusted, and a global loss function is obtained and optimized.

In FIG. 1, Conv represents convolution, Bn1 represents batch normalization, Relu represents the activation function, Resnet_L1 represents the first layer of Resnet34, SCSE represents Spatial-Channel Squeeze & Excitation, and Upsampling represents the up-sampling operation.

Preferably, as shown in FIG. 3, when semantic segmentation is performed on a pneumothorax CT image, in order to reduce noise interference, the image is predicted to have no lesion features when the sum of the pixels of the segmented image is less than a certain number. This is equivalent to performing binary classification with a semantic segmentation algorithm. However, the semantic segmentation model and its loss function consider only the classification of individual pixels and do not consider the overall classification obtained by summing all pixels of the whole image [8-9], so the optimization is only local and overall optimization is not considered. To address this problem, a multitask learning strategy is introduced: with the overall network structure kept unchanged, the two tasks of "classification" and "segmentation" are set. For the binary classification task a logistic loss function loss1 is introduced, for the segmentation task a cross-entropy loss function loss2 is introduced, and the fused loss function loss = a × loss1 + b × loss2 is finally obtained, where a and b are the weights of the two.
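A minimal sketch of the fused loss follows, assuming that both loss1 and loss2 are binary cross-entropy terms computed on probabilities (image-level for classification, averaged per pixel for segmentation). The function names, the clamping constant eps, and the nested-list mask layout are hypothetical; the default weights a = 0.1 and b = 0.9 are the values reported for the experiments.

```python
import math
from typing import List


def logistic_loss(prob: float, label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one prediction; eps clamps prob away from 0 and 1."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def pixel_cross_entropy(pred_mask: List[List[float]],
                        true_mask: List[List[int]]) -> float:
    """Mean per-pixel cross-entropy over the whole mask (local, pixel-level view)."""
    losses = [logistic_loss(p, t)
              for pred_row, true_row in zip(pred_mask, true_mask)
              for p, t in zip(pred_row, true_row)]
    return sum(losses) / len(losses)


def multitask_loss(cls_prob: float, cls_label: int,
                   pred_mask: List[List[float]], true_mask: List[List[int]],
                   a: float = 0.1, b: float = 0.9) -> float:
    """Fused loss = a * loss1 (classification) + b * loss2 (segmentation)."""
    loss1 = logistic_loss(cls_prob, cls_label)           # global, image-level task
    loss2 = pixel_cross_entropy(pred_mask, true_mask)    # local, pixel-level task
    return a * loss1 + b * loss2


# A confident, correct image-level prediction barely changes the total,
# while a confident wrong one adds a visible penalty on top of loss2.
mask_pred = [[0.9, 0.1], [0.8, 0.2]]
mask_true = [[1, 0], [1, 0]]
print(multitask_loss(0.95, 1, mask_pred, mask_true))
print(multitask_loss(0.05, 1, mask_pred, mask_true))
```

The second call returns the larger value, which shows how the classification term lets an image-level error influence optimization even when the per-pixel term is unchanged.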

Preferably, the related experiments of the invention are designed with the PyTorch framework. The graphics workstation used for the experiments is configured as follows: 4-core CPU, 16 GB memory, GeForce GTX 1080 Ti GPU, 16 GB video memory, operating system Ubuntu 16.04. The network model is configured as follows: the optimizer is stochastic gradient descent with momentum 0.9, weight_decay 0.0001 and learning rate 0.002; the classification loss weight a is 0.1 and the semantic segmentation loss weight b is 0.9. Because a single image is of size 512, which is large, the batch size is set to 4. To increase the effective batch size, the invention uses gradient accumulation to realize "video memory expansion": one batch of data is fetched each time and its gradient is computed; the gradient is not cleared but is accumulated; after a certain number of accumulations, the network parameters are updated according to the accumulated gradient, the gradient is then cleared, and the next cycle begins. Under certain conditions, the larger the batch size, the better the training effect. Gradient accumulation in effect expands the batch size; the number of accumulations is set to 4, which is equivalent to a batch size of 16.
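The equivalence between accumulating gradients over 4 batches of size 4 and taking one step on a batch of 16 can be checked with a small sketch. It uses a hypothetical one-parameter model y = w * x with a mean-squared-error loss and plain gradient descent; momentum and weight decay are deliberately left out, and each accumulated gradient is divided by the number of accumulations, which is an assumed normalization.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y)


def grad_mse(w: float, batch: List[Sample]) -> float:
    """Gradient of the mean squared error of the model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def sgd_full_batch(w: float, data: List[Sample], lr: float) -> float:
    """One update computed on the whole batch at once."""
    return w - lr * grad_mse(w, data)


def sgd_accumulated(w: float, data: List[Sample], lr: float,
                    micro_batch: int = 4, accum_steps: int = 4) -> float:
    """One update built from several small batches without clearing the gradient."""
    accumulated = 0.0
    for step in range(accum_steps):
        chunk = data[step * micro_batch:(step + 1) * micro_batch]
        accumulated += grad_mse(w, chunk) / accum_steps  # keep accumulating
    # Update once with the accumulated gradient; the accumulator is then
    # discarded, which plays the role of clearing the gradient.
    return w - lr * accumulated


data = [(float(i), 3.0 * i) for i in range(1, 17)]      # 16 samples of y = 3x
print(sgd_full_batch(0.0, data, lr=0.002))
print(sgd_accumulated(0.0, data, lr=0.002))
```

Both calls print the same value up to floating-point rounding, so only one small batch needs to be held in memory at a time while the update matches the larger batch.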

Preferably, when the model designed by the invention is used to predict pneumothorax CT images, taking 3 patients as an example, the Dice similarity coefficient score reaches 85.3%.

Comparative analysis of different network models shows that using Resnet34 as the encoding backbone and introducing the SCSE module improves prediction accuracy by about 4% over a plain Unet network, and that the dual-task approach of binary classification plus semantic segmentation improves prediction accuracy by 1.2% over the single-task approach.

Preferably, the deep learning model is a very powerful tool in the field of medical image detection, and predicting the lesion position in a pneumothorax CT image by semantic segmentation gives doctors good assistance in diagnosing disease. The semantic segmentation model designed by the invention adopts Resnet34 as its backbone, which solves the vanishing-gradient problem that appears once the network depth reaches a certain degree. The SCSE module is introduced to obtain image information along both the spatial and channel dimensions, which improves semantic segmentation precision. The loss function is learned with a multitask strategy of classification and segmentation that fuses the classification of local pixels with that of the global image, which further improves semantic segmentation precision.
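For reference, the Dice similarity coefficient between a predicted binary mask A and a ground-truth mask B is 2|A ∩ B| / (|A| + |B|). A minimal sketch follows, assuming masks are nested lists of 0/1 values and adopting the convention (an assumption) that two empty masks score 1.0; the function name dice_coefficient is hypothetical.

```python
from typing import List

Mask = List[List[int]]  # binary mask, indexed as [row][col]


def dice_coefficient(pred: Mask, target: Mask) -> float:
    """Dice = 2 * |A intersect B| / (|A| + |B|) for two binary masks."""
    intersection = sum(p * t
                       for pred_row, target_row in zip(pred, target)
                       for p, t in zip(pred_row, target_row))
    total = sum(map(sum, pred)) + sum(map(sum, target))
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumed)
    return 2.0 * intersection / total


pred = [[1, 1], [0, 0]]
target = [[1, 0], [0, 0]]
print(dice_coefficient(pred, target))  # 2 * 1 / (2 + 1)
```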

It should be noted that the technical features such as the pneumothorax CT image and the like related to the patent application of the present invention should be regarded as the prior art, and the specific structure, the operation principle, the control mode and the spatial arrangement mode of the technical features may be selected conventionally in the field, and should not be regarded as the invention point of the patent application, and the patent application is not further specifically described in detail.

It will be apparent to those skilled in the art that modifications and equivalents can be made to the embodiments described above, or some features of the embodiments described above, and any modifications, equivalents, improvements, and the like, which fall within the spirit and principle of the present invention, are intended to be included within the scope of the present invention.

Claims

1. A multitask pneumothorax medical image semantic segmentation model method based on Unet is used for performing semantic segmentation on pneumothorax medical images and is characterized by comprising the following steps:

step S1: acquiring a pneumothorax CT image and encoding the pneumothorax CT image through a first model and a second model so as to obtain a binary classification of the image;

2. The Unet-based multitask pneumothorax medical image semantic segmentation model method according to claim 1, characterized in that step S1 is embodied as the following steps:

step S1.3: encoding the pneumothorax CT image through the first model and the second model so as to obtain a binary classification of the image.

3. The Unet-based multitask pneumothorax medical image semantic segmentation model method according to claim 1, characterized in that step S2 is embodied as the following steps:

step S2.1: a first module of the decoding process up-samples the output of the module in which the max pooling of the encoding process is located to obtain a first feature map, and a first correction is applied to the first feature map by the second model to obtain first correction data;

step S2.2: a second module of the decoding process up-samples the first correction data obtained by the first module and fuses it with the correction data corresponding to the module in which the fifth layer of the first model in the encoding process is located to obtain a second feature map, and a second correction is applied to the second feature map by the second model to obtain second correction data;

step S2.3: a third module of the decoding process up-samples the second correction data obtained by the second module and fuses it with the correction data corresponding to the module in which the fourth layer of the first model in the encoding process is located to obtain a third feature map, and a third correction is applied to the third feature map by the second model to obtain third correction data;

step S2.4: a fourth module of the decoding process up-samples the third correction data obtained by the third module and fuses it with the correction data corresponding to the module in which the third layer of the first model in the encoding process is located to obtain a fourth feature map, and a fourth correction is applied to the fourth feature map by the second model to obtain fourth correction data;

step S2.5: a fifth module of the decoding process up-samples the fourth correction data obtained by the fourth module and fuses it with the correction data corresponding to the module in which the second layer of the first model in the encoding process is located to obtain a fifth feature map, and a fifth correction is applied to the fifth feature map by the second model to obtain fifth correction data;

4. The method for semantic segmentation of multitask pneumothorax medical image based on Unet claimed in claim 3, wherein the modification process by the space compression module and the channel compression module of the second model in steps S1 and S2 is:

suppose the input pneumothorax CT image is $X \in \mathbb{R}^{H \times W \times C}$, and suppose the basic feature obtained in the encoding process or the feature map obtained in the decoding process is $U \in \mathbb{R}^{H \times W \times C'}$, where $H$ and $W$ represent the spatial height and width of the image, and $C$ and $C'$ represent the numbers of input and output channels; after the convolution and nonlinear transformation operations, $U$ contains the spatial and channel information in $X$; after each encoding or decoding process, the second model is introduced to correct $U$ into $U'$;

for the spatial compression module, assume the input feature map $U = [u_1, u_2, \ldots, u_C]$, where $u_i \in \mathbb{R}^{H \times W}$ represents the feature map of each channel; the vector $z \in \mathbb{R}^{1 \times 1 \times C}$ is obtained through the global average pooling operation, and for the $k$-th channel feature map the transformation is as follows:

$$z_k = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_k(i, j)$$

then $z' = W_1(\delta(W_2 z))$ is computed, where $W_1$ and $W_2$ are the weights of two fully connected networks and $\delta(\cdot)$ is the ReLU operation; a normalized correction coefficient $\sigma(z') \in [0, 1]$ is then obtained through the sigmoid transformation; finally, the correction coefficient is multiplied by the input feature map $U$ to obtain $U'_{cSE}$:

$$U'_{cSE} = [\sigma(z'_1)u_1, \sigma(z'_2)u_2, \ldots, \sigma(z'_C)u_C]$$

for the channel compression module, assume the input feature map $U = [u_1, u_2, \ldots, u_C]$ is rewritten as $U = [u^{1,1}, u^{1,2}, \ldots, u^{i,j}, \ldots, u^{H,W}]$, where $u^{i,j} \in \mathbb{R}^{1 \times 1 \times C}$ corresponds to the spatial position $(i, j)$; the spatial squeeze operation is realized by the convolution $q = W_{sq} \star U$, the normalized correction coefficient $\sigma(q) \in [0, 1]$ is then obtained through the sigmoid transformation, and finally the correction coefficient is multiplied by the input feature map $U$ to obtain $U'_{sSE}$:

$$U'_{sSE} = [\sigma(q^{1,1})u^{1,1}, \sigma(q^{1,2})u^{1,2}, \ldots, \sigma(q^{H,W})u^{H,W}]$$

5. A Unet-based multitask pneumothorax medical image semantic segmentation model system, comprising a memory, a processor and a computer program stored in said memory and operable on said processor, wherein said computer program when executed by said processor implements the steps of the Unet-based multitask pneumothorax medical image semantic segmentation model method according to any one of claims 1-4.