CN114943721A - Neck ultrasonic image segmentation method based on improved U-Net network

Info

Publication number: CN114943721A
Application number: CN202210638084.6A
Authority: CN
Inventors: 刘明珠; 付聪; 张晓琢
Original assignee: Harbin University of Science and Technology
Current assignee: Harbin University of Science and Technology
Priority date: 2022-06-07
Filing date: 2022-06-07
Publication date: 2022-08-26

Abstract

A neck ultrasonic image segmentation method based on an improved U-Net network belongs to the field of image segmentation algorithms. It addresses the problem that existing image segmentation methods are not sufficiently applied to medical images. The method comprises: constructing a neck ultrasonic image data set comprising a train part and a test part; preprocessing the data set, the preprocessing comprising filtering, image enhancement and denoising of the images; introducing three SE model extension structures to improve the U-Net network structure, so that the segmentation performance of the network is improved in both the channel and spatial dimensions; using deformable convolution blocks as the encoder and decoder of the U-Net network to perform down-sampling and up-sampling of the image respectively; adding a Dropout layer to prevent overfitting; and improving the loss function. The image segmentation method established by the invention improves the image segmentation effect.

Description

Neck ultrasonic image segmentation method based on improved U-Net network

Technical Field

The invention relates to a method for establishing image segmentation, in particular to a method for establishing neck ultrasonic image segmentation based on an improved U-Net network.

Background

Ultrasound imaging is one of the most commonly used means of medical diagnosis and is of great significance in helping doctors assess a patient's condition. In clinical ultrasound diagnosis, medical professionals provide diagnostic suggestions and results by interpreting the position, morphology, gray scale, texture and other relevant information of the median nerve in an ultrasound image. However, manual interpretation of ultrasound images relies heavily on the subjective knowledge of the physician and is time-consuming and labor-intensive. With the continuous development of artificial intelligence technology, many deep-learning-based image segmentation algorithms have been applied to medical images to automatically locate target lesions. Image segmentation is the first step of intelligent diagnosis and lays a solid foundation for subsequent quantitative analysis and disease identification.

High-precision segmentation gives doctors a better grasp of a patient's neck condition, improves subsequent three-dimensional reconstruction of the cervical vertebrae, and restores the patient's neck morphology as faithfully as possible. Many methods are commonly used for medical image segmentation, such as region-based methods and edge-based methods. Among them, U-Net performs well on small segmentation targets and has an extensible structure, and it has attracted much attention since its introduction. However, owing to the characteristics of medical images and the influence of the imaging environment, uneven gray scale, large differences between individuals, heavy artifacts and strong noise can occur. Classical image segmentation methods segment an image according to only certain of its features and cannot make full use of the image information, which limits segmentation and leaves segmentation precision in need of improvement. The development of image segmentation technology therefore has a great influence on medical image segmentation, and how to segment pathological images effectively is of great significance both academically and medically. The present invention accordingly makes the following technical improvements.

Disclosure of Invention

The invention aims to solve the problem that the conventional image segmentation method is not sufficiently applied to medical images, and provides a neck ultrasonic image segmentation method based on an improved U-Net network.

A method for establishing a neck ultrasound image segmentation based on an improved U-Net network, the method comprising the following steps:

constructing a neck ultrasonic image data set, wherein the data set comprises a train part and a test part;

preprocessing the data set; the preprocessing comprises filtering, image enhancement and denoising of the image;

introducing three SE model extension structures to improve the U-Net network structure;

the three SE model extension structures improve the segmentation performance of the network in both the channel and spatial dimensions; deformable convolution blocks are used as the encoder and decoder of the U-Net network to perform down-sampling and up-sampling of the image respectively, and a Dropout layer is added to prevent overfitting; and the loss function is improved.

Preferably, the process of the filtering operation using the wavelet threshold denoising method specifically includes:

after the signal undergoes a wavelet transform computed with the Mallat algorithm, the resulting wavelet coefficients are selected; a threshold is set, wavelet coefficients larger than the threshold are considered to be generated by the signal and are retained, and wavelet coefficients smaller than the threshold are considered to be generated by noise and are set to 0, thereby achieving denoising.

Preferably, the three SE model extension structures are introduced to improve the segmentation performance of the network in both the channel and spatial dimensions; deformable convolution blocks are used as the encoder and decoder of the U-Net network to perform down-sampling and up-sampling of the image respectively, and a Dropout layer is added to prevent overfitting; and the loss function is improved; specifically:

firstly, the structure of the U-Net is as follows:

the overall network structure of U-Net is symmetrical and comprises an encoder that extracts feature maps from the image and a decoder that is built on the encoded feature maps; the encoder follows the usual rules of a convolutional neural network, each stage comprising two 3 × 3 convolution operations followed by a 2 × 2 max pooling operation with a stride of 2; the number of convolution kernels in the convolutional layers is doubled after each down-sampling, and after the last two 3 × 3 convolution operations are executed, the encoder is joined to the decoder; the decoder first up-samples the feature map with a 2 × 2 transposed convolution, which halves the number of channels of the feature map, and then executes two 3 × 3 convolution operations; this up-sampling followed by two convolution operations is repeated four times, halving the number of convolution kernels at each stage, and finally a 1 × 1 convolution operation is executed to generate the final network prediction; all convolutional layers in the network use the ReLU activation function except the last layer, which uses the Sigmoid activation function; the most ingenious feature of U-Net is its long connections, of which there are four, shown as the horizontal gray arrows in the figure; the output of the convolutional layers before each pooling operation of the encoder is concatenated with the output of the corresponding up-sampling operation, and the concatenated feature map is passed to the next layer; these long connections help the network recover lost image information and improve the performance of the model;

secondly, designing an improved U-Net network structure;

1) introducing three SE model extension structures, which are connected in series in the encoding and decoding structures of the U-Net; the first SE model extension structure is channel SE (cSE), which extracts the most expressive channels through global pooling and then fuses this information into the original tensor; the second SE model extension structure is spatial SE (sSE), which extracts a feature map to delineate feature regions and fuses the feature-region information into the original tensor; the third SE model extension structure is concurrent spatial and channel SE (scSE), which is the combined output of cSE and sSE;

given an input x with c_1 feature channels, a feature with c_2 feature channels is obtained through a series of general transformations such as convolution; the previously obtained features are then recalibrated by three operations:

first, a Squeeze operation is performed: features are compressed along the spatial dimensions so that each two-dimensional feature channel becomes a single real number, which to some extent has a global receptive field, and the output dimension matches the number of input feature channels; this number characterizes the global distribution of responses on the feature channel and allows layers close to the input to obtain a global receptive field, which is very useful in many tasks;

second, an Excitation operation is performed: a weight is generated for each feature channel by means of parameters, which are learned to explicitly model the correlation among the feature channels;

finally, a Reweight operation is performed: the weights output by the Excitation are regarded as the importance of each feature channel after feature selection and are multiplied channel by channel onto the previous features, completing the recalibration of the original features in the channel dimension;

2) using a deformable convolution block as each unit of the encoder and the decoder, wherein the deformable convolution block models retinal blood vessels of different shapes and scales by learning local, dense and adaptive receptive fields, and adding, after the U-Net network, a Dropout layer with the deformable convolution block as its core;

a Dropout layer is added after each convolution block and deconvolution block; there are four inputs X_i and one output y in total; in the training of each batch, Dropout randomly removes some neurons: a Dropout probability is set for each neural network layer and a portion of the neurons is removed according to that probability; during training, only the parameters of the neurons that were not removed, and their weights, are updated, while the parameters of the removed neurons are kept unchanged, thereby obtaining the result of the first batch of training.

Preferably, the operation of improving the loss function is specifically:

setting:

the Cos-Dice loss function is used to compute the partial derivatives for segmentation, a balance between foreground and background is established, and the segmentation model is obtained:

The invention has the following beneficial effects:

(1) The invention improves the U-Net network structure: the original basic convolutional layers are replaced with deformable convolution blocks, three SE modules connected in series are introduced into each deformable convolution block, and a Dropout layer is added to each deformable convolution block, which constitutes a new improvement of U-Net. The introduced SE modules accurately capture the importance of each feature channel. Combining the deformable convolution blocks with Dropout effectively alleviates the overfitting problem. A deformable convolution block is used as each unit of the encoder and the decoder. By learning local, dense and adaptive receptive fields, the deformable convolution block models retinal blood vessels of different shapes and scales to achieve accurate segmentation, adapting to different scales, shapes, orientations and so on. The same neck ultrasonic image is segmented by two U-Net networks, one without and one with the SE model structure; the improvement in segmentation brought by the spatial excitation and channel excitation of the SE model structure is analyzed, which indirectly reflects the feature-enhancement effect. The original convolutions are replaced by deformable convolution blocks, networks with and without the added Dropout layer are compared, the neck ultrasonic image is then segmented, and the robustness of the model and the differences in segmentation effect are analyzed.

(2) The invention uses the Cos-Dice loss function as a new loss function. A K value is added to the Cos-Dice loss function to raise both the similarity of the finite samples themselves and the similarity of the set samples, and the segmentation effect is further improved according to this raised standard. The Cos-Dice loss function is also used to compute partial derivatives for segmentation, so that weights do not need to be assigned to different classes, which improves the segmentation effect.

Drawings

FIG. 1 is a flow chart of wavelet threshold denoising according to the present invention;

FIG. 2 is a diagram of a U-Net network architecture according to the present invention;

FIG. 3 is a schematic diagram of the SE model according to the present invention;

FIG. 4 is a block diagram of the processing flow after the deformable convolution blocks are combined with the Dropout layer in accordance with the present invention;

FIG. 5 is a diagram of the sampling points of the deformable convolution according to the present invention and of the standard convolution;

FIG. 6 is a diagram of the neural network according to the present invention, showing the training effect after the Dropout layer is added.

Detailed Description

The first embodiment is as follows:

In this embodiment, a method for establishing neck ultrasonic image segmentation based on an improved U-Net network comprises the following steps:

constructing a neck ultrasonic image data set comprising a train part and a test part; the neck ultrasonic image data are obtained from the official Kaggle competition website, the train part containing 5000 neck ultrasonic images with annotated mask images, and the test part containing 5000 test images;

preprocessing the data set; the preprocessing comprises filtering, image enhancement and denoising of the images, so that the images become clearer and easier to read, the image segmentation effect is improved, and errors are reduced;

introducing three SE model extension structures to improve the segmentation performance of the network in both the channel and spatial dimensions; deformable convolution blocks are used as the encoder and decoder of the U-Net network to perform down-sampling and up-sampling of the image respectively, and a Dropout layer is added to prevent overfitting; and the loss function is improved.

Through the introduced SE modules, the invention accurately captures the importance of each feature channel. Combining the deformable convolution blocks with Dropout effectively alleviates the overfitting problem. In addition, parameter setting is also key to the whole algorithm.

The second embodiment is as follows:

Different from the first embodiment, in the method for establishing neck ultrasonic image segmentation based on an improved U-Net network of this embodiment, the filtering operation uses a wavelet threshold denoising method, specifically:

after the signal undergoes a wavelet transform computed with the Mallat algorithm, the resulting wavelet coefficients are selected. After wavelet decomposition, the wavelet coefficients of the signal are large whereas those of the noise are small. Exploiting this property, a threshold is set: wavelet coefficients larger than the threshold are considered to be generated by the signal and are retained, and wavelet coefficients smaller than the threshold are considered to be generated by noise and are set to 0, thereby achieving denoising. The wavelet threshold denoising flow chart is shown in FIG. 1.
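The thresholding rule described above can be sketched in a few lines of Python. This is only an illustrative one-dimensional sketch under stated assumptions: the Haar wavelet, the number of levels, the threshold value and the sample signal are all chosen here for illustration and are not specified by the source, and the function names are hypothetical.

```python
import math


def haar_dwt(signal):
    """One Mallat analysis step with Haar filters: returns (approx, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the even/odd samples and interleaves them."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out


def wavelet_threshold_denoise(signal, threshold, levels=2):
    """Decompose, zero the small detail coefficients, reconstruct.

    The signal length is assumed to be divisible by 2 ** levels.
    """
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    # Hard thresholding: coefficients above the threshold are treated as
    # signal and kept; the rest are treated as noise and set to 0.
    details = [[d if abs(d) > threshold else 0.0 for d in level]
               for level in details]
    for detail in reversed(details):
        approx = haar_idwt(approx, detail)
    return approx


# Hypothetical noisy step signal (illustrative values only).
noisy = [4.0, 4.1, 3.9, 4.0, 8.0, 8.2, 7.9, 8.1]
denoised = wavelet_threshold_denoise(noisy, threshold=0.5)
```

With a threshold of 0 every non-zero coefficient is kept and the input is reconstructed unchanged, which is a quick way to check the transform pair. For images, the same rule would be applied to the coefficients of a two-dimensional decomposition.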

The third embodiment is as follows:

Different from the second embodiment, in the method for establishing neck ultrasonic image segmentation based on an improved U-Net network of this embodiment, three SE model extension structures are introduced to improve the segmentation performance of the network in both the channel and spatial dimensions; deformable convolution blocks are used as the encoder and decoder of the U-Net network to perform down-sampling and up-sampling of the image respectively, and a Dropout layer is added to prevent overfitting; and the loss function is improved; specifically:

firstly, the structure of the U-Net is as follows:

As shown in FIG. 2, the overall network structure of U-Net is symmetrical and comprises an encoder that extracts feature maps from the image and a decoder that is built on the encoded feature maps. On the one hand, the encoder follows the usual rules of a convolutional neural network, each stage comprising two 3 × 3 convolution operations followed by a 2 × 2 max pooling operation with a stride of 2; the encoder has four stages, the number of convolution kernels in the convolutional layers is doubled after each down-sampling, and after the last two 3 × 3 convolution operations are executed, the encoder is joined to the decoder. On the other hand, the decoder first up-samples the feature map with a 2 × 2 transposed convolution, which halves the number of channels of the feature map, and then executes two 3 × 3 convolution operations; mirroring the structure of the encoder, this up-sampling followed by two convolution operations is repeated four times, halving the number of convolution kernels at each stage, and finally a 1 × 1 convolution operation is executed to generate the final network prediction. All convolutional layers in the network except the last use the ReLU activation function, and the last layer uses the Sigmoid activation function. The most ingenious feature of U-Net is its long connections, of which there are four, shown as the horizontal gray arrows in the figure: the output of the convolutional layers before each pooling operation of the encoder is concatenated with the output of the corresponding up-sampling operation, and the concatenated feature map is passed to the next layer. These long connections help the network recover lost image information and improve the performance of the model.
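The channel and size bookkeeping described above can be traced with a small Python sketch. The base width of 64 kernels, the 256 × 256 input, padded convolutions that leave the spatial size unchanged, and the single output channel are assumptions made here for illustration; the source does not state them, and `unet_plan` is a hypothetical helper.

```python
def unet_plan(base_channels=64, depth=4, input_size=256):
    """Return one dict per stage: name, output channels, spatial size."""
    plan, skips = [], []
    channels, size = base_channels, input_size
    for level in range(1, depth + 1):
        # Encoder stage: two 3x3 convolutions, then 2x2 max pooling (stride 2).
        plan.append({"stage": f"enc{level}", "channels": channels, "size": size})
        skips.append(channels)
        channels, size = channels * 2, size // 2  # kernels double, size halves
    # The last two 3x3 convolutions join the encoder to the decoder.
    plan.append({"stage": "bottleneck", "channels": channels, "size": size})
    for level in range(depth, 0, -1):
        # 2x2 transposed convolution: channels halve, spatial size doubles.
        channels, size = channels // 2, size * 2
        # Long connection: concatenate the matching encoder output.
        concat = channels + skips.pop()
        # Two 3x3 convolutions reduce the concatenation back to `channels`.
        plan.append({"stage": f"dec{level}", "channels": channels,
                     "size": size, "concat_channels": concat})
    # Final 1x1 convolution with Sigmoid gives the per-pixel prediction.
    plan.append({"stage": "head", "channels": 1, "size": size})
    return plan


plan = unet_plan()
for stage in plan:
    print(stage)
```

Under these assumptions the encoder widths are 64, 128, 256 and 512, the bottleneck has 1024 channels at 16 × 16, and each decoder stage sees twice its output width after concatenation, which is why the two convolutions that follow are needed to halve it again.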

A CNN operates at the image level, whereas an FCN can operate on an image at the pixel level. Unlike a classic CNN, which uses fully connected layers after the convolutional layers to obtain a fixed-length feature vector for classification (fully connected layer + softmax output), an FCN can accept an input image of any size, which makes the FCN more suitable for image segmentation than the CNN.

FCN also has the following disadvantages in image segmentation:

1. The receptive field is too small to obtain global information. In neck medical image segmentation, the inability to obtain global information is a fatal problem for segmentation and subsequent diagnosis.

2. The storage overhead is large and the computational efficiency is low. The required storage space is 225 times that of the original image, which wastes space; at the same time, the computer has to perform a large amount of repeated computation, which lowers computational efficiency and affects learning efficiency.

3. The segmentation results are not fine enough. The segmented image is too blurred or smooth and lacks the details of the target at the segmentation boundary, which greatly affects segmentation precision and is also a fatal defect for neck medical image segmentation.

In conclusion, U-Net is selected as the segmentation network for neck ultrasonic images.

Secondly, designing an improved U-Net network structure;

1) Introducing three SE model extension structures, which are connected in series in the encoding and decoding structures of the U-Net. The first SE model extension structure is channel SE (cSE), which extracts the most expressive channels through global pooling and then fuses this information into the original tensor; the second SE model extension structure is spatial SE (sSE), which extracts a feature map to delineate feature regions and fuses the feature-region information into the original tensor; the third SE model extension structure is concurrent spatial and channel SE (scSE), which is the combined output of cSE and sSE. The experimental results show that spatial excitation yields a higher gain than channel excitation and is more important for segmentation; scSE, although it adds some computational complexity, performs better than the standard network. The SE model is shown in FIG. 3.

FIG. 3 is a schematic diagram of the proposed SE module. Given an input x with c_1 feature channels, a feature with c_2 feature channels is obtained through a series of general transformations such as convolution; unlike in a conventional CNN, the previously obtained features are then recalibrated by three operations:

First, a Squeeze operation is performed: features are compressed along the spatial dimensions so that each two-dimensional feature channel becomes a single real number, which to some extent has a global receptive field, and the output dimension matches the number of input feature channels. This number characterizes the global distribution of responses on the feature channel and allows layers close to the input to obtain a global receptive field, which is very useful in many tasks.

Second, an Excitation operation is performed, a mechanism similar to the gates in a recurrent neural network: a weight is generated for each feature channel by means of parameters, which are learned to explicitly model the correlation between feature channels.

Finally, a Reweight operation is performed: the weights output by the Excitation are regarded as the importance of each feature channel after feature selection and are multiplied channel by channel onto the previous features, completing the recalibration of the original features in the channel dimension.
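The Squeeze, Excitation and Reweight steps, and the spatial counterpart, can be sketched on nested Python lists indexed as x[channel][row][column]. Several details are assumptions made here rather than taken from the source: the Excitation is written as the common two-layer bottleneck (fully connected, ReLU, fully connected, Sigmoid), the spatial gate is a 1 × 1 convolution followed by a Sigmoid, and scSE combines the two outputs by element-wise addition, since the text only says that it is their "combined output". All weights and the input are hypothetical values.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def squeeze(x):
    """Global average pooling: one real number per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def excitation(z, w1, w2):
    """FC -> ReLU -> FC -> Sigmoid; w1 and w2 stand for the learned parameters."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]


def channel_se(x, w1, w2):
    """cSE: Reweight step, scaling every channel by its excitation weight."""
    weights = excitation(squeeze(x), w1, w2)
    return [[[v * s for v in row] for row in ch] for ch, s in zip(x, weights)]


def spatial_se(x, w_1x1):
    """sSE: a 1x1 convolution across channels gives one gate per pixel."""
    h, w = len(x[0]), len(x[0][0])
    gate = [[sigmoid(sum(w_1x1[c] * x[c][i][j] for c in range(len(x))))
             for j in range(w)] for i in range(h)]
    return [[[ch[i][j] * gate[i][j] for j in range(w)] for i in range(h)]
            for ch in x]


def sc_se(x, w1, w2, w_1x1):
    """scSE: combine the cSE and sSE outputs (element-wise sum assumed)."""
    a, b = channel_se(x, w1, w2), spatial_se(x, w_1x1)
    return [[[p + q for p, q in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


# Hypothetical 2-channel, 2x2 feature map and weights.
x = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
w1 = [[1.0, 1.0]]       # 2 channels -> 1 hidden unit
w2 = [[1.0], [-1.0]]    # 1 hidden unit -> 2 channel weights
z = squeeze(x)
channel_weights = excitation(z, w1, w2)
recalibrated = sc_se(x, w1, w2, w_1x1=[0.0, 0.0])
```

Because the Sigmoid keeps every weight between 0 and 1, the recalibration can only attenuate a channel or pixel relative to the others; it never changes the shape of the feature map.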

2) Using a deformable convolution block as each unit of the encoder and the decoder, wherein the deformable convolution block models retinal blood vessels of different shapes and scales by learning local, dense and adaptive receptive fields so as to achieve accurate segmentation, adapting to different scales, shapes, orientations and so on. A Dropout layer with the deformable convolution block as its core is added after the U-Net network; a flow chart of this processing is shown in FIG. 4.

Compared with the regular sampling points of a standard convolution, a deformable convolution can sample freely near the current position and is not restricted to the regular sampling grid. The sampling points of the deformable convolution and of the standard convolution are shown in FIG. 5.

In FIG. 5, the first panel shows the sampling of the standard convolution and the other three show samplings of the deformable convolution. During training on the data set, the sampling of the deformable convolution can concentrate on the region or target of interest, so that it adapts to complex geometric deformation in the current scene and improves the final segmentation effect.
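The difference between the two sampling schemes can be made concrete with a short Python sketch that evaluates one 3 × 3 response. It assumes the usual formulation in which each kernel tap is shifted by its own offset and fractional positions are read with bilinear interpolation; the source does not spell out these details, the offsets would normally be predicted by an extra convolution, and the function names and values here are hypothetical.

```python
import math


def bilinear(feature, y, x):
    """Bilinearly interpolate feature[row][col] at a fractional position.

    Positions outside the map contribute 0.
    """
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                value += feature[yy][xx] * (1 - abs(y - yy)) * (1 - abs(x - xx))
    return value


def deformable_conv_at(feature, kernel, offsets, cy, cx):
    """3x3 deformable convolution response at (cy, cx).

    offsets[k] = (dy, dx) is the shift of the k-th kernel tap; all-zero
    offsets reduce to the regular grid of a standard convolution.
    """
    total, k = 0.0, 0
    for ky in (-1, 0, 1):
        for kx in (-1, 0, 1):
            dy, dx = offsets[k]
            total += kernel[ky + 1][kx + 1] * bilinear(
                feature, cy + ky + dy, cx + kx + dx)
            k += 1
    return total


# Hypothetical 4x4 feature map whose value is row * 4 + col.
feature = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
ones = [[1.0] * 3 for _ in range(3)]
standard = deformable_conv_at(feature, ones, [(0.0, 0.0)] * 9, 1, 1)
shifted = deformable_conv_at(feature, ones, [(0.0, 0.5)] * 9, 1, 1)
```

Shifting every tap half a pixel to the right moves all nine sampling points off the integer grid, which is the situation the three deformable panels illustrate.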

To prevent over-fitting, a Dropout layer is added after each convolution block and deconvolution block to strengthen the generalization ability of the network model. The Dropout layer is a structure that can be used to reduce overfitting of a neural network. FIG. 6 shows the training effect after the Dropout layer is added.

There are four inputs X_i and one output y in total. In the training of each batch, Dropout randomly removes some neurons: a Dropout probability is set for each neural network layer and a portion of the neurons is removed according to that probability. During training, only the parameters of the neurons that were not removed, and their weights, are updated, while the parameters of the removed neurons are kept unchanged, thereby obtaining the result of the first batch of training.

The normalization layers of the U-Net keep the distribution of each layer's input data relatively stable and speed up model learning. Adding a Dropout layer here reduces the number of parameters active in each training step, and hence the effective model complexity, which alleviates overfitting and makes the model more robust.
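The Dropout behaviour described above, with four inputs and one output, can be sketched as follows. The rescaling of the kept activations by 1 / (1 - p) ("inverted" Dropout) is an assumption made here because it is common practice; the source does not say whether rescaling is used, and the probability, seed and weights are hypothetical values.

```python
import random


def dropout(activations, p, training, rng):
    """Drop each neuron with probability p during training; identity otherwise.

    Returns the processed activations and the 0/1 keep mask.
    """
    if not training or p == 0.0:
        return list(activations), [1] * len(activations)
    mask = [0 if rng.random() < p else 1 for _ in activations]
    # Inverted-dropout scaling keeps the expected activation unchanged.
    scale = 1.0 / (1.0 - p)
    return [a * m * scale for a, m in zip(activations, mask)], mask


rng = random.Random(0)
x = [1.0, 2.0, 3.0, 4.0]   # four inputs X_i
w = [0.1, 0.2, 0.3, 0.4]   # hypothetical weights
kept, mask = dropout(x, p=0.5, training=True, rng=rng)
y = sum(wi * xi for wi, xi in zip(w, kept))   # one output y

# dy/dw_i equals the (possibly zeroed) input, so the weights of dropped
# neurons receive a zero gradient and stay unchanged in this batch.
grads = kept

evaluated, _ = dropout(x, p=0.5, training=False, rng=rng)
```

A new mask is drawn for every batch, so a different subset of neurons is updated each time, while at inference all neurons are used.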

The fourth embodiment is as follows:

Different from the third embodiment, in the method for establishing neck ultrasonic image segmentation based on an improved U-Net network of this embodiment, the operation of improving the loss function is specifically:

One limitation of the Dice loss function is that it weights false positives (FP) and false negatives (FN) equally, which yields segmentation maps with low recall. When the lesion data are extremely imbalanced and the region of interest is very small, FN must be weighted much more heavily than FP to increase recall.
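The equal weighting of FP and FN in the standard Dice coefficient can be checked numerically. The sketch below uses only the standard definition Dice = 2TP / (2TP + FP + FN); it is not the Cos-Dice loss of the invention, whose formula is not reproduced in this text, and the pixel counts are hypothetical.

```python
def dice(tp, fp, fn):
    """Standard Dice coefficient from pixel counts: 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)


def recall(tp, fn):
    """Fraction of the true lesion pixels that were found."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Fraction of the predicted lesion pixels that are correct."""
    return tp / (tp + fp)


# Two hypothetical predictions for a lesion of 100 pixels.
under = {"tp": 50, "fp": 0, "fn": 50}     # misses half of the lesion
over = {"tp": 100, "fp": 100, "fn": 0}    # finds it all, with extra pixels

dice_under, dice_over = dice(**under), dice(**over)
recall_under = recall(under["tp"], under["fn"])
recall_over = recall(over["tp"], over["fn"])
```

Both predictions receive the same Dice score of 2/3 although their recall is 0.5 and 1.0 respectively, so a loss built directly on this score gives the optimizer no reason to prefer the prediction that misses fewer lesion pixels.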

Setting:

The Cos-Dice loss function is used to compute the partial derivatives for segmentation, so that weights do not need to be assigned to different classes and a balance between foreground and background is established. This effectively solves the problem of poor performance and oscillation when the prediction is close to the ground truth, accelerates the learning process, and improves the segmentation effect.

Obtaining a segmentation model:

The embodiments disclosed above are preferred embodiments of the present invention, but the invention is not limited thereto; those skilled in the art can readily understand the spirit of the invention and make various extensions and changes without departing from it.

Claims

1. A method for establishing neck ultrasonic image segmentation based on an improved U-Net network, characterized in that the method comprises the following steps:

2. The method for establishing neck ultrasonic image segmentation based on the improved U-Net network according to claim 1, characterized in that the filtering operation uses a wavelet threshold denoising method, specifically:

3. The method for establishing neck ultrasonic image segmentation based on the improved U-Net network according to claim 2, characterized in that the three SE model extension structures are introduced to improve the segmentation performance of the network in both the channel and spatial dimensions; deformable convolution blocks are used as the encoder and decoder of the U-Net network to perform down-sampling and up-sampling of the image respectively, and a Dropout layer is added to prevent overfitting; and the loss function is improved; specifically:

firstly, the structure of the U-Net is as follows:

the overall network structure of U-Net is symmetrical and comprises an encoder that extracts feature maps from the image and a decoder that is built on the encoded feature maps; the encoder follows the usual rules of a convolutional neural network, each stage comprising two 3 × 3 convolution operations followed by a 2 × 2 max pooling operation with a stride of 2; the encoder has four stages, the number of convolution kernels in the convolutional layers is doubled after each down-sampling, and after the last two 3 × 3 convolution operations are executed, the encoder is joined to the decoder; the decoder first up-samples the feature map with a 2 × 2 transposed convolution, which halves the number of channels of the feature map, and then executes two 3 × 3 convolution operations; this up-sampling followed by two convolution operations is repeated four times, halving the number of convolution kernels at each stage, and finally a 1 × 1 convolution operation is executed to generate the final network prediction; all convolutional layers in the network use the ReLU activation function except the last layer, which uses the Sigmoid activation function; the most ingenious feature of U-Net is its long connections, of which there are four, shown as the horizontal gray arrows in the figure; the output of the convolutional layers before each pooling operation of the encoder is concatenated with the output of the corresponding up-sampling operation, and the concatenated feature map is passed to the next layer; these long connections help the network recover lost image information and improve the performance of the model;

secondly, designing an improved U-Net network structure;

first, a Squeeze operation is performed: features are compressed along the spatial dimensions so that each two-dimensional feature channel becomes a single real number, which to some extent has a global receptive field, and the output dimension matches the number of input feature channels; this number characterizes the global distribution of responses on the feature channel and allows layers close to the input to obtain a global receptive field, which is very useful in many tasks;

second, an Excitation operation is performed: a weight is generated for each feature channel by means of parameters, which are learned to explicitly model the correlation among the feature channels;

4. The method for establishing neck ultrasonic image segmentation based on the improved U-Net network according to claim 3, characterized in that the operation of improving the loss function is specifically:

setting: