CN114004811A - Image segmentation method and system based on multi-scale residual encoding and decoding network
- Publication number
- CN114004811A (application number CN202111284039.7A)
- Authority
- CN
- China
- Prior art keywords
- feature
- scale
- convolution
- unit
- decoding
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20021—Dividing image into blocks, subimages or windows
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30088—Skin; Dermal
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Molecular Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Medical Informatics (AREA)
- Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
- Radiology & Medical Imaging (AREA)
- Quality & Reliability (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an image segmentation method and system based on a multi-scale residual encoding and decoding network, which preprocesses a skin lesion image; performs data enhancement on the preprocessed image to obtain more image data, namely the input feature map; performs feature extraction, encoding feature fusion and decoding feature fusion on the input feature map based on the multi-scale residual encoding and decoding network; performs maximum pooling, average pooling and soft pooling on the spliced feature F to obtain its spatial attention features, applies a channel attention operation to obtain the features after channel attention, and applies a sigmoid activation to obtain the final segmentation result. The skin lesion segmentation network Ms RED has a better segmentation effect, and in particular produces good segmentation results on lesions with irregular shapes, large scale variation, low contrast and blurred boundaries; the Ms RED network designed by the invention has great application potential in clinical skin disease diagnosis.
Description
Technical Field
The invention belongs to the technical field of artificial intelligence and deep learning, and particularly relates to an image segmentation method and system based on a multi-scale residual encoding and decoding network.
Background
Skin diseases are among the most common diseases, and skin cancer in particular has a very high mortality rate; for example, melanoma has a five-year survival rate of less than 15%. As a non-invasive imaging tool, dermoscopy is widely used to assist in screening and diagnosing skin lesions. However, manual screening of dermoscopic images is time-consuming, laborious and prone to subjectivity. To address these problems, clinicians have introduced computer-aided diagnosis techniques into daily practice to support efficient screening and diagnosis of skin lesions. In computer-aided diagnosis of skin disease, a key step is to accurately locate the lesion boundary, i.e., skin lesion segmentation. The accuracy of lesion segmentation affects the accuracy of the entire aided diagnosis system. Therefore, researchers have invested tremendous effort in designing automated skin lesion segmentation algorithms, including classical machine learning algorithms and the currently popular deep learning algorithms.
Classical machine learning algorithms mainly include clustering algorithms, thresholding algorithms, deformable contour models, and region growing algorithms. These algorithms rely heavily on hand-crafted features and elaborate post-processing, resulting in poor performance in complex application scenarios. Deep learning methods, on the other hand, can automatically learn high-dimensional features without human interference, and their accuracy exceeds that of classical machine learning algorithms, so they now dominate the field of skin lesion segmentation. The encoder-decoder structure is the most popular image segmentation framework: the encoder extracts features of the image, and the decoder reconstructs the extracted features to the output image size and outputs the final segmentation result. In the present invention, we compare our designed Ms RED with a variety of well-known segmentation networks, including: FCN, U-Net, U-Net++, AttU-Net, DeepLabv3+, DenseASPP, CA-Net, DO-Net and CE-Net.
FCN: it converts the fully-connected layers in the conventional CNN into convolutional layers one by one, so that the network can adapt to inputs of any size. The subsequent deep learning segmentation algorithm is mostly constructed on the basis of the FCN.
U-Net: by combining the concepts of FCN architecture and jump connection, Ronneberger et al designs a famous U-Net network, and has achieved a great deal of success in the field of medical image segmentation.
U-Net++: on the basis of U-Net, connects different layers of the U-Net encoder and decoder, so that the network can automatically learn the importance of features at different layers.
AttU-Net: and an attention mechanism is introduced into the U-Net structure, so that the learning capability of the network on useful features is strengthened, and the learning of the network on invalid features is inhibited.
DeepLabv3 +: the coding and decoding network is constructed based on an ASPP structure, the ASPP represents a spatial pyramid pooling module with holes, and multi-scale features can be obtained without increasing the operation amount.
DenseASPP: in order to obtain a better boundary segmentation result, the DenseASPP uses an ASPP module with dense connection, so that features with different scales are effectively fused, and more detailed information is reserved in the feature transmission process.
CA-Net: the network comprises three attention mechanism modules including space attention, channel attention and scale attention, and can provide better segmentation performance with fewer parameter quantities.
DO-Net: and by combining the ASPP module and the LSTM module, the incidence relation between the multi-scale features is simulated, and the context information of the features is effectively extracted.
CE-Net: and combining an inclusion-ResNet-V2 module and an ASPP module to generate a multi-scale feature with a larger receptive field. The existing skin lesion segmentation algorithm based on deep learning mainly has two defects: (1) the parameter quantity of the algorithm model is huge, and the algorithm model cannot be used under the condition that the computing resources are limited; (2) the segmentation result of the lesion with fuzzy boundaries is not ideal for the lesion with large size change, irregular shape, unobvious contrast (unobvious contrast between the lesion area and the background).
The reasons for the above two shortcomings are: (a) the feature extractors in the compared methods cannot extract rich multi-scale features, so segmentation of lesions with large size variation and irregular shapes is poor; a traditional convolutional layer can only extract features at one fixed scale, while an ASPP module usually extracts features at four scales. (b) The compared methods cannot effectively fuse multi-scale features between different layers in the encoding and decoding stages, and cannot sufficiently fuse the contextual information of the features; as a result, the contrast between foreground and background is not obvious, and segmentation of lesions with blurred boundaries is poor.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides an image segmentation method and system based on a multi-scale residual encoding and decoding network (Ms RED), which has few parameters and segments accurately: Ms RED has only 3.8M parameters and can be used when computing resources are limited; its segmentation effect is good, and on lesions with large size variation, irregular shapes, low contrast and blurred boundaries it is clearly superior to comparable methods.
In order to achieve this purpose, the invention adopts the following technical scheme. An image segmentation method based on a multi-scale residual encoding and decoding network comprises the following steps: preprocessing a skin lesion image;
performing data enhancement on the preprocessed image to obtain more image data, namely the input feature map;
performing feature extraction, encoding feature fusion and decoding feature fusion on the input feature map based on a multi-scale residual encoding and decoding network; the multi-scale residual encoding and decoding network is based on CS2-Net: the convolution module in CS2-Net is replaced by the multi-scale feature extraction module M2F2, multi-scale context information between different layers in the network encoding stage is integrated by the multi-scale residual encoding feature fusion module MsR-EFM, and multi-scale context information between different layers in the network decoding stage is integrated by the multi-scale residual decoding feature fusion module MsR-DFM;
the multi-scale feature extraction module M2F2 is provided with four branches, each branch is provided with a convolution layer, a multi-scale feature extraction unit and a feature splitting and splicing unit, each branch respectively obtains corresponding feature maps after performing convolution and multi-scale feature extraction on an input feature map, and then obtains the multi-scale feature map needing to be extracted after the feature maps are spliced by the feature splicing unit; the multi-scale feature maps of different layers in the encoding stage are used as the input of a multi-scale residual coding feature fusion module MsR-EFM;
the multi-scale residual coding feature fusion module MsR-EFM is provided with a convolution unit, a pooling unit and a weight distribution unit, wherein the convolution unit is used for changing the channel number of the multi-scale feature map, the pooling unit is used for reducing the size of the feature map, and the weight distribution unit is used for respectively giving weights to feature images of different layers;
an up-sampling unit, a convolution unit, a pooling unit and a multi-layer perceptron are arranged in the multi-scale residual decoding feature fusion module MsR-DFM; the up-sampling unit is used for unifying the feature maps of different layers in the decoding stage to the size required for the final output, and the convolution unit is used for capturing the local information of the feature maps and changing their channel number; the pooling unit and the multi-layer perceptron are used for refining the spatial information of the feature map to obtain its spatial attention: maximum pooling, average pooling and soft pooling are performed on the spliced feature F to obtain its spatial attention features, a channel attention operation is performed on these to obtain the features after channel attention, and a sigmoid activation is performed on the result to obtain the final segmentation result.
The multi-scale feature extraction module M2F2 changes the number of channels of the input feature map to the number required by the output feature map through convolution, and denotes the resulting feature map F1; reduces the number of channels of the input feature map to 1/4 by convolution and then extracts multi-scale features, denoting the resulting feature map F2; reduces the number of channels of the input feature map to 1/8 by convolution and then extracts multi-scale features, denoting the resulting feature map F3; reduces the number of channels of the input feature map to 1/8 by convolution, applies further convolution, and extracts multi-scale features, denoting the resulting feature map F4. Feature maps F2, F3 and F4 are spliced together as F234, and the number of channels of F234 is then changed to match the output feature, denoted F123. The final output feature of M2F2 is F1 + F123.
In the multi-scale residual encoding feature fusion module MsR-EFM, the feature images of different layers in the network encoding stage are first resampled to the same size through the convolution unit and the pooling unit to obtain the resampled feature images, and weights are respectively assigned to each layer to obtain the effectively fused multi-scale encoding features.
In the multi-scale residual decoding feature fusion module MsR-DFM, the feature images of different layers are first resampled to the same size by bilinear-interpolation upsampling, convolution processing is then performed, and the feature images obtained after convolution are spliced along the channel dimension to obtain the spliced feature F.
When the skin lesion image is preprocessed, the acquired original skin lesion image is resampled to a preset pixel size and used as the input of the neural network.
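As a minimal illustration of this preprocessing step, the sketch below loads a dermoscopic image and resamples it to the 224 × 320 input size used in the experiments later in this description; the helper name and the use of PIL are assumptions, not part of the patent.

```python
# Hypothetical preprocessing helper: resample a raw dermoscopic image to the
# preset network input size (224 x 320 in the experiments described below).
from PIL import Image

def preprocess(path: str, size=(320, 224)) -> Image.Image:  # PIL takes (width, height)
    img = Image.open(path).convert("RGB")
    return img.resize(size, Image.BILINEAR)
```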
During data enhancement, four random operations are applied to the preprocessed images: random rotation between -20 and 20 degrees, horizontal flipping, vertical flipping, and random cropping, thereby obtaining more training data.
On the other hand, the invention also provides a skin lesion image segmentation system based on the multi-scale residual encoding and decoding network, which comprises an image data processing module and a feature encoding and decoding network module;
the image data processing module is used for preprocessing the image and enhancing the data to obtain more training data;
the feature encoding and decoding network module performs feature extraction, encoding feature fusion and decoding feature fusion on the input feature map based on a multi-scale residual encoding and decoding network; the multi-scale residual encoding and decoding network is based on CS2-Net: the convolution module in CS2-Net is replaced by the multi-scale feature extraction module M2F2, multi-scale context information between different layers in the network encoding stage is integrated by the multi-scale residual encoding feature fusion module MsR-EFM, and multi-scale context information between different layers in the network decoding stage is integrated by the multi-scale residual decoding feature fusion module MsR-DFM;
the multi-scale feature extraction module M2F2 is provided with four branches, each branch being provided with a convolution layer, a multi-scale feature extraction unit and a feature splitting and splicing unit; each branch performs convolution and multi-scale feature extraction on the input feature map to obtain a corresponding feature map, and the feature splicing unit then splices these feature maps to obtain the desired multi-scale feature map; the multi-scale feature maps of different layers in the encoding stage serve as the input of the multi-scale residual encoding feature fusion module MsR-EFM;
the multi-scale residual encoding feature fusion module MsR-EFM is provided with a convolution unit, a pooling unit and a weight distribution unit, wherein the convolution unit is used for changing the channel number of the multi-scale feature map, the pooling unit is used for reducing the size of the feature map, and the weight distribution unit is used for assigning weights to the feature images of different layers;
an up-sampling unit, a convolution unit, a pooling unit and a multi-layer perceptron are arranged in the multi-scale residual decoding feature fusion module MsR-DFM; the up-sampling unit is used for unifying the feature maps of different layers in the decoding stage to the size required for the final output, and the convolution unit is used for capturing the local information of the feature maps and changing their channel number; the pooling unit and the multi-layer perceptron are used for refining the spatial information of the feature map to obtain its spatial attention: maximum pooling, average pooling and soft pooling are performed on the spliced feature F to obtain its spatial attention features, a channel attention operation is performed on these to obtain the features after channel attention, and a sigmoid activation is performed on the result to obtain the final segmentation result.
In addition, the invention provides a computer device, comprising a processor and a memory, wherein the memory is used for storing a computer executable program, the processor reads the computer executable program from the memory and executes it, and when executing the computer executable program the processor can implement the image segmentation method based on the multi-scale residual encoding and decoding network.
A computer-readable storage medium stores a computer program which, when executed by a processor, can implement the image segmentation method based on the multi-scale residual encoding and decoding network according to the present invention.
Compared with the prior art, the invention has at least the following beneficial effects:
the skin lesion segmentation network Ms RED designed by the invention has a small number of parameters, only 3.8M, and can be used when computing resources are limited;
the skin lesion segmentation network Ms RED has a better segmentation effect, and in particular produces good segmentation results on lesions with irregular shapes, large scale variation, low contrast and blurred boundaries;
the Ms RED network designed by the invention has huge application potential in clinical skin disease diagnosis. In addition, the Ms RED network can also be applied to other image segmentation fields.
Further, the M2F2 of the present invention can extract features at 13 different scales with a much smaller parameter count than conventional convolution; benefiting from this advantage of M2F2, the Ms RED network designed by the invention on the basis of M2F2 has fewer parameters and stronger learning ability.
Drawings
Fig. 1 is a network structure diagram of the multi-scale residual encoding and decoding network Ms RED.
Fig. 2 is a multi-scale feature extraction module M2F 2.
FIG. 3 is the multi-scale residual encoding feature fusion module MsR-EFM.
FIG. 4 is a multi-scale residual decoding feature fusion module MsR-DFM.
Fig. 5 is a diagram of the effect of segmentation of different networks.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings.
The invention provides a skin lesion segmentation network: the multi-scale residual encoding and decoding network (Ms RED). The invention is elaborated in five aspects: the data used; the experimental setup; the evaluation criteria for segmentation results; the structure of the Ms RED network; and the experimental results.
(1) Data used
The invention verifies the designed segmentation network Ms RED using two public skin lesion segmentation datasets (ISIC2018, PH2). ISIC2018 is a large-scale dermoscopic image dataset released by the International Skin Imaging Collaboration (ISIC) in 2018; it contains 2594 RGB images and has become a benchmark dataset for evaluating skin lesion segmentation algorithms. In the invention, the original images are resampled to a size of 224 × 320 and divided into a training set, a test set and a validation set at 70%, 20% and 10%.
PH2 is a small dataset containing only 200 dermoscopic images. The invention resamples the original images to 224 × 320 and uses 80 images as the training set, 20 as the validation set and 100 as the test set.
The training data comprises original data and corresponding ground truth labels: 0 denotes a background, and 1 denotes a lesion region.
(2) Experimental setup
The method is implemented with the PyTorch library and runs on an NVIDIA 2080Ti GPU with 11 GB of memory. Adam is used as the optimizer, and CosineAnnealingWarmRestarts (T_0 = 10, T_mult = 2) is selected as the learning-rate decay strategy. The initial learning rate is set to 0.001 and the weight decay to 0.00005. All networks are trained for 250 epochs; the model with the highest Jaccard Index on the validation set is selected as the final model, and its performance is tested on the test set. On both the ISIC2018 and PH2 datasets, a data enhancement strategy is employed to prevent overfitting; it includes four random operations: random rotation between -20 and 20 degrees, horizontal flipping, vertical flipping, and random cropping. To obtain reliable performance figures for each network, five-fold cross validation is used, and the average result of the five experiments is taken as the final performance.
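A minimal PyTorch sketch of this training configuration is given below; MsRED, train_one_epoch and evaluate_jaccard are assumed placeholder names for the model and training/evaluation loops, while the optimizer, scheduler and hyperparameters follow the settings stated above.

```python
# Training-setup sketch (assumed model/loop names; settings as stated above).
import torch
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

model = MsRED().cuda()                                    # hypothetical model class
optimizer = torch.optim.Adam(model.parameters(),
                             lr=1e-3, weight_decay=5e-5)  # initial lr / weight decay
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=10, T_mult=2)

best_ji = 0.0
for epoch in range(250):                                  # 250 epochs per run
    train_one_epoch(model, optimizer)                     # hypothetical helper
    scheduler.step()
    ji = evaluate_jaccard(model)                          # hypothetical helper
    if ji > best_ji:                                      # keep the checkpoint with
        best_ji = ji                                      # the best validation JI
        torch.save(model.state_dict(), "best.pth")
```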
(3) Evaluation index
The performance of the different methods is evaluated using four indices: Jaccard Index (JI), Accuracy (Acc), Recall and Precision, where TP, TN, FP and FN denote true positives, true negatives, false positives and false negatives, respectively.
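In their standard confusion-matrix form, these four indices are computed as follows (written out here for completeness):

```latex
JI        = \frac{TP}{TP + FP + FN}
Acc       = \frac{TP + TN}{TP + TN + FP + FN}
Recall    = \frac{TP}{TP + FN}
Precision = \frac{TP}{TP + FP}
```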
(4) Network structure of Ms RED
FIG. 1 shows the network structure of the multi-scale residual encoding and decoding network Ms RED, which is designed on the basis of CS2-Net: the multi-scale feature extraction module M2F2 replaces the traditional convolution operation; a multi-scale residual encoding feature fusion module MsR-EFM is designed for the encoding stage, and a multi-scale residual decoding feature fusion module MsR-DFM for the decoding stage. MsR-EFM and MsR-DFM adaptively fuse the multi-scale context features of the encoding and decoding stages and retain more detailed information during feature transfer, providing more accurate segmentation results. Referring to figs. 2 and 3, the innovations are: (1) in E2, E3, E4, E5, D4, D3, D2 and D1, the redesigned multi-scale feature extraction module M2F2 replaces the traditional convolution module in order to extract rich multi-scale feature information; (2) the multi-scale residual encoding feature fusion module MsR-EFM is designed for the encoding stage, and the multi-scale residual decoding feature fusion module MsR-DFM for the decoding stage.
The multi-scale feature extraction module M2F2 is shown in FIG. 2, where in_c denotes the number of input channels of a convolution, out_c the number of output channels, and kernel size the size of the convolution kernel; Conv (rate) denotes the dilation rate of a dilated convolution. As shown in fig. 2, M2F2 has four branches, called branch 1, branch 2, branch 3 and branch 4 from top to bottom. On branch 1, a 1 × 1 convolution changes the number of channels of the input feature map to the number required for the output feature map; the resulting feature map is denoted F1. On branch 2, a 1 × 1 convolution first reduces the number of channels of the input feature map to 1/4 in order to reduce the parameter count of subsequent operations, then a SplitB module with dilation rate 1 extracts multi-scale features; the resulting feature map is denoted F2. The SplitB module consists of dilated convolution, channel splitting and channel merging, and can extract rich multi-scale features with little computation. On branch 3, a 1 × 1 convolution first reduces the number of channels of the input feature map to 1/8, a 3 × 3 convolution follows, and finally a SplitB module with dilation rate 3 extracts multi-scale features; the resulting feature map is denoted F3. On branch 4, a 1 × 1 convolution first reduces the number of channels of the input feature map to 1/8, two 3 × 3 convolutions in series follow, and finally a SplitB module with dilation rate 5 extracts multi-scale features; the resulting feature map is denoted F4. The features obtained after branches 2, 3 and 4 pass through their SplitB modules are spliced together, denoted F234, and after a 1 × 1 convolution the number of channels is changed to that of the output feature, denoted F123. The final output of M2F2 is F1 + F123.
M2F2 has two advantages: first, its parameter count is small; second, it can extract rich multi-scale features. Compared with a conventional convolution operation, M2F2 has only one eighth of the parameters; compared with existing multi-scale feature extraction modules such as the Inception module and the ASPP module, M2F2 can extract richer multi-scale features. In general, an Inception module or an ASPP module with four channels can extract features at four different scales, whereas the M2F2 of the present invention can extract features at 13 different scales. Thanks to these two advantages of M2F2, the network Ms RED designed on its basis has fewer parameters and stronger learning ability.
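The following PyTorch sketch condenses the four-branch structure described above. Since the internal wiring of the SplitB block is only characterized as dilated convolution plus channel splitting and merging, the SplitB class here is a simplified stand-in, not the patented implementation; all layer names are assumptions.

```python
# Condensed M2F2 sketch (simplified; SplitB below is a stand-in for the
# dilated-convolution / channel-split-merge unit described in the text).
import torch
import torch.nn as nn

class SplitB(nn.Module):
    """Placeholder multi-scale unit: a single dilated 3x3 convolution."""
    def __init__(self, channels: int, rate: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=rate, dilation=rate)

    def forward(self, x):
        return self.conv(x)

class M2F2(nn.Module):
    def __init__(self, in_c: int, out_c: int):
        super().__init__()
        self.b1 = nn.Conv2d(in_c, out_c, 1)                      # branch 1
        self.b2 = nn.Sequential(nn.Conv2d(in_c, in_c // 4, 1),   # branch 2
                                SplitB(in_c // 4, rate=1))
        self.b3 = nn.Sequential(nn.Conv2d(in_c, in_c // 8, 1),   # branch 3
                                nn.Conv2d(in_c // 8, in_c // 8, 3, padding=1),
                                SplitB(in_c // 8, rate=3))
        self.b4 = nn.Sequential(nn.Conv2d(in_c, in_c // 8, 1),   # branch 4
                                nn.Conv2d(in_c // 8, in_c // 8, 3, padding=1),
                                nn.Conv2d(in_c // 8, in_c // 8, 3, padding=1),
                                SplitB(in_c // 8, rate=5))
        self.fuse = nn.Conv2d(in_c // 4 + 2 * (in_c // 8), out_c, 1)  # F234 -> F123

    def forward(self, x):
        f1 = self.b1(x)                                            # F1
        f234 = torch.cat([self.b2(x), self.b3(x), self.b4(x)], 1)  # F234
        return f1 + self.fuse(f234)                                # F1 + F123
```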
The multi-scale residual encoding feature fusion module MsR-EFM is shown in FIG. 3; it includes convolution units with kernel size 3 and stride 2, and pooling units with stride 2. In this module, a convolution unit changes the channel number of the feature map and halves its size; a pooling unit likewise halves the size of the feature map.
The multi-scale residual encoding feature fusion module MsR-EFM adaptively fuses the multi-scale features of different layers of the Ms RED encoding stage and effectively fuses the contextual information of the network. To fuse the features of different layers in the encoding stage effectively, the feature maps must first be unified to the same size. As shown in fig. 3, the feature map size of layer E1 is 32 × 224 × 320, that of E2 is 64 × 112 × 160, that of E3 is 128 × 56 × 80, and that of E4 is 256 × 28 × 40. The invention first resamples E1, E2 and E3 to the size of E4, i.e. 256 × 28 × 40, using convolutions with stride 2 and pooling units with stride 2. The resampled E1, E2, E3 and E4 are then assigned different weights w1, w2, w3 and w4, so that the network can adaptively learn the importance of the features of different layers, fuse them effectively, and provide more valuable feature information for subsequent tasks.
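A rough sketch of this fusion step follows; the exact conv/pooling chain in FIG. 3 may differ, so the stride-2 downsampling helper and the learnable scalar weights w1–w4 are assumptions that follow the description above.

```python
# MsR-EFM sketch: downsample E1-E3 to E4's resolution, then fuse with
# learnable layer weights (a simplified reading of the description above).
import torch
import torch.nn as nn

def down(in_c: int, out_c: int, steps: int) -> nn.Sequential:
    """Chain of stride-2 convolutions standing in for the conv + pooling units."""
    layers, c = [], in_c
    for _ in range(steps):
        layers += [nn.Conv2d(c, out_c, 3, stride=2, padding=1), nn.ReLU(inplace=True)]
        c = out_c
    return nn.Sequential(*layers)

class MsREFM(nn.Module):
    def __init__(self, chans=(32, 64, 128, 256)):
        super().__init__()
        self.d1 = down(chans[0], chans[3], steps=3)  # E1: 224x320 -> 28x40
        self.d2 = down(chans[1], chans[3], steps=2)  # E2: 112x160 -> 28x40
        self.d3 = down(chans[2], chans[3], steps=1)  # E3:  56x80  -> 28x40
        self.w = nn.Parameter(torch.ones(4))         # adaptive weights w1..w4

    def forward(self, e1, e2, e3, e4):
        feats = [self.d1(e1), self.d2(e2), self.d3(e3), e4]
        return sum(w * f for w, f in zip(self.w, feats))
```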
As shown in fig. 4, the multi-scale residual decoding feature fusion module MsR-DFM adaptively fuses the multi-scale features of different layers of the Ms RED decoding stage, effectively fuses the contextual information of the network, and provides more accurate segmentation results. The MsR-DFM module includes an upsampling unit, a convolution unit, a pooling unit and a multi-layer perceptron. The upsampling unit unifies the feature maps of different layers in the decoding stage to the size required for the final output, namely 224 × 320; the convolution unit captures the local information of the feature maps and changes their channel number; the pooling unit and the multi-layer perceptron refine the spatial information of the feature map to obtain its spatial attention.
To fuse the features of different layers in the decoding stage effectively, the feature maps must first be unified to the same size. The multi-scale residual decoding feature fusion module MsR-DFM is shown in fig. 4, where the feature map size of layer D4 is 256 × 28 × 40, that of D3 is 128 × 56 × 80, that of D2 is 64 × 112 × 160, and that of D1 is 32 × 224 × 320. D4, D3 and D2 are first resampled to the spatial size of D1, i.e. 224 × 320, by bilinear-interpolation upsampling. Then a convolution with kernel size 3 × 3 and 4 output channels uniformly converts the resampled D4, D3, D2 and D1 into feature maps of 4 × 224 × 320, and these feature maps are spliced along the channel dimension into a 16 × 224 × 320 feature map, denoted F.
Maximum pooling, average pooling and soft pooling are performed on the spliced feature map F to obtain its spatial attention features; a channel attention operation is then performed to obtain the feature map after channel attention, and a sigmoid activation produces the final output: a 224 × 320 map representing the segmentation result. In FIG. 3, which illustrates the multi-scale residual encoding feature fusion module MsR-EFM, stride denotes the step size of convolution and pooling.
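A loose PyTorch sketch of MsR-DFM follows. The upsample-project-splice path mirrors the text, while the pooling-based spatial attention and the channel attention are rendered in a generic CBAM-like form, since the patent does not spell out their precise wiring; every layer name here is an assumption.

```python
# MsR-DFM sketch (upsample/concat follow the text; the attention wiring is a
# generic stand-in for the max/avg/soft-pooling spatial attention + channel
# attention described above).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MsRDFM(nn.Module):
    def __init__(self, chans=(32, 64, 128, 256)):
        super().__init__()
        self.proj = nn.ModuleList([nn.Conv2d(c, 4, 3, padding=1) for c in chans])
        self.spatial = nn.Conv2d(3, 1, 7, padding=3)   # fuses the 3 pooled maps
        self.mlp = nn.Sequential(nn.Linear(16, 4), nn.ReLU(), nn.Linear(4, 16))
        self.head = nn.Conv2d(16, 1, 1)                # 1-channel segmentation map

    def forward(self, d1, d2, d3, d4):
        size = d1.shape[2:]                            # 224 x 320
        feats = [self.proj[0](d1)]
        for i, d in enumerate((d2, d3, d4), start=1):  # D2-D4 -> D1's spatial size
            up = F.interpolate(d, size=size, mode="bilinear", align_corners=False)
            feats.append(self.proj[i](up))
        f = torch.cat(feats, dim=1)                    # spliced feature F: 16 x H x W
        # spatial attention from max / average / soft pooling over the channel axis
        mx = f.max(dim=1, keepdim=True).values
        av = f.mean(dim=1, keepdim=True)
        sp = (f.softmax(dim=1) * f).sum(dim=1, keepdim=True)   # soft pooling
        f = f * torch.sigmoid(self.spatial(torch.cat([mx, av, sp], dim=1)))
        # channel attention via an MLP on the globally pooled descriptor
        ca = self.mlp(f.mean(dim=(2, 3)))              # B x 16
        f = f * torch.sigmoid(ca)[:, :, None, None]
        return torch.sigmoid(self.head(f))             # final segmentation map
```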
(5) Experimental results. The performance of the different models on the ISIC2018 and PH2 datasets is shown in Table 1 and Table 2, respectively. The model with the smallest parameter count is CA-Net, but its segmentation indices lag far behind the designed Ms RED. Apart from CA-Net, the model of the invention has far fewer parameters than the compared methods, and its performance exceeds the results obtained by the other methods.
It is noted that PH2 is a small data set, with only 200 cases of data, and the present invention uses 80 cases of data as a training set and 100 cases of data as a test set. In this case Ms RED still performs well.
Experimental results prove that Ms RED is a network with few parameters, good performance and small dependence on training data. Ms RED has a distinct advantage over other methods.
TABLE 1 Performance of different models on the ISIC2018 dataset; bold indicates the best results among the compared methods
Table 2 Performance of different models on the PH2 dataset; bold indicates the best results among the compared methods
Fig. 5 shows segmentation effect graphs of different networks on ISIC2018 and PH2: the first four rows show segmentation results of different networks on the ISIC2018 dataset, and the last four rows show segmentation results on the PH2 dataset.
The invention also provides a skin lesion image segmentation system based on the multi-scale residual encoding and decoding network, comprising an image data processing module and a feature encoding and decoding network module;
the image data processing module is used for preprocessing the image and enhancing the data to obtain more training data;
the feature encoding and decoding network module performs feature extraction, encoding feature fusion and decoding feature fusion on the input feature map based on a multi-scale residual encoding and decoding network; the multi-scale residual encoding and decoding network is based on CS2-Net: the convolution module in CS2-Net is replaced by the multi-scale feature extraction module M2F2, multi-scale context information between different layers in the network encoding stage is integrated by the multi-scale residual encoding feature fusion module MsR-EFM, and multi-scale context information between different layers in the network decoding stage is integrated by the multi-scale residual decoding feature fusion module MsR-DFM;
the multi-scale feature extraction module M2F2 is provided with four branches, each branch being provided with a convolution layer, a multi-scale feature extraction unit and a feature splitting and splicing unit; each branch performs convolution and multi-scale feature extraction on the input feature map to obtain a corresponding feature map, and the feature splicing unit then splices these feature maps to obtain the desired multi-scale feature map; the multi-scale feature maps of different layers in the encoding stage serve as the input of the multi-scale residual encoding feature fusion module MsR-EFM;
the multi-scale residual encoding feature fusion module MsR-EFM is provided with a convolution unit, a pooling unit and a weight distribution unit, wherein the convolution unit is used for changing the channel number of the multi-scale feature map, the pooling unit is used for reducing the size of the feature map, and the weight distribution unit is used for assigning weights to the feature images of different layers;
an up-sampling unit, a convolution unit, a pooling unit and a multi-layer perceptron are arranged in the multi-scale residual decoding feature fusion module MsR-DFM; the up-sampling unit is used for unifying the feature maps of different layers in the decoding stage to the size required for the final output, and the convolution unit is used for capturing the local information of the feature maps and changing their channel number; the pooling unit and the multi-layer perceptron are used for refining the spatial information of the feature map to obtain its spatial attention: maximum pooling, average pooling and soft pooling are performed on the spliced feature F to obtain its spatial attention features, a channel attention operation is performed on these to obtain the features after channel attention, and a sigmoid activation is performed on the result to obtain the final segmentation result.
In addition, the invention can also provide a computer device, which includes a processor and a memory, where the memory is used to store a computer executable program; the processor reads part or all of the computer executable program from the memory and executes it, and when executing part or all of the computer executable program the processor can implement the image segmentation method based on the multi-scale residual encoding and decoding network according to the invention.
In another aspect, the present invention provides a computer-readable storage medium in which a computer program is stored; when the computer program is executed by a processor, the image segmentation method based on the multi-scale residual encoding and decoding network according to the present invention can be implemented.
The computer device may be a notebook computer, a desktop computer or a workstation.
The processor may be a central processing unit (CPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA).
The memory of the invention may be an internal storage unit of a notebook computer, desktop computer or workstation, such as memory or a hard disk; external storage units such as removable hard disks or flash memory cards may also be used.
Computer-readable storage media may include computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. The computer-readable storage medium may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a Solid State Drive (SSD), or an optical disc. The Random Access Memory may include a resistive Random Access Memory (ReRAM) and a Dynamic Random Access Memory (DRAM).
Claims (9)
1. An image segmentation method based on a multi-scale residual encoding and decoding network, characterized in that it comprises:
preprocessing a skin lesion image;
performing data enhancement on the preprocessed image to obtain more image data, namely the input feature map;
performing feature extraction, encoding feature fusion and decoding feature fusion on the input feature map based on a multi-scale residual encoding and decoding network; the multi-scale residual encoding and decoding network is based on CS2-Net: the convolution module in CS2-Net is replaced by the multi-scale feature extraction module M2F2, multi-scale context information between different layers in the network encoding stage is integrated by the multi-scale residual encoding feature fusion module MsR-EFM, and multi-scale context information between different layers in the network decoding stage is integrated by the multi-scale residual decoding feature fusion module MsR-DFM;
the multi-scale feature extraction module M2F2 is provided with four branches, each branch being provided with a convolution layer, a multi-scale feature extraction unit and a feature splitting and splicing unit; each branch performs convolution and multi-scale feature extraction on the input feature map to obtain a corresponding feature map, and the feature splicing unit then splices these feature maps to obtain the desired multi-scale feature map; the multi-scale feature maps of different layers in the encoding stage serve as the input of the multi-scale residual encoding feature fusion module MsR-EFM;
the multi-scale residual encoding feature fusion module MsR-EFM is provided with a convolution unit, a pooling unit and a weight distribution unit, wherein the convolution unit is used for changing the channel number of the multi-scale feature map, the pooling unit is used for reducing the size of the feature map, and the weight distribution unit is used for assigning weights to the feature images of different layers;
an up-sampling unit, a convolution unit, a pooling unit and a multi-layer perceptron are arranged in the multi-scale residual decoding feature fusion module MsR-DFM; the up-sampling unit is used for unifying the feature maps of different layers in the decoding stage to the size required for the final output, and the convolution unit is used for capturing the local information of the feature maps and changing their channel number; the pooling unit and the multi-layer perceptron are used for refining the spatial information of the feature map to obtain its spatial attention: maximum pooling, average pooling and soft pooling are performed on the spliced feature F to obtain its spatial attention features, a channel attention operation is performed on these to obtain the features after channel attention, and a sigmoid activation is performed on the result to obtain the final segmentation result.
2. The image segmentation method based on a multi-scale residual encoding and decoding network as claimed in claim 1, wherein the multi-scale feature extraction module M2F2 changes the number of channels of the input feature map to the number required by the output feature map through convolution, denoting the resulting feature map F1; reduces the number of channels of the input feature map to 1/4 by convolution and then extracts multi-scale features, denoting the resulting feature map F2; reduces the number of channels of the input feature map to 1/8 by convolution and then extracts multi-scale features, denoting the resulting feature map F3; reduces the number of channels of the input feature map to 1/8 by convolution, applies further convolution, and extracts multi-scale features, denoting the resulting feature map F4; splices feature maps F2, F3 and F4 together as F234, then changes the number of channels of F234 to match the output feature, denoted F123; the final output feature of M2F2 is F1 + F123.
3. The image segmentation method based on a multi-scale residual encoding and decoding network as claimed in claim 1, wherein in the multi-scale residual encoding feature fusion module MsR-EFM, the feature images of different layers in the network encoding stage are first resampled to the same size through the convolution unit and the pooling unit to obtain the resampled feature images, and weights are respectively assigned to each layer to obtain the effectively fused multi-scale encoding features.
4. The image segmentation method based on a multi-scale residual encoding and decoding network as claimed in claim 1, wherein in the multi-scale residual decoding feature fusion module MsR-DFM, the feature images of different layers are first resampled to the same size by bilinear-interpolation upsampling, convolution processing is then performed, and the feature images obtained after convolution are spliced along the channel dimension to obtain the spliced feature F.
5. The image segmentation method based on a multi-scale residual encoding and decoding network as claimed in claim 1, wherein when the skin lesion image is preprocessed, the acquired original skin lesion image is resampled to a preset pixel size and used as the input of the neural network.
6. The image segmentation method based on a multi-scale residual encoding and decoding network as claimed in claim 1, wherein during data enhancement four random operations are applied to the preprocessed images: random rotation between -20 and 20 degrees, horizontal flipping, vertical flipping, and random cropping, thereby obtaining more training data.
7. A skin lesion image segmentation system based on a multi-scale residual encoding and decoding network, characterized by comprising an image data processing module and a feature encoding and decoding network module;
the image data processing module is used for preprocessing the image and enhancing the data to obtain more training data;
the feature encoding and decoding network module performs feature extraction, encoding feature fusion and decoding feature fusion on the input feature map based on a multi-scale residual encoding and decoding network; the multi-scale residual encoding and decoding network is based on CS2-Net: the convolution module in CS2-Net is replaced by the multi-scale feature extraction module M2F2, multi-scale context information between different layers in the network encoding stage is integrated by the multi-scale residual encoding feature fusion module MsR-EFM, and multi-scale context information between different layers in the network decoding stage is integrated by the multi-scale residual decoding feature fusion module MsR-DFM;
the multi-scale feature extraction module M2F2 is provided with four branches, each branch being provided with a convolution layer, a multi-scale feature extraction unit and a feature splitting and splicing unit; each branch performs convolution and multi-scale feature extraction on the input feature map to obtain a corresponding feature map, and the feature splicing unit then splices these feature maps to obtain the desired multi-scale feature map; the multi-scale feature maps of different layers in the encoding stage serve as the input of the multi-scale residual encoding feature fusion module MsR-EFM;
the multi-scale residual encoding feature fusion module MsR-EFM is provided with a convolution unit, a pooling unit and a weight distribution unit, wherein the convolution unit is used for changing the channel number of the multi-scale feature map, the pooling unit is used for reducing the size of the feature map, and the weight distribution unit is used for assigning weights to the feature images of different layers;
an up-sampling unit, a convolution unit, a pooling unit and a multi-layer perceptron are arranged in the multi-scale residual decoding feature fusion module MsR-DFM; the up-sampling unit is used for unifying the feature maps of different layers in the decoding stage to the size required for the final output, and the convolution unit is used for capturing the local information of the feature maps and changing their channel number; the pooling unit and the multi-layer perceptron are used for refining the spatial information of the feature map to obtain its spatial attention: maximum pooling, average pooling and soft pooling are performed on the spliced feature F to obtain its spatial attention features, a channel attention operation is performed on these to obtain the features after channel attention, and a sigmoid activation is performed on the result to obtain the final segmentation result.
8. A computer device, comprising a processor and a memory, wherein the memory is used for storing a computer executable program, the processor reads the computer executable program from the memory and executes it, and when executing the computer executable program the processor can implement the image segmentation method based on the multi-scale residual encoding and decoding network according to any one of claims 1 to 6.
9. A computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and when executed by a processor, the computer program can implement the image segmentation method based on the multi-scale residual encoding and decoding network according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111284039.7A CN114004811A (en) | 2021-11-01 | 2021-11-01 | Image segmentation method and system based on multi-scale residual encoding and decoding network
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111284039.7A CN114004811A (en) | 2021-11-01 | 2021-11-01 | Image segmentation method and system based on multi-scale residual encoding and decoding network
Publications (1)
Publication Number | Publication Date |
---|---|
CN114004811A true CN114004811A (en) | 2022-02-01 |
Family
ID=79926170
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111284039.7A Pending CN114004811A (en) | 2021-11-01 | 2021-11-01 | Image segmentation method and system based on multi-scale residual error coding and decoding network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114004811A (en) |
-
2021
- 2021-11-01 CN CN202111284039.7A patent/CN114004811A/en active Pending
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114510469A (en) * | 2022-02-08 | 2022-05-17 | 中国电力科学研究院有限公司 | Method, device, equipment and medium for identifying bad data of power system |
CN114565770A (en) * | 2022-03-23 | 2022-05-31 | 中南大学 | Image segmentation method and system based on edge auxiliary calculation and mask attention |
CN114565628A (en) * | 2022-03-23 | 2022-05-31 | 中南大学 | Image segmentation method and system based on boundary perception attention |
CN114565770B (en) * | 2022-03-23 | 2022-09-13 | 中南大学 | Image segmentation method and system based on edge auxiliary calculation and mask attention |
CN114565628B (en) * | 2022-03-23 | 2022-09-13 | 中南大学 | Image segmentation method and system based on boundary perception attention |
CN115035354A (en) * | 2022-08-12 | 2022-09-09 | 江西省水利科学院 | Reservoir water surface floater target detection method based on improved YOLOX |
CN116823852A (en) * | 2023-06-09 | 2023-09-29 | 苏州大学 | Strip-shaped skin scar image segmentation method and system based on convolutional neural network |
CN116823852B (en) * | 2023-06-09 | 2024-07-19 | 苏州大学 | Strip-shaped skin scar image segmentation method and system based on convolutional neural network |
CN116844051A (en) * | 2023-07-10 | 2023-10-03 | 贵州师范大学 | Remote sensing image building extraction method integrating ASPP and depth residual |
CN116844051B (en) * | 2023-07-10 | 2024-02-23 | 贵州师范大学 | Remote sensing image building extraction method integrating ASPP and depth residual |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114004811A (en) | | Image segmentation method and system based on multi-scale residual encoding and decoding network | |
CN113077471B (en) | Medical image segmentation method based on U-shaped network | |
Dai et al. | Ms RED: A novel multi-scale residual encoding and decoding network for skin lesion segmentation | |
CN115482241A (en) | Cross-modal double-branch complementary fusion image segmentation method and device | |
CN111461232A (en) | Nuclear magnetic resonance image classification method based on multi-strategy batch type active learning | |
CN112862830B (en) | Multi-mode image segmentation method, system, terminal and readable storage medium | |
CN106408001A (en) | Rapid area-of-interest detection method based on depth kernelized hashing | |
CN113205538A (en) | Blood vessel image segmentation method and device based on CRDNet | |
CN113344933B (en) | Glandular cell segmentation method based on multi-level feature fusion network | |
CN112884788B (en) | Cup optic disk segmentation method and imaging method based on rich context network | |
CN115375711A (en) | Image segmentation method of global context attention network based on multi-scale fusion | |
Shan et al. | SCA-Net: A spatial and channel attention network for medical image segmentation | |
CN111860528A (en) | Image segmentation model based on improved U-Net network and training method | |
CN113012163A (en) | Retina blood vessel segmentation method, equipment and storage medium based on multi-scale attention network | |
CN115620010A (en) | Semantic segmentation method for RGB-T bimodal feature fusion | |
CN112381846A (en) | Ultrasonic thyroid nodule segmentation method based on asymmetric network | |
CN110570394A (en) | medical image segmentation method, device, equipment and storage medium | |
CN112288749A (en) | Skull image segmentation method based on depth iterative fusion depth learning model | |
CN115526829A (en) | Honeycomb lung focus segmentation method and network based on ViT and context feature fusion | |
CN112634308B (en) | Nasopharyngeal carcinoma target area and organ-at-risk delineating method based on different receptive fields | |
Qiu | A new multilevel feature fusion network for medical image segmentation | |
CN114387282A (en) | Accurate automatic segmentation method and system for medical image organs | |
CN117726872A (en) | Lung CT image classification method based on multi-view multi-task feature learning | |
Suo et al. | Cross-level collaborative context-aware framework for medical image segmentation | |
CN110992320B (en) | Medical image segmentation network based on double interleaving |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |