CN114821580A - Noise-containing image segmentation method by stage-by-stage merging with denoising module - Google Patents
Noise-containing image segmentation method by stage-by-stage merging with denoising module
- Publication number
- CN114821580A (application CN202210497742.4A)
- Authority
- CN
- China
- Prior art keywords
- feature map
- segmentation
- stage
- denoising
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V20/70—Scenes; Scene-specific elements: Labelling scene content, e.g. deriving syntactic or semantic representations
- G06N3/045—Neural networks: Architecture; Combinations of networks
- G06N3/08—Neural networks: Learning methods
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/40—Extraction of image or video features
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of extracted features
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The invention provides a method for segmenting noisy images that incorporates a denoising module in stages. A noisy image is first input into a backbone network, and feature maps of four stages are extracted through convolution operations; a preliminary semantic segmentation result is then obtained by applying a dual attention mechanism to the feature map extracted at the fourth stage. On this basis, exploiting the feature differences between the stages of the backbone network, multi-stage semantic features are iteratively fused to form a mode in which segmentation assists denoising and denoising assists segmentation. Finally, the three semantic segmentation results thus obtained are combined into the final segmentation result, and the parameters are further optimized through a mixed cross-entropy loss. The invention uses collaborative denoising and segmentation to improve the semantic segmentation accuracy of noisy images, and solves the problem that the denoising step of existing semantic segmentation methods for noisy images loses semantic information, which degrades the accuracy of subsequent target class division and the completeness of target contour segmentation.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to a noisy-image segmentation method that incorporates a denoising module in stages.
Background
The goal of semantic segmentation is to determine the class of each pixel (e.g., background, person, or car) and thereby convert parts of the original image into a mask with a highlighted region of interest. Many advanced segmentation methods have been applied in fields such as autonomous driving, scene parsing, object detection, and human-computer interaction, and deep neural networks such as PSANet, DenseASPP, and DANet have recently achieved significant success on the semantic segmentation task. However, the success of these networks is premised on a high-quality training data set, i.e., clean, noise-free images.

In practical applications, due to the environment and to focus failure or camera shake during shooting, captured images often carry noise of varying degrees, such as Gaussian noise, shot noise, and thermal noise. This noise is not controllable; even sophisticated capture equipment cannot control the environment in which a real image is taken. Noise tends to cover small textures of the image and reduces the capability of a semantic segmentation model: when a noisy data set is used to train a current mainstream semantic segmentation model, the segmentation accuracy drops significantly.

To reduce the interference of noise with semantic information, the most direct approach is to complete an image denoising task before the semantic segmentation task, i.e., a serial method of first denoising and then segmenting. Although this removes the noise, it correspondingly loses some fine texture semantics, and the result still falls short of a segmentation model trained on clean images. When DANet is trained with a noisy data set, the noise severely disturbs the image texture structure and affects the boundaries of targets, so that context information cannot be acquired accurately, target regions are wrongly divided, and the semantic segmentation accuracy drops markedly. Adding a denoising module in front of DANet forms a simple serial architecture; although it weakens the damage that noise does to target texture structures, it inevitably loses part of the texture information, which causes mislocalization in the subsequent semantic segmentation and incomplete segmentation of target regions.
Disclosure of Invention
In view of the above, the present invention aims to provide a noisy-image segmentation method that incorporates a denoising module in stages. The method improves the semantic segmentation accuracy of noisy images through collaborative denoising and segmentation, and solves the problem that the denoising step of existing noisy-image semantic segmentation methods loses semantic information, which degrades the accuracy of subsequent target class division and the completeness of target contour segmentation.
To achieve the above purpose, the invention adopts the following technical scheme: a noisy-image segmentation method incorporating a denoising module in stages, comprising the following steps:
Step S1: randomly superimpose zero-mean Gaussian noise with standard deviation drawn from [0, 30] on the clean PASCAL VOC 2012 data set {y^(1), y^(2), ..., y^(m)} to obtain the noisy-image training set {x^(1), x^(2), ..., x^(m)} (a code sketch of this step follows step S7 below);
Step S2: input the noisy image x^(i) into the backbone network ResNet50 and extract the features of each stage sequentially through Stage1, Stage2, Stage3, and Stage4;
Step S3: input the feature map f4 generated by Stage4 of the backbone network into the dual attention module (DAM), refine the features, and output the preliminary segmentation result z1;
Step S4: input the feature map f3 generated by Stage3 of the backbone network and the preliminary segmentation result z1 into the stage-collaborative segmentation-denoising block SDBSC; first combine the stage features of the backbone, the segmentation result, and the multi-scale features of the denoising task through a linear transformation formula to generate a new feature map, then generate a new segmentation result z2 through the segmentation module SSM inside the SDBSC;
Step S5: input the feature map f2 generated by Stage2 of the backbone network and the segmentation result z2 into the stage-collaborative segmentation-denoising block SDBSC, repeat step S4 to generate a new segmentation result z3, and compute the mean square error loss L_d between the denoised image and the clean image y^(i); L_d can be expressed as

L_d = (1/n) Σ_{i=1..n} (y_i - ŷ_i)²

where y_i denotes the Ground Truth value of pixel i, ŷ_i denotes the estimate of pixel i, and n denotes the number of pixels;
Step S6: finally, superimpose the staged segmentation results z1, z2, and z3 to generate the multi-stage feature-fusion semantic segmentation result ẑ;
Step S7: compute the mixed cross-entropy loss L_S between the fused result ẑ and the segmentation label z, expressed as L_S = L_s1 + L_s2, where L_s1 denotes the cross-entropy loss:

L_s1 = -(1/p) Σ_{i=1..p} y_i log ŷ_i

where p denotes the number of pixels of a picture, y_i denotes the Ground Truth class of pixel i, and ŷ_i denotes the probability estimate of pixel i; L_s2 denotes the mIoU loss:

L_s2 = 1 - |X ∩ Y| / |X ∪ Y|

where X denotes the predicted pixel set and Y denotes the GT pixel set.
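For illustration, the noise synthesis of step S1 can be sketched as follows in PyTorch. This is a minimal sketch, not the invention's reference implementation: the tensor layout, the 0-255 value range, and drawing one standard deviation per image are assumptions.

```python
import torch

def add_gaussian_noise(clean: torch.Tensor, sigma_max: float = 30.0) -> torch.Tensor:
    """Superimpose zero-mean Gaussian noise with a standard deviation drawn
    uniformly from [0, sigma_max] on a clean image tensor (values in [0, 255])."""
    sigma = torch.empty(1).uniform_(0.0, sigma_max).item()
    noisy = clean + torch.randn_like(clean) * sigma
    return noisy.clamp(0.0, 255.0)

# Building the noisy training set {x(i)} from the clean set {y(i)}:
# x = add_gaussian_noise(y)   # one draw of sigma per training image
```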
In a preferred embodiment: the method for segmenting the noisy image by being fused into the denoising module in stages according to claim 1, wherein: in S3, the method further includes:
Step S31: reshape the feature map f4 (size C×H×W) generated by Stage4 of the backbone network into the feature map A (size C×HW) and the feature map B (size HW×C); multiply B with A and pass the product through a softmax layer to obtain the spatial attention map M (size HW×HW), which can be expressed as

M_ji = exp(B_i · A_j) / Σ_{i=1..HW} exp(B_i · A_j)

where M_ji represents the relation between the i-th position and the j-th position in the feature map, H and W denote the height and width of f4, B_i denotes the i-th position of matrix B, and A_j denotes the j-th position of matrix A. Reshape f4 (size C×H×W) into C (size C×HW), multiply it with M, reshape the product into a feature map P (size C×H×W) consistent with the original feature map, and multiply it by a scale parameter λ, initialized to 0 and gradually assigned more weight through learning:

P_j = λ Σ_{i=1..HW} (M_ji · C_i) + f4_j

where P_j denotes the j-th position of the feature map P and C_i denotes the i-th position of the matrix C;
Step S32: reshape the feature map f4 (size C×H×W) generated by Stage4 of the backbone network into the feature map A (size C×HW) and the feature map B (size HW×C); multiply A with B and pass the product through a softmax layer to obtain the channel attention map N (size C×C), which can be expressed as

N_ji = exp(A_i · B_j) / Σ_{i=1..C} exp(A_i · B_j)

where N_ji represents the association between the i-th channel and the j-th channel in the feature map. Reshape f4 (size C×H×W) into C (size C×HW), multiply it with the matrix N, reshape the product into a feature map Q (size C×H×W) consistent with the original feature map, and multiply it by a scale parameter μ initialized to 0:

Q_j = μ Σ_{i=1..C} (N_ji · C_i) + f4_j

where Q_j denotes the j-th position of the feature map Q.
In a preferred embodiment: in S4, the method further includes:
Step S41: apply to the input image R (size 3×H×W) multiple convolutions with kernel size 3 and padding 1, with average-pooling downsampling, to obtain the feature map R (size 4C×H/4×W/4), thereby acquiring the low-level features of the noisy image;
Step S42: pass the low-level features of the noisy image through dilated convolutions with different dilation rates (rate 3, rate 6, and rate 9) and fuse the multi-scale information to obtain a new feature map R (size 4C×H/4×W/4); the linear transformation formula is then

output = α(stage, S_out) ⊙ x + β(stage, S_out)

where α(·) and β(·) are linear transformation functions, x denotes the dilated-convolution feature map R (size 4C×H/4×W/4), stage denotes the feature map output by the backbone network, S_out denotes the output of the previous-stage semantic segmentation, and output is the fused feature map;
Step S43: in the decoding stage of the denoising branch, upsampling is performed by deconvolution, and the linear transformation formula of step S42 is applied in the last step of decoding.
Compared with the prior art, the invention has the following beneficial effects:
1) by introducing a multi-scale design and exploiting dilated convolution, the problem of small image textures being covered by noise is better alleviated; moreover, by converting the staged segmentation into guidance for denoising, the linear transformation effectively combines the semantic information of each stage, the low-level features of the denoising branch, and the segmentation results, so that semantic segmentation promotes image denoising;

2) the high-level and low-level semantic features extracted at each stage of the backbone network enhance the semantic information of target contours, realizing staged denoising and segmentation of the noisy image and assisting the denoising and segmentation tasks from different levels.
Drawings
Fig. 1 is a flowchart of a noisy image segmentation method by stage-wise merging into a denoising module in a preferred embodiment of the present invention.
FIG. 2 is a comparison of the present method with other noisy-image semantic segmentation algorithms in a preferred embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit example embodiments according to the present application. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, and it should be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in fig. 1, the method for segmenting a noisy image by incorporating a denoising module in stages according to the present invention is implemented by the following steps:
Step S1: randomly superimpose zero-mean Gaussian noise with standard deviation drawn from [0, 30] on the clean PASCAL VOC 2012 data set {y^(1), y^(2), ..., y^(m)} to obtain the noisy-image training set {x^(1), x^(2), ..., x^(m)};
Step S2: input the noisy image x^(i) into the backbone network ResNet50 and extract the features of each stage sequentially through Stage1, Stage2, Stage3, and Stage4;
Step S3: input the feature map f4 generated by Stage4 of the backbone network into the Dual Attention Module (DAM), refine the features, and output the preliminary segmentation result z1;
Step S4: input the feature map f3 generated by Stage3 of the backbone network and the preliminary segmentation result z1 into the stage-collaborative Segmentation-Denoising Block (SDBSC). First combine the stage features of the backbone, the segmentation result, and the multi-scale features of the denoising task through a linear transformation formula to generate a new feature map, then generate a new segmentation result z2 through the segmentation module SSM inside the SDBSC;
Step S5: input the feature map f2 generated by Stage2 of the backbone network and the segmentation result z2 into the stage-collaborative Segmentation-Denoising Block (SDBSC), repeat step S4 to generate a new segmentation result z3, and compute the mean square error loss L_d between the denoised image and the clean image y^(i); L_d can be expressed as

L_d = (1/n) Σ_{i=1..n} (y_i - ŷ_i)²

where y_i denotes the Ground Truth value of pixel i, ŷ_i denotes the estimate of pixel i, and n denotes the number of pixels;
Step S6: finally, superimpose the staged segmentation results z1, z2, and z3 to generate the multi-stage feature-fusion semantic segmentation result ẑ;
Step S7: compute the mixed cross-entropy loss L_S between the fused result ẑ and the segmentation label z, expressed as L_S = L_s1 + L_s2, where L_s1 denotes the cross-entropy loss:

L_s1 = -(1/p) Σ_{i=1..p} y_i log ŷ_i

where p denotes the number of pixels of a picture, y_i denotes the Ground Truth class of pixel i, and ŷ_i denotes the probability estimate of pixel i; L_s2 denotes the mIoU loss:

L_s2 = 1 - |X ∩ Y| / |X ∪ Y|

where X denotes the predicted pixel set and Y denotes the GT pixel set (a code sketch of L_d and L_S follows).
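For illustration, the denoising loss L_d of step S5 and the mixed loss L_S of step S7 can be sketched as follows in PyTorch. The unweighted sum L_S = L_s1 + L_s2 and the soft (probability-based) form of the mIoU term are assumptions of this sketch; the patent does not publish mixing weights or the exact discretization.

```python
import torch
import torch.nn.functional as F

def denoise_loss(denoised: torch.Tensor, clean: torch.Tensor) -> torch.Tensor:
    """L_d of step S5: mean square error between the denoised estimate and
    the clean image, averaged over all pixels."""
    return F.mse_loss(denoised, clean)

def mixed_segmentation_loss(logits: torch.Tensor, label: torch.Tensor,
                            num_classes: int) -> torch.Tensor:
    """L_S of step S7: cross entropy plus a soft mIoU term (assumed unweighted sum)."""
    # L_s1: per-pixel cross entropy; logits (B, K, H, W), label (B, H, W), dtype long.
    ce = F.cross_entropy(logits, label)
    # L_s2: 1 - |X ∩ Y| / |X ∪ Y|, computed softly from class probabilities.
    probs = F.softmax(logits, dim=1)
    onehot = F.one_hot(label, num_classes).permute(0, 3, 1, 2).float()
    inter = (probs * onehot).sum(dim=(2, 3))
    union = (probs + onehot - probs * onehot).sum(dim=(2, 3))
    miou_loss = 1.0 - (inter / union.clamp(min=1e-6)).mean()
    return ce + miou_loss
```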
In step S3, the method further includes the steps of:
Step S31: reshape the feature map f4 (size C×H×W) generated by Stage4 of the backbone network into the feature map A (size C×HW) and the feature map B (size HW×C). Multiply B with A and pass the product through a softmax layer to obtain the spatial attention map M (size HW×HW), which can be expressed as

M_ji = exp(B_i · A_j) / Σ_{i=1..HW} exp(B_i · A_j)

where M_ji represents the relation between the i-th position and the j-th position in the feature map, H and W denote the height and width of f4, B_i denotes the i-th position of matrix B, and A_j denotes the j-th position of matrix A. Reshape f4 (size C×H×W) into C (size C×HW), multiply it with M, reshape the product into a feature map P (size C×H×W) consistent with the original feature map, and multiply it by a scale parameter λ, initialized to 0 and gradually assigned more weight through learning:

P_j = λ Σ_{i=1..HW} (M_ji · C_i) + f4_j

where P_j denotes the j-th position of the feature map P and C_i denotes the i-th position of the matrix C;
Step S32: reshape the feature map f4 (size C×H×W) generated by Stage4 of the backbone network into the feature map A (size C×HW) and the feature map B (size HW×C). Multiply A with B and pass the product through a softmax layer to obtain the channel attention map N (size C×C), which can be expressed as

N_ji = exp(A_i · B_j) / Σ_{i=1..C} exp(A_i · B_j)

where N_ji represents the association between the i-th channel and the j-th channel in the feature map. Reshape f4 (size C×H×W) into C (size C×HW), multiply it with the matrix N, reshape the product into a feature map Q (size C×H×W) consistent with the original feature map, and multiply it by a scale parameter μ initialized to 0:

Q_j = μ Σ_{i=1..C} (N_ji · C_i) + f4_j

where Q_j denotes the j-th position of the feature map Q (a code sketch of steps S31 and S32 follows).
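Steps S31 and S32 correspond closely to the position and channel attention of DANet; a minimal PyTorch sketch is given below. The final element-wise addition of P and Q mirrors step 5 of the embodiment further down; all variable names are illustrative, and the segmentation head that maps the refined features to z1 is omitted.

```python
import torch
import torch.nn as nn

class DualAttentionModule(nn.Module):
    """DAM sketch (steps S31/S32): position attention M (HW x HW) and channel
    attention N (C x C), each weighted by a learnable scalar initialized to 0."""
    def __init__(self):
        super().__init__()
        self.lam = nn.Parameter(torch.zeros(1))  # λ of step S31
        self.mu = nn.Parameter(torch.zeros(1))   # μ of step S32

    def forward(self, f4: torch.Tensor) -> torch.Tensor:
        b, c, h, w = f4.shape
        a = f4.view(b, c, h * w)                 # A: C x HW
        bt = a.permute(0, 2, 1)                  # B: HW x C
        # S31: M = softmax(B · A), then P_j = λ Σ_i M_ji C_i + f4_j
        m = torch.softmax(torch.bmm(bt, a), dim=-1)           # (b, HW, HW)
        p = torch.bmm(a, m.permute(0, 2, 1)).view(b, c, h, w)
        p = self.lam * p + f4
        # S32: N = softmax(A · B), then Q_j = μ Σ_i N_ji C_i + f4_j
        n = torch.softmax(torch.bmm(a, bt), dim=-1)           # (b, C, C)
        q = torch.bmm(n, a).view(b, c, h, w)
        q = self.mu * q + f4
        return p + q  # fused refinement; the head producing z1 is omitted
```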
In step S4, the method further includes the steps of:
Step S41: apply to the input image R (size 3×H×W) multiple convolutions with kernel size 3 and padding 1, with average-pooling downsampling, to obtain the feature map R (size 4C×H/4×W/4), thereby acquiring the low-level features of the noisy image;
Step S42: pass the low-level features of the noisy image through dilated convolutions with different dilation rates (rate 3, rate 6, and rate 9) and fuse the multi-scale information to obtain a new feature map R (size 4C×H/4×W/4); the linear transformation formula is then

output = α(stage, S_out) ⊙ x + β(stage, S_out)

where α(·) and β(·) are linear transformation functions, x denotes the dilated-convolution feature map R (size 4C×H/4×W/4), stage denotes the feature map output by the backbone network, S_out denotes the output of the previous-stage semantic segmentation, and output is the fused feature map;
Step S43: in the decoding stage of the denoising branch, upsampling is performed by deconvolution, and the linear transformation formula of step S42 is applied in the last step of decoding (a code sketch of steps S41-S43 follows).
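A minimal PyTorch sketch of the SDBSC encoder of steps S41/S42 is given below (the deconvolution decoder of step S43 is omitted). The linear transformation is implemented as the modulation written out in step S42 above; its exact form, the channel counts, and the bilinear resizing of the conditioning inputs are all assumptions of this sketch rather than the invention's reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SDBSCEncoder(nn.Module):
    """Sketch of steps S41/S42: 3x3 convolutions (padding 1) with average-pooling
    downsampling to H/4 x W/4, parallel dilated convolutions (rates 3, 6, 9)
    fused by summation, then output = alpha(stage, S_out) * x + beta(stage, S_out)."""
    def __init__(self, in_ch: int = 3, ch: int = 16, cond_ch: int = 64):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.AvgPool2d(2),
            nn.Conv2d(ch, 4 * ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.AvgPool2d(2),
        )
        self.branches = nn.ModuleList(
            nn.Conv2d(4 * ch, 4 * ch, 3, padding=r, dilation=r) for r in (3, 6, 9)
        )
        # alpha(.) and beta(.): 1x1 convolutions over the concatenated backbone
        # stage feature and previous-stage segmentation output (cond_ch channels
        # in total, an assumed channel budget).
        self.alpha = nn.Conv2d(cond_ch, 4 * ch, 1)
        self.beta = nn.Conv2d(cond_ch, 4 * ch, 1)

    def forward(self, image, stage_feat, seg_out):
        x = self.stem(image)                             # R: 4C x H/4 x W/4
        x = sum(branch(x) for branch in self.branches)   # multi-scale fusion
        cond = torch.cat([stage_feat, seg_out], dim=1)   # stage + S_out
        cond = F.interpolate(cond, size=x.shape[2:], mode="bilinear",
                             align_corners=False)
        return self.alpha(cond) * x + self.beta(cond)    # fused feature map
```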
The following is a specific embodiment of the present invention.
The application of the algorithm provided by the invention to semantic segmentation of noisy images comprises the following specific steps:
1. Randomly superimpose zero-mean Gaussian noise with standard deviation drawn from [0, 30] on the clean PASCAL VOC 2012 data set {y^(1), y^(2), ..., y^(m)} to obtain the noisy-image training set {x^(1), x^(2), ..., x^(m)};
2. Input the noisy image x^(i) into the backbone network ResNet50 and extract the features of each stage sequentially through Stage1, Stage2, Stage3, and Stage4;
3. Reshape the feature map f4 (size C×H×W) generated by Stage4 of the backbone network into the feature map A (size C×HW) and the feature map B (size HW×C); multiply B with A and compute the spatial attention map M (size HW×HW) through a softmax layer;
4. Reshape the feature map f4 (size C×H×W) generated by Stage4 of the backbone network into the feature map A (size C×HW) and the feature map B (size HW×C); multiply A with B and compute the channel attention map N (size C×C) through a softmax layer;
5. Multiply the spatial attention map M (size HW×HW) and the channel attention map N (size C×C) respectively with A (size C×HW), reshape, and add the two results to obtain the preliminary segmentation result z1;
6. Input the feature map f3 generated by Stage3 of the backbone network and the preliminary segmentation result z1 into the stage-collaborative Segmentation-Denoising Block (SDBSC);
7. Apply to the input image R (size 3×H×W) multiple convolutions with kernel size 3 and padding 1, with average-pooling downsampling, to obtain the feature map R (size 4C×H/4×W/4), i.e., the low-level features of the noisy image;
8. Pass the low-level features of the noisy image through dilated convolutions with different dilation rates (rate 3, rate 6, and rate 9), fuse the multi-scale information into a new feature map R (size 4C×H/4×W/4), and obtain the fused feature map through the linear transformation formula of step S42;
9. In the decoding stage of the denoising branch, upsample by deconvolution and apply the linear transformation formula in the last step of decoding to obtain the preliminarily denoised image.
10. Pass the denoised image through the backbone network ResNet50 and repeat steps 3-5 to obtain the segmentation result z2.
11. Input the feature map f2 generated by Stage2 of the backbone network and the segmentation result z2 into the stage-collaborative Segmentation-Denoising Block (SDBSC), and repeat steps 7-10 to obtain a new segmentation result z3 and the denoised image.
12. Superimpose the staged segmentation results z1, z2, and z3 to generate the multi-stage feature-fusion semantic segmentation result ẑ (a code sketch of the staged pipeline follows).
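Putting the pieces together, the backbone stage extraction of step 2 and the superposition of step 12 might be sketched as follows, assuming a recent torchvision. Element-wise addition of the staged logits and bilinear resizing to a common resolution are assumptions, since the patent does not specify the superposition operator.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

backbone = resnet50(weights=None)  # ResNet50 backbone; Stage1-Stage4 = layer1-layer4

def stage_features(x: torch.Tensor):
    """Step 2: extract the per-stage feature maps f1..f4 from ResNet50."""
    x = backbone.relu(backbone.bn1(backbone.conv1(x)))
    x = backbone.maxpool(x)
    f1 = backbone.layer1(x)
    f2 = backbone.layer2(f1)
    f3 = backbone.layer3(f2)
    f4 = backbone.layer4(f3)
    return f1, f2, f3, f4

def fuse_staged_results(z1, z2, z3):
    """Step 12: superimpose the staged segmentation logits z1, z2, z3
    at a common resolution to form the fused prediction."""
    size = z1.shape[2:]
    z2 = F.interpolate(z2, size=size, mode="bilinear", align_corners=False)
    z3 = F.interpolate(z3, size=size, mode="bilinear", align_corners=False)
    return z1 + z2 + z3
```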
Figure 2 qualitatively compares the algorithm of this embodiment with other methods on the PASCAL VOC 2012 data set with a Gaussian noise standard deviation of 20. As can be seen from the three columns of fig. 2(c), (d), and (e), even after a denoising module is added the semantic segmentation quality remains unsatisfactory: target regions are misidentified and the boundaries of targets are neither accurate nor smooth. Fig. 2(f) shows the result of DMS; the misidentification problem is improved, but boundary areas are still poorly recognized. Fig. 2(g) shows the result of the present algorithm, which clearly segments well both in multi-class scenes and on small targets. Moreover, as shown by the images in the 4th and 5th rows, the present algorithm achieves a marked improvement on target boundaries.
Claims (3)
1. A noise-containing image segmentation method incorporating a denoising module in stages, characterized by comprising the following steps:
Step S1: randomly superimpose zero-mean Gaussian noise with standard deviation drawn from [0, 30] on the clean PASCAL VOC 2012 data set {y^(1), y^(2), ..., y^(m)} to obtain the noisy-image training set {x^(1), x^(2), ..., x^(m)};
Step S2: input the noisy image x^(i) into the backbone network ResNet50 and extract the features of each stage sequentially through Stage1, Stage2, Stage3, and Stage4;
Step S3: input the feature map f4 generated by Stage4 of the backbone network into the dual attention module DAM, refine the features, and output the preliminary segmentation result z1;
Step S4: input the feature map f3 generated by Stage3 of the backbone network and the preliminary segmentation result z1 into the stage-collaborative segmentation-denoising block SDBSC; first combine the stage features of the backbone, the segmentation result, and the multi-scale features of the denoising task through a linear transformation formula to generate a new feature map, then generate a new segmentation result z2 through the segmentation module SSM inside the SDBSC;
Step S5: input the feature map f2 generated by Stage2 of the backbone network and the segmentation result z2 into the stage-collaborative segmentation-denoising block SDBSC, repeat step S4 to generate a new segmentation result z3, and compute the mean square error loss L_d between the denoised image and the clean image y^(i); L_d is expressed as

L_d = (1/n) Σ_{i=1..n} (y_i - ŷ_i)²

where y_i denotes the Ground Truth value of pixel i, ŷ_i denotes the estimate of pixel i, and n denotes the number of pixels;
Step S6: finally, superimpose the staged segmentation results z1, z2, and z3 to generate the multi-stage feature-fusion semantic segmentation result ẑ;
Step S7: compute the mixed cross-entropy loss L_S between the fused result ẑ and the segmentation label z, expressed as L_S = L_s1 + L_s2, where L_s1 denotes the cross-entropy loss:

L_s1 = -(1/p) Σ_{i=1..p} y_i log ŷ_i

where p denotes the number of pixels of a picture, y_i denotes the Ground Truth class of pixel i, and ŷ_i denotes the probability estimate of pixel i; L_s2 denotes the mIoU loss:

L_s2 = 1 - |X ∩ Y| / |X ∪ Y|

where X denotes the predicted pixel set and Y denotes the GT pixel set.
2. The method for segmenting a noisy image by incorporating a denoising module in stages according to claim 1, wherein step S3 further includes:
Step S31: reshape the feature map f4 (size C×H×W) generated by Stage4 of the backbone network into the feature map A (size C×HW) and the feature map B (size HW×C); multiply B with A and pass the product through a softmax layer to obtain the spatial attention map M (size HW×HW), expressed as

M_ji = exp(B_i · A_j) / Σ_{i=1..HW} exp(B_i · A_j)

where M_ji represents the relation between the i-th position and the j-th position in the feature map, H and W denote the height and width of f4, B_i denotes the i-th position of matrix B, and A_j denotes the j-th position of matrix A; reshape f4 (size C×H×W) into C (size C×HW), multiply it with M, reshape the product into a feature map P (size C×H×W) consistent with the original feature map, and multiply it by a scale parameter λ, initialized to 0 and gradually assigned more weight through learning:

P_j = λ Σ_{i=1..HW} (M_ji · C_i) + f4_j

where P_j denotes the j-th position of the feature map P and C_i denotes the i-th position of the matrix C;
Step S32: reshape the feature map f4 (size C×H×W) generated by Stage4 of the backbone network into the feature map A (size C×HW) and the feature map B (size HW×C); multiply A with B and pass the product through a softmax layer to obtain the channel attention map N (size C×C), expressed as

N_ji = exp(A_i · B_j) / Σ_{i=1..C} exp(A_i · B_j)

where N_ji represents the association between the i-th channel and the j-th channel in the feature map; reshape f4 (size C×H×W) into C (size C×HW), multiply it with the matrix N, reshape the product into a feature map Q (size C×H×W) consistent with the original feature map, and multiply it by a scale parameter μ initialized to 0:

Q_j = μ Σ_{i=1..C} (N_ji · C_i) + f4_j

where Q_j denotes the j-th position of the feature map Q.
3. The method for segmenting a noisy image by incorporating a denoising module in stages according to claim 1, wherein step S4 further includes:
Step S41: apply to the input image R (size 3×H×W) multiple convolutions with kernel size 3 and padding 1, with average-pooling downsampling, to obtain the feature map R (size 4C×H/4×W/4), thereby acquiring the low-level features of the noisy image;
Step S42: pass the low-level features of the noisy image through dilated convolutions with different dilation rates (rate 3, rate 6, and rate 9) and fuse the multi-scale information to obtain a new feature map R (size 4C×H/4×W/4); the linear transformation formula is then

output = α(stage, S_out) ⊙ x + β(stage, S_out)

where α(·) and β(·) are linear transformation functions, x denotes the dilated-convolution feature map R (size 4C×H/4×W/4), stage denotes the feature map output by the backbone network, S_out denotes the output of the previous-stage semantic segmentation, and output is the fused feature map;
Step S43: in the decoding stage of the denoising branch, upsampling is performed by deconvolution, and the linear transformation formula of step S42 is applied in the last step of decoding.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210497742.4A CN114821580A (en) | 2022-05-09 | 2022-05-09 | Noise-containing image segmentation method by stage-by-stage merging with denoising module |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114821580A true CN114821580A (en) | 2022-07-29 |
Family
ID=82513898
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210497742.4A Pending CN114821580A (en) | 2022-05-09 | 2022-05-09 | Noise-containing image segmentation method by stage-by-stage merging with denoising module |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114821580A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110084234A (en) * | 2019-03-27 | 2019-08-02 | 东南大学 | A kind of sonar image target identification method of Case-based Reasoning segmentation |
WO2022083026A1 (en) * | 2020-10-21 | 2022-04-28 | 华中科技大学 | Ultrasound image denoising model establishing method and ultrasound image denoising method |
CN112819705A (en) * | 2021-01-13 | 2021-05-18 | 西安交通大学 | Real image denoising method based on mesh structure and long-distance correlation |
CN113808032A (en) * | 2021-08-04 | 2021-12-17 | 北京交通大学 | Multi-stage progressive image denoising algorithm |
Non-Patent Citations (2)
Title |
---|
Zhang Rong; Zhao Kunqi; Gu Kai: "Semantic segmentation of road images based on a convolutional neural network", Computer & Digital Engineering, no. 07, 20 July 2020 (2020-07-20), pages 231-234 *
Huang Lin et al.: "Semantic segmentation of noisy images with multi-scale multi-stage feature fusion", Computer Systems & Applications, vol. 32, no. 3, 31 March 2023 (2023-03-31), pages 58-69 *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115222630A (en) * | 2022-08-09 | 2022-10-21 | 中国科学院自动化研究所 | Image generation method, and training method and device of image denoising model |
CN115578360A (en) * | 2022-10-24 | 2023-01-06 | 电子科技大学 | Multi-target semantic segmentation method for ultrasonic cardiogram |
CN115578360B (en) * | 2022-10-24 | 2023-12-26 | 电子科技大学 | Multi-target semantic segmentation method for ultrasonic cardiac image |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110111366B (en) | End-to-end optical flow estimation method based on multistage loss | |
Tian et al. | Deep learning on image denoising: An overview | |
CN108154118B (en) | A kind of target detection system and method based on adaptive combined filter and multistage detection | |
CN111340844B (en) | Multi-scale characteristic optical flow learning calculation method based on self-attention mechanism | |
CN109726627B (en) | Neural network model training and universal ground wire detection method | |
CN114821580A (en) | Noise-containing image segmentation method by stage-by-stage merging with denoising module | |
CN109166102A (en) | It is a kind of based on critical region candidate fight network image turn image interpretation method | |
CN113870335A (en) | Monocular depth estimation method based on multi-scale feature fusion | |
CN111091503A (en) | Image out-of-focus blur removing method based on deep learning | |
CN114331886B (en) | Image deblurring method based on depth features | |
CN111476133B (en) | Unmanned driving-oriented foreground and background codec network target extraction method | |
JP6857369B2 (en) | CNN learning method and learning device, test method and test device using it | |
CN113052775B (en) | Image shadow removing method and device | |
CN114048822A (en) | Attention mechanism feature fusion segmentation method for image | |
CN111145102A (en) | Synthetic aperture radar image denoising method based on convolutional neural network | |
CN111626134A (en) | Dense crowd counting method, system and terminal based on hidden density distribution | |
CN114943894A (en) | ConvCRF-based high-resolution remote sensing image building extraction optimization method | |
CN113673562A (en) | Feature enhancement method, target segmentation method, device and storage medium | |
CN112633429A (en) | Method for recognizing handwriting choice questions of students | |
CN117542045B (en) | Food identification method and system based on space-guided self-attention | |
CN117934308A (en) | Lightweight self-supervision monocular depth estimation method based on graph convolution network | |
WO2020093210A1 (en) | Scene segmentation method and system based on contenxtual information guidance | |
CN110580712B (en) | Improved CFNet video target tracking method using motion information and time sequence information | |
CN110598614B (en) | Related filtering target tracking method combined with particle filtering | |
CN113096133A (en) | Method for constructing semantic segmentation network based on attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |