CN114821580A - Noise-containing image segmentation method by stage-by-stage merging with denoising module - Google Patents

Noise-containing image segmentation method by stage-by-stage merging with denoising module

Info

Publication number
CN114821580A
CN114821580A (application CN202210497742.4A)
Authority
CN
China
Prior art keywords
feature map
segmentation
stage
denoising
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210497742.4A
Other languages
Chinese (zh)
Inventor
陈飞
黄琳
曾勋勋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN202210497742.4A
Publication of CN114821580A
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/70Labelling scene content, e.g. deriving syntactic or semantic representations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method for segmenting noisy images in which a denoising module is incorporated stage by stage. First, a noisy image is input into a backbone network and feature maps of four stages are extracted through convolution operations. Second, a preliminary semantic segmentation result is obtained by applying a dual attention mechanism to the feature map extracted in the fourth stage. On this basis, exploiting the feature differences between the stages of the backbone network, multi-stage semantic features are iteratively fused so that segmentation assists denoising and denoising in turn assists segmentation. Finally, the three semantic segmentation results are combined into the final segmentation result, and the parameters are further optimized through a mixed cross-entropy loss. The invention uses cooperative denoising and segmentation to improve the semantic segmentation accuracy of noisy images, and solves the problem that the denoising step in existing semantic segmentation methods for noisy images loses semantic information, which degrades the accuracy of subsequent target-category classification and the completeness of the segmented target contour.

Description

Noise-containing image segmentation method by stage-by-stage merging with denoising module
Technical Field
The invention relates to the technical field of computer vision, and in particular to a method for segmenting noisy images in which a denoising module is incorporated stage by stage.
Background
The goal of semantic segmentation is to determine the class of each pixel (e.g., background, person, or car) and thereby convert parts of the original image into a mask with highlighted regions of interest. Many advanced segmentation methods have been applied in fields such as autonomous driving, scene parsing, object detection, and human-computer interaction. Recently, deep neural networks such as PSANet, DenseASPP, and DANet have achieved significant success on the semantic segmentation task. However, the success of these networks is premised on a high-quality training data set, i.e., clean, noise-free images. In practical applications, because of the environment, focus failure, and camera shake during shooting, images captured by imaging equipment often contain noise of varying degrees, such as Gaussian noise, shot noise, and thermal noise. This noise is not controllable, and even sophisticated imaging equipment cannot control the environment when a real image is captured. Noise tends to cover small textures of the image and reduces the capability of a semantic segmentation model: when a noisy data set is used to train a mainstream semantic segmentation model, the segmentation accuracy drops significantly. To reduce the interference of noise with semantic information, the most direct approach is to complete an image denoising task before the semantic segmentation task, i.e., a cascade of one denoising step followed by one segmentation step. Although this removes the noise, it also loses some fine-texture semantic information, so the result still falls short of a segmentation model trained on clean images. When DANet is trained on a noisy data set, the noise severely disturbs the image texture structure and the target boundaries, so that context information cannot be acquired accurately, target regions are segmented incorrectly, and the semantic segmentation accuracy drops markedly. Adding a denoising module in front of DANet forms a simple cascade architecture; although it weakens the damage that noise does to the target texture structure, it inevitably loses part of the texture information, which leads to mislocalization in the subsequent semantic segmentation and incomplete segmentation of the target region.
Disclosure of Invention
In view of the above, the present invention aims to provide a noisy-image segmentation method that incorporates a denoising module stage by stage. It improves the semantic segmentation accuracy of noisy images through cooperative denoising and segmentation, and solves the problem that the denoising step in existing noisy-image semantic segmentation methods loses semantic information, which degrades the accuracy of subsequent target-category classification and the completeness of the segmented target contour.
To achieve this purpose, the invention adopts the following technical scheme. The noisy-image segmentation method that incorporates a denoising module stage by stage comprises the following steps:
Step S1: zero-mean Gaussian noise with a standard deviation drawn randomly from [0, 30] is superimposed on the clean PASCAL VOC 2012 data set {y^(1), y^(2), ..., y^(m)}, yielding a training set of noisy images {x^(1), x^(2), ..., x^(m)};
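For concreteness, a minimal PyTorch sketch of how the noise superposition of step S1 could be realized is given below; the helper name add_gaussian_noise and the assumed [0, 255] pixel range are illustrative choices, not taken from the patent.

```python
import torch

def add_gaussian_noise(clean: torch.Tensor, std_max: float = 30.0) -> torch.Tensor:
    """Superimpose zero-mean Gaussian noise whose standard deviation is drawn
    uniformly from [0, std_max]; pixel values are assumed to lie in [0, 255]."""
    std = torch.empty(1).uniform_(0.0, std_max).item()
    noisy = clean + torch.randn_like(clean) * std
    return noisy.clamp(0.0, 255.0)

# Hypothetical usage: build a noisy training sample x from a clean VOC image y.
# y = load_voc_image(...)      # clean image tensor of shape (3, H, W)
# x = add_gaussian_noise(y)    # noisy counterpart used as network input
```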
Step S2: noise image x (i) Inputting the data into a backbone network ResNet50, and sequentially passing through Stage1, Stage2, Stage3 and Stage4 to extract the characteristics of each Stage;
Step S3: the feature map f_4 generated by Stage4 of the backbone network is input into a Dual Attention Module (DAM), which refines the features and outputs a preliminary segmentation result z_1;
Step S4: the feature map f_3 generated by Stage3 of the backbone network and the preliminary segmentation result z_1 are input into the Segmentation and Denoising Block based on Staged Cooperation (SDBSC). First, the stage feature of the backbone network, the segmentation result, and the multi-scale features of the denoising task are combined through a linear transformation formula to generate a new feature map; then a new segmentation result z_2 is generated by the segmentation module SSM inside the SDBSC;
Step S5: the feature map f_2 generated by Stage2 of the backbone network and the segmentation result z_2 are input into the segmentation-and-denoising block SDBSC based on staged cooperation, step S4 is repeated to generate a new segmentation result z_3, and the mean-squared-error loss L_d of the denoised image with respect to the clean image y^(i) is computed. L_d can be expressed as:

L_d = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²

where y_i denotes the ground truth of pixel i, ŷ_i denotes the estimate for pixel i, and n denotes the number of pixels;
Step S6: finally, the staged segmentation results z_1, z_2 and z_3 are superimposed to generate the multi-stage-feature-fused semantic segmentation result ẑ = z_1 + z_2 + z_3;
Step S7: the mixed cross-entropy loss L_S between the segmentation label z and ẑ is computed. It can be expressed as

L_S = Ls_1 + Ls_2

where Ls_1 denotes the cross-entropy loss

Ls_1 = −(1/p) Σ_{i=1}^{p} y_i log(ŷ_i)

in which p denotes the number of pixels of a picture, y_i denotes the ground-truth class of pixel i, and ŷ_i denotes the probability estimate for pixel i, and Ls_2 denotes the mIoU loss

Ls_2 = 1 − |X ∩ Y| / |X ∪ Y|

where X denotes the predicted pixel set and Y denotes the ground-truth pixel set.
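The two loss terms can be sketched in PyTorch as follows; the equal weighting of Ls_1 and Ls_2 and the soft (probability-based) relaxation of the mIoU term are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def denoise_loss(denoised: torch.Tensor, clean: torch.Tensor) -> torch.Tensor:
    """L_d: mean squared error between the denoised image and the clean image y_i."""
    return F.mse_loss(denoised, clean)

def mixed_seg_loss(logits: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """L_S = Ls_1 + Ls_2: pixel-wise cross entropy plus a soft mIoU loss.
    logits: (N, C, H, W) class scores; target: (N, H, W) integer class labels."""
    ls1 = F.cross_entropy(logits, target)                                   # Ls_1
    probs = F.softmax(logits, dim=1)                                        # soft prediction X
    onehot = F.one_hot(target, probs.shape[1]).permute(0, 3, 1, 2).float()  # ground truth Y
    inter = (probs * onehot).sum(dim=(0, 2, 3))                             # |X ∩ Y|
    union = (probs + onehot - probs * onehot).sum(dim=(0, 2, 3))            # |X ∪ Y|
    ls2 = 1.0 - ((inter + eps) / (union + eps)).mean()                      # Ls_2
    return ls1 + ls2
```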
In a preferred embodiment: the method for segmenting the noisy image by being fused into the denoising module in stages according to claim 1, wherein: in S3, the method further includes:
step S31: feature map f generated by Stage4 of backbone network 4 C×H×W Respectively obtaining feature maps A through reshape C ×HW Characteristic diagram B HW×C (ii) a Performing matrix multiplication on B and A, and calculating by a softmax layer to obtain a spatial attention feature map M HW×HW It can be expressed as:
Figure BDA0003633476450000041
wherein M is ji Representing the relation between the ith position and the jth position in the characteristic diagram, and H and W respectively representing the characteristic diagram f 4 Length and width of (B) i Denotes the ith position, A, of the matrix B j Representing the jth position of matrix a. Will the characteristic diagram f 4 C×H×W Remodelling to C C×HW Multiplying by M and reshaping it into a feature map P consistent with the original feature map size C×H×W Multiplying the value by a scale parameter lambda, initializing the lambda to be 0, and continuously distributing more weights through learning, wherein the formula is as follows:
Figure BDA0003633476450000042
wherein, P j Denotes the jth position, C, of the feature map P i Represents the ith position of the matrix C;
Step S32: the feature map f_4 ∈ R^{C×H×W} generated by Stage4 of the backbone network is reshaped into a feature map A ∈ R^{C×HW} and a feature map B ∈ R^{HW×C}; A and B are matrix-multiplied and passed through a softmax layer to obtain the channel attention map N ∈ R^{C×C}, expressed as:

N_ji = exp(A_i · B_j) / Σ_{i=1}^{C} exp(A_i · B_j)

where N_ji denotes the association between the i-th and the j-th channel of the feature map. The feature map f_4 ∈ R^{C×H×W} is reshaped into C ∈ R^{C×HW}, multiplied by the matrix N, and reshaped back into a feature map Q of the same size as the original feature map (C×H×W); it is multiplied by a scale parameter μ, which is initialized to 0, according to the formula:

Q_j = μ Σ_{i=1}^{C} (N_ji C_i) + (f_4)_j

where Q_j denotes the j-th position of the feature map Q.
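A minimal PyTorch sketch of the dual attention computation of steps S31 and S32, in the DANet style, is given below; the class name, the batched matrix layout, and the final 1×1 classifier that turns the refined feature into z_1 are illustrative assumptions rather than the patent's exact implementation.

```python
import torch
import torch.nn as nn

class DualAttentionModule(nn.Module):
    """Position attention (S31) and channel attention (S32) applied to f4 and summed."""
    def __init__(self, channels: int, num_classes: int):
        super().__init__()
        self.lam = nn.Parameter(torch.zeros(1))  # scale λ of the spatial branch, initialized to 0
        self.mu = nn.Parameter(torch.zeros(1))   # scale μ of the channel branch, initialized to 0
        self.classifier = nn.Conv2d(channels, num_classes, kernel_size=1)  # assumed head for z1

    def forward(self, f4: torch.Tensor) -> torch.Tensor:
        n, c, h, w = f4.shape
        A = f4.view(n, c, h * w)          # C x HW
        B = A.permute(0, 2, 1)            # HW x C

        # S31: spatial attention M (HW x HW), then P = λ·(attention-weighted C) + f4
        M = torch.softmax(torch.bmm(B, A), dim=-1)
        P = torch.bmm(A, M.permute(0, 2, 1)).view(n, c, h, w)
        P = self.lam * P + f4

        # S32: channel attention N (C x C), then Q = μ·(attention-weighted C) + f4
        N = torch.softmax(torch.bmm(A, B), dim=-1)
        Q = torch.bmm(N, A).view(n, c, h, w)
        Q = self.mu * Q + f4

        return self.classifier(P + Q)     # preliminary segmentation result z1
```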
In a preferred embodiment: in S4, the method further includes:
step S41: for input image R 3×H×W Performing convolution operation with convolution kernel of 3 and padding of 1 for multiple times and adopting a global average pooling downsampling mode to obtain a feature map R 4C×H/4×W/4 Is obtained toLow-level features of noisy images;
step S42: respectively convolving the low-level features of the noise image by the holes of different expansion factors (rate3, rate6 and rate9), and fusing multi-scale information to obtain a new feature map R 4C×H/4×W/4 And then the linear conversion formula is:
Figure BDA0003633476450000051
wherein α (·) and
Figure BDA0003633476450000052
for a linear transformation function, x represents a feature map R convolved with a hole 4C×H/4×W/4 Stage represents the profile of the backbone network output, S out Output representing last stage semantic segmentation
Figure BDA0003633476450000053
output is a feature graph output after fusion;
step S43: in the decoding stage of the denoising step, the upsampling adopts a deconvolution mode, and the linear conversion formula in the step S42 is applied in the last step of the decoding step.
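A sketch of the multi-scale encoder of steps S41 and S42 follows. Because the linear-transformation formula itself is not reproduced in this text, the fusion alpha(stage) * x + phi(s_out) + x is only one plausible reading; all module names, channel sizes, and the interpolation step are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleDenoiseEncoder(nn.Module):
    """S41: low-level feature extraction at 1/4 resolution; S42: atrous multi-scale fusion
    plus a learned linear modulation by the backbone stage feature and the previous
    segmentation output (assumed form)."""
    def __init__(self, in_ch: int = 3, feat_ch: int = 64, stage_ch: int = 256, seg_ch: int = 21):
        super().__init__()
        self.low_level = nn.Sequential(                       # S41: 3x3 convs + pooling -> 1/4 size
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True), nn.AvgPool2d(2),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True), nn.AvgPool2d(2),
        )
        self.branches = nn.ModuleList(                        # S42: dilation rates 3, 6, 9
            [nn.Conv2d(feat_ch, feat_ch, 3, padding=r, dilation=r) for r in (3, 6, 9)]
        )
        self.merge = nn.Conv2d(3 * feat_ch, feat_ch, 1)
        self.alpha = nn.Conv2d(stage_ch, feat_ch, 1)          # α(·): acts on the stage feature
        self.phi = nn.Conv2d(seg_ch, feat_ch, 1)              # φ(·): acts on the previous segmentation

    def forward(self, noisy: torch.Tensor, stage: torch.Tensor, s_out: torch.Tensor) -> torch.Tensor:
        x = self.low_level(noisy)                                        # low-level features
        x = self.merge(torch.cat([b(x) for b in self.branches], dim=1))  # multi-scale fusion
        stage = F.interpolate(stage, size=x.shape[-2:], mode="bilinear", align_corners=False)
        s_out = F.interpolate(s_out, size=x.shape[-2:], mode="bilinear", align_corners=False)
        # Assumed linear fusion of x, the stage feature, and the previous segmentation output.
        return self.alpha(stage) * x + self.phi(s_out) + x
```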
Compared with the prior art, the invention has the following beneficial effects:
1) By introducing the multi-scale idea and using atrous convolution, the problem that small texture structures of the image are covered by noise is better addressed. Moreover, by turning the staged segmentation results into aids for denoising, the semantic information of each stage, the low-level features of the denoising step, and the segmentation result are effectively combined through linear transformation, so that semantic segmentation promotes image denoising;
2) By using the high-level and low-level semantic features extracted at each stage of the backbone network to enhance the semantic information of the target contour, the method denoises and segments the noisy image in stages and assists the denoising and segmentation tasks from different levels.
Drawings
Fig. 1 is a flowchart of the noisy-image segmentation method that incorporates a denoising module stage by stage in a preferred embodiment of the present invention.
Fig. 2 is a comparison with other noisy-image semantic segmentation algorithms in a preferred embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well. It should also be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in fig. 1, the method for segmenting a noisy image by incorporating a denoising module in stages according to the present invention is implemented by the following steps:
Step S1: zero-mean Gaussian noise with a standard deviation drawn randomly from [0, 30] is superimposed on the clean PASCAL VOC 2012 data set {y^(1), y^(2), ..., y^(m)}, yielding a training set of noisy images {x^(1), x^(2), ..., x^(m)};
Step S2: the noisy image x^(i) is input into the backbone network ResNet50 and passes through Stage1, Stage2, Stage3 and Stage4 in turn to extract the features of each stage;
Step S3: the feature map f_4 generated by Stage4 of the backbone network is input into a Dual Attention Module (DAM), which refines the features and outputs a preliminary segmentation result z_1;
Step S4: the feature map f_3 generated by Stage3 of the backbone network and the preliminary segmentation result z_1 are input into the Segmentation and Denoising Block based on Staged Cooperation (SDBSC). First, the stage feature of the backbone network, the segmentation result, and the multi-scale features of the denoising task are combined through a linear transformation formula to generate a new feature map; then a new segmentation result z_2 is generated by the segmentation module SSM inside the SDBSC;
Step S5: the feature map f_2 generated by Stage2 of the backbone network and the segmentation result z_2 are input into the segmentation-and-denoising block SDBSC based on staged cooperation, step S4 is repeated to generate a new segmentation result z_3, and the mean-squared-error loss L_d of the denoised image with respect to the clean image y^(i) is computed. L_d can be expressed as:

L_d = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²

where y_i denotes the ground truth of pixel i, ŷ_i denotes the estimate for pixel i, and n denotes the number of pixels;
Step S6: finally, the staged segmentation results z_1, z_2 and z_3 are superimposed to generate the multi-stage-feature-fused semantic segmentation result ẑ = z_1 + z_2 + z_3;
Step S7: the mixed cross-entropy loss L_S between the segmentation label z and ẑ is computed. It can be expressed as

L_S = Ls_1 + Ls_2

where Ls_1 denotes the cross-entropy loss

Ls_1 = −(1/p) Σ_{i=1}^{p} y_i log(ŷ_i)

in which p denotes the number of pixels of a picture, y_i denotes the ground-truth class of pixel i, and ŷ_i denotes the probability estimate for pixel i, and Ls_2 denotes the mIoU loss

Ls_2 = 1 − |X ∩ Y| / |X ∪ Y|

where X denotes the predicted pixel set and Y denotes the ground-truth pixel set.
In step S3, the method further includes the steps of:
Step S31: the feature map f_4 ∈ R^{C×H×W} generated by Stage4 of the backbone network is reshaped into a feature map A ∈ R^{C×HW} and a feature map B ∈ R^{HW×C}; B and A are matrix-multiplied and passed through a softmax layer to obtain the spatial attention map M ∈ R^{HW×HW}, expressed as:

M_ji = exp(B_i · A_j) / Σ_{i=1}^{HW} exp(B_i · A_j)

where M_ji denotes the relation between the i-th position and the j-th position in the feature map, H and W denote the height and width of f_4, B_i denotes the i-th position of matrix B, and A_j denotes the j-th position of matrix A. The feature map f_4 ∈ R^{C×H×W} is reshaped into C ∈ R^{C×HW}, multiplied by M, and reshaped back into a feature map P of the same size as the original feature map (C×H×W); it is multiplied by a scale parameter λ, which is initialized to 0 and is gradually assigned more weight through learning, according to the formula:

P_j = λ Σ_{i=1}^{HW} (M_ji C_i) + (f_4)_j

where P_j denotes the j-th position of the feature map P and C_i denotes the i-th position of matrix C;
Step S32: the feature map f_4 ∈ R^{C×H×W} generated by Stage4 of the backbone network is reshaped into a feature map A ∈ R^{C×HW} and a feature map B ∈ R^{HW×C}; A and B are matrix-multiplied and passed through a softmax layer to obtain the channel attention map N ∈ R^{C×C}, expressed as:

N_ji = exp(A_i · B_j) / Σ_{i=1}^{C} exp(A_i · B_j)

where N_ji denotes the association between the i-th and the j-th channel of the feature map. The feature map f_4 ∈ R^{C×H×W} is reshaped into C ∈ R^{C×HW}, multiplied by the matrix N, and reshaped back into a feature map Q of the same size as the original feature map (C×H×W); it is multiplied by a scale parameter μ, which is initialized to 0, according to the formula:

Q_j = μ Σ_{i=1}^{C} (N_ji C_i) + (f_4)_j

where Q_j denotes the j-th position of the feature map Q.
In step S4, the method further includes the steps of:
Step S41: the input image R ∈ R^{3×H×W} is passed through multiple convolution operations with kernel size 3 and padding 1, using a global-average-pooling down-sampling mode, to obtain a feature map R ∈ R^{4C×H/4×W/4}, i.e., the low-level features of the noisy image;
Step S42: the low-level features of the noisy image are passed through atrous convolutions with different dilation rates (rate 3, rate 6 and rate 9), and the multi-scale information is fused to obtain a new feature map R ∈ R^{4C×H/4×W/4}; a linear transformation formula then fuses x, stage and S_out into output, where α(·) and φ(·) are linear transformation functions, x denotes the feature map R ∈ R^{4C×H/4×W/4} after atrous convolution, stage denotes the feature map output by the backbone network, S_out denotes the semantic segmentation output of the previous stage, and output is the fused feature map;
Step S43: in the decoding stage of the denoising step, up-sampling uses deconvolution, and the linear transformation formula of step S42 is also applied in the last step of decoding.
The following is a specific embodiment of the present invention.
The application of the algorithm provided by the invention to the semantic segmentation of the noisy image comprises the following specific steps:
1. Zero-mean Gaussian noise with a standard deviation drawn randomly from [0, 30] is superimposed on the clean PASCAL VOC 2012 data set {y^(1), y^(2), ..., y^(m)}, yielding a training set of noisy images {x^(1), x^(2), ..., x^(m)};
2. The noisy image x^(i) is input into the backbone network ResNet50 and passes through Stage1, Stage2, Stage3 and Stage4 in turn to extract the features of each stage;
3. The feature map f_4 ∈ R^{C×H×W} generated by Stage4 of the backbone network is reshaped into a feature map A ∈ R^{C×HW} and a feature map B ∈ R^{HW×C}; B and A are matrix-multiplied and passed through a softmax layer to obtain the spatial attention map M ∈ R^{HW×HW};
4. The feature map f_4 ∈ R^{C×H×W} generated by Stage4 of the backbone network is reshaped into a feature map A ∈ R^{C×HW} and a feature map B ∈ R^{HW×C}; A and B are matrix-multiplied and passed through a softmax layer to obtain the channel attention map N ∈ R^{C×C};
5. The spatial attention map M ∈ R^{HW×HW} and the channel attention map N ∈ R^{C×C} are each matrix-multiplied with A ∈ R^{C×HW}, reshaped, and then added to obtain the preliminary segmentation result z_1;
6. The feature map f_3 generated by Stage3 of the backbone network and the preliminary segmentation result z_1 are input into the Segmentation and Denoising Block based on Staged Cooperation (SDBSC);
7. The input image R ∈ R^{3×H×W} is passed through multiple convolution operations with kernel size 3 and padding 1, using a global-average-pooling down-sampling mode, to obtain a feature map R ∈ R^{4C×H/4×W/4}, i.e., the low-level features of the noisy image;
8. The low-level features of the noisy image are passed through atrous convolutions with different dilation rates (rate 3, rate 6 and rate 9), the multi-scale information is fused to obtain a new feature map R ∈ R^{4C×H/4×W/4}, and the linear transformation formula is applied to obtain the fused feature map;
9. In the decoding stage of the denoising step, up-sampling uses deconvolution, and the linear transformation formula is applied in the last step of decoding to obtain a preliminarily denoised image.
10. The denoised image is passed through the backbone network ResNet50, and steps 3)-5) are repeated to obtain the preliminary segmentation result z_2.
11. The feature map f_2 generated by Stage2 of the backbone network and the preliminary segmentation result z_2 are input into the Segmentation and Denoising Block based on Staged Cooperation (SDBSC), and steps 7)-10) are repeated to obtain a new segmentation result z_3 and a denoised image.
12. The staged segmentation results z_1, z_2 and z_3 are superimposed to generate the multi-stage-feature-fused semantic segmentation result ẑ = z_1 + z_2 + z_3.
13. The mean-squared-error loss of the denoised image is computed: L_d = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)².
14. The mixed cross-entropy loss between the segmentation label z and ẑ is computed: L_S = Ls_1 + Ls_2, where Ls_1 = −(1/p) Σ_{i=1}^{p} y_i log(ŷ_i) is the cross-entropy loss and Ls_2 = 1 − |X ∩ Y| / |X ∪ Y| is the mIoU loss.
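The following sketch only illustrates how steps 1-14 above could be wired together in one training iteration; model.backbone_stages, model.dam, model.sdbsc1, model.sdbsc2 and the loss helpers refer to the assumed interfaces of the earlier sketches and are not the patent's actual module names.

```python
import torch

def training_step(model, x_noisy: torch.Tensor, y_clean: torch.Tensor, seg_label: torch.Tensor):
    """One assumed forward/backward pass over steps 1-14 (all interfaces hypothetical)."""
    f1, f2, f3, f4 = model.backbone_stages(x_noisy)       # step 2: ResNet50 stage features
    z1 = model.dam(f4)                                    # steps 3-5: dual attention -> z1
    denoised1, z2 = model.sdbsc1(x_noisy, f3, z1)         # steps 6-10: first SDBSC pass
    denoised2, z3 = model.sdbsc2(denoised1, f2, z2)       # step 11: second SDBSC pass
    z_hat = z1 + z2 + z3                                  # step 12: superposition (assumes a common resolution)
    loss = denoise_loss(denoised2, y_clean) + mixed_seg_loss(z_hat, seg_label)  # steps 13-14
    loss.backward()
    return loss.detach()
```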
Figure 2 shows a qualitative comparison of the algorithm of this embodiment with other methods on the PASCAL VOC 2012 data set with Gaussian noise of standard deviation 20. As can be seen from the three columns in Fig. 2(c), (d) and (e), after a denoising module is added the semantic segmentation quality is still unsatisfactory: the target range is identified incorrectly, and the target boundary is neither accurate nor smooth. Fig. 2(f) shows the result of DMS; the misidentification problem is improved, but the boundary areas are still not identified well. Fig. 2(g) shows the result of the algorithm of this embodiment; clearly, it segments well both in the multi-class case and for small targets. Moreover, as can be seen from the images in the 4th and 5th rows, the algorithm of this embodiment achieves a significant improvement on the target boundary.

Claims (3)

1. A noise-containing image segmentation method with a denoising module incorporated stage by stage, characterized by comprising the following steps:
Step S1: zero-mean Gaussian noise with a standard deviation drawn randomly from [0, 30] is superimposed on the clean PASCAL VOC 2012 data set {y^(1), y^(2), ..., y^(m)}, yielding a training set of noisy images {x^(1), x^(2), ..., x^(m)};
Step S2: the noisy image x^(i) is input into the backbone network ResNet50 and passes through Stage1, Stage2, Stage3 and Stage4 in turn to extract the features of each stage;
Step S3: the feature map f_4 generated by Stage4 of the backbone network is input into a Dual Attention Module (DAM), which refines the features and outputs a preliminary segmentation result z_1;
Step S4: the feature map f_3 generated by Stage3 of the backbone network and the preliminary segmentation result z_1 are input into the segmentation-and-denoising block SDBSC based on staged cooperation. First, the stage feature of the backbone network, the segmentation result, and the multi-scale features of the denoising task are combined through a linear transformation formula to generate a new feature map; then a new segmentation result z_2 is generated by the segmentation module SSM inside the SDBSC;
Step S5: the feature map f_2 generated by Stage2 of the backbone network and the segmentation result z_2 are input into the segmentation-and-denoising block SDBSC based on staged cooperation, step S4 is repeated to generate a new segmentation result z_3, and the mean-squared-error loss L_d of the denoised image with respect to the clean image y^(i) is computed, expressed as:

L_d = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²

where y_i denotes the ground truth of pixel i, ŷ_i denotes the estimate for pixel i, and n denotes the number of pixels;
Step S6: finally, the staged segmentation results z_1, z_2 and z_3 are superimposed to generate the multi-stage-feature-fused semantic segmentation result ẑ = z_1 + z_2 + z_3;
Step S7: the mixed cross-entropy loss L_S between the segmentation label z and ẑ is computed, expressed as

L_S = Ls_1 + Ls_2

where Ls_1 denotes the cross-entropy loss

Ls_1 = −(1/p) Σ_{i=1}^{p} y_i log(ŷ_i)

in which p denotes the number of pixels of a picture, y_i denotes the ground-truth class of pixel i, and ŷ_i denotes the probability estimate for pixel i, and Ls_2 denotes the mIoU loss

Ls_2 = 1 − |X ∩ Y| / |X ∪ Y|

where X denotes the predicted pixel set and Y denotes the ground-truth pixel set.
2. The noise-containing image segmentation method with a denoising module incorporated stage by stage according to claim 1, wherein step S3 further includes:
Step S31: the feature map f_4 ∈ R^{C×H×W} generated by Stage4 of the backbone network is reshaped into a feature map A ∈ R^{C×HW} and a feature map B ∈ R^{HW×C}; B and A are matrix-multiplied and passed through a softmax layer to obtain the spatial attention map M ∈ R^{HW×HW}, expressed as:

M_ji = exp(B_i · A_j) / Σ_{i=1}^{HW} exp(B_i · A_j)

where M_ji denotes the relation between the i-th position and the j-th position in the feature map, H and W denote the height and width of f_4, B_i denotes the i-th position of matrix B, and A_j denotes the j-th position of matrix A; the feature map f_4 ∈ R^{C×H×W} is reshaped into C ∈ R^{C×HW}, multiplied by M, and reshaped back into a feature map P of the same size as the original feature map (C×H×W); it is multiplied by a scale parameter λ, which is initialized to 0 and is gradually assigned more weight through learning, according to the formula:

P_j = λ Σ_{i=1}^{HW} (M_ji C_i) + (f_4)_j

where P_j denotes the j-th position of the feature map P and C_i denotes the i-th position of matrix C;
Step S32: the feature map f_4 ∈ R^{C×H×W} generated by Stage4 of the backbone network is reshaped into a feature map A ∈ R^{C×HW} and a feature map B ∈ R^{HW×C}; A and B are matrix-multiplied and passed through a softmax layer to obtain the channel attention map N ∈ R^{C×C}, expressed as:

N_ji = exp(A_i · B_j) / Σ_{i=1}^{C} exp(A_i · B_j)

where N_ji denotes the association between the i-th and the j-th channel of the feature map; the feature map f_4 ∈ R^{C×H×W} is reshaped into C ∈ R^{C×HW}, multiplied by the matrix N, and reshaped back into a feature map Q of the same size as the original feature map (C×H×W); it is multiplied by a scale parameter μ, which is initialized to 0, according to the formula:

Q_j = μ Σ_{i=1}^{C} (N_ji C_i) + (f_4)_j

where Q_j denotes the j-th position of the feature map Q.
3. The noise-containing image segmentation method with a denoising module incorporated stage by stage according to claim 1, wherein step S4 further includes:
Step S41: the input image R ∈ R^{3×H×W} is passed through multiple convolution operations with kernel size 3 and padding 1, using a global-average-pooling down-sampling mode, to obtain a feature map R ∈ R^{4C×H/4×W/4}, i.e., the low-level features of the noisy image;
Step S42: the low-level features of the noisy image are passed through atrous convolutions with different dilation rates (rate 3, rate 6 and rate 9), and the multi-scale information is fused to obtain a new feature map R ∈ R^{4C×H/4×W/4}; a linear transformation formula then fuses x, stage and S_out into output, where α(·) and φ(·) are linear transformation functions, x denotes the feature map R ∈ R^{4C×H/4×W/4} after atrous convolution, stage denotes the feature map output by the backbone network, S_out denotes the semantic segmentation output of the previous stage, and output is the fused feature map;
Step S43: in the decoding stage of the denoising step, up-sampling uses deconvolution, and the linear transformation formula of step S42 is also applied in the last step of decoding.
CN202210497742.4A 2022-05-09 2022-05-09 Noise-containing image segmentation method by stage-by-stage merging with denoising module Pending CN114821580A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210497742.4A CN114821580A (en) 2022-05-09 2022-05-09 Noise-containing image segmentation method by stage-by-stage merging with denoising module

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210497742.4A CN114821580A (en) 2022-05-09 2022-05-09 Noise-containing image segmentation method by stage-by-stage merging with denoising module

Publications (1)

Publication Number Publication Date
CN114821580A true CN114821580A (en) 2022-07-29

Family

ID=82513898

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210497742.4A Pending CN114821580A (en) 2022-05-09 2022-05-09 Noise-containing image segmentation method by stage-by-stage merging with denoising module

Country Status (1)

Country Link
CN (1) CN114821580A (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084234A (en) * 2019-03-27 2019-08-02 东南大学 A kind of sonar image target identification method of Case-based Reasoning segmentation
WO2022083026A1 (en) * 2020-10-21 2022-04-28 华中科技大学 Ultrasound image denoising model establishing method and ultrasound image denoising method
CN112819705A (en) * 2021-01-13 2021-05-18 西安交通大学 Real image denoising method based on mesh structure and long-distance correlation
CN113808032A (en) * 2021-08-04 2021-12-17 北京交通大学 Multi-stage progressive image denoising algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhang Rong; Zhao Kunqi; Gu Kai: "Semantic segmentation of road images based on convolutional neural networks", Computer and Digital Engineering, no. 07, 20 July 2020 (2020-07-20), pages 231-234 *
Huang Lin et al.: "Semantic segmentation of noisy images with multi-scale and multi-stage feature fusion", Computer Systems and Applications, vol. 32, no. 3, 31 March 2023 (2023-03-31), pages 58-69 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115222630A (en) * 2022-08-09 2022-10-21 中国科学院自动化研究所 Image generation method, and training method and device of image denoising model
CN115578360A (en) * 2022-10-24 2023-01-06 电子科技大学 Multi-target semantic segmentation method for ultrasonic cardiogram
CN115578360B (en) * 2022-10-24 2023-12-26 电子科技大学 Multi-target semantic segmentation method for ultrasonic cardiac image


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination