CN111680695A - Semantic segmentation method based on reverse attention model - Google Patents
Semantic segmentation method based on reverse attention model
- Publication number
- CN111680695A (application CN202010513903.5A)
- Authority
- CN
- China
- Prior art keywords
- model
- attention
- output
- semantic segmentation
- convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to a semantic segmentation method based on a reverse attention model. First, an image data set is acquired and a training set and a test set are constructed. A deep semantic segmentation network model is then constructed, comprising a basic network model and a reverse attention model. The features output by the basic network are input into the reverse attention model to calculate an attention view; the attention view is applied step by step to the low-level output features of the basic semantic segmentation network, and the results are fused with the output features of the basic network and with the upsampled output features to obtain the final segmentation result. The model uses only the output features of the basic semantic segmentation network to calculate the attention view, and uses it to guide the merging of low-level features into those output features, so that noise in the low-level features is suppressed and the robustness and segmentation precision of the semantic segmentation model are improved. In addition, a Gumbel softmax-based loss function is applied to the high-level output features of the basic semantic segmentation model to speed up model training.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a semantic segmentation method based on a reverse attention model.
Background
In recent years deep learning has developed rapidly. Deep learning models represented by the convolutional neural network (CNN) have revived neural networks after a period of dormancy and set off a wave of deep learning in academia and industry.
DNN-based segmentation models are limited in that the size of the input image must be fixed. To remove this limitation, Long and Shelhamer of the University of California, Berkeley proposed the fully convolutional network (FCN) for semantic segmentation of images. The FCN replaces fully connected layers with convolutions and maps the dense prediction output by the network back onto the original image using techniques such as deconvolution and upsampling, which gives end-to-end semantic segmentation and lets the model process images of any size. Enlarging the receptive field is an important factor in acquiring the semantic information of an image, but repeated downsampling easily causes loss of image detail, boundary offset and similar problems. On this basis, the DeepLab V2, DeepLab V3, DeepLab V3+, PSPNet and U-Net models and their improved variants were successively proposed, improving the model architecture, the upsampling strategy, the receptive field size and so on; in particular, the DeepLab series effectively improved segmentation precision by introducing atrous (dilated) convolution.
At present, semantic segmentation methods based on deep learning are developed from the idea of the fully convolutional network, and segmentation accuracy has improved greatly. Most algorithms, however, are proposed on public data sets, whereas images of real scenes often contain many small targets or complex scenes, which challenges existing models. In recent years, researchers have applied attention models to convolutional neural network models, attempting to extract accurate pixel-level attention features from the high-level features of CNNs and so improve segmentation from another angle. The attention model in deep learning simulates the attention of the human brain by assigning a larger weight to the object of interest. Li et al. proposed a pyramid attention network that uses the global context information of an image to solve the semantic segmentation problem, combining an attention mechanism with a spatial pyramid to extract accurate, dense features and obtain pixel labels. Fu et al. proposed a scene segmentation network integrating a dual attention mechanism. A more recent semantic segmentation framework is the Gated Shape CNN (GSCNN) proposed by Takikawa et al., which uses gated convolutional layers to convert information in the regular semantic stream into shape information of objects, thereby adding a shape stream to a typical network architecture; that is, the gated convolutional layer couples the shape stream and the regular stream to obtain the final segmentation result.
However, this method increases the number of learnable model parameters and is complicated. In addition, criss-cross attention models and convolutional block attention models are widely used.
Disclosure of Invention
The invention aims to provide a semantic segmentation method based on a reverse attention model, which is used for improving the performance of image semantic segmentation.
In order to solve the technical problems, the technical scheme of the invention is as follows: a semantic segmentation method based on a reverse attention model comprises the following steps:
(1) acquiring an image data set, and constructing a training set and a test set;
(2) constructing a deep semantic segmentation network model, wherein the deep semantic segmentation network model comprises a basic network model and a reverse attention model;
the basic network model comprises a plurality of convolution modules and an ASPP output module connected in sequence, wherein the ASPP output module outputs the output features of the basic network model;
the processing procedure of the reverse attention model is as follows:
1) passing the output features of the basic network model through a convolutional layer to obtain dimension-reduced output features, inputting the dimension-reduced output features into an attention calculation model to obtain a first attention view, multiplying the dimension-reduced output features element-wise by the first attention view, and then adding the dimension-reduced output features to obtain a first output feature;
2) upsampling the first output feature to obtain features at no fewer than two different scales, calculating an attention view for the feature at each scale, multiplying each attention view element-wise by the feature map of the basic network model at the corresponding scale, adding each product to the feature at the corresponding scale to obtain output feature maps at different scales, and fusing these output feature maps with the first output feature to obtain an output result;
(3) inputting the training set into the deep semantic segmentation network model for training to obtain a trained deep semantic segmentation network model;
(4) inputting the test set into the trained deep semantic segmentation network model to obtain an image segmentation result.
The invention has the beneficial effects that:
According to the invention, the attention view of the high-level features of the model is applied iteratively to the low-level features of the model to improve the precision of the semantic segmentation result. In the self-attention models of conventional semantic segmentation methods, the attention view is calculated from the output feature of the current layer and applied back to that same feature. By contrast, the method uses the semantic information in the reverse attention model to suppress noise in the low-level features of the model: the attention view is calculated from the final output feature of the basic model and, guided by it, the low-level features are merged into that final output feature, which then combines high-level semantic information with low-level boundary information. The reverse attention model can also speed up the convergence of parameters during back-propagation in the deep convolutional neural network model. In addition, a Gumbel softmax-based loss function is used to calculate the difference between the boundary predicted by the model and the boundary of the labelled image, which guides the deep convolutional neural network model to pay more attention to the boundary information of the image and speeds up model training.
Further, the loss functions adopted by the deep semantic segmentation network model comprise a cross entropy loss function and a Gumbel softmax-based loss function.
Further, the attention calculation model combines channel attention and spatial attention, namely: $M(F) = \sigma(M_c(F) + M_s(F))$,
where $F \in \mathbb{R}^{H \times W \times C}$ is the input feature, H is the length of the image, W is the width of the image, C is the number of channels, $M_c$ is the channel attention calculation function (the subscript c denotes channel attention), $M_s$ is the spatial attention calculation function (the subscript s denotes spatial attention), and $\sigma$ is the sigmoid function; $M_c$ and $M_s$ are respectively defined as follows:
$M_c(F) = \mathrm{BN}(\mathrm{MLP}(\mathrm{AvgPool}(F)))$
$= \mathrm{BN}(w_1(w_0\,\mathrm{AvgPool}(F) + b_0) + b_1)$
where MLP denotes a multi-layer perceptron, i.e. fully connected layers; AvgPool is the average pooling layer; BN is batch normalization; $w_0$ and $w_1$ are network weights and $b_0$ and $b_1$ are offset parameters, with $w_0 \in \mathbb{R}^{C/r \times C}$, $b_0 \in \mathbb{R}^{C/r}$, $w_1 \in \mathbb{R}^{C \times C/r}$ and $b_1 \in \mathbb{R}^{C}$; r is the channel scaling ratio and C is the number of channels; $f_0$, $f_1$, $f_2$ and $f_3$ are convolution operations, and 1 × 1 and 3 × 3 are convolution kernel sizes.
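The combined attention view can be illustrated with a minimal NumPy sketch. It is not the patented implementation: batch normalization is omitted, the spatial branch M_s is passed in as a precomputed (H, W) map because its definition is not reproduced in the text above, and the names `channel_attention`, `attention_view` and `spatial_map` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(F, w0, b0, w1, b1):
    """M_c(F) = w1 (w0 AvgPool(F) + b0) + b1, with BN left out (assumption)."""
    pooled = F.mean(axis=(0, 1))      # global average pooling over H and W -> (C,)
    hidden = w0 @ pooled + b0         # reduced to C/r channels
    return w1 @ hidden + b1           # restored to C channels

def attention_view(F, w0, b0, w1, b1, spatial_map):
    """M(F) = sigmoid(M_c(F) + M_s(F)); M_s(F) is supplied by the caller."""
    m_c = channel_attention(F, w0, b0, w1, b1)
    # Broadcast the (C,) channel term and the (H, W) spatial term to (H, W, C).
    return sigmoid(m_c[None, None, :] + spatial_map[:, :, None])

rng = np.random.default_rng(0)
H, W, C, r = 4, 5, 8, 4               # toy sizes chosen for the example
F = rng.standard_normal((H, W, C))
w0 = rng.standard_normal((C // r, C))
b0 = np.zeros(C // r)
w1 = rng.standard_normal((C, C // r))
b1 = np.zeros(C)
M = attention_view(F, w0, b0, w1, b1, spatial_map=np.zeros((H, W)))
print(M.shape)
```

The weight shapes follow the dimensions stated in the text ($w_0 \in \mathbb{R}^{C/r \times C}$, $w_1 \in \mathbb{R}^{C \times C/r}$), so the channel branch first compresses the pooled vector by the ratio r and then expands it back to C channels.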
Further, the calculation process of the loss function based on Gumbel softmax is as follows:
1) the first output feature is passed through a convolutional layer whose output dimension is the number of semantic segmentation classes, giving a feature $Y_{N \times c}$; for each sample i ($y_i = [y_{i1}, \ldots, y_{ic}]$) in the matrix, c independent samples $\epsilon_1, \ldots, \epsilon_c$ are generated, each following the uniform distribution U(0, 1);
2) the noise is calculated as $G_i = -\log(-\log(\epsilon_i))$;
3) the noise computed from the randomly generated samples is added to the network model output feature Y to obtain the Gumbel distribution:
$v_i = [y_{i1} + G_1, \ldots, y_{ic} + G_c]$;
4) the output class probabilities $\sigma_\tau(v_i)$ are calculated through a softmax function with temperature $\tau$, so as to obtain a class assignment approximating a one-hot form;
where $\tau$ is the temperature parameter, which controls how closely the Gumbel softmax output approximates one-hot: the smaller the temperature, the closer the output is to one-hot form; the larger the temperature, the closer it is to a uniform distribution; $v_i$ and $v_j$ are the Gumbel-distributed values obtained by adding noise to the sample y;
5) Gaussian smoothing is applied to $\sigma_\tau(v_i)$ and its gradient is calculated to obtain the predicted boundary information; the labelled image is then converted into one-hot form, Gaussian smoothing is applied, and its gradient information B is calculated; the $L_1$ norm between the predicted boundary information and B is calculated and used as the loss function of the last layer of the basic network model.
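Steps 1) to 4) can be sketched for a single sample in plain Python. The temperature softmax is written in its standard form, exp(v/τ) normalised over the classes, which is an assumption about the patent's equation; the function name `gumbel_softmax` and the clipping of the uniform samples away from 0 and 1 are likewise illustrative choices.

```python
import math
import random

def gumbel_softmax(logits, tau, rng):
    """Return softmax((y + G) / tau) for one sample's class scores."""
    # Steps 1-2: uniform samples turned into Gumbel noise G = -log(-log(eps)).
    noise = []
    for _ in logits:
        eps = rng.uniform(1e-12, 1.0 - 1e-12)   # keep eps away from 0 and 1
        noise.append(-math.log(-math.log(eps)))
    # Step 3: add the noise to the class scores.
    v = [y + g for y, g in zip(logits, noise)]
    # Step 4: temperature softmax, shifted by the maximum for numerical stability.
    m = max(v)
    exps = [math.exp((x - m) / tau) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
scores = [2.0, 0.5, -1.0]                        # toy class scores for one pixel
sharp = gumbel_softmax(scores, tau=0.1, rng=rng)
smooth = gumbel_softmax(scores, tau=10.0, rng=rng)
print(sharp)
print(smooth)
```

With a small temperature the output concentrates on one class, and with a large temperature it flattens towards the uniform distribution, which matches the behaviour described for τ.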
Further, when the basic network model is DeepLab V2 based on VGG16, the basic network model includes five feature extraction blocks and an ASPP module; the five feature extraction blocks comprise, in sequence, a first convolution module, a first pooling layer, a second convolution module, a second pooling layer, a third convolution module, a third pooling layer, a fourth convolution module, a fourth pooling layer, a fifth convolution module, and a fifth pooling layer; each convolution module comprises 2-3 convolutional layers, and the convolutional layers of the fourth and fifth convolution modules are atrous convolutions; the ASPP module is a pyramid structure with atrous convolutions.
Further, the first convolution module in VGG16 includes two 3 × 3 convolutional layers with output dimension 64; the second convolution module includes two 3 × 3 convolutional layers with output dimension 128; the third convolution module includes three 3 × 3 convolutional layers with output dimension 256; the fourth convolution module includes three 3 × 3 convolutional layers with output dimension 512; the fifth convolution module includes three 3 × 3 convolutional layers with output dimension 512; and the output of the fifth convolution module is connected to the ASPP module.
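The block layout described above can be written down as a small configuration table. This is only an illustrative sketch: the dictionary keys and the names `VGG16_BLOCKS` and `count_conv_layers` are hypothetical, and the layer count of three for the third to fifth modules is taken from the standard VGG16 layout, consistent with the stated range of 2-3 layers per module.

```python
# One entry per convolution module; each module is followed by a pooling layer.
VGG16_BLOCKS = [
    {"name": "conv1", "layers": 2, "kernel": 3, "channels": 64,  "atrous": False},
    {"name": "conv2", "layers": 2, "kernel": 3, "channels": 128, "atrous": False},
    {"name": "conv3", "layers": 3, "kernel": 3, "channels": 256, "atrous": False},
    {"name": "conv4", "layers": 3, "kernel": 3, "channels": 512, "atrous": True},
    {"name": "conv5", "layers": 3, "kernel": 3, "channels": 512, "atrous": True},
]

def count_conv_layers(blocks):
    """Total number of convolutional layers across all modules."""
    return sum(block["layers"] for block in blocks)

def atrous_blocks(blocks):
    """Names of the modules that use atrous convolution."""
    return [block["name"] for block in blocks if block["atrous"]]

print(count_conv_layers(VGG16_BLOCKS), atrous_blocks(VGG16_BLOCKS))
```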
Further, the deep semantic segmentation model is a DeepLab V3 model based on VGG16.
Drawings
FIG. 1 is a schematic diagram of the semantic segmentation method of the present invention, with the reverse attention model built on DeepLab V2.
Detailed Description
For purposes of illustrating the objects, aspects and advantages of the present invention in detail, the present invention is further described in detail below with reference to specific implementation steps and the accompanying drawings.
The invention provides a semantic segmentation method based on a reverse attention model. A reverse attention model is introduced into a common fully convolutional network (FCN) semantic segmentation model; the attention view of the high-level output features of the model is applied to the low-level features of the model and multi-feature fusion is performed, which preserves boundary information in the segmentation result, filters out part of the noise and improves the precision of the semantic segmentation result.
In the invention, a Gumbel softmax-based loss function is applied to the last-layer output features of the basic semantic segmentation model after attention self-enhancement. Because the Gumbel softmax output is closer to a one-hot classification result, the boundary error can be calculated through this loss function, and the convergence of the model training parameters can be accelerated.
Specifically, the semantic segmentation method of the invention is described below taking the DeepLab V2 network architecture as an example.
It should be noted that the deep semantic segmentation network model in this application is built on the network architecture of a classical semantic segmentation model; the basic network model architecture can be VGG16, ResNet, etc.
As shown in FIG. 1, taking image data from inside a single-storey barn as an example, a deep semantic segmentation network model is constructed based on the DeepLab V2 network model. It includes a VGG16 feature extraction module, an ASPP module, a reverse attention model, a cross entropy loss function, a Gumbel softmax-based loss function, and so on. The VGG16 network architecture comprises five feature extraction blocks; each convolution module comprises 2-3 convolutional layers, each convolutional layer is followed by a nonlinear ReLU activation layer, and each convolution module is followed by a pooling layer; the convolutional layers of the fourth and fifth convolution modules are atrous convolutions; the ASPP (Atrous Spatial Pyramid Pooling) module has a pyramid structure with atrous convolutions.
Specifically, the semantic segmentation method of the present embodiment includes the following steps:
wherein the basic network model is a DeepLab V2 model based on VGG16 or a DeepLab V3 model based on VGG16;
in this embodiment, taking the DeepLab V2 model based on VGG16 as an example, the basic network model includes five feature extraction blocks and an ASPP output module; the five feature extraction blocks comprise, in sequence, a first convolution module, a first pooling layer, a second convolution module, a second pooling layer, a third convolution module, a third pooling layer, a fourth convolution module, a fourth pooling layer, a fifth convolution module, and a fifth pooling layer; each convolution module comprises 2-3 convolutional layers, and the convolutional layers of the fourth and fifth convolution modules are atrous convolutions; the first convolution module includes two 3 × 3 convolutional layers with output dimension 64; the second convolution module includes two 3 × 3 convolutional layers with output dimension 128; the third convolution module includes three 3 × 3 convolutional layers with output dimension 256; the fourth convolution module includes three 3 × 3 convolutional layers with output dimension 512; the fifth convolution module includes three 3 × 3 convolutional layers with output dimension 512; the output of the fifth convolution module is connected to the ASPP output module; the ASPP output module is a pyramid structure with atrous convolutions and outputs the output features of the basic network model.
The processing procedure of the reverse attention model is as follows:
1) the output features of the last layer of the basic network model (i.e. the ASPP layer in FIG. 1) are reduced in dimension by a 1 × 1 convolutional layer to obtain the feature $F_h$; $F_h$ is input into the attention calculation model to obtain the first attention view $M(F_h)$, which is applied to the feature $F_h$ to obtain the output feature $F_0$;
2) taking the output feature $F_0$ as the high-level feature, two attention views $A_i$ (i = 1, 2) at different scales are calculated and applied to two lower-level output features $F_l^i$ of the basic network model (the output feature of the i-th layer); the processed features are then fused with $F_0$ after scale change, and the fused features pass through two 3 × 3 convolutional layers and one 1 × 1 convolutional layer to obtain the output result;
if the sizes of $F_l^i$ and $F_0$ differ, $F_0$ may be upsampled by interpolation, and a 3 × 3 convolutional layer is used to reduce the number of channels of $F_l^i$ to the number of channels of $F_0$; the attention view $A_i$ is calculated from $F_0$ and applied to $F_l^i$ by element-wise multiplication (⊙), and the result is fused with $F_0$ by element-wise addition;
the attention calculation model in this embodiment combines channel attention and spatial attention, namely: $M(F) = \sigma(M_c(F) + M_s(F))$, where $F \in \mathbb{R}^{H \times W \times C}$ is the input feature, H is the length of the image, W is the width of the image, C is the number of channels, $M_c$ is the channel attention calculation function (the subscript c denotes channel attention), $M_s$ is the spatial attention calculation function (the subscript s denotes spatial attention), and $\sigma$ is the sigmoid function; $M_c$ and $M_s$ are respectively defined as follows:
$M_c(F) = \mathrm{BN}(\mathrm{MLP}(\mathrm{AvgPool}(F)))$
$= \mathrm{BN}(w_1(w_0\,\mathrm{AvgPool}(F) + b_0) + b_1)$
where MLP denotes a multi-layer perceptron (i.e. fully connected layers), AvgPool is the average pooling layer, BN is batch normalization, $w_0$ and $w_1$ are network weights and $b_0$ and $b_1$ are offset parameters, with $w_0 \in \mathbb{R}^{C/r \times C}$, $b_0 \in \mathbb{R}^{C/r}$, $w_1 \in \mathbb{R}^{C \times C/r}$ and $b_1 \in \mathbb{R}^{C}$; r is the channel scaling ratio and C is the number of channels; $f_0$, $f_1$, $f_2$ and $f_3$ are convolution operations, and 1 × 1 and 3 × 3 are convolution kernel sizes.
The low-level output features in this embodiment are the outputs of the first convolution module and the second convolution module (as shown in FIG. 1); the two attention-processed low-level features are obtained from them according to the above method.
3) the attention-processed low-level features and $F_0$ are concatenated, and the final features are output through two 3 × 3 × 256 convolutional layers, one 1 × 1 × C convolutional layer and an upsampling layer;
it should be noted that, when the two low-level output features of the basic network model are merged into the high-level features, a convolution operation is performed on each branch's output feature, which reduces the number of feature channels, lowers the computational complexity and reduces the influence of noise in the low-level features.
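The data flow of steps 1) and 2) can be sketched in NumPy under simplifying assumptions. The attention function here is a hypothetical stand-in (a sigmoid of the channel mean), the low-level features are assumed to have already been reduced to the same number of channels as the high-level feature, and upsampling is nearest-neighbour by an integer factor; the convolutional layers and the final concatenation of step 3) are left out.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def toy_attention(F):
    """Placeholder attention view: sigmoid of the channel mean (hypothetical)."""
    return sigmoid(F.mean(axis=-1, keepdims=True))      # shape (H, W, 1)

def upsample_nearest(F, factor):
    """Nearest-neighbour upsampling of an (H, W, C) feature by an integer factor."""
    return F.repeat(factor, axis=0).repeat(factor, axis=1)

def reverse_attention(F_h, low_feats, factors):
    # Step 1: self-enhance the high-level feature, F0 = F_h * M(F_h) + F_h.
    F0 = F_h * toy_attention(F_h) + F_h
    fused = []
    for F_l, k in zip(low_feats, factors):
        F0_up = upsample_nearest(F0, k)     # bring F0 to the low-level scale
        A = toy_attention(F0_up)            # attention computed from F0 only
        fused.append(A * F_l + F0_up)       # weight the low-level feature, add F0
    return F0, fused

rng = np.random.default_rng(0)
F_h = rng.standard_normal((4, 4, 8))                     # toy high-level feature
lows = [rng.standard_normal((16, 16, 8)), rng.standard_normal((8, 8, 8))]
F0, fused = reverse_attention(F_h, lows, factors=[4, 2])
print(F0.shape, [f.shape for f in fused])
```

The point of the sketch is the direction of the guidance: the attention weights applied to each low-level feature are derived from the high-level feature, never from the low-level feature itself.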
it should be noted that, in the model training process, the adopted loss functions include a cross entropy loss function and a loss function based on Gumbel softmax;
specifically, the calculation process of the loss function based on Gumbel softmax is as follows:
1) the first output feature of the last layer of the basic network model is passed through a convolutional layer whose output dimension is the number of semantic segmentation classes, giving a feature $Y_{N \times c}$, where N is the product of the length and width of the feature matrix; for each sample i ($y_i = [y_{i1}, \ldots, y_{ic}]$) in the matrix, c independent samples $\epsilon_1, \ldots, \epsilon_c$ are generated, each following the uniform distribution U(0, 1);
2) the noise is calculated as $G_i = -\log(-\log(\epsilon_i))$;
3) the noise computed from the randomly generated samples is added to the network model output feature Y to obtain the Gumbel distribution: $v_i = [y_{i1} + G_1, \ldots, y_{ic} + G_c]$;
4) the output class probabilities $\sigma_\tau(v_i)$ are calculated through a softmax function with temperature $\tau$, so as to obtain a final class assignment approximating a one-hot form;
where $\tau$ is the temperature parameter, which controls how closely the Gumbel softmax output approximates one-hot: the smaller the temperature, the closer the output is to one-hot form; the larger the temperature, the closer it is to a uniform distribution; $v_i$ and $v_j$ are the Gumbel-distributed values obtained by adding noise to the sample y.
5) Gaussian smoothing is applied to $\sigma_\tau(v_i)$ and its gradient is calculated to obtain the predicted boundary information; the labelled image is then converted into one-hot form, Gaussian smoothing is applied, and its gradient information B is calculated; the $L_1$ norm between the predicted boundary information and B is calculated and used as the loss function of the last layer of the basic network model.
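Step 5) can be illustrated with a NumPy sketch. Several details are assumptions made for the example and are not stated in the source: the Gaussian kernel width and radius, the use of the gradient magnitude as the boundary map, and averaging the absolute difference rather than summing it. The names `boundary_map` and `boundary_l1_loss` are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma=1.0, radius=2):
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def smooth(channel, kernel):
    """Separable Gaussian smoothing of a 2-D array with edge padding."""
    r = len(kernel) // 2
    padded = np.pad(channel, r, mode="edge")
    conv = lambda v: np.convolve(v, kernel, mode="valid")
    rows = np.apply_along_axis(conv, 1, padded)     # smooth along the width
    return np.apply_along_axis(conv, 0, rows)       # then along the height

def boundary_map(probs, kernel):
    """Gradient magnitude of each smoothed class map, shape (H, W, c)."""
    out = np.empty(probs.shape, dtype=float)
    for k in range(probs.shape[-1]):
        gy, gx = np.gradient(smooth(probs[..., k], kernel))
        out[..., k] = np.hypot(gx, gy)
    return out

def boundary_l1_loss(pred_probs, labels, num_classes):
    kernel = gaussian_kernel()
    one_hot = np.eye(num_classes)[labels]           # labelled image in one-hot form
    diff = boundary_map(pred_probs, kernel) - boundary_map(one_hot, kernel)
    return float(np.abs(diff).mean())               # mean absolute (L1) difference

labels = np.zeros((6, 6), dtype=int)
labels[:, 3:] = 1                                   # toy label map with one vertical edge
perfect = np.eye(2)[labels]                         # prediction identical to the labels
uniform = np.full((6, 6, 2), 0.5)                   # prediction with no boundary at all
print(boundary_l1_loss(perfect, labels, 2), boundary_l1_loss(uniform, labels, 2))
```

A prediction that reproduces the labelled boundary gives zero loss, while a prediction with no boundary is penalised by the boundary content of the labelled image.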
The cross entropy loss function of a sample is expressed as:
$L = -\sum_i \hat{y}_i \log y_i$
where $\hat{y}$ is the true label of sample x in one-hot form, $y_i$ is the probability value output by the model after softmax, and i indexes the i-th item in the vector; the final cross entropy loss is the average of the sample loss values over the batch.
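As a small worked example, the per-sample cross entropy and its batch average can be computed in plain Python. The sketch assumes the standard definition (the negative sum of true label times the log of the predicted probability); the function names and the clipping constant `eps` are illustrative.

```python
import math

def cross_entropy(one_hot, probs, eps=1e-12):
    """Cross entropy of one sample: -sum(t_i * log(y_i))."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))

def batch_cross_entropy(labels, outputs):
    """Average of the per-sample losses over the batch."""
    losses = [cross_entropy(t, p) for t, p in zip(labels, outputs)]
    return sum(losses) / len(losses)

labels = [[0, 1, 0], [1, 0, 0]]                    # one-hot true labels
outputs = [[0.2, 0.7, 0.1], [0.5, 0.3, 0.2]]       # softmax outputs of the model
print(batch_cross_entropy(labels, outputs))
```

Only the probability assigned to the true class contributes to each sample's loss, so the batch value here is the mean of -log 0.7 and -log 0.5.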
In the embodiment, the reverse attention model is adopted, so that the result obtained by the semantic segmentation method is fused with the semantic information and the boundary information, and the boundary loss function based on Gumbel softmax is used, so that the model training can be guided, and the speed of model parameter convergence is accelerated.
It is worth noting that the reverse attention model proposed by the invention can be applied to one or more layers of the lower layers of the model, and the strategy can be applied to any basic semantic segmentation model.
(4) inputting the test set into the trained deep semantic segmentation network model to obtain an image segmentation result.
In order to verify the superiority of the method, a series of comparative tests of the reverse attention model (BA for short) were designed on DeepLab V2 + VGG16, and the segmentation results were evaluated using the standard evaluation criteria of the semantic segmentation field, IoU and F1.
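For reference, the two criteria can be computed for a single class from flattened prediction and ground-truth label lists. This is a generic sketch of the usual definitions, IoU = TP / (TP + FP + FN) and F1 = 2TP / (2TP + FP + FN), not the evaluation script used for the reported tables; the function name is hypothetical.

```python
def iou_and_f1(pred, truth, cls):
    """Per-class IoU and F1 from two equal-length label sequences."""
    tp = sum(1 for p, t in zip(pred, truth) if p == cls and t == cls)
    fp = sum(1 for p, t in zip(pred, truth) if p == cls and t != cls)
    fn = sum(1 for p, t in zip(pred, truth) if p != cls and t == cls)
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return iou, f1

pred = [1, 1, 0, 0, 1, 0]     # toy predicted labels
truth = [1, 0, 0, 1, 1, 0]    # toy ground-truth labels
print(iou_and_f1(pred, truth, cls=1))
```

In this toy case there are two true positives, one false positive and one false negative for class 1, so the IoU is 2/4 and the F1 score is 4/6.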
Two image data sets are used to verify the method of the invention: the first is a set of images taken inside a single-storey barn; the second is the VOC2012 image data set.
First, the test on images inside the single-storey barn:
(1) 120 images inside the barn are acquired as the training image set and manually labelled to obtain the ground truth; 20 images are used as the verification image set;
each collected image is 1080 × 1920 in size, and the images are shot under different illumination and at preset angles.
(2) The labelled training image set is augmented to obtain the training image set required for the subsequent training of the deep convolutional neural network model.
The augmentation methods include cropping regions of interest of a specified size at a fixed stride, adjusting the image Gamma correction parameter, scaling the image, flipping the image, rotating the image by no more than ±10 degrees relative to the original image, adding Gaussian noise, and so on.
During augmentation the original training image and its labelled image are transformed together; if a transformation requires interpolation of the labelled image, nearest neighbor interpolation is used.
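The reason for nearest neighbor interpolation on the labelled image is that class indices must not be blended into values that correspond to no class. A minimal plain-Python sketch of a joint scale-and-flip augmentation follows; the function names are hypothetical, and resampling the image itself with nearest neighbor is a simplification made for the example.

```python
def resize_nearest(grid, new_h, new_w):
    """Nearest-neighbour resize of a 2-D list; only existing values are copied."""
    old_h, old_w = len(grid), len(grid[0])
    return [
        [grid[min(old_h - 1, (i * old_h) // new_h)][min(old_w - 1, (j * old_w) // new_w)]
         for j in range(new_w)]
        for i in range(new_h)
    ]

def flip_horizontal(grid):
    return [row[::-1] for row in grid]

def augment_pair(image, label, new_h, new_w):
    """Apply the same scaling and flip to an image and its labelled image."""
    image_out = flip_horizontal(resize_nearest(image, new_h, new_w))
    label_out = flip_horizontal(resize_nearest(label, new_h, new_w))
    return image_out, label_out

image = [[10, 20], [30, 40]]      # toy grey-level image
label = [[0, 1], [2, 3]]          # toy class-index map
image_aug, label_aug = augment_pair(image, label, 4, 4)
print(label_aug)
```

Because every output label is copied from an input pixel, the augmented label map contains only the original class indices, and each label stays aligned with the image pixel it was copied alongside.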
Specifically, for any training image in the single-storey barn and its labelled image, regions of interest of a specified size are cropped starting from the upper left corner of the image at a fixed stride; flipping, Gamma parameter adjustment, rotation within ±10 degrees, addition of Gaussian noise and similar operations are applied to each region, each operation having its own parameter set, until a training image set close to the target size is generated. The final training image set contains 5600 images and the verification image set contains 120 images.
In this method the original training image cannot be rotated through a large angle, because the grain surface and the grain loading line have a certain semantic context relationship.
Table 1 shows the comparative test results, on the image set inside the single-storey barn, of the reverse attention model (BA for short) built on DeepLab V2 + VGG16. As can be seen from Table 1, adding the reverse attention model to the original model effectively improves its segmentation performance.
TABLE 1 evaluation of image set training results in one-storey barn
Test based on VOC2012 database:
the labeling image set comprises 12081 images and marking images, and is divided into a training image set and a verification image set, wherein the training image set comprises 10582 images, and the verification image set comprises 1499 images.
Table 2 shows the test results of the invention on the VOC2012 data set using DeepLab V2 + VGG16. It can be seen that adding the Gumbel softmax attention model to the original model effectively improves its segmentation performance.
TABLE 2 VOC2012 image set training results evaluation
Therefore, the semantic segmentation method based on the reverse attention model further improves the segmentation performance of the image.
While the present invention has been described in detail with reference to the preferred embodiments, it should be understood that the above description should not be taken as limiting the invention. Various modifications and alterations to this invention will become apparent to those skilled in the art upon reading the foregoing description. Accordingly, the scope of the invention should be determined from the following claims.
Claims (7)
1. A semantic segmentation method based on a reverse attention model is characterized by comprising the following steps:
(1) acquiring an image data set, and constructing a training set and a test set;
(2) constructing a deep semantic segmentation network model, wherein the deep semantic segmentation network model comprises a basic network model and a reverse attention model;
the basic network model comprises a plurality of convolution modules and an ASPP output module which are sequentially connected, wherein the ASPP output module is used for outputting the output characteristics of the basic network model;
the processing procedure of the reverse attention model is as follows:
1) passing the output features of the basic network model through a convolutional layer to obtain dimension-reduced output features, inputting the dimension-reduced output features into an attention calculation model to obtain a first attention map, performing element-wise multiplication of the dimension-reduced output features and the first attention map, and then superimposing the dimension-reduced output features on the product to obtain first output features;
2) up-sampling the first output features to obtain features of at least two different scales, calculating an attention map for the features of each scale, performing element-wise multiplication of each of the attention maps of different scales with the feature map of the basic network model, superimposing each product on the features of the corresponding scale to obtain output feature maps of different scales, and fusing the output feature maps of different scales with the first output features to obtain an output result;
(3) inputting a training set into the deep semantic segmentation network model for training to obtain a trained deep semantic segmentation network model;
(4) inputting the test set into the trained deep semantic segmentation network model to obtain an image segmentation result.
2. The reverse attention model-based semantic segmentation method according to claim 1, wherein the loss functions adopted by the deep semantic segmentation network model comprise a cross entropy loss function and a Gumbel softmax-based loss function.
3. The reverse attention model-based semantic segmentation method according to claim 1, wherein the attention calculation model adopts a combination of channel attention and spatial attention, that is:
$M(F)=\sigma(M_c(F)+M_s(F))$,
wherein $F\in\mathbb{R}^{H\times W\times C}$ is the input feature, $H$ is the height of the image, $W$ is the width of the image, $C$ is the number of channels, $M_c$ is the channel attention calculation function (the subscript $c$ denotes channel attention), $M_s$ is the spatial attention calculation function (the subscript $s$ denotes spatial attention), and $\sigma$ is the sigmoid function; $M_c$ and $M_s$ are respectively defined as follows:
$M_c(F)=\mathrm{BN}(\mathrm{MLP}(\mathrm{AvgPool}(F)))=\mathrm{BN}(w_1(w_0\,\mathrm{AvgPool}(F)+b_0)+b_1)$
wherein MLP denotes a multi-layer perceptron, i.e. fully connected layers; AvgPool is the average pooling layer; BN is batch normalization; $w_0$ and $w_1$ are network weights; $b_0$ and $b_1$ are bias parameters; $w_0\in\mathbb{R}^{C/r\times C}$, $b_0\in\mathbb{R}^{C/r}$, $w_1\in\mathbb{R}^{C\times C/r}$ and $b_1\in\mathbb{R}^{C}$; $r$ is the channel scaling ratio and $C$ is the number of channels; $f_0$, $f_1$, $f_2$, $f_3$ are convolution operations, and 1 × 1 and 3 × 3 are convolution kernel sizes.
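As a rough illustration of the channel branch and the combination formula in claim 3, the sketch below evaluates $M_c$, $M(F)=\sigma(M_c(F)+M_s(F))$ and the attention-weighted feature on nested Python lists. Several points are assumptions rather than part of the claim: batch normalization is treated as the identity, the spatial branch $M_s$ is supplied by the caller as a precomputed H × W map because its definition is not reproduced in the text, the channel vector and spatial map are combined by broadcasting, "superimposing" is read as element-wise addition, and all function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(weights, x, bias):
    """Compute weights @ x + bias for a row-major weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def channel_attention(feat, w0, b0, w1, b1):
    """M_c(F) = w1 (w0 AvgPool(F) + b0) + b1, with BN taken as identity (assumption)."""
    h, w, c = len(feat), len(feat[0]), len(feat[0][0])
    pooled = [sum(feat[i][j][k] for i in range(h) for j in range(w)) / (h * w)
              for k in range(c)]
    return matvec(w1, matvec(w0, pooled, b0), b1)


def combined_attention(mc, ms):
    """M(F) = sigmoid(M_c + M_s): channel vector (C) broadcast against spatial map (H x W)."""
    return [[[sigmoid(mc_k + ms_ij) for mc_k in mc] for ms_ij in ms_row]
            for ms_row in ms]


def apply_attention(feat, att):
    """Element-wise product with the attention map, then add the input feature back."""
    return [[[f * a + f for f, a in zip(f_px, a_px)]
             for f_px, a_px in zip(f_row, a_row)]
            for f_row, a_row in zip(feat, att)]
```

For example, with all attention logits equal to zero the map is 0.5 everywhere, so `apply_attention` returns 1.5 times the input feature; the residual term keeps the original feature from being suppressed entirely.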
4. The reverse attention model-based semantic segmentation method as set forth in claim 2, wherein the Gumbel softmax-based loss function is calculated as follows:
(1) the first output feature is passed through a convolutional layer whose output dimension is the number of semantic segmentation classes, giving a feature matrix $Y$ of size $N\times c$; for each sample $i$ in the matrix, $y_i=[y_{i1},\dots,y_{ic}]$, $c$ independent samples $\epsilon_1,\dots,\epsilon_c$ are generated, each following the uniform distribution $U(0,1)$;
(2) the noise is calculated as $G_i=-\log(-\log(\epsilon_i))$;
(3) the randomly generated noise is added to the network model output feature $Y$ to obtain the Gumbel distribution: $v_i=[y_{i1}+G_1,\dots,y_{ic}+G_c]$;
(4) the output feature probabilities are calculated through a softmax function so as to obtain a class approximating a one-hot form:
$\sigma_\tau(v_i)=\dfrac{\exp(v_i/\tau)}{\sum_{j=1}^{c}\exp(v_j/\tau)}$
wherein $\tau$ is a temperature parameter that controls how closely the Gumbel softmax output approximates a one-hot form: the smaller the temperature, the closer the output is to one-hot form; the larger the temperature, the closer it is to a uniform distribution; $v_i$ and $v_j$ are the Gumbel-distributed values obtained by adding noise to the sample $y$;
(5) Gaussian smoothing is applied to $\sigma_\tau(v_i)$ and its gradient is calculated to obtain the boundary information of the prediction; the label image is then converted into one-hot form, Gaussian smoothing is applied, and its gradient information $B$ is calculated; the $L_1$ norm between the predicted boundary information and $B$ is calculated and used as the loss function of the last layer of the basic network model.
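Steps (1) to (4) of claim 4 can be sketched for a single sample as below. This is a minimal illustration using only the Python standard library; the function names, the small `eps` guard against `log(0)`, and the max-subtraction used for numerical stability are assumptions added for the sketch. The boundary step (5), with Gaussian smoothing, gradients and the L1 norm, is not covered.

```python
import math
import random


def gumbel_noise(n, rng, eps=1e-20):
    """G = -log(-log(e)) with e ~ U(0, 1); eps guards against log(0)."""
    return [-math.log(-math.log(rng.random() + eps) + eps) for _ in range(n)]


def softmax_with_temperature(v, tau):
    """softmax(v / tau), shifted by max(v) so exp() cannot overflow."""
    m = max(v)
    exps = [math.exp((x - m) / tau) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(logits, tau, rng):
    """Add Gumbel noise to the per-class scores, then apply the tempered softmax."""
    noise = gumbel_noise(len(logits), rng)
    v = [y + g for y, g in zip(logits, noise)]
    return softmax_with_temperature(v, tau)


if __name__ == "__main__":
    rng = random.Random(0)
    scores = [1.0, 2.0, 3.0]          # hypothetical class scores for one pixel
    for tau in (0.1, 1.0, 10.0):
        print(tau, gumbel_softmax(scores, tau, rng))
```

Running the loop with a small `tau` tends to give outputs close to one-hot, while a large `tau` gives outputs close to uniform, which matches the role of the temperature parameter described in the claim.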
5. The reverse attention model-based semantic segmentation method according to claim 1, wherein when the basic network model is DeepLab v2 based on VGG16, the basic network model comprises five feature extraction blocks and an ASPP module, and the five feature extraction blocks sequentially comprise a first convolution module, a first pooling layer, a second convolution module, a second pooling layer, a third convolution module, a third pooling layer, a fourth convolution module, a fourth pooling layer, a fifth convolution module and a fifth pooling layer; each convolution module comprises 2 to 3 convolution layers, and the convolution layers of the fourth convolution module and the fifth convolution module are atrous (dilated) convolutions; the ASPP module is a pyramid structure with atrous convolutions.
6. The reverse attention model-based semantic segmentation method of claim 5, wherein the first convolution module in VGG16 includes two 3 × 3 convolution layers with an output dimension of 64; the second convolution module includes two 3 × 3 convolution layers with an output dimension of 128; the third convolution module includes three 3 × 3 convolution layers with an output dimension of 256; the fourth convolution module includes three 3 × 3 convolution layers with an output dimension of 512; the fifth convolution module includes three 3 × 3 convolution layers with an output dimension of 512; and the output of the fifth convolution module is connected with the ASPP module.
7. The reverse attention model-based semantic segmentation method according to claim 1, wherein the deep semantic segmentation model is a DeepLab v3 model based on VGG16.
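The backbone layout recited in claims 5 and 6 can be written down as a small configuration table, which makes the layer ordering easy to inspect. The sketch below is illustrative only: the `ConvModule` type and `layer_plan` function are hypothetical names, the layer counts for the third to fifth modules are assumed to be three each (as in standard VGG16 and consistent with the "2 to 3 convolution layers" wording), and dilation rates are not specified in the claims, so only an on/off flag is recorded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvModule:
    num_layers: int    # number of convolution layers in the module
    kernel_size: int   # square kernel size
    out_channels: int  # output dimension of the module
    atrous: bool       # True if the module uses atrous (dilated) convolution


# Five VGG16 convolution modules as described in claims 5 and 6.
VGG16_MODULES = [
    ConvModule(2, 3, 64, False),
    ConvModule(2, 3, 128, False),
    ConvModule(3, 3, 256, False),
    ConvModule(3, 3, 512, True),
    ConvModule(3, 3, 512, True),
]


def layer_plan(modules):
    """Flatten the modules into an ordered list: conv layers, a pool per module, then ASPP."""
    plan = []
    for index, module in enumerate(modules, start=1):
        suffix = "-atrous" if module.atrous else ""
        for _ in range(module.num_layers):
            plan.append(
                f"conv{module.kernel_size}x{module.kernel_size}-{module.out_channels}{suffix}"
            )
        plan.append(f"pool{index}")
    plan.append("aspp")
    return plan


if __name__ == "__main__":
    for name in layer_plan(VGG16_MODULES):
        print(name)
```

Keeping the layout as data rather than hard-coded layers makes it straightforward to swap the backbone, for instance when moving from the DeepLab v2 variant to the DeepLab v3 variant mentioned in claim 7.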
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010513903.5A CN111680695A (en) | 2020-06-08 | 2020-06-08 | Semantic segmentation method based on reverse attention model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111680695A (en) | 2020-09-18 |
Family
ID=72454054
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202010513903.5A | Pending | CN111680695A (en) | 2020-06-08 | 2020-06-08 | Semantic segmentation method based on reverse attention model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111680695A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200134380A1 (en) * | 2018-10-30 | 2020-04-30 | Beijing Horizon Robotics Technology Research And Development Co., Ltd. | Method for Updating Neural Network and Electronic Device |
CN110210485A (en) * | 2019-05-13 | 2019-09-06 | 常熟理工学院 | The image, semantic dividing method of Fusion Features is instructed based on attention mechanism |
CN110197182A (en) * | 2019-06-11 | 2019-09-03 | 中国电子科技集团公司第五十四研究所 | Remote sensing image semantic segmentation method based on contextual information and attention mechanism |
CN110458165A (en) * | 2019-08-14 | 2019-11-15 | 贵州大学 | A kind of natural scene Method for text detection introducing attention mechanism |
Non-Patent Citations (1)
Title |
---|
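Shen Wenxiang; Qin Pinle; Zeng Jianchao: "Indoor crowd detection network based on multi-level features and hybrid attention mechanism", Journal of Computer Applications (计算机应用), no. 12, 10 December 2019 (2019-12-10), pages 3496 - 3502 * |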
Cited By (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112381097A (en) * | 2020-11-16 | 2021-02-19 | 西南石油大学 | Scene semantic segmentation method based on deep learning |
CN112488115A (en) * | 2020-11-23 | 2021-03-12 | 石家庄铁路职业技术学院 | Semantic segmentation method based on two-stream architecture |
CN112488115B (en) * | 2020-11-23 | 2023-07-25 | 石家庄铁路职业技术学院 | Semantic segmentation method based on two-stream architecture |
CN112613517A (en) * | 2020-12-17 | 2021-04-06 | 深圳大学 | Endoscopic instrument segmentation method, endoscopic instrument segmentation apparatus, computer device, and storage medium |
CN112613517B (en) * | 2020-12-17 | 2022-02-18 | 深圳大学 | Endoscopic instrument segmentation method, endoscopic instrument segmentation apparatus, computer device, and storage medium |
CN112580654A (en) * | 2020-12-25 | 2021-03-30 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Semantic segmentation method for ground objects of remote sensing image |
CN112801104B (en) * | 2021-01-20 | 2022-01-07 | 吉林大学 | Image pixel level pseudo label determination method and system based on semantic segmentation |
CN112801104A (en) * | 2021-01-20 | 2021-05-14 | 吉林大学 | Image pixel level pseudo label determination method and system based on semantic segmentation |
CN113052860A (en) * | 2021-04-02 | 2021-06-29 | 首都师范大学 | Three-dimensional cerebral vessel segmentation method and storage medium |
CN113052860B (en) * | 2021-04-02 | 2022-07-19 | 首都师范大学 | Three-dimensional cerebral vessel segmentation method and storage medium |
CN113392711A (en) * | 2021-05-19 | 2021-09-14 | 中国科学院声学研究所南海研究站 | Smoke semantic segmentation method and system based on high-level semantics and noise suppression |
CN113392711B (en) * | 2021-05-19 | 2023-01-06 | 中国科学院声学研究所南海研究站 | Smoke semantic segmentation method and system based on high-level semantics and noise suppression |
CN113298154B (en) * | 2021-05-27 | 2022-11-11 | 安徽大学 | RGB-D image salient object detection method |
CN113298154A (en) * | 2021-05-27 | 2021-08-24 | 安徽大学 | RGB-D image salient target detection method |
CN113643311B (en) * | 2021-06-28 | 2024-04-09 | 清华大学 | Image segmentation method and device with robust boundary errors |
CN113643311A (en) * | 2021-06-28 | 2021-11-12 | 清华大学 | Image segmentation method and device for boundary error robustness |
CN113537228A (en) * | 2021-07-07 | 2021-10-22 | 中国电子科技集团公司第五十四研究所 | Real-time image semantic segmentation method based on depth features |
CN113435411A (en) * | 2021-07-26 | 2021-09-24 | 中国矿业大学(北京) | Improved DeepLabV3+ based open pit land utilization identification method |
CN113486897A (en) * | 2021-07-29 | 2021-10-08 | 辽宁工程技术大学 | Semantic segmentation method for convolution attention mechanism up-sampling decoding |
CN114140469A (en) * | 2021-12-02 | 2022-03-04 | 北京交通大学 | Depth hierarchical image semantic segmentation method based on multilayer attention |
CN114140469B (en) * | 2021-12-02 | 2023-06-23 | 北京交通大学 | Depth layered image semantic segmentation method based on multi-layer attention |
CN114140437A (en) * | 2021-12-03 | 2022-03-04 | 杭州电子科技大学 | Fundus hard exudate segmentation method based on deep learning |
CN115587967A (en) * | 2022-09-06 | 2023-01-10 | 杭州电子科技大学 | Fundus image optic disk detection method based on HA-UNet network |
CN115587967B (en) * | 2022-09-06 | 2023-10-10 | 杭州电子科技大学 | Fundus image optic disk detection method based on HA-UNet network |
CN117079142A (en) * | 2023-10-13 | 2023-11-17 | 昆明理工大学 | Anti-attention generation countermeasure road center line extraction method for automatic inspection of unmanned aerial vehicle |
CN117079142B (en) * | 2023-10-13 | 2024-01-26 | 昆明理工大学 | Anti-attention generation countermeasure road center line extraction method for automatic inspection of unmanned aerial vehicle |
CN117236433A (en) * | 2023-11-14 | 2023-12-15 | 山东大学 | Intelligent communication perception method, system, equipment and medium for assisting blind person life |
CN117236433B (en) * | 2023-11-14 | 2024-02-02 | 山东大学 | Intelligent communication perception method, system, equipment and medium for assisting blind person life |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||