CN111680695A - Semantic segmentation method based on reverse attention model

Semantic segmentation method based on reverse attention model

Info

Publication number
CN111680695A
CN111680695A
Authority
CN
China
Prior art keywords
model
attention
output
semantic segmentation
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010513903.5A
Other languages
Chinese (zh)
Inventor
Li Lei
Dong Zhuoli
Fei Xuan
Mu Yashuang
Li Weidong
Wang Guicai
Shi Shuaifeng
Li Zheng
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan University of Technology
Original Assignee
Henan University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan University of Technology
Priority to CN202010513903.5A
Publication of CN111680695A
Legal status: Pending

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a semantic segmentation method based on a reverse attention model. First, an image data set is acquired and a training set and a test set are constructed. A deep semantic segmentation network model is then built, comprising a basic network model and a reverse attention model. The features output by the basic network are fed into the reverse attention model to compute attention views, which are applied back, stage by stage, to the low-level output features of the basic semantic segmentation network and fused with the output features of the basic network and their upsampled versions to obtain the final segmentation result. Because the model computes the attention views only from the output features of the basic semantic segmentation network and uses them to guide the merging of the low-level features into those output features, noise in the model's low-level features is suppressed, and the robustness and segmentation accuracy of the semantic segmentation model are improved. In addition, a Gumbel softmax-based loss function is applied to the high-level output features of the basic semantic segmentation model to accelerate model training.

Description

Semantic segmentation method based on reverse attention model
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a semantic segmentation method based on a reverse attention model.
Background
In recent years, deep learning has developed rapidly. Deep learning models, represented by the convolutional neural network (CNN), have revived neural networks after a long period of dormancy and set off a wave of deep learning research in academia and industry.
To overcome the limitation that deep-network-based segmentation models require a fixed input image size, Long and Shelhamer of UC Berkeley proposed the Fully Convolutional Network (FCN) for image semantic segmentation. By replacing fully connected layers with convolutions and mapping the network's dense prediction maps back onto the original image through deconvolution and upsampling, the FCN achieves end-to-end semantic segmentation and can process images of any size. Enlarging the receptive field is an important factor in capturing image semantics, but repeated downsampling easily causes loss of image detail, boundary offset, and similar problems. On this basis, the DeepLab V2, DeepLab V3, DeepLab V3+, PSPNet, U-Net and other improved models were successively proposed, refining the model architecture, the upsampling strategy, the receptive field size, and so on; in particular, the introduction of atrous (dilated) convolution in the DeepLab series effectively improved segmentation accuracy.
At present, semantic segmentation methods based on deep learning largely build on the idea of the fully convolutional network, and segmentation accuracy has improved greatly. However, most algorithms are developed on public data sets, while real scene images often contain many small targets or complex scenes, which challenges existing models. In recent years, researchers have applied attention models to convolutional neural networks, attempting to extract accurate pixel-level attention features from the high-level features of CNNs and thereby improve segmentation from another angle. The attention model in deep learning imitates the attention mechanism of the human brain, assigning larger weights to the objects of interest. Li et al. proposed a pyramid attention network that exploits the global context information of an image for semantic segmentation, combining an attention mechanism with a spatial pyramid to extract accurate, dense features and obtain pixel labels. Fu et al. proposed a scene segmentation network integrating a dual attention mechanism. The most recent semantic segmentation framework is the Gated Shape CNN (GSCNN) proposed by Takikawa et al., which uses a gated convolutional layer to convert information in the regular semantic stream into object shape information, thereby adding a shape stream to the typical network architecture; that is, the gated convolutional layer combines the shape stream with the regular stream to obtain the final segmentation result.
However, this approach increases the number of learnable parameters and the complexity of the model. In addition, cross attention models and convolutional block attention modules are also widely used.
Disclosure of Invention
The invention aims to provide a semantic segmentation method based on a reverse attention model, which is used for improving the performance of image semantic segmentation.
In order to solve the technical problems, the technical scheme of the invention is as follows: a semantic segmentation method based on a reverse attention model comprises the following steps:
(1) acquiring an image data set, and constructing a training set and a test set;
(2) constructing a deep semantic segmentation network model, wherein the deep semantic segmentation network model comprises a basic network model and a reverse attention model;
the basic network model comprises a plurality of convolution modules and an ASPP output module which are sequentially connected, wherein the ASPP output module is used for outputting the output characteristics of the basic network model;
the processing procedure of the reverse attention model is as follows:
1) passing the output features of the basic network model through a convolutional layer to obtain dimension-reduced output features, feeding these into an attention calculation model to obtain a first attention view, point-multiplying the dimension-reduced output features with the first attention view, and then adding the dimension-reduced output features to obtain first output features;
2) upsampling the first output features to obtain at least two features of different scales; computing an attention view for the feature at each scale; point-multiplying each of the resulting attention views with the corresponding feature map of the basic network model; adding each point-multiplied result to the feature at the corresponding scale to obtain output feature maps of different scales; and fusing the output feature maps of different scales with the first output features to obtain the output result;
(3) inputting a training set into the deep semantic segmentation network model for training to obtain a trained deep semantic segmentation network model;
(4) and inputting the test set into the trained deep semantic segmentation network model to obtain an image segmentation result.
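As a rough illustration only, steps (1)-(4) could be organized as in the following PyTorch-style sketch; the patent prescribes no implementation, so the dataset split, optimizer, and all hyperparameters here are assumptions, and the model argument stands in for the deep semantic segmentation network described above.

```python
# Minimal sketch of steps (1)-(4): build the train/test split, train the
# deep semantic segmentation network, then segment the test images.
# The 80/20 split, SGD, and all hyperparameters are assumptions.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, random_split

def run(dataset, model, num_epochs=50, device="cuda"):
    n_train = int(0.8 * len(dataset))                       # step (1): split
    train_set, test_set = random_split(
        dataset, [n_train, len(dataset) - n_train])
    train_loader = DataLoader(train_set, batch_size=8, shuffle=True)
    test_loader = DataLoader(test_set, batch_size=1)

    model = model.to(device)                                # step (2): model
    opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    for _ in range(num_epochs):                             # step (3): train
        for img, lab in train_loader:
            opt.zero_grad()
            loss = F.cross_entropy(model(img.to(device)), lab.to(device))
            loss.backward()
            opt.step()

    model.eval()                                            # step (4): test
    with torch.no_grad():
        return [model(img.to(device)).argmax(1).cpu()
                for img, _ in test_loader]
```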
The invention has the beneficial effects that:
according to the invention, the attention view iteration of the high-level features of the model is reacted to the low-level features of the model to improve the precision of the semantic segmentation result. Compared with the traditional attention model in the semantic segmentation method, the self-attention model in the traditional method is a mode of calculating the attention view of the output feature of the current layer in the model and then applying the attention view to the output feature of the current layer, the method utilizes the semantic information in the reverse attention model to inhibit the noise in the low-layer feature of the model, namely the attention view is calculated by the last output feature of the basic model, the low-layer feature of the model is merged into the last output feature of the basic model by taking the last output feature as a guide, and the last output feature merges the semantic information of the high layer and the boundary information of the low layer. The reverse attention model in the invention can also accelerate the convergence process of the backward propagation process parameters of the deep convolutional neural network model. In addition, the difference between the model prediction boundary and the marked image boundary is calculated by using a Gumbel softmax-based loss function, so that the deep convolutional neural network model is guided to pay more attention to the boundary information of the image, and the model training speed is accelerated.
Further, the loss functions adopted by the deep semantic segmentation network model comprise a cross entropy loss function and a Gumbel softmax-based loss function.
Further, the attention calculation model combines channel attention and spatial attention, namely:

M(F) = σ(M_c(F) + M_s(F)),

where F ∈ R^{H×W×C} is the input feature, H is the image height, W is the image width, and C is the number of channels; M_c is the channel attention function and M_s is the spatial attention function, the subscripts c and s denoting channel and spatial attention; σ is the sigmoid function. M_c and M_s are defined as follows:

M_c(F) = BN(MLP(AvgPool(F))) = BN(w_1(w_0 AvgPool(F) + b_0) + b_1)

M_s(F) = BN(f_3^{1×1}(f_2^{3×3}(f_1^{3×3}(f_0^{1×1}(F)))))

where MLP denotes a multilayer perceptron (i.e., fully connected layers), AvgPool is the average pooling layer, and BN is batch normalization; w_0, w_1 are network weights and b_0, b_1 are bias parameters, with w_0 ∈ R^{C/r×C}, b_0 ∈ R^{C/r}, w_1 ∈ R^{C×C/r} and b_1 ∈ R^C; r is the channel reduction ratio and C is the number of channels; f_0, f_1, f_2, f_3 are convolution operations, and 1×1 and 3×3 are the convolution kernel sizes.
Further, the Gumbel softmax-based loss function is computed as follows:

1) The first output features are passed through a convolutional layer whose output dimension equals the number of semantic segmentation classes, giving a feature matrix Y ∈ R^{N×c}; for each sample i in the matrix (y_i = [y_{i1}, ..., y_{ic}]), c independent samples ε_1, ..., ε_c are drawn from the uniform distribution U(0, 1);

2) the noise is computed as G_i = -log(-log(ε_i));

3) the randomly generated samples are added to the network model output features Y to obtain the Gumbel distribution: v_i = [y_{i1} + G_1, ..., y_{ic} + G_c];

4) the output class probabilities are computed through a softmax function, yielding classes that approximate one-hot form:

σ_τ(v_i) = exp(v_i / τ) / Σ_j exp(v_j / τ)

where τ is a temperature parameter that controls how closely the Gumbel softmax output approximates one-hot form: the smaller the temperature value, the closer the output is to one-hot form, and conversely the closer it is to a uniform distribution; v_i and v_j are the Gumbel distributions obtained by adding noise to the samples y;

5) Gaussian smoothing is applied to σ_τ(v_i) and its gradient is computed to obtain the boundary information B̂; the labeled image is then converted to one-hot form, Gaussian smoothing is applied, and the gradient information B is computed; the L1 norm between B̂ and B serves as the loss function of the last layer of the basic network model.
Further, when the basic network model is the VGG16-based DeepLab V2, the basic network model comprises five feature extraction blocks and an ASPP module; the five feature extraction blocks comprise, in order, a first convolution module, a first pooling layer, a second convolution module, a second pooling layer, a third convolution module, a third pooling layer, a fourth convolution module, a fourth pooling layer, a fifth convolution module, and a fifth pooling layer; each convolution module comprises 2-3 convolutional layers, and the convolutional layers of the fourth and fifth convolution modules are atrous convolutions; the ASPP module is a pyramid structure with atrous convolution.
Further, the first convolution module in VGG16 comprises two 3×3 convolutional layers with output dimension 64; the second convolution module comprises two 3×3 convolutional layers with output dimension 128; the third convolution module comprises three 3×3 convolutional layers with output dimension 256; the fourth convolution module comprises three 3×3 convolutional layers with output dimension 512; the fifth convolution module comprises three 3×3 convolutional layers with output dimension 512; and the output of the fifth convolution module is connected to the ASPP module.
Further, the deep semantic segmentation model is a VGG16-based DeepLab V3 model.
Drawings
FIG. 1 is a schematic diagram of the reverse attention model-based semantic segmentation method of the present invention, built on DeepLab V2.
Detailed Description
For purposes of illustrating the objects, aspects and advantages of the present invention in detail, the present invention is further described in detail below with reference to specific implementation steps and the accompanying drawings.
The invention provides a semantic segmentation method based on a reverse attention model. A reverse attention model is introduced into a common fully convolutional semantic segmentation model; the attention views of the model's high-level output features are applied back to its low-level features, multi-feature fusion is performed, boundary information is retained in the segmentation result, part of the noise information is filtered out, and the accuracy of the semantic segmentation result is improved.
In the invention, a Gumbel softmax-based loss function is applied to the attention-enhanced last-layer output features of the basic semantic segmentation model. Because the Gumbel softmax output is closer to a one-hot classification result, the boundary error can be computed through this loss function, and the convergence of the model's training parameters can be accelerated.
Specifically, the semantic segmentation method of the present invention is described below, taking the DeepLab V2 network architecture as an example.
It should be noted that the deep semantic segmentation network model in this application is built on the architecture of a traditional, classical semantic segmentation model; the basic network architecture can be VGG16, ResNet, or the like.
As shown in FIG. 1, taking image data from a flat grain warehouse (single-storey barn) as an example, a deep semantic segmentation network model is constructed on the basis of the DeepLab V2 network model; it comprises a VGG16 feature extraction module, an ASPP module, a reverse attention model, a cross entropy loss function, a Gumbel softmax-based loss function, and so on. The VGG16 network architecture comprises five feature extraction blocks; each convolution module comprises 2-3 convolutional layers, each convolutional layer is followed by a ReLU nonlinearity, and each convolution module is followed by a pooling layer. The convolutional layers of the fourth and fifth convolution modules are atrous convolutions, and the ASPP (Atrous Spatial Pyramid Pooling) module is a pyramid structure with atrous convolution.
Specifically, the semantic segmentation method of the present embodiment includes the following steps:
step 1, acquiring an image data set, and constructing a training set and a test set;
step 2, constructing a deep semantic segmentation network model, including a basic network model and a reverse attention model;
wherein the basic network model is a VGG16-based DeepLab V2 model or a VGG16-based DeepLab V3 model;
In this embodiment, taking the VGG16-based DeepLab V2 model as an example, the basic network model comprises five feature extraction blocks and an ASPP output module; the five feature extraction blocks comprise, in order, a first convolution module, a first pooling layer, a second convolution module, a second pooling layer, a third convolution module, a third pooling layer, a fourth convolution module, a fourth pooling layer, a fifth convolution module, and a fifth pooling layer; each convolution module comprises 2-3 convolutional layers, and the convolutional layers of the fourth and fifth convolution modules are atrous convolutions. The first convolution module comprises two 3×3 convolutional layers with output dimension 64; the second comprises two 3×3 convolutional layers with output dimension 128; the third comprises three 3×3 convolutional layers with output dimension 256; the fourth comprises three 3×3 convolutional layers with output dimension 512; the fifth comprises three 3×3 convolutional layers with output dimension 512. The output of the fifth convolution module is connected to the ASPP output module, which is a pyramid structure with atrous convolution and outputs the output features of the basic network model.
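A minimal sketch of such a backbone is given below, assuming PyTorch; the dilation rates, the ASPP rates, and keeping stride-2 pooling only after the first three blocks (with dilation standing in for the last two pools) are assumptions rather than values taken from the patent.

```python
# Hedged sketch of the backbone described above (not the patentee's code):
# a VGG16-style extractor whose fourth and fifth blocks use atrous
# convolution, followed by an ASPP output module. Channel widths follow
# the text (64, 128, 256, 512, 512).
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, n_convs, dilation=1):
    """n_convs 3x3 conv + ReLU layers; dilation > 1 gives atrous convolution."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3,
                             padding=dilation, dilation=dilation),
                   nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: parallel dilated 3x3 branches, summed."""
    def __init__(self, in_ch, out_ch, rates=(6, 12, 18, 24)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates])

    def forward(self, x):
        return sum(b(x) for b in self.branches)

class VGG16DeepLabV2(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.block1 = conv_block(3, 64, 2)        # low-level feature F_l^1
        self.block2 = conv_block(64, 128, 2)      # low-level feature F_l^2
        self.block3 = conv_block(128, 256, 3)
        self.block4 = conv_block(256, 512, 3, dilation=2)   # atrous
        self.block5 = conv_block(512, 512, 3, dilation=4)   # atrous
        self.pool = nn.MaxPool2d(2)
        self.aspp = ASPP(512, num_classes)

    def forward(self, x):
        f1 = self.block1(x)
        f2 = self.block2(self.pool(f1))
        f3 = self.block3(self.pool(f2))
        f4 = self.block4(self.pool(f3))           # stride kept at 8 via dilation
        f5 = self.block5(f4)
        return f1, f2, self.aspp(f5)              # low-level features + output
```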
The processing procedure of the reverse attention model is as follows:
1) The output features of the last layer of the basic network model (i.e., the ASPP layer in FIG. 1) are reduced in dimension by a 1×1 convolutional layer to obtain the feature F_h, which is fed into the attention calculation model to obtain the first attention view M(F_h); the view is applied back to the feature F_h to obtain

F_0 = F_h ⊕ (F_h ⊙ M(F_h));

2) taking the output feature F_0 as the high-level feature, two attention views A_i (i = 1, 2) of different scales are computed and applied back to the two low-level output features F_l^i of the basic network model (the output features of the i-th layer); the processed features are then fused with F_0 after scale alignment, and the fused features pass through two 3×3 convolutional layers and one 1×1 convolutional layer to obtain the output result.

If the scales of F_l^i and F_0 differ, F_0 is first upsampled by interpolation, and a 3×3 convolutional layer reduces the number of channels of F_l^i to the same number as F_0; the attention view A_i is then computed from the (upsampled) F_0 and applied to fuse F_l^i with F_0 as follows:

F_out^i = Up(F_0) ⊕ (F_l^i ⊙ A_i)

where ⊙ denotes element-wise multiplication and ⊕ denotes element-wise addition.
The attention calculation model in this embodiment combines channel attention and spatial attention, namely: M(F) = σ(M_c(F) + M_s(F)), where F ∈ R^{H×W×C} is the input feature, H is the image height, W is the image width, and C is the number of channels; M_c is the channel attention function and M_s is the spatial attention function, with subscripts c and s denoting channel and spatial attention; σ is the sigmoid function. M_c and M_s are defined as follows:

M_c(F) = BN(MLP(AvgPool(F))) = BN(w_1(w_0 AvgPool(F) + b_0) + b_1)

M_s(F) = BN(f_3^{1×1}(f_2^{3×3}(f_1^{3×3}(f_0^{1×1}(F)))))

where MLP denotes a multilayer perceptron (i.e., fully connected layers), AvgPool is the average pooling layer, and BN is batch normalization; w_0, w_1 are network weights and b_0, b_1 are bias parameters, with w_0 ∈ R^{C/r×C}, b_0 ∈ R^{C/r}, w_1 ∈ R^{C×C/r} and b_1 ∈ R^C; r is the channel reduction ratio and C is the number of channels; f_0, f_1, f_2, f_3 are convolution operations, and 1×1 and 3×3 are the convolution kernel sizes.
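A sketch of this attention module is given below, assuming PyTorch; the reduction ratio default and the dilation used in the two 3×3 convolutions are assumptions (the text fixes only the kernel sizes).

```python
# Minimal sketch of the attention model M(F) = sigmoid(Mc(F) + Ms(F)):
# channel branch AvgPool -> MLP -> BN, spatial branch
# 1x1 -> 3x3 -> 3x3 -> 1x1 -> BN, as described in the text.
import torch
import torch.nn as nn

class ReverseAttentionView(nn.Module):
    def __init__(self, channels, r=16):
        super().__init__()
        # Mc: channel attention (1x1 convs act as the MLP weights w0, w1)
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // r, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, channels, 1),
            nn.BatchNorm2d(channels))
        # Ms: spatial attention, f0 (1x1) -> f1, f2 (3x3) -> f3 (1x1)
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels // r, 1),
            nn.Conv2d(channels // r, channels // r, 3, padding=4, dilation=4),
            nn.Conv2d(channels // r, channels // r, 3, padding=4, dilation=4),
            nn.Conv2d(channels // r, 1, 1),
            nn.BatchNorm2d(1))

    def forward(self, F):
        # broadcast-add the (N,C,1,1) and (N,1,H,W) maps, squash to (0,1)
        return torch.sigmoid(self.channel(F) + self.spatial(F))
```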
The low-level output features in this embodiment are the outputs of the first and second convolution modules (as shown in FIG. 1); the output features F_out^1 and F_out^2 are thus obtained as described above.

3) F_out^1, F_out^2 and F_0 are concatenated, and the final features are output through two 3×3×256 convolutional layers, one 1×1×C convolutional layer, and an upsampling layer.
It should be noted that when the two low-level output features of the basic network model are merged into the high-level features, a convolution operation is applied to each branch's output features, which reduces the number of feature channels, lowers the computational complexity, and reduces the influence of noise in the low-level features.
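Pulling steps 1)-3) together, a hedged sketch of the whole reverse attention head might look as follows; it reuses the ReverseAttentionView sketch above, and sharing one attention module across scales, the channel width c, and the interpolation mode are our assumptions.

```python
# Hedged sketch of the full reverse attention head (steps 1-3): compute
# F0 = Fh + Fh * M(Fh) from the ASPP output, apply attention views derived
# only from F0 to the two low-level features, then fuse everything through
# two 3x3 and one 1x1 convolutional layers.
import torch
import torch.nn as nn
import torch.nn.functional as Fn

class ReverseAttentionHead(nn.Module):
    def __init__(self, base_ch, low_chs, num_classes, c=256):
        super().__init__()
        self.reduce_h = nn.Conv2d(base_ch, c, 1)     # 1x1 dimension reduction
        self.att = ReverseAttentionView(c)
        # 3x3 convs shrinking each low-level branch to c channels
        self.reduce_l = nn.ModuleList(
            [nn.Conv2d(lc, c, 3, padding=1) for lc in low_chs])
        self.fuse = nn.Sequential(
            nn.Conv2d(c * (len(low_chs) + 1), 256, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, num_classes, 1))

    def forward(self, f_base, lows):
        fh = self.reduce_h(f_base)
        f0 = fh + fh * self.att(fh)                  # F0 = Fh + Fh * M(Fh)
        outs = [f0]
        for fl, red in zip(lows, self.reduce_l):
            up = Fn.interpolate(f0, size=fl.shape[2:], mode='bilinear',
                                align_corners=False)  # match scale of F_l^i
            a = self.att(up)                          # A_i comes from F0 only
            outs.append(up + red(fl) * a)             # Up(F0) + F_l^i * A_i
        size = outs[-1].shape[2:]                     # finest map
        outs = [Fn.interpolate(o, size=size, mode='bilinear',
                               align_corners=False) for o in outs]
        return self.fuse(torch.cat(outs, dim=1))
```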
Step 3: the training set is input into the constructed deep semantic segmentation network model for training, yielding a trained deep semantic segmentation network model;
it should be noted that, in the model training process, the adopted loss functions include a cross entropy loss function and a loss function based on Gumbel softmax;
specifically, the calculation process of the loss function based on Gumbel softmax is as follows:
1) aiming at the first output feature of the last layer of the basic network model, the feature of which the output dimension is the semantic segmentation class number through the convolution layer is YN×cWhere N is the product of the length and width of the feature matrix, in which each sample i (y) isi=[yi1,...,yic]) C independent samples ∈ each subject to a uniform distribution of U (0, 1) are generated1,...,∈c...;
2) Calculated noise is Gi=-log(-log(∈i));
3) Adding the randomly generated samples and the network model output characteristics Y to obtain Gumbel distribution: v. ofi=[yi1+G1,...,yic+Gc];
4) Calculating the output characteristic probability size through a softmax function so as to obtain a final class approximate to a one-hot form:
Figure BDA0002529283230000074
wherein tau is a temperature parameter, the output degree of the Gumbel softmax approximate to one-hot is controlled, the smaller the temperature coefficient value is, the more approximate to one-hot form the output result is, otherwise, the more approximate to uniform distribution is; v. ofiAnd vjIs the Gumbel distribution obtained after adding noise to the sample y.
5) To sigmaτ(vi) Performing Gaussian smoothing, calculating its gradient to obtain boundary information
Figure BDA0002529283230000081
Then, the marked image is converted into a one-hot form, Gaussian smoothing is carried out, gradient information B is calculated, and calculation is carried out
Figure BDA0002529283230000082
And B as a loss function of the last layer of the underlying network model.
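A minimal sketch of this boundary loss, under our reading of steps 1)-5), is given below; the box blur standing in for the Gaussian smoothing, the finite-difference gradient, and the default temperature are assumptions.

```python
# Minimal sketch of the Gumbel softmax boundary loss (our reading of
# steps 1-5, not the patentee's code): add Gumbel noise to the class
# logits, take a low-temperature softmax so the maps approach one-hot
# form, extract boundaries via smoothing plus a spatial gradient, and
# penalise the L1 difference from the boundaries of the one-hot labels.
import torch
import torch.nn.functional as Fn

def gumbel_boundary_loss(logits, target, num_classes, tau=0.5, blur=3):
    # logits: (N, C, H, W) class scores; target: (N, H, W) integer labels
    eps = torch.rand_like(logits).clamp(1e-10, 1.0 - 1e-7)
    g = -torch.log(-torch.log(eps))                  # G_i = -log(-log(eps_i))
    soft = Fn.softmax((logits + g) / tau, dim=1)     # near one-hot for small tau

    onehot = Fn.one_hot(target, num_classes).permute(0, 3, 1, 2).float()

    def boundaries(x):
        # smooth, then take the gradient magnitude as boundary strength
        x = Fn.avg_pool2d(x, blur, stride=1, padding=blur // 2)
        dy = (x[:, :, 1:, :] - x[:, :, :-1, :]).abs()
        dx = (x[:, :, :, 1:] - x[:, :, :, :-1]).abs()
        return dy[:, :, :, :-1] + dx[:, :, :-1, :]   # crop to a common shape

    return Fn.l1_loss(boundaries(soft), boundaries(onehot))
```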
The cross entropy loss of a single sample is expressed as:

L_CE = -Σ_i ŷ_i log(y_i)

where ŷ is the true one-hot label of the sample x, y_i is the softmax probability value output by the model, and i indexes the i-th entry of the vector; the final cross entropy loss is the average of the sample loss values over each batch.
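During training the two losses are used together; the weighting below is purely illustrative (the patent does not state a weight), and gumbel_boundary_loss refers to the sketch above.

```python
# Illustrative combination of the two training losses; the boundary weight
# 0.1 is an assumption, and gumbel_boundary_loss is the sketch above.
import torch.nn.functional as Fn

def total_loss(logits, target, num_classes):
    ce = Fn.cross_entropy(logits, target)   # batch-averaged cross entropy
    return ce + 0.1 * gumbel_boundary_loss(logits, target, num_classes)
```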
In this embodiment, thanks to the reverse attention model, the result obtained by the semantic segmentation method fuses semantic information with boundary information, and the Gumbel softmax-based boundary loss function guides model training and accelerates the convergence of the model parameters.
It is worth noting that the reverse attention model proposed by the invention can be applied to one or more of the lower layers of the model, and the strategy can be applied to any basic semantic segmentation model.
Step 4: the test set is input into the trained deep semantic segmentation network model to obtain the image segmentation results.
To verify the superiority of the method, a series of comparative experiments based on the reverse attention model (BA for short) were designed on DeepLab V2 + VGG16, and the segmentation results were evaluated using the standard semantic segmentation criteria IoU and F1.
Two data sets were used: the first is the image set collected inside the flat grain warehouse; the second is the VOC2012 image data set, used to further verify the method of the present invention.
First, tests on the images taken inside the single-storey barn:
(1) 120 images collected inside the barn are used as the training image set and are manually annotated to obtain the ground truth; 20 images are used as the validation image set.
the size of each collected image is 1080 × 1920, and the images are shot under different illumination and preset angles.
(2) The annotated training image set is augmented to obtain the training images required for the subsequent training of the deep convolutional neural network model.
The training set is augmented by cropping regions of interest of a specified size with a certain stride, adjusting the image Gamma correction parameters, scaling the image, flipping the image, rotating the image by no more than ±10° relative to the original, adding Gaussian noise, and so on.
The original training image and its labeled image are transformed simultaneously during augmentation; whenever interpolation of the labeled image is required, nearest-neighbour interpolation is used.
Specifically, for any training image and its labeled image from the flat grain warehouse, regions of interest of a specified size are cropped starting from the top-left corner of the image with a certain stride; flipping, Gamma parameter adjustment, rotation within ±10°, Gaussian noise addition and other operations, each with its own parameter set, are then applied to each region. Training image sets close to the target number are finally generated: the final training set contains 5600 images and the validation set 120 images.
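A hedged sketch of such an augmentation pipeline is given below (scaling is omitted for brevity; the crop size, stride, flip probability, Gamma range, and noise level are assumptions, not values from the patent).

```python
# Hedged sketch of the described augmentation: stride-based ROI cropping,
# then random flip, Gamma correction, rotation within +/-10 degrees, and
# Gaussian noise; the label map receives the same geometric transforms
# with nearest-neighbour interpolation.
import random
import numpy as np
import cv2

def augment(image, label, crop=512, stride=256):
    samples = []
    h, w = image.shape[:2]
    for y in range(0, h - crop + 1, stride):
        for x in range(0, w - crop + 1, stride):
            img = image[y:y + crop, x:x + crop].copy()
            lab = label[y:y + crop, x:x + crop].copy()
            if random.random() < 0.5:                       # horizontal flip
                img, lab = img[:, ::-1].copy(), lab[:, ::-1].copy()
            gamma = random.uniform(0.7, 1.4)                # Gamma correction
            img = np.clip(255.0 * (img / 255.0) ** gamma, 0, 255).astype(np.uint8)
            angle = random.uniform(-10, 10)                 # small rotation only
            M = cv2.getRotationMatrix2D((crop / 2, crop / 2), angle, 1.0)
            img = cv2.warpAffine(img, M, (crop, crop))
            lab = cv2.warpAffine(lab, M, (crop, crop),
                                 flags=cv2.INTER_NEAREST)   # nearest for labels
            noise = np.random.normal(0, 5, img.shape)       # Gaussian noise
            img = np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)
            samples.append((img, lab))
    return samples
```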
In this method, the original training images cannot be rotated by large angles, because the grain surface and the grain loading line have a certain semantic-context relationship.
Table 1 shows the comparative test results of the reverse attention model (BA for short) on DeepLab V2 + VGG16 over the single-storey barn image set. As can be seen from Table 1, adding the reverse attention model to the original model effectively improves its segmentation performance.
TABLE 1. Evaluation of training results on the single-storey barn image set
Second, tests based on the VOC2012 database:
the labeling image set comprises 12081 images and marking images, and is divided into a training image set and a verification image set, wherein the training image set comprises 10582 images, and the verification image set comprises 1499 images.
Table 2 shows the test results of the invention on the VOC2012 data set using DeepLab V2 + VGG16; it can be seen that adding the Gumbel softmax attention model to the original model effectively improves its segmentation performance.
TABLE 2. Evaluation of training results on the VOC2012 image set
Therefore, the semantic segmentation method based on the reverse attention model further improves image segmentation performance.
While the present invention has been described in detail with reference to the preferred embodiments, it should be understood that the above description should not be taken as limiting the invention. Various modifications and alterations to this invention will become apparent to those skilled in the art upon reading the foregoing description. Accordingly, the scope of the invention should be determined from the following claims.

Claims (7)

1. A semantic segmentation method based on a reverse attention model is characterized by comprising the following steps:
(1) acquiring an image data set, and constructing a training set and a test set;
(2) constructing a deep semantic segmentation network model, wherein the deep semantic segmentation network model comprises a basic network model and a reverse attention model;
the basic network model comprises a plurality of convolution modules and an ASPP output module which are sequentially connected, wherein the ASPP output module is used for outputting the output characteristics of the basic network model;
the processing procedure of the reverse attention model is as follows:
1) passing the output features of the basic network model through a convolutional layer to obtain dimension-reduced output features, feeding these into an attention calculation model to obtain a first attention view, point-multiplying the dimension-reduced output features with the first attention view, and then adding the dimension-reduced output features to obtain first output features;
2) upsampling the first output features to obtain at least two features of different scales; computing an attention view for the feature at each scale; point-multiplying each of the resulting attention views with the corresponding feature map of the basic network model; adding each point-multiplied result to the feature at the corresponding scale to obtain output feature maps of different scales; and fusing the output feature maps of different scales with the first output features to obtain the output result;
(3) inputting a training set into the deep semantic segmentation network model for training to obtain a trained deep semantic segmentation network model;
(4) and inputting the test set into the trained deep semantic segmentation network model to obtain an image segmentation result.
2. The reverse attention model-based semantic segmentation method according to claim 1, wherein the loss functions adopted by the deep semantic segmentation network model comprise a cross entropy loss function and a Gumbel softmax-based loss function.
3. The reverse attention model-based semantic segmentation method according to claim 1, wherein the attention calculation model combines channel attention and spatial attention, namely:

M(F) = σ(M_c(F) + M_s(F)),

where F ∈ R^{H×W×C} is the input feature, H is the image height, W is the image width, and C is the number of channels; M_c is the channel attention function and M_s is the spatial attention function, with subscripts c and s denoting channel and spatial attention; σ is the sigmoid function; M_c and M_s are defined as follows:

M_c(F) = BN(MLP(AvgPool(F))) = BN(w_1(w_0 AvgPool(F) + b_0) + b_1)

M_s(F) = BN(f_3^{1×1}(f_2^{3×3}(f_1^{3×3}(f_0^{1×1}(F)))))

where MLP denotes a multilayer perceptron, i.e., fully connected layers; AvgPool is the average pooling layer; BN is batch normalization; w_0, w_1 are network weights and b_0, b_1 are bias parameters, with w_0 ∈ R^{C/r×C}, b_0 ∈ R^{C/r}, w_1 ∈ R^{C×C/r} and b_1 ∈ R^C; r is the channel reduction ratio and C is the number of channels; f_0, f_1, f_2, f_3 are convolution operations, and 1×1 and 3×3 are the convolution kernel sizes.
4. The reverse attention model-based semantic segmentation method according to claim 2, wherein the Gumbel softmax-based loss function is computed as follows:

(1) the first output features are passed through a convolutional layer whose output dimension equals the number of semantic segmentation classes, giving a feature matrix Y ∈ R^{N×c}; for each sample i in the matrix (y_i = [y_{i1}, ..., y_{ic}]), c independent samples ε_1, ..., ε_c are drawn from the uniform distribution U(0, 1);

(2) the noise is computed as G_i = -log(-log(ε_i));

(3) the randomly generated samples are added to the network model output features Y to obtain the Gumbel distribution: v_i = [y_{i1} + G_1, ..., y_{ic} + G_c];

(4) the output class probabilities are computed through a softmax function, yielding classes that approximate one-hot form:

σ_τ(v_i) = exp(v_i / τ) / Σ_j exp(v_j / τ)

where τ is a temperature parameter that controls how closely the Gumbel softmax output approximates one-hot form: the smaller the temperature value, the closer the output is to one-hot form, and conversely the closer it is to a uniform distribution; v_i and v_j are the Gumbel distributions obtained by adding noise to the samples y;

(5) Gaussian smoothing is applied to σ_τ(v_i) and its gradient is computed to obtain the boundary information B̂; the labeled image is then converted to one-hot form, Gaussian smoothing is applied, and the gradient information B is computed; the L1 norm between B̂ and B serves as the loss function of the last layer of the basic network model.
5. The reverse attention model-based semantic segmentation method according to claim 1, wherein when the basic network model is the VGG16-based DeepLab V2, the basic network model comprises five feature extraction blocks and an ASPP module, the five feature extraction blocks comprising, in order, a first convolution module, a first pooling layer, a second convolution module, a second pooling layer, a third convolution module, a third pooling layer, a fourth convolution module, a fourth pooling layer, a fifth convolution module and a fifth pooling layer; each convolution module comprises 2-3 convolutional layers, and the convolutional layers of the fourth and fifth convolution modules are atrous convolutions; the ASPP module is a pyramid structure with atrous convolution.
6. The reverse attention model-based semantic segmentation method of claim 5, wherein the first convolution module in VGG16 comprises two 3×3 convolutional layers with output dimension 64; the second convolution module comprises two 3×3 convolutional layers with output dimension 128; the third convolution module comprises three 3×3 convolutional layers with output dimension 256; the fourth convolution module comprises three 3×3 convolutional layers with output dimension 512; the fifth convolution module comprises three 3×3 convolutional layers with output dimension 512; and the output of the fifth convolution module is connected to the ASPP module.
7. The reverse attention model-based semantic segmentation method according to claim 1, wherein the deep semantic segmentation model is a VGG16-based DeepLab V3 model.
CN202010513903.5A 2020-06-08 2020-06-08 Semantic segmentation method based on reverse attention model Pending CN111680695A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010513903.5A CN111680695A (en) 2020-06-08 2020-06-08 Semantic segmentation method based on reverse attention model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010513903.5A CN111680695A (en) 2020-06-08 2020-06-08 Semantic segmentation method based on reverse attention model

Publications (1)

Publication Number Publication Date
CN111680695A true CN111680695A (en) 2020-09-18

Family

ID=72454054

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010513903.5A Pending CN111680695A (en) 2020-06-08 2020-06-08 Semantic segmentation method based on reverse attention model

Country Status (1)

Country Link
CN (1) CN111680695A (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112381097A (en) * 2020-11-16 2021-02-19 西南石油大学 Scene semantic segmentation method based on deep learning
CN112488115A (en) * 2020-11-23 2021-03-12 石家庄铁路职业技术学院 Semantic segmentation method based on two-stream architecture
CN112580654A (en) * 2020-12-25 2021-03-30 西南电子技术研究所(中国电子科技集团公司第十研究所) Semantic segmentation method for ground objects of remote sensing image
CN112613517A (en) * 2020-12-17 2021-04-06 深圳大学 Endoscopic instrument segmentation method, endoscopic instrument segmentation apparatus, computer device, and storage medium
CN112801104A (en) * 2021-01-20 2021-05-14 吉林大学 Image pixel level pseudo label determination method and system based on semantic segmentation
CN113052860A (en) * 2021-04-02 2021-06-29 首都师范大学 Three-dimensional cerebral vessel segmentation method and storage medium
CN113298154A (en) * 2021-05-27 2021-08-24 安徽大学 RGB-D image salient target detection method
CN113392711A (en) * 2021-05-19 2021-09-14 中国科学院声学研究所南海研究站 Smoke semantic segmentation method and system based on high-level semantics and noise suppression
CN113435411A (en) * 2021-07-26 2021-09-24 中国矿业大学(北京) Improved DeepLabV3+ based open pit land utilization identification method
CN113486897A (en) * 2021-07-29 2021-10-08 辽宁工程技术大学 Semantic segmentation method for convolution attention mechanism up-sampling decoding
CN113537228A (en) * 2021-07-07 2021-10-22 中国电子科技集团公司第五十四研究所 Real-time image semantic segmentation method based on depth features
CN113643311A (en) * 2021-06-28 2021-11-12 清华大学 Image segmentation method and device for boundary error robustness
CN114140437A (en) * 2021-12-03 2022-03-04 杭州电子科技大学 Fundus hard exudate segmentation method based on deep learning
CN114140469A (en) * 2021-12-02 2022-03-04 北京交通大学 Depth hierarchical image semantic segmentation method based on multilayer attention
CN115587967A (en) * 2022-09-06 2023-01-10 杭州电子科技大学 Fundus image optic disk detection method based on HA-UNet network
CN117079142A (en) * 2023-10-13 2023-11-17 昆明理工大学 Anti-attention generation countermeasure road center line extraction method for automatic inspection of unmanned aerial vehicle
CN117236433A (en) * 2023-11-14 2023-12-15 山东大学 Intelligent communication perception method, system, equipment and medium for assisting blind person life

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110197182A (en) * 2019-06-11 2019-09-03 中国电子科技集团公司第五十四研究所 Remote sensing image semantic segmentation method based on contextual information and attention mechanism
CN110210485A (en) * 2019-05-13 2019-09-06 常熟理工学院 The image, semantic dividing method of Fusion Features is instructed based on attention mechanism
CN110458165A (en) * 2019-08-14 2019-11-15 贵州大学 A kind of natural scene Method for text detection introducing attention mechanism
US20200134380A1 (en) * 2018-10-30 2020-04-30 Beijing Horizon Robotics Technology Research And Development Co., Ltd. Method for Updating Neural Network and Electronic Device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200134380A1 (en) * 2018-10-30 2020-04-30 Beijing Horizon Robotics Technology Research And Development Co., Ltd. Method for Updating Neural Network and Electronic Device
CN110210485A (en) * 2019-05-13 2019-09-06 常熟理工学院 The image, semantic dividing method of Fusion Features is instructed based on attention mechanism
CN110197182A (en) * 2019-06-11 2019-09-03 中国电子科技集团公司第五十四研究所 Remote sensing image semantic segmentation method based on contextual information and attention mechanism
CN110458165A (en) * 2019-08-14 2019-11-15 贵州大学 A kind of natural scene Method for text detection introducing attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Shen Wenxiang; Qin Pinle; Zeng Jianchao: "Indoor crowd detection network based on multi-level features and hybrid attention mechanism", Journal of Computer Applications, no. 12, 10 December 2019 (2019-12-10), pages 3496-3502 *

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112381097A (en) * 2020-11-16 2021-02-19 西南石油大学 Scene semantic segmentation method based on deep learning
CN112488115A (en) * 2020-11-23 2021-03-12 石家庄铁路职业技术学院 Semantic segmentation method based on two-stream architecture
CN112488115B (en) * 2020-11-23 2023-07-25 石家庄铁路职业技术学院 Semantic segmentation method based on two-stream architecture
CN112613517A (en) * 2020-12-17 2021-04-06 深圳大学 Endoscopic instrument segmentation method, endoscopic instrument segmentation apparatus, computer device, and storage medium
CN112613517B (en) * 2020-12-17 2022-02-18 深圳大学 Endoscopic instrument segmentation method, endoscopic instrument segmentation apparatus, computer device, and storage medium
CN112580654A (en) * 2020-12-25 2021-03-30 西南电子技术研究所(中国电子科技集团公司第十研究所) Semantic segmentation method for ground objects of remote sensing image
CN112801104B (en) * 2021-01-20 2022-01-07 吉林大学 Image pixel level pseudo label determination method and system based on semantic segmentation
CN112801104A (en) * 2021-01-20 2021-05-14 吉林大学 Image pixel level pseudo label determination method and system based on semantic segmentation
CN113052860A (en) * 2021-04-02 2021-06-29 首都师范大学 Three-dimensional cerebral vessel segmentation method and storage medium
CN113052860B (en) * 2021-04-02 2022-07-19 首都师范大学 Three-dimensional cerebral vessel segmentation method and storage medium
CN113392711A (en) * 2021-05-19 2021-09-14 中国科学院声学研究所南海研究站 Smoke semantic segmentation method and system based on high-level semantics and noise suppression
CN113392711B (en) * 2021-05-19 2023-01-06 中国科学院声学研究所南海研究站 Smoke semantic segmentation method and system based on high-level semantics and noise suppression
CN113298154B (en) * 2021-05-27 2022-11-11 安徽大学 RGB-D image salient object detection method
CN113298154A (en) * 2021-05-27 2021-08-24 安徽大学 RGB-D image salient target detection method
CN113643311B (en) * 2021-06-28 2024-04-09 清华大学 Image segmentation method and device with robust boundary errors
CN113643311A (en) * 2021-06-28 2021-11-12 清华大学 Image segmentation method and device for boundary error robustness
CN113537228A (en) * 2021-07-07 2021-10-22 中国电子科技集团公司第五十四研究所 Real-time image semantic segmentation method based on depth features
CN113435411A (en) * 2021-07-26 2021-09-24 中国矿业大学(北京) Improved DeepLabV3+ based open pit land utilization identification method
CN113486897A (en) * 2021-07-29 2021-10-08 辽宁工程技术大学 Semantic segmentation method for convolution attention mechanism up-sampling decoding
CN114140469A (en) * 2021-12-02 2022-03-04 北京交通大学 Depth hierarchical image semantic segmentation method based on multilayer attention
CN114140469B (en) * 2021-12-02 2023-06-23 北京交通大学 Depth layered image semantic segmentation method based on multi-layer attention
CN114140437A (en) * 2021-12-03 2022-03-04 杭州电子科技大学 Fundus hard exudate segmentation method based on deep learning
CN115587967A (en) * 2022-09-06 2023-01-10 杭州电子科技大学 Fundus image optic disk detection method based on HA-UNet network
CN115587967B (en) * 2022-09-06 2023-10-10 杭州电子科技大学 Fundus image optic disk detection method based on HA-UNet network
CN117079142A (en) * 2023-10-13 2023-11-17 昆明理工大学 Anti-attention generation countermeasure road center line extraction method for automatic inspection of unmanned aerial vehicle
CN117079142B (en) * 2023-10-13 2024-01-26 昆明理工大学 Anti-attention generation countermeasure road center line extraction method for automatic inspection of unmanned aerial vehicle
CN117236433A (en) * 2023-11-14 2023-12-15 山东大学 Intelligent communication perception method, system, equipment and medium for assisting blind person life
CN117236433B (en) * 2023-11-14 2024-02-02 山东大学 Intelligent communication perception method, system, equipment and medium for assisting blind person life

Similar Documents

Publication Publication Date Title
CN111680695A (en) Semantic segmentation method based on reverse attention model
CN108509978B (en) Multi-class target detection method and model based on CNN (CNN) multi-level feature fusion
CN110135366B (en) Shielded pedestrian re-identification method based on multi-scale generation countermeasure network
CN107945204B (en) Pixel-level image matting method based on generation countermeasure network
CN111612807B (en) Small target image segmentation method based on scale and edge information
CN109886066A (en) Fast target detection method based on the fusion of multiple dimensioned and multilayer feature
CN110533041B (en) Regression-based multi-scale scene text detection method
CN109035267B (en) Image target matting method based on deep learning
CN109743642B (en) Video abstract generation method based on hierarchical recurrent neural network
CN113763442A (en) Deformable medical image registration method and system
CN115082293A (en) Image registration method based on Swin transducer and CNN double-branch coupling
CN115731441A (en) Target detection and attitude estimation method based on data cross-modal transfer learning
CN114048822A (en) Attention mechanism feature fusion segmentation method for image
CN110930378A (en) Emphysema image processing method and system based on low data demand
CN114898284B (en) Crowd counting method based on feature pyramid local difference attention mechanism
CN112884668A (en) Lightweight low-light image enhancement method based on multiple scales
CN116824239A (en) Image recognition method and system based on transfer learning and ResNet50 neural network
CN111652273A (en) Deep learning-based RGB-D image classification method
CN111860683A (en) Target detection method based on feature fusion
CN115908772A (en) Target detection method and system based on Transformer and fusion attention mechanism
CN114267025A (en) Traffic sign detection method based on high-resolution network and light-weight attention mechanism
CN114998566A (en) Interpretable multi-scale infrared small and weak target detection network design method
CN114359297A (en) Attention pyramid-based multi-resolution semantic segmentation method and device
CN116596966A (en) Segmentation and tracking method based on attention and feature fusion
CN115049945B (en) Unmanned aerial vehicle image-based wheat lodging area extraction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination