CN116912501A - Weak supervision semantic segmentation method based on attention fusion - Google Patents
- Publication number
- CN116912501A CN116912501A CN202310981553.9A CN202310981553A CN116912501A CN 116912501 A CN116912501 A CN 116912501A CN 202310981553 A CN202310981553 A CN 202310981553A CN 116912501 A CN116912501 A CN 116912501A
- Authority
- CN
- China
- Prior art keywords
- attention
- semantic segmentation
- block
- weak supervision
- class
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/0895—Weakly supervised learning, e.g. semi-supervised or self-supervised learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The invention discloses a weak supervision semantic segmentation method based on attention fusion, relates to the technical field of computer vision, and provides a simple and effective weak supervision semantic segmentation framework that uses a Vision Transformer as its basic network structure. In this framework, an adaptive attention fusion module is first designed to assign different weights to the attention of different layers, so that the fused attention suppresses background noise well while preserving target details. In addition, to address the problem that regions of secondary importance in the attention cannot activate the target region well, a modulation function is designed to increase the attention values of those regions, effectively highlighting the target region. The coarse class activation map is then refined using the modulated attention, so that target regions in the resulting class activation map are activated more completely and accurately, better alleviating the problem of incomplete activation in class activation maps.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to a weak supervision semantic segmentation method based on attention fusion.
Background
Semantic segmentation is one of the fundamental and challenging tasks in the field of computer vision: it uses a computer's feature representations to mimic the human process of recognizing an image, assigning a semantic class label to each pixel of a given image. In recent years, driven by the vigorous development of deep learning methods, semantic segmentation has made remarkable progress. As a dense prediction task, however, training a semantic segmentation model is inseparable from large-scale pixel-level annotation data, and pixel-level annotation of images is difficult to obtain, time-consuming, and labor-intensive.
Weak supervision semantic segmentation technology can relieve the dependence of existing semantic segmentation models on large amounts of pixel-level annotation data, because the segmentation model is trained using only weak annotations; weak annotation has therefore become an academic research hotspot. Common weak annotations include bounding-box annotations, scribble annotations, point annotations, image-level annotations, and the like. Among these, image-level annotation is the easiest to obtain, yet weak supervision semantic segmentation based on image-level annotation is also the most challenging, since only the object classes present in the image are given, without any indication of where those classes are located.
Because the specific location of the target class in the image is unavailable, most image-level weak supervision semantic segmentation methods rely on the coarse location information produced by class activation maps. A class activation map is a technique based on a deep classification network that generates feature maps with the same number of channels as the total number of classes. The typical operation flow is: 1) obtain seed regions from the class activation map; 2) expand the seed regions to obtain pseudo labels; 3) use the pseudo labels to train a conventional fully supervised neural network to obtain the final segmentation results. Since class activation maps tend to cover only the most discriminative regions of an object and to mistake background for foreground, much effort has been devoted to generating higher-quality class activation maps.
With the rapid development of the Vision Transformer (ViT), researchers began introducing it into the weak supervision semantic segmentation task: some methods extract image features with a Vision Transformer structure, generate a coarse class activation map, and then refine it with attention to obtain a higher-quality class activation map. Typically, these methods directly add and fuse the attention of different layers. However, the shallow-layer attention of the Vision Transformer structure focuses more on local detail features of the image, so class activation maps refined with shallow attention often contain more detail information, whereas deep-layer attention focuses more on the global information of the image. Directly fusing the attention of different layers is therefore not the optimal choice and may introduce misleading information in the refinement stage.
Disclosure of Invention
In order to solve the technical problems, the invention provides a weak supervision semantic segmentation method based on attention fusion, which comprises the following steps of
S1, preparing a data set, wherein the data set comprises a training set, a verification set and a test set;
s2, carrying out data preprocessing on the images in the data set;
s3, constructing a weak supervision semantic segmentation model based on attention fusion, and adopting a data image converter DeiT pre-trained on an image recognition data set ImageNet as a backbone of the model; step S3 comprises the following sub-steps:
s3.1, dividing the image subjected to data preprocessing in the step S2 into N non-overlapping blocks, constructing N block tokens through linear mapping, and splicing C class tokens and N block tokens to obtain an input token of the model;
s3.2, inputting the input token into a Transformer coding layer of the weak supervision semantic segmentation model based on attention fusion to obtain an output token;
s3.3, extracting the last N block tokens from the output tokens to form the output block tokens, carrying out recombination operation and convolution operation on the output block tokens to obtain a rough class activation diagram,
Coarse-CAM=Conv(Reshape(Tp_out))
where Tp_out represents the output block token, reshape represents the reorganization operation, conv represents the convolution operation, and Coarse-CAM represents the Coarse class activation map;
s3.4, when the input token passes through the Transformer coding layer, attention is calculated on the input token by an Attention module to generate Attention, with the calculation formula:

Attention=softmax(QK^T/√d_k)

wherein Q and K respectively represent the Query matrix and Key matrix obtained by linear projection of the input token as it passes through the Transformer coding layer, d_k represents a scaling factor, and T represents the matrix transposition operation;
s3.5, each Transformer coding layer generates one attention map; after L Transformer coding layers, L attention maps are obtained and collectively denoted A; global average pooling is then applied to A, followed by a fully connected layer for information interaction, to generate the weights W:
W=FC(GAP(A))
GAP represents global average pooling operation, and FC represents a fully connected layer;
s3.6, multiplying the obtained weight W by A and fusing to obtain the final attention W';
s3.7, further dividing the final attention W' into class-to-block attention A_c2p and block-to-block attention A_p2p, and multiplying each of A_c2p and A_p2p by the modulation function G;
s3.8, optimizing the rough class activation diagram sequentially by using the modulated class-to-block attention and the modulated block-to-block attention to obtain a final class activation diagram;
s4, training the weak supervision semantic segmentation model based on attention fusion for multiple times, and storing the best parameters corresponding to the best round of results of training;
s5, loading the stored best parameters into the weak supervision semantic segmentation model based on attention fusion, and then inputting the test set data into the model to generate a complete class activation diagram.
The technical scheme of the invention is as follows:
further, in step S1, using the PASCAL VOC 2012 dataset and the MS COCO 2014 dataset as datasets, the PASCAL VOC 2012 dataset has 21 categories including 20 object categories and one background category; the MS COCO 2014 dataset has 81 categories, including 80 object classes and one background class.
In the aforementioned weak supervision semantic segmentation method based on attention fusion, the PASCAL VOC 2012 dataset comprises a training set of 1464 images, a verification set of 1449 images, and a test set of 1456 images, wherein the training set is augmented with additional data to 10582 images; the MS COCO 2014 dataset comprises a training set of 82081 images and a validation set of 40137 images.
The aforementioned weak supervision semantic segmentation method based on attention fusion, step S2 comprises the following sub-steps:
s2.1, applying random horizontal flipping and color jittering to the image;
s2.2, normalizing the image and resizing it to 256×256;
s2.3, finally, randomly cropping the image and resizing it to 224×224.
In the foregoing weak supervision semantic segmentation method based on attention fusion, in step S2.1, the color jittering of the image is performed as follows: the brightness, contrast and saturation values of the image are all set to 0.3.
In the foregoing weak supervision semantic segmentation method based on attention fusion, in step S3.7, the modulation function G is given by:

G(x)=(1/√(2πσ))e^(-(x-μ)²/(2σ))

where x represents the input, μ represents the mean of the input, σ represents the variance of the input, e represents the exponential function, and π represents the circumference ratio.
In the foregoing weak supervision semantic segmentation method based on attention fusion, in step S3.8, the method for optimizing the rough activation map specifically includes: firstly multiplying a Coarse class activation diagram Coarse-CAM with modulated class-to-block attention element by element to obtain a preliminarily optimized class activation diagram; and then the class activation map is further optimized by matrix multiplication by using the modulated block-to-block attention, and a Final class activation map Final-CAM is obtained.
In the foregoing weak supervision semantic segmentation method based on attention fusion, in step S4, the model-related hyper-parameters are set as follows: the number of training epochs is set to 60, the training batch size to 64, the optimizer used during training is AdamW, the loss function is multi-label cross-entropy loss, and the initial learning rate is set to 4e-4.
In the foregoing weak supervision semantic segmentation method based on attention fusion, in step S5, the stored best parameters are loaded into the weak supervision semantic segmentation model based on attention fusion, the pictures in the verification set and the test set are input into the model, the mean intersection-over-union (MIoU) is then calculated, and the semantic segmentation performance of the model on the PASCAL VOC 2012 and MS COCO 2014 datasets is measured by the obtained MIoU value.
The beneficial effects of the invention are as follows:
(1) The invention mainly solves the problem of incomplete activation of the class activation map in weak supervision semantic segmentation, using a Vision Transformer as the basic network structure and providing a simple and effective weak supervision semantic segmentation framework. In this framework, an adaptive attention fusion module is designed to assign different weights to the attention of different layers, so that the fused attention effectively suppresses background noise while preserving target details;
(2) In the invention, to address the problem that regions of secondary importance in the attention cannot activate the target region well, a modulation function is designed to increase the attention values of those regions and effectively highlight the target region;
(3) In the invention, the coarse class activation map is refined using the modulated attention; target regions in the resulting class activation map are activated more completely and accurately, effectively solving the problem of incomplete activation of the class activation map.
Drawings
FIG. 1 is a schematic diagram of a weak supervision semantic segmentation model based on attention fusion in an embodiment of the invention;
FIG. 2 is an exemplary graph of segmentation results on a PASCAL VOC 2012 validation set in accordance with an embodiment of the present invention;
fig. 3 is an exemplary diagram of segmentation results on the MS COCO 2014 verification set according to an embodiment of the present invention.
Detailed Description
The weak supervision semantic segmentation method based on attention fusion is used for weak supervision semantic segmentation tasks under image-level annotation. The overall structure of the framework is shown in fig. 1 and mainly comprises three parts: 1) extracting features with a Vision Transformer and generating a coarse class activation map; 2) an adaptive attention fusion module that adaptively assigns weights to the attention of different layers, so that the fused attention effectively suppresses background noise while preserving target details, together with a modulation function designed to increase the attention values of regions of secondary importance, which otherwise cannot activate the target region well; 3) refining the coarse class activation map with the modulated attention, so that the resulting final class activation map covers the target region more accurately and completely.
A weak supervision semantic segmentation method based on attention fusion is shown in FIG. 1, and comprises the following steps of S1, preparing a data set, wherein the data set comprises a training set, a verification set and a test set;
using the PASCAL VOC 2012 dataset and the MS COCO 2014 dataset as datasets, the PASCAL VOC 2012 dataset has 21 categories including 20 object classes and one background class; the MS COCO 2014 dataset has 81 categories, including 80 object classes and one background class.
The PASCAL VOC 2012 dataset includes a training set of 1464 images, a validation set of 1449 images, and a test set of 1456 images, wherein the training set is augmented with 10582 images of additional data; the MS COCO 2014 dataset included a training set consisting of 82081 images and a validation set consisting of 40137 images.
S2, preprocessing the data of the image in the data set, and specifically comprising the following sub-steps:
s2.1, carrying out random horizontal overturn and color dithering on an image, wherein the method for carrying out the color dithering on the image specifically comprises the following steps: setting the brightness, contrast and saturation value of the image to 0.3;
s2.2, performing normalization processing on the image, and adjusting the image size to 256×256;
s2.3, finally, randomly clipping the image, and adjusting the size of the image to 224 multiplied by 224.
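As a rough illustration (not the patented implementation), the S2 preprocessing pipeline can be sketched in NumPy; the normalization statistics and function names are illustrative assumptions, and color jittering is omitted for brevity:

```python
import numpy as np

def preprocess(img, rng=None):
    """Sketch of steps s2.1-s2.3 on an HxWx3 uint8 array (256x256 assumed).

    s2.1: random horizontal flip (color jittering omitted here);
    s2.2: normalize (the ImageNet mean/std below are assumed values);
    s2.3: random 224x224 crop.
    """
    rng = rng or np.random.default_rng(0)
    if rng.random() < 0.5:                      # random horizontal flip
        img = img[:, ::-1, :]
    x = img.astype(np.float32) / 255.0          # scale to [0, 1] before normalizing
    mean = np.array([0.485, 0.456, 0.406])      # common ImageNet statistics (assumed)
    std = np.array([0.229, 0.224, 0.225])
    x = (x - mean) / std
    top = rng.integers(0, 256 - 224 + 1)        # random 224x224 crop from 256x256
    left = rng.integers(0, 256 - 224 + 1)
    return x[top:top + 224, left:left + 224, :]

out = preprocess(np.zeros((256, 256, 3), dtype=np.uint8))
```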
S3, constructing a weak supervision semantic segmentation model based on attention fusion, and adopting a Data-efficient image Transformer (DeiT) pre-trained on the ImageNet image recognition dataset as the backbone of the model; step S3 comprises the following sub-steps:
s3.1, dividing the image subjected to data preprocessing in the step S2 into N non-overlapping blocks, constructing N block tokens through linear mapping, and splicing C class tokens and N block tokens to obtain an input token of the model;
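The tokenization of step s3.1 can be sketched as follows; the projection matrix and class tokens are random stand-ins for learned parameters, and the patch size, embedding dimension, and class count are illustrative assumptions:

```python
import numpy as np

def build_input_tokens(img, patch=16, dim=192, n_classes=20, rng=None):
    """Sketch of s3.1: split a 224x224x3 image into N non-overlapping patches,
    linearly map each patch to a block token, and prepend C class tokens."""
    rng = rng or np.random.default_rng(0)
    H, W, _ = img.shape
    n = (H // patch) * (W // patch)             # N = 14*14 = 196 for 224/16
    patches = img.reshape(H // patch, patch, W // patch, patch, 3)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(n, patch * patch * 3)
    W_proj = rng.standard_normal((patch * patch * 3, dim)) * 0.02  # linear mapping (learned in practice)
    patch_tokens = patches @ W_proj
    class_tokens = rng.standard_normal((n_classes, dim)) * 0.02    # C learnable class tokens
    return np.concatenate([class_tokens, patch_tokens], axis=0)    # (C + N, dim) input token

tokens = build_input_tokens(np.zeros((224, 224, 3), dtype=np.float32))
```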
s3.2, inputting the input token into a Transformer coding layer of the weak supervision semantic segmentation model based on attention fusion to obtain an output token;
s3.3, extracting the last N block tokens from the output tokens to form the output block tokens, carrying out recombination operation and convolution operation on the output block tokens to obtain a rough class activation diagram,
Coarse-CAM=Conv(Reshape(Tp_out))
where Tp_out represents the output block token, reshape represents the reorganization operation, conv represents the convolution operation, and Coarse-CAM represents the Coarse class activation map;
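The Coarse-CAM = Conv(Reshape(Tp_out)) step can be sketched as below; the 1×1 convolution is modelled as a per-position linear projection with random stand-in weights, and the dimensions are assumed for illustration:

```python
import numpy as np

def coarse_cam(tp_out, n_classes=20):
    """Sketch of s3.3: reshape the (N, dim) output block tokens into a square
    feature map, then project to C class channels (a 1x1 convolution)."""
    n, dim = tp_out.shape
    side = int(n ** 0.5)                           # Reshape: N tokens -> side x side grid
    fmap = tp_out.reshape(side, side, dim)
    w = np.random.default_rng(0).standard_normal((dim, n_classes)) * 0.02
    return fmap @ w                                 # Conv (1x1): (side, side, C) coarse class activation map

cam = coarse_cam(np.zeros((196, 192)))
```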
s3.4, when the input token passes through the Transformer coding layer, attention is calculated on the input token by an Attention module to generate Attention, with the calculation formula:

Attention=softmax(QK^T/√d_k)

wherein Q and K respectively represent the Query matrix and Key matrix obtained by linear projection of the input token as it passes through the Transformer coding layer, d_k represents a scaling factor, and T represents the matrix transposition operation;
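The scaled dot-product attention of step s3.4 can be sketched as follows; the projection weights are random stand-ins for learned parameters:

```python
import numpy as np

def attention(tokens, dim_k=64, rng=None):
    """Sketch of s3.4: Attention = softmax(Q K^T / sqrt(d_k)), where Q and K
    are linear projections of the input tokens."""
    rng = rng or np.random.default_rng(0)
    d = tokens.shape[1]
    Wq = rng.standard_normal((d, dim_k)) * 0.02
    Wk = rng.standard_normal((d, dim_k)) * 0.02
    Q, K = tokens @ Wq, tokens @ Wk
    scores = Q @ K.T / np.sqrt(dim_k)              # scaled dot-product scores
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(scores)
    return e / e.sum(axis=-1, keepdims=True)       # softmax: each row sums to 1

A = attention(np.ones((8, 32)))
```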
s3.5, each Transformer coding layer generates one attention map; after L Transformer coding layers, L attention maps are obtained and collectively denoted A; global average pooling is then applied to A, followed by a fully connected layer for information interaction, to generate the weights W:
W=FC(GAP(A))
GAP represents global average pooling operation, and FC represents a fully connected layer;
s3.6, multiplying the obtained weight W by A and fusing to obtain the final attention W';
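Steps s3.5 and s3.6 together can be sketched as below; the FC weights are random stand-ins, and reducing each layer's attention map to a single scalar via GAP is an assumed reading of the module:

```python
import numpy as np

def adaptive_fuse(A, rng=None):
    """Sketch of s3.5-s3.6: A has shape (L, T, T), one attention map per
    encoder layer. GAP reduces each layer to a scalar, an FC layer maps the
    L scalars to L weights W, and the fused attention W' is the weighted sum."""
    rng = rng or np.random.default_rng(0)
    L = A.shape[0]
    gap = A.mean(axis=(1, 2))                      # GAP(A): one scalar per layer, shape (L,)
    fc = rng.standard_normal((L, L)) * 0.1
    W = gap @ fc                                   # W = FC(GAP(A)), shape (L,)
    return (W[:, None, None] * A).sum(axis=0)      # fused attention W', shape (T, T)

fused = adaptive_fuse(np.ones((12, 10, 10)))
```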
s3.7, further dividing the final attention W' into class-to-block attention A_c2p and block-to-block attention A_p2p, and multiplying each of A_c2p and A_p2p by the modulation function G, which is given by:

G(x)=(1/√(2πσ))e^(-(x-μ)²/(2σ))

wherein x represents the input, μ represents the mean of the input, σ represents the variance of the input, e represents the exponential function, and π represents the circumference ratio;
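Reading G as a Gaussian density over the attention values (an assumption consistent with the symbols above, with σ taken as the variance), the modulation function can be sketched as:

```python
import numpy as np

def modulation(x):
    """Sketch of the modulation function G in s3.7: a Gaussian over the
    attention values, peaking near the mean, so values of secondary
    importance receive the largest boost."""
    mu = x.mean()                                  # mean of the input
    var = x.var()                                  # variance of the input (sigma)
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

g = modulation(np.array([0.1, 0.5, 0.9]))
```

Note how the middle value (nearest the mean, i.e. of "secondary importance") receives the largest modulation, matching the stated goal of boosting such regions.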
s3.8, optimizing the rough class activation diagram sequentially by using the modulated class-to-block attention and the modulated block-to-block attention, and multiplying the rough class activation diagram Coarse-CAM with the modulated class-to-block attention element by element to obtain a primarily optimized class activation diagram; and then the class activation map is further optimized by matrix multiplication by using the modulated block-to-block attention, and a Final class activation map Final-CAM is obtained.
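The two-stage refinement of s3.8 can be sketched as follows; the flattened (classes × patches) layout and the identity attention used in the example are illustrative assumptions:

```python
import numpy as np

def refine_cam(coarse, a_c2p, a_p2p):
    """Sketch of s3.8: coarse is (C, N) with one row per class over N patches,
    a_c2p is (C, N) modulated class-to-block attention, a_p2p is (N, N)
    modulated block-to-block attention."""
    cam = coarse * a_c2p                           # step 1: element-wise product with class-to-block attention
    return cam @ a_p2p                             # step 2: matrix product propagates activations between blocks

final = refine_cam(np.ones((20, 196)), np.ones((20, 196)), np.eye(196))
```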
S4, training the weak supervision semantic segmentation model based on attention fusion for multiple rounds, and storing the best parameters corresponding to the best round of training results by observing the verification results;
The model-related hyper-parameters are set as follows: the number of training epochs is set to 60, the training batch size to 64, the optimizer used during training is AdamW, the loss function is multi-label cross-entropy loss, and the initial learning rate is set to 4e-4.
S5, loading the stored best parameters into a weak supervision semantic segmentation model based on attention fusion, and then inputting test set data into the model to generate a complete class activation diagram;
loading the stored best parameters into the weak supervision semantic segmentation model based on attention fusion, inputting the pictures in the verification set and the test set into the model, then calculating the mean intersection-over-union (Mean Intersection over Union, MIoU), and measuring the semantic segmentation performance of the model on the PASCAL VOC 2012 and MS COCO 2014 datasets by the obtained MIoU value; the segmentation results are shown in figs. 2 and 3.
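The MIoU metric used for evaluation can be sketched as below; averaging only over classes that appear in either map is a common convention assumed here:

```python
import numpy as np

def mean_iou(pred, gt, n_classes):
    """Sketch of mean Intersection-over-Union: per-class
    IoU = |pred ∩ gt| / |pred ∪ gt|, averaged over classes present
    in either the prediction or the ground truth."""
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                              # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

score = mean_iou(np.array([0, 0, 1, 1]), np.array([0, 1, 1, 1]), 2)
```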
In summary, the invention aims to solve the problem that the target region cannot be fully activated in weak supervision semantic segmentation. The embodiment provides an Adaptive Attention Fusion (AAF) module to measure the importance of the attention of each layer to the class activation map and thereby assign weights to the attention of different layers. To ensure that the adaptive attention fusion module estimates accurate weights, the embodiment adopts an end-to-end training strategy for the module. During training, the weights estimated by the adaptive attention fusion module are applied to the attention of the different layers. The attention fused in this way better suppresses background noise while preserving target detail.
In addition, experiments show that although regions of secondary importance in the attention can activate the target region during refinement, their attention values are small, so some regions that should be activated are not activated well. A modulation function is therefore designed to increase the attention values of these secondary regions so that more of the target region can be activated. Finally, the coarse class activation map is refined with the modulated attention to obtain a class activation map that covers more of the target region while effectively suppressing background noise, ultimately yielding high-quality pseudo labels for training a semantic segmentation network.
In addition to the embodiments described above, other embodiments of the invention are possible. All technical schemes formed by equivalent substitution or equivalent transformation fall within the protection scope of the invention.
Claims (9)
1. A weak supervision semantic segmentation method based on attention fusion is characterized by comprising the following steps of: comprises the following steps
S1, preparing a data set, wherein the data set comprises a training set, a verification set and a test set;
s2, carrying out data preprocessing on the images in the data set;
s3, constructing a weak supervision semantic segmentation model based on attention fusion, and adopting a Data-efficient image Transformer (DeiT) pre-trained on the ImageNet image recognition dataset as the backbone of the model; step S3 comprises the following sub-steps:
s3.1, dividing the image subjected to data preprocessing in the step S2 into N non-overlapping blocks, constructing N block tokens through linear mapping, and splicing C class tokens and N block tokens to obtain an input token of the model;
s3.2, inputting the input token into a Transformer coding layer of the weak supervision semantic segmentation model based on attention fusion to obtain an output token;
s3.3, extracting the last N block tokens from the output tokens to form the output block tokens, carrying out recombination operation and convolution operation on the output block tokens to obtain a rough class activation diagram,
Coarse-CAM=Conv(Reshape(Tp_out))
where Tp_out represents the output block token, reshape represents the reorganization operation, conv represents the convolution operation, and Coarse-CAM represents the Coarse class activation map;
s3.4, when the input token passes through the Transformer coding layer, attention is calculated on the input token by an Attention module to generate Attention, with the calculation formula:

Attention=softmax(QK^T/√d_k)

wherein Q and K respectively represent the Query matrix and Key matrix obtained by linear projection of the input token as it passes through the Transformer coding layer, d_k represents a scaling factor, and T represents the matrix transposition operation;
s3.5, each Transformer coding layer generates one attention map; after L Transformer coding layers, L attention maps are obtained and collectively denoted A; global average pooling is then applied to A, followed by a fully connected layer for information interaction, to generate the weights W:
W=FC(GAP(A))
GAP represents global average pooling operation, and FC represents a fully connected layer;
s3.6, multiplying the obtained weight W by A and fusing to obtain the final attention W';
s3.7, further dividing the final attention W' into class-to-block attention A_c2p and block-to-block attention A_p2p, and multiplying each of A_c2p and A_p2p by the modulation function G;
s3.8, optimizing the rough class activation diagram sequentially by using the modulated class-to-block attention and the modulated block-to-block attention to obtain a final class activation diagram;
s4, training the weak supervision semantic segmentation model based on attention fusion for multiple times, and storing the best parameters corresponding to the best round of results of training;
s5, loading the stored best parameters into a weak supervision semantic segmentation model based on attention fusion, and then inputting test set data into the model to generate a complete class activation diagram.
2. The weak supervision semantic segmentation method based on attention fusion according to claim 1, wherein: in the step S1, using the PASCAL VOC 2012 dataset and the MS COCO 2014 dataset as datasets, the PASCAL VOC 2012 dataset has 21 categories including 20 object categories and one background category; the MS COCO 2014 dataset has 81 categories, including 80 object classes and one background class.
3. The weak supervision semantic segmentation method based on attention fusion according to claim 2, wherein: the PASCAL VOC 2012 dataset comprises a training set of 1464 images, a validation set of 1449 images, and a test set of 1456 images, wherein the training set is 10582 images augmented with additional data; the MS COCO 2014 dataset included a training set consisting of 82081 images and a validation set consisting of 40137 images.
4. The weak supervision semantic segmentation method based on attention fusion according to claim 1, wherein: the step S2 comprises the following sub-steps:
s2.1, applying random horizontal flipping and color jittering to the image;
s2.2, normalizing the image and resizing it to 256×256;
s2.3, finally, randomly cropping the image and resizing it to 224×224.
5. The weak supervision semantic segmentation method based on attention fusion according to claim 4, wherein: in the step S2.1, the color jittering of the image is performed as follows: the brightness, contrast and saturation values of the image are all set to 0.3.
6. The weak supervision semantic segmentation method based on attention fusion according to claim 1, wherein: in the step S3.7, the modulation function G is represented by the following formula:
where x represents the input, μ represents the average of the input, σ represents the variance of the input, e represents the exponential function, and pi represents the circumference ratio.
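The formula for G itself is given as a figure in the patent and is not reproduced here; however, the listed symbols (input x, mean μ, variance σ, e, and π) are exactly those of a Gaussian density evaluated on the input's own statistics, which is the reading this hedged sketch assumes:

```python
import numpy as np

def modulate(x: np.ndarray) -> np.ndarray:
    """Assumed form of the modulation function G from claim 6: a Gaussian
    density over the input's own mean and variance. The exact formula in the
    patent figure may differ; this is one formula consistent with the symbol
    list (x, mu, sigma, e, pi) given in the claim."""
    mu = x.mean()
    var = x.var()
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

a = modulate(np.array([0.0, 1.0, 2.0, 3.0]))
print(a)
```

Under this reading, values close to the input mean are weighted most strongly, and the output is symmetric around the mean.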
7. The weak supervision semantic segmentation method based on attention fusion according to claim 1, wherein: in the step S3.8, the optimization of the coarse activation map specifically comprises: first multiplying the Coarse class activation map Coarse-CAM element-wise with the modulated class-to-block attention to obtain a preliminarily optimized class activation map; then further optimizing this class activation map by matrix multiplication with the modulated block-to-block attention, obtaining the Final class activation map Final-CAM.
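The two refinement steps of S3.8 can be sketched with dummy tensors. The shapes are assumptions for illustration (C classes, N transformer blocks/patch tokens): the class-to-block attention has the same shape as the CAM for the element-wise step, and the block-to-block attention is an N×N affinity applied by matrix multiplication.

```python
import numpy as np

rng = np.random.default_rng(0)
C, N = 20, 196  # assumed: 20 object classes, 14x14 = 196 patch tokens

coarse_cam = rng.random((C, N))   # Coarse-CAM: one activation row per class
cls2blk = rng.random((C, N))      # modulated class-to-block attention
blk2blk = rng.random((N, N))
blk2blk /= blk2blk.sum(axis=1, keepdims=True)  # row-normalized affinity

prelim = coarse_cam * cls2blk     # step 1: element-wise refinement
final_cam = prelim @ blk2blk      # step 2: propagate activations between blocks

print(final_cam.shape)
```

The matrix multiplication spreads each class's activation to blocks that attend strongly to each other, which is what lets the final CAM cover object regions the coarse CAM missed.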
8. The weak supervision semantic segmentation method based on attention fusion according to claim 1, wherein: in the step S4, model-related hyperparameters are set as follows: the number of training epochs is set to 60, the training batch size to 64, the optimizer to AdamW, the loss function to multi-label cross-entropy loss, and the initial learning rate to 4e-4.
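Of these settings, the multi-label cross-entropy loss is the one worth making concrete: with image-level labels, each class is treated as an independent binary prediction. A minimal numpy sketch (in practice this would be PyTorch's BCEWithLogitsLoss paired with torch.optim.AdamW; the dummy logits and targets below are illustrative only):

```python
import numpy as np

# Hyperparameters from claim 8.
EPOCHS, BATCH_SIZE, LR = 60, 64, 4e-4

def multilabel_bce(logits: np.ndarray, targets: np.ndarray) -> float:
    """Multi-label cross-entropy: binary cross-entropy per class, averaged."""
    p = 1.0 / (1.0 + np.exp(-logits))  # sigmoid per class (not softmax)
    eps = 1e-12
    per_class = -(targets * np.log(p + eps) + (1 - targets) * np.log(1 - p + eps))
    return float(per_class.mean())

logits = np.array([[2.0, -1.0], [0.5, 0.5]])   # dummy batch of 2, 2 classes
targets = np.array([[1.0, 0.0], [1.0, 1.0]])   # image-level multi-hot labels
loss = multilabel_bce(logits, targets)
print(round(loss, 4))
```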
9. The weak supervision semantic segmentation method based on attention fusion according to claim 1, wherein: in the step S5, the stored best parameters are loaded into the weak supervision semantic segmentation model based on attention fusion, the images in the verification set and the test set are input into the model, the mean intersection-over-union MIoU is then calculated, and the semantic segmentation performance of the weak supervision semantic segmentation model based on attention fusion on the PASCAL VOC 2012 and MS COCO 2014 datasets is measured by the obtained MIoU value.
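The MIoU metric of step S5 can be sketched as follows. This is a generic per-class IoU average on dense label maps, not the patent's evaluation code; the tiny 2×2 prediction/ground-truth pair is a made-up example.

```python
import numpy as np

def miou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Mean intersection-over-union, averaged over classes present in pred or GT."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.array([[0, 0], [1, 1]])  # predicted label map
gt = np.array([[0, 1], [1, 1]])    # ground-truth label map
score = miou(pred, gt, num_classes=2)
print(round(score, 4))  # class 0: 1/2, class 1: 2/3, mean 7/12
```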
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310981553.9A CN116912501A (en) | 2023-08-04 | 2023-08-04 | Weak supervision semantic segmentation method based on attention fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116912501A true CN116912501A (en) | 2023-10-20 |
Family
ID=88362941
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310981553.9A Pending CN116912501A (en) | 2023-08-04 | 2023-08-04 | Weak supervision semantic segmentation method based on attention fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116912501A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118154884A (en) * | 2024-05-13 | 2024-06-07 | 山东锋士信息技术有限公司 | Weak supervision image semantic segmentation method based on sample mixing and contrast learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||