CN116912501A - Weak supervision semantic segmentation method based on attention fusion - Google Patents

Weak supervision semantic segmentation method based on attention fusion

Info

Publication number
CN116912501A
CN116912501A (application CN202310981553.9A)
Authority
CN
China
Prior art keywords
attention
semantic segmentation
block
weak supervision
class
Prior art date
Legal status
Pending
Application number
CN202310981553.9A
Other languages
Chinese (zh)
Inventor
苏京峰
李军侠
Current Assignee
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology
Priority to CN202310981553.9A
Publication of CN116912501A
Status: Pending

Classifications

    • G06V 10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/0895: Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • G06V 10/82: Image or video recognition or understanding using pattern recognition or machine learning, using neural networks


Abstract

The invention discloses a weak supervision semantic segmentation method based on attention fusion, relates to the technical field of computer vision, and provides a simple and effective weak supervision semantic segmentation framework built on a Vision Transformer backbone. In this framework, an adaptive attention fusion module is first designed to assign different weights to the attention of different layers, so that the fused attention suppresses background noise well while preserving target details. In addition, to address the problem that the next-most-important regions in the attention cannot activate the target region well, a modulation function is designed to increase the attention values of those regions, effectively highlighting the target region. The coarse class activation map is then optimized with the modulated attention, so that target regions in the resulting class activation map are activated more completely and accurately, better alleviating the incomplete activation of class activation maps.

Description

Weak supervision semantic segmentation method based on attention fusion
Technical Field
The invention relates to the technical field of computer vision, in particular to a weak supervision semantic segmentation method based on attention fusion.
Background
Semantic segmentation is one of the fundamental and challenging tasks in the field of computer vision. It uses a computer's feature representations to simulate the human process of recognizing an image, assigning a semantic class label to each pixel of a given image. In recent years, thanks to the vigorous development of deep learning methods, semantic segmentation has made remarkable progress. As a dense prediction task, however, training a semantic segmentation model depends on large-scale pixel-level annotation data, and pixel-level annotation of images is hard to obtain, time-consuming, and labor-intensive.
Weak supervision semantic segmentation can relieve the dependence of existing semantic segmentation models on massive pixel-level annotation data, because the segmentation model is trained using only weak annotations; weak annotation has therefore become an academic research hotspot. Common weak annotations include bounding-box annotations, scribble annotations, point annotations, and image-level annotations. Among these, image-level annotation is easier to obtain than the other forms, and weak supervision semantic segmentation based on image-level annotation is also the most challenging, since only the object classes present in the image are given, without any indication of where those classes are located in the image.
Lacking specific location information for the target classes in an image, most image-level weak supervision semantic segmentation methods rely on the coarse location information produced by class activation maps. A class activation map is generated by a deep classification network and has the same number of channels as the total number of classes. The typical pipeline is: 1) obtain seed regions from the class activation map; 2) expand the seed regions to obtain pseudo labels; 3) use the pseudo labels to train a conventional fully supervised neural network and obtain the final segmentation results. Since class activation maps tend to cover only the most discriminative regions of an object and to mistake background for foreground, much effort has been devoted to generating higher-quality class activation maps.
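For illustration, the classic class activation map computation can be sketched as follows, assuming a CNN classifier whose final fully connected layer follows global average pooling; the function name and shapes are illustrative, not taken from the patent.

```python
import torch
import torch.nn.functional as F

def class_activation_maps(features: torch.Tensor, fc_weight: torch.Tensor) -> torch.Tensor:
    """features:  (B, D, H, W) backbone feature maps.
    fc_weight: (C, D) weight of the final fully connected classifier.
    Returns (B, C, H, W): one activation map per class."""
    cams = torch.einsum("bdhw,cd->bchw", features, fc_weight)
    cams = F.relu(cams)  # keep only positive class evidence
    # normalize each map to [0, 1] so seed-region thresholds are comparable
    peak = cams.flatten(2).max(dim=2).values[..., None, None]
    return cams / (peak + 1e-5)
```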
With the rapid development of the Vision Transformer (ViT), researchers began introducing it into the weak supervision semantic segmentation task. Some methods extract image features with a Vision Transformer structure and generate a coarse class activation map, which is then optimized using attention to obtain a higher-quality class activation map. Typically these methods directly add the attention of different layers together to fuse them. However, in the Vision Transformer structure, shallow-layer attention focuses more on the local detail features of the image, so a class activation map optimized with shallow attention often contains more detail, whereas deep-layer attention focuses more on the global information of the image. Directly fusing the attention of different layers is therefore not the optimal choice and may introduce misleading information in the optimization stage.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a weak supervision semantic segmentation method based on attention fusion, which comprises the following steps:
S1, preparing a data set, wherein the data set comprises a training set, a verification set and a test set;
s2, carrying out data preprocessing on the images in the data set;
s3, constructing a weak supervision semantic segmentation model based on attention fusion, and adopting a data-efficient image Transformer DeiT pre-trained on the image recognition dataset ImageNet as the backbone of the model; step S3 comprises the following sub-steps:
s3.1, dividing the image preprocessed in step S2 into N non-overlapping blocks, constructing N block tokens through linear mapping, and concatenating C class tokens with the N block tokens to obtain the input token of the model;
s3.2, inputting the input token into a Transformer coding layer of the weak supervision semantic segmentation model based on attention fusion to obtain an output token;
s3.3, extracting the last N block tokens from the output tokens to form the output block tokens, and applying a reshape operation and a convolution operation to the output block tokens to obtain a rough class activation map:
Coarse-CAM=Conv(Reshape(Tp_out))
where Tp_out represents the output block token, Reshape represents the reshape operation, Conv represents the convolution operation, and Coarse-CAM represents the rough class activation map;
s3.4, when the input token passes through a Transformer coding layer, the Attention module computes attention on the input token to generate an attention map, with the calculation formula:
Attention(Q,K)=softmax(QK^T/√d_k)
where Q and K respectively represent the Query matrix and the Key matrix obtained by linear projection of the input token in the Transformer coding layer, d_k represents a scaling factor, and T represents the matrix transposition operation;
s3.5, each Transformer coding layer generates one attention map; after L Transformer coding layers, L attention maps are obtained, collectively denoted A; global average pooling is then applied to A, followed by a fully connected layer for information interaction, generating the weights W:
W=FC(GAP(A))
where GAP represents the global average pooling operation and FC represents a fully connected layer;
s3.6, multiplying the obtained weight W by A and fusing to obtain the final attention W';
s3.7, further dividing the final attention W' into class-to-block attention A_c2p and block-to-block attention A_p2p, and multiplying A_c2p and A_p2p respectively by the modulation function G;
s3.8, optimizing the rough class activation map sequentially using the modulated class-to-block attention and the modulated block-to-block attention to obtain a final class activation map;
s4, training the weak supervision semantic segmentation model based on attention fusion for multiple rounds, and storing the best parameters corresponding to the best training round;
s5, loading the stored best parameters into the weak supervision semantic segmentation model based on attention fusion, and then inputting the test set data into the model to generate a complete class activation map.
The technical scheme of the invention is as follows:
further, in step S1, the PASCAL VOC 2012 dataset and the MS COCO 2014 dataset are used as the datasets; the PASCAL VOC 2012 dataset has 21 categories, including 20 object classes and one background class; the MS COCO 2014 dataset has 81 categories, including 80 object classes and one background class.
In the aforementioned weak supervision semantic segmentation method based on attention fusion, the PASCAL VOC 2012 dataset comprises a training set of 1464 images, a validation set of 1449 images, and a test set of 1456 images, wherein the training set is augmented with additional data to 10582 images; the MS COCO 2014 dataset comprises a training set of 82081 images and a validation set of 40137 images.
In the aforementioned weak supervision semantic segmentation method based on attention fusion, step S2 comprises the following sub-steps:
s2.1, applying random horizontal flipping and color jittering to the image;
s2.2, performing normalization processing on the image, and adjusting the image size to 256×256;
s2.3, finally, randomly cropping the image to a size of 224×224.
In the foregoing weak supervision semantic segmentation method based on attention fusion, in step S2.1, the color jittering of the image specifically comprises: setting the brightness, contrast and saturation values of the image all to 0.3.
In the foregoing weak supervision semantic segmentation method based on attention fusion, in step S3.7, the modulation function G is given by the Gaussian form
G(x)=e^(-(x-μ)²/(2σ²))/√(2πσ²)
where x represents the input, μ represents the mean of the input, σ² represents the variance of the input, e represents the exponential function, and π represents the circle constant.
In the foregoing weak supervision semantic segmentation method based on attention fusion, in step S3.8, optimizing the rough class activation map specifically comprises: first multiplying the rough class activation map Coarse-CAM element-wise with the modulated class-to-block attention to obtain a preliminarily optimized class activation map; the class activation map is then further optimized by matrix multiplication with the modulated block-to-block attention to obtain the final class activation map Final-CAM.
In the foregoing weak supervision semantic segmentation method based on attention fusion, in step S4, model-related hyper-parameters are set: the number of training epochs Epoch is set to 60, the training batch size is set to 64, the optimizer used during training is the AdamW optimizer, the loss function is the multi-label cross entropy loss, and the initial learning rate is set to 4e-4.
In the foregoing weak supervision semantic segmentation method based on attention fusion, in step S5, the stored best parameters are loaded into the weak supervision semantic segmentation model based on attention fusion, the pictures in the verification set and the test set are input into the model, the mean intersection-over-union MIoU is then calculated, and the semantic segmentation performance of the model on the PASCAL VOC 2012 and MS COCO 2014 datasets is measured according to the obtained MIoU value.
The beneficial effects of the invention are as follows:
(1) The invention mainly solves the problem of incomplete activation of class activation maps in weak supervision semantic segmentation, using a Vision Transformer as the basic network structure and providing a simple and effective weak supervision semantic segmentation framework. In this framework, an adaptive attention fusion module is designed to assign different weights to the attention of different layers, so that the fused attention effectively suppresses background noise while preserving target details;
(2) To address the problem that the next-most-important regions in the attention cannot activate the target region well, a modulation function is designed to increase the attention values of those regions and effectively highlight the target region;
(3) The rough class activation map is optimized using the modulated attention, and target regions in the resulting class activation map are activated more completely and accurately, so the problem of incomplete activation of class activation maps is effectively alleviated.
Drawings
FIG. 1 is a schematic diagram of a weak supervision semantic segmentation model based on attention fusion in an embodiment of the invention;
FIG. 2 is an exemplary graph of segmentation results on a PASCAL VOC 2012 validation set in accordance with an embodiment of the present invention;
FIG. 3 is an exemplary diagram of segmentation results on the MS COCO 2014 verification set according to an embodiment of the present invention.
Detailed Description
The weak supervision semantic segmentation method based on attention fusion is intended for weak supervision semantic segmentation tasks under image-level annotation. The overall structure of the framework is shown in FIG. 1 and mainly comprises three parts: 1) a Vision Transformer extracts features and generates a rough class activation map; 2) an adaptive attention fusion module adaptively assigns weights to the attention of different layers, so that the fused attention effectively suppresses background noise while preserving target details, and a modulation function is designed to increase the attention values of the next-most-important regions, addressing the problem that these regions in the attention cannot activate the target region well; 3) the rough class activation map is optimized with the modulated attention, and the resulting final class activation map covers the target area more accurately and completely.
A weak supervision semantic segmentation method based on attention fusion, shown in FIG. 1, comprises the following steps. S1, preparing a data set, wherein the data set comprises a training set, a verification set and a test set;
using the PASCAL VOC 2012 dataset and the MS COCO 2014 dataset as the datasets, the PASCAL VOC 2012 dataset has 21 categories, including 20 object classes and one background class; the MS COCO 2014 dataset has 81 categories, including 80 object classes and one background class.
The PASCAL VOC 2012 dataset includes a training set of 1464 images, a validation set of 1449 images, and a test set of 1456 images, wherein the training set is augmented with additional data to 10582 images; the MS COCO 2014 dataset includes a training set of 82081 images and a validation set of 40137 images.
S2, preprocessing the images in the data set, specifically comprising the following sub-steps (a code sketch of the pipeline follows the sub-steps):
s2.1, applying random horizontal flipping and color jittering to the image, wherein the color jittering specifically comprises setting the brightness, contrast and saturation values of the image all to 0.3;
s2.2, performing normalization processing on the image, and adjusting the image size to 256×256;
s2.3, finally, randomly cropping the image to a size of 224×224.
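A minimal sketch of this preprocessing pipeline in torchvision follows; the ImageNet normalization statistics and the placement of normalization after tensor conversion are assumptions, since the text does not specify them.

```python
from torchvision import transforms

# S2.1-S2.3: random flip + color jitter, resize to 256x256, random crop to 224x224.
# ImageNet mean/std are assumed because the backbone is ImageNet-pretrained.
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    transforms.Resize((256, 256)),
    transforms.RandomCrop(224),
    transforms.ToTensor(),   # normalization below operates on tensors
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```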
S3, constructing a weak supervision semantic segmentation model based on attention fusion, adopting a data-efficient image Transformer (DeiT) pre-trained on the image recognition dataset ImageNet as the backbone of the model; step S3 comprises the following sub-steps:
s3.1, dividing the image preprocessed in step S2 into N non-overlapping blocks, constructing N block tokens through linear mapping, and concatenating C class tokens with the N block tokens to obtain the input token of the model (see the sketch below);
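Step S3.1 might look like the following sketch; the patch size (16), embedding dimension (192) and C = 20 class tokens are assumed values matching DeiT-tiny and PASCAL VOC, and the conv-as-linear-projection is standard ViT practice rather than a detail stated in the text.

```python
import torch
import torch.nn as nn

class TokenEmbedding(nn.Module):
    """Split the image into non-overlapping blocks, embed each block by a
    linear mapping, and prepend C learnable class tokens (one per class)."""
    def __init__(self, patch_size=16, dim=192, num_classes=20):
        super().__init__()
        # a conv with kernel = stride = patch_size is the usual linear projection
        self.proj = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_tokens = nn.Parameter(torch.zeros(1, num_classes, dim))

    def forward(self, x):                                  # x: (B, 3, H, W)
        patches = self.proj(x).flatten(2).transpose(1, 2)  # (B, N, dim) block tokens
        cls = self.cls_tokens.expand(x.size(0), -1, -1)    # (B, C, dim) class tokens
        return torch.cat([cls, patches], dim=1)            # (B, C + N, dim) input token
```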
s3.2, inputting the input token into a Transformer coding layer of the weak supervision semantic segmentation model based on attention fusion to obtain an output token;
s3.3, extracting the last N block tokens from the output tokens to form the output block tokens, and applying a reshape operation and a convolution operation to the output block tokens to obtain a rough class activation map:
Coarse-CAM=Conv(Reshape(Tp_out))
where Tp_out represents the output block token, Reshape represents the reshape operation, Conv represents the convolution operation, and Coarse-CAM represents the rough class activation map (a sketch follows);
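A sketch of step S3.3 under illustrative assumptions: the 1×1 convolution from token dimension to class count and the 14×14 token grid are not specified in the text.

```python
import torch
import torch.nn as nn

def coarse_cam(tp_out: torch.Tensor, conv: nn.Conv2d, hw: int) -> torch.Tensor:
    """tp_out: (B, N, D) output block tokens with N = hw * hw.
    Reshape the tokens back onto the hw x hw grid, then convolve to C channels."""
    b, n, d = tp_out.shape
    grid = tp_out.transpose(1, 2).reshape(b, d, hw, hw)   # the Reshape operation
    return conv(grid)                                     # the Conv operation -> (B, C, hw, hw)

conv = nn.Conv2d(192, 20, kernel_size=1)                  # 192-d tokens -> 20 class maps
cam = coarse_cam(torch.randn(2, 196, 192), conv, hw=14)   # 14 x 14 = 196 blocks
```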
s3.4, when the input token passes through a Transformer coding layer, the Attention module computes attention on the input token to generate an attention map, with the calculation formula:
Attention(Q,K)=softmax(QK^T/√d_k)
where Q and K respectively represent the Query matrix and the Key matrix obtained by linear projection of the input token in the Transformer coding layer, d_k represents a scaling factor, and T represents the matrix transposition operation;
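In code, the per-layer attention of step S3.4 might look like this sketch (single-head for brevity; multi-head splitting is omitted):

```python
import torch

def attention_map(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """q, k: (B, C + N, d_k) Query/Key projections of the input token.
    Returns the (C + N) x (C + N) attention map of one coding layer."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # QK^T / sqrt(d_k)
    return scores.softmax(dim=-1)
```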
s3.5, each Transformer coding layer generates one attention map; after L Transformer coding layers, L attention maps are obtained, collectively denoted A; global average pooling is then applied to A, followed by a fully connected layer for information interaction, generating the weights W:
W=FC(GAP(A))
where GAP represents the global average pooling operation and FC represents a fully connected layer;
s3.6, multiplying A by the obtained weights W and fusing to obtain the final attention W' (see the sketch below);
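Steps S3.5 and S3.6 can be sketched as below; the softmax over the layer weights and the weighted sum as the fusion operation are assumptions, since the text only specifies GAP, a fully connected layer, and a weighted combination.

```python
import torch
import torch.nn as nn

class AdaptiveAttentionFusion(nn.Module):
    """W = FC(GAP(A)); the fused attention W' is a weighted sum over layers."""
    def __init__(self, num_layers: int):
        super().__init__()
        self.fc = nn.Linear(num_layers, num_layers)

    def forward(self, attns: torch.Tensor) -> torch.Tensor:
        # attns: (B, L, T, T) -- one T x T attention map per coding layer
        pooled = attns.mean(dim=(2, 3))               # GAP over each map -> (B, L)
        w = self.fc(pooled).softmax(dim=1)            # layer weights W
        return (w[:, :, None, None] * attns).sum(1)   # fused attention W': (B, T, T)
```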
s3.7, further dividing the final attention W' into class-to-block attention A_c2p and block-to-block attention A_p2p, and multiplying A_c2p and A_p2p respectively by the modulation function G, which is given by the Gaussian form
G(x)=e^(-(x-μ)²/(2σ²))/√(2πσ²)
where x represents the input, μ represents the mean of the input, σ² represents the variance of the input, e represents the exponential function, and π represents the circle constant;
s3.8, optimizing the rough class activation map sequentially with the modulated class-to-block attention and the modulated block-to-block attention: the rough class activation map Coarse-CAM is first multiplied element-wise with the modulated class-to-block attention to obtain a preliminarily optimized class activation map; the class activation map is then further optimized by matrix multiplication with the modulated block-to-block attention, yielding the final class activation map Final-CAM. A combined sketch of steps S3.7 and S3.8 follows.
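The sketch below assumes the Gaussian reading of G given above and illustrative shapes; computing μ and σ² per attention tensor and the exact broadcasting are assumptions.

```python
import math
import torch

def modulate(attn: torch.Tensor) -> torch.Tensor:
    """Multiply the attention by G(x) = e^(-(x - mu)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2),
    lifting the values of the next-most-important regions."""
    mu, var = attn.mean(), attn.var()
    g = torch.exp(-(attn - mu) ** 2 / (2 * var)) / torch.sqrt(2 * math.pi * var)
    return attn * g

def refine_cam(coarse: torch.Tensor, a_c2p: torch.Tensor, a_p2p: torch.Tensor) -> torch.Tensor:
    """coarse: (B, C, h, w); a_c2p: (B, C, N); a_p2p: (B, N, N) with N = h * w."""
    b, c, h, w = coarse.shape
    cam = coarse * modulate(a_c2p).reshape(b, c, h, w)      # element-wise stage
    cam = modulate(a_p2p) @ cam.flatten(2).transpose(1, 2)  # matrix-product stage
    return cam.transpose(1, 2).reshape(b, c, h, w)          # Final-CAM
```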
S4, training the weak supervision semantic segmentation model based on attention fusion for multiple rounds, and storing the best parameters corresponding to the best round of training results by observing the verification results;
setting the model-related hyper-parameters: the number of training epochs Epoch is set to 60, the training batch size is set to 64, the optimizer used during training is the AdamW optimizer, the loss function is the multi-label cross entropy loss, and the initial learning rate is set to 4e-4.
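These hyper-parameters translate to the following sketch; the model is a placeholder and nn.MultiLabelSoftMarginLoss stands in for the multi-label cross entropy loss named in the text.

```python
import torch
import torch.nn as nn

EPOCHS, BATCH_SIZE, NUM_CLASSES = 60, 64, 20   # Epoch 60, batch size 64

model = nn.Linear(192, NUM_CLASSES)            # placeholder for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=4e-4)  # AdamW, initial lr 4e-4
criterion = nn.MultiLabelSoftMarginLoss()      # multi-label cross entropy loss

best_score, best_state = -1.0, None
for epoch in range(EPOCHS):
    # ... one training pass: loss = criterion(model(features), multi_hot_labels) ...
    val_score = 0.0                            # placeholder validation metric (S4)
    if val_score > best_score:                 # keep the best round's parameters
        best_score, best_state = val_score, model.state_dict()
torch.save(best_state, "best_params.pth")
```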
S5, loading the stored best parameters into the weak supervision semantic segmentation model based on attention fusion, and then inputting the test set data into the model to generate a complete class activation map;
loading the stored best parameters into the weak supervision semantic segmentation model based on attention fusion, inputting the pictures in the verification set and the test set into the model, then calculating the mean intersection-over-union (Mean Intersection over Union, MIoU), and measuring the semantic segmentation performance of the model on the PASCAL VOC 2012 and MS COCO 2014 datasets according to the obtained MIoU value; segmentation results are shown in FIG. 2 and FIG. 3.
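For reference, a minimal per-image MIoU sketch over predicted and ground-truth label maps (benchmarks usually accumulate a confusion matrix over the whole set, which is omitted here):

```python
import torch

def mean_iou(pred: torch.Tensor, target: torch.Tensor, num_classes: int) -> float:
    """pred, target: integer label maps of identical shape."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = (p | t).sum().item()
        if union:  # skip classes absent from both maps
            ious.append((p & t).sum().item() / union)
    return sum(ious) / max(len(ious), 1)
```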
In summary, the invention aims to solve the problem that the target region cannot be fully activated in weak supervision semantic segmentation. This embodiment provides an Adaptive Attention Fusion (AAF) module to measure the importance of the attention of different layers to the class activation map and thereby assign weights to the attention of different layers. To ensure that the adaptive attention fusion module estimates accurate weights, this embodiment trains it with an end-to-end training strategy, applying the weights it estimates to the attention of the different layers during training. The attention fused in this way suppresses background noise well while preserving target details.
In addition, experiments show that although the next-most-important regions in the attention can activate the target area during optimization, their attention values are relatively small, so some regions that should be activated are not activated well. A modulation function is therefore designed to increase the attention values of these regions so that more of the target area can be activated. Finally, the rough class activation map is optimized with the modulated attention to obtain a class activation map that covers more of the target area while effectively suppressing background noise, ultimately yielding high-quality pseudo labels for training a semantic segmentation network.
In addition to the embodiments described above, other embodiments of the invention are possible. All technical schemes formed by equivalent substitution or equivalent transformation fall within the protection scope of the invention.

Claims (9)

1. A weak supervision semantic segmentation method based on attention fusion, characterized by comprising the following steps:
S1, preparing a data set, wherein the data set comprises a training set, a verification set and a test set;
s2, carrying out data preprocessing on the images in the data set;
s3, constructing a weak supervision semantic segmentation model based on attention fusion, and adopting a data-efficient image Transformer DeiT pre-trained on the image recognition dataset ImageNet as the backbone of the model; step S3 comprises the following sub-steps:
s3.1, dividing the image preprocessed in step S2 into N non-overlapping blocks, constructing N block tokens through linear mapping, and concatenating C class tokens with the N block tokens to obtain the input token of the model;
s3.2, inputting the input token into a Transformer coding layer of the weak supervision semantic segmentation model based on attention fusion to obtain an output token;
s3.3, extracting the last N block tokens from the output tokens to form the output block tokens, and applying a reshape operation and a convolution operation to the output block tokens to obtain a rough class activation map:
Coarse-CAM=Conv(Reshape(Tp_out))
where Tp_out represents the output block token, Reshape represents the reshape operation, Conv represents the convolution operation, and Coarse-CAM represents the rough class activation map;
s3.4, when the input token passes through a Transformer coding layer, the Attention module computes attention on the input token to generate an attention map, with the calculation formula:
Attention(Q,K)=softmax(QK^T/√d_k)
where Q and K respectively represent the Query matrix and the Key matrix obtained by linear projection of the input token in the Transformer coding layer, d_k represents a scaling factor, and T represents the matrix transposition operation;
s3.5, each Transformer coding layer generates one attention map; after L Transformer coding layers, L attention maps are obtained, collectively denoted A; global average pooling is then applied to A, followed by a fully connected layer for information interaction, generating the weights W:
W=FC(GAP(A))
where GAP represents the global average pooling operation and FC represents a fully connected layer;
s3.6, multiplying the obtained weight W by A and fusing to obtain the final attention W';
s3.7, further dividing the final attention W' into class-to-block attention A_c2p and block-to-block attention A_p2p, and multiplying A_c2p and A_p2p respectively by the modulation function G;
s3.8, optimizing the rough class activation map sequentially using the modulated class-to-block attention and the modulated block-to-block attention to obtain a final class activation map;
s4, training the weak supervision semantic segmentation model based on attention fusion for multiple rounds, and storing the best parameters corresponding to the best training round;
s5, loading the stored best parameters into the weak supervision semantic segmentation model based on attention fusion, and then inputting the test set data into the model to generate a complete class activation map.
2. The weak supervision semantic segmentation method based on attention fusion according to claim 1, wherein: in step S1, the PASCAL VOC 2012 dataset and the MS COCO 2014 dataset are used as the datasets; the PASCAL VOC 2012 dataset has 21 categories, including 20 object classes and one background class; the MS COCO 2014 dataset has 81 categories, including 80 object classes and one background class.
3. The weak supervision semantic segmentation method based on attention fusion according to claim 2, wherein: the PASCAL VOC 2012 dataset comprises a training set of 1464 images, a validation set of 1449 images, and a test set of 1456 images, wherein the training set is augmented with additional data to 10582 images; the MS COCO 2014 dataset comprises a training set of 82081 images and a validation set of 40137 images.
4. The weak supervision semantic segmentation method based on attention fusion according to claim 1, wherein: the step S2 comprises the following sub-steps:
s2.1, applying random horizontal flipping and color jittering to the image;
s2.2, performing normalization processing on the image, and adjusting the image size to 256×256;
s2.3, finally, randomly cropping the image to a size of 224×224.
5. The weak supervision semantic segmentation method based on attention fusion according to claim 4, wherein: in step S2.1, the color jittering of the image specifically comprises: setting the brightness, contrast and saturation values of the image all to 0.3.
6. The weak supervision semantic segmentation method based on attention fusion according to claim 1, wherein: in step S3.7, the modulation function G is given by the Gaussian form
G(x)=e^(-(x-μ)²/(2σ²))/√(2πσ²)
where x represents the input, μ represents the mean of the input, σ² represents the variance of the input, e represents the exponential function, and π represents the circle constant.
7. The weak supervision semantic segmentation method based on attention fusion according to claim 1, wherein: in step S3.8, optimizing the rough class activation map specifically comprises: first multiplying the rough class activation map Coarse-CAM element-wise with the modulated class-to-block attention to obtain a preliminarily optimized class activation map; the class activation map is then further optimized by matrix multiplication with the modulated block-to-block attention to obtain the final class activation map Final-CAM.
8. The weak supervision semantic segmentation method based on attention fusion according to claim 1, wherein: in step S4, model-related hyper-parameters are set: the number of training epochs Epoch is set to 60, the training batch size is set to 64, the optimizer used during training is the AdamW optimizer, the loss function is the multi-label cross entropy loss, and the initial learning rate is set to 4e-4.
9. The weak supervision semantic segmentation method based on attention fusion according to claim 1, wherein: in step S5, the stored best parameters are loaded into the weak supervision semantic segmentation model based on attention fusion, the pictures in the verification set and the test set are input into the model, the mean intersection-over-union MIoU is then calculated, and the semantic segmentation performance of the model on the PASCAL VOC 2012 and MS COCO 2014 datasets is measured according to the obtained MIoU value.
CN202310981553.9A 2023-08-04 2023-08-04 Weak supervision semantic segmentation method based on attention fusion Pending CN116912501A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310981553.9A 2023-08-04 2023-08-04 Weak supervision semantic segmentation method based on attention fusion


Publications (1)

Publication Number Publication Date
CN116912501A 2023-10-20

Family

ID=88362941


Country Status (1)

Country: CN (CN116912501A)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118154884A (en) * 2024-05-13 2024-06-07 山东锋士信息技术有限公司 Weak supervision image semantic segmentation method based on sample mixing and contrast learning



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination