CN118015282A - Weakly supervised semantic segmentation method based on background prior - Google Patents
Weakly supervised semantic segmentation method based on background prior
- Publication number
- CN118015282A (application CN202410311121.1A)
- Authority
- CN
- China
- Prior art keywords
- background
- mask
- map
- image
- semantic segmentation
- Prior art date
- 2024-03-19
- Legal status (assumed; not a legal conclusion)
- Pending
Classifications
- G06V10/267—Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
- G06V10/454—Local feature extraction with biologically inspired filters integrated into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/52—Scale-space analysis, e.g. wavelet analysis
- G06V10/762—Image or video recognition or understanding using machine learning with clustering
- G06V10/764—Image or video recognition or understanding using machine learning with classification, e.g. of video objects
- G06V10/806—Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82—Image or video recognition or understanding using neural networks
- G06V20/70—Labelling scene content, e.g. deriving syntactic or semantic representations
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/048—Activation functions
- G06N3/084—Backpropagation, e.g. using gradient descent
- G06N3/0895—Weakly supervised learning, e.g. semi-supervised or self-supervised learning
- G06T7/194—Segmentation or edge detection involving foreground-background segmentation
Abstract
The invention relates to the technical field of data processing, and in particular to a weakly supervised semantic segmentation method based on a background prior, comprising the following steps: inputting a dataset carrying only image-level labels into a background clustering algorithm, which clusters the background by pixel value to obtain a cluster mask map; inputting the dataset into a SAM model pre-trained on natural images for inference to obtain a pre-training mask map; performing IOU-based mask fusion of the cluster mask map and the pre-training mask map in a segmentation result fusion module to generate a background mask map; using several serial multi-scale spliced convolution blocks to realize tower extraction of the dataset features and generate a classification feature map; distinguishing foreground from background in the classification feature map with the background mask map; and inputting the foreground into a classifier for classification and visualizing the result. The invention solves the problems in existing semantic segmentation technology that segmentation labels are difficult to obtain and the weakly supervised segmentation effect is poor.
Description
Technical Field
The invention relates to the technical field of data processing, and in particular to a weakly supervised semantic segmentation method based on a background prior.
Background
Semantic segmentation is one of the classical problems of computer vision and is widely applied in fine-grained segmentation scenarios such as vision-based road-scene segmentation and remote-sensing image segmentation. Under pixel-level label supervision it achieves high segmentation precision, and a large segmentation model trained on large-scale annotated data can even segment all targets in a natural image fairly accurately. However, such large models perform poorly in fields such as medical, infrared, and remote-sensing imagery, where pixel-level labels are also costly to acquire, so fully supervised semantic segmentation cannot be used to train a deep learning model in these special fields.
With the rapid development of image semantic segmentation technology, weakly supervised semantic segmentation has emerged. It aims to build a prediction model without pixel-level label supervision, realizing semantic segmentation using only existing image-level labels, object detection boxes, or coarse annotations; compared with the annotations required for supervised learning, weakly supervised annotations are much easier to acquire.
However, traditional weak supervision suffers from scarce labeled data and poor network generalization, so segmentation labels remain difficult to obtain and the weakly supervised segmentation effect is poor.
Disclosure of Invention
The invention aims to provide a weakly supervised semantic segmentation method based on a background prior, so as to solve the problems in existing semantic segmentation technology that segmentation labels are difficult to obtain and the weakly supervised segmentation effect is poor.
In order to achieve the above object, the invention provides a weakly supervised semantic segmentation method based on a background prior, comprising:
inputting a dataset carrying only image-level labels into a background clustering algorithm, which clusters the background by pixel value to obtain a cluster mask map;
inputting the dataset into a SAM model pre-trained on natural images for inference to obtain a pre-training mask map;
performing IOU-based mask fusion of the cluster mask map and the pre-training mask map in a segmentation result fusion module to generate a background mask map;
using several serial multi-scale spliced convolution blocks to realize tower extraction of the dataset features and generate a classification feature map;
distinguishing foreground from background in the classification feature map with the background mask map;
inputting the foreground into a classifier for classification, and visualizing the result to generate a semantic segmentation map.
The step of inputting the dataset carrying only image-level labels into the background clustering algorithm for background clustering by pixel value to obtain the cluster mask map specifically comprises:
converting the input RGB image to a grayscale image and deleting the transparency channel of the image;
sorting the pixels from large to small so that the pixels of the image form a pixel sequence; computing, for each fixed-length window of adjacent pixels in the sequence, the distance between the first and last pixel values of the window; finding the window with the maximum distance; and splitting at the median value of that window as the threshold, thereby realizing color segmentation and generating the cluster mask map.
The step of inputting the dataset into the SAM model pre-trained on natural images for inference to obtain the pre-training mask map specifically comprises:
encoding the dataset with a weight-frozen image token encoder and saving the feature vectors as numpy arrays;
decoding the numpy vectors to obtain several instance segmentation maps, filtering out noise-level instances, and accumulating the segmentation-map masks of the remaining instances; where masks overlap, the larger-area mask is used to prevent loss of target information.
The step of performing IOU-based mask fusion of the cluster mask map and the pre-training mask map in the segmentation result fusion module to generate the background mask map specifically comprises:
finding the mask intersection regions of the cluster mask map and the pre-training mask map;
when the IOU between an intersection region and the corresponding pre-training mask region exceeds 0.5, selecting the corresponding region of the pre-training mask map, and masking the region out when the IOU is below 0.5, thereby generating the fused background mask map.
The step of using the several serial multi-scale spliced convolution blocks specifically comprises:
performing preliminary extraction of features with square convolution kernels, then applying horizontal-first and vertical grouped convolutions to realize horizontally and vertically mixed extraction of the feature map across spatial positions;
splicing the activated features accumulated by the horizontal and vertical convolutions at several scales, and reducing the channel dimension with a point convolution.
The step of realizing tower extraction of the dataset features specifically comprises:
at each layer, applying batch normalization and preliminary convolution processing with kernels ordered from small to large to the input features;
skip-connecting the preliminary features so that the horizontally and vertically extracted features accumulate with the preliminary features, outputting the accumulated features through LP pooling, and repeating these operations several times to realize tower extraction of the features.
The step of distinguishing foreground from background in the classification feature map with the background mask map specifically comprises:
for each background point in the background mask map, computing the shortest distance d to the foreground region and obtaining the pixel weakening value W = 1 - d/N, where N is the side length of the mask map;
multiplying the background features by their weakening values and the foreground features by 1, generating a foreground feature map with a weakened background.
The step of inputting the foreground into the classifier for classification and visualizing the result to generate the semantic segmentation map specifically comprises:
performing visual interpretation on the last convolution layer of the classification network, recording the forward activations and backward-propagated gradients of that layer with the two hook functions register_forward_hook and register_backward_hook;
generating a Grad-CAM from the gradient weights and normalizing it into an interpretable weight map, then selecting pixels with a threshold Seg_threshold: pixels above the threshold are set to white and the rest to black, yielding the weakly supervised segmentation map.
Compared with the prior art, the invention has the following beneficial effects:
The method uses background prior knowledge for weakly supervised semantic segmentation of images. Pre-segmenting the image with the clustering algorithm and the SAM pre-trained model reduces the difficulty of the weak supervision task, while the segmentation result fusion module screens the pre-segmented regions, improving the effectiveness and information quality of the foreground segmentation regions. By fully combining background prior knowledge with large-model pre-training prior knowledge, the method realizes weakly supervised semantic segmentation of images; it effectively solves the problems that labeled samples for weakly supervised semantic segmentation are difficult to obtain, that the weakly supervised segmentation effect is poor, and that model generalization is poor in medical, infrared, and remote-sensing imaging; it effectively improves the segmentation effect and generalization of the weakly supervised semantic segmentation model in image segmentation tasks; and it reduces the model's dependence on large-scale labeled data.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a flowchart of the weakly supervised semantic segmentation method based on a background prior.
Fig. 2 is a flowchart of inputting the dataset carrying only image-level labels into the background clustering algorithm for background clustering by pixel value to obtain the cluster mask map.
Fig. 3 is a flowchart of inputting the dataset into the SAM model pre-trained on natural images for inference to obtain the pre-training mask map.
Fig. 4 is a flowchart of performing IOU-based mask fusion of the cluster mask map and the pre-training mask map in the segmentation result fusion module to generate the background mask map.
Fig. 5 is a flowchart of using the several serial multi-scale spliced convolution blocks.
Fig. 6 is a flowchart of the tower extraction of the dataset features.
Fig. 7 is a flowchart of distinguishing foreground from background in the classification feature map.
Fig. 8 is a flowchart of inputting the foreground into the classifier for classification and visualizing the result to generate the semantic segmentation map.
Fig. 9 is an effect diagram of another embodiment of the weakly supervised semantic segmentation method based on a background prior.
Fig. 10 is a schematic diagram of another embodiment of the weakly supervised semantic segmentation method based on a background prior.
Detailed Description
Referring to figs. 1-8, which show the overall flowchart of the weakly supervised semantic segmentation method based on a background prior (fig. 1) and the flowcharts of its individual steps described above (figs. 2-8).
The invention provides a weakly supervised semantic segmentation method based on a background prior, comprising the following steps:
S1, inputting a dataset carrying only image-level labels into a background clustering algorithm, which clusters the background by pixel value to obtain a cluster mask map;
The specific steps are as follows:
S11, converting the input RGB image to a grayscale image and deleting the transparency channel of the image;
S12, sorting the pixels from large to small so that the pixels of the image form a pixel sequence; computing, for each fixed-length window of adjacent pixels in the sequence, the distance between the first and last pixel values of the window; finding the window with the maximum distance; and splitting at the median value of that window as the threshold, realizing color segmentation and generating the cluster mask map.
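For illustration, a minimal numpy sketch of this clustering step follows. The window length is an assumed value, since S12 only specifies a "fixed value" for the adjacent-pixel window, and which side of the threshold is treated as background is likewise an assumption.

```python
import numpy as np
from PIL import Image

def cluster_mask(path, window=256):
    """Sketch of S11-S12; `window` is an assumed fixed window length."""
    img = Image.open(path).convert("L")        # RGB -> gray, drops alpha
    gray = np.asarray(img, dtype=np.float32)

    # Arrange all pixel values from large to small into one sequence.
    seq = np.sort(gray.ravel())[::-1]

    # Distance between the first and last value of every fixed-length
    # window of adjacent pixels; keep the window with the largest gap.
    gaps = seq[:-window] - seq[window:]
    start = int(np.argmax(gaps))

    # Split at the median of the widest-gap window; which side counts
    # as background is dataset-dependent.
    threshold = float(np.median(seq[start:start + window]))
    return (gray >= threshold).astype(np.uint8)
```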
S2, inputting the dataset into a SAM model pre-trained on natural images for inference to obtain a pre-training mask map;
The specific steps are as follows:
S21, encoding the dataset with a weight-frozen image token encoder and saving the feature vectors as numpy arrays;
S22, decoding the numpy vectors to obtain several instance segmentation maps, filtering out noise-level instances, and accumulating the segmentation-map masks of the remaining instances; where masks overlap, the larger-area mask is used to prevent loss of target information.
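The following sketch approximates S21-S22 with the public segment_anything package, whose automatic mask generator bundles the frozen-encoder forward pass and mask decoding (the intermediate save-to-numpy step of S21 is folded inside it); the min_area noise threshold is an assumed value.

```python
import numpy as np
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

def sam_pretrain_mask(image, checkpoint="sam_vit_h.pth", min_area=100):
    """Sketch of S21-S22. `image` is an HxWx3 uint8 RGB array;
    `min_area` is an assumed threshold for noise-level instances."""
    sam = sam_model_registry["vit_h"](checkpoint=checkpoint)
    for p in sam.parameters():               # freeze the model weights
        p.requires_grad = False
    masks = SamAutomaticMaskGenerator(sam).generate(image)

    kept = [m for m in masks if m["area"] >= min_area]   # drop noise
    label = np.zeros(image.shape[:2], dtype=np.int32)
    # Accumulate instance masks in ascending area order: on overlap the
    # larger-area mask is painted last and wins, so large-target
    # information is not lost.
    for i, m in enumerate(sorted(kept, key=lambda m: m["area"]), 1):
        label[m["segmentation"]] = i
    return label
```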
S3, performing IOU-based mask fusion of the cluster mask map and the pre-training mask map in a segmentation result fusion module to generate a background mask map;
The specific steps are as follows:
S31, finding the mask intersection regions of the cluster mask map and the pre-training mask map;
S32, when the IOU between an intersection region and the corresponding pre-training mask region exceeds 0.5, selecting the corresponding region of the pre-training mask map, and masking the region out when the IOU is below 0.5, generating the fused background mask map.
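A minimal sketch of the fusion rule in S31-S32 follows, assuming the pre-training mask map is available as a list of boolean instance masks. Because each intersection region is contained in its SAM region, the IOU between them reduces to a coverage ratio.

```python
import numpy as np

def fuse_masks(cluster_mask, pretrain_regions, thresh=0.5):
    """Sketch of S31-S32. `pretrain_regions` is assumed to be a list of
    boolean instance masks taken from the pre-training mask map."""
    cm = cluster_mask.astype(bool)
    fused = np.zeros_like(cm)
    for region in pretrain_regions:
        inter = cm & region                  # mask intersection region
        # IOU of the intersection with the SAM region; since inter is a
        # subset of region, this equals |inter| / |region|.
        iou = inter.sum() / max(region.sum(), 1)
        if iou > thresh:
            fused |= region                  # adopt the whole SAM region
        # otherwise the region is masked out (left as background)
    return fused.astype(np.uint8)
```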
S4, using several serial multi-scale spliced convolution blocks to realize tower extraction of the dataset features and generate a classification feature map;
The specific steps of using the several serial multi-scale spliced convolution blocks comprise:
S41, performing preliminary extraction of features with square convolution kernels, then applying horizontal-first and vertical grouped convolutions to realize horizontally and vertically mixed extraction of the feature map across spatial positions;
S42, splicing the activated features accumulated by the horizontal and vertical convolutions at several scales and reducing the channel dimension with a point convolution; the extracted features are P = PointConv(concat(GELU(VConv(HConv(F))))), where F denotes the preliminarily extracted features, PointConv the point convolution, GELU the activation function, VConv the vertical convolution, and HConv the horizontal convolution.
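A PyTorch sketch of one such block follows; the scale set (3, 5, 7) and the group count are assumptions, as the patent fixes only the structure given by the formula above.

```python
import torch
import torch.nn as nn

class MultiScaleStripBlock(nn.Module):
    """Sketch of one multi-scale spliced convolution block (S41-S42)."""

    def __init__(self, channels, scales=(3, 5, 7), groups=4):
        super().__init__()
        # Square convolution kernel for preliminary feature extraction.
        self.square = nn.Conv2d(channels, channels, 3, padding=1)
        # Horizontal-first, then vertical, grouped convolutions per
        # scale (channels must be divisible by `groups`).
        self.hconv = nn.ModuleList(
            nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2),
                      groups=groups) for k in scales)
        self.vconv = nn.ModuleList(
            nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0),
                      groups=groups) for k in scales)
        self.act = nn.GELU()
        # Point convolution reduces the spliced channels back down.
        self.point = nn.Conv2d(channels * len(scales), channels, 1)

    def forward(self, x):
        f = self.square(x)                      # preliminary extraction
        # P = PointConv(concat_k(GELU(VConv_k(HConv_k(F)))))
        feats = [self.act(v(h(f)))
                 for h, v in zip(self.hconv, self.vconv)]
        return self.point(torch.cat(feats, dim=1))
```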
The specific steps of realizing tower extraction of the dataset features comprise:
S43, at each layer, applying batch normalization and preliminary convolution processing with kernels ordered from small to large to the input features;
S44, skip-connecting the preliminary features so that the horizontally and vertically extracted features accumulate with the preliminary features, outputting the accumulated features through LP pooling, and repeating these operations several times to realize tower extraction of the features.
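A matching sketch of one tower stage follows, reusing the MultiScaleStripBlock above; the small-to-large kernel ladder (1, 3, 5) and the LP-pool norm p = 2 are assumptions. Stacking several such stages in series realizes the tower extraction.

```python
import torch.nn as nn

class TowerStage(nn.Module):
    """Sketch of one tower-extraction stage (S43-S44)."""

    def __init__(self, channels):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels)
        # Preliminary processing: kernels ordered from small to large.
        self.pre = nn.Sequential(
            nn.Conv2d(channels, channels, 1),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2))
        self.strip = MultiScaleStripBlock(channels)  # block from S41-S42
        self.pool = nn.LPPool2d(norm_type=2, kernel_size=2)

    def forward(self, x):
        pre = self.pre(self.bn(x))
        # Skip connection: accumulate the strip-extracted features with
        # the preliminary features, then LP-pool to shrink the map.
        return self.pool(pre + self.strip(pre))
```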
S5, distinguishing foreground from background in the classification feature map with the background mask map;
The specific steps are as follows:
S51, for each background point in the background mask map, computing the shortest distance d to the foreground region and obtaining the pixel weakening value W = 1 - d/N, where N is the side length of the mask map;
S52, multiplying the background features by their weakening values and the foreground features by 1, generating a foreground feature map with a weakened background.
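A short sketch of this weakening step follows. The shortest distance d of S51 is exactly what a Euclidean distance transform computes, and d = 0 on the foreground gives W = 1 there, matching the multiply-by-1 rule of S52.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def weaken_background(features, fg_mask):
    """Sketch of S51-S52. `features` is (C, H, W); `fg_mask` is (H, W),
    nonzero on the foreground of the background mask map."""
    fg = fg_mask.astype(bool)
    # d: shortest distance from each background point to the foreground
    # region (d = 0 on the foreground itself).
    d = distance_transform_edt(~fg)
    N = max(fg_mask.shape)                  # side length of the mask map
    W = 1.0 - d / N                         # pixel weakening value
    # Background features are multiplied by W < 1, foreground by 1,
    # yielding the background-weakened foreground feature map.
    return features * W[None, :, :]
```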
S6, inputting the foreground into a classifier for classification, and visualizing the result to generate a semantic segmentation map.
The specific steps are as follows:
S61, performing visual interpretation on the last convolution layer of the classification network, recording the forward activations and backward-propagated gradients of that layer with the two hook functions register_forward_hook and register_backward_hook;
S62, generating a Grad-CAM from the gradient weights and normalizing it into an interpretable weight map, then selecting pixels with a threshold Seg_threshold: pixels above the threshold are set to white and the rest to black, yielding the weakly supervised segmentation map.
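A PyTorch sketch of S61-S62 follows; register_full_backward_hook stands in for the register_backward_hook named in S61 (its current equivalent), and the concrete Seg_threshold value is an assumption.

```python
import torch
import torch.nn.functional as F

def weak_seg_map(model, last_conv, image, target_class, seg_threshold=0.4):
    """Sketch of S61-S62. `image` is a (1, C, H, W) tensor;
    `seg_threshold` is an assumed value for the patent's Seg_threshold."""
    acts, grads = {}, {}
    h1 = last_conv.register_forward_hook(
        lambda mod, inp, out: acts.update(a=out))        # activations
    h2 = last_conv.register_full_backward_hook(
        lambda mod, gin, gout: grads.update(g=gout[0]))  # gradients

    model(image)[0, target_class].backward()
    h1.remove(); h2.remove()

    # Grad-CAM: weight each activation channel by its mean gradient,
    # sum over channels, and keep only positive evidence.
    w = grads["g"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((w * acts["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                        align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize

    # Pixels above Seg_threshold become white (1), the rest black (0).
    return (cam > seg_threshold).float()
```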
As shown in fig. 9, in another embodiment, the weakly supervised semantic segmentation method based on a background prior further includes: sorting the pixels from large to small to form pixel sequences from the pixels of the image, finding the pixel sequence with the maximum distance, realizing color segmentation, and generating the cluster mask map; and encoding with the weight-frozen image token encoder, saving the feature vectors as numpy arrays, and decoding the numpy vectors to generate the pre-training mask map.
As shown in fig. 10, in another embodiment, the method further includes combining the background prior generated by the clustering algorithm and SAM with the feature map extracted by the backbone network to generate the weakly supervised segmentation image. The specific process comprises: inputting the dataset into the background clustering algorithm for background clustering by pixel value to obtain the cluster mask map; inputting the dataset into the SAM model pre-trained at large scale on natural images for inference to obtain the pre-training mask map; performing IOU-based mask fusion of the cluster mask map and the pre-training mask map in the segmentation result fusion module to generate the background mask map; using the several serial multi-scale spliced convolution blocks to realize tower extraction of the dataset features and generate the classification feature map; distinguishing foreground from background in the classification feature map with the background mask map; and inputting the foreground into the classifier for classification and visualizing the result to generate the semantic segmentation map.
In summary, the weakly supervised semantic segmentation method based on a background prior uses background prior knowledge for weakly supervised semantic segmentation of images. Pre-segmenting the image with the clustering algorithm and the SAM pre-trained model reduces the difficulty of the weak supervision task, while the segmentation result fusion module screens the pre-segmented regions, improving the effectiveness and information quality of the foreground segmentation regions. Fully combining background prior knowledge with large-model pre-training prior knowledge realizes weakly supervised semantic segmentation of images, effectively solves the difficulty of obtaining labeled samples, the poor weakly supervised segmentation effect, and the poor model generalization in medical, infrared, and remote-sensing imaging, effectively improves the segmentation effect and generalization of the weakly supervised semantic segmentation model in image segmentation tasks, and reduces the model's dependence on large-scale labeled data.
The foregoing discloses only preferred embodiments of the present application and is not intended to limit the scope of its claims. Those of ordinary skill in the art will understand that all or part of the processes implementing the above embodiments, and equivalent variations made according to the claims of the application, remain within the scope covered by the application.
Claims (8)
1. A weakly supervised semantic segmentation method based on a background prior, characterized by comprising the following steps:
inputting a dataset carrying only image-level labels into a background clustering algorithm, which clusters the background by pixel value to obtain a cluster mask map;
inputting the dataset into a SAM model pre-trained on natural images for inference to obtain a pre-training mask map;
performing IOU-based mask fusion of the cluster mask map and the pre-training mask map in a segmentation result fusion module to generate a background mask map;
using several serial multi-scale spliced convolution blocks to realize tower extraction of the dataset features and generate a classification feature map;
distinguishing foreground from background in the classification feature map with the background mask map;
inputting the foreground into a classifier for classification, and visualizing the result to generate a semantic segmentation map.
2. The weakly supervised semantic segmentation method based on a background prior as defined in claim 1, wherein
the step of inputting the dataset carrying only image-level labels into the background clustering algorithm for background clustering by pixel value to obtain the cluster mask map specifically comprises:
converting the input RGB image to a grayscale image and deleting the transparency channel of the image;
sorting the pixels from large to small so that the pixels of the image form a pixel sequence; computing, for each fixed-length window of adjacent pixels in the sequence, the distance between the first and last pixel values of the window; finding the window with the maximum distance; and splitting at the median value of that window as the threshold, thereby realizing color segmentation and generating the cluster mask map.
3. The weakly supervised semantic segmentation method based on a background prior as defined in claim 1, wherein
the step of inputting the dataset into the SAM model pre-trained on natural images for inference to obtain the pre-training mask map specifically comprises:
encoding the dataset with a weight-frozen image token encoder and saving the feature vectors as numpy arrays;
decoding the numpy vectors to obtain several instance segmentation maps, filtering out noise-level instances, and accumulating the segmentation-map masks of the remaining instances; where masks overlap, the larger-area mask is used to prevent loss of target information.
4. The weakly supervised semantic segmentation method based on a background prior as defined in claim 1, wherein
the step of performing IOU-based mask fusion of the cluster mask map and the pre-training mask map in the segmentation result fusion module to generate the background mask map specifically comprises:
finding the mask intersection regions of the cluster mask map and the pre-training mask map;
when the IOU between an intersection region and the corresponding pre-training mask region exceeds 0.5, selecting the corresponding region of the pre-training mask map, and masking the region out when the IOU is below 0.5, thereby generating the fused background mask map.
5. The weakly supervised semantic segmentation method based on a background prior as defined in claim 1, wherein
the step of using the several serial multi-scale spliced convolution blocks specifically comprises:
performing preliminary extraction of features with square convolution kernels, then applying horizontal-first and vertical grouped convolutions to realize horizontally and vertically mixed extraction of the feature map across spatial positions;
splicing the activated features accumulated by the horizontal and vertical convolutions at several scales, and reducing the channel dimension with a point convolution.
6. The weakly supervised semantic segmentation method based on a background prior as defined in claim 1, wherein
the step of realizing tower extraction of the dataset features specifically comprises:
at each layer, applying batch normalization and preliminary convolution processing with kernels ordered from small to large to the input features;
skip-connecting the preliminary features so that the horizontally and vertically extracted features accumulate with the preliminary features, outputting the accumulated features through LP pooling, and repeating these operations several times to realize tower extraction of the features.
7. The weakly supervised semantic segmentation method based on a background prior as defined in claim 1, wherein
the step of distinguishing foreground from background in the classification feature map with the background mask map specifically comprises:
for each background point in the background mask map, computing the shortest distance d to the foreground region and obtaining the pixel weakening value W = 1 - d/N, where N is the side length of the mask map;
multiplying the background features by their weakening values and the foreground features by 1, generating a foreground feature map with a weakened background.
8. The weakly supervised semantic segmentation method based on a background prior as defined in claim 1, wherein
the step of inputting the foreground into the classifier for classification and visualizing the result to generate the semantic segmentation map specifically comprises:
performing visual interpretation on the last convolution layer of the classification network, recording the forward activations and backward-propagated gradients of that layer with the two hook functions register_forward_hook and register_backward_hook;
generating a Grad-CAM from the gradient weights and normalizing it into an interpretable weight map, then selecting pixels with a threshold Seg_threshold: pixels above the threshold are set to white and the rest to black, yielding the weakly supervised segmentation map.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202410311121.1A | 2024-03-19 | 2024-03-19 | Weakly supervised semantic segmentation method based on background prior
Publications (1)
Publication Number | Publication Date
---|---
CN118015282A | 2024-05-10
Family
- ID=90952273
Country Status (1)
Country | Link
---|---
CN | CN118015282A (en)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN110633632A | 2019-08-06 | 2019-12-31 | Xiamen University | Weakly supervised joint object detection and semantic segmentation method based on loop guidance
US20230154007A1 | 2021-11-15 | 2023-05-18 | Elekta Limited | Few-shot semantic image segmentation using dynamic convolution
CN116229465A | 2023-02-27 | 2023-06-06 | Harbin Engineering University | Weakly supervised semantic segmentation method for ships
Non-Patent Citations (2)
Title
---|
ZHU, K.; XIONG, N.N.; LU, M.: "A Survey of Weakly-supervised Semantic Segmentation", 2023 IEEE 9th Intl Conference on Big Data Security on Cloud, 10 June 2023
LI Chen: "Research on weakly supervised semantic segmentation methods based on deep learning", China Master's Theses Full-text Database, Information Science and Technology, no. 01, 15 January 2024, pages 138-1315
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination