CN118015282A - Weak supervision semantic segmentation method based on background priori - Google Patents

Weak supervision semantic segmentation method based on background priori

Info

Publication number
CN118015282A
CN118015282A (application CN202410311121.1A)
Authority
CN
China
Prior art keywords: background; mask; map; image; semantic segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410311121.1A
Other languages
Chinese (zh)
Inventor
丁建睿
张听
丁卓
段艺博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Longyuan Information Technology Co ltd
Harbin Institute of Technology Weihai
Original Assignee
Nanjing Longyuan Information Technology Co ltd
Harbin Institute of Technology Weihai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Longyuan Information Technology Co ltd, Harbin Institute of Technology Weihai filed Critical Nanjing Longyuan Information Technology Co ltd
Priority to CN202410311121.1A
Publication of CN118015282A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/0895 Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/52 Scale-space analysis, e.g. wavelet analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/70 Labelling scene content, e.g. deriving syntactic or semantic representations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of data processing, and in particular to a weak supervision semantic segmentation method based on background priori, comprising the following steps: inputting a specific data set carrying only image-level labels into a background clustering algorithm, which clusters the background according to pixel values to obtain a cluster mask map; inputting the data set into a SAM model pre-trained on natural images for inference to obtain a pre-training mask map; performing IOU-based mask fusion of the cluster mask map and the pre-training mask map through a segmentation result fusion module to generate a background mask map; utilizing a plurality of serial multi-scale spliced convolution blocks to realize tower extraction of the data set features and generate a classification feature map; distinguishing the foreground from the background in the classification feature map with the background mask map; and inputting the foreground into a classifier for classification and visualizing the result. The invention solves the problems that segmentation labels are difficult to obtain and the weak supervision segmentation effect is poor in existing semantic segmentation technology.

Description

Weak supervision semantic segmentation method based on background priori
Technical Field
The invention relates to the technical field of data processing, in particular to a weak supervision semantic segmentation method based on background priori.
Background
Semantic segmentation is one of the classical problems of computer vision and is widely applied to fine segmentation scenes such as vision-based road scene segmentation and remote sensing image segmentation. Under pixel-level label supervision it achieves high segmentation precision, and a large segmentation model trained on large-scale annotated data can even segment all targets contained in a natural image fairly accurately. However, such large models perform poorly in fields such as medical images, infrared images and remote sensing images, and in these special fields pixel-level labels are costly to acquire, so fully supervised semantic segmentation cannot be used to train a deep learning model.
With the rapid development of image semantic segmentation technology, weak supervision semantic segmentation has emerged. It constructs the learning process of a prediction model without pixel-level label supervision, aiming to realize semantic segmentation using only existing image-level labels, target detection boxes or fuzzy annotations. Compared with the annotations required for supervised learning, such weak annotations are much easier to acquire.
However, traditional weak supervision techniques are limited by scarce annotation data and poor network generalization capability, so segmentation labels remain difficult to obtain and the weak supervision segmentation effect is poor.
Disclosure of Invention
The invention aims to provide a weak supervision semantic segmentation method based on background priori, so as to solve the problems that segmentation labels are difficult to obtain and the weak supervision segmentation effect is poor in existing semantic segmentation technology.
In order to achieve the above object, the present invention provides a weak supervision semantic segmentation method based on background priori, comprising:
inputting a specific data set carrying only image-level labels into a background clustering algorithm for background clustering according to pixel values to obtain a cluster mask map;
inputting the data set into a SAM model pre-trained on natural images for inference to obtain a pre-training mask map;
performing IOU-based mask fusion of the cluster mask map and the pre-training mask map through a segmentation result fusion module to generate a background mask map;
utilizing a plurality of serial multi-scale spliced convolution blocks to realize tower extraction of the data set features and generate a classification feature map;
distinguishing the foreground from the background in the classification feature map with the background mask map;
inputting the foreground into a classifier for classification, and visualizing the result to generate a semantic segmentation map.
The specific steps of inputting the specific data set carrying only image-level labels into the background clustering algorithm for background clustering according to pixel values to obtain the cluster mask map comprise:
performing an RGB-to-grayscale conversion on the input image and deleting the image's transparency channel;
sorting the pixels from large to small so that the pixels contained in the image form a pixel sequence, computing the distance between the first and last pixel values of each fixed-length adjacent subsequence, finding the subsequence with the maximum distance, and dividing at the median value of that subsequence as a threshold, thereby realizing color segmentation and generating the cluster mask map.
The specific steps of inputting the data set into the SAM model pre-trained on natural images for inference to obtain the pre-training mask map comprise:
encoding the data set with a weight-frozen image token encoder and saving the feature vectors in numpy format;
decoding the numpy vectors to obtain a plurality of instance segmentation maps, filtering out and deleting noise-level instances, accumulating the segmentation-map masks of the remaining instances, and, where masks overlap, keeping the larger-area mask to prevent loss of target information.
The specific steps of performing IOU-based mask fusion of the cluster mask map and the pre-training mask map through the segmentation result fusion module to generate the background mask map comprise:
searching for mask intersection areas between the cluster mask map and the pre-training mask map;
when the IOU of an intersection area with its corresponding pre-training mask area exceeds 0.5, selecting the corresponding area on the pre-training mask map; when the IOU is smaller than 0.5, masking the area out, thereby generating the fused background mask map.
The specific steps of utilizing the plurality of serial multi-scale spliced convolution blocks comprise:
performing preliminary feature extraction with square convolution kernels, and realizing horizontal-mixing and vertical-mixing extraction of the feature map at spatial positions with horizontal-then-vertical grouped convolutions;
splicing the activation features accumulated by the horizontal and vertical convolutions at a plurality of scales, and performing channel dimension reduction with point convolution.
The specific steps of realizing tower extraction of the data set features comprise:
at each layer, performing preliminary processing on the input features with batch normalization and convolutions whose kernel sizes increase from small to large;
skip-connecting the preliminary features to accumulate the horizontally and vertically extracted features with the preliminary features, outputting the accumulated features through LP pooling, and repeating these operations a plurality of times to realize tower extraction of the features.
The specific steps of distinguishing the foreground from the background in the classification feature map with the background mask map comprise:
computing the shortest distance d from each background point to the foreground region according to the foreground points in the background mask map to obtain the pixel weakening value W = 1 - d/N, where N is the side length of the mask map;
multiplying the background features by the weakening value and the foreground features by 1, generating a foreground feature map with weakened background.
The specific steps of inputting the foreground into the classifier for classification and visualizing the result to generate the semantic segmentation map comprise:
performing visual interpretation on the last convolution layer of the classification operation, recording the forward activations and backward-propagated gradient weights of the last convolution with the two hook functions register_forward_hook and register_backward_hook;
generating a Grad-CAM from the gradient weights and normalizing it to obtain an interpretable weight map, then performing pixel selection with a threshold Seg_threshold: pixels exceeding the threshold are set to white, otherwise to black, yielding the weak supervision segmentation map.
Compared with the prior art, the invention has the following beneficial effects:
the invention performs weak supervision semantic segmentation of images using background prior knowledge: the image is pre-segmented with a clustering algorithm and the SAM pre-training model, which reduces the difficulty of the weak supervision task, while the segmentation result fusion module screens the pre-segmented areas, improving the effectiveness and information quality of the foreground segmentation areas. Background prior knowledge and the prior knowledge of large-model pre-training are fully combined to realize weak supervision semantic segmentation of images, effectively alleviating the problems in medical, infrared and remote sensing imaging that annotated samples for weak supervision semantic segmentation are difficult to obtain, the weak supervision segmentation effect is poor and model generalization is weak; the segmentation effect and generalization capability of the weak supervision semantic segmentation model in image segmentation tasks are effectively improved, and the model's dependence on large-scale annotated data is reduced.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a flowchart of a weak supervision semantic segmentation method based on background priors.
FIG. 2 is a flow chart of the present invention for inputting a specific data set with only image-level labels into a background clustering algorithm for background clustering based on pixel values to obtain a cluster mask map.
FIG. 3 is a flow chart of the present invention for inputting the data set into a SAM model that is pre-trained on natural images for reasoning to obtain a pre-trained mask map.
Fig. 4 is a flowchart of the present invention for generating a background mask map by performing IOU-based mask fusion on the cluster mask map and the pre-training mask map by the segmentation result fusion module.
FIG. 5 is a flow chart of the present invention utilizing a plurality of serial multi-scale spliced convolution blocks.
FIG. 6 is a flow chart of tower extraction of features of the dataset of the present invention.
Fig. 7 is a flow chart of the present invention distinguishing foreground from background in the classification feature map.
FIG. 8 is a flow chart of the invention for inputting the foreground into a classifier for classification and visual visualization to generate a semantic segmentation map.
Fig. 9 is an effect diagram of another embodiment of a weak supervision semantic segmentation method based on background priors of the present invention.
Fig. 10 is a schematic diagram of another embodiment of a weak supervision semantic segmentation method based on background priors of the present invention.
Detailed Description
Referring to fig. 1 to fig. 8, the flowcharts of the method and of each step are as described in the brief description of the drawings above.
The invention provides a weak supervision semantic segmentation method based on background priori, comprising the following steps:
S1, inputting a specific data set carrying only image-level labels into a background clustering algorithm for background clustering according to pixel values to obtain a cluster mask map;
The method comprises the following specific steps:
S11, performing an RGB-to-grayscale conversion on the input image and deleting the image's transparency channel;
S12, sorting the pixels from large to small so that the pixels contained in the image form a pixel sequence, computing the distance between the first and last pixel values of each fixed-length adjacent subsequence, finding the subsequence with the maximum distance, and dividing at the median value of that subsequence as a threshold, thereby realizing color segmentation and generating the cluster mask map.
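A minimal Python sketch of S11-S12 is given below; the length of the fixed-length adjacent subsequence is not fixed numerically in this disclosure, so the window value used here is an illustrative assumption, and which of the two clusters is background remains data-dependent:

```python
import numpy as np
from PIL import Image

def cluster_mask(path: str, window: int = 8) -> np.ndarray:
    """Background clustering by pixel value (S11-S12) -> binary cluster mask."""
    img = Image.open(path).convert("L")        # S11: RGB -> grayscale, drops alpha
    gray = np.asarray(img)
    # S12: sort pixel values from large to small to form the pixel sequence
    seq = np.sort(gray.ravel())[::-1]
    # distance between first and last value of each fixed-length adjacent
    # subsequence; locate the subsequence with the maximum distance
    gaps = seq[:-window] - seq[window:]        # sequence is descending, gaps >= 0
    start = int(np.argmax(gaps))
    # the median of that subsequence serves as the division threshold
    threshold = np.median(seq[start:start + window + 1])
    return (gray >= threshold).astype(np.uint8)  # 1 marks the brighter cluster
```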
S2, inputting the data set into a SAM model pre-trained on natural images for inference to obtain a pre-training mask map;
The method comprises the following specific steps:
S21, encoding the data set with a weight-frozen image token encoder and saving the feature vectors in numpy format;
S22, decoding the numpy vectors to obtain a plurality of instance segmentation maps, filtering out and deleting noise-level instances, accumulating the segmentation maps of the remaining instances, and, where masks overlap, keeping the larger-area mask to prevent loss of target information.
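The hedged sketch below illustrates S21-S22 with the public segment_anything package; the checkpoint path and the noise-area threshold min_area are assumptions, and the intermediate numpy caching of the frozen encoder's feature vectors is folded into the generator call for brevity:

```python
import numpy as np
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

def pretrain_mask(image: np.ndarray, ckpt: str = "sam_vit_h.pth",
                  min_area: int = 100) -> np.ndarray:
    """SAM inference and mask accumulation (S21-S22); image is RGB uint8 HxWx3."""
    sam = sam_model_registry["vit_h"](checkpoint=ckpt)      # weights stay frozen
    masks = SamAutomaticMaskGenerator(sam).generate(image)  # instance proposals
    fused = np.zeros(image.shape[:2], dtype=np.int32)
    # delete noise-level instances, then paint the rest from small to large
    # area: where masks overlap, the larger-area mask is painted last and
    # wins, preventing loss of target information
    kept = [m for m in masks if m["area"] >= min_area]
    for idx, m in enumerate(sorted(kept, key=lambda m: m["area"]), start=1):
        fused[m["segmentation"]] = idx
    return fused                                            # 0 = no instance
```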
S3, performing IOU-based mask fusion of the cluster mask map and the pre-training mask map through a segmentation result fusion module to generate a background mask map;
The method comprises the following specific steps:
S31, searching for mask intersection areas between the cluster mask map and the pre-training mask map;
S32, when the IOU of an intersection area with its corresponding pre-training mask area exceeds 0.5, selecting the corresponding area on the pre-training mask map; when the IOU is smaller than 0.5, masking the area out, thereby generating the fused background mask map.
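A minimal numpy/scipy sketch of the S31-S32 fusion rule follows, assuming both inputs can be binarized and taking connected components of the pre-training mask as the pre-training mask areas; since each intersection area is a subset of its area, their IOU reduces to intersection size divided by area size:

```python
import numpy as np
from scipy import ndimage

def fuse_masks(cluster: np.ndarray, pretrain: np.ndarray) -> np.ndarray:
    """IOU-based fusion of cluster and pre-training masks (S31-S32)."""
    regions, n = ndimage.label(pretrain > 0)   # pre-training mask areas
    fused = np.zeros(pretrain.shape, dtype=np.uint8)
    for i in range(1, n + 1):
        region = regions == i
        inter = region & (cluster > 0)         # S31: mask intersection area
        # inter is a subset of region, so IOU(inter, region) = |inter| / |region|
        iou = inter.sum() / region.sum()
        if iou > 0.5:
            fused[region] = 1                  # S32: select the pre-training area
        # iou <= 0.5: the area is shielded (left out of the fused mask)
    return fused
```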
S4, utilizing a plurality of serial multi-scale spliced convolution blocks to realize tower extraction of the data set features and generate a classification feature map;
The specific steps of utilizing the plurality of serial multi-scale spliced convolution blocks comprise:
S41, performing preliminary feature extraction with square convolution kernels, and realizing horizontal-mixing and vertical-mixing extraction of the feature map at spatial positions with horizontal-then-vertical grouped convolutions;
S42, splicing the activation features accumulated by the horizontal and vertical convolutions at a plurality of scales and performing channel dimension reduction with point convolution; the extracted feature is P = Pointconv(Concat(GELU(Vconv(Hconv(F))))), where F denotes the preliminarily extracted feature, Pointconv denotes point convolution, GELU is the activation function, Vconv denotes vertical convolution, and Hconv denotes horizontal convolution.
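A PyTorch sketch of one multi-scale spliced convolution block implementing P = Pointconv(Concat(GELU(Vconv(Hconv(F))))) follows; the scale set (3, 5, 7) and the group count are assumptions, as the disclosure does not fix them:

```python
import torch
import torch.nn as nn

class MultiScaleSpliceBlock(nn.Module):
    """One multi-scale spliced convolution block (S41-S42)."""

    def __init__(self, channels: int, scales=(3, 5, 7), groups: int = 4):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                # horizontal-then-vertical grouped convolutions realize the
                # horizontal and vertical mixing extraction at spatial positions
                nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2), groups=groups),
                nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0), groups=groups),
                nn.GELU(),
            )
            for k in scales
        )
        # point convolution reduces the spliced channels back to `channels`
        self.point = nn.Conv2d(channels * len(scales), channels, kernel_size=1)

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        # P = Pointconv(Concat(GELU(Vconv(Hconv(F)))))
        return self.point(torch.cat([branch(f) for branch in self.branches], dim=1))
```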
The specific steps of realizing tower extraction of the data set features comprise:
S43, at each layer, performing preliminary processing on the input features with batch normalization and convolutions whose kernel sizes increase from small to large;
S44, skip-connecting the preliminary features to accumulate the horizontally and vertically extracted features with the preliminary features, outputting the accumulated features through LP pooling, and repeating these operations a plurality of times to realize tower extraction of the features.
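The hedged sketch below assembles one tower-extraction stage from S43-S44: batch normalization, convolutions whose kernels grow from small to large, a skip connection accumulating the spliced features with the preliminary ones, and LP pooling at the output; the kernel sizes and the LP norm order are assumptions. Stacking several such stages, e.g. nn.Sequential(TowerStage(64, MultiScaleSpliceBlock(64)), ...), repeats the operations to realize the tower extraction:

```python
import torch
import torch.nn as nn

class TowerStage(nn.Module):
    """One tower-extraction stage (S43-S44); `splice` can be the
    MultiScaleSpliceBlock sketched above."""

    def __init__(self, channels: int, splice: nn.Module):
        super().__init__()
        self.prelim = nn.Sequential(                        # S43: preliminary processing
            nn.BatchNorm2d(channels),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),  # small kernel first
            nn.Conv2d(channels, channels, kernel_size=5, padding=2),  # then a larger one
        )
        self.splice = splice
        self.pool = nn.LPPool2d(norm_type=2, kernel_size=2)           # LP pooling output

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        p = self.prelim(x)
        # S44: skip connection accumulates the horizontally and vertically
        # extracted features with the preliminary features
        return self.pool(p + self.splice(p))
```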
S5, distinguishing the foreground from the background in the classification feature map with the background mask map;
The method comprises the following specific steps:
S51, computing the shortest distance d from each background point to the foreground region according to the foreground points in the background mask map to obtain the pixel weakening value W = 1 - d/N, where N is the side length of the mask map;
S52, multiplying the background features by the weakening value and the foreground features by 1, generating a foreground feature map with weakened background.
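A minimal sketch of the background weakening of S51-S52 follows, assuming the background mask map is a square binary numpy array (1 = foreground) of side length N and that the feature map broadcasts over its spatial dimensions:

```python
import numpy as np
from scipy import ndimage

def weaken_background(features: np.ndarray, fg_mask: np.ndarray) -> np.ndarray:
    """Background weakening (S51-S52); fg_mask is binary with 1 = foreground."""
    n = fg_mask.shape[0]                      # N: side length of the mask map
    # S51: shortest distance d from each background point to the foreground
    d = ndimage.distance_transform_edt(fg_mask == 0)
    w = 1.0 - d / n                           # pixel weakening value W = 1 - d/N
    w[fg_mask == 1] = 1.0                     # S52: foreground multiplied by 1
    return features * w                       # background-weakened feature map
```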
S6, inputting the foreground into a classifier for classification, and visualizing the result to generate a semantic segmentation map.
The method comprises the following specific steps:
S61, performing visual interpretation on the last convolution layer of the classification operation, recording the forward activations and backward-propagated gradient weights of the last convolution with the two hook functions register_forward_hook and register_backward_hook;
S62, generating a Grad-CAM from the gradient weights and normalizing it to obtain an interpretable weight map, then performing pixel selection with the threshold Seg_threshold: pixels exceeding the threshold are set to white, otherwise to black, yielding the weak supervision segmentation map.
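A hedged Grad-CAM sketch of S61-S62 follows; it uses register_forward_hook together with register_full_backward_hook (the current PyTorch replacement for register_backward_hook), and Seg_threshold = 0.5 is an assumed value since the disclosure leaves it unspecified:

```python
import torch
import torch.nn.functional as F

def weak_seg_map(model, last_conv, image, class_idx: int,
                 seg_threshold: float = 0.5) -> torch.Tensor:
    """Grad-CAM over the last convolution layer, thresholded to a binary map."""
    acts, grads = [], []
    h1 = last_conv.register_forward_hook(lambda m, i, o: acts.append(o.detach()))
    h2 = last_conv.register_full_backward_hook(
        lambda m, gi, go: grads.append(go[0].detach()))
    model.zero_grad()
    model(image)[0, class_idx].backward()     # S61: forward + backward propagation
    h1.remove(); h2.remove()
    # S62: channel weights from the gradients, weighted sum of activations
    weights = grads[0].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * acts[0]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear",
                        align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # normalize to [0, 1]
    return (cam > seg_threshold).float()      # white above threshold, black below
```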
As shown in fig. 9, in another embodiment of the present invention, the weak supervision semantic segmentation method based on background priori further includes: sorting pixels from large to small so that the pixels contained in the image form pixel sequences, finding the subsequence with the maximum distance, realizing color segmentation and generating a cluster mask map; and encoding with the weight-frozen image token encoder, saving the feature vectors in numpy format, and decoding the numpy vectors to generate a pre-training mask map.
As shown in fig. 10, in another embodiment of the present invention, the method further includes combining the background priors generated by the clustering algorithm and SAM with the feature map extracted by the backbone network to generate a weakly supervised segmentation image. The specific process comprises: inputting the data set into the background clustering algorithm for background clustering according to pixel values to obtain a cluster mask map; inputting the data set into the SAM model pre-trained at large scale on natural images for inference to obtain a pre-training mask map; performing IOU-based mask fusion of the cluster mask map and the pre-training mask map through the segmentation result fusion module to generate a background mask map; utilizing the plurality of serial multi-scale spliced convolution blocks to realize tower extraction of the data set features and generate a classification feature map; distinguishing the foreground from the background in the classification feature map with the background mask map; and inputting the foreground into the classifier for classification and visualizing the result to generate the semantic segmentation map.
According to the weak supervision semantic segmentation method based on background priori, weak supervision semantic segmentation of images is performed using background prior knowledge: the image is pre-segmented with the clustering algorithm and the SAM pre-training model, which reduces the difficulty of the weak supervision task, while the segmentation result fusion module screens the pre-segmented areas, improving the effectiveness and information quality of the foreground segmentation areas. Background prior knowledge and the prior knowledge of large-model pre-training are fully combined to realize weak supervision semantic segmentation of images, effectively alleviating the problems in medical, infrared and remote sensing imaging that annotated samples for weak supervision semantic segmentation are difficult to obtain, the weak supervision segmentation effect is poor and model generalization is weak; the segmentation effect and generalization capability of the weak supervision semantic segmentation model in image segmentation tasks are effectively improved, and the model's dependence on large-scale annotated data is reduced.
The foregoing disclosure is only a preferred embodiment of the present application and is not intended to limit the scope of the claims. Persons of ordinary skill in the art will understand that all or part of the processes for implementing the above embodiments, together with equivalent variations made according to the claims, remain within the scope of the application.

Claims (8)

1. A weak supervision semantic segmentation method based on background priori, characterized by comprising the following steps:
inputting a specific data set carrying only image-level labels into a background clustering algorithm for background clustering according to pixel values to obtain a cluster mask map;
inputting the data set into a SAM model pre-trained on natural images for inference to obtain a pre-training mask map;
performing IOU-based mask fusion of the cluster mask map and the pre-training mask map through a segmentation result fusion module to generate a background mask map;
utilizing a plurality of serial multi-scale spliced convolution blocks to realize tower extraction of the data set features and generate a classification feature map;
distinguishing the foreground from the background in the classification feature map with the background mask map; and
inputting the foreground into a classifier for classification, and visualizing the result to generate a semantic segmentation map.
2. The weak supervision semantic segmentation method based on background priori according to claim 1, wherein the specific steps of inputting the specific data set carrying only image-level labels into the background clustering algorithm for background clustering according to pixel values to obtain the cluster mask map comprise:
performing an RGB-to-grayscale conversion on the input image and deleting the image's transparency channel;
sorting the pixels from large to small so that the pixels contained in the image form a pixel sequence, computing the distance between the first and last pixel values of each fixed-length adjacent subsequence, finding the subsequence with the maximum distance, and dividing at the median value of that subsequence as a threshold, thereby realizing color segmentation and generating the cluster mask map.
3. The weak supervision semantic segmentation method based on background priori according to claim 1, wherein the specific steps of inputting the data set into the SAM model pre-trained on natural images for inference to obtain the pre-training mask map comprise:
encoding the data set with a weight-frozen image token encoder and saving the feature vectors in numpy format;
decoding the numpy vectors to obtain a plurality of instance segmentation maps, filtering out and deleting noise-level instances, accumulating the segmentation-map masks of the remaining instances, and, where masks overlap, keeping the larger-area mask to prevent loss of target information.
4. The weak supervision semantic segmentation method based on background priori according to claim 1, wherein the specific steps of performing IOU-based mask fusion of the cluster mask map and the pre-training mask map through the segmentation result fusion module to generate the background mask map comprise:
searching for mask intersection areas between the cluster mask map and the pre-training mask map;
when the IOU of an intersection area with its corresponding pre-training mask area exceeds 0.5, selecting the corresponding area on the pre-training mask map; when the IOU is smaller than 0.5, masking the area out, thereby generating the fused background mask map.
5. The weak supervision semantic segmentation method based on background priori according to claim 1, wherein the specific steps of utilizing the plurality of serial multi-scale spliced convolution blocks comprise:
performing preliminary feature extraction with square convolution kernels, and realizing horizontal-mixing and vertical-mixing extraction of the feature map at spatial positions with horizontal-then-vertical grouped convolutions;
splicing the activation features accumulated by the horizontal and vertical convolutions at a plurality of scales, and performing channel dimension reduction with point convolution.
6. The weak supervision semantic segmentation method based on background priori according to claim 1, wherein the specific steps of realizing tower extraction of the data set features comprise:
at each layer, performing preliminary processing on the input features with batch normalization and convolutions whose kernel sizes increase from small to large;
skip-connecting the preliminary features to accumulate the horizontally and vertically extracted features with the preliminary features, outputting the accumulated features through LP pooling, and repeating these operations a plurality of times to realize tower extraction of the features.
7. The weak supervision semantic segmentation method based on background priori according to claim 1, wherein the specific steps of distinguishing the foreground from the background in the classification feature map with the background mask map comprise:
computing the shortest distance d from each background point to the foreground region according to the foreground points in the background mask map to obtain the pixel weakening value W = 1 - d/N, where N is the side length of the mask map;
multiplying the background features by the weakening value and the foreground features by 1, generating a foreground feature map with weakened background.
8. The weak supervision semantic segmentation method based on background priori according to claim 1, wherein the specific steps of inputting the foreground into the classifier for classification and visualizing the result to generate the semantic segmentation map comprise:
performing visual interpretation on the last convolution layer of the classification operation, recording the forward activations and backward-propagated gradient weights of the last convolution with the two hook functions register_forward_hook and register_backward_hook;
generating a Grad-CAM from the gradient weights and normalizing it to obtain an interpretable weight map, then performing pixel selection with the threshold Seg_threshold: pixels exceeding the threshold are set to white, otherwise to black, yielding the weak supervision segmentation map.
CN202410311121.1A, filed 2024-03-19 (priority date 2024-03-19): Weak supervision semantic segmentation method based on background priori. Status: Pending. Published as CN118015282A.

Priority Applications (1)

Application Number: CN202410311121.1A; Priority Date: 2024-03-19; Filing Date: 2024-03-19; Title: Weak supervision semantic segmentation method based on background priori (published as CN118015282A)

Applications Claiming Priority (1)

Application Number: CN202410311121.1A; Priority Date: 2024-03-19; Filing Date: 2024-03-19; Title: Weak supervision semantic segmentation method based on background priori (published as CN118015282A)

Publications (1)

Publication Number: CN118015282A; Publication Date: 2024-05-10

Family

Family ID: 90952273

Family Applications (1)

Application Number: CN202410311121.1A; Title: Weak supervision semantic segmentation method based on background priori; Priority Date: 2024-03-19; Filing Date: 2024-03-19; Status: Pending

Country Status (1)

Country Link
CN (1) CN118015282A (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110633632A (en) * 2019-08-06 2019-12-31 厦门大学 Weak supervision combined target detection and semantic segmentation method based on loop guidance
US20230154007A1 (en) * 2021-11-15 2023-05-18 Elekta Limited Few-shot semantic image segmentation using dynamic convolution
CN116229465A (en) * 2023-02-27 2023-06-06 哈尔滨工程大学 Ship weak supervision semantic segmentation method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHU, K.; XIONG, N.N.; LU, M.: "A Survey of Weakly-supervised Semantic Segmentation", 2023 IEEE 9th Intl Conference on Big Data Security on Cloud, 10 June 2023 (2023-06-10) *
LI, Chen: "Research on Weakly Supervised Semantic Segmentation Methods Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology Series, no. 01, 15 January 2024 (2024-01-15), pages 138-1315 *

Similar Documents

Publication Publication Date Title
US11315345B2 (en) Method for dim and small object detection based on discriminant feature of video satellite data
CN110738207B (en) Character detection method for fusing character area edge information in character image
CN109509192B (en) Semantic segmentation network integrating multi-scale feature space and semantic space
CN112132156B (en) Image saliency target detection method and system based on multi-depth feature fusion
US12020437B2 (en) Computer-implemented method of analyzing an image to segment article of interest therein
US20180114071A1 (en) Method for analysing media content
Lin et al. RefineU-Net: Improved U-Net with progressive global feedbacks and residual attention guided local refinement for medical image segmentation
CN110909594A (en) Video significance detection method based on depth fusion
Xu et al. Fast vehicle and pedestrian detection using improved Mask R-CNN
CN111008600B (en) Lane line detection method
CN101453575A (en) Video subtitle information extracting method
CN109886159B (en) Face detection method under non-limited condition
CN111461129B (en) Context prior-based scene segmentation method and system
CN105095835A (en) Pedestrian detection method and system
CN114565770A (en) Image segmentation method and system based on edge auxiliary calculation and mask attention
CN116630850A (en) Twin target tracking method based on multi-attention task fusion and bounding box coding
CN115222750A (en) Remote sensing image segmentation method and system based on multi-scale fusion attention
Liu et al. OccTransformer: Improving BEVFormer for 3D camera-only occupancy prediction
Zhang et al. Small target detection based on squared cross entropy and dense feature pyramid networks
CN117372853A (en) Underwater target detection algorithm based on image enhancement and attention mechanism
CN118015282A (en) Weak supervision semantic segmentation method based on background priori
Rashid et al. Fast-DSAGCN: Enhancing semantic segmentation with multifaceted attention mechanisms
Qu et al. Method of feature pyramid and attention enhancement network for pavement crack detection
Sharma Semantic Segmentation for Urban-Scene Images
CN112487967A (en) Scenic spot painting behavior identification method based on three-dimensional convolution network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination