CN118015282A - Weakly supervised semantic segmentation method based on background prior - Google Patents
Weakly supervised semantic segmentation method based on background prior
- Publication number
- CN118015282A (application CN202410311121.1A)
- Authority
- CN
- China
- Prior art keywords
- background
- mask
- map
- image
- semantic segmentation
- Prior art date
- 2024-03-19
- Legal status (assumed; not a legal conclusion)
- Pending
Classifications
- G06V10/267—Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
- G06V10/454—Local feature extraction with biologically inspired filters integrated into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/52—Scale-space analysis, e.g. wavelet analysis
- G06V10/762—Image or video recognition or understanding using machine learning with clustering
- G06V10/764—Image or video recognition or understanding using machine learning with classification, e.g. of video objects
- G06V10/806—Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82—Image or video recognition or understanding using neural networks
- G06V20/70—Labelling scene content, e.g. deriving syntactic or semantic representations
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/048—Activation functions
- G06N3/084—Backpropagation, e.g. using gradient descent
- G06N3/0895—Weakly supervised learning, e.g. semi-supervised or self-supervised learning
- G06T7/194—Segmentation or edge detection involving foreground-background segmentation
Abstract
The invention relates to the technical field of data processing, and in particular to a weakly supervised semantic segmentation method based on a background prior, comprising the following steps: inputting a dataset carrying only image-level labels into a background clustering algorithm, which clusters the background by pixel value to obtain a cluster mask map; inputting the dataset into a SAM model pre-trained on natural images for inference to obtain a pre-training mask map; performing IOU-based mask fusion of the cluster mask map and the pre-training mask map in a segmentation result fusion module to generate a background mask map; using several serial multi-scale spliced convolution blocks to realize tower extraction of the dataset features and generate a classification feature map; distinguishing foreground from background in the classification feature map with the background mask map; and inputting the foreground into a classifier for classification and visualizing the result. The invention solves the problems in existing semantic segmentation technology that segmentation labels are difficult to obtain and the weakly supervised segmentation effect is poor.
Description
Technical Field
The invention relates to the technical field of data processing, and in particular to a weakly supervised semantic segmentation method based on a background prior.
Background
Semantic segmentation is one of the classical problems of computer vision and is widely applied in fine-grained segmentation scenarios such as vision-based road-scene segmentation and remote-sensing image segmentation. Under pixel-level label supervision it achieves high segmentation precision, and a large segmentation model trained on large-scale annotated data can even segment all targets in a natural image fairly accurately. However, such large models perform poorly in fields such as medical, infrared, and remote-sensing imagery, where pixel-level labels are also costly to acquire, so fully supervised semantic segmentation cannot be used to train a deep learning model in these special fields.
With the rapid development of image semantic segmentation technology, weakly supervised semantic segmentation has emerged. It aims to build a prediction model without pixel-level label supervision, realizing semantic segmentation using only existing image-level labels, object detection boxes, or coarse annotations; compared with the annotations required for supervised learning, weakly supervised annotations are much easier to acquire.
However, traditional weak supervision suffers from scarce labeled data and poor network generalization, so segmentation labels remain difficult to obtain and the weakly supervised segmentation effect is poor.
Disclosure of Invention
The invention aims to provide a weakly supervised semantic segmentation method based on a background prior, so as to solve the problems in existing semantic segmentation technology that segmentation labels are difficult to obtain and the weakly supervised segmentation effect is poor.
In order to achieve the above object, the invention provides a weakly supervised semantic segmentation method based on a background prior, comprising:
inputting a dataset carrying only image-level labels into a background clustering algorithm, which clusters the background by pixel value to obtain a cluster mask map;
inputting the dataset into a SAM model pre-trained on natural images for inference to obtain a pre-training mask map;
performing IOU-based mask fusion of the cluster mask map and the pre-training mask map in a segmentation result fusion module to generate a background mask map;
using several serial multi-scale spliced convolution blocks to realize tower extraction of the dataset features and generate a classification feature map;
distinguishing foreground from background in the classification feature map with the background mask map;
inputting the foreground into a classifier for classification, and visualizing the result to generate a semantic segmentation map.
The step of inputting the dataset carrying only image-level labels into the background clustering algorithm for background clustering by pixel value to obtain the cluster mask map specifically comprises:
converting the input RGB image to a grayscale image and deleting the transparency channel of the image;
sorting the pixels from large to small so that the pixels of the image form a pixel sequence; computing, for each fixed-length window of adjacent pixels in the sequence, the distance between the first and last pixel values of the window; finding the window with the maximum distance; and splitting at the median value of that window as the threshold, thereby realizing color segmentation and generating the cluster mask map.
The step of inputting the dataset into the SAM model pre-trained on natural images for inference to obtain the pre-training mask map specifically comprises:
encoding the dataset with a weight-frozen image token encoder and saving the feature vectors as numpy arrays;
decoding the numpy vectors to obtain several instance segmentation maps, filtering out noise-level instances, and accumulating the segmentation-map masks of the remaining instances; where masks overlap, the larger-area mask is used to prevent loss of target information.
The step of performing IOU-based mask fusion of the cluster mask map and the pre-training mask map in the segmentation result fusion module to generate the background mask map specifically comprises:
finding the mask intersection regions of the cluster mask map and the pre-training mask map;
when the IOU between an intersection region and the corresponding pre-training mask region exceeds 0.5, selecting the corresponding region of the pre-training mask map, and masking the region out when the IOU is below 0.5, thereby generating the fused background mask map.
The step of using the several serial multi-scale spliced convolution blocks specifically comprises:
performing preliminary extraction of features with square convolution kernels, then applying horizontal-first and vertical grouped convolutions to realize horizontally and vertically mixed extraction of the feature map across spatial positions;
splicing the activated features accumulated by the horizontal and vertical convolutions at several scales, and reducing the channel dimension with a point convolution.
The step of realizing tower extraction of the dataset features specifically comprises:
at each layer, applying batch normalization and preliminary convolution processing with kernels ordered from small to large to the input features;
skip-connecting the preliminary features so that the horizontally and vertically extracted features accumulate with the preliminary features, outputting the accumulated features through LP pooling, and repeating these operations several times to realize tower extraction of the features.
The step of distinguishing foreground from background in the classification feature map with the background mask map specifically comprises:
for each background point in the background mask map, computing the shortest distance d to the foreground region and obtaining the pixel weakening value W = 1 - d/N, where N is the side length of the mask map;
multiplying the background features by their weakening values and the foreground features by 1, generating a foreground feature map with a weakened background.
The step of inputting the foreground into the classifier for classification and visualizing the result to generate the semantic segmentation map specifically comprises:
performing visual interpretation on the last convolution layer of the classification network, recording the forward activations and backward-propagated gradients of that layer with the two hook functions register_forward_hook and register_backward_hook;
generating a Grad-CAM from the gradient weights and normalizing it into an interpretable weight map, then selecting pixels with a threshold Seg_threshold: pixels above the threshold are set to white and the rest to black, yielding the weakly supervised segmentation map.
Compared with the prior art, the invention has the following beneficial effects:
The method uses background prior knowledge for weakly supervised semantic segmentation of images. Pre-segmenting the image with the clustering algorithm and the SAM pre-trained model reduces the difficulty of the weak supervision task, while the segmentation result fusion module screens the pre-segmented regions, improving the effectiveness and information quality of the foreground segmentation regions. By fully combining background prior knowledge with large-model pre-training prior knowledge, the method realizes weakly supervised semantic segmentation of images; it effectively solves the problems that labeled samples for weakly supervised semantic segmentation are difficult to obtain, that the weakly supervised segmentation effect is poor, and that model generalization is poor in medical, infrared, and remote-sensing imaging; it effectively improves the segmentation effect and generalization of the weakly supervised semantic segmentation model in image segmentation tasks; and it reduces the model's dependence on large-scale labeled data.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a flowchart of the weakly supervised semantic segmentation method based on a background prior.
Fig. 2 is a flowchart of inputting the dataset carrying only image-level labels into the background clustering algorithm for background clustering by pixel value to obtain the cluster mask map.
Fig. 3 is a flowchart of inputting the dataset into the SAM model pre-trained on natural images for inference to obtain the pre-training mask map.
Fig. 4 is a flowchart of performing IOU-based mask fusion of the cluster mask map and the pre-training mask map in the segmentation result fusion module to generate the background mask map.
Fig. 5 is a flowchart of using the several serial multi-scale spliced convolution blocks.
Fig. 6 is a flowchart of the tower extraction of the dataset features.
Fig. 7 is a flowchart of distinguishing foreground from background in the classification feature map.
Fig. 8 is a flowchart of inputting the foreground into the classifier for classification and visualizing the result to generate the semantic segmentation map.
Fig. 9 is an effect diagram of another embodiment of the weakly supervised semantic segmentation method based on a background prior.
Fig. 10 is a schematic diagram of another embodiment of the weakly supervised semantic segmentation method based on a background prior.
Detailed Description
Referring to figs. 1-8, which show the overall flowchart of the weakly supervised semantic segmentation method based on a background prior (fig. 1) and the flowcharts of its individual steps described above (figs. 2-8).
The invention provides a weakly supervised semantic segmentation method based on a background prior, comprising the following steps:
S1, inputting a dataset carrying only image-level labels into a background clustering algorithm, which clusters the background by pixel value to obtain a cluster mask map;
The specific steps are as follows:
S11, converting the input RGB image to a grayscale image and deleting the transparency channel of the image;
S12, sorting the pixels from large to small so that the pixels of the image form a pixel sequence; computing, for each fixed-length window of adjacent pixels in the sequence, the distance between the first and last pixel values of the window; finding the window with the maximum distance; and splitting at the median value of that window as the threshold, realizing color segmentation and generating the cluster mask map.
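For illustration, a minimal numpy sketch of this clustering step follows. The window length is an assumed value, since S12 only specifies a "fixed value" for the adjacent-pixel window, and which side of the threshold is treated as background is likewise an assumption.

```python
import numpy as np
from PIL import Image

def cluster_mask(path, window=256):
    """Sketch of S11-S12; `window` is an assumed fixed window length."""
    img = Image.open(path).convert("L")        # RGB -> gray, drops alpha
    gray = np.asarray(img, dtype=np.float32)

    # Arrange all pixel values from large to small into one sequence.
    seq = np.sort(gray.ravel())[::-1]

    # Distance between the first and last value of every fixed-length
    # window of adjacent pixels; keep the window with the largest gap.
    gaps = seq[:-window] - seq[window:]
    start = int(np.argmax(gaps))

    # Split at the median of the widest-gap window; which side counts
    # as background is dataset-dependent.
    threshold = float(np.median(seq[start:start + window]))
    return (gray >= threshold).astype(np.uint8)
```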
S2, inputting the dataset into a SAM model pre-trained on natural images for inference to obtain a pre-training mask map;
The specific steps are as follows:
S21, encoding the dataset with a weight-frozen image token encoder and saving the feature vectors as numpy arrays;
S22, decoding the numpy vectors to obtain several instance segmentation maps, filtering out noise-level instances, and accumulating the segmentation-map masks of the remaining instances; where masks overlap, the larger-area mask is used to prevent loss of target information.
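The following sketch approximates S21-S22 with the public segment_anything package, whose automatic mask generator bundles the frozen-encoder forward pass and mask decoding (the intermediate save-to-numpy step of S21 is folded inside it); the min_area noise threshold is an assumed value.

```python
import numpy as np
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

def sam_pretrain_mask(image, checkpoint="sam_vit_h.pth", min_area=100):
    """Sketch of S21-S22. `image` is an HxWx3 uint8 RGB array;
    `min_area` is an assumed threshold for noise-level instances."""
    sam = sam_model_registry["vit_h"](checkpoint=checkpoint)
    for p in sam.parameters():               # freeze the model weights
        p.requires_grad = False
    masks = SamAutomaticMaskGenerator(sam).generate(image)

    kept = [m for m in masks if m["area"] >= min_area]   # drop noise
    label = np.zeros(image.shape[:2], dtype=np.int32)
    # Accumulate instance masks in ascending area order: on overlap the
    # larger-area mask is painted last and wins, so large-target
    # information is not lost.
    for i, m in enumerate(sorted(kept, key=lambda m: m["area"]), 1):
        label[m["segmentation"]] = i
    return label
```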
S3, performing IOU-based mask fusion of the cluster mask map and the pre-training mask map in a segmentation result fusion module to generate a background mask map;
The specific steps are as follows:
S31, finding the mask intersection regions of the cluster mask map and the pre-training mask map;
S32, when the IOU between an intersection region and the corresponding pre-training mask region exceeds 0.5, selecting the corresponding region of the pre-training mask map, and masking the region out when the IOU is below 0.5, generating the fused background mask map.
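A minimal sketch of the fusion rule in S31-S32 follows, assuming the pre-training mask map is available as a list of boolean instance masks. Because each intersection region is contained in its SAM region, the IOU between them reduces to a coverage ratio.

```python
import numpy as np

def fuse_masks(cluster_mask, pretrain_regions, thresh=0.5):
    """Sketch of S31-S32. `pretrain_regions` is assumed to be a list of
    boolean instance masks taken from the pre-training mask map."""
    cm = cluster_mask.astype(bool)
    fused = np.zeros_like(cm)
    for region in pretrain_regions:
        inter = cm & region                  # mask intersection region
        # IOU of the intersection with the SAM region; since inter is a
        # subset of region, this equals |inter| / |region|.
        iou = inter.sum() / max(region.sum(), 1)
        if iou > thresh:
            fused |= region                  # adopt the whole SAM region
        # otherwise the region is masked out (left as background)
    return fused.astype(np.uint8)
```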
S4, using several serial multi-scale spliced convolution blocks to realize tower extraction of the dataset features and generate a classification feature map;
The specific steps of using the several serial multi-scale spliced convolution blocks comprise:
S41, performing preliminary extraction of features with square convolution kernels, then applying horizontal-first and vertical grouped convolutions to realize horizontally and vertically mixed extraction of the feature map across spatial positions;
S42, splicing the activated features accumulated by the horizontal and vertical convolutions at several scales and reducing the channel dimension with a point convolution; the extracted features are P = PointConv(concat(GELU(VConv(HConv(F))))), where F denotes the preliminarily extracted features, PointConv the point convolution, GELU the activation function, VConv the vertical convolution, and HConv the horizontal convolution.
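A PyTorch sketch of one such block follows; the scale set (3, 5, 7) and the group count are assumptions, as the patent fixes only the structure given by the formula above.

```python
import torch
import torch.nn as nn

class MultiScaleStripBlock(nn.Module):
    """Sketch of one multi-scale spliced convolution block (S41-S42)."""

    def __init__(self, channels, scales=(3, 5, 7), groups=4):
        super().__init__()
        # Square convolution kernel for preliminary feature extraction.
        self.square = nn.Conv2d(channels, channels, 3, padding=1)
        # Horizontal-first, then vertical, grouped convolutions per
        # scale (channels must be divisible by `groups`).
        self.hconv = nn.ModuleList(
            nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2),
                      groups=groups) for k in scales)
        self.vconv = nn.ModuleList(
            nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0),
                      groups=groups) for k in scales)
        self.act = nn.GELU()
        # Point convolution reduces the spliced channels back down.
        self.point = nn.Conv2d(channels * len(scales), channels, 1)

    def forward(self, x):
        f = self.square(x)                      # preliminary extraction
        # P = PointConv(concat_k(GELU(VConv_k(HConv_k(F)))))
        feats = [self.act(v(h(f)))
                 for h, v in zip(self.hconv, self.vconv)]
        return self.point(torch.cat(feats, dim=1))
```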
The specific steps of realizing tower extraction of the dataset features comprise:
S43, at each layer, applying batch normalization and preliminary convolution processing with kernels ordered from small to large to the input features;
S44, skip-connecting the preliminary features so that the horizontally and vertically extracted features accumulate with the preliminary features, outputting the accumulated features through LP pooling, and repeating these operations several times to realize tower extraction of the features.
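A matching sketch of one tower stage follows, reusing the MultiScaleStripBlock above; the small-to-large kernel ladder (1, 3, 5) and the LP-pool norm p = 2 are assumptions. Stacking several such stages in series realizes the tower extraction.

```python
import torch.nn as nn

class TowerStage(nn.Module):
    """Sketch of one tower-extraction stage (S43-S44)."""

    def __init__(self, channels):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels)
        # Preliminary processing: kernels ordered from small to large.
        self.pre = nn.Sequential(
            nn.Conv2d(channels, channels, 1),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2))
        self.strip = MultiScaleStripBlock(channels)  # block from S41-S42
        self.pool = nn.LPPool2d(norm_type=2, kernel_size=2)

    def forward(self, x):
        pre = self.pre(self.bn(x))
        # Skip connection: accumulate the strip-extracted features with
        # the preliminary features, then LP-pool to shrink the map.
        return self.pool(pre + self.strip(pre))
```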
S5, distinguishing foreground from background in the classification feature map with the background mask map;
The specific steps are as follows:
S51, for each background point in the background mask map, computing the shortest distance d to the foreground region and obtaining the pixel weakening value W = 1 - d/N, where N is the side length of the mask map;
S52, multiplying the background features by their weakening values and the foreground features by 1, generating a foreground feature map with a weakened background.
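A short sketch of this weakening step follows. The shortest distance d of S51 is exactly what a Euclidean distance transform computes, and d = 0 on the foreground gives W = 1 there, matching the multiply-by-1 rule of S52.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def weaken_background(features, fg_mask):
    """Sketch of S51-S52. `features` is (C, H, W); `fg_mask` is (H, W),
    nonzero on the foreground of the background mask map."""
    fg = fg_mask.astype(bool)
    # d: shortest distance from each background point to the foreground
    # region (d = 0 on the foreground itself).
    d = distance_transform_edt(~fg)
    N = max(fg_mask.shape)                  # side length of the mask map
    W = 1.0 - d / N                         # pixel weakening value
    # Background features are multiplied by W < 1, foreground by 1,
    # yielding the background-weakened foreground feature map.
    return features * W[None, :, :]
```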
S6, inputting the foreground into a classifier for classification, and visualizing the result to generate a semantic segmentation map.
The specific steps are as follows:
S61, performing visual interpretation on the last convolution layer of the classification network, recording the forward activations and backward-propagated gradients of that layer with the two hook functions register_forward_hook and register_backward_hook;
S62, generating a Grad-CAM from the gradient weights and normalizing it into an interpretable weight map, then selecting pixels with a threshold Seg_threshold: pixels above the threshold are set to white and the rest to black, yielding the weakly supervised segmentation map.
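A PyTorch sketch of S61-S62 follows; register_full_backward_hook stands in for the register_backward_hook named in S61 (its current equivalent), and the concrete Seg_threshold value is an assumption.

```python
import torch
import torch.nn.functional as F

def weak_seg_map(model, last_conv, image, target_class, seg_threshold=0.4):
    """Sketch of S61-S62. `image` is a (1, C, H, W) tensor;
    `seg_threshold` is an assumed value for the patent's Seg_threshold."""
    acts, grads = {}, {}
    h1 = last_conv.register_forward_hook(
        lambda mod, inp, out: acts.update(a=out))        # activations
    h2 = last_conv.register_full_backward_hook(
        lambda mod, gin, gout: grads.update(g=gout[0]))  # gradients

    model(image)[0, target_class].backward()
    h1.remove(); h2.remove()

    # Grad-CAM: weight each activation channel by its mean gradient,
    # sum over channels, and keep only positive evidence.
    w = grads["g"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((w * acts["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                        align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize

    # Pixels above Seg_threshold become white (1), the rest black (0).
    return (cam > seg_threshold).float()
```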
As shown in fig. 9, in another embodiment, the weakly supervised semantic segmentation method based on a background prior further includes: sorting the pixels from large to small to form pixel sequences from the pixels of the image, finding the pixel sequence with the maximum distance, realizing color segmentation, and generating the cluster mask map; and encoding with the weight-frozen image token encoder, saving the feature vectors as numpy arrays, and decoding the numpy vectors to generate the pre-training mask map.
As shown in fig. 10, in another embodiment, the method further includes combining the background prior generated by the clustering algorithm and SAM with the feature map extracted by the backbone network to generate the weakly supervised segmentation image. The specific process comprises: inputting the dataset into the background clustering algorithm for background clustering by pixel value to obtain the cluster mask map; inputting the dataset into the SAM model pre-trained at large scale on natural images for inference to obtain the pre-training mask map; performing IOU-based mask fusion of the cluster mask map and the pre-training mask map in the segmentation result fusion module to generate the background mask map; using the several serial multi-scale spliced convolution blocks to realize tower extraction of the dataset features and generate the classification feature map; distinguishing foreground from background in the classification feature map with the background mask map; and inputting the foreground into the classifier for classification and visualizing the result to generate the semantic segmentation map.
In summary, the weakly supervised semantic segmentation method based on a background prior uses background prior knowledge for weakly supervised semantic segmentation of images. Pre-segmenting the image with the clustering algorithm and the SAM pre-trained model reduces the difficulty of the weak supervision task, while the segmentation result fusion module screens the pre-segmented regions, improving the effectiveness and information quality of the foreground segmentation regions. Fully combining background prior knowledge with large-model pre-training prior knowledge realizes weakly supervised semantic segmentation of images, effectively solves the difficulty of obtaining labeled samples, the poor weakly supervised segmentation effect, and the poor model generalization in medical, infrared, and remote-sensing imaging, effectively improves the segmentation effect and generalization of the weakly supervised semantic segmentation model in image segmentation tasks, and reduces the model's dependence on large-scale labeled data.
The foregoing discloses only preferred embodiments of the present application and is not intended to limit the scope of its claims. Those of ordinary skill in the art will understand that all or part of the processes implementing the above embodiments, and equivalent variations made according to the claims of the application, remain within the scope covered by the application.
Claims (8)
1. A weakly supervised semantic segmentation method based on a background prior, characterized by comprising the following steps:
inputting a dataset carrying only image-level labels into a background clustering algorithm, which clusters the background by pixel value to obtain a cluster mask map;
inputting the dataset into a SAM model pre-trained on natural images for inference to obtain a pre-training mask map;
performing IOU-based mask fusion of the cluster mask map and the pre-training mask map in a segmentation result fusion module to generate a background mask map;
using several serial multi-scale spliced convolution blocks to realize tower extraction of the dataset features and generate a classification feature map;
distinguishing foreground from background in the classification feature map with the background mask map;
inputting the foreground into a classifier for classification, and visualizing the result to generate a semantic segmentation map.
2. The weakly supervised semantic segmentation method based on a background prior as defined in claim 1, wherein
the step of inputting the dataset carrying only image-level labels into the background clustering algorithm for background clustering by pixel value to obtain the cluster mask map specifically comprises:
converting the input RGB image to a grayscale image and deleting the transparency channel of the image;
sorting the pixels from large to small so that the pixels of the image form a pixel sequence; computing, for each fixed-length window of adjacent pixels in the sequence, the distance between the first and last pixel values of the window; finding the window with the maximum distance; and splitting at the median value of that window as the threshold, thereby realizing color segmentation and generating the cluster mask map.
3. The weakly supervised semantic segmentation method based on a background prior as defined in claim 1, wherein
the step of inputting the dataset into the SAM model pre-trained on natural images for inference to obtain the pre-training mask map specifically comprises:
encoding the dataset with a weight-frozen image token encoder and saving the feature vectors as numpy arrays;
decoding the numpy vectors to obtain several instance segmentation maps, filtering out noise-level instances, and accumulating the segmentation-map masks of the remaining instances; where masks overlap, the larger-area mask is used to prevent loss of target information.
4. The weakly supervised semantic segmentation method based on a background prior as defined in claim 1, wherein
the step of performing IOU-based mask fusion of the cluster mask map and the pre-training mask map in the segmentation result fusion module to generate the background mask map specifically comprises:
finding the mask intersection regions of the cluster mask map and the pre-training mask map;
when the IOU between an intersection region and the corresponding pre-training mask region exceeds 0.5, selecting the corresponding region of the pre-training mask map, and masking the region out when the IOU is below 0.5, thereby generating the fused background mask map.
5. The weakly supervised semantic segmentation method based on a background prior as defined in claim 1, wherein
the step of using the several serial multi-scale spliced convolution blocks specifically comprises:
performing preliminary extraction of features with square convolution kernels, then applying horizontal-first and vertical grouped convolutions to realize horizontally and vertically mixed extraction of the feature map across spatial positions;
splicing the activated features accumulated by the horizontal and vertical convolutions at several scales, and reducing the channel dimension with a point convolution.
6. The weakly supervised semantic segmentation method based on a background prior as defined in claim 1, wherein
the step of realizing tower extraction of the dataset features specifically comprises:
at each layer, applying batch normalization and preliminary convolution processing with kernels ordered from small to large to the input features;
skip-connecting the preliminary features so that the horizontally and vertically extracted features accumulate with the preliminary features, outputting the accumulated features through LP pooling, and repeating these operations several times to realize tower extraction of the features.
7. The weakly supervised semantic segmentation method based on a background prior as defined in claim 1, wherein
the step of distinguishing foreground from background in the classification feature map with the background mask map specifically comprises:
for each background point in the background mask map, computing the shortest distance d to the foreground region and obtaining the pixel weakening value W = 1 - d/N, where N is the side length of the mask map;
multiplying the background features by their weakening values and the foreground features by 1, generating a foreground feature map with a weakened background.
8. The weakly supervised semantic segmentation method based on a background prior as defined in claim 1, wherein
the step of inputting the foreground into the classifier for classification and visualizing the result to generate the semantic segmentation map specifically comprises:
performing visual interpretation on the last convolution layer of the classification network, recording the forward activations and backward-propagated gradients of that layer with the two hook functions register_forward_hook and register_backward_hook;
generating a Grad-CAM from the gradient weights and normalizing it into an interpretable weight map, then selecting pixels with a threshold Seg_threshold: pixels above the threshold are set to white and the rest to black, yielding the weakly supervised segmentation map.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202410311121.1A | 2024-03-19 | 2024-03-19 | Weakly supervised semantic segmentation method based on background prior
Publications (1)
Publication Number | Publication Date
---|---
CN118015282A | 2024-05-10
Family
- ID=90952273
Country Status (1)
Country | Link
---|---
CN | CN118015282A (en)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN110633632A | 2019-08-06 | 2019-12-31 | Xiamen University | Weakly supervised joint object detection and semantic segmentation method based on loop guidance
US20230154007A1 | 2021-11-15 | 2023-05-18 | Elekta Limited | Few-shot semantic image segmentation using dynamic convolution
CN116229465A | 2023-02-27 | 2023-06-06 | Harbin Engineering University | Weakly supervised semantic segmentation method for ships
Non-Patent Citations (2)
Title
---|
ZHU, K.; XIONG, N.N.; LU, M.: "A Survey of Weakly-supervised Semantic Segmentation", 2023 IEEE 9th Intl Conference on Big Data Security on Cloud, 10 June 2023
LI Chen: "Research on weakly supervised semantic segmentation methods based on deep learning", China Master's Theses Full-text Database, Information Science and Technology, no. 01, 15 January 2024, pages 138-1315
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination