CN114943834B - Full-scene semantic segmentation method based on prototype queue learning under few labeled samples - Google Patents
Full-scene semantic segmentation method based on prototype queue learning under few labeled samples
- Publication number
- CN114943834B (application CN202210390663.3A)
- Authority
- CN
- China
- Prior art keywords
- prototype
- foreground
- background
- queue
- segmentation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/143—Segmentation; Edge detection involving probabilistic approaches, e.g. Markov random field [MRF] modelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
Abstract
The invention discloses a full-scene semantic segmentation method based on prototype queue learning under few labeled samples. The method first performs prototype queue segmentation: masked average pooling is applied to the feature map using the label image to generate a foreground prototype and a background prototype, which are stored into a prototype queue; cosine distances to the feature map are then calculated to obtain a new prediction probability map. The argmax function is applied to the prediction probability map to obtain a segmentation-result mask label; masked average pooling is applied to the feature map using this mask label to generate second-stage foreground and background prototypes, which are stored into the prototype queue, and the cosine distances between these prototypes and the feature map are calculated to obtain the final segmentation result. The invention reduces dependence on model parameters, improves generalization, and achieves a better segmentation effect with fewer labeled samples.
Description
Technical Field
The invention belongs to the technical field of pattern recognition, and particularly relates to a full scene semantic segmentation method.
Background
Image semantic segmentation (Semantic Segmentation) is the pixel-level classification of an image according to the semantic class to which each pixel in the scene belongs. Semantic segmentation methods based on deep learning usually require large numbers of dense pixel-level labels, but labeling samples in practical tasks is time-consuming and labor-intensive, and labeled samples for specific tasks are difficult to obtain. Accordingly, the full-scene semantic segmentation under few labeled samples addressed by the invention aims to classify all pixels in an image according to their semantic categories when only a small number of samples are labeled. The technology plays a key role in practical, highly complex, and strongly dynamic scene applications such as urban planning, precision agriculture, forest inspection, and national defense.
With the development of deep learning, the semantic segmentation field has advanced, and small-sample semantic segmentation under few labeled samples has made progress by combining the transfer ability of meta-learning with the small-sample suitability of metric learning. However, current small-sample semantic segmentation mainly focuses on separating foreground objects from the background, while the need for multi-category semantic segmentation is often ignored. How to fully utilize a small number of labeled samples to guide the segmentation of test samples through metric learning is a key problem in small-sample semantic segmentation. Wang et al. regularize the prototype-guided segmentation process through reverse alignment in the literature "Kaixin Wang, Jun Hao Liew, Yingtian Zou, Daquan Zhou, and Jiashi Feng. PANet: Few-shot image semantic segmentation with prototype alignment. In IEEE International Conference on Computer Vision, 2019, pp. 9197-9206." Pixel-to-pixel correlations were established in the literature "Haochen Wang, Xudong Zhang, Yutao Hu, Yandan Yang, Xianbin Cao, and Xiantong Zhen. Few-shot semantic segmentation with democratic attention networks. In European Conference on Computer Vision, 2020, pp. 730-746," replacing the prototypes generated by mask pooling, so that the sample labels guide the segmentation of test samples more deeply.
Furthermore, exploiting the information of potential new classes in the background helps alleviate feature confusion, i.e., it further enhances the effective representation of different semantic classes. Yang et al. introduce an additional branch network in the literature "Lihe Yang, Wei Zhuo, Lei Qi, Yinghuan Shi, and Yang Gao. Mining latent classes for few-shot segmentation. In IEEE International Conference on Computer Vision, 2021, pp. 8721-8730," to exploit potential new-class information, on the basis of which more stable prototype guidance is achieved by correcting the foreground and background. In addition, the prototype extraction process of conventional small-sample segmentation methods is coarse, causing a loss of detail information during masked average pooling. This loss can be reduced by iteratively optimizing the prototype extraction process while retaining important and comprehensive semantic information; for example, Zhang et al. design an iterative optimization module to optimize the segmentation process in the literature "Chi Zhang, Guosheng Lin, Fayao Liu, Rui Yao, and Chunhua Shen. CANet: Class-agnostic segmentation networks with iterative refinement and attentive few-shot learning. In IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 5217-5226," but that method does not directly update the prototype, so the detail information lost by the extracted prototype is difficult to recover. Iterative optimization can further reduce the loss of detail information, but the optimization of the prototype extraction process remains insufficient.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a full-scene semantic segmentation method based on prototype queue learning under few labeled samples. The method first performs prototype queue segmentation: masked average pooling is applied to the feature map using the label image to generate a foreground prototype and a background prototype, which are stored into a prototype queue; cosine distances to the feature map are then calculated to obtain a new prediction probability map. The argmax function is applied to the prediction probability map to obtain a segmentation-result mask label; masked average pooling is applied to the feature map using this mask label to generate second-stage foreground and background prototypes, which are stored into the prototype queue, and the cosine distances between these prototypes and the feature map are calculated to obtain the final segmentation result. The invention reduces dependence on model parameters, improves generalization, and achieves a better segmentation effect with fewer labeled samples.
The technical scheme adopted by the invention for solving the technical problems comprises the following steps:
step 1: prototype queue segmentation;
step 1-1: uniformly cutting the training image and the corresponding label image pair into fixed sizes; establishing an empty prototype queue;
step 1-2: taking a training image as input data, and generating a feature map F through a feature extractor;
step 1-3: perform masked average pooling on the feature map F using the label image M to generate a foreground prototype p_c and a background prototype p_bg:
p_c = Σ_{x,y} F(x,y)·1[M(x,y)=c] / Σ_{x,y} 1[M(x,y)=c] (1)
p_bg = Σ_{x,y} F(x,y)·1[M(x,y)∉C] / Σ_{x,y} 1[M(x,y)∉C] (2)
wherein (x,y) denotes pixel coordinates, and 1[·] denotes the indicator function, i.e., its value is 1 when the expression in brackets holds and 0 otherwise; C is the foreground class set, c is a foreground category in the image, and h and w are the height and width of the input image, over which the sums range;
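As an illustration, the masked average pooling of step 1-3 can be sketched in NumPy. The function name, the channels-first array layout, and the use of label 0 as background are our assumptions, not part of the patent:

```python
import numpy as np

def masked_average_pooling(feature_map, label_image, c):
    """Average feature_map (d, h, w) over pixels where label_image (h, w)
    equals class c -- the indicator 1[M(x, y) = c] of eqs. (1)-(2).
    Returns a d-dimensional prototype, or None if class c is absent."""
    mask = (label_image == c)
    if mask.sum() == 0:
        return None
    return (feature_map * mask).sum(axis=(1, 2)) / mask.sum()

# Tiny example: an 8-channel 4x4 feature map and a label image with two
# foreground classes (1 and 2); label 0 plays the role of background here.
F = np.random.rand(8, 4, 4)
M = np.array([[0, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 2, 2, 0],
              [0, 2, 2, 0]])
p_fg = {c: masked_average_pooling(F, M, c) for c in (1, 2)}  # foreground prototypes
p_bg = masked_average_pooling(F, M, 0)                       # background prototype
```

Each prototype is a single d-dimensional vector, so detail inside the mask is averaged away — which is exactly the information loss the two-stage scheme later compensates for.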
step 1-4: store the foreground prototype p_c and the background prototype p_bg into the prototype queue; the prototype queue holds multiple foreground categories but only one background category;
step 1-5: repeat steps 1-2 to 1-4 to traverse all training images and their corresponding label images; when a foreground or background prototype is stored into the prototype queue, if a prototype of the same category already exists in the queue, the newly generated prototype covers (replaces) it;
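A minimal sketch of the queue's overwrite semantics from steps 1-4 and 1-5: a dict keyed by category id naturally gives "one slot per category, newer prototypes cover older ones". The choice of a dict and the "bg" key for the single background slot are our assumptions:

```python
# One slot per semantic class; storing a prototype for a class that is
# already present simply covers (replaces) the old entry, matching the
# overwrite rule of steps 1-4/1-5.
prototype_queue = {}

def enqueue(queue, class_id, prototype):
    queue[class_id] = prototype  # same-category prototype is overwritten

enqueue(prototype_queue, 1, [0.1, 0.2])
enqueue(prototype_queue, "bg", [0.0, 0.0])
enqueue(prototype_queue, 1, [0.3, 0.4])  # covers the earlier class-1 entry
```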
step 1-6: calculate the cosine distances between the foreground and background prototypes of the different categories in the prototype queue and each pixel position in the feature map F to obtain a preliminary prediction probability map P; concatenate P with F and apply a convolution to obtain a new prediction probability map P_final, calculated as follows:
P_final = Conv(Concat(F, P)) (3)
the prediction probability map P_final is the preliminary predicted segmentation result;
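The cosine-distance scoring of step 1-6 can be sketched as follows. Stacking the queue's prototypes into a matrix and normalizing the scores with a softmax over the class axis are our assumptions — the patent only states that cosine distances yield the probability map P:

```python
import numpy as np

def cosine_similarity_map(F, prototypes):
    """Cosine similarity between every pixel feature of F (d, h, w) and
    every prototype row of `prototypes` (k, d); a softmax over the k axis
    turns the scores into a probability map."""
    Fn = F / (np.linalg.norm(F, axis=0, keepdims=True) + 1e-8)
    Pn = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-8)
    sim = np.einsum('kd,dhw->khw', Pn, Fn)           # (k, h, w) cosine scores
    e = np.exp(sim - sim.max(axis=0, keepdims=True))  # stable softmax
    return e / e.sum(axis=0, keepdims=True)

# Example: two prototypes scored against a 3-channel 2x2 feature map.
F = np.zeros((3, 2, 2))
F[:, 0, 0] = [1.0, 0.0, 0.0]   # pixel aligned with prototype 0
F[:, 0, 1] = [0.0, 1.0, 0.0]   # pixel aligned with prototype 1
prototypes = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])
P = cosine_similarity_map(F, prototypes)
```

Because the prediction is a nearest-prototype comparison rather than a learned classifier head, adding or replacing queue entries changes the output without retraining parameters.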
step 2: second-stage segmentation constraint;
step 2-1: apply the argmax function to the prediction probability map P_final to obtain a segmentation-result mask label, and binarize it by uniformly relabeling all non-foreground categories as the background category, yielding a mask label that contains only foreground and background categories;
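Step 2-1 (argmax plus binarization) can be sketched as follows; the channel-to-category mapping and the background id 0 are illustrative assumptions:

```python
import numpy as np

def binarize_mask(P_final, foreground_classes, bg_id=0):
    """argmax over the class axis gives a hard mask label; every class
    outside the current foreground set is relabelled as background."""
    labels = P_final.argmax(axis=0)                  # (h, w) hard mask label
    keep = np.isin(labels, list(foreground_classes))
    return np.where(keep, labels, bg_id)

# Example: channel 0 = background, channels 1 and 2 = candidate foreground
# classes, but only class 1 is kept as foreground here.
P_final = np.array([[[0.8, 0.1], [0.2, 0.1]],
                    [[0.1, 0.7], [0.3, 0.2]],
                    [[0.1, 0.2], [0.5, 0.7]]])
mask_label = binarize_mask(P_final, foreground_classes={1})
```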
step 2-2: carrying out mask average pooling on the feature map F by using a mask label to generate a foreground prototype and a background prototype of the second stage;
step 2-3: storing the foreground prototype and the background prototype of the second stage into a prototype queue, and covering the foreground prototype or the background prototype in the prototype queue if the foreground prototype or the background prototype of the same class exists in the prototype queue;
step 2-4: calculate the cosine distances between the foreground and background prototypes of the different categories in the prototype queue obtained in step 2-3 and each pixel position in the feature map F to obtain the second-stage prediction probability map; the second-stage prediction probability map is the final segmentation result;
step 3: training according to the overall loss function to obtain a final segmentation model;
step 3-1: evaluating the loss;
Using the prediction probability map P_final and the label image M, the preliminary segmentation-result evaluation loss L_seg is calculated for the foreground category as follows:
wherein the predicted-probability term denotes, for each position of the input image, the probability of being predicted as foreground; c_fg is the foreground category label; N denotes the product of h and w;
Using the second-stage prediction probability map and the label image M, the second-stage segmentation-result evaluation loss L_t-s is calculated for the foreground category as follows:
wherein the predicted-probability term denotes, for each position of the input image, the probability of being predicted as foreground in the second-stage prediction result;
The evaluation loss is calculated as follows:
L_eval = L_seg + L_t-s (6)
step 3-2: multi-category loss;
The multi-class loss L_mult is calculated as follows:
wherein the pseudo label is obtained by applying the argmax function to the preliminary prediction probability map P; the multi-class prediction probability map is obtained from the feature map F through a convolution operation followed by up-sampling; the corresponding probability term denotes the probability that each position of the input image is predicted as class cl in the multi-class prediction result;
step 3-3: background hiding class loss functions;
A constraint loss is calculated for the background region of the input image: using the label image M and the prediction probability map P_final in a cross-entropy formula, the false-positive rate of the background region, i.e., the background entropy loss Entropy_bg, is computed; Entropy_bg describes the probability that the background region is not mispredicted as foreground, calculated as follows:
To prevent background regions from being predicted as foreground, the background entropy value is increased, reducing the probability that hidden classes in the background region are mispredicted; the background entropy loss Entropy_bg enters the loss constraint L_blr as follows:
wherein lambda is a background optimization weight parameter;
step 3-4: overall loss function:
Loss = L_eval + L_blr + α × L_mult (10)
wherein alpha is a multi-class constraint weight parameter, and the value range is between 0 and 1.
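The overall objective of step 3-4 can be sketched as a plain weighted sum combining eq. (6) and eq. (10); the individual loss values passed in below are placeholders:

```python
# Overall loss (eq. 10): Loss = L_eval + L_blr + alpha * L_mult,
# where L_eval = L_seg + L_t-s (eq. 6). alpha in (0, 1) weights the
# multi-class constraint; lambda enters separately through L_blr (eq. 9).
def overall_loss(L_seg, L_ts, L_blr, L_mult, alpha=0.5):
    L_eval = L_seg + L_ts  # evaluation loss, eq. (6)
    return L_eval + L_blr + alpha * L_mult
```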
Preferably, in the step 1-1, the training image and the corresponding label image pair are uniformly cut to a fixed size of 512×512.
Preferably, the lambda value ranges between 1 and 2.
The beneficial effects of the invention are as follows:
1. Small-sample foreground/background segmentation is extended to full-scene multi-category semantic segmentation. The prototype queue proposed by the invention can be used to update and store prototypes of different categories and to guide multi-category segmentation. Unlike prior methods that only apply to simple scenes, the method can parse multi-category scenes.
2. The multi-category segmentation effect is better, and multi-category segmentation can be realized from single-category label inputs. The multi-category guidance branch designed by the invention adopts the preliminary multi-category segmentation result as a pseudo label, in place of single-category labels, to guide the model to learn multi-category features, thereby achieving a better multi-category segmentation effect.
3. Segmentation robustness is stronger in the absence of sample labeling. The invention is based on small sample learning and metric learning, extracts image features and maps to feature metric space. The pixel-level multi-category segmentation is completed in a measurement mode, dependence on model parameters is reduced, generalization is improved, better segmentation effect is achieved by using fewer labeling samples, and robustness is higher in environments where sample labeling is lacking.
4. The segmentation results achieve higher accuracy and mean intersection-over-union. The background hidden-class optimization module and the two-stage segmentation module proposed by the invention further refine the segmentation result and help the model framework parse the scene better.
5. The technology has more practical and industrial value. The invention promotes the small sample segmentation to more practical multi-category semantic segmentation, can meet the industrial requirements of city planning, accurate agriculture, automatic driving and the like, only needs fewer labeling samples, reduces the labeling cost, and is more suitable for practical application scenes.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a graph of semantic segmentation results generated by the method of the present invention and the comparison method.
Detailed Description
The invention will be further described with reference to the drawings and examples.
The invention discloses a full-scene semantic segmentation framework based on prototype queue learning under few labeled samples, which mainly addresses the problems of multi-category semantic segmentation and potential background classes in small-sample semantic segmentation. Specifically, the invention aims to solve the following problems:
1. The existing small-sample semantic segmentation technology only separates foreground from background without parsing the background of complex scenes; the invention realizes more practical multi-category small-sample semantic segmentation.
2. The prior art lacks the full utilization of potential new class information contained in the background class in the training samples.
3. The prior art adopts the mask average pooling method for extracting semantic category feature prototypes, and local detail information is easy to lose.
A full-scene semantic segmentation method based on prototype queue learning under few labeled samples comprises the following steps:
step 1: prototype queue segmentation;
step 1-1: uniformly cutting the training image and the corresponding label image pair into fixed sizes; establishing an empty prototype queue;
step 1-2: taking a training image as input data, and generating a feature map F through a feature extractor;
step 1-3: perform masked average pooling on the feature map F using the label image M to generate a foreground prototype p_c and a background prototype p_bg:
p_c = Σ_{x,y} F(x,y)·1[M(x,y)=c] / Σ_{x,y} 1[M(x,y)=c] (1)
p_bg = Σ_{x,y} F(x,y)·1[M(x,y)∉C] / Σ_{x,y} 1[M(x,y)∉C] (2)
wherein (x,y) denotes pixel coordinates, C is the foreground class set, c is a foreground category in the image, and h and w are the height and width of the input image, over which the sums range;
step 1-4: store the foreground prototype p_c and the background prototype p_bg into the prototype queue; the prototype queue holds multiple foreground categories but only one background category;
step 1-5: repeat steps 1-2 to 1-4 to traverse all training images and their corresponding label images; when a foreground or background prototype is stored into the prototype queue, if a prototype of the same category already exists in the queue, the newly generated prototype covers (replaces) it;
step 1-6: calculate the cosine distances between the foreground and background prototypes of the different categories in the prototype queue and each pixel position in the feature map F to obtain a preliminary prediction probability map P; concatenate P with F and apply a convolution to obtain a new prediction probability map P_final, calculated as follows:
P_final = Conv(Concat(F, P)) (3)
the prediction probability map P_final is the preliminary predicted segmentation result;
step 2: second-stage segmentation constraint;
step 2-1: apply the argmax function to the prediction probability map P_final to obtain a segmentation-result mask label, and binarize it by uniformly relabeling all non-foreground categories as the background category, yielding a mask label that contains only foreground and background categories;
step 2-2: carrying out mask average pooling on the feature map F by using a mask label to generate a foreground prototype and a background prototype of the second stage;
step 2-3: storing the foreground prototype and the background prototype of the second stage into a prototype queue, and covering the foreground prototype or the background prototype in the prototype queue if the foreground prototype or the background prototype of the same class exists in the prototype queue;
step 2-4: calculate the cosine distances between the foreground and background prototypes of the different categories in the prototype queue obtained in step 2-3 and each pixel position in the feature map F to obtain the second-stage prediction probability map; the second-stage prediction probability map is the segmentation result of the second stage;
step 3: training according to the overall loss function to obtain a final segmentation model;
step 3-1: evaluating the loss;
Using the prediction probability map P_final and the label image M, the preliminary segmentation-result evaluation loss L_seg is calculated for the foreground category as follows:
wherein the predicted-probability term denotes, for each position of the input image, the probability of being predicted as foreground, and c_fg is the foreground category label;
Using the second-stage prediction probability map and the label image M, the second-stage segmentation-result evaluation loss L_t-s is calculated for the foreground category as follows:
The evaluation loss is calculated as follows:
L_eval = L_seg + L_t-s (6)
step 3-2: multi-category loss;
The multi-class loss L_mult is calculated as follows:
wherein the pseudo label is obtained by applying the argmax function to the preliminary prediction probability map P; the multi-class prediction probability map is obtained from the feature map F through a convolution operation followed by up-sampling;
step 3-3: background hiding class loss functions;
A constraint loss is calculated for the background region of the input image: using the label image M and the prediction probability map P_final in a cross-entropy formula, the false-positive rate of the background region, i.e., the background entropy loss Entropy_bg, is computed; Entropy_bg describes the probability that the background region is not mispredicted as foreground, calculated as follows:
To prevent background regions from being predicted as foreground, the background entropy value is increased, reducing the probability that hidden classes in the background region are mispredicted; the background entropy loss Entropy_bg enters the loss constraint L_blr as follows:
wherein lambda is a background optimization weight parameter;
step 3-4: overall loss function:
Loss = L_eval + L_blr + α × L_mult (10)
wherein alpha is a multi-class constraint weight parameter, and the value range is between 0 and 1.
Specific examples:
1. simulation conditions
The simulation was performed using PyTorch on a Linux operating system with an Intel(R) Xeon(R) Silver 4110 CPU @ 2.10 GHz and 40 GB of memory. The data used in the simulation are public datasets.
2. Simulation content
The data used in the simulation come from the UDD and Vaihingen datasets. The UDD dataset contains 141 RGB pictures taken by a drone, covering six categories, cropped into 2439 image blocks of 720 × 720 pixels. The Vaihingen dataset is an aerial photography dataset published by ISPRS, consisting of 33 RGB pictures covering six categories, cropped into 426 image blocks of 512 × 512 pixels. For each category, five pictures and their corresponding class labels are selected as the small-sample training set, and the remaining pictures are used for testing. To ensure fairness, the training samples are randomly selected five times, and the reported test metric is the average over the five runs.
To demonstrate the effectiveness of the algorithm, PANet, HRNet, and HRNet+ were selected for comparison on both datasets. PANet, from the literature "Kaixin Wang, Jun Hao Liew, Yingtian Zou, Daquan Zhou, and Jiashi Feng. PANet: Few-shot image semantic segmentation with prototype alignment. In IEEE International Conference on Computer Vision, 2019, pp. 9197-9206," is a classical small-sample semantic segmentation algorithm. HRNet, proposed in the literature "Ke Sun, Bin Xiao, Dong Liu, and Jingdong Wang. Deep high-resolution representation learning for human pose estimation. In IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 5693-5703," is a classical semantic segmentation algorithm, used here to verify the effect of the fine-tuning approach on multi-category small-sample segmentation tasks. HRNet+ is a model obtained by taking HRNet as the feature extractor and adopting the metric-based small-sample segmentation experimental method; it is the base network of the invention. PQLNet is the method proposed in the invention; OA and mIoU are evaluation metrics for small-sample semantic segmentation quality. The comparison results are shown in Table 1:
table 1 comparative results
It can be seen from Table 1 that the present invention is superior to other algorithms in terms of OA and mIoU metrics over the UDD dataset and Vaihingen dataset.
FIG. 2 shows the semantic segmentation results generated by the method of the invention and the comparison algorithms. Compared with the comparison algorithms, the method of the invention produces more accurate multi-category segmentation edges, demonstrating that it effectively utilizes joint multi-category information and increases the separability of features of different categories. In addition, the invention also removes speckle and refines edges, demonstrating the effect of the background hidden-class distribution optimization and the two-stage segmentation module.
Claims (3)
1. A full-scene semantic segmentation method based on prototype queue learning under few labeled samples, characterized by comprising the following steps:
step 1: prototype queue segmentation;
step 1-1: uniformly cutting the training image and the corresponding label image pair into fixed sizes; establishing an empty prototype queue;
step 1-2: taking a training image as input data, and generating a feature map F through a feature extractor;
step 1-3: perform masked average pooling on the feature map F using the label image M to generate a foreground prototype p_c and a background prototype p_bg:
p_c = Σ_{x,y} F(x,y)·1[M(x,y)=c] / Σ_{x,y} 1[M(x,y)=c] (1)
p_bg = Σ_{x,y} F(x,y)·1[M(x,y)∉C] / Σ_{x,y} 1[M(x,y)∉C] (2)
wherein (x,y) denotes pixel coordinates, and 1[·] denotes the indicator function, i.e., its value is 1 when the expression in brackets holds and 0 otherwise; C is the foreground class set, c is a foreground category in the image, and h and w are the height and width of the input image, over which the sums range;
step 1-4: store the foreground prototype p_c and the background prototype p_bg into the prototype queue; the prototype queue holds multiple foreground categories but only one background category;
step 1-5: repeat steps 1-2 to 1-4 to traverse all training images and their corresponding label images; when a foreground or background prototype is stored into the prototype queue, if a prototype of the same category already exists in the queue, the newly generated prototype covers (replaces) it;
step 1-6: calculate the cosine distances between the foreground and background prototypes of the different categories in the prototype queue and each pixel position in the feature map F to obtain a preliminary prediction probability map P; concatenate P with F and apply a convolution to obtain a new prediction probability map P_final, calculated as follows:
P_final = Conv(Concat(F, P)) (3)
the prediction probability map P_final is the preliminary predicted segmentation result;
step 2: second-stage segmentation constraint;
step 2-1: apply the argmax function to the prediction probability map P_final to obtain a segmentation-result mask label, and binarize it by uniformly relabeling all non-foreground categories as the background category, yielding a mask label that contains only foreground and background categories;
step 2-2: carrying out mask average pooling on the feature map F by using a mask label to generate a foreground prototype and a background prototype of the second stage;
step 2-3: storing the foreground prototype and the background prototype of the second stage into a prototype queue, and covering the foreground prototype or the background prototype in the prototype queue if the foreground prototype or the background prototype of the same class exists in the prototype queue;
step 2-4: calculate the cosine distances between the foreground and background prototypes of the different categories in the prototype queue obtained in step 2-3 and each pixel position in the feature map F to obtain the second-stage prediction probability map; the second-stage prediction probability map is the final segmentation result;
step 3: training according to the overall loss function to obtain a final segmentation model;
step 3-1: evaluation loss;
the preliminary segmentation evaluation loss for the foreground category is calculated from the prediction probability map P_final and the label image M as follows:
L_seg = -(1/N) Σ_(x,y) 1[M_(x,y) = c_fg] · log P_final^fg(x,y)   (4)
wherein P_final^fg(x,y) is the probability that position (x,y) of the input image is predicted as foreground, c_fg is the foreground category label, and N represents the product of h and w;
the second-stage segmentation evaluation loss for the foreground category is calculated from the second-stage prediction probability map P̃ and the label image M as follows:
L_t-s = -(1/N) Σ_(x,y) 1[M_(x,y) = c_fg] · log P̃^fg(x,y)   (5)
wherein P̃^fg(x,y) represents the probability that position (x,y) of the input image is predicted as foreground in the second-stage prediction result;
the evaluation loss is calculated as follows:
L_eval = L_seg + L_t-s   (6)
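Assuming Eqs. (4) and (5) take the form of a foreground cross-entropy averaged over all N pixels, a toy sketch of one such term is (function and variable names are illustrative):

```python
import numpy as np

# Illustrative sketch of a foreground cross-entropy term: the negative
# log foreground probability, summed over pixels labeled foreground and
# averaged over all N pixels (an assumed form of Eqs. (4)-(5)).
def foreground_ce(P_fg, M, c_fg=1):
    # P_fg: h x w predicted foreground probabilities; M: h x w label image
    N = M.size
    fg = (M == c_fg)
    return -np.sum(np.log(P_fg[fg] + 1e-8)) / N

P_fg = np.array([[0.9, 0.1], [0.8, 0.2]])
M = np.array([[1, 0], [1, 0]])
loss = foreground_ce(P_fg, M)
```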
step 3-2: multi-class loss;
the multi-class loss L_mult is calculated as follows:
L_mult = -(1/N) Σ_(x,y) Σ_cl 1[M̂_(x,y) = cl] · log P_mult^cl(x,y)   (7)
wherein the pseudo label M̂ is obtained by applying the argmax function to the preliminary prediction probability map P; the multi-class prediction probability map P_mult is obtained from the feature map F by convolution and upsampling; and P_mult^cl(x,y) represents the probability that position (x,y) of the input image is predicted as class cl in the multi-class prediction result;
step 3-3: background hiding class loss functions;
constraint loss is calculated for the background region of the input image, and the label image M and the predictive probability map P are utilized through a cross entropy formula final Calculating the false positive rate of the background area, namely the background Entropy loss Entropy bg Background Entropy loss Entropy bg Describing the probability that the background region is not mispredicted as foreground, the following is calculated:
to prevent background regions from being predicted as foreground, increasing the background Entropy value, reducing the probability that hidden classes of the background regions are mispredicted, losing the background Entropy Entropy bg The loss-in constraint is as follows:
wherein lambda is a background optimization weight parameter;
step 3-4: overall loss function:
Loss = L_eval + L_blr + α × L_mult   (10)
wherein α is a multi-class constraint weight parameter with a value range between 0 and 1.
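The combination in Eq. (10) can be sketched with placeholder scalar components (toy values only; the inputs stand in for the L_eval, L_blr, and L_mult terms defined above):

```python
# Illustrative sketch of the overall loss of Eq. (10); the inputs are
# placeholder scalars, not the actual loss computations.
def total_loss(L_eval, L_blr, L_mult, alpha=0.5):
    # alpha is the multi-class constraint weight, constrained to [0, 1].
    assert 0.0 <= alpha <= 1.0
    return L_eval + L_blr + alpha * L_mult

loss = total_loss(L_eval=0.4, L_blr=0.1, L_mult=0.2, alpha=0.5)
```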
2. The full-scene semantic segmentation method based on prototype queue learning under few labeling samples according to claim 1, wherein in step 1-1 the training image and its corresponding label image pair are uniformly cropped to a fixed size of 512×512.
3. The full-scene semantic segmentation method based on prototype queue learning under few labeling samples according to claim 1, wherein the value range of λ is between 1 and 2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210390663.3A CN114943834B (en) | 2022-04-14 | 2022-04-14 | Full-scene semantic segmentation method based on prototype queue learning under few labeling samples |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114943834A CN114943834A (en) | 2022-08-26 |
CN114943834B true CN114943834B (en) | 2024-02-23 |
Family
ID=82907661
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210390663.3A Active CN114943834B (en) | 2022-04-14 | 2022-04-14 | Full-scene semantic segmentation method based on prototype queue learning under few labeling samples
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114943834B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117422879B (en) * | 2023-12-14 | 2024-03-08 | 山东大学 | Prototype evolution small sample semantic segmentation method and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112150471A (en) * | 2020-09-23 | 2020-12-29 | 创新奇智(上海)科技有限公司 | Semantic segmentation method and device based on few samples, electronic equipment and storage medium |
RU2742701C1 (en) * | 2020-06-18 | 2021-02-09 | Самсунг Электроникс Ко., Лтд. | Method for interactive segmentation of object on image and electronic computing device for realizing said object |
CN114049384A (en) * | 2021-11-09 | 2022-02-15 | 北京字节跳动网络技术有限公司 | Method and device for generating video from image and electronic equipment |
Non-Patent Citations (1)
Title |
---|
Semantic segmentation combining contextual features with CNN multi-layer feature fusion; Luo Huilan, Zhang Yun; Journal of Image and Graphics; 2019-12-31 (No. 12); pp. 2200-2209 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||