CN114943834A - Full scene semantic segmentation method based on prototype queue learning under few-label samples - Google Patents


Info

Publication number
CN114943834A
CN114943834A (application CN202210390663.3A)
Authority
CN
China
Prior art keywords
prototype
foreground
background
queue
label
Prior art date
Legal status
Granted
Application number
CN202210390663.3A
Other languages
Chinese (zh)
Other versions
CN114943834B (en)
Inventor
袁媛
王子超
姜志宇
Current Assignee
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date
Filing date
Publication date
Application filed by Northwestern Polytechnical University
Priority to CN202210390663.3A
Publication of CN114943834A
Application granted
Publication of CN114943834B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/10: Segmentation; Edge detection
    • G06T 7/143: Segmentation; Edge detection involving probabilistic approaches, e.g. Markov random field [MRF] modelling
    • G06T 7/194: Segmentation; Edge detection involving foreground-background segmentation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a full scene semantic segmentation method based on prototype queue learning under few labeled samples. First, prototype queue segmentation is performed: the label image is used to mask-average-pool the feature map into a foreground prototype and a background prototype, the prototypes are stored in a prototype queue, and the cosine distances between the queued prototypes and the feature map are calculated to obtain a new prediction probability map. An argmax function is then applied to the prediction probability map to obtain a segmentation-result mask label; this mask label is used to mask-average-pool the feature map again, generating second-stage foreground and background prototypes that are stored in the prototype queue, and the cosine distances between these prototypes and the feature map are calculated to obtain the final segmentation result. The method reduces the dependence on model parameters, improves generalization, and achieves a better segmentation effect with fewer labeled samples.

Description

Full scene semantic segmentation method based on prototype queue learning under few-label samples
Technical Field
The invention belongs to the technical field of pattern recognition, and particularly relates to a full scene semantic segmentation method.
Background
Image semantic segmentation is the pixel-level classification of an image according to the semantic category to which each pixel in the scene belongs. Semantic segmentation methods based on deep learning usually require a large number of dense pixel-level labels, but annotating samples in a practical task is time-consuming and labor-intensive, and labels for a specific task are often difficult to obtain. Accordingly, full scene semantic segmentation under few labeled samples aims to divide all pixels in an image according to their semantic categories when only a few labeled samples are available. This technology plays a key role in practical high-complexity, highly dynamic applications such as urban planning, precision agriculture, forest inspection, and national defense.
With the development of deep learning, the semantic segmentation field has made considerable progress, and small sample semantic segmentation under few labeled samples has developed to a certain extent by combining the transfer ability of meta-learning with the few-sample adaptability of metric learning. However, current small sample semantic segmentation mainly focuses on separating foreground objects from the background and often neglects the requirement of multi-class semantic segmentation. How to fully use a small number of labeled samples to guide test samples through metric learning is an important problem in small sample semantic segmentation. Wang et al., in the document "Kaixin Wang, Jun Hao Liew, Yingtian Zou, Daquan Zhou, and Jiashi Feng. PANet: Few-shot image segmentation with prototype alignment. In IEEE International Conference on Computer Vision, 2019, pp. 9197-9206", reverse the prototype-guided segmentation process as an alignment regularization, thereby enhancing the propagation of key semantics. Wang et al., in the document "Haochen Wang, Xudong Zhang, Yutao Hu, Yandan Yang, Xianbin Cao, and Xiantong Zhen. Few-shot semantic segmentation with democratic attention networks. In European Conference on Computer Vision, 2020, pp. 730-746", establish pixel-to-pixel correlations in place of prototypes generated by mask pooling, deepening the guidance of sample labels over the segmentation of test samples.
Furthermore, exploiting potential new-class information in the background helps to alleviate the problem of feature confusion, that is, it further enhances the effective representation of different semantic categories. Yang et al., in the document "Lihe Yang, Wei Zhuo, Lei Qi, Yinghuan Shi, and Yang Gao. Mining latent classes for few-shot segmentation. In IEEE International Conference on Computer Vision, 2021, pp. 8721-8730", introduce an additional branch network to exploit potential new-class information and, on this basis, correct the foreground and background to achieve more stable prototype guidance. In addition, conventional small sample segmentation methods extract prototypes coarsely, so that detail information is lost during mask average pooling. Iteratively optimizing the prototype extraction process can reduce this loss and retain important, comprehensive semantic information; for example, C. Zhang et al., in the document "Chi Zhang, Guosheng Lin, Fayao Liu, Rui Yao, and Chunhua Shen. CANet: Class-agnostic segmentation networks with iterative refinement and attentive few-shot learning. In IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 5217-5226", design an iterative optimization module to refine the segmentation process. However, that method does not update the prototype directly, so the detail information lost when extracting the prototype is difficult to recover. Iterative optimization can further reduce the loss of detail information, but the optimization of the prototype extraction process remains insufficient.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a full scene semantic segmentation method based on prototype queue learning under few labeled samples. First, prototype queue segmentation is performed: the label image is used to mask-average-pool the feature map into a foreground prototype and a background prototype, the prototypes are stored in a prototype queue, and the cosine distances between the queued prototypes and the feature map are calculated to obtain a new prediction probability map. An argmax function is then applied to the prediction probability map to obtain a segmentation-result mask label; this mask label is used to mask-average-pool the feature map again, generating second-stage foreground and background prototypes that are stored in the prototype queue, and the cosine distances between these prototypes and the feature map are calculated to obtain the final segmentation result. The method reduces the dependence on model parameters, improves generalization, and achieves a better segmentation effect with fewer labeled samples.
The technical scheme adopted by the invention for solving the technical problem comprises the following steps:
step 1: prototype queue segmentation;
step 1-1: uniformly cutting the training image and the corresponding label image pair into a fixed size; establishing an empty prototype queue;
step 1-2: taking a training image as input data, and generating a feature map F through a feature extractor;
step 1-3: mask average pooling is performed on the feature map F using the label image M to generate a foreground prototype p_c and a background prototype p_bg:
p_c = Σ_(x,y) F(x,y)·1[M(x,y) = c] / Σ_(x,y) 1[M(x,y) = c]   (1)
p_bg = Σ_(x,y) F(x,y)·1[M(x,y) ∉ C] / Σ_(x,y) 1[M(x,y) ∉ C]   (2)
wherein (x,y) denotes a pixel coordinate and the sums run over all h × w positions of the input image; 1[·] denotes the indicator function, whose value is 1 when the expression in the brackets holds and 0 otherwise; C is the set of foreground categories, c is a foreground category in the image, and h and w are the height and width of the input image, respectively;
step 1-4: the foreground prototype p_c and the background prototype p_bg are stored in the prototype queue; the prototype queue holds multiple foreground categories but only one background category;
step 1-5: steps 1-2 to 1-4 are repeated to traverse all training images and their corresponding label images; when storing into the prototype queue, if a newly generated foreground or background prototype already has a prototype of the same category in the queue, the existing prototype of that category is overwritten;
step 1-6: the cosine distance between each of the different-category foreground and background prototypes in the prototype queue and each pixel position in the feature map F is calculated to obtain a preliminary prediction probability map P; P and F are concatenated and passed through a convolution to obtain a new prediction probability map P_final, calculated as:
P_final = Conv(Concat(F, P))   (3)
The prediction probability map P_final is the preliminary prediction segmentation result;
step 2: second stage segmentation constraints;
step 2-1: the argmax function is applied to the prediction probability map P_final to obtain a segmentation-result mask label, which is then binarized so that all non-foreground categories are uniformly labeled as the background category, yielding a mask label containing only the foreground categories and the background category;
step 2-2: mask average pooling is performed on the feature map F using this mask label to generate the second-stage foreground and background prototypes;
step 2-3: the second-stage foreground and background prototypes are stored in the prototype queue; if a prototype of the same category already exists in the queue, it is overwritten;
step 2-4: the cosine distance between each of the different-category foreground and background prototypes in the prototype queue obtained in step 2-3 and each pixel position in the feature map F is calculated to obtain the second-stage prediction probability map P_t-s, which is the final segmentation result;
step 3: training according to the overall loss function to obtain the final segmentation model;
step 3-1: evaluation loss;
Using the prediction probability map P_final and the label image M, the evaluation loss of the preliminary segmentation result on the foreground category is calculated as:
L_seg = -(1/N) Σ_(x,y) 1[M(x,y) = c_fg]·log P_final^(c_fg)(x,y)   (4)
wherein P_final^(c_fg)(x,y) is the probability with which each position of the input image is predicted as the foreground, c_fg is the foreground category label, and N is the product of h and w;
Using the second-stage prediction probability map P_t-s and the label image M, the evaluation loss of the second-stage segmentation result on the foreground category is calculated as:
L_t-s = -(1/N) Σ_(x,y) 1[M(x,y) = c_fg]·log P_t-s^(c_fg)(x,y)   (5)
wherein P_t-s^(c_fg)(x,y) represents the probability with which each position of the input image is predicted as the foreground in the second-stage prediction result;
The evaluation loss is calculated as:
L_eval = L_seg + L_t-s   (6)
step 3-2: multi-class loss;
The multi-class loss L_mult is calculated as:
L_mult = -(1/N) Σ_(x,y) Σ_cl 1[Y_pl(x,y) = cl]·log P_mult^(cl)(x,y)   (7)
wherein the pseudo label Y_pl is obtained by applying the argmax function to the preliminary prediction probability map P; the multi-class prediction probability map P_mult is obtained from the feature map F by a convolution operation followed by up-sampling; and P_mult^(cl)(x,y) represents the probability with which each position of the input image is predicted as class cl in the multi-class prediction result;
step 3-3: background hidden-class loss function;
A constraint loss is calculated for the background region of the input image. Using the label image M and the prediction probability map P_final, the false positive rate of the background region, namely the background entropy loss Entropy_bg, is calculated through a cross-entropy formula (formula (8)); the background entropy loss Entropy_bg describes the probability that the background region is not mispredicted as foreground.
In order to prevent the background region from being predicted as foreground, the background entropy value is increased and the probability that hidden classes of the background region are mispredicted is reduced; the background entropy loss Entropy_bg is added to the loss as the background hidden-class constraint L_blr (formula (9)),
wherein λ is a background optimization weight parameter;
step 3-4: overall loss function:
Loss = L_eval + L_blr + α × L_mult   (10)
wherein alpha is a multi-class constraint weight parameter, and the value range is between 0 and 1.
Preferably, the training image and the corresponding label image pair are uniformly cropped to be a fixed size of 512 × 512 in step 1-1.
Preferably, said λ ranges between 1 and 2.
The invention has the following beneficial effects:
1. The foreground-background segmentation of small sample methods is expanded to full scene multi-class semantic segmentation. The prototype queue provided by the invention can update and store prototypes of different categories and guide multi-class segmentation. Unlike traditional methods that are only suitable for simple scenes, the method can parse multi-class scenes.
2. The multi-class segmentation effect is better, and the multi-class segmentation can be realized by inputting single-class labels. The multi-class guiding branch designed by the invention adopts the preliminary multi-class segmentation result as a pseudo label to replace a single-class label guiding model to learn multi-class characteristics, thereby realizing better multi-class segmentation effect.
3. Segmentation is more robust when sample annotations are scarce. The method is based on small sample learning and metric learning: image features are extracted and mapped to a feature metric space, and pixel-level multi-class segmentation is completed by metric comparison. This reduces the dependence on model parameters, improves generalization, achieves a better segmentation effect with fewer labeled samples, and is more robust in environments lacking sample annotations.
4. The accuracy and the mean intersection over union of the segmentation results are higher. The background hidden-class optimization module and the two-stage segmentation module further optimize the segmentation result and help the model parse the scene better.
5. The technology has more practical and industrial values. The method expands the small sample segmentation to more practical multi-class semantic segmentation, can meet the industrial requirements of urban planning, precision agriculture, automatic driving and the like, only needs fewer labeled samples, reduces the labeling cost, and is more suitable for practical application scenes.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a semantic segmentation result comparison graph generated by the method and the comparison method of the present invention.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
The invention discloses a full scene semantic segmentation framework based on prototype queue learning under few labeled samples, which mainly addresses the problems of multi-class semantic segmentation and of potential classes hidden in the background in small sample semantic segmentation. In particular, the invention aims at the following aspects:
1. Existing small sample semantic segmentation techniques only separate foreground from background and do not parse the background in complex scenes; the invention realizes more practical multi-class small sample semantic segmentation.
2. The prior art does not fully utilize the potential new-class information contained in the background class of the training samples.
3. The mask average pooling adopted in the prior art to extract semantic category feature prototypes easily loses local detail information.
A full scene semantic segmentation method based on prototype queue learning under few labeled samples comprises the following steps:
step 1: prototype queue segmentation;
step 1-1: uniformly cutting the training image and the corresponding label image pair into a fixed size; establishing an empty prototype queue;
step 1-2: taking a training image as input data, and generating a feature map F through a feature extractor;
step 1-3: mask average pooling is performed on the feature map F using the label image M to generate a foreground prototype p_c and a background prototype p_bg:
p_c = Σ_(x,y) F(x,y)·1[M(x,y) = c] / Σ_(x,y) 1[M(x,y) = c]   (1)
p_bg = Σ_(x,y) F(x,y)·1[M(x,y) ∉ C] / Σ_(x,y) 1[M(x,y) ∉ C]   (2)
wherein (x,y) denotes a pixel coordinate, 1[·] is the indicator function, C is the set of foreground categories, c is a foreground category in the image, and h and w are the height and width of the input image, respectively;
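As an illustrative, non-limiting sketch of step 1-3, the mask average pooling of formulas (1) and (2) may be implemented as follows in PyTorch (the tensor shapes, function and variable names are assumptions made for illustration only):

    import torch

    def masked_average_pooling(feature_map, label_map, foreground_classes):
        """Mask average pooling of formulas (1)-(2): one prototype per foreground
        category present in the label image plus one background prototype."""
        # feature_map: float tensor of shape (D, h, w); label_map: integer tensor of shape (h, w)
        prototypes = {}
        background_mask = torch.ones_like(label_map, dtype=torch.bool)
        for c in foreground_classes:
            fg_mask = (label_map == c)                    # indicator 1[M(x, y) = c]
            background_mask &= ~fg_mask
            if fg_mask.any():
                prototypes[c] = (feature_map * fg_mask).sum(dim=(1, 2)) / fg_mask.sum()
        if background_mask.any():                         # indicator 1[M(x, y) not in C]
            prototypes["bg"] = (feature_map * background_mask).sum(dim=(1, 2)) / background_mask.sum()
        return prototypes                                 # {category id or "bg": (D,) vector}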
step 1-4: the foreground prototype p_c and the background prototype p_bg are stored in the prototype queue; the prototype queue holds multiple foreground categories but only one background category;
step 1-5: steps 1-2 to 1-4 are repeated to traverse all training images and their corresponding label images; when storing into the prototype queue, if a newly generated foreground or background prototype already has a prototype of the same category in the queue, the existing prototype of that category is overwritten;
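A minimal sketch of the prototype queue of steps 1-4 and 1-5, assuming the queue is realized as a mapping from category identifier to prototype vector (the class and method names are illustrative):

    class PrototypeQueue:
        """Prototype queue of steps 1-4 and 1-5: one slot per foreground category plus
        a single background slot; a newly generated prototype overwrites ("covers")
        the existing entry of the same category."""

        def __init__(self):
            self.slots = {}                               # category id (or "bg") -> (D,) prototype

        def update(self, prototypes):
            for category, prototype in prototypes.items():
                self.slots[category] = prototype          # overwrite same-category entry

        def items(self):
            return self.slots.items()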
step 1-6: the cosine distance between each of the different-category foreground and background prototypes in the prototype queue and each pixel position in the feature map F is calculated to obtain a preliminary prediction probability map P; P and F are concatenated and passed through a convolution to obtain a new prediction probability map P_final, calculated as:
P_final = Conv(Concat(F, P))   (3)
The prediction probability map P_final is the preliminary prediction segmentation result;
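The cosine-distance scoring and the convolutional fusion of formula (3) in step 1-6 may be sketched as follows; a softmax over the per-prototype cosine similarities is assumed here to form the probability map P, and fusion_conv is assumed to be a convolution (for example torch.nn.Conv2d) whose input channels equal the feature dimension D plus the number of queue entries K:

    import torch
    import torch.nn.functional as F_nn                    # aliased to avoid clashing with the feature map F

    def cosine_score_map(feature_map, queue):
        """Cosine similarity between every prototype in the queue and every pixel of F."""
        scores = [F_nn.cosine_similarity(feature_map, p.view(-1, 1, 1), dim=0)   # (h, w)
                  for _, p in queue.items()]
        # a softmax over the per-prototype similarities is assumed here to turn the
        # cosine scores into a probability map P of shape (K, h, w)
        return torch.softmax(torch.stack(scores, dim=0), dim=0)

    def preliminary_prediction(feature_map, queue, fusion_conv):
        """Step 1-6 and formula (3): concatenate P with F and fuse them with a convolution."""
        P = cosine_score_map(feature_map, queue)
        fused = torch.cat([feature_map, P], dim=0).unsqueeze(0)   # (1, D + K, h, w)
        P_final = fusion_conv(fused).squeeze(0)                    # (K, h, w)
        return P, P_final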
step 2: two-stage segmentation constraints;
step 2-1: the argmax function is applied to the prediction probability map P_final to obtain a segmentation-result mask label, which is then binarized so that all non-foreground categories are uniformly labeled as the background category, yielding a mask label containing only the foreground categories and the background category;
step 2-2: mask average pooling is performed on the feature map F using this mask label to generate the second-stage foreground and background prototypes;
step 2-3: the second-stage foreground and background prototypes are stored in the prototype queue; if a prototype of the same category already exists in the queue, it is overwritten;
step 2-4: the cosine distance between each of the different-category foreground and background prototypes in the prototype queue obtained in step 2-3 and each pixel position in the feature map F is calculated to obtain the second-stage prediction probability map P_t-s, which is the second-stage segmentation result;
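An illustrative sketch of the second-stage segmentation constraint of step 2, reusing masked_average_pooling and cosine_score_map from the sketches above; channel_to_category, which maps channels of P_final to the category identifiers stored in the queue, and the use of label 0 for the background are assumptions made for illustration:

    import torch

    def second_stage(feature_map, P_final, queue, channel_to_category, foreground_classes):
        """Step 2: argmax of P_final gives the mask label (step 2-1); non-foreground
        positions keep the background label 0; the predicted mask drives a second round
        of mask average pooling and queue update (steps 2-2, 2-3), and cosine scoring
        against the refreshed queue gives the second-stage map P_t-s (step 2-4)."""
        channel_index = P_final.argmax(dim=0)                      # (h, w) predicted channel
        mask_label = torch.zeros_like(channel_index)               # 0 is used for the background
        for channel, category in enumerate(channel_to_category):
            if category in foreground_classes:
                mask_label[channel_index == channel] = category    # keep foreground categories only
        queue.update(masked_average_pooling(feature_map, mask_label, foreground_classes))
        return cosine_score_map(feature_map, queue)                # second-stage prediction map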
step 3: training according to the overall loss function to obtain the final segmentation model;
step 3-1: evaluation loss;
Using the prediction probability map P_final and the label image M, the evaluation loss of the preliminary segmentation result on the foreground category is calculated as:
L_seg = -(1/N) Σ_(x,y) 1[M(x,y) = c_fg]·log P_final^(c_fg)(x,y)   (4)
wherein P_final^(c_fg)(x,y) is the probability with which each position of the input image is predicted as the foreground, c_fg is the foreground category label, and N is the product of h and w;
Using the second-stage prediction probability map P_t-s and the label image M, the evaluation loss of the second-stage segmentation result on the foreground category is calculated as:
L_t-s = -(1/N) Σ_(x,y) 1[M(x,y) = c_fg]·log P_t-s^(c_fg)(x,y)   (5)
The evaluation loss is calculated as:
L_eval = L_seg + L_t-s   (6)
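A sketch of the evaluation loss of formulas (4) to (6) under the reconstruction given above (a cross entropy restricted to the foreground category); prob_fg denotes the per-pixel foreground probability taken from the corresponding prediction probability map:

    import torch

    def foreground_cross_entropy(prob_fg, label_map, fg_class):
        """Cross entropy restricted to the foreground category, as in formulas (4) and (5);
        prob_fg holds the predicted foreground probability for every pixel."""
        fg_indicator = (label_map == fg_class).float()             # 1[M(x, y) = c_fg]
        n = float(label_map.numel())                               # N = h * w
        return -(fg_indicator * torch.log(prob_fg.clamp_min(1e-8))).sum() / n

    def evaluation_loss(p_final_fg, p_ts_fg, label_map, fg_class):
        """Formula (6): L_eval = L_seg + L_t-s."""
        return (foreground_cross_entropy(p_final_fg, label_map, fg_class) +
                foreground_cross_entropy(p_ts_fg, label_map, fg_class))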
step 3-2: multi-class loss;
The multi-class loss L_mult is calculated as:
L_mult = -(1/N) Σ_(x,y) Σ_cl 1[Y_pl(x,y) = cl]·log P_mult^(cl)(x,y)   (7)
wherein the pseudo label Y_pl is obtained by applying the argmax function to the preliminary prediction probability map P, and the multi-class prediction probability map P_mult is obtained directly from the feature map F by convolution and up-sampling;
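A sketch of the multi-class loss of formula (7), assuming the multi-class prediction map is passed in as un-normalized scores so that the standard cross entropy can be applied against the pseudo label:

    import torch.nn.functional as F_nn

    def multi_class_loss(multi_class_scores, preliminary_map):
        """Step 3-2 and formula (7): the pseudo label is the argmax of the preliminary
        probability map P, and the multi-class prediction (un-normalized scores obtained
        from F by convolution and up-sampling) is scored against it with a cross entropy."""
        pseudo_label = preliminary_map.argmax(dim=0)               # (h, w) pseudo label
        return F_nn.cross_entropy(multi_class_scores.unsqueeze(0), # (1, K, h, w) scores
                                  pseudo_label.unsqueeze(0))       # (1, h, w) targets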
step 3-3: background hidden-class loss function;
A constraint loss is calculated for the background region of the input image. Using the label image M and the prediction probability map P_final, the false positive rate of the background region, namely the background entropy loss Entropy_bg, is calculated through a cross-entropy formula (formula (8)); the background entropy loss Entropy_bg describes the probability that the background region is not mispredicted as foreground.
In order to prevent the background region from being predicted as foreground, the background entropy value is increased and the probability that hidden classes of the background region are mispredicted is reduced; the background entropy loss Entropy_bg is added to the loss as the background hidden-class constraint L_blr (formula (9)),
wherein λ is a background optimization weight parameter;
step 3-4: overall loss function:
Loss = L_eval + L_blr + α × L_mult   (10)
wherein alpha is a multi-class constraint weight parameter, and the value range is between 0 and 1.
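For completeness, the overall objective of formula (10) may be sketched as follows; the background optimization weight λ of formula (9) is assumed to be applied inside l_blr:

    def overall_loss(l_eval, l_blr, l_mult, alpha):
        """Formula (10): Loss = L_eval + L_blr + alpha * L_mult; alpha is the multi-class
        constraint weight (between 0 and 1), and the background optimization weight
        lambda of formula (9) is assumed to be applied inside l_blr."""
        assert 0.0 < alpha < 1.0
        return l_eval + l_blr + alpha * l_mult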
The specific embodiment is as follows:
1. simulation conditions
The simulation was carried out with PyTorch on a Linux operating system with an Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz and 40 GB of memory. The data used in the simulation are open datasets.
2. Emulated content
The data used in the simulation come from the UDD and Vaihingen datasets. The UDD dataset contains 141 RGB pictures taken by a drone, covering six categories, cut into 2439 image blocks of 720 × 720 pixels. The Vaihingen dataset is an aerial photography dataset published by ISPRS, with 33 RGB pictures in total, covering six categories, cut into 426 image blocks of 512 × 512 pixels. For each class, five pictures and their corresponding class labels are selected as the small-sample set for model training, and the remaining pictures are used for testing. To ensure fairness, the training samples are randomly selected five times, and the reported test indexes are the averages of the five groups of experiments.
In order to demonstrate the effectiveness of the algorithm, the invention compares against PANet, HRNet and HRNet+ on the two datasets. Among these, PANet is the classic small sample semantic segmentation algorithm of "Kaixin Wang, Jun Hao Liew, Yingtian Zou, Daquan Zhou, and Jiashi Feng. PANet: Few-shot image segmentation with prototype alignment. In IEEE International Conference on Computer Vision, 2019, pp. 9197-9206"; HRNet is the classic semantic segmentation algorithm proposed in "Ke Sun, Bin Xiao, Dong Liu, and Jingdong Wang. Deep high-resolution representation learning for human pose estimation. In IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 5693-5703", used here to verify the effect of a fine-tuning method on the multi-class small sample segmentation task; HRNet+ is the model obtained by taking HRNet as the feature extractor and improving it with a metric-based small sample segmentation scheme, and serves as the base network of the invention. PQLNet is the method proposed by the invention. OA and mIoU are the evaluation indexes of small sample semantic segmentation quality. The comparison results are shown in Table 1:
TABLE 1 comparative results
As can be seen from Table 1, the invention outperforms the other algorithms in OA and mIoU on both the UDD and Vaihingen datasets.
FIG. 2 shows the semantic segmentation results generated by the method of the invention and by the comparison algorithms. Compared with the comparison algorithms, the invention produces more accurate multi-class segmentation edges, demonstrating that it effectively utilizes multi-class joint information and increases the feature discrimination between different categories. In addition, the invention removes speckle and refines edges, demonstrating the effects of the background hidden-class distribution optimization and the two-stage segmentation module.

Claims (3)

1. A full scene semantic segmentation method based on prototype queue learning under few labeled samples is characterized by comprising the following steps:
step 1: prototype queue segmentation;
step 1-1: uniformly cutting the training image and the corresponding label image pair into a fixed size; establishing an empty prototype queue;
step 1-2: taking a training image as input data, and generating a feature map F through a feature extractor;
step 1-3: mask average pooling is performed on the feature map F using the label image M to generate a foreground prototype p_c and a background prototype p_bg:
p_c = Σ_(x,y) F(x,y)·1[M(x,y) = c] / Σ_(x,y) 1[M(x,y) = c]   (1)
p_bg = Σ_(x,y) F(x,y)·1[M(x,y) ∉ C] / Σ_(x,y) 1[M(x,y) ∉ C]   (2)
wherein (x,y) denotes a pixel coordinate and the sums run over all h × w positions of the input image; 1[·] denotes the indicator function, whose value is 1 when the expression in the brackets holds and 0 otherwise; C is the set of foreground categories, c is a foreground category in the image, and h and w are the height and width of the input image, respectively;
step 1-4: the foreground prototype p_c and the background prototype p_bg are stored in the prototype queue; the prototype queue holds multiple foreground categories but only one background category;
step 1-5: steps 1-2 to 1-4 are repeated to traverse all training images and their corresponding label images; when storing into the prototype queue, if a newly generated foreground or background prototype already has a prototype of the same category in the queue, the existing prototype of that category is overwritten;
step 1-6: the cosine distance between each of the different-category foreground and background prototypes in the prototype queue and each pixel position in the feature map F is calculated to obtain a preliminary prediction probability map P; P and F are concatenated and passed through a convolution to obtain a new prediction probability map P_final, calculated as:
P_final = Conv(Concat(F, P))   (3)
The prediction probability map P_final is the preliminary prediction segmentation result;
step 2: second stage segmentation constraints;
step 2-1: the argmax function is applied to the prediction probability map P_final to obtain a segmentation-result mask label, which is then binarized so that all non-foreground categories are uniformly labeled as the background category, yielding a mask label containing only the foreground categories and the background category;
step 2-2: mask average pooling is performed on the feature map F using this mask label to generate the second-stage foreground and background prototypes;
step 2-3: the second-stage foreground and background prototypes are stored in the prototype queue; if a prototype of the same category already exists in the queue, it is overwritten;
step 2-4: the cosine distance between each of the different-category foreground and background prototypes in the prototype queue obtained in step 2-3 and each pixel position in the feature map F is calculated to obtain the second-stage prediction probability map P_t-s, which is the final segmentation result;
step 3: training according to the overall loss function to obtain the final segmentation model;
step 3-1: evaluation loss;
Using the prediction probability map P_final and the label image M, the evaluation loss of the preliminary segmentation result on the foreground category is calculated as:
L_seg = -(1/N) Σ_(x,y) 1[M(x,y) = c_fg]·log P_final^(c_fg)(x,y)   (4)
wherein P_final^(c_fg)(x,y) is the probability with which each position of the input image is predicted as the foreground, c_fg is the foreground category label, and N is the product of h and w;
Using the second-stage prediction probability map P_t-s and the label image M, the evaluation loss of the second-stage segmentation result on the foreground category is calculated as:
L_t-s = -(1/N) Σ_(x,y) 1[M(x,y) = c_fg]·log P_t-s^(c_fg)(x,y)   (5)
wherein P_t-s^(c_fg)(x,y) represents the probability with which each position of the input image is predicted as the foreground in the second-stage prediction result;
The evaluation loss is calculated as:
L_eval = L_seg + L_t-s   (6)
step 3-2: multi-class loss;
The multi-class loss L_mult is calculated as:
L_mult = -(1/N) Σ_(x,y) Σ_cl 1[Y_pl(x,y) = cl]·log P_mult^(cl)(x,y)   (7)
wherein the pseudo label Y_pl is obtained by applying the argmax function to the preliminary prediction probability map P; the multi-class prediction probability map P_mult is obtained from the feature map F by a convolution operation followed by up-sampling; and P_mult^(cl)(x,y) represents the probability with which each position of the input image is predicted as class cl in the multi-class prediction result;
step 3-3: background hidden-class loss function;
A constraint loss is calculated for the background region of the input image. Using the label image M and the prediction probability map P_final, the false positive rate of the background region, namely the background entropy loss Entropy_bg, is calculated through a cross-entropy formula (formula (8)); the background entropy loss Entropy_bg describes the probability that the background region is not mispredicted as foreground.
In order to prevent the background region from being predicted as foreground, the background entropy value is increased and the probability that hidden classes of the background region are mispredicted is reduced; the background entropy loss Entropy_bg is added to the loss as the background hidden-class constraint L_blr (formula (9)),
wherein λ is a background optimization weight parameter;
step 3-4: the overall loss function:
Loss = L_eval + L_blr + α × L_mult   (10)
wherein alpha is a multi-class constraint weight parameter, and the value range is between 0 and 1.
2. The method for full scene semantic segmentation based on prototype-queue learning under few-label samples according to claim 1, wherein the training image and the corresponding label image pair are uniformly cropped to have a fixed size of 512 x 512 in step 1-1.
3. The method for full scene semantic segmentation based on prototype queue learning under few labeled samples according to claim 1, wherein the λ is in a range from 1 to 2.
CN202210390663.3A 2022-04-14 2022-04-14 Full scene semantic segmentation method based on prototype queue learning under few labeled samples Active CN114943834B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210390663.3A CN114943834B (en) 2022-04-14 2022-04-14 Full scene semantic segmentation method based on prototype queue learning under few labeled samples

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210390663.3A CN114943834B (en) 2022-04-14 2022-04-14 Full scene semantic segmentation method based on prototype queue learning under few labeled samples

Publications (2)

Publication Number Publication Date
CN114943834A true CN114943834A (en) 2022-08-26
CN114943834B CN114943834B (en) 2024-02-23

Family

ID=82907661

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210390663.3A Active CN114943834B (en) 2022-04-14 2022-04-14 Full scene semantic segmentation method based on prototype queue learning under few labeled samples

Country Status (1)

Country Link
CN (1) CN114943834B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117422879A (en) * 2023-12-14 2024-01-19 山东大学 Prototype evolution small sample semantic segmentation method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112150471A (en) * 2020-09-23 2020-12-29 创新奇智(上海)科技有限公司 Semantic segmentation method and device based on few samples, electronic equipment and storage medium
RU2742701C1 (en) * 2020-06-18 2021-02-09 Самсунг Электроникс Ко., Лтд. Method for interactive segmentation of object on image and electronic computing device for realizing said object
CN114049384A (en) * 2021-11-09 2022-02-15 北京字节跳动网络技术有限公司 Method and device for generating video from image and electronic equipment

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2742701C1 (en) * 2020-06-18 2021-02-09 Самсунг Электроникс Ко., Лтд. Method for interactive segmentation of object on image and electronic computing device for realizing said object
CN112150471A (en) * 2020-09-23 2020-12-29 创新奇智(上海)科技有限公司 Semantic segmentation method and device based on few samples, electronic equipment and storage medium
CN114049384A (en) * 2021-11-09 2022-02-15 北京字节跳动网络技术有限公司 Method and device for generating video from image and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
罗会兰 (Luo Huilan); 张云 (Zhang Yun): "结合上下文特征与CNN多层特征融合的语义分割" [Semantic segmentation combining contextual features with fused multi-layer CNN features], 中国图象图形学报 (Journal of Image and Graphics), no. 12, 31 December 2019 (2019-12-31), pages 2200-2209 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117422879A (en) * 2023-12-14 2024-01-19 山东大学 Prototype evolution small sample semantic segmentation method and system
CN117422879B (en) * 2023-12-14 2024-03-08 山东大学 Prototype evolution small sample semantic segmentation method and system

Also Published As

Publication number Publication date
CN114943834B (en) 2024-02-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant