CN114943834A - Full scene semantic segmentation method based on prototype queue learning under few-label samples - Google Patents
Full scene semantic segmentation method based on prototype queue learning under few-label samples
- Publication number
- CN114943834A CN114943834A CN202210390663.3A CN202210390663A CN114943834A CN 114943834 A CN114943834 A CN 114943834A CN 202210390663 A CN202210390663 A CN 202210390663A CN 114943834 A CN114943834 A CN 114943834A
- Authority
- CN
- China
- Prior art keywords
- prototype
- foreground
- background
- queue
- label
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06V 10/26 — Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
- G06T 7/143 — Segmentation; edge detection involving probabilistic approaches, e.g. Markov random field [MRF] modelling
- G06T 7/194 — Segmentation; edge detection involving foreground-background segmentation
Abstract
The invention discloses a full-scene semantic segmentation method based on prototype queue learning under few labeled samples. First, prototype queue segmentation is performed: mask average pooling is applied to the feature map using the label image to generate a foreground prototype and a background prototype, the two prototypes are stored in a prototype queue, and the cosine distance to the feature map is computed to obtain a new prediction probability map. The argmax function is then applied to the prediction probability map to obtain a segmentation-result mask label; mask average pooling is applied to the feature map using this mask label to generate second-stage foreground and background prototypes, which are stored in the prototype queue, and the cosine distance between the prototypes and the feature map is computed to obtain the final segmentation result. The method reduces dependence on model parameters, improves generalization, and achieves a better segmentation effect with fewer labeled samples.
Description
Technical Field
The invention belongs to the technical field of pattern recognition, and particularly relates to a full scene semantic segmentation method.
Background
Image semantic segmentation is the pixel-level classification of an image according to the semantic class to which each pixel in the scene belongs. Semantic segmentation methods based on deep learning usually require a large number of dense pixel-level labels, but labeling samples in practical tasks is time-consuming and labor-intensive, and labels for specific tasks are difficult to obtain. Full-scene semantic segmentation under few labeled samples therefore aims to classify all pixels of an image by semantic category given only a few labeled samples. The technique plays a key role in practical, highly complex and dynamic scenes such as urban planning, precision agriculture, forest inspection, and national defense.
With the development of deep learning, the semantic segmentation field has made much progress, and small-sample semantic segmentation under few labeled samples has developed to some extent by combining the transfer effect of meta-learning with the few-sample adaptability of metric learning. However, current small-sample semantic segmentation mainly focuses on separating foreground objects from the background and often neglects the need for multi-class semantic segmentation. How to fully exploit a small number of labeled samples to guide test samples through metric learning is an important problem in small-sample semantic segmentation. Wang et al. in "Kaixin Wang, Jun Hao Liew, Yingtian Zou, Daquan Zhou, and Jiashi Feng. PANet: Few-shot image segmentation with prototype alignment. In IEEE International Conference on Computer Vision, 2019, pp. 9197-9206" reverse the prototype-guided segmentation process as an alignment regularization, thereby strengthening the propagation of key semantics. Wang et al. in "Haochen Wang, Xudong Zhang, Yutao Hu, Yandan Yang, Xianbin Cao, and Xiantong Zhen. Few-shot semantic segmentation with democratic attention networks. In European Conference on Computer Vision, 2020, pp. 730-746" establish pixel-to-pixel correlations, replacing prototypes generated by mask pooling, to deepen the guidance of test samples by sample labels.
Furthermore, using the latent new-class information contained in the background helps alleviate feature confusion, i.e. it further enhances the effective representation of different semantic classes. Yang et al. in "Lihe Yang, Wei Zhuo, Lei Qi, Yinghuan Shi, and Yang Gao. Mining latent classes for few-shot segmentation. In IEEE International Conference on Computer Vision, 2021, pp. 8721-8730" introduce an additional branch network to exploit latent new-class information, and achieve more stable prototype guidance by correcting the foreground and background on this basis. In addition, conventional small-sample segmentation methods extract prototypes coarsely, so detail information is lost during mask average pooling. Iteratively optimizing the prototype extraction process can reduce this loss and retain important, comprehensive semantic information; for example, Zhang et al. in "Chi Zhang, Guosheng Lin, Fayao Liu, Rui Yao, and Chunhua Shen. CANet: Class-agnostic segmentation networks with iterative refinement and attentive few-shot learning. In IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 5217-5226" design an iterative optimization module to optimize the segmentation process, but this method does not directly update the prototypes, so the detail information lost when extracting prototypes is hard to recover. Iterative optimization can further reduce the loss of detail information, but the optimization of the prototype extraction process itself remains insufficient.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a full-scene semantic segmentation method based on prototype queue learning under few labeled samples. First, prototype queue segmentation is performed: mask average pooling is applied to the feature map using the label image to generate a foreground prototype and a background prototype, the two prototypes are stored in a prototype queue, and the cosine distance to the feature map is computed to obtain a new prediction probability map. The argmax function is then applied to the prediction probability map to obtain a segmentation-result mask label; mask average pooling is applied to the feature map using this mask label to generate second-stage foreground and background prototypes, which are stored in the prototype queue, and the cosine distance between the prototypes and the feature map is computed to obtain the final segmentation result. The method reduces dependence on model parameters, improves generalization, and achieves a better segmentation effect with fewer labeled samples.
The technical scheme adopted by the invention for solving the technical problem comprises the following steps:
step 1: dividing a prototype queue;
step 1-1: uniformly cutting the training image and the corresponding label image pair into a fixed size; establishing an empty prototype queue;
step 1-2: taking a training image as input data, and generating a feature map F through a feature extractor;
step 1-3: carrying out mask average pooling on the feature map F by using the label image M to generate a foreground prototype p c And background prototype p bg :
Wherein, (x, y) represents the coordinate of the pixel point, 1[ ] represents the indicating function, namely the function value is 1 when the formula in the bracket is correct, otherwise, it is 0; c is a foreground category set, C is a foreground category in the image, and h and w are the length and the width of the input image respectively;
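Steps 1-2 and 1-3 can be sketched as follows; this is a minimal NumPy illustration of mask average pooling (equations (1) and (2)), with a hypothetical (h, w, d) feature-map layout and function names not taken from the patent:

```python
import numpy as np

def masked_average_pooling(feature_map, label_map, category):
    """Average the feature vectors at all pixels whose label equals `category`.

    feature_map: (h, w, d) array; label_map: (h, w) integer array.
    Returns a d-dimensional prototype, or None if the category is absent,
    mirroring sum(F * 1[M = c]) / sum(1[M = c]) in eq. (1).
    """
    mask = (label_map == category)          # indicator 1[M(x, y) = c]
    if not mask.any():
        return None
    return feature_map[mask].mean(axis=0)   # masked average over selected pixels
```

The background prototype of eq. (2) follows the same pattern with the mask inverted over all foreground categories.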
step 1-4: store the foreground prototype p_c and the background prototype p_bg in the prototype queue; the prototype queue holds multiple foreground categories but only one background category;
step 1-5: repeat steps 1-2 to 1-4, traversing all training images and their corresponding label images; when storing into the prototype queue, if a newly generated foreground or background prototype has the same category as a prototype already in the queue, the new prototype overwrites the old one;
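The prototype queue of steps 1-4 and 1-5, with its same-category overwrite rule, can be sketched as a simple per-category store; the class name and data layout are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

class PrototypeQueue:
    """Per-category prototype store: a later prototype of the same category
    overwrites the earlier one, as described in steps 1-4 and 1-5."""

    def __init__(self):
        self.protos = {}                        # category id -> prototype vector

    def push(self, category, prototype):
        self.protos[category] = prototype.copy()  # same-category overwrite

    def stacked(self):
        """Return (sorted category ids, (k, d) matrix) for distance computation."""
        cats = sorted(self.protos)
        return cats, np.stack([self.protos[c] for c in cats])
```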
step 1-6: compute the cosine distance between each foreground and background prototype in the prototype queue and each pixel position of the feature map F to obtain a preliminary prediction probability map P; concatenate P with F and apply a convolution to obtain a new prediction probability map P_final:

P_final = Conv(Concat(F, P))    (3)

The prediction probability map P_final gives the preliminary predicted segmentation result;
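Step 1-6 can be sketched as follows: the preliminary map P holds per-pixel cosine similarities to each queued prototype, and a per-pixel linear map stands in for the convolution in P_final = Conv(Concat(F, P)). The 1×1-convolution simplification and all names are assumptions, not the patent's implementation:

```python
import numpy as np

def cosine_probability_map(feature_map, prototypes):
    """Per-pixel cosine similarity to each prototype (preliminary map P).

    feature_map: (h, w, d); prototypes: (k, d). Returns (h, w, k)."""
    f = feature_map / (np.linalg.norm(feature_map, axis=-1, keepdims=True) + 1e-8)
    p = prototypes / (np.linalg.norm(prototypes, axis=-1, keepdims=True) + 1e-8)
    return f @ p.T

def fuse(feature_map, P, weight):
    """Stand-in for P_final = Conv(Concat(F, P)) using a 1x1 convolution,
    i.e. a per-pixel linear map; weight: (d + k, k)."""
    concat = np.concatenate([feature_map, P], axis=-1)
    return concat @ weight
```

In the patent the fusion is a learned convolution; the identity-like weight below is only for checking shapes and values.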
step 2: second stage segmentation constraints;
step 2-1: apply the argmax function to the prediction probability map P_final to obtain a segmentation-result mask label, then binarize it, uniformly relabeling non-foreground categories as the background category, to obtain a mask label containing only foreground and background categories;
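Step 2-1's argmax-then-binarize operation can be sketched as follows; the background id 0 is an assumption, since the patent does not fix the label values:

```python
import numpy as np

def binarized_mask_label(P_final, foreground_category):
    """argmax over the class axis, then relabel every non-foreground pixel as
    background (id 0 here, assumed), per step 2-1.

    P_final: (h, w, k) probability map. Returns an (h, w) integer mask."""
    pred = P_final.argmax(axis=-1)
    return np.where(pred == foreground_category, foreground_category, 0)
```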
step 2-2: performing mask average pooling on the feature graph F by using a mask label to generate a foreground prototype and a background prototype at the second stage;
step 2-3: storing the foreground prototype and the background prototype in the second stage into a prototype queue, and covering the foreground prototype or the background prototype in the prototype queue if the foreground prototype or the background prototype in the same category exists in the prototype queue;
step 2-4: compute the cosine distance between each foreground and background prototype in the prototype queue obtained in step 2-3 and each pixel position of the feature map F to obtain a second-stage prediction probability map P_ts; the second-stage prediction probability map P_ts gives the final segmentation result;
and step 3: training according to the overall loss function to obtain a final segmentation model;
step 3-1: evaluating the loss;
The evaluation loss of the preliminary segmentation result on the foreground category is computed from the prediction probability map P_final and the label image M as:

L_seg = -(1/N) Σ_(x,y) 1[M(x,y) = c_fg] · log P_final^(c_fg)(x,y)    (4)

where P_final^(c_fg)(x,y) is the probability that position (x,y) of the input image is predicted as foreground, c_fg is the foreground category label, and N is the product of h and w;

The evaluation loss of the second-stage segmentation result on the foreground category is computed from the second-stage prediction probability map P_ts and the label image M as:

L_t-s = -(1/N) Σ_(x,y) 1[M(x,y) = c_fg] · log P_ts^(c_fg)(x,y)    (5)

where P_ts^(c_fg)(x,y) is the probability that position (x,y) of the input image is predicted as foreground in the second-stage prediction;

The evaluation loss is then computed as:

L_eval = L_seg + L_t-s    (6)
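The foreground evaluation losses L_seg and L_t-s share one cross-entropy form and can be sketched as a single NumPy function; the small epsilon for numerical stability and the array layout are assumptions:

```python
import numpy as np

def foreground_ce_loss(prob_fg, label_map, c_fg):
    """-(1/N) * sum over foreground pixels of log P(foreground), with N = h * w,
    as in eqs. (4) and (5); applied to the preliminary map for L_seg and to the
    second-stage map for L_t-s.

    prob_fg: (h, w) predicted foreground probabilities; label_map: (h, w)."""
    mask = (label_map == c_fg)              # indicator 1[M(x, y) = c_fg]
    n = label_map.size                      # N = h * w
    return -np.sum(np.log(prob_fg[mask] + 1e-8)) / n
```

L_eval is then the sum of the two calls, per eq. (6).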
step 3-2: a multi-class loss;
The multi-class loss L_mult is computed as:

L_mult = -(1/N) Σ_(x,y) Σ_cl 1[ŷ(x,y) = cl] · log P_mult^(cl)(x,y)    (7)

where the pseudo label ŷ is obtained by applying the argmax function to the preliminary prediction probability map P; the multi-class prediction probability map P_mult is obtained from the feature map F by convolution and up-sampling; and P_mult^(cl)(x,y) is the probability that position (x,y) of the input image is predicted as class cl in the multi-class prediction;
step 3-3: background latent-class loss function;
A constraint loss is computed for the background region of the input image. Using a cross-entropy formula, the false-positive rate of the background region, i.e. the background entropy loss Entropy_bg, is computed from the label image M and the prediction probability map P_final; Entropy_bg describes the probability that the background region is not mispredicted as foreground:

Entropy_bg = -(1/N) Σ_(x,y) 1[M(x,y) = c_bg] · log(1 - P_final^(c_fg)(x,y))    (8)

To keep the background region from being predicted as foreground, i.e. to raise the background entropy value and reduce the probability that latent classes in the background region are mispredicted, Entropy_bg is added to the loss as the constraint:

L_blr = λ · Entropy_bg    (9)

where λ is the background optimization weight parameter;
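The background-entropy constraint of step 3-3 can be sketched as follows; the exact reduction (averaging over N = h·w) and the form L_blr = λ · Entropy_bg are read off the surrounding text, so treat this as an assumption-laden sketch rather than the patent's exact formula:

```python
import numpy as np

def background_entropy_loss(prob_fg, label_map, c_bg, lam=1.5):
    """Cross-entropy of background pixels against 'not foreground':
    Entropy_bg = -(1/N) * sum over background pixels of log(1 - P_fg),
    and L_blr = lam * Entropy_bg; lam is the background optimization
    weight (the patent suggests a range of 1 to 2)."""
    mask = (label_map == c_bg)              # indicator 1[M(x, y) = c_bg]
    entropy_bg = -np.sum(np.log(1.0 - prob_fg[mask] + 1e-8)) / label_map.size
    return lam * entropy_bg
```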
step 3-4: overall loss function:
Loss = L_eval + L_blr + α × L_mult    (10)

where α is the multi-class constraint weight parameter, with a value range between 0 and 1.
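The overall objective of equation (10) is then a weighted sum of the three terms; a one-line sketch, with the default α = 0.5 chosen arbitrarily from the stated (0, 1) range:

```python
def total_loss(l_eval, l_blr, l_mult, alpha=0.5):
    """Loss = L_eval + L_blr + alpha * L_mult (eq. 10); alpha in (0, 1)."""
    return l_eval + l_blr + alpha * l_mult
```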
Preferably, the training image and the corresponding label image pair are uniformly cropped to a fixed size of 512 × 512 in step 1-1.
Preferably, said λ ranges between 1 and 2.
The invention has the following beneficial effects:
1. and expanding the foreground and background segmentation of the small sample to full scene multi-class semantic segmentation. The prototype queue provided by the invention can be used for updating and storing different types of prototypes and guiding multi-type segmentation. Different from the traditional method which is suitable for simple scenes, the method can realize the analysis of multi-class scenes.
2. The multi-class segmentation effect is better, and the multi-class segmentation can be realized by inputting single-class labels. The multi-class guiding branch designed by the invention adopts the preliminary multi-class segmentation result as a pseudo label to replace a single-class label guiding model to learn multi-class characteristics, thereby realizing better multi-class segmentation effect.
3. And the segmentation robustness is stronger under the condition of lacking sample labeling. The method is based on small sample learning and metric learning, and extracts image features and maps the image features to a feature metric space. The pixel-level multi-class segmentation is completed in a measuring mode, dependence on model parameters is reduced, generalization is improved, a better segmentation effect is achieved by using fewer labeled samples, and robustness is stronger in an environment where sample labeling is lack.
4. The accuracy rate and the average intersection ratio of the segmentation results are higher. The background hiding optimization module and the two-stage segmentation module further optimize the segmentation result, and can help the model architecture better analyze the scene.
5. The technology has more practical and industrial values. The method expands the small sample segmentation to more practical multi-class semantic segmentation, can meet the industrial requirements of urban planning, precision agriculture, automatic driving and the like, only needs fewer labeled samples, reduces the labeling cost, and is more suitable for practical application scenes.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a semantic segmentation result comparison graph generated by the method and the comparison method of the present invention.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
The invention discloses a full-scene semantic segmentation framework based on prototype queue learning under few labeled samples, mainly addressing multi-class semantic segmentation and latent background classes in small-sample semantic segmentation. Specifically, the invention aims to address the following issues:
1. Existing small-sample semantic segmentation techniques only separate foreground from background and do not analyze the background of complex scenes; the invention realizes more practical multi-class small-sample semantic segmentation.
2. The prior art does not fully exploit the latent new-class information contained in the background class of training samples.
3. The mask average pooling used in the prior art to extract semantic-category feature prototypes easily loses local detail information.
A full scene semantic segmentation method based on prototype queue learning under few labeled samples comprises the following steps:
step 1: dividing a prototype queue;
step 1-1: uniformly cutting the training image and the corresponding label image pair into a fixed size; establishing an empty prototype queue;
step 1-2: taking a training image as input data, and generating a feature map F through a feature extractor;
step 1-3: carrying out mask average pooling on the feature map F by using the label image M to generate a foreground prototype p c And background prototype p bg :
Wherein, (x, y) represents the coordinates of the pixel points, C is a foreground category set, C is a foreground category in the image, and h and w are the length and width of the input image respectively;
step 1-4: the foreground is prototyped p c And background prototype p bg Storing the foreground categories into an original queue, wherein the number of the foreground categories in the original queue is multiple, and the number of the background categories in the original queue is only one;
step 1-5: repeating the steps 1-2 to 1-4, and traversing all the training images and the corresponding label images; when storing in the prototype queue, if the foreground prototype or the background prototype generated later has the foreground prototype or the background prototype of the same category in the prototype queue, covering the foreground prototype or the background prototype of the same category in the prototype queue;
step 1-6: respectively calculating the cosine distance between the foreground prototype and the background prototype of different categories in the prototype queue and each pixel position in the feature map F to obtain a preliminary prediction probability mapConnecting P and F, and performing convolution calculation to obtain a new prediction probability map P final The calculation is as follows:
P final =Conv(Concat(F,P)) (3)
prediction probability map P final The result is the preliminary prediction segmentation result;
step 2: two-stage segmentation constraints;
step 2-1: apply the argmax function to the prediction probability map P_final to obtain a segmentation-result mask label, then binarize it, uniformly relabeling non-foreground categories as the background category, to obtain a mask label containing only foreground and background categories;
step 2-2: perform mask average pooling on the feature map F with this mask label to generate the second-stage foreground and background prototypes;
step 2-3: store the second-stage foreground and background prototypes in the prototype queue, overwriting any same-category prototype already in the queue;
step 2-4: compute the cosine distance between each foreground and background prototype in the prototype queue obtained in step 2-3 and each pixel position of the feature map F to obtain a second-stage prediction probability map P_ts; the second-stage prediction probability map P_ts gives the second-stage segmentation result;
and step 3: training according to the overall loss function to obtain a final segmentation model;
step 3-1: evaluating the loss;
The evaluation loss of the preliminary segmentation result on the foreground category is computed from the prediction probability map P_final and the label image M as:

L_seg = -(1/N) Σ_(x,y) 1[M(x,y) = c_fg] · log P_final^(c_fg)(x,y)    (4)

where P_final^(c_fg)(x,y) is the probability that position (x,y) of the input image is predicted as foreground, c_fg is the foreground category label, and N is the product of h and w;

The evaluation loss of the second-stage segmentation result on the foreground category is computed from the second-stage prediction probability map P_ts and the label image M as:

L_t-s = -(1/N) Σ_(x,y) 1[M(x,y) = c_fg] · log P_ts^(c_fg)(x,y)    (5)

The evaluation loss is then computed as:

L_eval = L_seg + L_t-s    (6)

step 3-2: multi-class loss;
The multi-class loss L_mult is computed as:

L_mult = -(1/N) Σ_(x,y) Σ_cl 1[ŷ(x,y) = cl] · log P_mult^(cl)(x,y)    (7)

where the pseudo label ŷ is obtained by applying the argmax function to the preliminary prediction probability map P, and the multi-class prediction probability map P_mult is obtained directly from the feature map F by convolution and up-sampling;
step 3-3: background latent-class loss function;
A constraint loss is computed for the background region of the input image. Using a cross-entropy formula, the false-positive rate of the background region, i.e. the background entropy loss Entropy_bg, is computed from the label image M and the prediction probability map P_final; Entropy_bg describes the probability that the background region is not mispredicted as foreground:

Entropy_bg = -(1/N) Σ_(x,y) 1[M(x,y) = c_bg] · log(1 - P_final^(c_fg)(x,y))    (8)

To keep the background region from being predicted as foreground, i.e. to raise the background entropy value and reduce the probability that latent classes in the background region are mispredicted, Entropy_bg is added to the loss as the constraint:

L_blr = λ · Entropy_bg    (9)

where λ is the background optimization weight parameter;
step 3-4: overall loss function:

Loss = L_eval + L_blr + α × L_mult    (10)

where α is the multi-class constraint weight parameter, with a value range between 0 and 1.
The specific embodiment is as follows:
1. simulation conditions
The simulations were performed with PyTorch on a Linux system with an Intel(R) Xeon(R) Silver 4110 CPU @ 2.10 GHz and 40 GB of memory. The data used in the simulations are open datasets.
2. Simulation content
The data used in the simulations are from the UDD and Vaihingen datasets. The UDD dataset contains 141 RGB pictures taken by a drone, covering six categories, cut into 2439 image blocks of 720 × 720 pixels. The Vaihingen dataset is an aerial-photograph dataset published by the ISPRS, with 33 RGB pictures in total, covering six categories, cut into 426 image blocks of 512 × 512 pixels. For each class, five pictures and their corresponding class labels are selected as the small-sample training set, and the remaining pictures are used for testing. To ensure fairness, the training samples are randomly selected five times, and the reported test indices are the averages over the five runs.
To demonstrate the effectiveness of the algorithm, PANet, HRNet, and HRNet+ are compared on the two datasets. Among these, PANet is the classic small-sample semantic segmentation algorithm proposed in "Kaixin Wang, Jun Hao Liew, Yingtian Zou, Daquan Zhou, and Jiashi Feng. PANet: Few-shot image segmentation with prototype alignment. In IEEE International Conference on Computer Vision, 2019, pp. 9197-9206"; HRNet is the classic semantic segmentation algorithm proposed in "Ke Sun, Bin Xiao, Dong Liu, and Jingdong Wang. Deep high-resolution representation learning for human pose estimation. In IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 5693-5703", used here to verify the effect of fine-tuning on the multi-class small-sample segmentation task; HRNet+ is a model that uses HRNet as the feature extractor, improved with a metric-based small-sample segmentation method, and is the base network of the invention. PQLNet is the method proposed in the invention. OA and mIoU are the evaluation indices for small-sample semantic segmentation quality; the comparison results are shown in Table 1:
TABLE 1 comparative results
As can be seen from Table 1, the invention outperforms the other algorithms in the OA and mIoU indices on both the UDD and Vaihingen datasets.
FIG. 2 shows the semantic segmentation results generated by the method of the invention and the comparison algorithms. Compared with the comparison algorithms, the invention produces more accurate multi-class segmentation edges, demonstrating that it effectively exploits multi-class joint information and increases the feature discrimination between classes. In addition, the invention also removes speckle and refines edges, demonstrating the effect of the background latent-class distribution optimization and the two-stage segmentation module.
Claims (3)
1. A full scene semantic segmentation method based on prototype queue learning under few labeled samples is characterized by comprising the following steps:
step 1: dividing a prototype queue;
step 1-1: uniformly cutting the training image and the corresponding label image pair into a fixed size; establishing an empty prototype queue;
step 1-2: taking a training image as input data, and generating a feature map F through a feature extractor;
step 1-3: carrying out mask average pooling on the feature map F by using the label image M to generate a foreground prototype p c And background prototype p bg :
Wherein, (x, y) represents the pixel point coordinate, 1[ ] represents the indication function, namely the function value is 1 when the formula in the bracket is correct, otherwise it is 0; c is a foreground category set, C is a foreground category in the image, and h and w are the length and the width of the input image respectively;
step 1-4: the foreground is prototyped p c And background prototype p bg Storing the foreground categories into an original queue, wherein the number of the foreground categories in the original queue is multiple, and the number of the background categories in the original queue is only one;
step 1-5: repeating the steps 1-2 to 1-4, and traversing all the training images and the corresponding label images; when storing in the prototype queue, if the foreground prototype or the background prototype generated later has the foreground prototype or the background prototype of the same category in the prototype queue, covering the foreground prototype or the background prototype of the same category in the prototype queue;
step 1-6: respectively calculating the cosine distance between the foreground prototype and the background prototype of different categories in the prototype queue and each pixel position in the feature map F to obtain a preliminary prediction probability mapConnecting P and F, and performing convolution calculation to obtain a new prediction probability map P final The calculation is as follows:
P final =Conv(Concat(F,P)) (3)
prediction probability map P final The result is the preliminary prediction segmentation result;
step 2: second stage segmentation constraints;
step 2-1: using argmax function to predict probability map P final Calculating to obtain a mask label of a segmentation result, and then carrying out binarization to uniformly mark the non-foreground category as a background category to obtain a mask label only containing the foreground category and the background category;
step 2-2: performing mask average pooling on the feature graph F by using a mask label to generate a foreground prototype and a background prototype at the second stage;
step 2-3: storing the foreground prototype and the background prototype in the second stage into a prototype queue, and covering the foreground prototype or the background prototype in the prototype queue if the same type of foreground prototype or background prototype exists in the prototype queue;
step 2-4: compute the cosine distance between each foreground and background prototype in the prototype queue obtained in step 2-3 and each pixel position in the feature map F to obtain the second-stage prediction probability map P_s; P_s gives the final segmentation result;
step 3: train according to the overall loss function to obtain the final segmentation model;
step 3-1: evaluating the loss;
the evaluation loss of the preliminary segmentation result on the foreground category is computed from the prediction probability map P_final and the label image M as:
L_seg = -(1/N) Σ_n 1[M_n = c_fg] log p̂_n (4)
where p̂_n is the probability that position n of the input image is predicted as foreground, c_fg is the foreground category label, and N is the product of h and w;
the evaluation loss of the second-stage segmentation result on the foreground category is computed from the second-stage prediction probability map P_s and the label image M as:
L_t-s = -(1/N) Σ_n 1[M_n = c_fg] log p_s,n (5)
where p_s,n is the probability that position n of the input image is predicted as foreground in the second-stage prediction result;
the total evaluation loss is calculated as follows:
L_eval = L_seg + L_t-s (6)
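A sketch of the evaluation loss of step 3-1, under the assumption that L_seg and L_t-s are per-pixel cross-entropy terms over the foreground positions of the label image M; the patented formulas may differ in detail, and all names here are illustrative.

```python
import numpy as np

def foreground_ce(prob_fg, label, fg_class, eps=1e-8):
    """prob_fg: (h, w) predicted-foreground probabilities;
    label: (h, w) integer label image M; N = h * w."""
    fg = (label == fg_class).astype(np.float64)
    n = label.size
    return float(-(fg * np.log(prob_fg + eps)).sum() / n)

def evaluation_loss(p_final_fg, p_second_fg, label, fg_class):
    l_seg = foreground_ce(p_final_fg, label, fg_class)   # preliminary stage
    l_ts = foreground_ce(p_second_fg, label, fg_class)   # second stage
    return l_seg + l_ts                                  # L_eval of Eq. (6)

# a perfect prediction on a toy label image drives the loss to ~0
M = np.array([[1, 0], [0, 0]])
perfect = (M == 1).astype(float)
loss = evaluation_loss(perfect, perfect, M, fg_class=1)
```

Both stages are penalized against the same ground-truth M, which is what couples the second-stage prototypes back to the labeled data.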
step 3-2: a multi-class loss;
the multi-class loss L_mult is calculated as:
L_mult = -(1/N) Σ_n Σ_cl 1[ŷ_n = cl] log q_n(cl) (7)
where the pseudo label ŷ is obtained by applying the argmax function to the preliminary prediction probability map P; the multi-class prediction probability map Q is obtained from the feature map F by a convolution operation and upsampling; and q_n(cl) is the probability that position n of the input image is predicted as class cl in the multi-class prediction result;
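A sketch of the multi-class loss of step 3-2, assuming the pseudo label is the argmax of the preliminary map P and the multi-class map Q would come from F via convolution and upsampling (here Q is simply given); names are illustrative.

```python
import numpy as np

def multi_class_loss(P, Q, eps=1e-8):
    """P: (K, h, w) preliminary probability map -> pseudo labels;
    Q: (K, h, w) multi-class prediction probability map."""
    pseudo = P.argmax(axis=0)                         # (h, w) pseudo label
    # cross-entropy of Q against the pseudo labels, averaged over N = h*w
    picked = np.take_along_axis(Q, pseudo[None], axis=0)[0]
    return float(-np.log(picked + eps).mean())

# when Q agrees with the pseudo labels everywhere, the loss is ~0
P = np.zeros((3, 2, 2)); P[1] = 1.0
Q = np.zeros((3, 2, 2)); Q[1] = 1.0
loss = multi_class_loss(P, Q)
```

Supervising Q with argmax pseudo labels lets the multi-class head train without extra annotation, at the cost of inheriting any errors in the preliminary prediction.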
step 3-3: background hidden-class loss function;
a constraint loss is computed for the background region of the input image: using the cross-entropy formula with the label image M and the prediction probability map P_final, the false-positive rate of the background region, i.e. the background entropy loss Entropy_bg, is calculated; Entropy_bg describes the probability that the background region is not mispredicted as foreground and is computed as:
Entropy_bg = -(1/N) Σ_n 1[M_n = c_bg] log(1 - p̂_n) (8)
to prevent the background region from being predicted as foreground, increase the background entropy value, and reduce the probability that the hidden background class is mispredicted, Entropy_bg is added to the loss as the constraint:
L_blr = λ · Entropy_bg (9)
where λ is the background optimization weight parameter;
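A sketch of the background constraint of step 3-3, under the assumption that Entropy_bg is a cross-entropy over the background positions of M penalizing foreground probability there, scaled by the weight λ; the patented formula may differ, and all names are illustrative.

```python
import numpy as np

def background_loss(prob_fg, label, bg_class, lam=1.0, eps=1e-8):
    """prob_fg: (h, w) foreground probability from P_final;
    label: (h, w) label image M; lam: background optimization weight."""
    bg = (label == bg_class).astype(np.float64)
    n = label.size
    # penalize foreground probability at ground-truth background positions
    entropy_bg = -(bg * np.log(1.0 - prob_fg + eps)).sum() / n
    return lam * entropy_bg

# all-background toy label: uncertain predictions (0.5) incur ~log 2 loss,
# confident background predictions (0.0) incur ~0 loss
M = np.zeros((2, 2), dtype=int)
l_uncertain = background_loss(np.full((2, 2), 0.5), M, bg_class=0)
l_confident = background_loss(np.zeros((2, 2)), M, bg_class=0)
```

The term only fires on ground-truth background pixels, so it directly suppresses false positives without touching the foreground supervision.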
step 3-4: the overall loss function:
Loss = L_eval + L_blr + α × L_mult (10)
wherein α is a multi-class constraint weight parameter with a value range between 0 and 1.
2. The full scene semantic segmentation method based on prototype queue learning under few-label samples according to claim 1, wherein in step 1-1 each training image and its corresponding label image are uniformly cropped to a fixed size of 512 × 512.
3. The full scene semantic segmentation method based on prototype queue learning under few-label samples according to claim 1, wherein λ has a value range from 1 to 2.
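The overall objective Loss = L_eval + L_blr + α × L_mult of step 3-4 combines the three loss terms above; a minimal sketch, with the parameter ranges stated in the claims (α in (0, 1), λ in [1, 2]) enforced as assertions:

```python
def overall_loss(l_eval, l_blr, l_mult, alpha=0.5):
    """Combine the evaluation, background, and multi-class losses (Eq. 10).
    alpha is the multi-class constraint weight, constrained to (0, 1)."""
    assert 0.0 < alpha < 1.0, "alpha must lie strictly between 0 and 1"
    return l_eval + l_blr + alpha * l_mult

total = overall_loss(1.0, 0.5, 2.0, alpha=0.5)  # 1.0 + 0.5 + 0.5*2.0 = 2.5
```

λ is already folded into L_blr upstream, which is why it does not appear as a separate argument here.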
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210390663.3A CN114943834B (en) | 2022-04-14 | 2022-04-14 | Full scene semantic segmentation method based on prototype queue learning under few-label samples |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114943834A true CN114943834A (en) | 2022-08-26 |
CN114943834B CN114943834B (en) | 2024-02-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |