CN113326886B

CN113326886B - Method and system for detecting salient object based on unsupervised learning

Info

Publication number: CN113326886B
Application number: CN202110665987.9A
Authority: CN
Inventors: 李冠彬; 吴梓溢; 颜鹏翔; 刘梦梦; 林倞
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2021-06-16
Filing date: 2021-06-16
Publication date: 2023-09-15
Anticipated expiration: 2041-06-16
Also published as: CN113326886A

Abstract

The invention discloses a method and system for detecting salient objects based on unsupervised learning. The method comprises: obtaining target-domain samples, wherein the label of each target-domain sample is a pseudo-label obtained by using the model from the previous iteration to predict on the target-domain image; performing uncertainty evaluation on the pseudo-labels and sorting them by uncertainty; performing picture-level screening of the pseudo-labels according to the sorting result to obtain target-domain samples whose pseudo-label uncertainty is lower than a preset threshold; and performing pixel-level pseudo-label re-weighting on these target-domain samples to obtain the target-domain samples for the next training iteration. The method achieves excellent performance on multiple salient object detection datasets without relying on manual labels, reaching a level comparable to fully supervised saliency detection methods and thereby greatly reducing the dependence of salient object detection on pixel-level manual annotation.

Description

Method and system for detecting salient object based on unsupervised learning

Technical Field

The invention relates to the technical field of artificial intelligence, and in particular to a method and system for detecting salient objects based on unsupervised learning.

Background

In recent years, salient object detection, as an important image processing technique, has been applied directly in numerous business scenarios such as image editing, short-video creation and live streaming. These businesses rely on salient object detection for video compression, object detection, visual tracking or video segmentation. Compared with traditional salient object detection methods, methods based on fully convolutional neural networks have quickly become popular in the field owing to their convenient trainability and high computational efficiency. However, such methods achieve good segmentation only after extensive training on large numbers of pixel-wise annotated images or videos, which consumes considerable manpower and material resources; moreover, annotation results vary with the annotator's experience, which affects the accuracy of subsequent detection results.

To alleviate these problems, researchers have proposed salient object detection methods based on deep unsupervised learning. Their main idea is to learn saliency from the noisy saliency labels generated by traditional salient object detection methods, mainly through either noise modeling or pseudo-label learning. However, the noisy labels produced by traditional saliency detection methods are ill-suited to complex scenes, such as those with low contrast or fine object structures, so fine-grained saliency features cannot be learned, which hinders further progress in salient object detection.

Disclosure of Invention

The invention aims to provide a salient object detection method and system based on unsupervised learning, to solve the technical problem that existing salient object detection methods cannot be applied to complex scenes such as those with low contrast or fine object structures.

To overcome the defects of the prior art, the invention provides a salient object detection method based on unsupervised learning, comprising the following steps:

obtaining a target domain sample, wherein the label of the target domain sample is a pseudo label obtained by predicting a target domain image by using a model obtained by the previous iteration;

using the variance to evaluate the consistency of the saliency prediction probability maps of the target-domain image under different data augmentations, thereby evaluating the uncertainty of the pseudo-label, and sorting the target-domain samples by their uncertainty scores;

the saliency prediction probability map generated under the j-th data augmentation is:

$$\hat{Y}_t^{j} = \alpha_j^{-1}\big(\Phi(\alpha_j(I_t))\big), \quad j = 1, \dots, N$$

where $\hat{Y}_t^{j}$ represents the saliency prediction map of the target-domain image; $I_t$ represents the target-domain image; $\alpha_j(\cdot)$ represents the j-th data augmentation; $\alpha_j^{-1}(\cdot)$ represents the inverse transform of $\alpha_j(\cdot)$; and $\Phi(\cdot)$ represents the model operation used to generate the pseudo-label of the target-domain image;

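the formula for evaluating the consistency of the saliency probability maps using the variance is:

$$\mathrm{Var}_t = \mathbb{E}_{j}\Big[\big(\hat{Y}_t^{j} - \mathbb{E}_{j}\big[\hat{Y}_t^{j}\big]\big)^{2}\Big], \quad j = 1, \dots, N$$

where $\mathrm{Var}_t$ represents the variance map of the target-domain image; $\mathbb{E}$ represents the averaging operation; $\hat{Y}_t^{j}$ represents the saliency prediction map of the target-domain image; and N represents the number of data augmentations;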

the uncertainty score of each target-domain sample is obtained by:

$$U_t = \frac{1}{H \times W}\sum_{h=1}^{H}\sum_{w=1}^{W}\mathrm{Var}_t^{(h,w)}$$

where $U_t$ represents the uncertainty score of the target-domain image; H and W represent the height and width of the target-domain image; h and w represent the vertical and horizontal coordinates in the target-domain image; and $\mathrm{Var}_t^{(h,w)}$ represents the value of the variance map of the target-domain image at coordinates (h, w);

performing picture-level screening of the pseudo-labels according to the sorting result to obtain target-domain samples whose pseudo-label uncertainty is lower than a preset threshold;

performing pixel-level pseudo-label re-weighting on the target-domain samples to obtain the target-domain samples for the next training iteration, wherein the pixel-level pseudo-label re-weighting weight of a target-domain sample is obtained by:

$$w_t = \Omega(\mathrm{Var}_t)$$

where $w_t$ represents the pixel-level pseudo-label re-weighting weight of the target-domain image; $\Omega(\cdot)$ is a re-weighting function that decreases as the variance increases; k represents the amplitude by which the pixel-level pseudo-label weight of the target-domain image decreases; and $\mathrm{Var}_t$ represents the variance map of the target-domain image.

Further, the data augmentation is reversible.

Further, before the picture-level screening of the pseudo-labels according to the sorting result, the method further includes:

using prior knowledge to delete pseudo-labels whose salient-pixel area or non-salient-pixel area exceeds a preset range.
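A minimal sketch of the picture-level screening step, assuming the per-image uncertainty scores have already been computed and that the preset threshold is a user-chosen hyperparameter; the function name `screen_below_threshold` and the threshold value in the example are illustrative only, not taken from the patent:

```python
import numpy as np


def screen_below_threshold(scores, tau):
    """Picture-level screening: rank samples by uncertainty, keep those with U_t < tau.

    scores: 1-D sequence of per-image uncertainty scores U_t (lower = more reliable).
    tau: preset uncertainty threshold (a hypothetical hyperparameter here).
    Returns indices of kept samples, ordered from least to most uncertain.
    """
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="stable")  # ascending uncertainty
    return [int(i) for i in order if scores[i] < tau]


if __name__ == "__main__":
    print(screen_below_threshold([0.02, 0.30, 0.01, 0.12], tau=0.1))  # [2, 0]
```

In the example, images 2 and 0 pass the threshold and are returned least uncertain first.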

The invention also provides a salient object detection system based on unsupervised learning, comprising:

a pseudo-label obtaining unit, configured to obtain target-domain samples, wherein the label of each target-domain sample is a pseudo-label obtained by using the model from the previous iteration to predict on the target-domain image;

an uncertainty evaluation unit, configured to use the variance to evaluate the consistency of the saliency prediction probability maps of the target-domain image under different data augmentations, thereby evaluating the uncertainty of the pseudo-labels, and to sort the samples by the uncertainty score of each target-domain sample;

a screening unit, configured to perform picture-level screening of the pseudo-labels according to the sorting result to obtain target-domain samples whose pseudo-label uncertainty is lower than a preset threshold;

a weighting processing unit, configured to perform pixel-level pseudo-label re-weighting on the target-domain samples to obtain the target-domain samples for the next training iteration, wherein the pixel-level pseudo-label re-weighting weight of a target-domain sample is obtained by $w_t = \Omega(\mathrm{Var}_t)$, where $\Omega(\cdot)$ decreases as the variance map $\mathrm{Var}_t$ increases, at a rate controlled by k.

Further, the data augmentation is reversible.

Further, the screening unit is further configured to: use prior knowledge to delete pseudo-labels whose salient-pixel area or non-salient-pixel area exceeds a preset range.

The invention also provides a terminal device, comprising:

one or more processors;

a memory coupled to the processor for storing one or more programs;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the unsupervised learning-based salient object detection method as recited in any one of the preceding claims.

The present invention also provides a computer-readable storage medium having stored thereon a computer program for execution by a processor to implement the unsupervised learning-based salient object detection method as described in any one of the above.

Compared with the prior art, the invention has the beneficial effects that:

The invention provides a salient object detection method based on unsupervised learning, comprising: obtaining target-domain samples, wherein the label of each target-domain sample is a pseudo-label obtained by using the model from the previous iteration to predict on the target-domain image; performing uncertainty evaluation on the pseudo-labels and sorting them by uncertainty; performing picture-level screening of the pseudo-labels according to the sorting result to obtain target-domain samples whose pseudo-label uncertainty is lower than a preset threshold; and performing pixel-level pseudo-label re-weighting on these target-domain samples to obtain the target-domain samples for the next training iteration.

The invention learns from synthesized but relatively clean labels and, by generating pseudo-labels in real scenes, adapts from the synthetic dataset to real data, thereby achieving unsupervised and reliable salient object detection in images. The invention achieves excellent performance on multiple salient object detection datasets without relying on manual labels, reaching a level comparable to fully supervised saliency detection methods and thereby greatly reducing the dependence of salient object detection on pixel-level manual annotation.

Drawings

In order to more clearly illustrate the technical solutions of the present invention, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic flow chart of a method for detecting salient objects based on unsupervised learning according to an embodiment of the present invention;

FIG. 2 is an overall framework diagram of a salient object detection method based on unsupervised learning provided by an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a salient object detection system based on unsupervised learning according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

It should be understood that the step numbers used herein are for convenience of description only and are not limiting as to the order in which the steps are performed.

It is to be understood that the terminology used in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

The terms "comprises" and "comprising" indicate the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The term "and/or" refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.

First aspect:

Referring to FIG. 1, an embodiment of the present invention provides a salient object detection method based on unsupervised learning, comprising:

S10, acquiring target-domain samples, wherein the label of each target-domain sample is a pseudo-label obtained by using the model from the previous iteration to predict on the target-domain image;

S20, performing uncertainty evaluation on the pseudo-labels and sorting them by uncertainty;

S30, performing picture-level screening of the pseudo-labels according to the sorting result to obtain target-domain samples whose pseudo-label uncertainty is lower than a preset threshold;

S40, performing pixel-level pseudo-label re-weighting on the target-domain samples to obtain the target-domain samples for the next training iteration.

It should be noted that current fully supervised salient object detection methods based on fully convolutional neural networks rely mainly on training with large numbers of pixel-wise annotated images or videos to achieve good segmentation performance. Even a skilled annotator may need several to tens of minutes to annotate a single pixel-level saliency map. To reduce the annotation variability caused by subjective factors, an image often has to be annotated and verified by several annotators. Existing methods therefore consume a great deal of manpower and material resources, which hinders the development of salient object detection technology. This embodiment therefore mainly aims to provide a salient object detection method with low cost and low time consumption.

As shown in FIG. 2, the unsupervised domain-adaptive salient object detection framework provided in this embodiment aims to learn saliency detection capability from synthesized but clean labels: using an existing deep-learning-based saliency detection model, it learns saliency prediction from synthesized source-domain data and adapts it, without supervision, to real target-domain scenes.

Specifically, step S10 mainly serves to acquire the target-domain samples used for training, each consisting of a target-domain image and a target-domain label. As shown in FIG. 2, in each training round the training set consists of some source-domain samples and some target-domain samples: the labels of the source-domain samples are accurate and clean, whereas the labels of the target-domain samples are pseudo-labels obtained by applying the model from the previous iteration to the target-domain images. However, because the target-domain pseudo-labels are initially generated by a saliency detector trained on the source domain, and there is a significant data distribution gap between the source and target domains, the pseudo-labels inevitably contain many erroneous pixel-level predictions. To avoid the accumulation of errors from incorrect labels during iterative training, it is therefore necessary to carefully screen the target-domain samples that participate in training and to adaptively assign different weights to the pixels of the selected samples. The optimization objective for each sample in each training round is:

$$\mathcal{L} = \sum_{h=1}^{H}\sum_{w=1}^{W}\omega^{(h,w)}\,\ell_{bce}\big(p^{(h,w)}, y^{(h,w)}\big)$$

where $y^{(h,w)}$ represents the pixel-level label of the sample; $p^{(h,w)}$ represents the saliency probability map predicted by the model; $\omega^{(h,w)}$ represents the pixel-level weight map; $\ell_{bce}(\cdot,\cdot)$ represents the binary cross-entropy loss between pixels; and H and W represent the height and width of the image, respectively.
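The per-sample objective can be sketched in Python with NumPy as follows. This is an illustrative implementation, not the patent's code; the function name is hypothetical, the clipping constant `eps` is an assumption added to avoid log(0), and the weight map is all ones for source-domain samples:

```python
import numpy as np


def weighted_bce_loss(p, y, omega, eps=1e-7):
    """Pixel-weighted binary cross-entropy, summed over an H x W map.

    p: predicted saliency probabilities p^(h,w) in [0, 1].
    y: pixel-level labels y^(h,w) (clean for source, pseudo-labels for target).
    omega: pixel-level weights omega^(h,w); all ones for source samples.
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    y = np.asarray(y, dtype=float)
    omega = np.asarray(omega, dtype=float)
    per_pixel = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))  # l_bce per pixel
    return float(np.sum(omega * per_pixel))  # weighted sum over all pixels
```

With a uniform prediction of 0.5 on a 2 x 2 all-salient label and unit weights, the loss is 4 ln 2, about 2.77. Dividing by H x W would give a per-pixel mean instead of a sum; the two differ only by a constant factor.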

This objective drives the model's predictions toward the highly reliable pixels of the label. In the invention, because the source-domain labels are clean and accurate, every pixel of a source-domain weight map has weight 1; for the target domain, after each iteration the pseudo-labels and weight maps of the target-domain samples are dynamically updated through an uncertainty-aware pseudo-label learning strategy. Unlike a naive pseudo-label learning strategy, which trains with all pseudo-labels, we propose an uncertainty-aware pseudo-label learning strategy (UPL) that further selects the target-domain pseudo-labels and assigns them different pixel weights.

In step S20, the target-domain pseudo-labels are screened mainly through consistency-based uncertainty estimation. First, consistency-based uncertainty estimation is performed on the target-domain pseudo-labels. Specifically, as shown in FIG. 2, given a saliency detection model $\Phi(\cdot)$ with fixed parameters, each target-domain image $I_t$ is input into the model to generate its pseudo-label $\hat{Y}_t = \Phi(I_t)$.
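How each round's training set combines clean source samples (unit weight maps) with the re-weighted target pseudo-labels can be sketched as follows. The `Sample` container and `build_round_training_set` are hypothetical names, and the target triples are assumed to come from the previous round's uncertainty-aware pseudo-label step:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Sample:
    image: np.ndarray   # H x W (x C) input image
    label: np.ndarray   # H x W label: clean (source) or pseudo-label (target)
    weight: np.ndarray  # H x W pixel weight map omega


def build_round_training_set(source, target_selected):
    """Mix source samples and selected target samples for one training round.

    source: iterable of (image, clean_label) pairs; their weight maps are all ones.
    target_selected: iterable of (image, pseudo_label, weight_map) triples that
        survived screening and re-weighting in the previous round.
    """
    samples = [Sample(img, lbl, np.ones(lbl.shape, dtype=float)) for img, lbl in source]
    samples += [Sample(img, lbl, np.asarray(w, dtype=float)) for img, lbl, w in target_selected]
    return samples
```

Keeping the weight map inside each sample lets one loss function serve both domains, since source pixels simply carry weight 1.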

Further, the following considerations guide the evaluation of pseudo-label uncertainty. First, on high-confidence (i.e., low-uncertainty) target samples, the saliency detection model is robust and not easily affected by small noise. Second, data augmentation can be regarded as a way of injecting noise into an image. This embodiment therefore provides a new method for estimating the consistency of the pseudo-label of a target image $I_t$, namely by evaluating the consistency of the saliency prediction probability maps of $I_t$ under several different data augmentations. The saliency prediction map $\hat{Y}_t^{j}$ generated under the j-th data augmentation $\alpha_j(\cdot)$ can be formulated as:

$$\hat{Y}_t^{j} = \alpha_j^{-1}\big(\Phi(\alpha_j(I_t))\big), \quad j = 1, \dots, N$$

where $\Phi(\cdot)$ denotes the saliency detection model with fixed parameters. Only data augmentations $\alpha_j(\cdot)$ that have an inverse $\alpha_j^{-1}(\cdot)$ are used, and $\alpha_j^{-1}(\cdot)$ is applied to each saliency prediction map to undo the effect of the augmentation, restoring it to the same conditions as the pseudo-label $\hat{Y}_t$, such as image orientation and size.
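A sketch of the aligned predictions, assuming a hypothetical callable `model` that maps an image array to a saliency map of the same spatial layout, and using flips and a 90-degree rotation as examples of reversible augmentations (the text only requires that each augmentation be invertible; this particular set is an assumption):

```python
import numpy as np

# Hypothetical reversible augmentations: (alpha_j, alpha_j^{-1}) pairs acting on the
# first two (spatial) axes of an H x W (x C) array.
AUGMENTATIONS = [
    (lambda x: x, lambda x: x),                                     # identity
    (lambda x: np.flip(x, axis=1), lambda x: np.flip(x, axis=1)),   # horizontal flip
    (lambda x: np.flip(x, axis=0), lambda x: np.flip(x, axis=0)),   # vertical flip
    (lambda x: np.rot90(x, k=1, axes=(0, 1)),                       # rotate 90 degrees
     lambda x: np.rot90(x, k=-1, axes=(0, 1))),                     # rotate back
]


def aligned_predictions(image, model, augmentations=AUGMENTATIONS):
    """Return an N x H x W stack of alpha_j^{-1}(model(alpha_j(image)))."""
    preds = [inverse(model(augment(image))) for augment, inverse in augmentations]
    return np.stack(preds, axis=0)


if __name__ == "__main__":
    toy_model = lambda x: 1.0 / (1.0 + np.exp(-x))  # pixel-wise stand-in for a network
    img = np.arange(6.0).reshape(2, 3)
    print(aligned_predictions(img, toy_model).shape)  # (4, 2, 3)
```

Because the pixel-wise toy model is unaffected by geometric transforms, all aligned maps coincide with its plain prediction; a real network would generally produce slightly different maps, and that disagreement is what the variance map measures.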

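Further, the variance is used to evaluate the consistency between the pseudo-label and the saliency probability maps predicted under different data augmentations. The variance map of sample $I_t$ can be formulated as:

$$\mathrm{Var}_t = \mathbb{E}_{j}\Big[\big(\hat{Y}_t^{j} - \mathbb{E}_{j}\big[\hat{Y}_t^{j}\big]\big)^{2}\Big], \quad j = 1, \dots, N$$

where $\mathbb{E}_{j}$ denotes averaging over the N augmented predictions, and all operations are applied pixel-wise.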
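The variance formula can be sketched with NumPy as follows; it takes the N aligned prediction maps as an N x H x W array and computes the pixel-wise population variance (dividing by N, as the averaging operation implies). The function name is illustrative:

```python
import numpy as np


def variance_map(aligned_preds):
    """Pixel-wise variance across N aligned saliency maps (N x H x W -> H x W).

    Implements Var_t = E_j[(Y_j - E_j[Y_j])^2] with E_j the mean over the N maps.
    """
    preds = np.asarray(aligned_preds, dtype=float)
    mean = preds.mean(axis=0)                 # E_j[Y_j], shape H x W
    return ((preds - mean) ** 2).mean(axis=0)  # average squared deviation
```

For two single-row maps [0, 1] and [1, 1], the result is [0.25, 0]: the first pixel, where the predictions disagree, is uncertain, while the second is not.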

This embodiment mainly performs picture-level screening. Because the saliency detection model generalizes weakly early in training but its capability gradually improves as the iterations proceed, 1) only pseudo-labels with low uncertainty are selected for training; and 2) fewer pseudo-labels are selected early on, and their number is increased slowly as the number of iterations grows. As shown in FIG. 2, the variance map reflects the pixel-level uncertainty of the target pseudo-label: lighter regions represent high uncertainty, while darker regions (close to black in FIG. 2) represent low uncertainty. To sort and screen the target-domain samples by uncertainty, we compute an image-level sample uncertainty score U as the mean of the resulting variance map. The uncertainty score of target image $I_t$ can be expressed as:

$$U_t = \frac{1}{H \times W}\sum_{h=1}^{H}\sum_{w=1}^{W}\mathrm{Var}_t^{(h,w)}$$
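The image-level score is simply the mean of the variance map; a short sketch with an illustrative function name:

```python
import numpy as np


def uncertainty_score(var_map):
    """Image-level uncertainty U_t: average of the H x W variance map Var_t."""
    var_map = np.asarray(var_map, dtype=float)
    height, width = var_map.shape
    return float(var_map.sum() / (height * width))
```

Scoring a whole target set is then a list comprehension over the per-image variance maps, e.g. `[uncertainty_score(v) for v in var_maps]`.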

Further, the target-domain samples are ranked by their uncertainty scores, and a different proportion of low-uncertainty target samples is selected in each training iteration; this proportion increases as the saliency detection model improves.

In one embodiment, prior knowledge is also introduced in the actual sample screening process to remove pseudo-labels that consist mostly of salient pixels or mostly of non-salient pixels.
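A sketch of ranking and proportional selection. The text states only that the selected proportion grows as the model improves; the linear schedule from 20% to 100% in `selection_ratio` is a hypothetical choice, as are both function names:

```python
import math

import numpy as np


def selection_ratio(iteration, num_iterations, start=0.2, end=1.0):
    """Hypothetical linear schedule for the fraction of target samples kept.

    iteration runs from 0 to num_iterations - 1; the ratio grows from start to end.
    """
    if num_iterations <= 1:
        return end
    progress = min(max(iteration / (num_iterations - 1), 0.0), 1.0)
    return start + progress * (end - start)


def select_low_uncertainty(scores, ratio):
    """Rank target samples by uncertainty score and keep the lowest ceil(ratio * M)."""
    scores = np.asarray(scores, dtype=float)
    keep = math.ceil(ratio * len(scores))
    order = np.argsort(scores, kind="stable")  # least uncertain first
    return [int(i) for i in order[:keep]]
```

A linear ramp is only one way to "slowly increase" the number of pseudo-labels; a step or exponential schedule would satisfy the same description.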
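The prior-knowledge filter might look like the sketch below. The binarization threshold of 0.5 and the bounds on the salient-pixel fraction (1% and 90%) are hypothetical values, since the text does not specify the preset range; the function names are illustrative:

```python
import numpy as np


def passes_prior(pseudo_label, low=0.01, high=0.9, threshold=0.5):
    """Prior-knowledge check on one H x W pseudo-label.

    Rejects labels in which almost all pixels are salient (fraction > high)
    or almost none are (fraction < low).
    """
    salient_fraction = float((np.asarray(pseudo_label) > threshold).mean())
    return low <= salient_fraction <= high


def prior_filter(indices, pseudo_labels, **kwargs):
    """Keep only the candidate indices whose pseudo-labels pass the prior check."""
    return [i for i in indices if passes_prior(pseudo_labels[i], **kwargs)]
```

An all-salient or all-background pseudo-label is rejected, while one with half of its pixels salient is kept.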

In this embodiment, it should be noted that although the target-domain pseudo-labels obtained in step S30 generally have a low level of uncertainty, each pseudo-label still contains some regions of high uncertainty. Each pixel of a pseudo-label is therefore treated differently during training; specifically, a pixel-level pseudo-label re-weighting strategy Ω based on the variance map Var is proposed. The target-domain pixel-level weight matrix $w_t \in (0,1]^{H \times W}$ is computed as:

$$w_t = \Omega(\mathrm{Var}_t)$$

where $k \in \mathbb{R}^{+}$ represents the magnitude by which the weight decreases; the strategy aims to assign lower weights to pixels with higher uncertainty.
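The explicit form of Ω is not reproduced in this text, so the sketch below assumes an exponential decay, Ω(Var) = exp(-k * Var), purely for illustration: it maps zero variance to weight 1, decreases monotonically as the variance grows, and its steepness is set by k > 0, matching the stated properties. The default k and the function name are hypothetical:

```python
import numpy as np


def pixel_weights(var_map, k=10.0):
    """Pixel-level pseudo-label weights in (0, 1] from a variance map.

    ASSUMPTION: Omega(Var) = exp(-k * Var) is an illustrative decreasing
    function; k > 0 controls how fast the weight drops with uncertainty.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return np.exp(-k * np.asarray(var_map, dtype=float))
```

With k = 4, a pixel whose variance reaches 0.25 (the maximum for probabilities in [0, 1]) receives weight e^-1, about 0.37, while a perfectly consistent pixel keeps weight 1.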

The method provided by this embodiment learns from synthesized but relatively clean labels and, by generating pseudo-labels in real scenes, adapts from the synthetic dataset to real data, thereby achieving unsupervised and reliable salient object detection in images. Meanwhile, the embodiment achieves excellent performance on multiple salient object detection datasets without relying on manual labels, reaching a level comparable to fully supervised salient object detection methods and thereby greatly reducing the dependence of salient object detection on pixel-level manual annotation.

Second aspect:

Referring to FIG. 3, an embodiment of the present invention further provides a salient object detection system based on unsupervised learning, comprising:

a pseudo-label obtaining unit 01, configured to obtain target-domain samples, wherein the label of each target-domain sample is a pseudo-label obtained by using the model from the previous iteration to predict on the target-domain image;

an uncertainty evaluation unit 02, configured to perform uncertainty evaluation on the pseudo-labels and sort them by uncertainty;

a screening unit 03, configured to perform picture-level screening of the pseudo-labels according to the sorting result to obtain target-domain samples whose pseudo-label uncertainty is lower than a preset threshold;

and a weighting processing unit 04, configured to perform pixel-level pseudo-label re-weighting on the target-domain samples to obtain the target-domain samples for the next training iteration.

In an embodiment, the uncertainty evaluation unit 02 is further configured to use the variance to evaluate the consistency between the pseudo-label and the saliency prediction probability maps under different data augmentations, and to generate the corresponding variance map.

In one embodiment, the data augmentation is reversible.

In an embodiment, the screening unit 03 is further configured to use prior knowledge to delete pseudo-labels whose salient-pixel area or non-salient-pixel area exceeds a preset range.

The system provided by this embodiment is used to execute the method of the first aspect, which learns from synthesized but relatively clean labels and, by generating pseudo-labels in real scenes, adapts from the synthetic dataset to real data, thereby achieving unsupervised and reliable salient object detection in images. Meanwhile, the embodiment achieves excellent performance on multiple salient object detection datasets without relying on manual labels, reaching a level comparable to fully supervised salient object detection methods and thereby greatly reducing the dependence of salient object detection on pixel-level manual annotation.

Third aspect:

An embodiment of the present invention further provides a terminal device, including:

one or more processors;

a memory coupled to the processor for storing one or more programs;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the unsupervised learning-based salient object detection method as described above.

The processor controls the overall operation of the terminal device to complete all or part of the steps of the salient object detection method based on unsupervised learning. The memory stores various types of data to support operation of the terminal device; such data may include, for example, instructions for any application or method running on the terminal device, as well as application-related data. The memory may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disk.

The terminal device may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors or other electronic components, for performing the salient object detection method based on unsupervised learning according to any of the above embodiments and achieving technical effects consistent with that method.

An embodiment of the present invention also provides a computer-readable storage medium including program instructions which, when executed by a processor, implement the steps of the salient object detection method based on unsupervised learning according to any one of the above embodiments. For example, the computer-readable storage medium may be the above memory including program instructions, which are executable by the processor of the terminal device to perform the salient object detection method based on unsupervised learning according to any one of the above embodiments and achieve technical effects consistent with that method.

While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of the invention, such changes and modifications are also intended to be within the scope of the invention.

Claims

1. A method for detecting a salient object based on unsupervised learning, comprising:

2. The salient object detection method based on unsupervised learning according to claim 1, wherein the data augmentation is reversible.

3. The salient object detection method based on unsupervised learning according to claim 1, further comprising, before the picture-level screening of the pseudo-labels according to the sorting result: using prior knowledge to delete pseudo-labels whose salient-pixel area or non-salient-pixel area exceeds a preset range.

4. A salient object detection system based on unsupervised learning, comprising:

where $U_t$ represents the uncertainty score of the target-domain image; H represents the height of the target-domain image; W represents the width of the target-domain image; h represents the vertical coordinate in the target-domain image; w represents the horizontal coordinate in the target-domain image; and $\mathrm{Var}_t^{(h,w)}$ represents the value of the variance map of the target-domain image at coordinates (h, w);

5. The salient object detection system based on unsupervised learning according to claim 4, wherein the data augmentation is reversible.

6. The salient object detection system based on unsupervised learning according to claim 4, wherein the screening unit is further configured to: use prior knowledge to delete pseudo-labels whose salient-pixel area or non-salient-pixel area exceeds a preset range.

7. A terminal device, comprising:

one or more processors;

a memory coupled to the processor for storing one or more programs;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the unsupervised learning-based salient object detection method of any one of claims 1 to 3.

8. A computer-readable storage medium having stored thereon a computer program, wherein the computer program is executed by a processor to implement the unsupervised learning-based salient object detection method according to any one of claims 1 to 3.