CN110443257B - Significance detection method based on active learning - Google Patents

Significance detection method based on active learning

Info

Publication number
CN110443257B
CN110443257B
Authority
CN
China
Prior art keywords
sample
samples
score
image
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910609780.2A
Other languages
Chinese (zh)
Other versions
CN110443257A (en)
Inventor
张立和
闵一璠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN201910609780.2A
Publication of CN110443257A
Application granted
Publication of CN110443257B
Legal status: Active (current)
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/2163Partitioning the feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of artificial intelligence, and provides a saliency detection method based on active learning. By considering both the uncertainty and the diversity of unlabeled samples, the method selects the samples most beneficial to model training, adds them to the training set, and trains a final KSR model that outputs an initial saliency map of the test sample. Then, in order to optimize the object boundaries of the saliency map, a superpixel-level post-processing method is designed to further improve performance. The invention reduces labeling cost while also reducing redundancy of the training set, thereby greatly improving the experimental results compared with the original KSR model. Meanwhile, comparative experiments show that the performance of the method is superior to that of many classical algorithms.

Description

Significance detection method based on active learning
Technical Field
The invention belongs to the technical field of artificial intelligence, relates to computer vision, and in particular to an image saliency detection method.
Background
With the rapid economic and technological development of modern society, people constantly receive large amounts of fragmented information, among which image and video information is the most important. How to process image data quickly and effectively has therefore become a pressing problem. Typically, people attend only to the regions of an image that most attract the human eye, i.e., foreground regions or salient objects, while ignoring background regions; computers are therefore used to simulate the human visual system to perform saliency detection. Research on saliency can now be widely applied across computer vision, including image retrieval, image classification, object recognition, and image segmentation.
The goal of saliency detection is to accurately detect the salient object in an image. Saliency detection algorithms based on supervised learning share a common problem: model training usually requires a large amount of manually labeled data, labeling salient regions consumes substantial resources, and many training samples carry redundant information that adversely affects model accuracy.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to remedy the shortcomings of existing methods, an image saliency detection method based on active learning is provided, achieving higher model accuracy with fewer training samples.
The technical scheme of the invention is as follows:
a significance detection method based on active learning comprises the following steps:
(1) firstly, randomly selecting 500 images from the MSRA database and adding them to a training set L as the initial training set, respectively generating region candidate segmentations (proposals) of all the images, and extracting the CNN features of all the region candidate segmentation regions;
(2) defining positive and negative samples among the region candidate segmentations, and designing a confidence score to measure each sample against the foreground and background of the ground-truth map, the confidence score being:

conf_i = ξ · A_i + (1 − ξ) · C_i

the score combining two precomputed terms A and C, A being the accuracy score

A_i = |O_i ∩ G| / |O_i|

and C the coverage score

C_i = |O_i ∩ G| / |G|

where O_i represents the target candidate segmentation of the i-th sample, G the ground-truth map of the image, and ξ the weight used to balance the accuracy score and the coverage score; in the method, samples with a confidence value above 0.9 are set as positive samples and samples with a confidence value below 0.6 as negative samples; because the number of positive samples is found to be far smaller than the number of negative samples when the confidence value is calculated, all positive samples are used and an equal number of negative samples is randomly selected; for training the ranking support vector machine, all positive and negative samples are formed into positive and negative sample pairs, a positive sample minus a negative sample being defined as a positive pair, and the reverse as a negative pair;
using the formula:

min_{L,w} Σ_{n=1}^{P} ℓ( y_n · w^T L (φ(x_{i_n}) − φ(x_{j_n})) ) + (μ/2)||w||² + (λ/2)||L||_F²

joint training of the ranking support vector machine and subspace learning is performed, yielding a ranker KSR that ranks the region candidate segmentations of a sample by saliency, the top-ranked segmentations having high similarity to the foreground; w is the ranking coefficient of the ranking support vector machine;

ℓ(t) = (1/a) log(1 + e^{−a·t})

is the logistic loss function, where a is a loss-function parameter and e the exponential function; φ(x_i) denotes the feature x_i of a sample after kernel mapping; P is the number of pair constraints over sample pairs (x_{i_n}, x_{j_n}); (i_n, j_n) are the indices of the samples in the n-th pair constraint; y_n ∈ {+1, −1} indicates whether the two samples of the n-th pair belong to the same class (both foreground or both background) or to different classes; L ∈ R^{l×d} (l < d) is the learned mapping matrix, where d is the dimension of the kernel-mapped features and l the dimension of the learned subspace; μ and λ represent regularization parameters;
selecting training samples by active learning: firstly, the initially generated model ranks the target candidate segmentations of all samples in the unlabeled pool by saliency, the ranking score being obtained according to s_i = w^T P k_i, where, to simplify the joint-training computation, L = Pφ(X)^T is introduced, with P ∈ R^{l×N}, N the number of samples and φ(X)^T the kernel operation, a kernel function being introduced in the simplification process; all ranking scores s_i are normalized using the formula

s̄_i = (s_i − s_min) / (s_max − s_min)

where s̄_i denotes the normalized ranking score, s_min the smallest ranking score in the image's set of ranking scores and s_max the largest; for the target candidate segmentations of all images, those with normalized ranking scores between 0.4 and 0.9 are found:

X_p = { x_i | 0.4 ≤ s̄_i ≤ 0.9 }

where X_p represents the selected set of target candidate segmentations and X the set of all target candidate segmentations of the image; by the formula

β = card(X_p) / card(X)

the ratio β of the number of the image's target candidate segmentations scoring between 0.4 and 0.9 to all its target candidate segmentations is calculated, where card(X) represents the number of target candidate segmentations of the image and card(X_p) the number in set X_p; this ratio is taken as the uncertainty value of each image, so that the set of uncertainty values B = {β_1, β_2, …, β_n} is obtained for all unlabeled samples in the pool; the samples of high uncertainty are selected for manual labeling and added to the training set, each selection being made by the threshold

β > μ_0 + λ_0 · δ

where μ_0 is the mean of the uncertainty set B, δ the standard deviation of the set, and λ_0 a weight parameter chosen as λ_0 = 1.145; each selection thus gathers the samples whose uncertainty β exceeds μ_0 + λ_0δ into the set Q_uc; a density clustering algorithm is applied to the images in Q_uc with the optimal parameter ε = 0.05 and MinPts set to 2, so that samples are grouped into one class whenever 2 or more samples fall within the ε-neighborhood of a point; after clustering, high-density sample clusters C = {c_1, c_2, … c_n} and clusters containing only one isolated sample O = {o_1, o_2, … o_m} are obtained, the final image set being partitioned as Q_uc = {c_i, i = 1,2,…n} ∪ {o_i, i = 1,2,…m}; by the formula

U_t = argmax_{x ∈ c_t} β(x)

the sample U_t with the greatest uncertainty is selected from each high-density cluster c_t; in addition, all isolated samples are selected and added to the candidate set Q, such sample points increasing the generalization ability of the training model; the final candidate set is Q = {U_t, t = 1,…n} ∪ {o_i, i = 1,…m}; the sample set Q represents the samples selected by a selection model that considers uncertainty and diversity simultaneously, and is added to the training set L after manual labeling;
(3) manually labeling the selected sample set Q, then adding the labeled set Q to the training set L and training the ranker KSR again with the updated training set L; verifying the performance of the newly trained model on a validation set and repeating step (2) until the performance of the model changes only slightly or degrades; taking the training set selected at the last iteration as the final training set and the trained model as the final training model; ranking the region candidate segmentations of each test image by saliency with this model and selecting the top-16 region candidate segmentations for weighted fusion to obtain the saliency map M_p of the image.
(4) The saliency map M_p obtained from step (3) still handles the edge details of the object insufficiently, so the invention provides a processing method at the superpixel level to optimize the boundary. First, the SLIC superpixel segmentation algorithm is applied with the number of segmented superpixel blocks set to 100, 150 and 200 respectively, the segmented superpixel blocks forming the superpixel set SP_i of image i, and the CNN feature x_j of each superpixel block is extracted separately. The saliency map M_p obtained from step (3) is binarized as the prior saliency map E_i. Positive and negative superpixel samples are then determined; to make the confidence highest, the superpixels lying entirely within the foreground region of the prior saliency map E_i form the positive sample set PO_i, and the superpixels lying entirely within the background region of E_i form the negative sample set N_i. Positive and negative sample pairs are formed from the positive and negative sample sets, and the formula

min_{L,w} Σ_{n=1}^{P} ℓ( y_n · w^T L (φ(x_{i_n}) − φ(x_{j_n})) ) + (μ/2)||w||² + (λ/2)||L||_F²

is used to train a model KSR_i for image i. With this model KSR_i, all superpixels of the image are scored using the formula s_i = w^T P k_i and sorted as S = {s_1, s_2, … s_n}: the higher the score, the closer to the foreground; conversely, the lower the score, the closer to the background. The score of each pixel is obtained from the superpixel containing it, all scores are normalized to between 0 and 1, and the superpixel-level saliency map M_s is finally synthesized through weighted fusion. The final saliency map is obtained by the formula M = w_1 × M_p + w_2 × M_s, where M is the final saliency map, M_p the original saliency map and M_s the superpixel-level saliency map.
The invention has the following beneficial effects: the saliency detection algorithm based on active learning provided by the invention applies the idea of active learning to the field of saliency detection; by considering the uncertainty and diversity of samples, it selects from the unlabeled sample set the samples most beneficial to model training, adds them to the training set, and trains the final KSR model, which outputs an initial saliency map of the test sample. Then, in order to optimize the object boundaries of the saliency map, a superpixel-level post-processing method is designed to further improve performance. The invention reduces labeling cost while reducing redundancy of the training set, thereby greatly improving the experimental results compared with the original KSR model. Meanwhile, comparative experiments show that the performance of the method is superior to that of many classical algorithms.
Drawings
FIG. 1 is a basic flow diagram of the present invention.
Fig. 2 is an initial saliency map resulting from the application of active learning to the KSR model.
Fig. 3 is a final saliency map resulting from applying super-pixel level post-processing fusion to an initial saliency map.
Detailed Description
The following further describes a specific embodiment of the present invention with reference to the drawings and technical solutions.
The conception of the invention is as follows: the training process of supervised learning usually requires a large amount of manually labeled data, labeling salient regions consumes substantial resources, and redundant information in many training samples adversely affects model accuracy. Active learning uses a selection mechanism to pick the samples carrying more information for training, achieving higher model accuracy with fewer training samples. On this basis, the invention combines the idea of Active Learning (AL) with the Kernelized Subspace Ranking (KSR) algorithm and designs a pool-based active learning strategy: by considering the uncertainty and diversity of unlabeled samples, the samples with larger information content are selected to participate in training, thereby reducing the number of training samples and the labeling cost.
The method extracts convolutional neural network (CNN) features of object-level region candidate segmentations (proposals), jointly learns a ranker via subspace mapping and a ranking support vector machine, uses the ranker to rank the region candidate segmentations of a test image by saliency, and performs weighted fusion of the top-ranked region candidate segmentations to obtain a saliency map. Finally, in order to optimize the object boundaries of the saliency map, the invention designs a superpixel-level post-processing method to further improve performance.
The invention is implemented as follows:
(1) First, 500 images are randomly selected from the MSRA database and added to the training set L as the initial training set; region candidate segmentations (proposals) of all images are generated respectively, and the CNN features of all proposal regions are extracted.
(2) The algorithm defines positive and negative samples among the region candidate segmentations and designs a confidence score to measure each sample against the foreground and background of the ground-truth map; the confidence value is:

conf_i = ξ · A_i + (1 − ξ) · C_i

The score combines two precomputed terms A and C, A being the accuracy score:

A_i = |O_i ∩ G| / |O_i|

and C the coverage score:

C_i = |O_i ∩ G| / |G|

where O_i denotes the target candidate segmentation of the i-th sample, G the ground-truth map of the image, and ξ the weight used to balance the accuracy score and the coverage score. In the present algorithm, samples with a confidence value above 0.9 are set as positive samples and samples with a confidence value below 0.6 as negative samples. Because the number of positive samples is far smaller than the number of negative samples when the confidence value is computed, all positive samples are used and an equal number of negative samples is randomly selected. To train the ranking support vector machine, all positive and negative samples are formed into positive and negative sample pairs: a positive sample minus a negative sample is defined as a positive pair, and the reverse as a negative pair.
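As a concrete illustration, the following Python sketch computes the accuracy, coverage and confidence terms on binary masks and performs the balanced positive/negative split. The linear form conf = ξ·A + (1 − ξ)·C follows the reconstruction above, and the value ξ = 0.5 is an assumption; the patent does not state it.

```python
import numpy as np

def confidence_score(proposal, gt, xi=0.5):
    """conf = xi * A + (1 - xi) * C on boolean masks of equal shape."""
    inter = np.logical_and(proposal, gt).sum()
    A = inter / max(proposal.sum(), 1)   # accuracy: fraction of proposal on target
    C = inter / max(gt.sum(), 1)         # coverage: fraction of target covered
    return xi * A + (1 - xi) * C

def split_samples(proposals, gt, seed=0):
    """Label proposals, then randomly balance negatives to the positive count."""
    rng = np.random.default_rng(seed)
    conf = np.array([confidence_score(p, gt) for p in proposals])
    pos = np.where(conf > 0.9)[0]
    neg = np.where(conf < 0.6)[0]
    neg = rng.choice(neg, size=min(len(pos), len(neg)), replace=False)
    return pos, neg
```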
Using the formula:

min_{L,w} Σ_{n=1}^{P} ℓ( y_n · w^T L (φ(x_{i_n}) − φ(x_{j_n})) ) + (μ/2)||w||² + (λ/2)||L||_F²

joint training of the ranking support vector machine and subspace learning is performed, yielding a ranker KSR that ranks the region candidate segmentations of a sample by saliency, the top-ranked segmentations having high similarity to the foreground; w is the ranking coefficient of the ranking support vector machine;

ℓ(t) = (1/a) log(1 + e^{−a·t})

is the logistic loss function, where a is a loss-function parameter and e the exponential function; φ(x_i) denotes the feature x_i of a sample after kernel mapping; P is the number of pair constraints over sample pairs (x_{i_n}, x_{j_n}); (i_n, j_n) are the indices of the samples in the n-th pair constraint; y_n ∈ {+1, −1} indicates whether the two samples of the n-th pair belong to the same class (both foreground or both background) or to different classes; L ∈ R^{l×d} (l < d) is the learned mapping matrix, where d is the dimension of the kernel-mapped features and l the dimension of the learned subspace; μ and λ represent the regularization parameters.
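Once L = Pφ(X)^T reduces the objective to kernel values, it can be minimized by plain gradient descent over w and P, as in the minimal sketch below. The optimizer, step size, initialization scale and iteration count are assumptions for illustration, not the patent's actual solver.

```python
import numpy as np

def ksr_train(K, pairs, y, l_dim=64, a=1.0, mu=0.1, lam=0.1, lr=1e-3, iters=500, seed=0):
    """Gradient-descent sketch of
    sum_n (1/a) log(1 + exp(-a y_n w^T P (k_i - k_j))) + mu/2 ||w||^2 + lam/2 ||P||_F^2,
    using L = P phi(X)^T so that L phi(x_i) = P k_i.

    K     : (N, N) kernel matrix of the training samples
    pairs : list of (i, j) index pairs in positive-minus-negative order
    y     : sequence of +1/-1 pair labels
    """
    rng = np.random.default_rng(seed)
    N = K.shape[0]
    P = rng.standard_normal((l_dim, N)) * 0.01     # subspace projection (l x N)
    w = rng.standard_normal(l_dim) * 0.01          # ranking coefficients
    for _ in range(iters):
        gw, gP = mu * w, lam * P                   # regularizer gradients
        for (i, j), yn in zip(pairs, y):
            d = K[:, i] - K[:, j]                  # kernel-vector difference k_i - k_j
            z = P @ d                              # projected pair difference
            t = np.clip(yn * (w @ z), -50, 50)     # margin of the n-th constraint
            s = -yn / (1.0 + np.exp(a * t))        # d(logistic loss)/d(margin)
            gw += s * z
            gP += s * np.outer(w, d)
        w -= lr * gw
        P -= lr * gP
    return w, P

def rank_scores(K_pool, w, P):
    """Saliency ranking score s_i = w^T P k_i for each column k_i of K_pool."""
    return w @ (P @ K_pool)
```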
Training samples are selected by active learning. First, the initially generated model ranks the target candidate segmentations of all samples in the unlabeled pool by saliency, the ranking score being obtained according to s_i = w^T P k_i; to simplify the joint-training computation, L = Pφ(X)^T is introduced, where P ∈ R^{l×N}, N is the number of samples and φ(X)^T is the kernel operation, a kernel function being introduced in the simplification process. All ranking scores s_i are normalized using the formula

s̄_i = (s_i − s_min) / (s_max − s_min)

where s̄_i denotes the normalized ranking score, s_min the smallest ranking score in the image's set of ranking scores and s_max the largest. For the target candidate segmentations of all images, those with normalized ranking scores between 0.4 and 0.9 are found:

X_p = { x_i | 0.4 ≤ s̄_i ≤ 0.9 }

where X_p represents the selected set of target candidate segmentations and X the set of all target candidate segmentations of the image. By the formula

β = card(X_p) / card(X)

the ratio β of the number of the image's target candidate segmentations scoring between 0.4 and 0.9 to all its target candidate segmentations is calculated, where card(X) represents the number of target candidate segmentations of the image and card(X_p) the number in set X_p. This ratio is taken as the uncertainty value of each image; in this way the set of uncertainty values B = {β_1, β_2, …, β_n} is obtained for all unlabeled samples in the pool.
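For illustration, the uncertainty β of one image can be computed from its proposal ranking scores as in the following minimal sketch; the small guard against a constant score set is an assumption.

```python
import numpy as np

def image_uncertainty(scores, lo=0.4, hi=0.9):
    """beta = card(X_p) / card(X): share of an image's proposals whose
    normalized ranking score falls in the ambiguous band [lo, hi]."""
    s = np.asarray(scores, dtype=float)
    s_norm = (s - s.min()) / max(s.max() - s.min(), 1e-12)  # (s - s_min)/(s_max - s_min)
    return float(np.mean((s_norm >= lo) & (s_norm <= hi)))
```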
The samples of high uncertainty are selected for manual labeling and added to the training set. Each selection is made by the threshold

β > μ_0 + λ_0 · δ

where μ_0 is the mean of the uncertainty set B, δ the standard deviation of the set, and λ_0 a weight parameter chosen as λ_0 = 1.145; each selection thus gathers the samples whose uncertainty β exceeds μ_0 + λ_0δ into the set Q_uc. A density clustering algorithm is applied to the images in Q_uc; experiments give the optimal parameter ε = 0.05, MinPts is set to 2, and samples are grouped into one class whenever 2 or more samples fall within the ε-neighborhood of a point. After clustering, high-density sample clusters C = {c_1, c_2, … c_n} and clusters containing only one isolated sample O = {o_1, o_2, … o_m} are obtained, so the final image set is partitioned as Q_uc = {c_i, i = 1,2,…n} ∪ {o_i, i = 1,2,…m}. By the formula

U_t = argmax_{x ∈ c_t} β(x)

the sample U_t with the greatest uncertainty is selected from each high-density cluster c_t; besides, all isolated samples are selected and added to the candidate set Q, such sample points increasing the generalization ability of the training model. The final candidate set is Q = {U_t, t = 1,…n} ∪ {o_i, i = 1,…m}. The sample set Q represents the samples selected by a selection model designed to consider uncertainty and diversity simultaneously; after manual labeling, they are added to the training set L.
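The selection step can be sketched as follows. scikit-learn's DBSCAN is used here as a stand-in for the density clustering algorithm (the patent does not name the implementation), and the feature space used to cluster images is treated as given, which is an assumption.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def select_queries(betas, feats, lam0=1.145, eps=0.05, min_pts=2):
    """Uncertainty + diversity selection.

    betas : (n,) per-image uncertainty values
    feats : (n, d) image features used for density clustering (assumed given)
    """
    betas = np.asarray(betas)
    thr = betas.mean() + lam0 * betas.std()          # mu_0 + lambda_0 * delta
    uc = np.where(betas > thr)[0]                    # high-uncertainty pool Q_uc
    if len(uc) == 0:
        return []
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(feats[uc])
    picked = []
    for c in set(labels):
        members = uc[labels == c]
        if c == -1:                                  # DBSCAN outliers ~ isolated samples
            picked.extend(members.tolist())
        else:                                        # most uncertain sample per cluster
            picked.append(int(members[np.argmax(betas[members])]))
    return picked
```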
(3) The selected sample set Q is manually labeled and then added to the training set L; the ranker KSR is trained again with the updated training set L, and the performance of the newly trained model is verified on a validation set. Step (2) is repeated until the performance of the model changes only slightly or degrades; the training set selected at the last iteration is taken as the final training set, and the trained model as the final training model. This model ranks the region candidate segmentations of each test image by saliency, and the top-16 region candidate segmentations are selected for weighted fusion, obtaining the saliency map M_p of the image.
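Step (3) can be organized as the loop sketched below. Here fit, evaluate and select_batch are hypothetical stand-ins for the KSR trainer, the validation metric and the selector above, and the score-proportional weights in the top-16 fusion are an assumption, since the text says only "weighted fusion".

```python
import numpy as np

def active_training_loop(fit, evaluate, select_batch, train_set, pool,
                         max_rounds=10, tol=1e-4):
    """Retrain, validate, and stop once validation performance plateaus or drops."""
    model, best = fit(train_set), -np.inf
    for _ in range(max_rounds):
        score = evaluate(model)
        if score <= best + tol:                 # little change or degradation: stop
            break
        best = score
        batch = select_batch(model, pool)       # uncertainty + diversity queries
        pool = [x for x in pool if x not in batch]
        train_set = train_set + batch           # added after manual annotation
        model = fit(train_set)
    return model

def fuse_top_proposals(masks, scores, k=16):
    """Weighted fusion of the top-k ranked proposal masks into a saliency map M_p."""
    s = np.asarray(scores, dtype=float)
    s = (s - s.min()) / max(s.max() - s.min(), 1e-12)
    order = np.argsort(s)[::-1][:k]
    w = s[order] / max(s[order].sum(), 1e-12)
    M = np.zeros_like(np.asarray(masks[0], dtype=float))
    for wi, idx in zip(w, order):
        M += wi * np.asarray(masks[idx], dtype=float)
    return (M - M.min()) / max(M.max() - M.min(), 1e-12)
```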
(4) The saliency map M_p obtained from step (3) still handles the edge details of the object insufficiently, so the invention provides a processing method at the superpixel level to optimize the boundary. First, the SLIC superpixel segmentation algorithm is applied with the number of segmented superpixel blocks set to 100, 150 and 200 respectively, the segmented superpixel blocks forming the superpixel set SP_i of image i, and the CNN feature x_j of each superpixel block is extracted separately. The saliency map M_p obtained from step (3) is binarized as the prior saliency map E_i. Positive and negative superpixel samples are then determined; to make the confidence highest, the superpixels lying entirely within the foreground region of the prior saliency map E_i form the positive sample set PO_i, and the superpixels lying entirely within the background region of E_i form the negative sample set N_i. Positive and negative sample pairs are formed from the positive and negative sample sets, and the formula

min_{L,w} Σ_{n=1}^{P} ℓ( y_n · w^T L (φ(x_{i_n}) − φ(x_{j_n})) ) + (μ/2)||w||² + (λ/2)||L||_F²

is used to train a model KSR_i for image i. With this model KSR_i, all superpixels of the image are scored using the formula s_i = w^T P k_i and sorted as S = {s_1, s_2, … s_n}: the higher the score, the closer to the foreground; conversely, the lower the score, the closer to the background. The score of each pixel is obtained from the superpixel containing it, all scores are normalized to between 0 and 1, and the superpixel-level saliency map M_s is finally synthesized through weighted fusion. The final saliency map is obtained by the formula M = w_1 × M_p + w_2 × M_s, where M is the final saliency map, M_p the original saliency map and M_s the superpixel-level saliency map, with w_1 set to 1 and w_2 to 0.3.
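A minimal sketch of this post-processing is given below, assuming skimage's SLIC for superpixel segmentation. extract_feat and score_superpixel are hypothetical stand-ins for the CNN feature extractor and the per-image ranker KSR_i (whose training from the binarized prior E_i is folded into score_superpixel's construction and not shown); averaging the three scales and the binarization threshold 0.5 are assumptions.

```python
import numpy as np
from skimage.segmentation import slic

def superpixel_refine(img, M_p, extract_feat, score_superpixel,
                      n_segments=(100, 150, 200), w1=1.0, w2=0.3):
    """Score SLIC superpixels at three scales, average them into M_s,
    and fuse with the proposal-level map: M = w1 * M_p + w2 * M_s."""
    maps = []
    for n in n_segments:
        seg = slic(img, n_segments=n, start_label=0)
        s = np.zeros_like(M_p, dtype=float)
        for j in range(seg.max() + 1):
            mask = seg == j                              # pixels of superpixel j
            s[mask] = score_superpixel(extract_feat(img, mask))
        s = (s - s.min()) / max(s.max() - s.min(), 1e-12)  # normalize to [0, 1]
        maps.append(s)
    M_s = np.mean(maps, axis=0)                          # superpixel-level map M_s
    return w1 * M_p + w2 * M_s
```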

Claims (1)

1. A significance detection method based on active learning is characterized by comprising the following steps:
(1) firstly, randomly selecting 500 images from the MSRA database and adding them to a training set L as the initial training set, respectively generating region candidate segmentations of all the images, and extracting the CNN features of all the region candidate segmentation regions;
(2) defining positive and negative samples among the region candidate segmentations, and designing a confidence value to measure each sample against the foreground and background of the ground-truth map, the confidence value being:

conf_i = ξ · A_i + (1 − ξ) · C_i

the score combining two precomputed terms A and C, A being the accuracy score

A_i = |O_i ∩ G| / |O_i|

and C the coverage score

C_i = |O_i ∩ G| / |G|

where O_i represents the target candidate segmentation of the i-th sample, G the ground-truth map of the image, and ξ the weight used to balance the accuracy score and the coverage score; in the method, samples with a confidence value above 0.9 are set as positive samples and samples with a confidence value below 0.6 as negative samples; because the number of positive samples is far smaller than the number of negative samples when the confidence value is calculated, all positive samples are used and an equal number of negative samples is randomly selected; for training the ranking support vector machine, all positive and negative samples are formed into positive and negative sample pairs, a positive sample minus a negative sample being defined as a positive pair, and the reverse as a negative pair;
using the formula:

min_{L,w} Σ_{n=1}^{P} ℓ( y_n · w^T L (φ(x_{i_n}) − φ(x_{j_n})) ) + (μ/2)||w||² + (λ/2)||L||_F²

to perform joint training of the ranking support vector machine and subspace learning, a ranker KSR being obtained through training, the ranker ranking the region candidate segmentations of the sample by saliency, the top-ranked segmentations having high similarity to the foreground; w is the ranking coefficient of the ranking support vector machine;

ℓ(t) = (1/a) log(1 + e^{−a·t})

is the logistic loss function, a being a loss-function parameter and e the exponential function; φ(x_i) denotes the feature x_i of a sample after kernel mapping; P is the number of pair constraints over sample pairs (x_{i_n}, x_{j_n}); (i_n, j_n) are the indices of the samples in the n-th pair constraint; y_n ∈ {+1, −1} indicates whether the samples of the pair belong to the same class or to different classes; L ∈ R^{l×d} (l < d) is the learned mapping matrix, d being the dimension of the kernel-mapped features and l the dimension of the learned subspace; μ and λ represent regularization parameters;
selecting training samples by active learning: firstly, ranking the target candidate segmentations of all samples in the unlabeled pool by saliency with the initially generated model, the ranking score being obtained according to s_i = w^T P k_i, where, to simplify the joint-training computation, L = Pφ(X)^T is introduced, with P ∈ R^{l×N}, N the number of samples and φ(X)^T the kernel operation, a kernel function being introduced in the simplification process; normalizing all ranking scores s_i using the formula

s̄_i = (s_i − s_min) / (s_max − s_min)

where s̄_i denotes the normalized ranking score, s_min the smallest ranking score in the image's set of ranking scores and s_max the largest; finding, for the target candidate segmentations of all images, those with normalized ranking scores between 0.4 and 0.9:

X_p = { x_i | 0.4 ≤ s̄_i ≤ 0.9 }

X_p representing the selected set of target candidate segmentations and X the set of all target candidate segmentations of the image; calculating by the formula

β = card(X_p) / card(X)

the ratio β of the number of the image's target candidate segmentations scoring between 0.4 and 0.9 to all its target candidate segmentations, card(X) representing the number of target candidate segmentations of the image and card(X_p) the number in set X_p; taking this ratio as the uncertainty value of each image, thereby obtaining the set of uncertainty values B = {β_1, β_2, …, β_n} for all unlabeled samples in the pool; selecting the samples of high uncertainty in B for manual labeling and adding them to the training set, each selection being made by the threshold

β > μ_0 + λ_0 · δ

where μ_0 is the mean of the uncertainty set B, δ the standard deviation of the set B, and λ_0 a weight parameter chosen as λ_0 = 1.145, such that the samples whose uncertainty β exceeds μ_0 + λ_0δ form the set Q_uc; applying a density clustering algorithm to the images in Q_uc with the parameter ε = 0.05 and MinPts set to 2, samples being grouped into one class whenever 2 or more samples fall within the ε-neighborhood of a point; obtaining after clustering high-density sample clusters C = {c_1, c_2, … c_n} and clusters containing only one isolated sample O = {o_1, o_2, … o_m}, the final image set being partitioned as Q_uc = {c_i, i = 1,2,…n} ∪ {o_i, i = 1,2,…m}; selecting by the formula

U_t = argmax_{x ∈ c_t} β(x)

the sample U_t with the greatest uncertainty from each high-density cluster c_t, and in addition adding all isolated samples to the candidate set Q, such sample points increasing the generalization ability of the training model; the final candidate set being Q = {U_t, t = 1,…n} ∪ {o_i, i = 1,…m}; the sample set Q representing the samples selected by a selection model that considers uncertainty and diversity simultaneously, which are added to the training set L after manual labeling;
(3) manually labeling the selected sample set Q, then adding the labeled set Q to the training set L and training the ranker KSR again with the updated training set L; verifying the performance of the newly trained model on a validation set and repeating step (2) until the performance of the model changes only slightly or degrades; taking the training set selected at the last iteration as the final training set and the trained model as the final training model; ranking the region candidate segmentations of each test image by saliency with this model and selecting the top-16 region candidate segmentations for weighted fusion to obtain the saliency map M_p of the image;
(4) providing a processing method at the superpixel level to achieve the aim of optimizing the boundary: firstly, applying the SLIC superpixel segmentation algorithm with the number of segmented superpixel blocks set to 100, 150 and 200 respectively, the segmented superpixel blocks forming the superpixel set SP_i of image i, and extracting separately the CNN feature x_j of each superpixel block; binarizing the saliency map M_p obtained from step (3) as the prior saliency map E_i; determining positive and negative superpixel samples, wherein, to make the confidence highest, the superpixels lying entirely within the foreground region of the prior saliency map E_i form the positive sample set PO_i and the superpixels lying entirely within the background region of E_i form the negative sample set N_i; forming positive and negative sample pairs from the positive and negative sample sets, and using the formula

min_{L,w} Σ_{n=1}^{P} ℓ( y_n · w^T L (φ(x_{i_n}) − φ(x_{j_n})) ) + (μ/2)||w||² + (λ/2)||L||_F²

to train a model KSR_i for image i; scoring all superpixels of the image with this model KSR_i using the formula s_i = w^T P k_i and sorting all superpixels as S = {s_1, s_2, … s_n}, a higher score being closer to the foreground and, conversely, a lower score closer to the background; obtaining the score of each pixel from the superpixel containing it, normalizing all scores to between 0 and 1, and finally obtaining the superpixel-level saliency map M_s through weighted fusion; the final saliency map being obtained by the formula M = w_1 × M_p + w_2 × M_s, where M is the final saliency map, M_p the original saliency map and M_s the superpixel-level saliency map.
CN201910609780.2A 2019-07-08 2019-07-08 Significance detection method based on active learning Active CN110443257B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910609780.2A CN110443257B (en) 2019-07-08 2019-07-08 Significance detection method based on active learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910609780.2A CN110443257B (en) 2019-07-08 2019-07-08 Significance detection method based on active learning

Publications (2)

Publication Number Publication Date
CN110443257A CN110443257A (en) 2019-11-12
CN110443257B (en) 2022-04-12

Family

ID=68429598

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910609780.2A Active CN110443257B (en) 2019-07-08 2019-07-08 Significance detection method based on active learning

Country Status (1)

Country Link
CN (1) CN110443257B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112149688A (en) * 2020-09-24 2020-12-29 北京汽车研究总院有限公司 Image processing method and device, computer readable storage medium, computer device
CN114118413A (en) * 2021-11-30 2022-03-01 上海商汤临港智能科技有限公司 Network training and equipment control method, device, equipment and storage medium
CN114120048B (en) * 2022-01-26 2022-05-13 中兴通讯股份有限公司 Image processing method, electronic device, and computer-readable storage medium
CN114332489B (en) * 2022-03-15 2022-06-24 江西财经大学 Image salient target detection method and system based on uncertainty perception
CN115880268B (en) * 2022-12-28 2024-01-30 南京航空航天大学 Method, system, equipment and medium for detecting inferior goods in plastic hose production

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103927394A (en) * 2014-05-04 2014-07-16 苏州大学 Multi-label active learning classification method and system based on SVM
US9414048B2 (en) * 2011-12-09 2016-08-09 Microsoft Technology Licensing, Llc Automatic 2D-to-stereoscopic video conversion
CN107103608A (en) * 2017-04-17 2017-08-29 大连理工大学 A kind of conspicuousness detection method based on region candidate samples selection
CN107133955A (en) * 2017-04-14 2017-09-05 大连理工大学 A kind of collaboration conspicuousness detection method combined at many levels

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9414048B2 (en) * 2011-12-09 2016-08-09 Microsoft Technology Licensing, Llc Automatic 2D-to-stereoscopic video conversion
CN103927394A (en) * 2014-05-04 2014-07-16 苏州大学 Multi-label active learning classification method and system based on SVM
CN107133955A (en) * 2017-04-14 2017-09-05 大连理工大学 A kind of collaboration conspicuousness detection method combined at many levels
CN107103608A (en) * 2017-04-17 2017-08-29 大连理工大学 A kind of conspicuousness detection method based on region candidate samples selection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Kernelized Subspace Ranking for Saliency Detection; Tiantian Wang et al.; ECCV 2016; 2016-09-17; full text *
Kernel nearest-neighbor convex hull classifier based on kernel subspace sample selection; Zhou Xiaofei et al.; Computer Engineering and Applications; 2007-12-31; Vol. 43, No. 32; full text *

Also Published As

Publication number Publication date
CN110443257A (en) 2019-11-12

Similar Documents

Publication Publication Date Title
CN110443257B (en) Significance detection method based on active learning
CN111814584B (en) Vehicle re-identification method based on multi-center measurement loss under multi-view environment
Endres et al. Category-independent object proposals with diverse ranking
CN111368683B (en) Face image feature extraction method and face recognition method based on modular constraint CenterFace
CN111445488B (en) Method for automatically identifying and dividing salt body by weak supervision learning
CN110837836A (en) Semi-supervised semantic segmentation method based on maximized confidence
CN111553837B (en) Artistic text image generation method based on neural style migration
CN107392919B (en) Adaptive genetic algorithm-based gray threshold acquisition method and image segmentation method
CN110647907B (en) Multi-label image classification algorithm using multi-layer classification and dictionary learning
CN109033978B (en) Error correction strategy-based CNN-SVM hybrid model gesture recognition method
CN111612008A (en) Image segmentation method based on convolution network
CN110008899B (en) Method for extracting and classifying candidate targets of visible light remote sensing image
CN112364791B (en) Pedestrian re-identification method and system based on generation of confrontation network
CN111833322B (en) Garbage multi-target detection method based on improved YOLOv3
CN105574545B (en) The semantic cutting method of street environment image various visual angles and device
CN110633727A (en) Deep neural network ship target fine-grained identification method based on selective search
CN112541491A (en) End-to-end text detection and identification method based on image character region perception
CN112287935B (en) Image semantic segmentation method and system based on significance prior
CN111815582B (en) Two-dimensional code region detection method for improving background priori and foreground priori
CN113705579A (en) Automatic image annotation method driven by visual saliency
CN112364881A (en) Advanced sampling consistency image matching algorithm
CN114550134A (en) Deep learning-based traffic sign detection and identification method
CN108765384B (en) Significance detection method for joint manifold sequencing and improved convex hull
CN113469270B (en) Semi-supervised intuitive clustering method based on decomposition multi-target differential evolution superpixel
CN115294424A (en) Sample data enhancement method based on generation countermeasure network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant