CN109360191B - Image significance detection method based on variational self-encoder - Google Patents

Image significance detection method based on variational self-encoder

Info

Publication number
CN109360191B
CN109360191B (application CN201811113241.1A)
Authority
CN
China
Prior art keywords
image
pixel
encoder
super
super pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811113241.1A
Other languages
Chinese (zh)
Other versions
CN109360191A (en)
Inventor
孙正兴
徐峻峰
李博
胡佳高
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN201811113241.1A priority Critical patent/CN109360191B/en
Publication of CN109360191A publication Critical patent/CN109360191A/en
Application granted granted Critical
Publication of CN109360191B publication Critical patent/CN109360191B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/90 Determination of colour characteristics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image saliency detection method based on a variational self-encoder, which comprises the following steps: first, an input image is segmented into superpixels, the boundary connectivity of each superpixel is calculated, and a background sample set is screened out according to the connectivity values; then a variational self-encoder deep network model is constructed, and the network is trained on the obtained background sample data by stochastic gradient descent; finally, all superpixels of the whole image are reconstructed with the trained variational self-encoder network, and the reconstruction errors are calculated to obtain the saliency values.

Description

Image significance detection method based on variational self-encoder
Technical Field
The invention belongs to the technical field of image processing, and relates to an image saliency detection method based on a variational self-encoder.
Background
Recently, saliency detection, which aims to find the important parts of an image, has become a hot research problem in the field of computer vision. Saliency detection is an important link in visual media processing, and provides effective support for many visual applications such as object segmentation, image retrieval, and image content editing.
Generally, existing saliency detection methods can be classified as top-down or bottom-up. Top-down approaches are task-driven and require supervised training with manually labeled ground-truth images; to better discriminate salient objects from the background, they rely on high-level information and supervised learning to improve the accuracy of the saliency map. In contrast, bottom-up approaches typically construct saliency maps from low-level cues such as features, colors, and spatial distances. The most widely applied principle is the contrast prior, which computes the color contrast and spatial contrast between a region and its surroundings to obtain the saliency value of the region. For example, a method for detecting a saliency region (201510297326.X) obtains the saliency value of an image region by computing the contrast of features in the RGB and LAB color spaces of the region. When a complex image is encountered, however, the contrast of low-level features cannot reflect a significant difference, and the detection result is poor. In addition, there are methods based on the edge prior, which assume that regions at the edges of the image are more likely to be background; for example, the patent "A saliency measure of an image object" (201711124512.9) computes the saliency value of an arbitrary object from the matching difference between a boundary image block and the nearest object image block. Indeed, the edge of an image is very likely to belong to the background, as discussed in document 1: A. Borji, D. N. Sihite, and L. Itti. Salient object detection: A benchmark. In Computer Vision -- ECCV 2012, pages 414-429. Springer, 2012. However, as with most previous methods, it is not reasonable to classify all points on the edge of an image as background points: if the target object appears at the edge of the image, the seeds chosen as background will be inaccurate and will directly lead to false saliency detection results. Meanwhile, when existing methods model the pattern of the background seed points, the limited capacity of the model leads to poor generalization of the background representation, so that parts of the background area are detected as highly salient.
As can be seen from research and applications in the field of image saliency, current single-image saliency detection methods have the following disadvantages:
(1) Only low-level image features are used for the saliency calculation, so the saliency detection result contains errors at the semantic level of the image.
(2) When the edge prior is used, a target object may be wrongly selected as a background seed point, resulting in false saliency detection results; moreover, the pattern recognition model used has limited generalization capability, so background regions are falsely detected as salient objects.
Disclosure of Invention
The purpose of the invention is as follows: in view of the defects of the prior art, the invention aims to solve the technical problem of providing an image saliency detection method based on a variational self-encoder that supports effective detection of salient objects in an image.
In order to solve the above technical problem, the invention discloses an image saliency detection method based on a variational self-encoder, which comprises the following steps:
step 1, inputting an image and screening a background sample;
step 2, training a variational self-encoder deep network model with the background samples;
and step 3, calculating superpixel reconstruction errors with the trained variational self-encoder deep network model to obtain the saliency values of the image.
The step 1 comprises the following steps:
step 1-1, segmenting the whole input image into N1 (typically 300) superpixels using the SLIC (Simple Linear Iterative Clustering) method, and calculating the average CIELab color statistical feature value of each superpixel (a code sketch of steps 1-1 and 1-2 is given after this list);
step 1-2, clustering super-pixel CIELab color statistical characteristic values by using a K-means clustering algorithm to obtain image areas, and calculating the boundary connectivity of each image area;
step 1-3, calculating the boundary connectivity of the superpixels in the region;
and 1-4, calculating the probability of the super pixel belonging to the background according to the boundary connectivity of the super pixel.
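For illustration, a minimal Python sketch of steps 1-1 and 1-2 is given below, using scikit-image for SLIC segmentation and scikit-learn for K-means; the number of K-means regions (K_REGIONS) is an assumption, since the patent does not fix it in these steps.

```python
# Sketch of steps 1-1 and 1-2 using scikit-image and scikit-learn.
# K_REGIONS is an assumed value; the patent does not specify the cluster count here.
import numpy as np
from skimage.segmentation import slic
from skimage.color import rgb2lab
from sklearn.cluster import KMeans

N1 = 300          # number of superpixels (typical value from step 1-1)
K_REGIONS = 10    # assumed number of K-means regions

def superpixels_and_regions(image_rgb):
    """image_rgb: H x W x 3 uint8 array. Returns superpixel labels,
    per-superpixel mean CIELab features, and a region id per superpixel."""
    sp_labels = slic(image_rgb, n_segments=N1, start_label=0)   # step 1-1: SLIC
    lab = rgb2lab(image_rgb)
    n_sp = sp_labels.max() + 1
    feats = np.array([lab[sp_labels == i].mean(axis=0) for i in range(n_sp)])
    # step 1-2: cluster superpixel CIELab means into image regions
    regions = KMeans(n_clusters=K_REGIONS, n_init=10).fit_predict(feats)
    return sp_labels, feats, regions
```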
In step 1-2, the boundary connectivity of each image region is calculated using the following formula:

bdconn(RG_m) = |{ SP_i : SP_i ∈ RG_m and SP_i ∈ BD }| / sqrt( |{ SP_i : SP_i ∈ RG_m }| )

wherein RG_m represents the m-th region in the image, bdconn(RG_m) represents the boundary connectivity of the m-th region, SP_i represents the i-th superpixel in the image, and BD represents the border area, i.e. the area composed of the outermost superpixels of the image.
In step 1-3, the boundary connectivity of the superpixels in the image region is calculated using the following formula:

bdconn(SP_i) = bdconn(RG_m), SP_i ∈ RG_m

wherein bdconn(SP_i) represents the boundary connectivity of the i-th superpixel; the boundary connectivity of the i-th superpixel is the same as the boundary connectivity of the image region in which it lies.
in the steps 1-4, the probability that the super-pixel belongs to the background is calculated by the following formula:
Figure GDA0002442569100000031
wherein bgPro (SP)i) Representing the probability that the ith super pixel belongs to the background,
Figure GDA0002442569100000032
representing a balance weight, selecting a probability greater than or equal to N that belongs to the background2The (typically 0.8) superpixels constitute the background sample set B.
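A minimal Python sketch of steps 1-2 to 1-4 follows, continuing from the previous sketch; the boundary-connectivity form follows the construction of the cited Zhu et al. reference, and the value of the balance weight σ (SIGMA_BD) is an assumption, while N2 = 0.8 is the typical threshold named above.

```python
# Sketch of steps 1-2 to 1-4: region boundary connectivity, superpixel boundary
# connectivity, and background probability. SIGMA_BD (the balance weight sigma)
# is an assumed value.
import numpy as np

SIGMA_BD = 1.0   # assumed balance weight sigma
N2 = 0.8         # background-probability threshold from step 1-4

def background_samples(sp_labels, regions):
    """sp_labels: H x W superpixel ids; regions: region id per superpixel.
    Returns bdconn and bgPro per superpixel and the background index set B."""
    # BD: superpixels touching the outermost image border
    border_sp = np.unique(np.concatenate(
        [sp_labels[0, :], sp_labels[-1, :], sp_labels[:, 0], sp_labels[:, -1]]))
    n_sp = regions.shape[0]
    on_border = np.isin(np.arange(n_sp), border_sp)
    bdconn_sp = np.zeros(n_sp)
    for m in np.unique(regions):
        members = (regions == m)
        # boundary connectivity of region m: border members / sqrt(region size)
        bdconn_rg = on_border[members].sum() / np.sqrt(members.sum())
        bdconn_sp[members] = bdconn_rg        # step 1-3: superpixels inherit the region value
    # step 1-4: probability of belonging to the background
    bg_pro = 1.0 - np.exp(-bdconn_sp ** 2 / (2.0 * SIGMA_BD ** 2))
    B = np.where(bg_pro >= N2)[0]             # background sample set B
    return bdconn_sp, bg_pro, B
```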
The step 2 comprises the following steps:
step 2-1, constructing a variational self-encoder deep network model with a depth of 5, comprising an input layer, a hidden layer I, a variational layer, a hidden layer III and an output layer, where every two adjacent layers are connected in a fully connected manner; the number of input layer units is N3 (typically 400), corresponding to the RGB values of the pixels contained in a superpixel, and the number of hidden layer units is N4 (typically 300); the network has a symmetric design, i.e. the network structures of the encoder part and the decoder part of the variational self-encoder are symmetric and consistent;
step 2-2, training the variational self-encoder deep network model by back propagation and stochastic gradient descent: the parameters to be trained for each hidden layer are W_j and b_j; the input is a vector x representing the pixel RGB values of a superpixel SP_i in the background sample set B, and the output of the encoding pass is a vector y; in the decoding (reconstruction) pass the input is the vector y and the output is the reconstructed vector x̂; a non-linear activation function is applied in each layer of the network. In the training process, the loss function L is defined as the sum of the encoding-decoding cross entropy and the KL divergence:

L = -E_{q(z|x)}[ log p(x|z) ] + D_KL( q(z|x) || p(z) )

where q(z|x) is the Gaussian distribution of the input vector x in the variational space z, and p(x|z) is the Gaussian distribution of the output vector x̂ in the variational space z; the first term of the loss function is the cross-entropy loss and the second term D_KL(q(z|x) || p(z)) is the KL (Kullback-Leibler) divergence. For each hidden layer, stochastic gradient descent is used to minimize L and obtain the parameters W_j and b_j.
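A minimal PyTorch sketch of the five-layer symmetric variational self-encoder of steps 2-1 and 2-2 is shown below; the latent dimension Z_DIM and the sigmoid activation are assumptions (the patent only fixes N3 = 400 input units and N4 = 300 hidden units), and the loss follows the cross-entropy plus KL-divergence form given above.

```python
# A minimal sketch of the 5-layer symmetric variational self-encoder.
# Z_DIM and the sigmoid activation are assumed; N3 and N4 come from step 2-1.
import torch
import torch.nn as nn
import torch.nn.functional as F

N3, N4, Z_DIM = 400, 300, 20   # input units, hidden units, assumed latent size

class SuperpixelVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(N3, N4)            # input layer -> hidden layer I
        self.enc_mu = nn.Linear(N4, Z_DIM)      # variational layer: mean
        self.enc_logvar = nn.Linear(N4, Z_DIM)  # variational layer: log-variance
        self.dec = nn.Linear(Z_DIM, N4)         # -> hidden layer III (symmetric)
        self.out = nn.Linear(N4, N3)            # -> output layer

    def forward(self, x):
        # x: per-superpixel pixel RGB values scaled to [0, 1]
        h = torch.sigmoid(self.enc(x))
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
        x_hat = torch.sigmoid(self.out(torch.sigmoid(self.dec(z))))
        return x_hat, mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    # L = cross-entropy reconstruction term + KL( q(z|x) || p(z) )
    ce = F.binary_cross_entropy(x_hat, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return ce + kl
```

The reparameterization z = μ + σ·ε keeps the sampling step differentiable, which is what allows the cross-entropy term and the KL term to be minimized jointly by gradient descent.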
The step 3 comprises the following steps:
step 3-1, for the i-th superpixel SP_i in the image, let x_i denote its RGB pixel values; the trained variational self-encoder deep network model is used to obtain the encoding-decoding result x̂_i corresponding to x_i, from which the variational reconstruction error ε_i of the superpixel SP_i is calculated;
step 3-2, the saliency value Sal(SP_i) of the i-th superpixel SP_i is calculated from its variational reconstruction error ε_i;
step 3-3, the saliency values of all N1 superpixels are obtained by the method of steps 3-1 to 3-2, and the saliency value of the image is obtained from the saliency values of the superpixels.
In step 3-1, the variational reconstruction error ε_i of the i-th superpixel SP_i is calculated as the reconstruction error between x_i and its encoding-decoding result x̂_i:

ε_i = || x_i - x̂_i ||^2
In step 3-2, the saliency value Sal(SP_i) of the i-th superpixel SP_i is calculated from its variational reconstruction error ε_i, a larger reconstruction error corresponding to a larger saliency value.
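A short Python sketch of steps 3-1 and 3-2 follows; the squared-L2 reconstruction error and the min-max rescaling of the saliency values are assumptions consistent with the description above, and the model is the SuperpixelVAE from the earlier sketch.

```python
# Sketch of steps 3-1 and 3-2: variational reconstruction error and saliency
# value per superpixel. The squared-L2 error and the min-max rescaling are
# assumed choices.
import torch

@torch.no_grad()
def superpixel_saliency(model, sp_vectors):
    """sp_vectors: n_sp x N3 tensor of per-superpixel pixel RGB values in [0, 1]."""
    x_hat, _, _ = model(sp_vectors)
    eps = torch.sum((sp_vectors - x_hat) ** 2, dim=1)   # variational reconstruction error
    # larger error -> more salient; rescale to [0, 1] for the saliency map
    sal = (eps - eps.min()) / (eps.max() - eps.min() + 1e-8)
    return sal
```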
According to the method, by introducing the variational self-encoder deep network and computing the image saliency in a deep feature space, the insufficient generalization capability of previous background-pattern models is overcome; meanwhile, the selection of edge background seed points is optimized by using the boundary connectivity of the superpixels, which greatly reduces the possibility that the target object is wrongly selected as a background sample and improves the accuracy of saliency detection. The invention therefore has high application value.
Advantageous effects: the invention has the following advantages: first, the method screens background samples based on boundary connectivity, obtaining a pure background sample set with few errors and improving the modeling precision of the non-salient background area of the image; second, the method constructs a new variational self-encoder deep network and trains it efficiently by stochastic gradient descent, so that the salient information of the image is explored in a deep feature space and the generalization capability of the background model is improved; finally, the invention provides an image saliency method based on variational reconstruction error, which obtains an accurate saliency detection result for the input image and improves the detection precision.
Drawings
The foregoing and other advantages of the invention will become more apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
FIG. 1 is a schematic process flow diagram of the present invention.
Fig. 2 is a schematic diagram of a variational self-encoder structure.
Fig. 3a is a schematic diagram comparing the saliency detection results of the invention on some animal images with the detection results of other saliency methods.
Fig. 3b is a schematic diagram comparing the saliency detection results of the invention on some face images with the detection results of other saliency methods.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
As shown in fig. 1, the present invention discloses an image saliency detection algorithm based on a variational self-encoder, which comprises the following steps:
step 1, screening a background sample.
Step 1.1, using document 2: R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk. SLIC superpixels compared to state-of-the-art superpixel methods. T-PAMI, 34(11):2274-2282, 2012, the SLIC (Simple Linear Iterative Clustering) method divides the whole input image into 300 superpixels, and the average CIELab color statistical feature value of each superpixel is calculated.
Step 1.2, using document 3: W. Zhu, S. Liang, Y. Wei, and J. Sun, "Saliency optimization from robust background detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2014, pp. 2814-2821, the probability that a superpixel belongs to the background is calculated. First, the boundary connectivity of each region is computed:

bdconn(RG_m) = |{ SP_i : SP_i ∈ RG_m and SP_i ∈ BD }| / sqrt( |{ SP_i : SP_i ∈ RG_m }| )

wherein RG_m represents the m-th region in the image, bdconn(RG_m) represents the boundary connectivity of the m-th region, SP_i represents the i-th superpixel in the image, and BD represents the border area. The boundary connectivity of the superpixels in each region is then calculated using the following formula:

bdconn(SP_i) = bdconn(RG_m), SP_i ∈ RG_m

i.e. the boundary connectivity of superpixel i is the same as the boundary connectivity of the region in which it lies.
Then, the probability of a superpixel belonging to the background is calculated from its boundary connectivity:

bgPro(SP_i) = 1 - exp( -bdconn(SP_i)^2 / (2σ^2) )

wherein bgPro(SP_i) represents the background probability of the i-th superpixel and σ is a balance weight. Superpixels with background probability greater than or equal to 0.8 are selected to form the background sample set B.
As shown in fig. 2, for step 2 of the present invention, the training of the variational self-encoder specifically includes:
step 2.1, constructing a variational self-encoder model: a variation self-encoder depth network model with the depth of 5 is constructed, and the variation self-encoder depth network model comprises an input layer, a hidden layer I, a variation layer, a hidden layer III and an output layer, wherein the adjacent two layers are connected in a full connection mode. Where the number of input layer elements is 400, corresponding to the pixel RGB values contained in the super-pixel. The number of hidden layer units is 300, and the network is in a symmetrical design form.
Step 2.2, training the neural network model by back propagation and stochastic gradient descent, using document 4: G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507, 2006. The parameters to be trained for each hidden layer are W_j and b_j; the input is a vector x representing the pixel RGB values of a superpixel SP_i in the background sample set B, and the output of the encoding pass is a vector y; in the decoding (reconstruction) pass the input is the vector y and the output is the reconstructed vector x̂; a non-linear activation function is applied in each layer of the network. During the training process, the loss function is defined as the sum of the encoding-decoding cross entropy and the KL divergence:

L = -E_{q(z|x)}[ log p(x|z) ] + D_KL( q(z|x) || p(z) )

where q(z|x) is the Gaussian distribution of the input vector x in the variational space z, and p(x|z) is the Gaussian distribution of the output vector x̂ in the variational space z; the first term of the loss function is the cross entropy and the second term is the KL divergence. For each hidden layer, stochastic gradient descent is used to minimize L and obtain the parameters W_j and b_j.
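A minimal sketch of the back-propagation and stochastic-gradient-descent training loop of step 2.2 is given below, reusing the SuperpixelVAE model and vae_loss from the earlier sketch; the learning rate, batch size and epoch count are assumptions not specified by the patent.

```python
# Minimal training loop for step 2.2: back propagation + stochastic gradient descent.
# Hyperparameters (epochs, lr, batch_size) are assumed values.
import torch
from torch.utils.data import DataLoader, TensorDataset

def train_vae(model, background_vectors, epochs=50, lr=1e-3, batch_size=64):
    """background_vectors: |B| x N3 tensor built from the background sample set B."""
    loader = DataLoader(TensorDataset(background_vectors),
                        batch_size=batch_size, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for (x,) in loader:
            x_hat, mu, logvar = model(x)
            loss = vae_loss(x, x_hat, mu, logvar)   # cross entropy + KL divergence
            opt.zero_grad()
            loss.backward()                         # back propagation
            opt.step()                              # stochastic gradient step
    return model
```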
Step 3, calculating the superpixel reconstruction errors to obtain the saliency values.
For each superpixel SP_i in the image, x_i denotes its RGB pixel values. For each x_i, the trained variational self-encoder deep network yields the corresponding encoding-decoding result x̂_i, from which the variational reconstruction error of the superpixel is computed:

ε_i = || x_i - x̂_i ||^2

The saliency value of the superpixel is then obtained from ε_i, a larger reconstruction error corresponding to a larger saliency value.
The results of document 5: Dingwen Zhang, Deyu Meng, and Junwei Han. 2017. Co-saliency detection via a self-paced multiple-instance learning framework. IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 5 (2017), 865-878; document 6: Dingwen Zhang, Junwei Han, Chao Li, Jingdong Wang, and Xuelong Li. 2016. Detection of co-salient objects by looking deep and wide. International Journal of Computer Vision 120, 2 (2016), 215-232. https://doi.org/10.1007/s11263-016-; and document 7: Xiaochun Cao, Zhiqiang Tao, Bao Zhang, Huazhu Fu, and Wei Feng. 2014. Self-adaptively weighted co-saliency detection via rank constraint. IEEE Transactions on Image Processing 23, 9 (2014), 4175-4186. https://doi.org/10.1109/TIP.2014. are used as comparative results to further illustrate the advantages of the effects of the present invention.
Fig. 3a compares the saliency detection results of the invention on some animal images with the detection results of other saliency methods. In fig. 3a, the first row is the original image, the second row is the ground-truth image, the third row is the result of the invention, the fourth row is the result of document 5, the fifth row is the result of document 6, and the sixth row is the result of document 7. Compared with the existing saliency detection methods, the result of the invention marks the complete contour of the salient object (the animal) better, the distribution of saliency values within the salient object is more balanced, and the saliency values of the background area (grassland, shrubs, forest, etc.) are better suppressed.
Fig. 3b compares the saliency detection results of the invention on some face images with the detection results of other saliency methods. In fig. 3b, the first row is the original image, the second row is the ground-truth image, the third row is the result of the invention, the fourth row is the result of document 5, the fifth row is the result of document 6, and the sixth row is the result of document 7. Compared with the existing saliency detection methods, the result of the invention marks the complete contour of the salient object (the face) better, the distribution of saliency values within the salient object is more balanced, the saliency values of the background area are better suppressed, and the interference of non-salient areas (the clothes and neck below the face) is reduced.
The present invention provides an image saliency detection method based on a variational self-encoder, and there are many methods and approaches for implementing this technical solution. The above description is only a preferred embodiment of the invention; it should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the invention, and such improvements and modifications should also be regarded as falling within the protection scope of the invention. All components not specified in the present embodiment can be realized by the prior art.
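Putting the pieces of the above embodiment together, an end-to-end sketch under the stated assumptions might look as follows; build_superpixel_vectors is a hypothetical helper that turns each superpixel into a fixed-length (N3 = 400) RGB vector in [0, 1], a detail the patent does not spell out here, and all other names come from the earlier sketches.

```python
# End-to-end sketch chaining the earlier sketches; build_superpixel_vectors is
# a hypothetical helper (fixed-length 400-dim RGB vector per superpixel).
import numpy as np
import torch

def detect_saliency(image_rgb):
    # step 1: superpixels, regions, and background sample set B
    sp_labels, feats, regions = superpixels_and_regions(image_rgb)
    _, _, B = background_samples(sp_labels, regions)
    sp_vectors = build_superpixel_vectors(image_rgb, sp_labels)   # assumed helper
    # step 2: train the variational self-encoder on the background samples only
    model = train_vae(SuperpixelVAE(), sp_vectors[torch.from_numpy(B)])
    # step 3: reconstruction error -> per-superpixel saliency -> pixel-level map
    sal = superpixel_saliency(model, sp_vectors)
    return sal.numpy()[sp_labels]          # H x W saliency map
```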

Claims (4)

1. An image saliency detection method based on a variational self-encoder is characterized by comprising the following steps:
step 1, inputting an image and screening a background sample;
step 2, training a variational self-encoder deep network model with the background samples;
step 3, calculating superpixel reconstruction errors with the trained variational self-encoder deep network model to obtain the saliency values of the image;
the step 1 comprises the following steps:
step 1-1, segmenting the whole input image into N1 superpixels using the SLIC simple linear iterative clustering method, and calculating the average CIELab color statistical feature value of each superpixel;
step 1-2, clustering super-pixel CIELab color statistical characteristic values by using a K-means clustering algorithm to obtain image areas, and calculating the boundary connectivity of each image area;
step 1-3, calculating the boundary connectivity of the superpixels in the image area;
1-4, calculating the probability of the super pixel belonging to the background according to the boundary connectivity of the super pixel;
in step 1-2, the boundary connectivity of each image region is calculated using the following formula:

bdconn(RG_m) = |{ SP_i : SP_i ∈ RG_m and SP_i ∈ BD }| / sqrt( |{ SP_i : SP_i ∈ RG_m }| )

wherein RG_m represents the m-th region in the image, bdconn(RG_m) represents the boundary connectivity of the m-th region, SP_i represents the i-th superpixel in the image, and BD represents the border area, i.e. the area composed of the outermost superpixels of the image;
in step 1-3, the boundary connectivity of the superpixels in the image region is calculated using the following formula:

bdconn(SP_i) = bdconn(RG_m), SP_i ∈ RG_m

wherein bdconn(SP_i) represents the boundary connectivity of the i-th superpixel, and the boundary connectivity of the i-th superpixel is the same as the boundary connectivity of the image region in which it lies;
in step 1-4, the probability that a superpixel belongs to the background is calculated by the following formula:

bgPro(SP_i) = 1 - exp( -bdconn(SP_i)^2 / (2σ^2) )

wherein bgPro(SP_i) represents the probability that the i-th superpixel belongs to the background and σ represents a balance weight; the superpixels whose probability of belonging to the background is greater than or equal to N2 are selected to constitute the background sample set B;
the step 2 comprises the following steps:
step 2-1, constructing a variational self-encoder deep network model with a depth of 5, comprising an input layer, a hidden layer I, a variational layer, a hidden layer III and an output layer, wherein every two adjacent layers are connected in a fully connected manner; the number of input layer units is N3, corresponding to the RGB values of the pixels contained in a superpixel, and the number of hidden layer units is N4; the network has a symmetric design, i.e. the network structures of the encoder part and the decoder part of the variational self-encoder are symmetric and consistent;
step 2-2, training the variational self-encoder deep network model by back propagation and stochastic gradient descent: the parameters to be trained for each hidden layer are W_j and b_j; the input is a vector x representing the pixel RGB values of a superpixel SP_i in the background sample set B, and the output of the encoding pass is a vector y; in the decoding (reconstruction) pass the input is the vector y and the output is the reconstructed vector x̂; a non-linear activation function is applied in each layer of the network; in the training process, the loss function L is defined as the sum of the encoding-decoding cross entropy and the KL divergence:

L = -E_{q(z|x)}[ log p(x|z) ] + D_KL( q(z|x) || p(z) )

where q(z|x) is the Gaussian distribution of the input vector x in the variational space z, and p(x|z) is the Gaussian distribution of the output vector x̂ in the variational space z; the first term of the loss function is the cross-entropy loss and the second term D_KL(q(z|x) || p(z)) is the KL divergence; for each hidden layer, stochastic gradient descent is used to minimize L and obtain the parameters W_j and b_j.
2. The method of claim 1, wherein step 3 comprises the following steps:
step 3-1, for the i-th superpixel SP_i in the image, x_i represents its RGB pixel values; the trained variational self-encoder deep network model is used to obtain the encoding-decoding result x̂_i corresponding to x_i, from which the variational reconstruction error ε_i of the superpixel SP_i is calculated;
step 3-2, the saliency value Sal(SP_i) of the i-th superpixel SP_i is calculated from its variational reconstruction error ε_i;
step 3-3, the saliency values of all N1 superpixels are obtained by the method of steps 3-1 to 3-2, and the saliency value of the image is obtained from the saliency values of the superpixels.
3. The method according to claim 2, wherein in step 3-1, the variational reconstruction error ε_i of the i-th superpixel SP_i is calculated as the reconstruction error between x_i and its encoding-decoding result x̂_i, i.e. ε_i = || x_i - x̂_i ||^2.
4. The method according to claim 3, wherein in step 3-2, the saliency value Sal(SP_i) of the i-th superpixel SP_i is calculated from its variational reconstruction error ε_i, a larger reconstruction error corresponding to a larger saliency value.
CN201811113241.1A 2018-09-25 2018-09-25 Image significance detection method based on variational self-encoder Active CN109360191B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811113241.1A CN109360191B (en) 2018-09-25 2018-09-25 Image significance detection method based on variational self-encoder

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811113241.1A CN109360191B (en) 2018-09-25 2018-09-25 Image significance detection method based on variational self-encoder

Publications (2)

Publication Number Publication Date
CN109360191A CN109360191A (en) 2019-02-19
CN109360191B true CN109360191B (en) 2020-06-12

Family

ID=65351366

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811113241.1A Active CN109360191B (en) 2018-09-25 2018-09-25 Image significance detection method based on variational self-encoder

Country Status (1)

Country Link
CN (1) CN109360191B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109993208B (en) * 2019-03-04 2020-11-17 北京工业大学 Clustering processing method for noisy images
JP7183904B2 (en) * 2019-03-26 2022-12-06 日本電信電話株式会社 Evaluation device, evaluation method, and evaluation program
CN110517759B (en) * 2019-08-29 2022-03-25 腾讯医疗健康(深圳)有限公司 Method for determining image to be marked, method and device for model training
CN114677368B (en) * 2022-04-19 2022-10-25 中国人民解放军32021部队 Image significance detection method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529419A (en) * 2016-10-20 2017-03-22 北京航空航天大学 Automatic detection method for significant stack type polymerization object in video
CN108171701A (en) * 2018-01-15 2018-06-15 复旦大学 Conspicuousness detection method based on U networks and confrontation study

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9870503B2 (en) * 2015-12-31 2018-01-16 International Business Machines Corporation Visual object and event detection and prediction system using saccades

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529419A (en) * 2016-10-20 2017-03-22 北京航空航天大学 Automatic detection method for significant stack type polymerization object in video
CN108171701A (en) * 2018-01-15 2018-06-15 复旦大学 Conspicuousness detection method based on U networks and confrontation study

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Background Prior-Based Salient Object Detection via Deep Reconstruction Residual; Junwei Han et al.; IEEE Transactions on Circuits and Systems for Video Technology; 2015-08-31; Vol. 25, No. 8; pp. 1311-1313 *
Co-saliency Detection via Sparse Reconstruction and Co-salient Object Discovery; Bo Li et al.; PCM 2017: Advances in Multimedia Information Processing; 2018-05-10; pp. 224-225 *
Saliency Optimization from Robust Background Detection; Wangjiang Zhu et al.; CVPR 2014; 2014-12-31; pp. 2-4 *
Subitizing with Variational Autoencoders; Rijnder Wever et al.; arXiv:1808.00257v1 [cs.CV]; 2018-08-01; pp. 3-5 *

Also Published As

Publication number Publication date
CN109360191A (en) 2019-02-19

Similar Documents

Publication Publication Date Title
Zhang et al. Triplet-based semantic relation learning for aerial remote sensing image change detection
CN109360191B (en) Image significance detection method based on variational self-encoder
CN110428432B (en) Deep neural network algorithm for automatically segmenting colon gland image
CN109241913B (en) Ship detection method and system combining significance detection and deep learning
US9042648B2 (en) Salient object segmentation
CN106897681B (en) Remote sensing image contrast analysis method and system
Cao et al. A new change-detection method in high-resolution remote sensing images based on a conditional random field model
CN108960404B (en) Image-based crowd counting method and device
Cao et al. Automatic change detection in high-resolution remote-sensing images by means of level set evolution and support vector machine classification
CN105122308A (en) Systems and methods for multiplexed biomarker quantitation using single cell segmentation on sequentially stained tissue
CN107784663A (en) Correlation filtering tracking and device based on depth information
CN108629286B (en) Remote sensing airport target detection method based on subjective perception significance model
US20220215548A1 (en) Method and device for identifying abnormal cell in to-be-detected sample, and storage medium
WO2016066042A1 (en) Segmentation method for commodity picture and device thereof
Liu et al. A night pavement crack detection method based on image‐to‐image translation
CN113591968A (en) Infrared weak and small target detection method based on asymmetric attention feature fusion
CN106157330B (en) Visual tracking method based on target joint appearance model
Yang et al. Classified road detection from satellite images based on perceptual organization
CN116645592B (en) Crack detection method based on image processing and storage medium
Lizarazo et al. Fuzzy segmentation for object‐based image classification
CN112258525A (en) Image abundance statistics and population recognition algorithm based on bird high frame frequency sequence
CN111881915A (en) Satellite video target intelligent detection method based on multiple prior information constraints
Wang et al. Salient object detection by robust foreground and background seed selection
CN110910497B (en) Method and system for realizing augmented reality map
Pang et al. Salient object detection via effective background prior and novel graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant