CN111611999B - Saliency detection method and terminal fusing small-size depth generation model - Google Patents

Saliency detection method and terminal fusing small-size depth generation model

Info

Publication number
CN111611999B
CN111611999B (application number CN202010443235.3A)
Authority
CN
China
Prior art keywords
background
saliency map
vector
layer
saliency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010443235.3A
Other languages
Chinese (zh)
Other versions
CN111611999A (en)
Inventor
叶锋
陈星宇
陈利文
郑子华
陈家祯
翁彬
黄添强
林新棋
吴献
蒋佳龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian Normal University
Original Assignee
Fujian Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian Normal University filed Critical Fujian Normal University
Priority to CN202010443235.3A priority Critical patent/CN111611999B/en
Publication of CN111611999A publication Critical patent/CN111611999A/en
Application granted granted Critical
Publication of CN111611999B publication Critical patent/CN111611999B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/251Fusion techniques of input or preprocessed data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a saliency detection method and a terminal fusing a small depth generation model. A background-block reselection process is applied to the obtained two-layer saliency map so that the selected background blocks have higher reliability; the background blocks form a background seed vector, a diffusion matrix is constructed from the background vector, and the saliency information carried by the background vector is diffused more effectively through a diffusion process, yielding a background saliency map based on the background prior. The background saliency map and the two-layer saliency map are fused nonlinearly to obtain a saliency map S_f. Meanwhile, a saliency map S_d generated by the trained small generator model and the saliency map S_f are fused through a designed fusion algorithm to obtain the final saliency map S. The method performs better when a salient region touches the image boundary: the foreground of the final saliency map is more complete and brighter, and the background is suppressed more effectively.

Description

Saliency detection method and terminal fusing small-size depth generation model
Technical Field
The invention relates to the fields of image processing and deep neural networks, and in particular to a saliency detection method and terminal fusing a small depth generation model.
Background
When viewing an image, a person can quickly focus on the regions that are most appealing while ignoring other, less important areas. In computer vision, extracting the regions of an image that attract human attention by simulating the human visual system is called saliency detection. In 1998, Itti first proposed a saliency detection algorithm based on the Koch framework. Since then, saliency detection has received more and more attention as a powerful way to accelerate computer processing, and it is increasingly applied to tasks such as image retrieval, image classification, image segmentation, image compression, and object detection and recognition. Existing saliency detection methods can be divided into two categories by design: bottom-up and top-down. The former mainly uses low-level image features such as texture, color, position, and object contour to design a detection model that computes the saliency value of each region of the image; as such, it is data-driven. The latter is designed for a specific computing task and generally requires supervised training combined with a specific target; it can be said to be task-driven. By application, saliency detection methods can be divided into fixation prediction and salient region detection. The former predicts the fixation points of human vision; the latter aims to display the salient regions of an image completely while effectively suppressing the background regions.
Many recently proposed saliency detection methods use color contrast information in an image. To address the problem that salient regions cannot be detected completely by color information alone, Yang et al. proposed a graph-based manifold ranking method for detecting salient objects in 2013: with selected foreground or background seed vectors as queries, the ranking value of the similarity between each region of the graph and the seed vectors is taken as that region's saliency value, from which a saliency map is generated. In 2015, Jiang et al. proposed an improvement that raises the performance of saliency detection methods based on the manifold-ranking diffusion process: through a deep analysis of the internal relation between the diffusion process and spectral clustering, the diffusion matrix is reconstructed so that the saliency information carried by the seed vector propagates through it more effectively. Building on the work of Jiang et al., Ye et al. proposed a saliency detection method using multi-level image features. These methods can highlight the interior of salient regions to some extent, but the detected salient region is still often incomplete and its confidence low. Meanwhile, saliency detection can also improve performance by means of high-level priors, and many saliency detection methods use the background prior. However, most methods (including those enumerated above) apply the background prior simplistically, taking the edge region around the image as the background region; as a result, false detections occur if a salient object touches the image boundary.
With the rapid development of deep learning, more and more researchers apply machine learning methods to saliency detection. Zhao et al. proposed a multi-context deep saliency detection method: a convolutional neural network (CNN) extracts high-level image features, salient-region detection is combined with local and global contexts, and the influence of four different pre-training strategies on the final result is discussed. Hou et al. proposed a deeply supervised saliency detection method with short connections: short-connection structures are added to the HED (Holistically-nested Edge Detection) model to adapt it to saliency detection, the interior and boundary of the detected salient object are kept consistent, and the model performs well in terms of time overhead. Lee et al. proposed an efficient deep learning framework for accurate saliency detection, combining the hand-crafted low-level features of traditional methods with the high-level abstract features extracted by a deep neural network. Li et al. proposed a method that combines the saliency detection task with an image segmentation task: by optimizing both task objectives simultaneously, the parameters of the shared convolutional layers in the model are updated, yielding a better saliency detection result. Ji et al. attempted saliency detection with Generative Adversarial Networks (GANs). A GAN comprises a generator and a discriminator: the generator extracts high-level image features and generates a saliency map from them, while the discriminator takes the generated saliency map and the corresponding ground-truth map as input and is trained to distinguish which is generated and which is the truth. In mutual adversarial training both sides improve continually, the generator producing better and better saliency maps and the discriminator acquiring ever higher discrimination ability; finally, the trained generator model is used for saliency detection. At present, depth-model-based methods achieve good detection results compared with traditional methods, but they suffer from difficult model training, large final model size, and slow detection speed.
Disclosure of Invention
The invention aims to provide a saliency detection method fusing a small depth generation model, and a corresponding terminal.
The technical scheme adopted by the invention is as follows:
A saliency detection method fusing a small depth generation model comprises the following steps:
S1: segmenting the image into superpixels with the SLIC (Simple Linear Iterative Clustering) algorithm, taking each superpixel block as a graph node and the color-feature difference between every two superpixels as the weight of a graph edge, thereby converting the original image into a graph structure;
S2: obtaining a seed vector using a simple background prior, a center prior, and color distribution features, and building a diffusion matrix from the converted graph structure according to the inverse of the Laplacian matrix and the spectral clustering principle; diffusing the obtained seed vector through the diffusion matrix to obtain a preliminary saliency map;
S3: taking the obtained preliminary saliency map as input, repeating step S2, and obtaining a two-layer saliency map through the diffusion process;
S4: selecting background blocks from the two-layer saliency map according to the Fisher criterion, forming the selected background blocks into a background vector, constructing a diffusion matrix from the background vector, and obtaining a background saliency map by the diffusion method;
S5: generating a saliency map S_f from the two-layer saliency map and the background saliency map through a nonlinear fusion algorithm;
S6: designing a brand-new discriminator network and a small generator network based on the generative adversarial network framework, and training the designed networks manually in stages according to a specified procedure;
S7: inputting the original image into the trained small generator model to obtain a saliency map S_d, then fusing the saliency maps S_f and S_d through the designed fusion algorithm to obtain the final saliency map S.
Further, the specific steps of S4 are:
S4-1: define the background block search interval as [l, r], where the values of l and r are given by formula (1):
l = 1, r = ⌊sp/δ⌋ (1)
wherein l is the minimum number of background blocks in the image; r is the maximum number of background blocks in the image; sp denotes the total number of superpixels generated after the image is segmented by the SLIC algorithm; and δ is a parameter controlling the obtainable number of background blocks;
S4-2: initialize the position indicator variable p, the inter-class difference ratio variable f, the variable f_mx storing the maximum of f, and the vector-element variable v to 0, and initialize the background-block count variable Bg to l - 1;
S4-3: sort the input two-layer saliency vector y in ascending order and store the result as the vector y';
S4-4: increase Bg by 1; when Bg is greater than r, go to step S4-8; otherwise, execute step S4-5;
S4-5: assign the Bg-th element of the vector y' to the variable v; the elements of y' less than or equal to v form the vector m, and the elements greater than v form the vector n;
S4-6: based on the idea of the Fisher criterion, compute the f value as
f = (ag(m) - ag(n))^2 / (va(m) + va(n)) (2)
wherein ag(·) is the mean of the samples within a class and va(·) is the variance of the samples within a class;
S4-7: when the value of f is greater than f_mx, update f_mx to f, update p to Bg, and go to step S4-4; when the value of f is not greater than f_mx, go directly to step S4-4;
S4-8: assign the value of p to Bg; the first Bg elements of the vector y' form the background vector b; return the variable Bg and the vector b;
the seed vector s and the diffusion matrix A^(-1) constructed from the background vector b are diffused according to formula (3) to obtain the background saliency vector y_b; assigning the value of each element of y_b to the corresponding superpixel block then generates the background saliency map; formula (3) reads:
y = s × A^(-1) (3)
wherein s denotes the seed vector and A^(-1) denotes the diffusion matrix constructed from the background vector b.
Furthermore, the value of sp in formula (1) is far greater than 12, and δ takes the value 12; the numerator in formula (2) represents the inter-class difference and the denominator the intra-class difference, and the saliency row vector y' is binary-classified on the basis of the final ratio f.
Further, the specific steps of step S5 are: the background saliency vector y_b and the two-layer saliency vector y_sc are fused nonlinearly as shown in formula (4) to obtain the saliency vector y_fn, and the value of each element of y_fn is assigned to the corresponding superpixel block to obtain the saliency map S_f; formula (4) reads:
y_fn = (0.5y_b + 0.5y_sc) × e^(-5y_sc) (4)
wherein 0.5y_b + 0.5y_sc jointly weighs the background saliency map and the two-layer saliency map, and e^(-5y_sc) adjusts their fusion result as an introduced nonlinear factor.
Further, the fixed parameter -5 in the term e^(-5y_sc) of formula (4) was determined experimentally.
Further, the step S6 specifically includes:
S6-1: construct a discriminator network and a small generator network for saliency detection based on the generative adversarial network framework;
the discriminator model comprises 11 convolution modules and 5 pooling modules; each convolution module comprises a convolution layer, a batch normalization layer, and a nonlinear activation function layer; the kernel size of each pooling layer is 2 × 2 with stride 2, and the pooling function is max pooling;
the small depth generation model comprises 15 convolution modules and 5 transposed-convolution modules, wherein the stride of all convolution layers in the network and the up-sampling rate of all transposed convolution layers are set to 1; each convolution module comprises a convolution layer, a batch normalization layer, and a nonlinear activation function layer; each transposed-convolution module comprises a transposed convolution layer, a batch normalization layer, and a nonlinear activation function layer;
S6-2: perform manual staged training through the given training procedure; the training of the generative adversarial network proceeds alternately, fixing one side to train the other.
Further, in step S6-1, the number of convolution kernels of all convolution layers and transposed convolution layers in the generator model is set to 64, and all convolution kernels are of size 3 × 3; the number of channels of the first-layer convolution kernels is set to 3; in the discriminator, the number of convolution kernels is doubled in the convolution layer following each pooling layer.
Further, in the first round of network training in step S6-2, the generator is fixed to train the discriminator, after which training alternates; in the training algorithm, the labels corresponding to ground-truth maps input to the discriminator are set to 0.9 and the labels corresponding to saliency maps generated by the generator are set to -0.9; the images in the data set are organized into matrix-array form and input into the network for training.
Further, in S7, the specific steps of the fusion algorithm that fuses the saliency map S_f and the saliency map S_d into the final saliency map S are as follows:
S7-1: input the saliency map S_d detected by the small generator network and the saliency map S_f;
S7-2: initialize the variable i to 0 and the matrix variables S and temp to null;
S7-3: invoke the MCA2 algorithm to fuse S_d with itself, and store the output saliency map in the variable temp;
S7-4: increase i by 1; when i is greater than 4, go to step S7-7; otherwise, execute the next step;
S7-5: invoke the MCA2 algorithm to fuse S_f with temp, and store the output saliency map in the matrix variable S;
S7-6: update temp to S and return to step S7-4;
S7-7: output the matrix variable S; after the fusion algorithm completes, the output variable S is the final saliency map.
The invention also provides a saliency detection terminal fusing a small depth generation model, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the computer program generated according to the above saliency detection method fusing a small depth generation model.
By adopting the above technical scheme, the invention performs a background-block reselection process on the obtained two-layer saliency map so that the selected background blocks have higher reliability; the background blocks form a background seed vector, a diffusion matrix is constructed from the background vector, and the saliency information carried by the background vector is diffused more effectively through the diffusion process, yielding a background saliency map based on the background prior. The background saliency map and the two-layer saliency map are fused nonlinearly to obtain a saliency map S_f. Meanwhile, the saliency map S_d generated by the trained small generator model and the saliency map S_f are fused through the designed fusion algorithm to obtain the final saliency map S. Compared with recent salient object detection algorithms on common data sets, the method performs better when a salient region touches the image boundary, and thanks to the better use of the background prior the final detection result improves at both the subjective level and the objective-index level: the foreground of the final saliency map is more complete and brighter, and the background is suppressed more effectively, which addresses the prior-art problems of incomplete salient object detection, low confidence of the salient region, and false detections when the salient object touches the image edge region. In addition, because a small generator network is designed and trained for saliency detection, the model size is only 2.4 MB with only about 670,000 parameters. Combined with the detection result obtained from the depth model, the final detection result of the algorithm improves markedly on every objective evaluation index, while also solving the prior-art problems of deep-neural-network-based models being large and slow at detection.
Drawings
The invention is described in further detail below with reference to the drawings and the detailed description.
Fig. 1 is a flowchart of the image saliency detection method fusing a small depth generation model according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the image saliency detection terminal fusing a small depth generation model according to an embodiment of the present invention;
Description of reference numerals: 1. memory; 2. processor.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
As shown in fig. 1, the most critical concepts of the present invention are: 1) a background-block reselection process is performed on the obtained two-layer saliency map so that the selected background blocks have higher reliability; the background blocks form a background seed vector, a diffusion matrix is constructed from the background vector, and the saliency information carried by the background vector is better diffused through the diffusion process, yielding a background saliency map based on the background prior; 2) a small generator model is designed and trained for saliency detection; 3) a fusion algorithm fuses the saliency map S_d generated by the small generator model with the saliency map S_f.
Referring to fig. 1, the present invention provides an image saliency detection method fusing a small depth generation model, comprising:
S1: segmenting the image into superpixels with the SLIC algorithm, taking each superpixel block as a graph node and the color-feature difference between every two superpixels as the weight of a graph edge, thereby converting the original image into a graph structure;
S2: obtaining seed vectors using a simple background prior, a center prior, and color distribution features, and building a diffusion matrix from the converted graph structure according to the inverse of the Laplacian matrix and the spectral clustering principle; diffusing the obtained seed vectors through the diffusion matrix to obtain a preliminary saliency map;
S3: taking the obtained preliminary saliency map as input, repeating step S2, and obtaining a two-layer saliency map through the diffusion process;
S4: selecting background blocks from the two-layer saliency map according to the Fisher criterion, forming the selected background blocks into a background vector, constructing a diffusion matrix from the background vector, and obtaining a background saliency map by the diffusion method;
S5: generating a saliency map S_f from the two-layer saliency map and the background saliency map through a nonlinear fusion algorithm;
S6: designing a brand-new discriminator network and a small generator network based on the generative adversarial network framework, and training the designed networks manually in stages according to a specified procedure;
S7: inputting the original image into the trained small generator model to obtain a saliency map S_d, then fusing the saliency maps S_f and S_d through the designed fusion algorithm to obtain the final saliency map S.
From the above description, it can be seen that the present invention provides a saliency detection method and terminal fusing a small depth generation model. A background-block reselection process is performed on the obtained two-layer saliency map so that the selected background blocks have higher reliability; the background blocks form a background seed vector, a diffusion matrix is constructed from the background vector, and the saliency information carried by the background vector is diffused more effectively through the diffusion process, yielding a background saliency map based on the background prior. The background saliency map and the two-layer saliency map are fused nonlinearly to obtain a saliency map S_f. Meanwhile, the saliency map S_d generated by the trained small generator model and the saliency map S_f are fused through the designed fusion algorithm to obtain the final saliency map S. Compared with recent salient object detection algorithms on common data sets, the method performs better when a salient region touches the image boundary, and thanks to the better use of the background prior the final detection result improves at both the subjective level and the objective-index level: the foreground of the final saliency map is more complete and brighter, and the background is suppressed more effectively, which addresses the prior-art problems of incomplete salient object detection, low confidence of the salient region, and false detections when the salient object touches the image edge region. In addition, because a small generator network is designed and trained for saliency detection, the model size is only 2.4 MB with only about 670,000 parameters. Combined with the detection result obtained from the depth model, the final detection result of the algorithm improves markedly on every objective evaluation index, while also solving the prior-art problems of deep-neural-network-based models being large and slow at detection.
Further, S1 specifically is: the image is segmented into superpixels by the SLIC algorithm, each superpixel block serves as a graph node, and the color-feature difference between every two superpixels serves as the weight of a graph edge, so that the superpixel image is converted into a graph structure, as the sketch below illustrates.
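A minimal Python sketch of this step, assuming scikit-image is available; the number of superpixels and the Gaussian bandwidth sigma2 are illustrative choices, not values fixed by the method.

    import numpy as np
    from skimage.color import rgb2lab
    from skimage.segmentation import slic

    def build_superpixel_graph(image_rgb, n_segments=300, sigma2=10.0):
        """S1: SLIC superpixels -> weighted graph (labels, node features, weights)."""
        labels = slic(image_rgb, n_segments=n_segments, start_label=0)
        lab = rgb2lab(image_rgb)
        n = labels.max() + 1
        # Mean Lab color of each superpixel block serves as the node feature.
        feats = np.array([lab[labels == i].mean(axis=0) for i in range(n)])

        # Two superpixels are adjacent if their pixel regions touch.
        adj = np.zeros((n, n), dtype=bool)
        adj[labels[:, :-1], labels[:, 1:]] = True
        adj[labels[:-1, :], labels[1:, :]] = True
        adj |= adj.T
        np.fill_diagonal(adj, False)

        # Edge weight: Gaussian of the color-feature difference between nodes.
        dist = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)
        W = np.where(adj, np.exp(-dist ** 2 / (2 * sigma2)), 0.0)
        return labels, feats, W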
Further, S2 specifically is: seed vectors are obtained using a simple background prior, a center prior, and color distribution features, and a diffusion matrix is built from the converted graph structure according to the inverse of the Laplacian matrix and the spectral clustering principle. The obtained seed vector is diffused through the diffusion matrix to obtain a preliminary saliency map.
Further, S3 specifically is: the obtained preliminary saliency map is taken as input, step S2 is repeated, and a two-layer saliency map is obtained through the diffusion process. A code sketch of this diffusion follows.
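The diffusion of steps S2 and S3 can be sketched as follows. This is a hedged illustration: the exact seed-vector construction from the background prior, center prior, and color distribution is abstracted away, and the damping factor alpha is an assumption added so that the Laplacian-based matrix stays invertible (the text only states that the inverse Laplacian is used).

    import numpy as np

    def diffusion_matrix(W, alpha=0.99):
        """Inverse-Laplacian diffusion matrix A^(-1) built from the graph weights W."""
        D = np.diag(W.sum(axis=1))       # degree matrix of the graph
        return np.linalg.inv(D - alpha * W)

    def diffuse(seed, A_inv):
        """Formula (3): y = s x A^(-1), propagating the seed's saliency information."""
        return seed @ A_inv              # row-vector convention

    # Two-layer saliency (S3): derive a new seed from the preliminary map and
    # diffuse once more. seed_from() stands in for the seed construction, which
    # is not reproduced in this sketch.
    # y1 = diffuse(seed, A_inv)            # S2: preliminary saliency map
    # y2 = diffuse(seed_from(y1), A_inv)   # S3: two-layer saliency map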
Further, S4 specifically is: the Fisher criterion separates two sample sets of different classes by making the difference between samples of different classes as large as possible while keeping the difference between samples of the same class as small as possible, i.e., by maximizing the ratio of the inter-class difference to the intra-class difference. Likewise, the key problem to be solved here is how to divide the two-layer saliency vector y into a background vector and a foreground vector (denoted, without loss of generality, by m and n respectively) as accurately as possible. The background-block reselection algorithm is therefore designed using the idea of the Fisher criterion, and it finally returns the number of background blocks Bg and the background vector b. The specific steps of the algorithm are given below, followed by a code sketch.
(1) Define the background block search interval as [l, r], where the values of l and r are given by formula (1):
l = 1, r = ⌊sp/δ⌋ (1)
In formula (1), l is the minimum number of background blocks in the image; experience shows that hardly any image is entirely foreground, and it is assumed that at least one background block exists after superpixel segmentation, so the initial value is 1. r (rounded down) is the maximum number of background blocks in the image; sp (far larger than 12) denotes the total number of superpixels generated after the image is segmented by the SLIC algorithm, and δ is a parameter controlling the obtainable number of background blocks. As formula (1) shows, the larger the value of δ, the smaller the value of r and the smaller the selectable range of background blocks. Because the number of background blocks in an image is not constant, extensive experiments with different values of δ were carried out on 5 common data sets; the results show that the algorithm performs best when δ is 12, so δ is set to 12 here, which in turn determines the right boundary r.
(2) Initialize the position indicator variable p, the inter-class difference ratio variable f, the variable f_mx storing the maximum of f, and the vector-element variable v to 0, and initialize the background-block count variable Bg to l - 1.
(3) Sort the input two-layer saliency vector y in ascending order and store the result as the vector y'.
(4) Increase Bg by 1; if Bg is larger than r, go to step (8); otherwise, continue with step (5).
(5) Assign the Bg-th element of the vector y' to the variable v; the elements of y' less than or equal to v form the vector m, and the elements greater than v form the vector n.
(6) Based on the idea of the Fisher criterion, the f value is defined as follows:
f = (ag(m) - ag(n))^2 / (va(m) + va(n)) (2)
In formula (2), ag(·) is the mean of the samples within a class and va(·) is the variance of the samples within a class. The numerator thus represents the inter-class difference and the denominator the intra-class difference, and the saliency row vector y' is binary-classified on the basis of the final ratio f. As formula (2) makes clear, the larger the f value, the more accurately the number of background blocks is selected, so background and foreground can be separated well. The value of the variable f is calculated according to formula (2).
(7) If f is greater than f_mx, update f_mx to f and update p to Bg, then go to step (4); otherwise, go directly to step (4).
(8) Assign the value of p to Bg; the first Bg elements of the vector y' form the background vector b; return the variable Bg and the vector b.
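As a concrete illustration of steps (1)-(8), the following numpy sketch implements the reselection loop, assuming l = 1 and r = ⌊sp/δ⌋ as reconstructed for formula (1); the small epsilon and the empty-class guard are numerical safeguards added for this sketch, not part of the original algorithm.

    import numpy as np

    def reselect_background(y, delta=12):
        """Background-block reselection of steps (1)-(8): returns Bg and b."""
        sp = len(y)                        # total number of superpixels
        l, r = 1, sp // delta              # search interval [l, r] of formula (1)
        y_sorted = np.sort(y)              # step (3): ascending order -> y'

        f_mx, p = 0.0, l                   # step (2): initialization
        for bg in range(l, r + 1):         # steps (4)-(7): scan the interval
            v = y_sorted[bg - 1]           # step (5): Bg-th element of y'
            m = y_sorted[y_sorted <= v]    # candidate background class
            n = y_sorted[y_sorted > v]     # candidate foreground class
            if n.size == 0:
                break
            # Step (6), formula (2): Fisher ratio of inter-class difference
            # to intra-class difference (epsilon avoids division by zero).
            f = (m.mean() - n.mean()) ** 2 / (m.var() + n.var() + 1e-12)
            if f > f_mx:                   # step (7)
                f_mx, p = f, bg
        return p, y_sorted[:p]             # step (8): Bg and background vector b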
The seed vector s and the diffusion matrix A^(-1) constructed from the background vector b are diffused according to formula (3) to obtain the background saliency vector y_b, and the value of each element of y_b is assigned to the corresponding superpixel block:
y = s × A^(-1) (3)
Further, S5 specifically is: the background saliency vector y_b and the two-layer saliency vector y_sc are fused nonlinearly as shown in formula (4). In the formula, 0.5y_b + 0.5y_sc jointly weighs the background saliency map and the two-layer saliency map, while e^(-5y_sc) adjusts their fusion result as an introduced nonlinear factor; the fixed parameter -5 was determined experimentally. The value of each element of the obtained saliency vector y_fn is assigned to the corresponding superpixel block to obtain the saliency map S_f:
y_fn = (0.5y_b + 0.5y_sc) × e^(-5y_sc) (4)
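Formula (4) translates directly into code; a one-function numpy sketch:

    import numpy as np

    def nonlinear_fusion(y_b, y_sc):
        """Formula (4): nonlinear fusion of the background and two-layer vectors."""
        return (0.5 * y_b + 0.5 * y_sc) * np.exp(-5.0 * y_sc)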
Further, S6 specifically is: (1) a brand-new discriminator network and a small generator network are designed for saliency detection based on the generative adversarial network framework. The specific structures of the two networks are shown in Tables 1 and 2, respectively.
TABLE 1 Detailed architecture of the discriminator model
[Table 1 is reproduced as an image in the original publication.]
TABLE 2 Detailed architecture of the small generator model
[Table 2 is reproduced as an image in the original publication.]
As can be seen from Table 1, the discriminator model contains 11 convolution modules and 5 pooling modules. Each convolution module comprises three parts: a convolution layer (Conv), a batch normalization layer (BN), and a nonlinear activation function layer (Leaky ReLU). The kernel size of each pooling layer is 2 × 2 with stride 2, and the pooling function is max pooling, so the length and width of the output feature map are halved after each pooling operation; if the length or width of the feature map input to a pooling layer is odd, the corresponding dimension of the output feature map is rounded down. In addition, because pooling changes the spatial dimensions of the feature map, the number of convolution kernels in the convolution layer after each pooling is doubled, i.e., the number of channels of the new feature map doubles, growing up to 512. To simplify the network design, images input to the discriminator network are resized uniformly to 224 × 224, and the output feature-map size of each layer is shown in the last column of the table. The input of the discriminator comprises a ground-truth map with its label and a generator-produced saliency map with its label; the final Loss layer of the model computes the loss value of the discriminator's judgment.
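A PyTorch sketch of these building blocks follows. Since the layer-by-layer arrangement of Table 1 is only available as an image, the grouping of the 11 convolution modules between the 5 pooling layers (2-2-2-2-3) and the single-channel input are assumptions; the final loss layer is omitted.

    import torch.nn as nn

    def conv_module(c_in, c_out):
        """Conv + BN + Leaky ReLU, the three-part module described above."""
        return nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
            nn.BatchNorm2d(c_out),
            nn.LeakyReLU(0.2, inplace=True),
        )

    class Discriminator(nn.Module):
        """11 conv modules and 5 max-pooling layers; channels double after each
        pooling up to 512. The (2, 2, 2, 2, 3) grouping and the 1-channel input
        are assumptions of this sketch."""

        def __init__(self, in_ch=1, widths=(64, 128, 256, 512, 512),
                     convs=(2, 2, 2, 2, 3)):
            super().__init__()
            layers, c = [], in_ch
            for w, k in zip(widths, convs):
                for _ in range(k):
                    layers.append(conv_module(c, w))
                    c = w
                layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
            self.features = nn.Sequential(*layers)

        def forward(self, x):              # x: (B, in_ch, 224, 224)
            return self.features(x)        # the loss layer of Table 1 is omitted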
As can be seen from Table 2, the small depth generation model comprises 15 convolution modules and 5 transposed-convolution modules; the total number of parameters is only about 670,000, and the trained model occupies only 2.4 MB. The stride of all convolution layers in the network and the up-sampling rate of all transposed convolution layers are set to 1. Each convolution module contains three parts: a convolution layer (Conv), a batch normalization layer (BN), and a nonlinear activation function layer (Leaky ReLU); analogously, each transposed-convolution module comprises a transposed convolution layer (Convt), a batch normalization layer, and a nonlinear activation function layer. Following the VGG network model, which performs excellently in image classification tasks, the number of convolution kernels in the first layer is 64; meanwhile, to strictly control the model size, the number of convolution kernels of all convolution layers and transposed convolution layers in the generator model is set to 64, with kernel size 3 × 3. Since the input of the generator model is an RGB color image of arbitrary size, the number of channels of the first-layer convolution kernels is set to 3. (In a pooling network, halving the feature map would require doubling the number of channels in subsequent convolution layers; however, since the generator model contains no pooling operations, the number of convolution kernels of every convolution layer remains 64.) The table also shows that every three consecutive convolution modules in the generator can be regarded as a group, giving 5 groups in total. The first convolution layer in each group does not zero-pad the feature map before convolution, while the other two layers pad a single ring of zero pixels around the input feature map before convolving, so the feature maps output within each group have the same size. Each convolution group is followed by a transposed convolution layer; after 5 transposed convolutions with up-sampling rate 1, the length and width of the output feature map are restored to the original image size and the number of channels becomes 1, yielding a grayscale map of the same size as the original image. If the original RGB image input to the model is of size m × n × 3, the feature-map size output by each layer is shown in the last column of Table 2. Finally, a sigmoid binary classification layer classifies each pixel of the grayscale map as foreground or background, and the result is output as the saliency map S_d.
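The generator described above can likewise be sketched in PyTorch. The 5 groups of 3 convolution modules, the unpadded first convolution of each group, and the size-restoring stride-1 transposed convolutions follow the text; whether the last transposed convolution itself reduces the channels to 1 is not visible in Table 2, so the 1-channel head here is an assumption.

    import torch
    import torch.nn as nn

    def conv_module(c_in, c_out, padding):
        return nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=3, stride=1, padding=padding),
            nn.BatchNorm2d(c_out),
            nn.LeakyReLU(0.2, inplace=True),
        )

    def convt_module(c_in, c_out):
        return nn.Sequential(
            # stride-1, unpadded transposed conv grows H and W by 2 pixels,
            # undoing the shrink caused by the unpadded conv of each group.
            nn.ConvTranspose2d(c_in, c_out, kernel_size=3, stride=1),
            nn.BatchNorm2d(c_out),
            nn.LeakyReLU(0.2, inplace=True),
        )

    class SmallGenerator(nn.Module):
        def __init__(self):
            super().__init__()
            blocks, c = [], 3                  # RGB input, 3 channels
            for g in range(5):                 # 5 groups of 3 conv modules
                blocks += [conv_module(c, 64, padding=0),   # unpadded: -2 px
                           conv_module(64, 64, padding=1),
                           conv_module(64, 64, padding=1)]
                c = 64
                if g < 4:
                    blocks.append(convt_module(64, 64))
            # Final transposed conv restores the last 2 pixels and reduces the
            # channels to 1 (this 1-channel head is an assumption, see above).
            blocks.append(nn.ConvTranspose2d(64, 1, kernel_size=3, stride=1))
            self.body = nn.Sequential(*blocks)

        def forward(self, x):                  # x: (B, 3, m, n), arbitrary m, n
            return torch.sigmoid(self.body(x)) # grayscale saliency map S_d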
(2) Manual staged training is performed through a given training process.
The training of the generative adversarial network proceeds alternately: one side (generator or discriminator) is fixed while the other (discriminator or generator) is trained. In the first round the generator is fixed to train the discriminator, after which training alternates. In the training algorithm, the labels corresponding to the ground-truth maps input to the discriminator are all set to 0.9, while the labels corresponding to the saliency maps produced by the generator are all set to -0.9. The images of the training data set are organized into matrix-array form before being input to the network. Owing to constraints of the experimental training environment, the 5 common data sets for saliency testing were divided into 2 groups for batch input: group A consists of the MSRA10K data set alone (10,000 images) and group B of the remaining four data sets (DUT-OMRON, ECSSD, SOD, and SED2; 6,568 images in total). The specific procedure is to randomly initialize the network parameters, then train for 5 rounds on group A at a learning rate of 10^-6 followed by 1 round at 3 × 10^-6; then reset the learning rate to 10^-6 and train for 10 more rounds on group B; finally train for another 8 rounds on group A at 10^-6 and stop. This training procedure and the training label values were settled through extensive experimental comparison. Note that, because the random initialization differs each run, models initialized in different batches will perform slightly differently even when trained identically.
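One alternating update of this scheme might look as follows. The text does not specify the exact form of the loss layer, so the mean-squared loss against the ±0.9 labels and the single-score discriminator output are assumptions of this sketch; the staged schedule described above would wrap around this inner step.

    import torch

    def train_step(gen, disc, opt_g, opt_d, image, truth_map):
        """One alternating update: fix G to train D, then fix D to train G."""
        real_label, fake_label = 0.9, -0.9     # label values from the text

        # Train the discriminator with the generator fixed.
        opt_d.zero_grad()
        with torch.no_grad():
            fake = gen(image)                  # saliency map S_d, detached from G
        d_loss = (((disc(truth_map) - real_label) ** 2).mean()
                  + ((disc(fake) - fake_label) ** 2).mean())
        d_loss.backward()
        opt_d.step()

        # Train the generator to make its map look like a truth map to D.
        opt_g.zero_grad()
        g_loss = ((disc(gen(image)) - real_label) ** 2).mean()
        g_loss.backward()
        opt_g.step()
        return d_loss.item(), g_loss.item()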
Further, S7 specifically is: the saliency map S_f and the saliency map S_d are fused by the designed fusion algorithm to obtain the final saliency map S. The fusion algorithm is based on the multi-saliency-map fusion algorithm MCA proposed by Qin et al. in their cellular-automaton-based saliency detection; here the MCA algorithm is modified so that exactly 2 input saliency maps are fused in each invocation, and the modified algorithm is called MCA2. The specific steps of the fusion algorithm of the present invention are given below:
(1) Input the saliency map S_d detected by the small generator network and the saliency map S_f.
(2) Initialize the variable i to 0 and the matrix variables S and temp to null.
(3) Invoke the MCA2 algorithm to fuse S_d with itself, and store the output saliency map in the variable temp.
(4) Increase i by 1; if i is larger than 4, go to step (7); otherwise, execute the next step.
(5) Invoke the MCA2 algorithm to fuse S_f with temp, and store the output saliency map in the matrix variable S.
(6) Update temp to S and return to step (4).
(7) Output the matrix variable S.
After the fusion algorithm completes, the output variable S is the final saliency map of the algorithm. A code sketch of this loop follows.
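The fusion loop of steps (1)-(7) is summarized below. MCA2 itself, the two-map variant of Qin et al.'s cellular-automaton fusion, is replaced here by a placeholder combination rule, since the actual cellular-automaton update is iterative and not reproduced by this stub.

    import numpy as np

    def mca2(a, b):
        """Placeholder for the two-map MCA2 fusion; the real update is the
        cellular-automaton iteration of Qin et al., not this stub."""
        return np.sqrt(a * b)                  # illustrative combination only

    def final_fusion(S_d, S_f):
        temp = mca2(S_d, S_d)                  # step (3): self-fusion of S_d
        for _ in range(4):                     # steps (4)-(6): four rounds
            temp = mca2(S_f, temp)             # step (5), then step (6)
        return temp                            # step (7): final saliency map S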
Referring to fig. 2, the present invention further provides a saliency detection terminal fusing a small depth generation model, comprising a memory 1, a processor 2, and a computer program stored in the memory 1 and executable on the processor 2, wherein the processor 2 executes the computer program generated according to the saliency detection method fusing a small depth generation model.
By adopting the above technical scheme, a background-block reselection process is performed on the obtained two-layer saliency map so that the selected background blocks have higher reliability; the background blocks form a background seed vector, a diffusion matrix is constructed from the background vector, and the saliency information carried by the background vector is diffused more effectively through the diffusion process, yielding a background saliency map based on the background prior. The background saliency map and the two-layer saliency map are fused nonlinearly to obtain a saliency map S_f. Meanwhile, the saliency map S_d generated by the trained small generator model and the saliency map S_f are fused through the designed fusion algorithm to obtain the final saliency map S. Compared with recent salient object detection algorithms on common data sets, the method performs better when a salient region touches the image boundary, and thanks to the better use of the background prior the final detection result improves at both the subjective level and the objective-index level: the foreground of the final saliency map is more complete and brighter, and the background is suppressed more effectively, which addresses the prior-art problems of incomplete salient object detection, low confidence of the salient region, and false detections when the salient object touches the image edge region. In addition, because a small generator network is designed and trained for saliency detection, the model size is only 2.4 MB with only about 670,000 parameters. Combined with the detection result obtained from the depth model, the final detection result of the algorithm improves markedly on every objective evaluation index, while also solving the prior-art problems of deep-neural-network-based models being large and slow at detection.
It should be apparent that the described embodiments are some, but not all, of the embodiments of the present application. The embodiments and the features of the embodiments in the present application may be combined with each other without conflict. The components of the embodiments of the present application, as generally described and illustrated in the figures herein, could be arranged and designed in a wide variety of different configurations. Thus, the detailed description of the embodiments of the present application is not intended to limit the scope of the claimed application, but merely represents selected embodiments of the application. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.

Claims (10)

1. A saliency detection method fusing a small depth generation model, characterized by comprising the following steps:
S1: segmenting the image into superpixels with the SLIC algorithm, taking each superpixel block as a graph node and the color-feature difference between every two superpixels as the weight of a graph edge, thereby converting the original image into a graph structure;
S2: obtaining seed vectors using a simple background prior, a center prior, and color distribution features, building a diffusion matrix from the graph structure obtained by conversion according to the inverse of the Laplacian matrix and the spectral clustering principle, and diffusing the obtained seed vectors through the diffusion matrix to obtain a preliminary saliency map;
S3: repeating step S2 on the obtained preliminary saliency map to obtain a two-layer saliency map through the diffusion process;
S4: selecting background blocks from the two-layer saliency map according to the idea of the Fisher criterion, forming the selected background blocks into a background vector, constructing a diffusion matrix, and obtaining a background saliency map by the diffusion method;
S5: generating a saliency map S_f from the two-layer saliency map and the background saliency map by a nonlinear fusion algorithm;
S6: constructing a discriminator network and a small generator network based on the generative adversarial network framework, and performing manual staged training according to a specified procedure;
S7: inputting the original image into the trained small generator model to obtain a saliency map S_d, and fusing the saliency maps S_f and S_d with the fusion algorithm to obtain the final saliency map S.
2. The saliency detection method fusing a small depth generation model according to claim 1, characterized in that the specific steps of S4 are:
S4-1: defining the background block search interval as [l, r], where the values of l and r are given by formula (1):
l = 1, r = ⌊sp/δ⌋ (1)
wherein l is the minimum number of background blocks in the image; r is the maximum number of background blocks in the image; sp denotes the total number of superpixels generated after the image is segmented by the SLIC algorithm; and δ is a parameter controlling the obtainable number of background blocks;
S4-2: initializing the position indicator variable p, the inter-class difference ratio variable f, the variable f_mx storing the maximum of f, and the vector-element variable v to 0, and initializing the background-block count variable Bg to l - 1;
S4-3: sorting the input two-layer saliency vector y in ascending order and storing the result as the vector y';
S4-4: increasing Bg by 1; when Bg is greater than r, going to step S4-8; otherwise, executing step S4-5;
S4-5: assigning the Bg-th element of the vector y' to the variable v, the elements of y' less than or equal to v forming the vector m and the elements greater than v forming the vector n;
S4-6: based on the idea of the Fisher criterion, computing the f value as
f = (ag(m) - ag(n))^2 / (va(m) + va(n)) (2)
wherein ag(·) is the mean of the samples within a class and va(·) is the variance of the samples within a class;
S4-7: when the value of f is greater than f_mx, updating f_mx to f, updating p to Bg, and going to step S4-4; when the value of f is not greater than f_mx, going directly to step S4-4;
S4-8: assigning the value of p to Bg, forming the background vector b from the first Bg elements of the vector y', and returning the variable Bg and the vector b; the seed vector s and the diffusion matrix A^(-1) constructed from the background vector b are diffused according to formula (3) to obtain the background saliency vector y_b, and assigning the value of each element of y_b to the corresponding superpixel block then generates the background saliency map; formula (3) reads:
y = s × A^(-1) (3)
wherein s denotes the seed vector and A^(-1) denotes the diffusion matrix constructed from the background vector b.
3. The saliency detection method fusing a small depth generation model according to claim 2, characterized in that the value of sp in formula (1) is far greater than 12 and δ takes the value 12; the numerator in formula (2) represents the inter-class difference and the denominator the intra-class difference, and the saliency row vector y' is binary-classified on the basis of the final ratio f.
4. The saliency detection method fusing a small depth generation model according to claim 1, characterized in that the specific steps of step S5 are: the background saliency vector y_b and the two-layer saliency vector y_sc are fused nonlinearly according to formula (4) to obtain the saliency vector y_fn, and the value of each element of y_fn is assigned to the corresponding superpixel block to obtain the saliency map S_f; formula (4) reads:
y_fn = (0.5y_b + 0.5y_sc) × e^(-5y_sc) (4)
wherein 0.5y_b + 0.5y_sc jointly weighs the background saliency map and the two-layer saliency map, and e^(-5y_sc) adjusts their fusion result as an introduced nonlinear factor.
5. The saliency detection method fusing a small depth generation model according to claim 4, characterized in that the fixed parameter -5 in the term e^(-5y_sc) of formula (4) was determined experimentally.
6. The saliency detection method fusing a small depth generation model according to claim 1, characterized in that step S6 comprises the following specific steps:
S6-1: constructing a discriminator network and a small generator network for saliency detection based on the generative adversarial network framework;
the discriminator model comprises 11 convolution modules and 5 pooling modules; each convolution module comprises a convolution layer, a batch normalization layer, and a nonlinear activation function layer; the kernel size of each pooling layer is 2 × 2 with stride 2, and the pooling function is max pooling;
the small depth generation model comprises 15 convolution modules and 5 transposed-convolution modules, wherein the stride of all convolution layers in the network and the up-sampling rate of all transposed convolution layers are set to 1; each convolution module comprises a convolution layer, a batch normalization layer, and a nonlinear activation function layer; each transposed-convolution module comprises a transposed convolution layer, a batch normalization layer, and a nonlinear activation function layer;
S6-2: performing manual staged training through the given training procedure; the training of the generative adversarial network proceeds alternately, fixing one side to train the other.
7. The saliency detection method fusing a small depth generation model according to claim 6, characterized in that in step S6-1 the number of convolution kernels of all convolution layers and transposed convolution layers in the generator model is set to 64, and all convolution kernels are of size 3 × 3; the number of channels of the first-layer convolution kernels is set to 3; in the discriminator, the number of convolution kernels is doubled in the convolution layer following each pooling layer.
8. The saliency detection method fusing a small depth generation model according to claim 6, characterized in that in the first round of network training in step S6-2 the generator is fixed to train the discriminator, after which training alternates; in the training algorithm, the labels corresponding to ground-truth maps input to the discriminator are set to 0.9 and the labels corresponding to saliency maps generated by the generator are set to -0.9; the images in the data set are organized into matrix-array form and input into the network for training.
9. The saliency detection method fusing a small depth generation model according to claim 6, characterized in that the specific steps of the fusion algorithm in S7, which fuses the saliency map S_f and the saliency map S_d into the final saliency map S, are as follows:
S7-1: inputting the saliency map S_d detected by the small generator network and the saliency map S_f;
S7-2: initializing the variable i to 0 and the matrix variables S and temp to null;
S7-3: invoking the MCA2 algorithm to fuse S_d with itself, and storing the output saliency map in the variable temp;
S7-4: increasing i by 1; when i is greater than 4, going to step S7-7; otherwise, executing the next step;
S7-5: invoking the MCA2 algorithm to fuse S_f with temp, and storing the output saliency map in the matrix variable S;
S7-6: updating temp to S and returning to step S7-4;
S7-7: outputting the matrix variable S; after the fusion algorithm completes, the output variable S is the final saliency map.
10. A saliency detection terminal fusing a small depth generation model, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor executes a computer program generated by the saliency detection method fusing a small depth generation model according to any one of claims 1 to 9.
CN202010443235.3A 2020-05-22 2020-05-22 Saliency detection method and terminal fusing small-size depth generation model Active CN111611999B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010443235.3A CN111611999B (en) 2020-05-22 2020-05-22 Saliency detection method and terminal fusing small-size depth generation model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010443235.3A CN111611999B (en) 2020-05-22 2020-05-22 Saliency detection method and terminal fusing small-size depth generation model

Publications (2)

Publication Number Publication Date
CN111611999A CN111611999A (en) 2020-09-01
CN111611999B 2023-04-07

Family

ID=72202213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010443235.3A Active CN111611999B (en) 2020-05-22 2020-05-22 Saliency detection method and terminal fusing small-size depth generation model

Country Status (1)

Country Link
CN (1) CN111611999B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112509046B (en) * 2020-12-10 2021-09-21 电子科技大学 Weak supervision convolutional neural network image target positioning method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106127749A (en) * 2016-06-16 2016-11-16 华南理工大学 The target part recognition methods of view-based access control model attention mechanism
CN107679250A (en) * 2017-11-01 2018-02-09 浙江工业大学 A kind of multitask layered image search method based on depth own coding convolutional neural networks
US10008004B1 (en) * 2016-12-28 2018-06-26 Beijing University Of Technology Establishment method of 3D saliency model based on prior knowledge and depth weight
CN108320281A (en) * 2018-01-19 2018-07-24 福建师范大学 A kind of image significance detection method and terminal based on multiple features diffusion

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106127749A (en) * 2016-06-16 2016-11-16 华南理工大学 The target part recognition methods of view-based access control model attention mechanism
US10008004B1 (en) * 2016-12-28 2018-06-26 Beijing University Of Technology Establishment method of 3D saliency model based on prior knowledge and depth weight
CN107679250A (en) * 2017-11-01 2018-02-09 浙江工业大学 A kind of multitask layered image search method based on depth own coding convolutional neural networks
CN108320281A (en) * 2018-01-19 2018-07-24 福建师范大学 A kind of image significance detection method and terminal based on multiple features diffusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Person re-identification method based on Siamese network; Ye Feng, Liu Tianlu, Li Shiying; Computer Systems & Applications; full text *

Also Published As

Publication number Publication date
CN111611999A (en) 2020-09-01

Similar Documents

Publication Publication Date Title
Qin et al. Image inpainting based on deep learning: A review
US10740640B2 (en) Image processing method and processing device
CN110866140B (en) Image feature extraction model training method, image searching method and computer equipment
Zhou et al. Salient object detection in stereoscopic 3D images using a deep convolutional residual autoencoder
US20230083518A1 (en) Image segmentation method, system, and device, and readable storage medium
US20230067934A1 (en) Action Recognition Method, Apparatus and Device, Storage Medium and Computer Program Product
EP3872761A2 (en) Analysing objects in a set of frames
CN112989085A (en) Image processing method, image processing device, computer equipment and storage medium
CN116580257A (en) Feature fusion model training and sample retrieval method and device and computer equipment
US20230153965A1 (en) Image processing method and related device
CN111899203A (en) Real image generation method based on label graph under unsupervised training and storage medium
CN116863194A (en) Foot ulcer image classification method, system, equipment and medium
Buenaposada et al. Improving multi-class boosting-based object detection
Zhou et al. Discriminative attention-augmented feature learning for facial expression recognition in the wild
CN111611999B (en) Saliency detection method and terminal fusing small-size depth generation model
CN117079098A (en) Space small target detection method based on position coding
CN115601513A (en) Model hyper-parameter selection method and related device
KR20210124727A (en) Method of generating logo
CN113947530B (en) Image redirection method based on relative saliency detection
Jeon et al. Integrating Multiple Receptive Fields Through Grouped Active Convolution
US11682189B2 (en) Spiral feature search
CN115311550B (en) Remote sensing image semantic change detection method and device, electronic equipment and storage medium
US20230035307A1 (en) Apparatus and method for detecting keypoint based on deep learniing using information change across receptive fields
Zhang et al. Saliency detection with FCNN based on low-level feature optimization
Zuo et al. Co-Saliency Detection Based on Multi-Scale Feature Extraction and Feature Fusion

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant