CN111611999A - Saliency detection method and terminal fusing small-size depth generation model - Google Patents

Saliency detection method and terminal fusing small-size depth generation model

Info

Publication number
CN111611999A
CN111611999A (application CN202010443235.3A)
Authority
CN
China
Prior art keywords
background
saliency map
vector
layer
saliency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010443235.3A
Other languages
Chinese (zh)
Other versions
CN111611999B (en)
Inventor
叶锋
陈星宇
陈利文
郑子华
陈家祯
翁彬
黄添强
林新棋
吴献
蒋佳龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian Normal University
Original Assignee
Fujian Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian Normal University filed Critical Fujian Normal University
Priority to CN202010443235.3A priority Critical patent/CN111611999B/en
Publication of CN111611999A publication Critical patent/CN111611999A/en
Application granted granted Critical
Publication of CN111611999B publication Critical patent/CN111611999B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/251Fusion techniques of input or preprocessed data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a saliency detection method and terminal fusing a small-size depth generation model. A background block reselection process is performed on the obtained two-layer saliency map so that the selected background blocks have higher reliability; the background blocks form a background seed vector, a diffusion matrix is constructed from the background vector, and the saliency information carried by the background vector is better diffused through the diffusion process, thereby obtaining a background saliency map based on the background prior. The background saliency map and the two-layer saliency map are nonlinearly fused to obtain a saliency map S_f. Meanwhile, a saliency map S_d generated by the trained small generator model and the saliency map S_f are fused through a designed fusion algorithm to obtain the final saliency map S. The method performs better when a salient region touches the image boundary: the foreground of the final saliency map is more complete and brighter, and the background is suppressed more effectively.

Description

Saliency detection method and terminal fusing small-size depth generation model
Technical Field
The invention relates to the fields of image processing and deep neural networks, and in particular to a saliency detection method and terminal fusing a small-size depth generation model.
Background
When viewing an image, humans can quickly focus their gaze on the most attractive regions while ignoring less important ones. In computer vision, extracting the regions of an image that attract human attention by simulating the human visual system is called saliency detection. In 1998, Itti first proposed a saliency detection algorithm based on the Koch framework. Since then, saliency detection has received increasing attention as an effective way to accelerate computer processing, and is increasingly applied to tasks such as image retrieval, image classification, image segmentation, image compression, and object detection and recognition. Existing saliency detection methods can be divided into two main categories by design: bottom-up methods and top-down methods. The former design detection models from low-level image features such as texture, color, position, and object contour to compute a saliency value for each region of the image; this approach is data-driven. The latter are designed for a specific computational task and generally require supervised training with respect to a specific target; this approach is task-driven. By application, saliency detection can be divided into fixation prediction and salient region detection. The former aims to predict points of human visual attention, while the latter aims to display the salient regions of an image completely while effectively suppressing the background regions.
Many recently proposed saliency detection methods exploit color contrast information in an image. To this end, Yang et al. proposed a graph-based manifold ranking method to detect salient objects in 2013: with the selected foreground or background seed vector as the query, the ranking score of each region's similarity to the seed vector in the graph is taken as that region's saliency value, from which a saliency map is generated. In 2015, Jiang et al. proposed an improvement for saliency detection methods based on the manifold-ranking diffusion process: through a deep analysis of the internal relationship between the diffusion process and spectral clustering, the diffusion matrix is reconstructed so that the saliency information carried by the seed vector propagates better through it. Ye et al. built on the work of Jiang et al. and proposed a saliency detection method using multi-level image features. These methods can highlight the interior of salient regions to some extent, but still suffer from incomplete detected regions and low confidence in salient regions. Meanwhile, saliency detection can improve performance with the help of high-level priors, and many saliency detection methods use the background prior. However, most methods (including those listed above) apply the background prior naively, i.e., the edge regions around the image are simply taken as the background, which leads to false detections when a salient object touches the image boundary.
With the rapid development of deep learning, more and more researchers have applied machine-learning methods to saliency detection. Zhao et al. proposed a multi-context deep saliency detection method: a Convolutional Neural Network (CNN) is used to extract high-level image features, salient regions are detected by combining local and global contexts, and the influence of four different pre-training strategies on the final result is discussed. Hou et al. proposed a deeply supervised saliency detection method with short connections: short connections are added to the HED (Holistically-nested Edge Detector) model to adapt it to saliency detection, so that the interior and boundary of the detected salient object are consistent, and the model also performs well in terms of time overhead. Lee et al. proposed an efficient deep learning framework for accurate saliency detection, combining hand-crafted low-level features from traditional methods with high-level abstract features extracted by a deep neural network. Li et al. proposed a method that combines the saliency detection task with an image segmentation task: by optimizing the two task objectives simultaneously, the parameters of the shared convolutional layers are updated, yielding better saliency detection. Ji et al. attempted saliency detection with Generative Adversarial Networks (GANs). A GAN comprises two parts, a generator and a discriminator: the generator extracts high-level image features and generates a saliency map from them, while the discriminator takes the generated saliency map and the corresponding ground-truth map as input and is trained to distinguish the generated map from the ground truth. The generator and discriminator improve continuously in mutual competition: the generator produces better and better saliency maps, and the discriminator's discrimination ability grows stronger. Finally, the trained generator model is used for saliency detection. Compared with traditional methods, depth-model-based methods currently achieve good detection results, but suffer from difficult model training, large final model size, slow detection speed, and so on.
Disclosure of Invention
The invention aims to provide a saliency detection method and terminal fusing a small depth generation model.
The technical scheme adopted by the invention is as follows:
A saliency detection method fusing a small depth generation model comprises the following steps:
S1: segmenting the image into superpixels using the SLIC (Simple Linear Iterative Clustering) algorithm, where each superpixel block serves as a graph node and the color-feature difference between every two superpixels serves as the weight of a graph edge, so that the original image is converted into a graph structure;
S2: obtaining a seed vector from the simple background prior, center prior, and color distribution features; establishing a diffusion matrix from the converted graph structure according to the inverse of the Laplacian matrix and the spectral clustering principle; and diffusing the obtained seed vector through the diffusion matrix to obtain a preliminary saliency map;
S3: taking the obtained preliminary saliency map as input, repeating step S2, and obtaining a two-layer saliency map through the diffusion process;
S4: reselecting background blocks from the two-layer saliency map according to Fisher's criterion, forming the selected background blocks into a background vector, constructing a diffusion matrix from the background vector, and obtaining a background saliency map by diffusion;
S5: generating a saliency map S_f from the two-layer saliency map and the background saliency map through a nonlinear fusion algorithm;
S6: designing a brand-new discriminator network and a small generative adversarial network based on the generative adversarial network framework, and training the designed networks manually in stages according to a specified procedure;
S7: inputting the original image into the trained small generator model to obtain a saliency map S_d, and fusing S_f and S_d with the designed fusion algorithm to obtain the final saliency map S.
Further, the specific steps of S4 are:
S4-1: define the background-block search interval as [l, r], where the values of l and r are given by formula (1):
l = 1,   r = ⌊sp/λ⌋   (1)
wherein l is the minimum possible value of the number of background blocks in the image; r is the maximum possible value of the number of background blocks in the image; sp denotes the total number of superpixels generated after the image is segmented by the SLIC algorithm; and λ is a parameter controlling the obtainable number of background blocks;
S4-2: initialize the position-indicator variable p, the inter-class difference ratio variable f, the variable f_mx storing the maximum value of f, and the variable v storing a vector element to 0, and initialize the background-block count variable Bg to l−1;
S4-3: sort the input two-layer saliency vector y in ascending order and store the result as vector y';
S4-4: increase Bg by 1; if Bg is larger than r, go to step S4-8; otherwise, execute step S4-5;
S4-5: assign the Bg-th element of vector y' to variable v; the elements of y' that are less than or equal to v form vector m, and the elements larger than v form vector n;
S4-6: based on the idea of Fisher's criterion, the value of f is computed as
f = (ag(m) − ag(n))² / (va(m) + va(n))   (2)
where ag(·) is the mean of the samples within a class and va(·) is the variance of the samples within a class;
S4-7: if f is larger than f_mx, update f_mx to f, update p to Bg, and go to step S4-4; if f is not larger than f_mx, go directly to step S4-4;
S4-8: assign the value of p to Bg, form the background vector b from the first Bg elements of vector y', and return the variable Bg and the vector b.
The seed vector s and the diffusion matrix A⁻¹ constructed from the background vector b are diffused according to formula (3) to obtain the background saliency vector y_b; assigning the value of each element of y_b to the corresponding superpixel block then generates the background saliency map. Formula (3) is expressed as follows:
y_b = s × A⁻¹   (3)
where s denotes the seed vector and A⁻¹ denotes the diffusion matrix constructed from the background vector b.
Furthermore, in formula (1), sp is much larger than λ, and λ is set to 12; in formula (2), the numerator represents the inter-class difference and the denominator the intra-class difference, and the saliency vector y' is divided into two classes on the basis of the final ratio f.
Further, the specific step of S5 is: the background saliency vector y_b and the two-layer saliency vector y_sc are nonlinearly fused as shown in formula (4) to obtain the saliency vector y_fn; assigning the value of each element of y_fn to the corresponding superpixel block yields the saliency map S_f. Formula (4) is expressed as follows:
y_fn = (0.5·y_b + 0.5·y_sc) × e^(−5·y_sc)   (4)
where 0.5·y_b + 0.5·y_sc jointly considers the background saliency map and the two-layer saliency map, and e^(−5·y_sc) is an introduced nonlinear factor that adjusts their fusion result.
Further, the fixed parameter −5 in the factor e^(−5·y_sc) of formula (4) was determined experimentally.
Further, step S6 specifically includes:
S6-1: constructing a discriminator network and a small generator network for saliency detection based on the generative adversarial network framework;
the discriminator model comprises 11 convolution modules and 5 pooling modules; each convolution module comprises a convolution layer, a batch normalization layer, and a nonlinear activation function layer; the kernel size of each pooling layer is 2 × 2 with stride 2, and max pooling is used;
the small depth generation model comprises 15 convolution modules and 5 transposed-convolution modules, where the stride of all convolution layers in the network and the upsampling rate of all transposed convolution layers are set to 1; each convolution module comprises a convolution layer, a batch normalization layer, and a nonlinear activation function layer; each transposed-convolution module comprises a transposed convolution layer, a batch normalization layer, and a nonlinear activation function layer;
S6-2: performing manual staged training according to a given training procedure; the training of the generative adversarial network alternates by fixing one network while training the other.
Further, in step S6-1, the number of convolution kernels of all convolution layers and transposed convolution layers in the generator model is set to 64, and all convolution kernels are of size 3 × 3; the number of channels of the first-layer convolution kernels is set to 3, and the number of channels in the convolution layer following a pooling operation is doubled, i.e., the number of convolution kernels in the post-pooling convolution layer needs to be doubled.
Further, in the first round of network training in step S6-2, the generator is fixed to train the discriminator, after which training alternates; in the training algorithm, the labels corresponding to ground-truth maps input to the discriminator are set to 0.9, and the labels corresponding to saliency maps generated by the generator are set to −0.9; the images in the data set are organized into matrix arrays and then input into the network for training.
Further, in S7, the specific steps of the fusion algorithm that fuses saliency map S_f and saliency map S_d into the final saliency map S are as follows:
S7-1: input the saliency map S_d obtained by the small generator network and the saliency map S_f;
S7-2: initialize variable i to 0 and matrix variables S and temp to empty;
S7-3: call the MCA2 algorithm to self-fuse S_d with S_d, and store the output saliency map in variable temp;
S7-4: increase i by 1; if i is larger than 4, execute step S7-7; otherwise, execute the next step;
S7-5: call the MCA2 algorithm to fuse S_f with temp, and store the output saliency map in matrix variable S;
S7-6: update temp to S and return to step S7-4;
S7-7: output the matrix variable S; after the fusion algorithm, the output variable S is the final saliency map.
The invention also provides a saliency detection terminal fusing a small depth generation model, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the computer program implementing the above saliency detection method fusing a small depth generation model.
By adopting the above technical scheme, a background block reselection process is performed on the obtained two-layer saliency map so that the selected background blocks have higher reliability; the background blocks form a background seed vector, a diffusion matrix is constructed from the background vector, and the saliency information carried by the background vector is better diffused through the diffusion process, yielding a background saliency map based on the background prior. The background saliency map and the two-layer saliency map are nonlinearly fused to obtain a saliency map S_f. Meanwhile, the saliency map S_d generated by the trained small generator model and the saliency map S_f are fused through the designed fusion algorithm to obtain the final saliency map S. Compared with recent salient object detection algorithms on common data sets, the method performs better when a salient region touches the image boundary; because the background prior is better applied, the final detection result is improved both at the level of subjective perception and on objective indices: the foreground of the final saliency map is more complete and brighter, the background is suppressed more effectively, and the prior-art problems of incomplete and insufficiently highlighted salient object detection, low confidence of salient regions, and false detections when a salient object touches the image edge region are solved. In addition, because a small generator network is designed and trained for saliency detection, the model size is only 2.4M and the number of parameters is only about 670,000. By combining the detection results obtained from the depth model, the final detection results of the algorithm are clearly improved on each objective evaluation index, and the problems of large model size and slow detection speed of prior-art deep-neural-network-based methods are solved.
Drawings
The invention is described in further detail below with reference to the accompanying drawings and the detailed description.
Fig. 1 is a flowchart of an image saliency detection method fusing a small depth generation model according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of an image saliency detection system fusing a small depth generation model according to an embodiment of the present invention;
Description of reference numerals: 1. memory; 2. processor.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
As shown in Fig. 1, the most critical concepts of the present invention are: 1) a background block reselection process performed on the obtained two-layer saliency map, so that the selected background blocks have higher reliability; the background blocks form a background seed vector, a diffusion matrix is constructed from the background vector, and the saliency information carried by the background vector is better diffused through the diffusion process, yielding a background saliency map based on the background prior; 2) a small generator model designed and trained for saliency detection; 3) a fusion algorithm that fuses the saliency map S_d generated by the small generator model with the saliency map S_f.
Referring to Fig. 1, the present invention provides an image saliency detection method fusing a small depth generation model, comprising:
S1: segmenting the image into superpixels using the SLIC (Simple Linear Iterative Clustering) algorithm, where each superpixel block serves as a graph node and the color-feature difference between every two superpixels serves as the weight of a graph edge, so that the original image is converted into a graph structure;
S2: obtaining a seed vector from the simple background prior, center prior, and color distribution features; establishing a diffusion matrix from the converted graph structure according to the inverse of the Laplacian matrix and the spectral clustering principle; and diffusing the obtained seed vector through the diffusion matrix to obtain a preliminary saliency map;
S3: taking the obtained preliminary saliency map as input, repeating step S2, and obtaining a two-layer saliency map through the diffusion process;
S4: reselecting background blocks from the two-layer saliency map according to Fisher's criterion, forming the selected background blocks into a background vector, constructing a diffusion matrix from the background vector, and obtaining a background saliency map by diffusion;
S5: generating a saliency map S_f from the two-layer saliency map and the background saliency map through a nonlinear fusion algorithm;
S6: designing a brand-new discriminator network and a small generative adversarial network based on the generative adversarial network framework, and training the designed networks manually in stages according to a specified procedure;
S7: inputting the original image into the trained small generator model to obtain a saliency map S_d, and fusing S_f and S_d with the designed fusion algorithm to obtain the final saliency map S.
From the above description, the invention provides a saliency detection method and terminal fusing a small depth generation model. A background block reselection process is performed on the obtained two-layer saliency map so that the selected background blocks have higher reliability; the background blocks form a background seed vector, a diffusion matrix is constructed from the background vector, and the saliency information carried by the background vector is better diffused through the diffusion process, thereby obtaining a background saliency map based on the background prior. The background saliency map and the two-layer saliency map are nonlinearly fused to obtain a saliency map S_f. Meanwhile, the saliency map S_d generated by the trained small generator model and the saliency map S_f are fused through the designed fusion algorithm to obtain the final saliency map S. Compared with recent salient object detection algorithms on common data sets, the method performs better when a salient region touches the image boundary; because the background prior is better applied, the final detection result is improved both at the level of subjective perception and on objective indices: the foreground of the final saliency map is more complete and brighter, the background is suppressed more effectively, and the prior-art problems of incomplete and insufficiently highlighted salient object detection, low confidence of salient regions, and false detections when a salient object touches the image edge region are solved. In addition, because a small generator network is designed and trained for saliency detection, the model size is only 2.4M and the number of parameters is only about 670,000. By combining the detection results obtained from the depth model, the final detection results of the algorithm are clearly improved on each objective evaluation index, and the problems of large model size and slow detection speed of prior-art deep-neural-network-based methods are solved.
Further, S1 specifically includes: the image is segmented into superpixels by the SLIC algorithm; each superpixel block serves as a graph node, and the color-feature difference between every two superpixels serves as the weight of a graph edge, so that the superpixel image is converted into a graph structure.
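The following is a minimal Python sketch of this graph construction, assuming scikit-image's SLIC implementation, mean CIELab color as the superpixel color feature, and a Gaussian similarity as the edge weight (the exact color metric and the values of n_segments and sigma are illustrative assumptions, not fixed by the method):

```python
# Sketch of step S1: SLIC superpixels as graph nodes, color-feature differences as edge weights.
import numpy as np
from skimage.color import rgb2lab
from skimage.segmentation import slic

def image_to_graph(img_rgb, n_segments=200, sigma=10.0):
    labels = slic(img_rgb, n_segments=n_segments, start_label=0)   # superpixel segmentation
    lab = rgb2lab(img_rgb)
    sp = labels.max() + 1
    # Mean color feature of each superpixel block (one graph node per block)
    feats = np.array([lab[labels == k].mean(axis=0) for k in range(sp)])
    # Edge weight between every two superpixels from their color-feature difference
    dist = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=2)
    W = np.exp(-dist / (sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return labels, feats, W
```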
Further, S2 specifically includes: seed vectors are obtained from the simple background prior, center prior, and color distribution features, and a diffusion matrix is established from the converted graph structure according to the inverse of the Laplacian matrix and the spectral clustering principle. The obtained seed vector is diffused through the diffusion matrix to obtain a preliminary saliency map.
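A minimal sketch of the diffusion step follows, assuming the diffusion matrix is the inverse of a regularized graph Laplacian built from the affinity matrix W of step S1 (the regularization factor alpha is an illustrative assumption):

```python
# Sketch of step S2's diffusion through the inverse of the graph Laplacian.
import numpy as np

def diffusion_matrix(W, alpha=0.99):
    D = np.diag(W.sum(axis=1))       # degree matrix of the graph
    L = D - alpha * W                # regularized graph Laplacian
    return np.linalg.inv(L)          # A^{-1}: the diffusion matrix

def diffuse(seed, A_inv):
    # Propagate the seed vector through the diffusion matrix: y = s x A^{-1} (cf. formula (3))
    return seed @ A_inv              # one saliency value per superpixel
```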
Further, S3 specifically includes: the obtained preliminary saliency map is taken as input, step S2 is repeated, and a two-layer saliency map is obtained through the diffusion process.
Further, S4 specifically includes: Fisher's criterion separates the sample sets of two different classes by making the difference between samples of different classes as large as possible while keeping the difference between samples of the same class as small as possible, i.e., by maximizing the ratio of the inter-class difference to the intra-class difference. Likewise, the key problem to be solved here is how to divide the two-layer saliency vector y as accurately as possible into a background vector and a foreground vector (denoted, without loss of generality, by vector m and vector n respectively). Therefore, a background block reselection algorithm is designed using the idea of Fisher's criterion; it finally returns the number of background blocks Bg and the background vector b. The specific steps of the algorithm are given below.
(1) Define the background-block search interval as [l, r], where the values of l and r are given by formula (1):
l = 1,   r = ⌊sp/λ⌋   (1)
In formula (1), l is the minimum possible number of background blocks in the image; experience shows that an image is hardly ever all foreground, so we assume that at least one background block exists after superpixel segmentation, and the initial value is therefore 1. r (rounded down) is the maximum possible number of background blocks; sp (with sp ≫ λ) denotes the total number of superpixels generated after the image is segmented by the SLIC algorithm, and λ is a parameter controlling the obtainable number of background blocks. As is apparent from formula (1), the larger the value of λ, the smaller the selectable range of background blocks. Because the number of background blocks differs from image to image, a large number of experiments with different values were carried out on 5 common data sets; the results show that the algorithm works best when the value is 12, so λ is set to 12 here, which in turn determines the right boundary r.
(2) Initialize the position-indicator variable p, the inter-class difference ratio variable f, the variable f_mx storing the maximum value of f, and the variable v storing a vector element to 0, and initialize the background-block count variable Bg to l−1.
(3) Sort the input two-layer saliency vector y in ascending order and store the result as vector y'.
(4) Increase Bg by 1. If Bg is larger than r, go to step (8); otherwise, continue to step (5).
(5) Assign the Bg-th element of vector y' to variable v; the elements of y' that are less than or equal to v form vector m, and the elements larger than v form vector n.
(6) Based on the idea of Fisher's criterion, the f value is defined as follows:
f = (ag(m) − ag(n))² / (va(m) + va(n))   (2)
In formula (2), ag(·) is the mean of the samples within a class and va(·) is their variance, so the numerator represents the inter-class difference and the denominator the intra-class difference, and the saliency vector y' is split into two classes on the basis of the final ratio f. As is apparent from formula (2), the larger the f value, the more accurate the selected number of background blocks, and the better the background is separated from the foreground. Compute the value of f according to formula (2).
(7) If f is greater than f_mx, update f_mx to f, update p to Bg, and go to step (4); otherwise, go directly to step (4).
(8) Assign the value of p to Bg; the first Bg elements of vector y' form the background vector b. Return the variable Bg and the vector b.
The seed vector s and the diffusion matrix A⁻¹ constructed from the background vector b are diffused according to formula (3) to obtain the background saliency vector y_b; assigning the value of each element of y_b to the corresponding superpixel block generates the background saliency map.
y_b = s × A⁻¹   (3)
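A Python sketch of steps (1)–(8) above follows, assuming lam = 12 as the divisor parameter of formula (1) and a small constant added to the denominator of formula (2) for numerical stability (both are implementation assumptions); the returned vector b then seeds the diffusion of formula (3):

```python
# Sketch of the Fisher-criterion background block reselection, steps (1)-(8).
import numpy as np

def reselect_background(y, lam=12):
    sp = len(y)                              # total number of superpixels
    l, r = 1, sp // lam                      # formula (1): search interval [l, r]
    y_sorted = np.sort(y)                    # ascending order -> y'
    p, f_mx = 0, 0.0
    for bg in range(l, r + 1):               # steps (4)-(7)
        v = y_sorted[bg - 1]                 # Bg-th element of y'
        m = y_sorted[y_sorted <= v]          # candidate background class
        n = y_sorted[y_sorted > v]           # candidate foreground class
        if len(n) == 0:
            break
        # formula (2): inter-class difference over intra-class difference
        f = (m.mean() - n.mean()) ** 2 / (m.var() + n.var() + 1e-12)
        if f > f_mx:
            f_mx, p = f, bg
    bg = p                                    # step (8)
    b = y_sorted[:bg]                         # background vector b
    return bg, b
```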
Further, S5 specifically includes: the background saliency vector y_b and the two-layer saliency vector y_sc are nonlinearly fused as shown in formula (4). In the formula, 0.5·y_b + 0.5·y_sc jointly considers the background saliency map and the two-layer saliency map, while e^(−5·y_sc) is an introduced nonlinear factor that adjusts their fusion result; the fixed parameter −5 was determined experimentally. The value of each element of the resulting saliency vector y_fn is assigned to the corresponding superpixel block to obtain the saliency map S_f.
y_fn = (0.5·y_b + 0.5·y_sc) × e^(−5·y_sc)   (4)
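A minimal sketch of this fusion follows, assuming y_b and y_sc are per-superpixel saliency vectors of equal length and that the fused vector is rescaled to [0, 1] before being painted back onto the superpixel blocks (the rescaling is an implementation assumption):

```python
# Sketch of the nonlinear fusion of formula (4).
import numpy as np

def nonlinear_fusion(y_b, y_sc):
    y_fn = (0.5 * y_b + 0.5 * y_sc) * np.exp(-5.0 * y_sc)   # formula (4)
    # Rescale to [0, 1] before assigning values back to superpixel blocks (assumption).
    return (y_fn - y_fn.min()) / (y_fn.max() - y_fn.min() + 1e-12)
```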
Further, S6 specifically includes: (1) A completely new discriminator network and a small generator network are designed for saliency detection based on the generative adversarial network framework. The specific structures of the two networks are shown in Tables 1 and 2, respectively.
TABLE 1 Detailed structure of the discriminator model
[Table 1 is provided as an image in the original document and is not reproduced here]
TABLE 2 Detailed structure of the small generator model
[Table 2 is provided as an image in the original document and is not reproduced here]
As can be seen from Table 1, the discriminator model contains 11 convolution modules and 5 pooling modules. Each convolution module includes three parts: a convolution layer (Conv), a batch normalization layer (BN), and a nonlinear activation function layer (Leaky ReLU). The kernel size of each pooling layer is 2 × 2 with stride 2, and max pooling is used, so the length and width of the output feature map are halved after each pooling operation; if the length or width of the feature map input to a pooling layer is odd, the corresponding dimension of the output feature map is rounded down. In addition, since pooling changes the length and width of the feature map, the number of convolution kernels in the convolution layer following each pooling operation is doubled, i.e., the number of channels of the new feature map is doubled; the number of convolution kernels in the convolution layers increases up to a maximum of 512. To facilitate the network design, the images input to the discriminator network are uniformly resized to 224 × 224, and the size of the feature map output by each layer of the network is shown in the last column of the table. The input of the discriminator comprises the ground-truth map with its label and the saliency map generated by the generator with its label. The last layer of the model is a Loss layer used to compute the loss value of the discriminator's decision.
As can be seen from Table 2, the small depth generation model consists of 15 convolution modules and 5 transposed-convolution modules; the total number of parameters of the model is only about 670,000, and the trained model size is only 2.4M. The stride of all convolution layers in the network and the upsampling rate of all transposed convolution layers are set to 1. Each convolution module contains three parts: a convolution layer (Conv), a batch normalization layer (BN), and a nonlinear activation function layer (Leaky ReLU). Similarly, each transposed-convolution module comprises three parts: a transposed convolution layer (Convt), a batch normalization layer, and a nonlinear activation function layer. Considering that the first layer of the VGG network model, which excels at image classification, uses 64 convolution kernels, and in order to strictly control the model size, the number of convolution kernels of all convolution layers and transposed convolution layers in the generator model is set to 64, and all kernels are of size 3 × 3. Since the input of the generator model is an RGB color image of arbitrary size, the number of channels of the first-layer convolution kernels is set to 3. In general, because pooling halves the length and width of the feature map, the number of channels in the subsequent convolution layer would be doubled, i.e., the number of convolution kernels in the post-pooling convolution layer would be doubled; however, since the generator model contains no pooling operations, the number of convolution kernels of every convolution layer in the model remains 64. It can also be seen from the table that every three convolution modules in the generator model can be regarded as a group, giving 5 groups in total. The first convolution layer in each group does not zero-pad the feature map before convolution, while the other two layers pad a single row of zero pixels around the input feature map before convolution, so the feature maps output by the convolution layers within each group have the same size. Each convolution group is followed by a transposed convolution layer; after 5 transposed-convolution operations with an upsampling rate of 1, the length and width of the output feature map are restored to the original image size, the number of channels becomes 1, and the output is a gray map of the same size as the original image. Let the RGB image input to the model be of size m × n × 3; the output feature map size of each layer is shown in the last column of Table 2. Finally, a sigmoid binary classification layer performs foreground/background classification on each pixel of the gray map and outputs the saliency map S_d.
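A minimal PyTorch sketch of the building blocks described above follows (convolution and transposed-convolution modules with 64 kernels of size 3 × 3, stride 1, batch normalization, and Leaky ReLU); the Leaky ReLU slope, the padding choices, and the toy assembly at the end are assumptions, and the real model stacks 15 convolution modules and 5 transposed-convolution modules as in Table 2:

```python
# Sketch of the generator's convolution and transposed-convolution building blocks.
import torch.nn as nn

def conv_module(in_ch, out_ch=64, padding=1):
    # Convolution layer + batch normalization layer + nonlinear activation function layer
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=padding),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),   # slope 0.2 is an illustrative assumption
    )

def convt_module(in_ch, out_ch=64):
    # Transposed convolution layer + batch normalization layer + nonlinear activation layer
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )

# Toy assembly for illustration only; the first module takes the 3-channel RGB input.
toy_generator = nn.Sequential(
    conv_module(3), conv_module(64), conv_module(64),
    convt_module(64),
    nn.Conv2d(64, 1, kernel_size=3, padding=1),
    nn.Sigmoid(),
)
```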
(2) Manual staged training is performed according to a given training procedure.
The training of the generative adversarial network alternates: one party (generator or discriminator) is fixed while the other (discriminator or generator) is trained. In the first round of training, the generator is fixed to train the discriminator, after which training alternates. In addition, in the training algorithm the labels corresponding to ground-truth maps input to the discriminator are all set to 0.9, while the labels corresponding to saliency maps generated by the generator are all set to −0.9. For a training data set, the training algorithm organizes the images in the data set into matrix arrays before inputting them into the network. Due to constraints of the experimental training environment, the 5 common data sets used for saliency detection were divided into 2 batches for training. The two batches were set A, consisting of the MSRA10K data set alone (10,000 images), and set B, consisting of the remaining four data sets (DUT-OMRON, ECSSD, SOD and SED2, 6,568 images in total). The specific training procedure is as follows: the parameter values of the network are randomly initialized; then, with set A as the training set, 5 rounds of training are carried out at a learning rate of 10⁻⁶, followed by 1 further round after the learning rate is adjusted to 3×10⁻⁶; then, with set B as the training set, the learning rate is reset to 10⁻⁶ and the network is trained for another 10 rounds; finally, with set A again as the training set, the network is trained for a further 8 rounds at a learning rate of 10⁻⁶, and training stops. The training procedure and the training label values were finally determined through extensive experimental comparison. It should be noted that, because the random initialization values of the network differ each time, even if models initialized in different batches are trained in the same way, the performance of the resulting models differs slightly.
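A sketch of one alternating training step follows, assuming PyTorch, an MSE-style loss on the discriminator scores, and generator/discriminator modules named G and D; only the 0.9 / −0.9 target labels and the fix-one-train-the-other scheme come from the text above, everything else is illustrative:

```python
# Sketch of the alternating training scheme: fix one network while training the other.
import torch

def train_one_batch(G, D, opt_g, opt_d, images, gt_maps, train_discriminator=True):
    real_label = torch.full((images.size(0), 1), 0.9)    # label for ground-truth maps
    fake_label = torch.full((images.size(0), 1), -0.9)   # label for generated saliency maps
    loss_fn = torch.nn.MSELoss()
    if train_discriminator:                               # generator fixed
        with torch.no_grad():
            fake = G(images)
        d_loss = loss_fn(D(gt_maps), real_label) + loss_fn(D(fake), fake_label)
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()
        return d_loss.item()
    else:                                                 # discriminator fixed
        fake = G(images)
        g_loss = loss_fn(D(fake), real_label)             # generator tries to look "real"
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()
        return g_loss.item()
```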
Further, S7 specifically includes: the saliency map S_f and the saliency map S_d are fused by the designed fusion algorithm to obtain the final saliency map S. The fusion algorithm is based on the multi-saliency-map fusion algorithm MCA proposed by Qin et al. in their cellular-automaton-based saliency detection; here the MCA algorithm is modified so that only 2 input saliency maps are fused at a time, and the modified algorithm is called MCA2. The specific steps of the fusion algorithm of the invention are as follows:
(1) Input the saliency map S_d detected by the small generator network and the saliency map S_f.
(2) Initialize variable i to 0 and matrix variables S and temp to empty.
(3) Call the MCA2 algorithm to self-fuse S_d with S_d, and store the output saliency map in the variable temp.
(4) Increase i by 1; if i is larger than 4, go to step (7); otherwise, execute the next step.
(5) Call the MCA2 algorithm to fuse S_f with temp, and store the output saliency map in the matrix variable S.
(6) Update temp to S and return to step (4).
(7) Output the matrix variable S.
After the fusion algorithm processing, the output variable S is the final saliency map of the algorithm.
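A sketch of this fusion loop follows; the MCA2 routine itself (the modified cellular-automaton fusion of Qin et al.) is not specified here, so mca2 is a hypothetical placeholder that takes two saliency maps and returns their fused map:

```python
# Sketch of the S_f / S_d fusion loop, steps (1)-(7) above.
def fuse_saliency(S_d, S_f, mca2, rounds=4):
    temp = mca2(S_d, S_d)            # step (3): self-fusion of S_d
    S = None
    for _ in range(rounds):          # steps (4)-(6): i = 1..4
        S = mca2(S_f, temp)          # fuse S_f with the current intermediate map
        temp = S
    return S                         # step (7): final saliency map
```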
Referring to Fig. 2, the present invention further provides a saliency detection terminal fusing a small depth generation model, including a memory 1, a processor 2, and a computer program stored in the memory 1 and executable on the processor 2, wherein the processor 2 executes the computer program implementing the saliency detection method fusing a small depth generation model.
By adopting the above technical scheme, a background block reselection process is performed on the obtained two-layer saliency map so that the selected background blocks have higher reliability; the background blocks form a background seed vector, a diffusion matrix is constructed from the background vector, and the saliency information carried by the background vector is better diffused through the diffusion process, yielding a background saliency map based on the background prior. The background saliency map and the two-layer saliency map are nonlinearly fused to obtain a saliency map S_f. Meanwhile, the saliency map S_d generated by the trained small generator model and the saliency map S_f are fused through the designed fusion algorithm to obtain the final saliency map S. Compared with recent salient object detection algorithms on common data sets, the method performs better when a salient region touches the image boundary; because the background prior is better applied, the final detection result is improved both at the level of subjective perception and on objective indices: the foreground of the final saliency map is more complete and brighter, the background is suppressed more effectively, and the prior-art problems of incomplete and insufficiently highlighted salient object detection, low confidence of salient regions, and false detections when a salient object touches the image edge region are solved. In addition, because a small generator network is designed and trained for saliency detection, the model size is only 2.4M and the number of parameters is only about 670,000. By combining the detection results obtained from the depth model, the final detection results of the algorithm are clearly improved on each objective evaluation index, and the problems of large model size and slow detection speed of prior-art deep-neural-network-based methods are solved.
It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. The embodiments and features of the embodiments in the present application may be combined with each other without conflict. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the detailed description of the embodiments of the present application is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Claims (10)

1. A saliency detection method fusing a small depth generation model, characterized by comprising the following steps:
S1: segmenting the image into superpixels by the SLIC algorithm, where each superpixel block serves as a graph node and the color-feature difference between every two superpixels serves as the weight of a graph edge, so that the original image is converted into a graph structure;
S2: obtaining a seed vector from the simple background prior, center prior, and color distribution features, establishing a diffusion matrix from the converted graph structure according to the inverse of the Laplacian matrix and the spectral clustering principle, and diffusing the obtained seed vector through the diffusion matrix to obtain a preliminary saliency map;
S3: repeating step S2 on the obtained preliminary saliency map and obtaining a two-layer saliency map through the diffusion process;
S4: reselecting background blocks from the two-layer saliency map according to the idea of Fisher's criterion, forming the selected background blocks into a background vector, constructing a diffusion matrix, and obtaining a background saliency map by diffusion;
S5: generating a saliency map S_f from the two-layer saliency map and the background saliency map through a nonlinear fusion algorithm;
S6: constructing a discriminator network and a small generative adversarial network based on the generative adversarial network framework, and performing manual staged training according to a specified procedure;
S7: inputting the original image into the trained small generator model to obtain a saliency map S_d, and fusing S_f and S_d with the fusion algorithm to obtain the final saliency map S.
2. The saliency detection method fusing a small depth generation model according to claim 1, wherein the specific steps of S4 are:
S4-1: define the background-block search interval as [l, r], where the values of l and r are given by formula (1):
l = 1,   r = ⌊sp/λ⌋   (1)
wherein l is the minimum possible value of the number of background blocks in the image; r is the maximum possible value of the number of background blocks in the image; sp denotes the total number of superpixels generated after the image is segmented by the SLIC algorithm; and λ is a parameter controlling the obtainable number of background blocks;
S4-2: initialize the position-indicator variable p, the inter-class difference ratio variable f, the variable f_mx storing the maximum value of f, and the variable v storing a vector element to 0, and initialize the background-block count variable Bg to l−1;
S4-3: sort the input two-layer saliency vector y in ascending order and store the result as vector y';
S4-4: increase Bg by 1; if Bg is larger than r, go to step S4-8; otherwise, execute step S4-5;
S4-5: assign the Bg-th element of vector y' to variable v; the elements of y' that are less than or equal to v form vector m, and the elements larger than v form vector n;
S4-6: based on the idea of Fisher's criterion, the value of f is computed as
f = (ag(m) − ag(n))² / (va(m) + va(n))   (2)
where ag(·) is the mean of the samples within a class and va(·) is the variance of the samples within a class;
S4-7: if f is larger than f_mx, update f_mx to f, update p to Bg, and go to step S4-4; if f is not larger than f_mx, go directly to step S4-4;
S4-8: assign the value of p to Bg, form the background vector b from the first Bg elements of vector y', and return the variable Bg and the vector b; the seed vector s and the diffusion matrix A⁻¹ constructed from the background vector b are diffused according to formula (3) to obtain the background saliency vector y_b, and assigning the value of each element of y_b to the corresponding superpixel block generates the background saliency map; formula (3) is expressed as follows:
y_b = s × A⁻¹   (3)
where s denotes the seed vector and A⁻¹ denotes the diffusion matrix constructed from the background vector b.
3. The saliency detection method fusing a small depth generation model according to claim 2, wherein: in formula (1), sp is much larger than λ, and λ is set to 12; in formula (2), the numerator represents the inter-class difference and the denominator the intra-class difference, and the saliency vector y' is divided into two classes on the basis of the final ratio f.
4. The saliency detection method fusing a small depth generation model according to claim 1, wherein the specific step of S5 is: the background saliency vector y_b and the two-layer saliency vector y_sc are nonlinearly fused as shown in formula (4) to obtain the saliency vector y_fn; assigning the value of each element of y_fn to the corresponding superpixel block yields the saliency map S_f; formula (4) is expressed as follows:
y_fn = (0.5·y_b + 0.5·y_sc) × e^(−5·y_sc)   (4)
where 0.5·y_b + 0.5·y_sc jointly considers the background saliency map and the two-layer saliency map, and e^(−5·y_sc) is an introduced nonlinear factor that adjusts their fusion result.
5. The saliency detection method fusing a small depth generation model according to claim 4, wherein: the fixed parameter −5 in the factor e^(−5·y_sc) of formula (4) was determined experimentally.
6. The saliency detection method fusing a small depth generation model according to claim 1, wherein step S6 includes the following steps:
S6-1: constructing a discriminator network and a small generator network for saliency detection based on the generative adversarial network framework;
the discriminator model comprises 11 convolution modules and 5 pooling modules; each convolution module comprises a convolution layer, a batch normalization layer, and a nonlinear activation function layer; the kernel size of each pooling layer is 2 × 2 with stride 2, and max pooling is used;
the small depth generation model comprises 15 convolution modules and 5 transposed-convolution modules, where the stride of all convolution layers in the network and the upsampling rate of all transposed convolution layers are set to 1; each convolution module comprises a convolution layer, a batch normalization layer, and a nonlinear activation function layer; each transposed-convolution module comprises a transposed convolution layer, a batch normalization layer, and a nonlinear activation function layer;
S6-2: performing manual staged training according to a given training procedure; the training of the generative adversarial network alternates by fixing one network while training the other.
7. The saliency detection method fusing a small depth generation model according to claim 6, wherein: in step S6-1, the number of convolution kernels of all convolution layers and transposed convolution layers in the generator model is set to 64, and all convolution kernels are of size 3 × 3; the number of channels of the first-layer convolution kernels is set to 3, and the number of channels in the convolution layer following a pooling operation is doubled, i.e., the number of convolution kernels in the post-pooling convolution layer needs to be doubled.
8. The saliency detection method fusing a small depth generation model according to claim 6, wherein: in the first round of network training in step S6-2, the generator is fixed to train the discriminator, after which training alternates; in the training algorithm, the labels corresponding to ground-truth maps input to the discriminator are set to 0.9, and the labels corresponding to saliency maps generated by the generator are set to −0.9; the images in the data set are organized into matrix arrays and then input into the network for training.
9. The saliency detection method fusing a small depth generation model according to claim 6, wherein the specific steps of the fusion algorithm in S7 that fuses saliency map S_f and saliency map S_d into the final saliency map S are as follows:
S7-1: input the saliency map S_d obtained by the small generator network and the saliency map S_f;
S7-2: initialize variable i to 0 and matrix variables S and temp to empty;
S7-3: call the MCA2 algorithm to self-fuse S_d with S_d, and store the output saliency map in variable temp;
S7-4: increase i by 1; if i is larger than 4, execute step S7-7; otherwise, execute the next step;
S7-5: call the MCA2 algorithm to fuse S_f with temp, and store the output saliency map in matrix variable S;
S7-6: update temp to S and return to step S7-4;
S7-7: output the matrix variable S; after the fusion algorithm, the output variable S is the final saliency map.
10. A saliency detection terminal fusing a small depth generation model, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor executes a computer program implementing the saliency detection method fusing a small depth generation model according to any one of claims 1 to 9.
CN202010443235.3A 2020-05-22 2020-05-22 Saliency detection method and terminal fusing small-size depth generation model Active CN111611999B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010443235.3A CN111611999B (en) 2020-05-22 2020-05-22 Saliency detection method and terminal fusing small-size depth generation model

Publications (2)

Publication Number Publication Date
CN111611999A true CN111611999A (en) 2020-09-01
CN111611999B CN111611999B (en) 2023-04-07

Family

ID=72202213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010443235.3A Active CN111611999B (en) 2020-05-22 2020-05-22 Saliency detection method and terminal fusing small-size depth generation model

Country Status (1)

Country Link
CN (1) CN111611999B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106127749A (en) * 2016-06-16 2016-11-16 华南理工大学 The target part recognition methods of view-based access control model attention mechanism
US10008004B1 (en) * 2016-12-28 2018-06-26 Beijing University Of Technology Establishment method of 3D saliency model based on prior knowledge and depth weight
CN107679250A (en) * 2017-11-01 2018-02-09 浙江工业大学 A kind of multitask layered image search method based on depth own coding convolutional neural networks
CN108320281A (en) * 2018-01-19 2018-07-24 福建师范大学 A kind of image significance detection method and terminal based on multiple features diffusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ye Feng, Liu Tianlu, Li Shiying: "Person re-identification method based on Siamese network", Computer Systems & Applications *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112509046A (en) * 2020-12-10 2021-03-16 电子科技大学 Weak supervision convolutional neural network image target positioning method
CN112509046B (en) * 2020-12-10 2021-09-21 电子科技大学 Weak supervision convolutional neural network image target positioning method

Also Published As

Publication number Publication date
CN111611999B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
EP4002161A1 (en) Image retrieval method and apparatus, storage medium, and device
Jiang et al. Hyperspectral image classification with spatial consistence using fully convolutional spatial propagation network
CN108875076B (en) Rapid trademark image retrieval method based on Attention mechanism and convolutional neural network
CN114549913B (en) Semantic segmentation method and device, computer equipment and storage medium
CN116580257A (en) Feature fusion model training and sample retrieval method and device and computer equipment
US20210272295A1 (en) Analysing Objects in a Set of Frames
CN112989085A (en) Image processing method, image processing device, computer equipment and storage medium
CN113361546A (en) Remote sensing image feature extraction method integrating asymmetric convolution and attention mechanism
Hosny et al. Classification of galaxy color images using quaternion polar complex exponential transform and binary Stochastic Fractal Search
CN117079098A (en) Space small target detection method based on position coding
CN111104924B (en) Processing algorithm for identifying low-resolution commodity image
KR102305575B1 (en) Method and system for highlighting similar areas using similarity between images
CN111899203A (en) Real image generation method based on label graph under unsupervised training and storage medium
CN115222998A (en) Image classification method
Zhang et al. R2Net: Residual refinement network for salient object detection
Liu et al. Hypergraph attentional convolutional neural network for salient object detection
US10832180B2 (en) Artificial intelligence system that employs windowed cellular automata to create plausible alternatives
CN111611999B (en) Saliency detection method and terminal fusing small-size depth generation model
CN112749576B (en) Image recognition method and device, computing equipment and computer storage medium
CN116977265A (en) Training method and device for defect detection model, computer equipment and storage medium
CN115311550B (en) Remote sensing image semantic change detection method and device, electronic equipment and storage medium
Li et al. A new algorithm of vehicle license plate location based on convolutional neural network
KR20210124727A (en) Method of generating logo
Jeon et al. Integrating multiple receptive fields through grouped active convolution
US20230035307A1 (en) Apparatus and method for detecting keypoint based on deep learniing using information change across receptive fields

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant