CN111626306A - Saliency map fusion method and system - Google Patents

Saliency map fusion method and system

Info

Publication number
CN111626306A
Authority
CN
China
Prior art keywords
saliency map
image
fusion
saliency
fusion method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910229519.XA
Other languages
Chinese (zh)
Other versions
CN111626306B (en)
Inventor
梁晔
马楠
李大伟
孙晨昊
徐俊
张磊
周航
王楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Union University
Original Assignee
Beijing Union University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Union University filed Critical Beijing Union University
Priority to CN201910229519.XA priority Critical patent/CN111626306B/en
Publication of CN111626306A publication Critical patent/CN111626306A/en
Application granted granted Critical
Publication of CN111626306B publication Critical patent/CN111626306B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Abstract

The invention provides a saliency map fusion method and system. The method comprises the following steps: prepare a training set; search the training set for neighbors of a test image X, and fit the saliency map of X from the saliency maps of the neighbor images to obtain the final saliency map. The method and system account for the fact that different extraction methods perform differently on different images, and the fused result greatly outperforms any single method before fusion.

Description

Saliency map fusion method and system
Technical Field
The invention relates to the fields of computer vision and image processing, and in particular to a saliency map fusion method and system.
Background
Image saliency detection aims to find the most important part of an image. It is an important preprocessing step for reducing computational complexity in computer vision and is widely used in image compression, object recognition, image segmentation, and related fields. It remains a challenging problem: the existing detection methods each have their own strengths and weaknesses, and even the same saliency detection method performs very differently on different images. Fusing the results of multiple saliency detection methods to obtain a better saliency map is therefore particularly important. Traditional saliency map fusion methods mostly take a simple additive or multiplicative average of several saliency maps. Such fusion treats every saliency map equally and assigns the same weight to every detection method, which is unreasonable in practice: for a given image, or even a given pixel, the detection methods perform differently, so their weights should differ. Some methods for fusing multiple saliency maps do exist; for example, Mai et al. fuse multiple saliency maps with Conditional Random Fields (CRF) and obtain good results, but the recall is unsatisfactory.
Research [L. Mai, Y. Niu, and F. Liu. Saliency Aggregation: A Data-driven Approach. CVPR 2013, pages 1131-1138] shows that different extraction methods have different extraction performance, and that even the same extraction method performs differently on different images. However, without a reference binary label it is very difficult to judge how good a saliency map is, that is, to select the better saliency maps among many for fusion, and research on this problem is scarce.
The document [Mai L, Liu F. Comparing Salient Object Detection Results without Ground Truth [C]. European Conference on Computer Vision. Springer International Publishing, 2014: 76-91] fuses multiple saliency maps without reference binary labels. That work defines six criteria for evaluating a good saliency map: coverage of the salient region, compactness of the saliency map, the histogram of the saliency map, color separability of the salient region, segmentation quality, and boundary quality of the saliency map. The saliency maps are ranked according to these six criteria, and the fused saliency map is finally obtained. The method is computationally expensive and the processing is complicated.
The invention patent application CN106570851A discloses a saliency map fusion method based on weighted-distribution D-S evidence theory, which addresses the effective fusion of saliency maps obtained by multiple saliency detection methods. First, the saliency detection methods to be fused each generate their saliency map. Second, the obtained saliency maps are taken as evidence, and the frames of discernment and the mass functions corresponding to the saliency detection methods are defined from them. Then the similarity coefficient and similarity matrix of each piece of evidence are calculated, from which the support degree and credibility of each piece of evidence are obtained. The mass function values are then weight-averaged using the credibility as the weight to obtain one saliency map. The weighted-average evidence is then combined using the D-S combination rule to obtain another saliency map. Finally, the two saliency maps are combined by weighted summation to obtain the final saliency map. In this method the mass function is used for the weighted average, but when the mass function is applied in the D-S combination rule, changes in the degree of conflict between the mass functions can affect the combination, so the final saliency map may be unclear.
The invention application CN106780422A discloses a saliency map fusion method based on the Choquet integral, which addresses the effective fusion of saliency maps obtained by multiple saliency detection methods. First, the saliency detection methods to be fused each generate their saliency map. Second, the similarity coefficients and similarity matrix between the saliency maps are calculated to obtain the support degree and credibility of each saliency map. The credibility of each saliency map is then taken as the fuzzy measure in the Choquet integral. At the same time, the saliency maps to be fused are sorted at the pixel level, and the sorted discrete saliency values are taken as the non-negative real measurable function in the Choquet integral. Finally, the Choquet integral is evaluated to obtain the final saliency map. Using the Choquet integral for saliency map fusion involves a large workload and heavy computation, and is inconvenient to use.
Disclosure of Invention
In order to solve the above technical problem, the invention provides a saliency map fusion method that accounts for the differences in how different extraction methods perform on different images; the fusion performance is greatly improved compared with that of any single method before fusion.
The first purpose of the invention is to provide a saliency map fusion method, which comprises the following steps:
step 1: preparing a training set;
step 2: searching the training set for neighbors of a test image X, and fitting the saliency map of X from the saliency maps of the neighbor images to obtain the final saliency map.
Preferably, the training set includes a training image set D, a corresponding reference binary label set G, M extraction methods, and the saliency map extraction results A of the M extraction methods.
In any of the above schemes, preferably, the step 2 includes the following sub-steps:
step 21: calculating the chi-square distance between the 256-dimensional color histograms of the test image X and of each training set image;
step 22: retrieving the K nearest neighbors {X_k}, k = 1, …, K, wherein each neighbor image X_k has a corresponding reference binary label α_k, and A_k = [a_k^1, …, a_k^M] denotes the detection results of the M methods on the neighbor image, with 1 ≤ k ≤ K;
step 23: calculating the vector β;
step 24: calculating the final saliency map S.
In any of the above schemes, preferably, the step 23 is to calculate the vector β according to an objective function.
In any of the above schemes, preferably, the objective function is:
min_β Σ_{k=1}^{K} ||A_k β − α_k||² + λ||β||²
where the first term is the reconstruction error between the fusion result and the reference binary label, and the second term is a regularization term.
In any of the above solutions, preferably, the vector β varies with the scale parameter λ.
In any of the above schemes, preferably, the closed-form solution for the vector β is:
β = (Σ_{k=1}^{K} P_k + λI)^{-1} (Σ_{k=1}^{K} B_k)
where P_k and B_k are matrices that depend only on the K nearest-neighbor images, and I denotes the identity matrix.
In any of the above aspects, preferably, the matrices P_k and B_k are obtained during training.
In any of the above solutions, preferably, the step 24 comprises using the test image X and its corresponding M saliency maps {s^1, s^2, …, s^M}; the saliency map obtained by fusion is calculated as
s = A β
where A = [s^1, s^2, …, s^M] denotes the matrix of the M predicted saliency maps, s^m denotes the m-th saliency map, β = {β_1, β_2, …, β_M} denotes the fusion coefficients, and s denotes the saliency map result obtained from the fusion.
In any of the above schemes, preferably, the step 24 reshapes the vector s into a matrix to obtain the final saliency map S.
The second purpose of the invention is to provide a saliency map fusion system, which comprises the following modules:
a training set and an image fitting module,
wherein the image fitting module is used for searching the training set for neighbors of a test image X and fitting the saliency map of X from the saliency maps of the neighbor images to obtain the final saliency map.
Preferably, the training set includes a training image set D, a corresponding reference binary label set G, M extraction methods, and the saliency map extraction results A of the M extraction methods.
In any of the above schemes, preferably, the image fitting module works as follows:
step 21: calculating the chi-square distance between the 256-dimensional color histograms of the test image X and of each training set image;
step 22: retrieving the K nearest neighbors {X_k}, k = 1, …, K, wherein each neighbor image X_k has a corresponding reference binary label α_k, and A_k = [a_k^1, …, a_k^M] denotes the detection results of the M methods on the neighbor image, with 1 ≤ k ≤ K;
step 23: calculating the vector β;
step 24: calculating the final saliency map S.
In any of the above schemes, preferably, the step 23 is to calculate the vector β according to an objective function.
In any of the above schemes, preferably, the objective function is:
min_β Σ_{k=1}^{K} ||A_k β − α_k||² + λ||β||²
where the first term is the reconstruction error between the fusion result and the reference binary label, and the second term is a regularization term.
In any of the above solutions, preferably, the vector β varies with the scale parameter λ.
In any of the above schemes, preferably, the closed-form solution for the vector β is:
β = (Σ_{k=1}^{K} P_k + λI)^{-1} (Σ_{k=1}^{K} B_k)
where P_k and B_k are matrices that depend only on the K nearest-neighbor images, and I denotes the identity matrix.
In any of the above aspects, preferably, the matrices P_k and B_k are obtained during training.
In any of the above solutions, preferably, the step 24 comprises using the test image X and its corresponding M saliency maps {s^1, s^2, …, s^M}; the saliency map obtained by fusion is calculated as
s = A β
where A = [s^1, s^2, …, s^M] denotes the matrix of the M predicted saliency maps, s^m denotes the m-th saliency map, β = {β_1, β_2, …, β_M} denotes the fusion coefficients, and s denotes the saliency map result obtained from the fusion.
In any of the above schemes, preferably, the step 24 reshapes the vector s into a matrix to obtain the final saliency map S.
The saliency map fusion method and system are conceptually simple, help develop highly robust salient-region extraction, and improve the generality of the detection method.
Drawings
Fig. 1 is a flow chart of a preferred embodiment of a saliency map fusion method according to the present invention.
FIG. 1A is the original test image of the embodiment shown in FIG. 1 of the saliency map fusion method according to the present invention.
FIG. 1B shows the retrieved neighbor images of the embodiment shown in FIG. 1 of the saliency map fusion method according to the present invention.
FIG. 1C shows the saliency maps of the various methods for the embodiment shown in FIG. 1 of the saliency map fusion method according to the present invention.
Fig. 1D is a graph of the fusion results of the embodiment shown in fig. 1 of the saliency map fusion method according to the present invention.
FIG. 2 is a PR graph of one embodiment of performance comparison results of the saliency map fusion method according to the present invention.
Fig. 2A is a ROC graph of the embodiment shown in fig. 2 for a saliency map fusion method according to the present invention.
Fig. 3 is a comparison diagram of an embodiment of the visual effects of the saliency map fusion method according to the present invention.
Fig. 4 is a comparison diagram of another embodiment of the visual effects of the saliency map fusion method according to the present invention.
Fig. 5 is a block diagram of a preferred embodiment of a saliency map fusion system according to the present invention.
Detailed Description
The invention is further illustrated with reference to the figures and the specific examples.
Example one
Step 100 is executed: the training process. A training set is prepared comprising a training image set D, the corresponding reference binary label set G, M extraction methods, and the saliency map extraction results A of the M extraction methods. Step 110 is executed: given a test image X, as shown in FIG. 1A, the chi-square distance between the 256-dimensional color histograms of X and of each training set image is calculated. Step 120 is executed: the K nearest neighbors {X_k}, k = 1, …, K, are retrieved; each neighbor image X_k has a corresponding reference binary label α_k, and A_k = [a_k^1, …, a_k^M] denotes the detection results of the M methods on the neighbor image, where 1 ≤ k ≤ K, as shown in FIG. 1B.
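The retrieval of steps 110 and 120 can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation: it assumes the 256-dimensional histogram is a 256-bin histogram over a fixed color quantization (here simply the pixel values of a single-channel uint8 image), that the training histograms are precomputed, and that all function names are illustrative.

```python
import numpy as np

def color_histogram_256(image, eps=1e-10):
    """Normalized 256-bin histogram of a uint8 image.

    Assumes a single-channel (or pre-quantized) image; any fixed 256-bin
    color quantization would work the same way.
    """
    hist, _ = np.histogram(image.ravel(), bins=256, range=(0, 256))
    hist = hist.astype(np.float64)
    return hist / (hist.sum() + eps)

def chi_square_distance(h1, h2, eps=1e-10):
    """Chi-square distance between two normalized histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def retrieve_k_nearest(test_image, train_histograms, K=10):
    """Indices of the K training images whose histograms are closest to X."""
    h = color_histogram_256(test_image)
    dists = np.array([chi_square_distance(h, ht) for ht in train_histograms])
    return np.argsort(dists)[:K]
```

The returned indices select the neighbor images X_k together with their reference binary labels α_k and their precomputed saliency maps A_k from the training set.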
Step 130 is executed: under the above assumptions, the fusion problem is formulated as a ridge regression problem with the following objective function:
min_β Σ_{k=1}^{K} ||A_k β − α_k||² + λ||β||²
The first term is the reconstruction error between the fusion result and the reference binary label, the second term is a regularization term, and the vector β varies with the scale parameter λ.
The closed-form solution for the vector β is:
β = (Σ_{k=1}^{K} P_k + λI)^{-1} (Σ_{k=1}^{K} B_k)
where P_k and B_k are matrices that depend only on the K nearest-neighbor images and can be obtained during training, and I denotes the identity matrix. Step 140 is executed: using the test image X and its corresponding M saliency maps {s^1, s^2, …, s^M} (as shown in FIG. 1C), the saliency map obtained by fusion is calculated as follows:
s = A β
where A = [s^1, s^2, …, s^M] denotes the matrix of the M predicted saliency maps, s^m denotes the m-th saliency map, β = {β_1, β_2, …, β_M} denotes the fusion coefficients, and s denotes the fused saliency map result. The vector s is reshaped into a matrix to obtain the final saliency map S, as shown in FIG. 1D.
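The closed-form fit of β and the fusion of step 140 can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation: it assumes P_k = A_kᵀA_k and B_k = A_kᵀα_k (the standard ridge-regression accumulators, consistent with the objective above; the patent only states that P_k and B_k depend on the neighbor images), that each A_k stacks the M vectorized saliency maps of neighbor X_k as columns, that α_k is the vectorized reference binary label, and that all function names are illustrative.

```python
import numpy as np

def fit_fusion_weights(neighbor_maps, neighbor_labels, lam=1.0):
    """Closed-form ridge-regression estimate of the fusion weights beta.

    neighbor_maps:   list of K arrays A_k of shape (P, M) -- the M vectorized
                     saliency maps of each neighbor image (P = pixel count).
    neighbor_labels: list of K arrays alpha_k of shape (P,) -- the vectorized
                     reference binary labels of the neighbors.
    lam:             scale (regularization) parameter lambda.
    """
    M = neighbor_maps[0].shape[1]
    P_sum = np.zeros((M, M))
    B_sum = np.zeros(M)
    for A_k, alpha_k in zip(neighbor_maps, neighbor_labels):
        P_sum += A_k.T @ A_k        # assumed P_k, depends only on the neighbor
        B_sum += A_k.T @ alpha_k    # assumed B_k, depends only on the neighbor
    # beta = (sum_k P_k + lam * I)^(-1) (sum_k B_k)
    return np.linalg.solve(P_sum + lam * np.eye(M), B_sum)

def fuse_saliency_maps(test_maps, beta):
    """Fuse the M predicted saliency maps of the test image: s = A @ beta.

    test_maps: array of shape (H, W, M) holding the M predicted saliency maps.
    Returns the final saliency map S of shape (H, W).
    """
    H, W, M = test_maps.shape
    A = test_maps.reshape(-1, M)    # each column is one vectorized saliency map
    s = A @ beta                    # fused saliency vector
    return s.reshape(H, W)          # reshaped into the final saliency map S
```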
Example two
The application belongs to the technical fields of computer vision and image processing, and discloses a saliency map fusion method. The invention observes that different extraction methods have different extraction performance, and that even the same extraction method produces different results on different images. The saliency map fusion method provided by the invention accounts for these differences, and the fusion performance is greatly improved compared with that of any single method before fusion.
Due to the individual differences between images, no single method can guarantee better extraction performance than all other methods on every image. To overcome this problem, the application proposes an image-dependent saliency map fusion model, so that the methods complement each other and the performance of the extraction result is further improved.
Since the detection performance varies from image to image, the fusion method should be image-dependent, i.e. the parameters of the fusion are adaptive and vary from image to image.
Assume that there are M saliency extraction methods. For an input image X, M saliency maps {s^1, s^2, …, s^M} are predicted. The basic assumption of the fusion method is that the fusion result can be obtained as a linear combination of the saliency maps:
s = A β
where A = [s^1, s^2, …, s^M] denotes the matrix of the M predicted saliency maps, s^m denotes the m-th saliency map, β = {β_1, β_2, …, β_M} denotes the fusion coefficients, and s denotes the saliency map result obtained from the fusion.
EXAMPLE III
In quantitative performance evaluation, the currently popular performance evaluation indexes are adopted:
(1) precision and recall curves (PR curves);
(2) receiver operating characteristic Curve (ROC Curve);
the inventive method is abbreviated as FBS and PR graphs are shown in FIG. 2. by comparing it with the other 14 popular methods (HS, MR, DRFI, PCA, HM, GC, MC, DSR, SBF, BD, SMD, MCDL, LEGS and RFCN), it can be seen that the PR curve of FBS is higher than that of all other methods.
The ROC curves are shown in FIG. 2A: comparing FBS with the same 14 popular methods shows that the ROC curve of FBS is higher than that of all the other methods.
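For reference, the PR and ROC points for a single image can be obtained by sweeping a threshold over the fused saliency map and comparing the binarized result with the reference binary label. The sketch below is a plain NumPy illustration with an assumed 256-step threshold sweep; it is not the evaluation code used for FIG. 2 and FIG. 2A.

```python
import numpy as np

def pr_roc_points(saliency, ground_truth, num_thresholds=256, eps=1e-10):
    """Precision/recall and TPR/FPR points for one saliency map.

    saliency:     (H, W) float array in [0, 1].
    ground_truth: (H, W) binary array, 1 marking the salient region.
    """
    gt = ground_truth.astype(bool)
    precision, recall, tpr, fpr = [], [], [], []
    for t in np.linspace(0.0, 1.0, num_thresholds):
        pred = saliency >= t
        tp = np.sum(pred & gt)
        fp = np.sum(pred & ~gt)
        fn = np.sum(~pred & gt)
        tn = np.sum(~pred & ~gt)
        precision.append(tp / (tp + fp + eps))
        recall.append(tp / (tp + fn + eps))
        tpr.append(tp / (tp + fn + eps))   # true positive rate (= recall)
        fpr.append(fp / (fp + tn + eps))
    return precision, recall, tpr, fpr
```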
Example four
Typical images were selected for a visual comparison of the FBS method and the MCDL method, as shown in FIG. 3; the images appear in the order: original image, reference binary annotation, FBS, MCDL. The MCDL method relies solely on deep-learning features; it can be seen that the arms, legs, and heads of the person extracted by the FBS method are more complete, have clearer boundaries, and show better detail than those extracted by MCDL.
EXAMPLE five
Some typical images were selected for a visual comparison of the FBS method and the DRFI method, as shown in FIG. 4; the images in each column appear in the order: original image, reference binary annotation, FBS extraction result, DRFI extraction result. The DRFI method relies solely on hand-crafted features; it can be seen that the fish, butterflies, and flowers extracted by the FBS method are more complete, have clearer boundaries, and show better detail than those extracted by DRFI.
EXAMPLE six
As shown in fig. 5, the saliency map fusion system includes a training set 500 and an image fitting module 510.
The training set 500 includes a training image set D, a corresponding reference binary label set G, M extraction methods, and the saliency map extraction results A of the M extraction methods.
The image fitting module 510 works as follows:
step 21: the chi-squared distance of the 256-dimensional color histograms of the test image X and the training set image is calculated.
Step 22: the K nearest neighbors {X_k}, k = 1, …, K, are retrieved; each neighbor image X_k has a corresponding reference binary label α_k, and A_k = [a_k^1, …, a_k^M] denotes the detection results of the M methods on the neighbor image, where 1 ≤ k ≤ K.
Step 23: the vector β is calculated according to the objective function
min_β Σ_{k=1}^{K} ||A_k β − α_k||² + λ||β||²
where the first term is the reconstruction error between the fusion result and the reference binary label, and the second term is a regularization term. The vector β varies with the scale parameter λ, and its closed-form solution is
β = (Σ_{k=1}^{K} P_k + λI)^{-1} (Σ_{k=1}^{K} B_k)
where P_k and B_k are matrices that depend only on the K nearest-neighbor images, I denotes the identity matrix, and P_k and B_k are obtained during training.
Step 24: using the test image X and its corresponding M saliency maps {s^1, s^2, …, s^M}, the saliency map obtained by fusion is calculated as
s = A β
where A = [s^1, s^2, …, s^M] denotes the matrix of the M predicted saliency maps, s^m denotes the m-th saliency map, β = {β_1, β_2, …, β_M} denotes the fusion coefficients, and s denotes the fused saliency map result. The vector s is reshaped into a matrix to obtain the final saliency map S.
For a better understanding of the present invention, the foregoing detailed description has been given in conjunction with specific embodiments thereof, but not with the intention of limiting the invention thereto. Any simple modifications of the above embodiments according to the technical essence of the present invention still fall within the scope of the technical solution of the present invention. In the present specification, each embodiment is described with emphasis on differences from other embodiments, and the same or similar parts between the respective embodiments may be referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

Claims (10)

1. A saliency map fusion method comprising the steps of:
step 1: preparing a training set;
step 2: searching the training set for neighbors of a test image X, and fitting the saliency map of X from the saliency maps of the neighbor images to obtain a final saliency map.
2. The saliency map fusion method of claim 1, characterized by: the training set comprises a training image set D, a corresponding reference binary label set G, M extraction methods, and the saliency map extraction results A of the M extraction methods.
3. The saliency map fusion method of claim 2, characterized by: the step 2 comprises the following substeps:
step 21: calculating the chi-square distance between the 256-dimensional color histograms of the test image X and of each training set image;
step 22: retrieving the K nearest neighbors {X_k}, k = 1, …, K, wherein each neighbor image X_k has a corresponding reference binary label α_k, and A_k = [a_k^1, …, a_k^M] denotes the detection results of the M methods on the neighbor image, with 1 ≤ k ≤ K;
step 23: calculating a vector β;
step 24: calculating the final saliency map S.
4. The saliency map fusion method of claim 3, characterized by: the step 23 is to calculate the vector β according to an objective function.
5. The saliency map fusion method of claim 4, characterized by: the objective function formula is as follows:
min_β Σ_{k=1}^{K} ||A_k β − α_k||² + λ||β||²
wherein the first term is the reconstruction error between the fusion result and the reference binary label, and the second term is a regularization term.
6. The saliency map fusion method of claim 5, characterized in that: the vector β varies with the scale parameter λ.
7. The saliency map fusion method of claim 6, characterized by: the closed-form solution formula of the vector β is as follows:
β = (Σ_{k=1}^{K} P_k + λI)^{-1} (Σ_{k=1}^{K} B_k)
wherein P_k and B_k are matrices that depend only on the K nearest-neighbor images, and I denotes the identity matrix.
8. The saliency map fusion method of claim 7, characterized by: the matrices P_k and B_k are obtained during training.
9. The saliency map fusion method of claim 8, characterized by: the step 24 comprises using the test image X and its corresponding M saliency maps {s^1, s^2, …, s^M}, and the saliency map obtained by fusion is calculated as
s = A β
wherein A = [s^1, s^2, …, s^M] denotes the matrix of the M predicted saliency maps, s^m denotes the m-th saliency map, β = {β_1, β_2, …, β_M} denotes the fusion coefficients, and s denotes the saliency map result obtained from the fusion.
10. A saliency map fusion system comprising the following modules:
a training set and an image fitting module,
wherein the image fitting module is used for searching the training set for neighbors of a test image X and fitting the saliency map of X from the saliency maps of the neighbor images to obtain a final saliency map.
CN201910229519.XA 2019-03-25 2019-03-25 Saliency map fusion method and system Active CN111626306B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910229519.XA CN111626306B (en) 2019-03-25 2019-03-25 Saliency map fusion method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910229519.XA CN111626306B (en) 2019-03-25 2019-03-25 Saliency map fusion method and system

Publications (2)

Publication Number Publication Date
CN111626306A true CN111626306A (en) 2020-09-04
CN111626306B CN111626306B (en) 2023-10-13

Family

ID=72260519

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910229519.XA Active CN111626306B (en) 2019-03-25 2019-03-25 Saliency map fusion method and system

Country Status (1)

Country Link
CN (1) CN111626306B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5499010A (en) * 1994-04-25 1996-03-12 The Regents Of The University Of California Braking light system for a vehicle
US20020154833A1 (en) * 2001-03-08 2002-10-24 Christof Koch Computation of intrinsic perceptual saliency in visual environments, and applications
CN102054178A (en) * 2011-01-20 2011-05-11 北京联合大学 Chinese painting image identifying method based on local semantic concept
CN103065326A (en) * 2012-12-26 2013-04-24 西安理工大学 Target detection method based on time-space multiscale motion attention analysis
CN103810274A (en) * 2014-02-12 2014-05-21 北京联合大学 Multi-feature image tag sorting method based on WordNet semantic similarity
CN104616316A (en) * 2014-05-23 2015-05-13 苏州大学 Method for recognizing human behavior based on threshold matrix and characteristics-fused visual word
CN105631898A (en) * 2015-12-28 2016-06-01 西北工业大学 Infrared motion object detection method based on spatio-temporal saliency fusion
CN106780422A (en) * 2016-12-28 2017-05-31 深圳市美好幸福生活安全系统有限公司 A kind of notable figure fusion method based on Choquet integrations
CN107977948A (en) * 2017-07-25 2018-05-01 北京联合大学 A kind of notable figure fusion method towards sociogram's picture
CN108694710A (en) * 2018-04-18 2018-10-23 大连理工大学 One kind being based on the notable figure fusion method of (N) fuzzy integral

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
RICHARD JIANG et al.: "Face Recognition in the Scrambled Domain via Salience-Aware Ensembles of Many Kernels", IEEE Transactions on Information Forensics and Security, pages 1807-1817 *
许明文 (Xu Mingwen) et al.: "Detection and recognition of traffic lights based on saliency features" (基于显著性特征的交通信号灯检测和识别), Computer and Digital Engineering (计算机与数字工程), pages 1397-1401 *

Also Published As

Publication number Publication date
CN111626306B (en) 2023-10-13

Similar Documents

Publication Publication Date Title
Hosu et al. KonIQ-10k: An ecologically valid database for deep learning of blind image quality assessment
CN110866140B (en) Image feature extraction model training method, image searching method and computer equipment
CN111340123A (en) Image score label prediction method based on deep convolutional neural network
Kucer et al. Leveraging expert feature knowledge for predicting image aesthetics
CN110322445B (en) Semantic segmentation method based on maximum prediction and inter-label correlation loss function
CN110287777B (en) Golden monkey body segmentation algorithm in natural scene
WO2019218136A1 (en) Image segmentation method, computer device, and storage medium
CN108846404B (en) Image significance detection method and device based on related constraint graph sorting
CN111814620A (en) Face image quality evaluation model establishing method, optimization method, medium and device
CN106157330B (en) Visual tracking method based on target joint appearance model
CN107590505B (en) Learning method combining low-rank representation and sparse regression
TWI761813B (en) Video analysis method and related model training methods, electronic device and storage medium thereof
CN107977948B (en) Salient map fusion method facing community image
Wang et al. Aspect-ratio-preserving multi-patch image aesthetics score prediction
Huo et al. Semisupervised learning based on a novel iterative optimization model for saliency detection
CN111222546B (en) Multi-scale fusion food image classification model training and image classification method
CN113807176A (en) Small sample video behavior identification method based on multi-knowledge fusion
Zhou et al. Attention transfer network for nature image matting
CN109741380B (en) Textile picture fast matching method and device
CN112528077B (en) Video face retrieval method and system based on video embedding
CN113327227B (en) MobileneetV 3-based wheat head rapid detection method
CN112508135B (en) Model training method, pedestrian attribute prediction method, device and equipment
CN111626306B (en) Saliency map fusion method and system
CN114202694A (en) Small sample remote sensing scene image classification method based on manifold mixed interpolation and contrast learning
CN115995079A (en) Image semantic similarity analysis method and homosemantic image retrieval method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant