WO2015180527A1 - Image saliency detection method - Google Patents

Image saliency detection method

Info

Publication number
WO2015180527A1
WO2015180527A1 (PCT/CN2015/075514)
Authority
WO
WIPO (PCT)
Prior art keywords
image
value
image block
block
saliency
Prior art date
Application number
PCT/CN2015/075514
Other languages
French (fr)
Chinese (zh)
Inventor
袁春
陈刚彪
Original Assignee
清华大学深圳研究生院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 清华大学深圳研究生院
Publication of WO2015180527A1 publication Critical patent/WO2015180527A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features


Abstract

Disclosed is an image saliency detection method, comprising the steps of: 1) dividing an image into K image blocks of size M × N, where the values of K, M and N are set by a user; 2) calculating the feature values of each image block, including a brightness feature value, a colour feature value, a direction feature value, a depth feature value and a sparseness feature value; 3) quantizing the feature values of each image block to the same interval range and fusing them to obtain difference values between each image block and the remaining image blocks; and 4) determining a weighting coefficient and computing a weighted sum of the difference values between each image block and the remaining image blocks to obtain a saliency value for each image block. By introducing a depth feature and a sparseness feature on top of the traditional feature values, the method matches how the human visual system observes images, thereby guaranteeing that the saliency maps obtained conform to human vision and are accurate.

Description

Image saliency detection method

[Technical Field]
The present invention relates to the field of computer vision, and in particular to an image saliency detection method.
[Background Art]
When observing an image, humans usually attend to only a small, salient part of the whole image or video. Therefore, when a computer simulates the human visual system, it does so mainly by detecting salient regions in the image. Saliency detection has gradually become an important research topic in computer vision, with broad prospects in human-computer interaction, intelligent surveillance, image segmentation, image retrieval and automatic annotation. In this field, how to accurately detect salient regions in an image with an effective method is a very important problem. There are many traditional saliency detection methods, but for some images, such as those containing both a close-range view and a distant view far from the observer, the detection results do not match the human visual system well and are not very accurate.
[Summary of the Invention]
The technical problem to be solved by the present invention is to remedy the deficiencies of the above prior art by providing an image saliency detection method whose detection results better match the human visual system and are more accurate.
The technical problem of the present invention is solved by the following technical solution:
An image saliency detection method comprises the following steps: 1) dividing the image into K image blocks of size M×N, where the values of K, M and N are set by the user; 2) calculating the feature values of each image block, the feature values including a brightness feature value, a color feature value, a direction feature value, a depth feature value and a sparse feature value, wherein the depth feature value is

λ1 × exp(-λ2 / max(deep(x, y))),

where λ1 and λ2 are constants set by the user according to the range of depth values in the image and the quantization interval range used when the feature values are fused, and max(deep(x, y)) denotes the maximum depth value of the pixels in the image block being calculated; the sparse feature value is f = W × I, where W = A⁻¹, A denotes the first M×N of the sparse coding units obtained by the independent component analysis (ICA) algorithm, and I denotes the matrix of pixel values of the M×N pixels in the image block being calculated; 3) quantizing the feature values of each image block to the same interval range and fusing them to obtain the difference values between each image block and the remaining image blocks; 4) determining a weighting coefficient and computing the saliency value of each image block as the weighted sum of the difference values between that block and the remaining image blocks.
Compared with the prior art, the beneficial effects of the present invention are:
The image saliency detection method of the invention introduces a depth feature and a sparse feature on top of the traditional feature values. The depth feature distinguishes close-range from distant regions in the image, making close-range regions nearer to the observer more prominent in the saliency map than distant ones, which matches the observation principle that the human visual system pays more attention to parts closer to the eye. The sparse feature is represented via sparse coding units trained by the ICA algorithm, which closely resemble the receptive fields of the human primary visual cortex, further ensuring that the resulting saliency map conforms to the human visual system. The more the saliency map conforms to the human visual system, the more accurate it is. Especially for images containing distant views, the saliency detection method of the present invention is more accurate than conventional methods.
[Description of the Drawings]
FIG. 1 is a flowchart of the image saliency detection method according to an embodiment of the present invention;
FIG. 2 shows the result of processing an image containing a close-range view with the detection method according to an embodiment of the present invention;
FIG. 3 shows the result of processing an image containing a distant view with the detection method according to an embodiment of the present invention.
[Detailed Description]
The present invention is described in further detail below in conjunction with specific embodiments and with reference to the accompanying drawings.
The idea of the present invention is as follows: based on the region-contrast saliency detection approach that currently performs well, the saliency value of each image block is computed as a weighted sum of the difference values between that block and the other blocks, finally yielding a saliency map of the whole image. During detection, a depth feature and a sparse feature are introduced alongside the traditional contrast features such as intensity, color and orientation; these two visual cues, depth information and sparse coding, enter the computation of the saliency map and make the detection results agree better with human visual perception. Further, the present invention introduces a center-shift method to correct the position of the center point during image block division: the image is divided around the salient center of an initial saliency map, imitating the shifting of the human eye's focus, so that the final saliency map better matches the characteristics of the human visual system. Furthermore, the human visual sharpness coefficient is used to weight the difference values of the image blocks; the closer an image block is to the central block, the larger its weighting coefficient, which accords with the human visual system and makes the detection results more accurate.
As shown in FIG. 1, the image saliency detection method in this embodiment comprises the following steps:
P1) Block division: the image is divided into K image blocks of size M×N, where the values of K, M and N are set by the user. If M×N is set small and K large, i.e. the blocks are finer, the subsequent calculation is more accurate but also more expensive; if M×N is set large and K small, i.e. the blocks are coarser, the subsequent calculation is cheaper but less accurate. Preferably, according to repeated experiments, dividing the image into 8×8 blocks keeps the computation manageable while still meeting the accuracy requirements.
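For illustration only, a minimal Python sketch of this block division under the simplest assumption of a regular grid follows (the preferred salient-center division described next is not reproduced here); the function name and all parameters other than the 8×8 block size are assumptions:

```python
import numpy as np

def divide_into_blocks(img: np.ndarray, m: int = 8, n: int = 8) -> np.ndarray:
    """Divide an image into K blocks of size M x N (cropping any remainder)."""
    h = img.shape[0] - img.shape[0] % m
    w = img.shape[1] - img.shape[1] % n
    img = img[:h, :w]
    tail = img.shape[2:]                      # () for grayscale, (3,) for color
    blocks = img.reshape(h // m, m, w // n, n, *tail).swapaxes(1, 2)
    return blocks.reshape(-1, m, n, *tail)    # K blocks, K = (h//m) * (w//n)
```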
Preferably, the block division uses a region-growing method, taking the salient center block of the image as the seed. With this preferred setting, an initial saliency map of the image must be obtained in advance, and the block with the largest saliency value in the initial saliency map is taken as the center. In the traditional region-growing method the physical center of the image is generally used as the center for division, whereas this preferred setting uses the salient center, which imitates the shifting of the human eye's focus and makes the final saliency map better match the human visual system. The reason is that, when searching for a specific target in a scene, the distribution of the eye's focus shifts from the image center to other positions according to the distribution of image features; hence the center of the visual field, i.e. the salient center, plays the important role, not the physical center of the image. Dividing the image blocks around the salient center makes the division agree better with the salient regions that the human visual system attends to, and the saliency map computed from blocks divided around the salient center is more accurate and effective than one computed from blocks divided around the image center.
P2) Calculate the feature values of each image block. Specifically, the feature values include: a brightness feature value, a color feature value, a direction feature value, a depth feature value and a sparse feature value.
In this step, the brightness, color and direction features are traditional contrast features that can be extracted with a Gaussian pyramid and the center-surround operator; mature calculation methods exist for them, so the formulas listed below are examples only, and the detailed calculation of these three feature values is not repeated here.
Brightness feature value: M = (r + g + b)/3;
Red-green color feature value: [formula given as an image in the original];

Blue-yellow color feature value: [formula given as an image in the original];
Direction feature value: M0(σ) = ||M*G0(θ)|| + ||M*Gπ/2(θ)||;
In the above formulas, r, g and b denote the r-channel, g-channel and b-channel pixel values of the image block being calculated; σ denotes the Gaussian pyramid level and is an integer between 0 and 8; θ denotes the angle and takes the value 0°, 45°, 90° or 135°; G0(θ) denotes the Gabor filter operator in the 0-degree direction and Gπ/2(θ) denotes the Gabor filter operator in the 90-degree direction.
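For illustration only, a minimal Python sketch of the brightness feature and a Gabor-based direction response follows; since the text cites standard methods without fixing parameters, the OpenCV kernel size and Gabor parameters below are assumptions:

```python
import numpy as np
import cv2

def brightness_feature(block_bgr: np.ndarray) -> float:
    b, g, r = cv2.split(block_bgr.astype(np.float64))
    return float(np.mean((r + g + b) / 3.0))          # M = (r + g + b) / 3

def direction_feature(block_gray: np.ndarray, theta_deg: float) -> float:
    # ||M*G0(theta)|| + ||M*G_pi/2(theta)||: a quadrature Gabor pair
    # (phase 0 and phase pi/2) at orientation theta in {0, 45, 90, 135}.
    theta = np.deg2rad(theta_deg)
    g0 = cv2.getGaborKernel((7, 7), 2.0, theta, 4.0, 0.5, psi=0.0)
    g90 = cv2.getGaborKernel((7, 7), 2.0, theta, 4.0, 0.5, psi=np.pi / 2)
    m = block_gray.astype(np.float64)
    return float(np.abs(cv2.filter2D(m, -1, g0)).sum()
                 + np.abs(cv2.filter2D(m, -1, g90)).sum())
```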
The introduced depth feature value is calculated according to the following formula:

depth feature value = λ1 × exp(-λ2 / max(deep(x, y)));
Here λ1 and λ2 are constants set by the user according to the range of depth values in the image and the interval range used when the feature values are fused. For example, if the depth value d in the image to be processed ranges from 0 to 255, then exp(-1/d) takes values between 0 and 0.996; since all feature values must be quantized to the interval 0 to 255 when they are fused, setting λ1 = 255 and λ2 = 1 adjusts the range of the depth feature value to 0 to 255, meeting the requirement. If the quantization interval were instead 0 to 1, the values of λ1 and λ2 could be adjusted according to the same principle. Similarly, if the depth values d of the image to be processed are concentrated in some other interval, λ1 and λ2 are adjusted so that the depth feature is mapped into the expected quantization interval, such as 0 to 255. In short, in practice λ1 and λ2 are set jointly by the user according to the range of depth values in the image and the quantization interval range used when the feature values are fused.
Here max(deep(x, y)) denotes the maximum depth value of the pixels in the image block being calculated. For example, when the depth feature value of image block p is computed, the maximum depth value of the pixels in block p is substituted for max(deep(x, y)); when image block q is computed, the maximum depth value of the pixels in block q is used instead.
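For illustration only, a minimal Python sketch of the depth feature value follows; the closed form λ1 × exp(-λ2 / max depth) is the reconstruction used above, and the default arguments are the example settings λ1 = 255, λ2 = 1:

```python
import numpy as np

def depth_feature(depth_block: np.ndarray, lam1: float = 255.0,
                  lam2: float = 1.0) -> float:
    d_max = float(depth_block.max())        # max(deep(x, y)) over the block
    if d_max <= 0.0:                        # guard: an all-zero depth block
        return 0.0
    return lam1 * np.exp(-lam2 / d_max)
```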
The introduced sparse feature value is calculated according to the formula: f = W × I;
where W = A⁻¹ and A denotes the sparse coding units, taking the first M×N of the sparse coding units obtained by the independent component analysis (ICA) algorithm. With M = N = 8 as in the block division above, the first 64 units are taken. I denotes the matrix of pixel values of the M×N pixels in the image block being calculated; for example, when image block p is computed, I is the matrix formed by the pixel values of the M×N pixels of block p, and when image block q is computed, the pixel values of the corresponding pixels of block q are used.
Using the sparse feature in this way amounts to seeking an ideal invertible weighting matrix W such that the image I can be expressed through W by sparse features. Building on a linear transform of the image, the ICA algorithm decomposes it into independent components, i.e. sparse coding units, so that the image can be represented as a linear combination of a set of sparse coding units, I = Σ f × A, where the sparse coding units A are computed by training the ICA algorithm on a large number of image blocks. From the ICA algorithm, W = A⁻¹ is determined, and thus the invertible weighting matrix W is obtained.
There are various specific ways to determine the sparse coding units A by the ICA algorithm. Preferably, a fixed-point algorithm is adopted: training on a large number of image blocks with the fixed-point algorithm yields 192 components, of which the first M×N are taken as the sparse coding units.
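For illustration only, a minimal Python sketch of the training step follows, with scikit-learn's FastICA standing in for the fixed-point ICA and flattened 8×8 grayscale patches assumed as training data:

```python
import numpy as np
from sklearn.decomposition import FastICA

def train_unmixing_matrix(patches: np.ndarray, n_units: int = 64) -> np.ndarray:
    # patches: (num_patches, 64) matrix of flattened 8x8 training blocks.
    ica = FastICA(n_components=n_units, max_iter=500, random_state=0)
    ica.fit(patches)
    # components_ is the unmixing matrix, playing the role of W = A^-1;
    # ica.mixing_ corresponds to the sparse coding units A.
    return ica.components_

def sparse_feature(block: np.ndarray, W: np.ndarray) -> np.ndarray:
    return W @ block.reshape(-1)            # f = W x I
```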
In summary, after the feature values of each image block are calculated, the process proceeds to step P3).
P3) Quantize the feature values of each image block to the same interval range, and fuse the feature values to obtain the difference values between each image block and the remaining image blocks.
In this step, specifically, the difference value Dpq between the current image block p and an image block q can be computed by the fusion formula

Dpq = Σi |Fi(p) - Fi(q)|,

where Fi(p) denotes the quantized feature value of the current image block p for feature i, and Fi(q) denotes the quantized feature value of image block q for feature i. Specifically, the brightness, color, direction, depth and sparse feature values of image block p and image block q are quantized to the same interval range, and the absolute values of the differences between the feature values of block p and block q under each feature are summed to give the difference value between block p and block q. Centering on the current image block p, the remaining (K-1) image blocks in the image are traversed, and the difference values between the current block p and the remaining (K-1) blocks are computed.
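For illustration only, a minimal Python sketch of the fused difference computation follows, assuming the feature values have already been quantized to a common interval:

```python
import numpy as np

def all_differences(feats: np.ndarray) -> np.ndarray:
    # feats: (K, n_features) array of quantized feature values per block;
    # returns the K x K matrix whose (p, q) entry is D_pq.
    return np.abs(feats[:, None, :] - feats[None, :, :]).sum(axis=-1)
```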
P4) Determine the weighting coefficients, and compute the saliency value of each image block as the weighted sum of the difference values between that block and the remaining blocks.
In this step, the Euclidean distance Ds(pq) between image blocks can generally be used as the weighting coefficient, giving the saliency value of block p as Sp = Σ Ds(pq) × Dpq. Preferably, the human visual sharpness coefficient is used as the weighting coefficient instead, so that the computed saliency better reflects the truly salient regions of the image. Specifically,
define the human visual sharpness coefficient C(f, e) [its defining formula is given as an image in the original], where T(f, e) denotes the contrast threshold. Based on experimental results, the contrast threshold can be expressed as a function of the spatial frequency and the retinal eccentricity:

T(f, e) = T0 × exp(α × f × (e + e2) / e2);
In this formula, T0 is the minimum value of the contrast threshold, T0 = 1/64; α is the spatial-frequency decay constant, α = 0.106; f is the spatial frequency, f = 4; e2 is the half-resolution eccentricity, e2 = 2.3; and e is the retinal eccentricity, determined by the center points of the two image blocks.
With the human visual sharpness coefficient as the weighting coefficient, the saliency value is computed as

Sp = Σq C(f, epq) × Dpq,

where epq denotes the retinal eccentricity of the center point of image block q relative to the center point of image block p; substituting it into the function T(f, e) gives the contrast threshold, from which the weighting coefficient C(f, e) for the difference value between blocks p and q is obtained. Dpq denotes the difference value between image block p and image block q. Centering on the current image block p, the remaining (K-1) image blocks in the image are traversed; the corresponding human visual sharpness coefficients are computed from the retinal eccentricities between the current block p and the remaining (K-1) blocks, the difference values between block p and the remaining (K-1) blocks are weighted by these coefficients, and the weighted sum gives the saliency value of the current block p. The saliency value of every image block is computed in the same way.
With the above visual sharpness coefficient, and following the law of retinal eccentricity, an image block closer to the current block p has a lower retinal eccentricity e and accordingly a lower contrast threshold T(f, e); the human visual sharpness coefficient is set so that closer image blocks have higher sharpness coefficients and farther image blocks lower ones. Weighting the difference values between image blocks by the visual sharpness coefficient accords with the principle that human vision attends more to salient regions; compared with weighting by Euclidean distance, it matches the biology better, so the computed saliency value of the current block p is closer to what the human eye observes and the calculation is more accurate.
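For illustration only, a minimal Python sketch of the preferred weighting follows; T(f, e) and its constants are taken from the text, while the form C(f, e) = T(f, 0)/T(f, e) is an assumption, since the original defines the sharpness coefficient only through a formula image:

```python
import numpy as np

T0, ALPHA, F, E2 = 1.0 / 64.0, 0.106, 4.0, 2.3

def contrast_threshold(e):
    # T(f, e) = T0 * exp(alpha * f * (e + e2) / e2)
    return T0 * np.exp(ALPHA * F * (np.asarray(e, dtype=float) + E2) / E2)

def sharpness_coeff(e):
    return contrast_threshold(0.0) / contrast_threshold(e)  # assumed form of C

def block_saliency(e_pq: np.ndarray, d_pq: np.ndarray) -> float:
    # S_p = sum over the remaining K-1 blocks of C(f, e_pq) * D_pq.
    return float(np.sum(sharpness_coeff(e_pq) * d_pq))
```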
In summary, through steps P1) to P4) the saliency value of each image block is computed, and integrating the saliency values of all image blocks yields the saliency map of the original image. When computing the saliency values in this embodiment, a depth feature and a sparse feature are introduced. The depth feature makes the detection results agree better with the human visual system's tendency to attend to regions closer to the eye; the sparse feature relies on sparse coding units computed by the ICA algorithm, which closely resemble the receptive fields of the human primary visual cortex and can simulate their characteristics, likewise making the results agree better with the human visual system. With these two visual cues, depth information and sparse coding, the detection results agree better with human visual perception and the saliency map is more accurate. Especially for images containing distant views, the saliency detection method of the present invention is more accurate than conventional methods.
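For illustration only, a minimal Python sketch of assembling the saliency map from the per-block saliency values follows, assuming the regular-grid division sketched earlier:

```python
import numpy as np

def saliency_map(block_values: np.ndarray, grid_h: int, grid_w: int,
                 m: int = 8, n: int = 8) -> np.ndarray:
    s = block_values.reshape(grid_h, grid_w).astype(float)
    s = (s - s.min()) / (s.max() - s.min() + 1e-12)  # normalize for display
    return np.kron(s, np.ones((m, n)))               # expand blocks to pixels
```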
FIG. 2 and FIG. 3 show test results of processing a close-range image and a distant-view image, respectively, with the method of this embodiment. FIG. 2a is an original image containing a close-range view and FIG. 2b is the saliency map obtained after processing; FIG. 3a is an original image containing a distant view and FIG. 3b is the saliency map obtained after processing. The results show that accurate salient-region detection is achieved: sparse targets far from the observer are well separated into the background, and even salient regions away from the image center are detected accurately, in keeping with the human visual system. The salient-region detection method of this embodiment can be applied well to image segmentation, retrieval, target recognition and the like.
The above is a further detailed description of the present invention in conjunction with specific preferred embodiments, and the specific implementation of the invention is not to be regarded as limited to these descriptions. For a person of ordinary skill in the art to which the invention belongs, several substitutions or obvious variations made without departing from the concept of the invention, with the same performance or use, shall all be regarded as falling within the protection scope of the present invention.

Claims (7)

  1. An image saliency detection method, characterized by comprising the following steps:
    1) dividing the image into K image blocks of size M×N, where the values of K, M and N are set by the user;
    2) calculating the feature values of each image block, the feature values including a brightness feature value, a color feature value, a direction feature value, a depth feature value and a sparse feature value, wherein the depth feature value is λ1 × exp(-λ2 / max(deep(x, y))), where λ1 and λ2 are constants set by the user according to the range of depth values in the image and the quantization interval range used when the feature values are fused, and max(deep(x, y)) denotes the maximum depth value of the pixels in the image block being calculated; and the sparse feature value is f = W × I, where W = A⁻¹, A denotes the first M×N of the sparse coding units obtained by the independent component analysis (ICA) algorithm, and I denotes the matrix of pixel values of the M×N pixels in the image block being calculated;
    3) quantizing the feature values of each image block to the same interval range, and fusing the feature values to obtain difference values between each image block and the remaining image blocks;
    4) determining a weighting coefficient, and computing the saliency value of each image block by a weighted sum of the difference values between that image block and the remaining image blocks.
  2. The image saliency detection method according to claim 1, characterized in that: in step 1), the image is divided into blocks by a region-growing method, taking the salient center block of the image as the center of the division, the salient center block of the image being the block with the largest saliency value in the initial saliency map of the image.
  3. The image saliency detection method according to claim 1, characterized in that: in step 4), the weighted sum uses the human visual sharpness coefficient C(f, e) as the weighting coefficient [its defining formula is given as an image in the original], where T(f, e) denotes the contrast threshold, T(f, e) = T0 × exp(α × f × (e + e2) / e2), T0 is the minimum value of the contrast threshold, T0 = 1/64; α is the spatial-frequency decay constant, α = 0.106; f is the spatial frequency, f = 4; e is the retinal eccentricity; e2 is the half-resolution eccentricity, e2 = 2.3; and the saliency value of the current image block p is Sp = Σq C(f, epq) × Dpq, where epq denotes the retinal eccentricity of the center point of image block q relative to the center point of image block p, and Dpq denotes the difference value between image block p and image block q.
  4. The image saliency detection method according to claim 1, characterized in that: in step 2), the depth values in the image range from 0 to 255, the quantization interval range used when the feature values are fused is 0 to 255, and λ1 = 255 and λ2 = 1 are set.
  5. The image saliency detection method according to claim 1, characterized in that: in step 3), the difference value Dpq between the current image block p and an image block q is computed by the fusion formula Dpq = Σi |Fi(p) - Fi(q)|, where Fi(p) denotes the quantized feature value of the current image block p for feature i, and Fi(q) denotes the quantized feature value of image block q for feature i.
  6. The image saliency detection method according to claim 1, characterized in that: in step 1), M = 8 and N = 8 are set; and in step 2), A consists of the first 64 of the sparse coding units obtained by the independent component analysis (ICA) algorithm.
  7. The image saliency detection method according to claim 1, characterized in that: the independent component analysis (ICA) algorithm in step 2) is a fixed-point algorithm.
PCT/CN2015/075514 2014-05-26 2015-03-31 Image saliency detection method WO2015180527A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410226552.4A CN103996195B (en) 2014-05-26 2014-05-26 Image saliency detection method
CN201410226552.4 2014-05-26

Publications (1)

Publication Number Publication Date
WO2015180527A1 2015-12-03

Family

ID=51310350

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/075514 WO2015180527A1 (en) 2014-05-26 2015-03-31 Image saliency detection method

Country Status (2)

Country Link
CN (1) CN103996195B (en)
WO (1) WO2015180527A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112258453A (en) * 2020-09-27 2021-01-22 南京一起康讯智能科技有限公司 Positioning landmark detection method for industrial fault inspection robot
CN113128519A (en) * 2021-04-27 2021-07-16 西北大学 Multi-mode multi-spliced RGB-D significance target detection method
CN115424037A (en) * 2022-10-12 2022-12-02 武汉大学 Salient target region extraction method based on multi-scale sparse representation

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103996195B (en) * 2014-05-26 2017-01-18 清华大学深圳研究生院 Image saliency detection method
CN104392231B (en) * 2014-11-07 2019-03-22 南京航空航天大学 Fast synergistic conspicuousness detection method based on piecemeal Yu sparse main eigen
CN104966286B (en) * 2015-06-04 2018-01-09 电子科技大学 A kind of 3D saliencies detection method
CN105404888B (en) * 2015-11-16 2019-02-05 浙江大学 The conspicuousness object detection method of color combining and depth information
CN105869173B (en) * 2016-04-19 2018-08-31 天津大学 A kind of stereoscopic vision conspicuousness detection method
CN106023184A (en) * 2016-05-16 2016-10-12 南京大学 Depth significance detection method based on anisotropy center-surround difference
CN106228544B (en) * 2016-07-14 2018-11-06 郑州航空工业管理学院 A kind of conspicuousness detection method propagated based on rarefaction representation and label
CN106683074B (en) * 2016-11-03 2019-11-05 中国科学院信息工程研究所 A kind of distorted image detection method based on haze characteristic
CN106815323B (en) * 2016-12-27 2020-02-07 西安电子科技大学 Cross-domain visual retrieval method based on significance detection
CN107067030A (en) * 2017-03-29 2017-08-18 北京小米移动软件有限公司 The method and apparatus of similar pictures detection
CN107369131B (en) * 2017-07-04 2019-11-26 华中科技大学 Conspicuousness detection method, device, storage medium and the processor of image
CN107784662B (en) * 2017-11-14 2021-06-11 郑州布恩科技有限公司 Image target significance measurement method
CN110322496A (en) * 2019-06-05 2019-10-11 国网上海市电力公司 Saliency measure based on depth information
CN112115292A (en) * 2020-09-25 2020-12-22 海尔优家智能科技(北京)有限公司 Picture searching method and device, storage medium and electronic device
CN112633294A (en) * 2020-12-10 2021-04-09 中国地质大学(武汉) Significance region detection method and device based on perceptual hash and storage device
CN115065814B (en) * 2021-11-15 2023-05-12 北京荣耀终端有限公司 Screen color accuracy detection method and device
CN115049717B (en) * 2022-08-15 2023-01-06 荣耀终端有限公司 Depth estimation method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7260259B2 (en) * 2002-01-08 2007-08-21 Siemens Medical Solutions Usa, Inc. Image segmentation using statistical clustering with saddle point detection
CN103065302A (en) * 2012-12-25 2013-04-24 中国科学院自动化研究所 Image significance detection method based on stray data mining
CN103679173A (en) * 2013-12-04 2014-03-26 清华大学深圳研究生院 Method for detecting image salient region
CN103714537A (en) * 2013-12-19 2014-04-09 武汉理工大学 Image saliency detection method
CN103810503A (en) * 2013-12-26 2014-05-21 西北工业大学 Depth study based method for detecting salient regions in natural image
CN103996195A (en) * 2014-05-26 2014-08-20 清华大学深圳研究生院 Image saliency detection method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1739594B1 (en) * 2005-06-27 2009-10-28 Honda Research Institute Europe GmbH Peripersonal space and object recognition for humanoid robots
CN101980248B (en) * 2010-11-09 2012-12-05 西安电子科技大学 Improved visual attention model-based method of natural scene object detection
CN102103750B (en) * 2011-01-07 2012-09-19 杭州电子科技大学 Vision significance detection method based on Weber's law and center-periphery hypothesis
CN103020647A (en) * 2013-01-08 2013-04-03 西安电子科技大学 Image classification method based on hierarchical SIFT (scale-invariant feature transform) features and sparse coding

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7260259B2 (en) * 2002-01-08 2007-08-21 Siemens Medical Solutions Usa, Inc. Image segmentation using statistical clustering with saddle point detection
CN103065302A (en) * 2012-12-25 2013-04-24 中国科学院自动化研究所 Image significance detection method based on stray data mining
CN103679173A (en) * 2013-12-04 2014-03-26 清华大学深圳研究生院 Method for detecting image salient region
CN103714537A (en) * 2013-12-19 2014-04-09 武汉理工大学 Image saliency detection method
CN103810503A (en) * 2013-12-26 2014-05-21 西北工业大学 Depth study based method for detecting salient regions in natural image
CN103996195A (en) * 2014-05-26 2014-08-20 清华大学深圳研究生院 Image saliency detection method

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112258453A (en) * 2020-09-27 2021-01-22 南京一起康讯智能科技有限公司 Positioning landmark detection method for industrial fault inspection robot
CN112258453B (en) * 2020-09-27 2024-04-26 南京一起康讯智能科技有限公司 Industrial fault inspection robot positioning landmark detection method
CN113128519A (en) * 2021-04-27 2021-07-16 西北大学 Multi-mode multi-spliced RGB-D significance target detection method
CN113128519B (en) * 2021-04-27 2023-08-08 西北大学 Multi-mode multi-spliced RGB-D (red, green and blue) -D (digital video) saliency target detection method
CN115424037A (en) * 2022-10-12 2022-12-02 武汉大学 Salient target region extraction method based on multi-scale sparse representation

Also Published As

Publication number Publication date
CN103996195B (en) 2017-01-18
CN103996195A (en) 2014-08-20

Similar Documents

Publication Publication Date Title
WO2015180527A1 (en) Image saliency detection method
CN107578418B (en) Indoor scene contour detection method fusing color and depth information
CN110544258B (en) Image segmentation method and device, electronic equipment and storage medium
CN104850850B (en) A kind of binocular stereo vision image characteristic extracting method of combination shape and color
CN106023257B (en) A kind of method for tracking target based on rotor wing unmanned aerial vehicle platform
CN110991389B (en) Matching method for judging appearance of target pedestrian in non-overlapping camera view angles
CN106204509B (en) Infrared and visible light image fusion method based on regional characteristics
CN109918971B (en) Method and device for detecting number of people in monitoring video
CN108960404B (en) Image-based crowd counting method and device
CN109255358B (en) 3D image quality evaluation method based on visual saliency and depth map
CN108470178B (en) Depth map significance detection method combined with depth credibility evaluation factor
US20130342694A1 (en) Method and system for use of intrinsic images in an automotive driver-vehicle-assistance device
CN106203461B (en) Image processing method and device
CN111127360B (en) Gray image transfer learning method based on automatic encoder
CN104966054B (en) Detection method of small target in unmanned plane visible images
CN110197185B (en) Method and system for monitoring space under bridge based on scale invariant feature transform algorithm
CN106203448B (en) A kind of scene classification method based on Nonlinear Scale Space Theory
Halmaoui et al. Contrast restoration of road images taken in foggy weather
CN107273803B (en) Cloud layer image detection method
Alvarado-Robles et al. An approach for shadow detection in aerial images based on multi-channel statistics
CN112884795A (en) Power transmission line inspection foreground and background segmentation method based on multi-feature significance fusion
Ju et al. A novel fully convolutional network based on marker-controlled watershed segmentation algorithm for industrial soot robot target segmentation
CN106778822B (en) Image straight line detection method based on funnel transformation
Zhang et al. A combined approach to single-camera-based lane detection in driverless navigation
Song et al. Dehazed image quality assessment by haze-line theory

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15799584

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15799584

Country of ref document: EP

Kind code of ref document: A1