WO2020107717A1

WO2020107717A1 - Visual saliency region detection method and apparatus

Info

Publication number: WO2020107717A1
Application number: PCT/CN2019/075206
Authority: WO
Inventors: 陈沅涛; 王进; 王磊; 张建明; 陈曦; 王志; 邝利丹; 谷科
Original assignee: 长沙理工大学
Priority date: 2018-11-30
Filing date: 2019-02-15
Publication date: 2020-06-04
Also published as: CN109543701A

Abstract

Disclosed are a visual saliency region detection method and apparatus, wherein the method comprises: first using a random search method to select, from an image to be processed, multiple candidate image region blocks for each pixel point and then select, from the candidate image region blocks, a target image region block having the highest similarity with the pixel point; according to similarities between target image blocks and pixel points, calculating visual saliency values of the pixel points; based on the visual saliency values of the pixel points of the image to be processed, generating visual saliency maps of various layers of the image to be processed; and finally, according to the visual saliency values of visual saliency maps of two adjacent layers, calculating weighted values of the visual saliency maps of various layers, and fusing the visual saliency maps of various layers according to the weighted values. The present application improves the efficiency of calculation of a visual saliency value of a pixel point so as to effectively improve detection efficiency for a visual saliency region of an image to be processed, and further improves detection accuracy and precision for the visual saliency region in the image to be processed.

Description

Visual salient region detection method and device

Technical field

The embodiments of the present invention relate to the field of machine vision technology, and in particular, to a method and device for detecting a visually significant area.

Background technique

With the rapid development of machine vision technology, visual saliency area detection based on image processing has been an urgent development as a key link in the implementation of this technology.

The visual attention mechanism is an important sensory feature possessed by humans and high IQ animals. This mechanism can quickly find target objects with visual significance and interest from massive visual information, and can quickly "ignore" other uninteresting target objects. Reduce the amount of calculation in the information processing process. Therefore, introducing the visual attention mechanism into the image segmentation process can significantly improve the efficiency of the image processing process. In the process of image segmentation and image parsing, we usually maintain interest in certain areas on the image to be processed, and these area parts are called areas of interest or target areas. In general, the target is the region of interest with special properties in the image. In order to extract the target, it is necessary to perform a segmentation operation from the image, and after the segmentation, further analysis and related processing work can be performed. The image segmentation process is to divide the image into several independent areas with specific uniformity, so as to achieve the image processing method of extracting the area of interest from the complex background of the image.

How to detect the region of interest from the image to be processed involves the target detection method of the image and video. This method changes the image sequence or video resolution, so that the image sequence and video can be perfectly displayed, and try to save the image and The key content in the video. In the process of image and video target detection based on image content, how to quickly detect the visual saliency area is a problem that needs to be solved.

The visual saliency area detection algorithm in the prior art focuses on finding fixed image pixels or target objects that human vision first notices, and can focus on understanding human vision pixel points and related applications, such as autofocus applications. According to the context content of the region block to express the position of the pixel, it is proposed that the visual saliency of the image pixel is expressed by the relevant area in the center of the pixel. For example, if the difference between the relevant image area block of the center pixel point and the other image area blocks in the original image is large, the pixel point can be regarded as a pixel point with high visual significance.

The area detection methods of visual saliency in the related art are all traditional methods that calculate the system visual saliency value of individual pixels every time. The number of pixels will increase the computational complexity, and even a high-dimensional vector search tree structure can be executed. This method has high time complexity and space complexity, and is suitable for small objects in the original image.

Summary of the invention

The embodiments of the present disclosure provide a method and a device for detecting a visual saliency area, which can quickly and accurately detect the visual saliency area in an image to be processed.

To solve the above technical problems, the embodiments of the present invention provide the following technical solutions:

An aspect of an embodiment of the present invention provides a visual saliency area detection method, including:

Use a random search method to select multiple candidate image area blocks for the current pixel from the image to be processed, and select target image area blocks that satisfy the similarity condition from each candidate image area block;

Calculate the visual saliency value of the current pixel according to the similarity between each target image block and the current pixel;

Generating a visual saliency map at each level of the image to be processed based on the visual saliency value of each pixel of the image to be processed;

Based on the visual saliency values of the two adjacent layers of visual saliency maps, the weight values of the visual saliency maps of each layer are calculated to be used to fuse the visual saliency maps of each layer according to the weight values of the visual saliency maps of each layer.

Optionally, after calculating the weight value of the visual saliency map of each layer according to the visual saliency value of the visual saliency maps of two adjacent layers, the method further includes:

Select multiple preset position points in the visual saliency map obtained after fusion;

For each preset position point, according to the visual saliency value of each pixel, it is divided into an enhanced pixel point set and a weakened pixel point set, and the visual saliency value of the pixel point in the enhanced pixel point set is greater than the weakened The visual significance value of the pixels in the pixel set;

Use the image enhancement method to perform enhancement processing on each pixel in the enhanced pixel set;

The image weakening method is used to weaken each pixel in the weakened pixel set.

Optionally, based on the visual saliency value of each pixel of the image to be processed, generating a visual saliency map at each level of the image to be processed includes:

Generating rough visual saliency maps at various levels of the image to be processed based on the visual saliency value of each pixel of the image to be processed;

Perform detailed operations on each rough visual saliency map to remove image noise signals, and obtain respective corresponding detailed visual saliency maps.

Optionally, the selecting a target image area block satisfying the similarity condition from each candidate image area block includes:

Calculating the similarity metric values of the current pixel point and each candidate image area block separately;

Delete the candidate image area blocks corresponding to the similarity metric value lower than the similarity threshold, and use the remaining candidate image area blocks as the target image area blocks.

Optionally, calculating the visual saliency value of the current pixel according to the similarity between each target image block and the current pixel includes:

Use the following formula to calculate the visual significance value S of the current pixel:

among them,

Where dist(r _i , r _k ) is the dissimilarity measure between the current pixel position and the Kth target image block, K is the total number of target image blocks, dist _color (r _i , R _k ) is the Euclidean distance in the HSV color space between the target image area block subjected to vectorization and the location of the current pixel, and dist _pos (r _i , r _k ) is the location of the current pixel The Euclidean distance between the Kth target image block.

Calculating the local system saliency value of each pixel of the image to be processed based on the frequency domain to generate a local system saliency map;

Use the following formula to calculate the global system saliency value V _Global (x, y) of each pixel of the image to be processed to generate a global system saliency map:

In the formula, x and y are the horizontal and vertical coordinate values of the pixel, f(x, y) is the significance function for solving the pixel (x, y), and f _average (x, y) is f(x, y). Arithmetic average, the size of the image to be processed is M*N;

Use the following formula to calculate the scarcity saliency value V _Scarcity (x,y) of each pixel of the image to be processed to generate a scarcity saliency map:

In the formula, x and y are the horizontal and vertical coordinate values of the pixel, f _average (x, y) is the arithmetic average value of f (x, y), and h (f _average (x, y)) is the image to be processed The generated feature histogram;

The local system saliency map, the global system saliency map, and the scarcity saliency map are fused to obtain visual saliency maps at various levels of the image to be processed.

Optionally, the fusing the local system saliency map, the global system saliency map, and the scarcity saliency map to obtain visual saliency maps at various levels of the image to be processed includes:

For each layer of the image to be processed, the visual saliency value V _final of each pixel of the image to be processed is calculated using the following formula to generate a visual saliency map:

In the formula, V _Local , V _Global , and V _Scarcity are the local system saliency value, global system saliency value, and scarcity saliency value of the pixels whose horizontal and vertical coordinates are x and y, and v ₁ is the local system saliency value. The weight value of, v ₂ is the weight value of the global system saliency value, v ₃ is the weight value of the scarcity saliency value; i=1, V _i = V ₁ is the local system saliency value, i=2, V _i = V ₂ is the significance value of all the systems, i = 3, V _i = V ₃ is the scarcity significance value, and V ₁ , V ₂ , and V ₃ are based on

Calculated.

Optionally, the calculating the local system saliency value of each pixel of the image to be processed based on the frequency domain includes:

Use the following formula to calculate the local system saliency value V _Local (x,y) of each pixel of the image to be processed:

In the formula, x and y are the horizontal and vertical coordinates of the pixel, FFT(u,v) is the characteristic value of the pixel, |FFT(u,v)e ^jψ(u,v) | is after the fast Fourier transform The obtained image amplitude spectrum, ψ(u,v) is the phase spectrum of the image to be processed.

Optionally, the weight values of the visual saliency maps of each layer are calculated according to the visual saliency values of the two adjacent layers of visual saliency maps, and used to fuse the visual saliency maps of each layer according to the weight values of the visual saliency maps of each layer include:

Use the following formula to calculate the weight value of each layer of visual saliency map:

In the formula, p is the pixel position,

Is the weight value of layer i,

Is the weight value of layer i-1,

Is the visual significance value of layer i,

Is the visual significance value of layer i-1;

Use the following formula to fuse the visual saliency maps of two adjacent layers to obtain a fusion visual saliency map;

In the formula,

Is the visual saliency value of the pixel at the p position of the fusion visual saliency map.

Another aspect of an embodiment of the present invention provides a visual saliency area detection device, including:

The random search module is used to select multiple candidate image area blocks for the current pixel from the to-be-processed image using a random search method, and select target image area blocks satisfying the similarity condition from each candidate image area block;

The visual saliency value calculation module is used to calculate the visual saliency value of the current pixel according to the similarity between each target image block and the current pixel;

A multi-layer visual saliency map generation module, used to generate visual saliency maps of each level of the image to be processed based on the visual saliency value of each pixel of the image to be processed;

The visual saliency map fusion module is used to calculate the weight value of the visual saliency map of each layer according to the visual saliency value of the adjacent two layers of visual saliency maps, which is used to visually map each layer according to the weight value of the visual saliency map of each layer Perform fusion.

An embodiment of the present invention further provides a visual saliency area detection device, including a processor, which is used to implement the steps of the visual saliency area detection method described in any one of the preceding items when the processor is used to execute a computer program stored in a memory.

An embodiment of the present invention finally provides a computer-readable storage medium that stores a visual saliency area detection program stored on the computer-readable storage medium. The visual salience area detection program is implemented by the processor as any of the foregoing The steps of the method for visually significant area detection described in the item.

The technical solution provided by the present application has the advantage that when calculating the visual saliency value of each pixel of the image to be processed, a random search method is used to select multiple candidate image area blocks from the image to be processed, and then multiple candidate areas Select the image area block with the highest similarity to the current pixel in the image block, no need to search from each pixel of the image to be processed, shorten the search time of the image area block with the highest similarity, and do not limit the scale of the image pixel , Which solves the shortcomings in the related art that it is necessary to find the image region block with the highest similarity from all pixels in the image to be processed, greatly improving the calculation efficiency of the pixel's visual saliency value, thereby effectively improving the image to be processed The detection efficiency of the visual saliency area in the image is suitable for the calculation of the visual saliency value of the pixels of the image to be processed at any scale; in addition, by combining multiple levels of visual saliency maps, it is not only beneficial to eliminate the noise signal in the visual saliency map It can also accurately merge the salient features in the image to be processed into the visual salient feature map, which is helpful to improve the detection accuracy and precision of the visual salient regions in the image to be processed.

It should be understood that the above general description and the following detailed description are only exemplary and do not limit the present disclosure.

BRIEF DESCRIPTION

In order to more clearly explain the technical solutions of the embodiments of the present invention or related technologies, the following will briefly introduce the drawings used in the description of the embodiments or related technologies. Obviously, the drawings in the following description are only the invention For some of the embodiments, for those of ordinary skill in the art, without paying any creative labor, other drawings can also be obtained based on these drawings.

FIG. 1 is a schematic flowchart of a method for detecting a visually significant region according to an embodiment of the present invention;

2 is a schematic flowchart of another method for detecting a visually significant region according to an embodiment of the present invention;

FIG. 3 is a schematic diagram comparing ROC curves of the technical solution of the present application and related technologies according to an exemplary embodiment of the present disclosure;

4 is a schematic diagram of comparison between ROC curves of the technical solution of the present application and related technologies according to another exemplary embodiment of the present disclosure;

5 is a schematic diagram of target contours in a schematic image extracted by using the technical solution of the present application provided by the present disclosure;

6 is a structural diagram of a specific implementation manner of a visual saliency area detection device provided by an embodiment of the present invention;

7 is a structural diagram of another specific implementation manner of a visual saliency area detection device provided by an embodiment of the present invention.

detailed description

In order to enable those skilled in the art to better understand the solution of the present invention, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. Obviously, the described embodiments are only a part of the embodiments of the present invention, but not all the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without making creative efforts fall within the protection scope of the present invention.

The terms “first”, “second”, “third” and “fourth” in the description and claims of this application and the above drawings are used to distinguish different objects, not to describe a specific order . In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusions. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but may include steps or units that are not listed.

After introducing the technical solutions of the embodiments of the present invention, various non-limiting embodiments of the present application are described in detail below.

Referring first to FIG. 1, FIG. 1 is a schematic flowchart of a method for detecting a visually significant region according to an embodiment of the present invention. The embodiment of the present invention may include the following:

S101: Use a random search method to select multiple candidate image area blocks for the current pixel from the image to be processed, and select target image area blocks that satisfy the similarity condition from each candidate image area block.

For each pixel in the image to be processed, a random search method is used to select multiple candidate image area blocks (for example, 2K candidate image area blocks). The current pixel here refers to the random search method at the current time. Select pixels of candidate image blocks in the processed image.

The similarity condition is to select those image area blocks that have the highest similarity to pixels from the candidate image area blocks. The similarity condition can be based on the actual application scenario (such as the number of candidate image area blocks, the candidate image area block and the pixel's Similarity value) to determine a similarity threshold, and delete the image area blocks below the threshold.

Taking the candidate image area blocks as 2K, for example, K=32, and the target image area blocks as K as an example, the implementation process of S101 is described.

The two-dimensional coordinate mapping function of the image to be processed is f(x): R→V, that is, the two-dimensional coordinate mapping function f(x) is defined on the coordinates of all pixels in the image to be processed, and V represents the result obtained after performing the normalization process Visual significance value. Set the r pixel point on the image R to be processed, the visual saliency value corresponding to the image R to be processed on [0,1] is v, and the mapping function is f(r)=v. The function f value is the visual saliency value normalized to [0,1], and the visual saliency value can be stored in a two-dimensional array of the same size as R.

Let v=f(r) be the calculation of the visual saliency value through a plurality of related image area blocks close to the position of the pixel point r. The specific method is shown in Equation 1.

r _i = r+δβ ⁱ R _i ; (1)

In formula 1, R _i is a uniformly distributed random variable, and its range is limited to [-1,1]×[-1,1]; δ is 1/2 of the size of the image to be processed; β is the window The attenuation parameter is used to increase i=1 to i=2K image area blocks until the search radius ωR _{i is} reduced to a single pixel. If i<2K, then make i=1 until the number of test candidate area blocks satisfies 2K, in the specific implementation part β=0.75.

According to Equation 1 dissimilarity measure m _i is calculated between the block and the candidate region of pixels in the r 2K block in the candidate region. In this application, only the image area blocks with a small 1/2 dissimilarity value among the 2K candidate image blocks will be retained, and the remaining 1/2 image area blocks will be discarded. Select 2K candidate image area blocks according to any position approximate to the pixel point r. These approximate image area blocks are part of all possible candidate areas in the image R to be processed. Since this is an incomplete sample acquisition process, it is incomplete Sample sampling will introduce a certain degree of sample error to the subsequent visual saliency map. However, as the number of samples 2K gradually becomes larger, the sample error will definitely decrease significantly.

Compared with the related art, only a single visually significant image area map with a width or height of 256 pixels can be detected. Because each pixel needs to perform visual saliency calculation, the amount of calculation is very large, and in order to find the most similar image area blocks at the fastest speed, a search decision tree based on high-dimensional vectors needs to be constructed on the sparse grid. Therefore, if the visual significance value of each pixel is calculated one by one, the whole process will be very slow. This application does not search for the image area block with the highest similarity among all pixels of the entire image to calculate the visual significance value of the pixel. Random search method 2K image areas randomly selected from all the blocks of pixels in the image, and the image areas to ensure wherein K r _i highest similarity blocks and domain blocks in the image, the other image area K block is discarded. This application only needs to randomly extract 2K image area blocks, and do not build an auxiliary high-dimensional vector search decision tree structure to improve search efficiency.

S102: Calculate the visual saliency value of the current pixel according to the similarity between each target image block and the current pixel.

dist _color (r _i , r _j ) is the Euclidean distance between the image region blocks r _i and r _j in the HSV color space after vectorization, and the normalization operation corresponds to the range of [0,1]; when When dist _color (r _i , r _j ) is relatively large with respect to any image region block r _j , the pixel i is considered to be visually significant.

dist _pos (r _i , r _j ) is the Euclidean distance between the image area blocks r _i and r _j , and the process of this Euclidean distance is normalized to correspond to the range [0,1]. The measurement method that can measure the dissimilarity between two corresponding image area blocks is shown in the following formula:

In view of this, in a specific embodiment, the visual saliency value S of the current pixel can be calculated using Equation 2:

among them,

Where dist(r _i , r _k ) is the dissimilarity measure between the current pixel position and the Kth target image block, K is the total number of target image blocks, dist _color (r _i , r _k ) is the Euclidean distance in the HSV color space between the target image area block after vectorization and the current pixel position, and dist _pos (r _i , r _k ) is the position of the current pixel point and the Kth target image block European distance between.

S103: Generate a visual saliency map at each level of the image to be processed based on the visual saliency value of each pixel of the image to be processed.

The visual saliency measurement process represents the visual saliency map by calculating the visual saliency value of each image pixel and using the grayscale image with the same size and scale as the original input image. Among them, the visual saliency value of each pixel represents the visual saliency value of the relevant position in the original image. The larger the visual saliency value, the more the pixel of the image can highlight its visual saliency on the original input image, and the easier it is for the attention of human observers.

For each level of the image to be processed, the corresponding visual saliency map can be generated for each level according to the same method.

S104: Calculate the weight values of the visual saliency maps of each layer according to the visual saliency values of the adjacent two layers of visual saliency maps, and use them to fuse the visual saliency maps of each layer according to the weight values of the visual saliency maps of each layer.

Considering that the random search algorithm will generate a lot of noise, even with certain noise reduction methods, it is impossible to completely remove the red noise information in each visual saliency map. In order to further integrate the multi-scale visual saliency features into the final result of the visual saliency map Among them, the obtained multi-level visual saliency maps can be merged, and finally the combined multi-level visual saliency result map can be generated, for example, the following formula can be used for fusion:

Is the merged multi-scale visual saliency map on layer i,

Is the multi-scale visual saliency map of layer i,

It is the detailed visual saliency result graph of the i-1th layer closest to the ith layer. in case

versus

There is a difference in the size, so when carrying out multi-level merge, you need to put

The size of

The size is close.

In the technical solution provided by the embodiment of the present invention, when calculating the visual saliency value of each pixel of the image to be processed, a random search method is used to select multiple candidate image area blocks from the image to be processed, and then multiple candidates Select the image area block with the highest similarity to the current pixel in the area image block, there is no need to search from each pixel of the image to be processed, which shortens the search time of the image area block with the highest similarity, and does not limit the image pixel Scale, which solves the disadvantages of the related art that it is necessary to find the image region block with the highest similarity from all pixels in the image to be processed, greatly improving the calculation efficiency of the pixel visual significance value, thereby effectively improving the processing The detection efficiency of the visual saliency area in the image is suitable for the calculation of the visual saliency value of the pixels of the image to be processed at any scale; in addition, by combining multiple levels of visual saliency maps, it is not only beneficial to remove the noise in the visual saliency map The signal can also accurately integrate the salient features in the image to be processed into a visual salient feature map, which is helpful to improve the detection accuracy and precision of the visually significant regions in the image to be processed.

The degree of refinement required for the visual saliency result graph is not high, but if an application scenario that requires a higher detection speed is required, using S101-S104 of the above embodiment can quickly generate a rough visual saliency map. When it is required that the degree of refinement and the original input image completely match the application scene of the same visual saliency map, the visual saliency map obtained in the foregoing embodiment may also be enhanced and merged to generate a high-quality visual saliency result map. If a higher visual saliency value appears at a location, it indicates that the location has a higher image reliability, so you can selectively merge those visual saliency areas with high noise signals to obtain the final Graph of visual significance results.

In a specific embodiment, the implementation of the process of generating a visual saliency map according to the visual saliency value in S103 may be as follows:

The area saliency value of the image to be processed depends on the difference between its own characteristics and the surrounding environment. If a certain area in the image is a salient area, there are one or more distinguishing features between the salient area and the surrounding area. In different images, the influence of the same feature on visual saliency is also different. Both brightness and color can be used as visual saliency features, so different image visual features need to be extracted during image preprocessing. For example, visual features such as brightness, direction, and color can be extracted for measurement. After many experiments in this application, it has been found that the directional visual features do not play a significant role in the image, and at the same time increase the algorithm time complexity. In view of this, this application only extracts the color and brightness features of the image to be processed.

The HSV color space uses hue (Hue), saturation S (Saturation), and brightness V (Value) to describe specific colors. HSV is more in line with human visual characteristics than RGB. Equation 3 can be used to convert the image to be processed from RGB space to HSV space in order to extract color and brightness features.

Where h is the H channel value in the HSV color space, l is the arithmetic mean of the H channel value, s is the S channel value in the HSV color space, v is the V channel value in the HSV color space, and g is the R channel value of the RGB primary colors , B is the B channel value of the RGB primary colors, r is the R channel value of the RGB primary colors, max is the maximum value of each corresponding channel value, and min is the minimum value of each corresponding channel value.

In order to improve the accuracy of visual saliency area detection, when extracting the saliency features of the image to be processed, three attributes of local system saliency, global system saliency, and scarcity saliency can be used to form a system saliency map.

The visual saliency of each pixel on the image to be processed does not only depend on the characteristic value of the pixel, but also needs to be based on the difference between the pixel and its surrounding pixels. The greater the degree of difference, the more visually significant the pixel is. Related technologies calculate the visual saliency through the degree of difference between each pixel of the image and the surrounding area, but the size of the area is difficult to determine, and the amount of algorithm calculation is very large. In order to solve these problems in the related art, the present application can analyze and calculate the local system saliency of image pixels from the frequency domain. The amplitude spectrum and the phase spectrum have different functions in the image frequency domain features: the amplitude spectrum contains each pixel The specific size of the point feature value; the phase spectrum contains the image structure feature, which can reflect the change of the image pixel feature value.

The inventor of the present application discovered through research that the phase spectrum and the amplitude spectrum play different roles in the image construction process. Simply using the phase spectrum to construct the result image can obtain a result with a similar structure to the original image; but only using the amplitude spectrum to construct the result image, the resulting image is very different from the original image.

Therefore, Equation 3 can be used to perform local system saliency calculation for each feature image in advance. Firstly, the Fast Fourier Transform (FFT) is performed on the input image to extract the phase spectrum and the amplitude spectrum; then the phase spectrum is applied to the image to construct the local system saliency map of each feature image.

In the formula, x and y are the horizontal and vertical coordinates of the pixel, FFT(u,v) is the characteristic value of the pixel, |FFT(u,v)e ^jψ(u,v) | is after the fast Fourier transform The resulting image amplitude spectrum, ψ(u,v) is the phase spectrum of the image to be processed, and V _Local is the local system saliency value of each image pixel in the image.

If only the local system saliency value is considered, the global system saliency of the sharply changing edge or complex background area in the image will be higher, while the local system saliency of the smooth area and the target is lower. To solve the above problems, global system saliency can be introduced.

The global system saliency of pixels is used to measure the degree of visual saliency of the image pixels on the entire image. Formula 5 can be applied to generate the global system saliency of the image to be processed. V _Global (x, y) in Equation 5 is the global system saliency value of each pixel in the image to be processed.

In the formula, x and y are the horizontal and vertical coordinate values of the pixel, f(x, y) is the significance function for solving the pixel (x, y), and f _average (x, y) is f(x, y). The arithmetic average, the size of the image to be processed is M*N.

The scarcity saliency means that the probability of a certain feature value appearing on the entire image is very low, then the image pixels with this feature value are "unusual", the higher the visual significance value of the image pixel. Equation 6 can be applied to measure the scarcity saliency characteristics of each pixel in the input image. H(f _average (x, y)) in Equation 6 is the feature histogram generated by the image to be processed, and V _Scarcity (x, y) is the scarcity significance value of the pixel:

After calculating the local system saliency value, global system saliency value and scarcity saliency value of each pixel of the image to be processed, correspondingly, the local system saliency map, global system saliency map and scarcity saliency map can be generated. The final visual saliency map is the fusion of the three visual saliency maps. Optionally, formula 7 can be used to calculate the local system saliency, global system saliency, and The measurement results of scarcity saliency are fused to obtain the final system saliency map.

In the formula, V _final is the visual saliency value of each pixel of the image to be processed, V _Local (x, y), V _Global (x, y), V _Scarcity (x, y) (V _Local , V _Global , V _Scarcity ) is the local system saliency value, global system saliency value, and scarcity saliency value of pixels with x and y horizontal and vertical coordinates in sequence; v ₁ is the weight value of the local system saliency value, and v ₂ is the global system The weight value of the saliency value, v ₃ is the weight value of the scarce saliency value; i=1, V _i = V ₁ is the local system saliency value, i=2, V _i = V ₂ is the saliency value of all systems, i = 3, V _i = V ₃ is the scarcity significance value, V ₁ , V ₂ , V ₃ are based on

Calculated.

Different feature saliency maps have different contributions to image segmentation. Some visual saliency maps can effectively express visual saliency areas, while some visual saliency maps cannot achieve the desired effect. Therefore, a reasonable feature fusion strategy is needed to fuse multiple obtained feature saliency maps and finally generate a system visual saliency map. From the number, location, and distribution of visual salience points, dynamic feature selection and weighting processing are performed.

First, it is necessary to perform image threshold calculation for the feature saliency map, and extract pixel points whose visual saliency value is greater than the threshold value as salient points. According to the scarcity saliency criterion, the greater the number of saliency points on the feature saliency map, the smaller the contribution of this feature saliency map to the system saliency map. The weight W _region is defined as the number of salient points, as shown in Equation 8.

W _region = N _saliency ; (8)

The attention area of the human for the image is easily concentrated in the central area of the image. The pixels in the central area of the image and the surrounding area are more likely to be prominent areas. Therefore, the weight value W _pixel can be defined as the arithmetic average value of the Euclidean distance between each significant pixel point and the pixel point in the center of the image. The specific method is shown in Equation 9.

N in formula 9 is the number of salient points in the system feature saliency map, sp ⁱ is the ith salient point in the system, and center is the specific position of the pixel in the center of the image.

If the positions of the visual saliency points on the saliency map of the system are scattered in the areas of the saliency map, the system's saliency map is deemed to have insufficient contribution to the final saliency map of the system. Therefore, the W _effect weight can be defined as the arithmetic average of the Euclidean distance between the salient points of each system, and the center ^id is the center of the visual salient point position, as shown in Equation 10.

According to the distribution position and number of visual saliency points of the system, formula 11 can be used to calculate the correlation weight of each system feature saliency map, and the system saliency maps are fused to obtain the final system saliency map.

In formula 11, V _fi is the saliency map of the i-th system,

Is the weight of the area, pixel weight, and effect weight of the i-th system feature saliency map. W _fi is the fusion of the three weights on the i-th system feature saliency map. W _i is the i-th system feature saliency. The final weight to which the graph belongs, V is the saliency graph of the final system.

It can be seen from the above that the embodiments of the present invention use specific system saliency, global system saliency, and scarcity saliency to describe and describe specific visual saliency, which will be more conducive to the rapid image segmentation process. Through the improvement of the visual saliency mechanism, we constantly improve and perfect the application of the visual saliency mechanism for the image segmentation process in order to achieve a better image segmentation effect.

For S104, in another embodiment, the merging method of the weighted visual saliency result graph and the merging method of the average visual saliency result graph may be used to perform the merging operation of the visual saliency graph. After several experiments by the inventor of the present application, it is proved that the method of combining the weighted visual saliency result graphs is better than the average visual saliency result graph merging method. At the p-coordinate position on the i-th layer, the visual significance value is

among them

Normalize to the range of [0,1]. To calculate the combined visual saliency value on layer i, the visual saliency result graph on layer i-1 will be adjusted to the same size as the visual saliency result graph on layer i. Use visual significance

To represent the visual saliency result graph after operating on the image sequence

The visual significance value at the position of the upper p coordinate. The results of merging the refined visual saliency values of two adjacent layers are shown in Equation 12, Equation 13 and Equation 14:

In the formula, p is the pixel position,

Is the weight value of layer i,

Is the weight value of layer i-1,

Is the visual significance value of layer i,

Is the visual significance value of layer i-1;

To fuse the visual saliency value of the pixels at the position of the visual saliency map

According to Equation 13 and Equation 14, the visual saliency value is refined in the i-th layer

And the i-1 layer refined visual saliency value

It can be calculated if it has been determined

with

when

Time,

The refined visual saliency value after merging can be calculated according to formula 12.

According to the results of experiments conducted by the inventors of the present application using the method of the above embodiment, the visual saliency map after the merge operation can fuse the two levels of visual saliency multi-scale features, and the visual saliency obtained after the fusion The resulting graph is smoother and clearer.

In the process of selecting candidate image region blocks using the random search detection method, due to insufficient sample collection and random reasons such as sample error, the generated visual saliency map will contain a lot of random noise. The reason why the random noise signal is generated is that image noise is generated because the 2K image area block is randomly selected to perform Equation 1 for calculation. In order to improve the accuracy of the visual saliency map, after S103, a denoising process is performed.

In one embodiment, the eight-neighbor visual saliency value may be used to perform a refinement process on the roughened visual saliency map (the visual saliency map generated by S103). The eight-neighbor coordinate method can select neighboring pixels from the eight directions corresponding to the pixel r. The candidate image area block obtained by the eight-neighbor coordinate method is very different from the method of randomly selecting candidate image area blocks. Because the image similarity of the neighboring area at the p-coordinate is high, the eight-neighbor coordinate method may make the visual saliency value at the p-coordinate smaller than the actual image visual saliency value; if it is done according to the visual saliency value obtained for this method Normalization will cause corresponding noise on the rough visual saliency map obtained above. However, the visual saliency of neighboring coordinates is high and similar, for example. Therefore, it is necessary to perform refinement operations on the coordinate positions of pixel points that differ greatly from the visual saliency value of the neighboring coordinates. Considering the credibility factor of visual saliency, the visual saliency value with a large difference from the eight neighbors has a low credibility, so it is possible to choose to refine the visual saliency value with a large difference from the eight neighbors.

In addition, this application also provides another embodiment. Please refer to FIG. 2. FIG. 2 is a schematic flowchart of another method for detecting a visually significant region according to an embodiment of the present invention. The embodiment of the present invention may be applied to an image segmentation method, for example. Among them, the specific content may include the following:

S201: Use a random search method to select multiple candidate image area blocks for the current pixel from the image to be processed.

S202: Calculate the similarity metric values of the current pixel and each candidate image area block separately.

S203: Delete the candidate image area blocks corresponding to the similarity metric value lower than the similarity threshold, and use the remaining candidate image area blocks as the target image area blocks.

S204: Calculate the visual saliency value of the current pixel according to the similarity between each target image block and the current pixel.

S205: Based on the visual saliency value of each pixel of the image to be processed, generate rough visual saliency maps of each level of the image to be processed.

S206: Perform a refinement operation on each roughened visual saliency map to remove image noise signals, and obtain respective corresponding detailed visual saliency maps.

S207: Calculate the weight value of the visual saliency map of each layer according to the visual saliency value of the visual saliency maps of two adjacent layers.

S208: Integrate the visual saliency maps of each layer according to the weight value of the visual saliency maps of each layer to obtain an initial visual saliency map.

S209: Select multiple preset position points in the initial visual saliency map, and for each preset position point, divide it into an enhanced pixel point set and a weakened pixel point set according to the visual significance value of each pixel point.

The number of location points can be selected according to the size of the image to be processed and the actual situation of the visual saliency map, which is not limited in this application.

The visual saliency value of the pixels in the enhanced pixel set is greater than the visual saliency value of the pixels in the weakened pixel set.

A person skilled in the art may divide the high visual saliency value into the enhanced pixel point set according to his own experience, and divide the pixel point corresponding to the low visual saliency value into the weakened pixel point set. Or you can set a visual saliency division threshold based on the visual saliency value and total number of pixels from each position, divide the pixels above the threshold into the enhanced pixel set, and divide the pixels below the threshold The points are divided into a set of weakened pixels.

S210: Use the image enhancement method to perform enhancement processing on each pixel in the enhanced pixel set.

Any image enhancement method that can enhance the feature of the pixel point may be used, and those skilled in the art may choose according to actual needs and actual application scenarios, which is not limited in this application.

As for how to implement the process of image enhancement, please refer to the implementation scheme recorded in the related art, which will not be repeated here.

S211: Use the image weakening method to weaken each pixel in the weakened pixel set.

Any image enhancement method that can weaken the characteristics of pixel points can be used, and those skilled in the art can select according to actual needs and actual application scenarios, which is not limited in this application.

As for the process of how to weaken the image, please refer to the implementation scheme recorded in the related technology, and it will not be repeated here.

For the steps or methods in the embodiments of the present invention that are the same as those in the foregoing embodiments, refer to the implementation process described in the foregoing embodiments, and details are not described here.

It can be seen from the above that the invention of the embodiments of the present invention implements the detection of visually significant regions in four stages. The first stage is a random detection stage, which performs random search and detection processing according to the image to be processed, and uses the information of each layer of the original image to obtain each layer Roughened visual saliency map; the second stage is the refinement stage, which uses the roughened visual saliency maps at all levels to perform the refinement operation to remove image noise generated by random search and detection processing on the roughened visual saliency map ; The third stage is the multi-level merge stage of the area map. Through the detailed merge of the multi-layer visual saliency map, the final merged visual saliency map combining local features and global features; the fourth stage is to update the visual saliency value. At this stage, the visual saliency areas with high noise signals will be selectively combined to obtain the final high-quality detailed visual saliency result map. It achieves fast and real-time generation of detailed visual saliency result maps with the same size as the original input image (to-be-processed image), thereby improving the overall quality of the target segmentation results in the video image sequence.

In order to confirm that the technical solution provided by this application (generally referred to as multi-scale visual saliency detection algorithm) can quickly generate a detailed visual saliency result map of the same size as the original input image, it can be applied to the The visual saliency result graph of the image sequence, thereby improving the overall quality of the target segmentation results in the video image sequence. This application uses Matlab simulation environment to carry out a series of experiments to verify.

Under the Microsoft Windows 10 operating system environment, this multi-scale visual saliency detection algorithm is implemented. A visual saliency detection algorithm is used to generate a corresponding visual saliency result map based on the original input image. A large number of visual saliency results are obtained, which show that the multi-scale visual saliency detection method can obtain visual saliency images with good effects on the original input image compared with the existing mainstream visual saliency measurement methods. This application is related to 8 related technologies, such as AIM (Saliency detection algorithm based on information theory), GBVS (saliency detection algorithm based on graph theory), SR (saliency detection algorithm based on spectral remainder), IS (saliency labeling algorithm based on sparse salient regions), ICL ( Dynamic visual attention detection algorithm), ITTI (the most classic model algorithm), RC (global comparison algorithm based on salient region detection), SUN (significant Bayesian frame model using natural statistics) were compared and compared.

Six test images were selected from 120 original images of Bruce database to compare the method of the present invention with eight related technical algorithms. In terms of effect accuracy, this multi-scale method has the highest execution accuracy.

It can be seen from the experimental results that, for the face image of the hand-held card, the face and the hand-held card are the focus of attention of the human eye. This situation can only be better detected by this application. In the image of people riding bicycles, the cyclist is a saliency region. Both the RC and SUN methods can detect the saliency region in the image well. The method in this paper also detects the corresponding saliency region. For the pine tree image, the vacancy formed by the sparse part of the pine needle in the image is the most noticeable area of the human eye. The visual saliency of this image is difficult to detect. None of the eight related technical methods detect the range, which is extremely large. Most methods can mistake the white highlighted area with strong contrast in the lower left corner of the image as a visually significant area, and this application basically detects the approximate hollow area. This application is also optimal in terms of the contrast between the visually significant area and the visually insignificant area. For a landscape picture containing a portrait, although both IS and ICL can detect people in the image, this application has the highest contrast.

In order to objectively evaluate the specific performance of this application, the receiver operating characteristic curve (Receiver Operating Characteristic, ROC) was used to analyze the quantitative results. As shown in Figures 3 and 4, the curve of this application is the highest. The eight methods can be divided into two categories: algorithms that simply use prior knowledge of probabilities, including AIM, GBVS, IS, and SR; algorithms that simply use current observed image information, including ICL, ITTI, RC, and SUN. Then, this multi-scale visual saliency detection method is better than the method using only a priori probability knowledge or current observation image information.

Evaluation of color image segmentation algorithms At present, most of the cases are subjectively judged by the human eye. The results of applying the technical solution of this application to image segmentation are compared with the results of human eye segmentation in the Berkeley image standard library to evaluate the algorithm qualitatively. The application has good image segmentation effect. For example, for a landscape image of a bird on a tree, FIG. 5 is an image obtained by segmentation using the technical solution of the present application. As can be seen from the segmentation result of FIG. 5, the technical solution of the present application can accurately locate the salient regions in the image and use The image segmentation method of the technical solution of the present application can clearly cut out the outline of a bird and the outline of a tree, and can obtain an image segmentation result that is almost completely consistent with human eye segmentation.

It can be seen from the above that the embodiment of the present invention introduces a random search detection method for visual saliency, greatly improves the efficiency of the visual saliency detection algorithm, and obtains a detailed visual saliency map that is exactly the same size and scale as the original input image.

The embodiment of the present invention also provides a corresponding implementation device for the visual saliency area detection method, which further makes the method more practical. The visual saliency area detection device provided by the embodiment of the present invention will be described below. The visual saliency area detection device described below and the visual saliency area detection method described above can be referred to each other.

Referring to FIG. 6, FIG. 6 is a structural diagram of a visually significant area detection device according to an embodiment of the present invention in a specific implementation manner. The device may include:

The random search module 601 is used to select a plurality of candidate image area blocks for the current pixel from the image to be processed using a random search method, and select target image area blocks that satisfy the similarity condition from each candidate image area block.

The visual saliency value calculation module 602 is used to calculate the visual saliency value of the current pixel according to the similarity between each target image block and the current pixel.

The multi-layer visual saliency map generation module 603 is used to generate visual saliency maps at various levels of the image to be processed based on the visual saliency value of each pixel of the image to be processed.

The visual saliency map fusion module 604 is used to calculate the weight values of the visual saliency maps of each layer according to the visual saliency values of the adjacent two layers of visual saliency maps, which are used to visually highlight each layer according to the weight values of the visual saliency maps of each layer Figure fusion.

Optionally, in some implementations of this embodiment, please refer to FIG. 7, for example, the device may further include a visual saliency update module 605, and the visual saliency update module 605 is used for visual saliency obtained after fusion. Select multiple preset position points in the picture; for each preset position point, according to the visual saliency value of each pixel point, it is divided into the enhanced pixel point set and the weakened pixel point set, and the enhanced pixel points in the pixel set The visual saliency value is greater than the visual saliency value of the pixels in the weakened pixel set; the image enhancement method is used to strengthen each pixel in the enhanced pixel set; the image weakening method is used to weaken each pixel in the weakened pixel set deal with.

In a specific embodiment, the multi-layer visual saliency map generation module 603 may further include:

The rough visual saliency map generation sub-module is used to generate rough visual saliency maps at various levels of the image to be processed based on the visual saliency value of each pixel of the image to be processed;

The detailed visual saliency map generation sub-module is used to perform a detailed operation on each rough visual saliency map to remove image noise signals and obtain respective corresponding detailed visual saliency maps.

In some other embodiments, the random search module 601 can also calculate the similarity metric values of the current pixel point and each candidate image area block separately; delete the candidate image area blocks corresponding to the similarity metric value lower than the similarity threshold, and The remaining candidate image area blocks serve as modules of the target image area block.

Optionally, in some specific embodiments, the visual saliency value calculation module 602 may be, for example, a module that calculates the visual saliency value S of the current pixel using the following formula:

among them,

In addition, the multi-layer visual saliency map generation module 603 may further include:

The local system saliency map generation sub-module is used to calculate the local system saliency value of each pixel of the image to be processed based on the frequency domain to generate a local system saliency map;

The global system saliency map generation submodule is used to calculate the global system saliency value V _Global (x,y) of each pixel of the image to be processed using the following formula to generate a global system saliency map:

In the formula, x and y are the horizontal and vertical coordinate values of the pixel, f(x, y) is the significance function for solving the pixel (x, y), and f _average (x, y) is f(x, y). The arithmetic mean value, the size of the image to be processed is M*N;

The scarcity saliency map generation submodule is used to calculate the scarcity saliency value V _Scarcity (x,y) of each pixel of the image to be processed using the following formula to generate a scarcity saliency map:

In the formula, x and y are the horizontal and vertical coordinates of the pixel, f _average (x, y) is the arithmetic average of f(x, y), and h(f _average (x, y)) is generated by the image to be processed Feature histogram;

The fusion sub-module is used to fuse the local system saliency map, the global system saliency map, and the scarcity saliency map to obtain visual saliency maps at various levels of the image to be processed.

In the embodiment of the present invention, the fusion sub-module may be, for example, a module for generating a visual saliency map by calculating the visual saliency value V _final of each pixel of the image to be processed using the following formula:

In the formula, V _Local , V _Global , and V _Scarcity are the local system saliency value, global system saliency value, and scarcity saliency value of the pixels whose horizontal and vertical coordinates are x and y, and v ₁ is the local system saliency value. The weight value of, v ₂ is the weight value of the global system saliency value, v ₃ is the weight value of the scarcity saliency value, i = 1, V _i = V ₁ is the local system saliency value, i = 2, V _i = V ₂ is the significance value of all systems, i = 3, V _i = V ₃ is the scarcity significance value, V ₁ , V ₂ , V ₃ are based on

Calculated.

Optionally, the local system saliency map generation sub-module may also be a module that calculates the local system saliency value V _Local (x, y) of each pixel of the image to be processed using the following formula:

In the formula, x and y are the horizontal and vertical coordinates of the pixel, FFT(u,v) is the characteristic value of the pixel, |FFT(u,v)e ^jψ(u,v) | is after the fast Fourier transform The resulting image amplitude spectrum, ψ(u,v) is the phase spectrum of the image to be processed.

In some embodiments, the visual saliency map fusion module 604 may also be a module that calculates the weight value of the visual saliency map of each layer using the following formula:

In the formula, p is the pixel position,

Is the weight value of layer i,

Is the weight value of layer i-1,

Is the visual significance value of layer i,

Is the visual significance value of layer i-1;

In the formula,

To fuse the visual saliency value of pixels at the position of p of the visual saliency map.

The functions of the functional modules of the visual saliency area detection device according to the embodiments of the present invention may be specifically implemented according to the method in the above method embodiments. For the specific implementation process, reference may be made to the related descriptions in the above method embodiments, and details are not described here.

It can be seen from the above that the embodiment of the present invention improves the calculation efficiency of the visual saliency value of pixels, thereby effectively improving the detection efficiency of the visual saliency area of the image to be processed, and can also improve the visual saliency area of the image to be processed. Detection accuracy and precision.

An embodiment of the present invention also provides a visually significant area detection device, which may specifically include:

Memory, used to store computer programs;

The processor is configured to execute a computer program to implement the steps of the visual saliency area detection method described in any one of the above embodiments.

The functions of the functional modules of the visual saliency area detection device according to the embodiments of the present invention may be specifically implemented according to the methods in the above method embodiments. For the specific implementation process, reference may be made to the related descriptions in the above method embodiments, and details are not described here.

An embodiment of the present invention also provides a computer-readable storage medium that stores a visual saliency area detection program, and the visual saliency area detection program is executed by the processor as described in any one of the above embodiments in the visual saliency area detection method A step of.

The functions of the functional modules of the computer-readable storage medium according to the embodiments of the present invention may be specifically implemented according to the method in the foregoing method embodiments. For the specific implementation process, reference may be made to the related descriptions of the foregoing method embodiments, and details are not described here.

The embodiments in this specification are described in a progressive manner. Each embodiment focuses on the differences from other embodiments, and the same or similar parts between the embodiments may refer to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant part can be referred to the description in the method part.

Professionals can further realize that the units and algorithm steps of the examples described in conjunction with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. In order to clearly illustrate the hardware and software Interchangeability, in the above description, the composition and steps of each example have been generally described according to function. Whether these functions are executed in hardware or software depends on the specific application of the technical solution and design constraints. Professional technicians can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the present invention.

The steps of the method or algorithm described in conjunction with the embodiments disclosed herein may be implemented directly by hardware, a software module executed by a processor, or a combination of both. Software modules can be placed in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable and programmable ROM, registers, hard disks, removable disks, CD-ROMs, or all fields of technology. Any other known storage medium.

The method and device for detecting visually significant regions provided by the present invention have been described in detail above. In this article, specific examples are used to explain the principles and implementations of the present invention. The descriptions of the above examples are only used to help understand the method and the core idea of the present invention. It should be noted that for those of ordinary skill in the art, without departing from the principles of the present invention, the present invention may also be subject to several improvements and modifications, and these improvements and modifications also fall within the protection scope of the claims of the present invention.

Claims

A visual saliency area detection method, characterized in that it includes:

Use a random search method to select multiple candidate image area blocks for the current pixel from the image to be processed, and select target image area blocks that satisfy the similarity condition from each candidate image area block;

Calculate the visual saliency value of the current pixel according to the similarity between each target image block and the current pixel;

Generating a visual saliency map at each level of the image to be processed based on the visual saliency value of each pixel of the image to be processed;

Based on the visual saliency values of the two adjacent layers of visual saliency maps, the weight values of the visual saliency maps of each layer are calculated to be used to fuse the visual saliency maps of each layer according to the weight values of the visual saliency maps of each layer.
The method for detecting a visual saliency area according to claim 1, wherein after calculating the weight values of the visual saliency maps of each layer according to the visual saliency values of the visual saliency maps of two adjacent layers, the method further comprises:

Select multiple preset position points in the visual saliency map obtained after fusion;

For each preset position point, according to the visual saliency value of each pixel point, it is divided into an enhanced pixel point set and a weakened pixel point set, and the visual saliency value of the pixels in the enhanced pixel point set is greater than the To weaken the visual significance value of the pixels in the pixel set;

Use the image enhancement method to perform enhancement processing on each pixel in the enhanced pixel set;

The image weakening method is used to weaken each pixel in the weakened pixel set.
The method for detecting a visual saliency region according to claim 1, wherein the generating the visual saliency maps of each level of the image to be processed based on the visual saliency value of each pixel of the image to be processed includes:

Generating rough visual saliency maps at various levels of the image to be processed based on the visual saliency value of each pixel of the image to be processed;

Perform detailed operations on each rough visual saliency map to remove image noise signals, and obtain respective corresponding detailed visual saliency maps.
The visual saliency area detection method according to claim 1, wherein the selection of target image area blocks satisfying the similarity condition from each candidate image area block includes:

Calculating the similarity metric values of the current pixel point and each candidate image area block separately;

Delete the candidate image area blocks corresponding to the similarity metric value lower than the similarity threshold, and use the remaining candidate image area blocks as the target image area blocks.
The visual saliency area detection method according to any one of claims 1 to 4, wherein the vision of the current pixel is calculated according to the similarity between each target image block and the current pixel Significance values include:

Use the following formula to calculate the visual significance value S of the current pixel:

among them,

Where dist(r i , r k ) is the dissimilarity measure between the current pixel position and the Kth target image block, K is the total number of target image blocks, dist color (r i , R k ) is the Euclidean distance in the HSV color space between the target image area block subjected to vectorization and the location of the current pixel, and dist pos (r i , r k ) is the location of the current pixel The Euclidean distance between the Kth target image block.
The visual saliency area detection method according to any one of claims 1 to 4, characterized in that, based on the visual saliency value of each pixel of the image to be processed, each level of the image to be processed is generated Visual saliency maps include:

Calculating the local system saliency value of each pixel of the image to be processed based on the frequency domain to generate a local system saliency map;

Use the following formula to calculate the global system saliency value V Global (x, y) of each pixel of the image to be processed to generate a global system saliency map:

In the formula, x and y are the horizontal and vertical coordinate values of the pixel, f(x, y) is the significance function for solving the pixel (x, y), and f average (x, y) is f(x, y). Arithmetic average, the size of the image to be processed is M*N;

Use the following formula to calculate the scarcity saliency value V Scarcity (x,y) of each pixel of the image to be processed to generate a scarcity saliency map:

In the formula, x and y are the horizontal and vertical coordinate values of the pixel, f average (x, y) is the arithmetic average value of f (x, y), and h (f average (x, y)) is the image to be processed The generated feature histogram;

The local system saliency map, the global system saliency map, and the scarcity saliency map are fused to obtain visual saliency maps at various levels of the image to be processed.
The visual saliency area detection method according to claim 6, wherein the fusion of the local system saliency map, the global system saliency map, and the scarcity saliency map to obtain each of the to-be-processed images Hierarchical visual saliency maps include:

For each layer of the image to be processed, the visual saliency value V final of each pixel of the image to be processed is calculated using the following formula to generate a visual saliency map:

In the formula, V Local , V Global , and V Scarcity are the local system saliency value, global system saliency value, and scarcity saliency value of the pixels whose horizontal and vertical coordinates are x and y, and v 1 is the local system saliency value. The weight value of, v 2 is the weight value of the global system saliency value, v 3 is the weight value of the scarcity saliency value; i=1, V i = V 1 is the local system saliency value, i=2, V i = V 2 is the significance value of all the systems, i = 3, V i = V 3 is the scarcity significance value, and V 1 , V 2 , and V 3 are based on
Calculated.
The method for detecting a visual saliency region according to claim 6, wherein the calculating the local system saliency value of each pixel of the image to be processed based on the frequency domain includes:

Use the following formula to calculate the local system saliency value V Local (x,y) of each pixel of the image to be processed:

In the formula, x and y are the horizontal and vertical coordinates of the pixel, FFT(u,v) is the characteristic value of the pixel, |FFT(u,v)e jψ(u,v) | is after the fast Fourier transform The obtained image amplitude spectrum, ψ(u,v) is the phase spectrum of the image to be processed.
The method for detecting a visual saliency area according to any one of claims 1 to 4, wherein the weight value of the visual saliency map of each layer is calculated according to the visual saliency value of the visual saliency maps of two adjacent layers, to It is used to fuse the visual saliency map of each layer according to the weight value of the visual saliency map of each layer, including:

Use the following formula to calculate the weight value of each layer of visual saliency map:

In the formula, p is the pixel position,
Is the weight value of layer i,
Is the weight value of layer i-1,
Is the visual significance value of layer i,
Is the visual significance value of layer i-1;

Use the following formula to fuse the visual saliency maps of two adjacent layers to obtain a fusion visual saliency map;

In the formula,
Is the visual saliency value of the pixel at the p position of the fusion visual saliency map.
A visual saliency area detection device, characterized in that it includes:

The random search module is used to select multiple candidate image area blocks for the current pixel from the to-be-processed image using a random search method, and select target image area blocks satisfying the similarity condition from each candidate image area block;

The visual saliency value calculation module is used to calculate the visual saliency value of the current pixel according to the similarity between each target image block and the current pixel;

A multi-layer visual saliency map generation module, used to generate visual saliency maps of each level of the image to be processed based on the visual saliency value of each pixel of the image to be processed;

The visual saliency map fusion module is used to calculate the weight value of the visual saliency map of each layer according to the visual saliency value of the adjacent two layers of visual saliency maps, which is used to visually map each layer according to the weight value of the visual saliency map of each layer Perform fusion.