CN114359323B - Image target area detection method based on visual attention mechanism - Google Patents

Image target area detection method based on visual attention mechanism

Info

Publication number
CN114359323B
CN114359323B CN202210021568.6A
Authority
CN
China
Prior art keywords
color
super
superpixel
image
pixel
Prior art date
Legal status
Active
Application number
CN202210021568.6A
Other languages
Chinese (zh)
Other versions
CN114359323A (en)
Inventor
黄方昊
江佳诚
杨霄
陈正
聂勇
唐建中
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN202210021568.6A
Publication of CN114359323A
Application granted
Publication of CN114359323B

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an image target area detection method based on a visual attention mechanism. The method comprises the following steps: computing the underlying visual feature descriptors of a reference image; performing superpixel segmentation of the real-time image with the SLIC method and describing the current real-time image information region by region; computing the saliency value of every superpixel with a bottom-up, data-driven method, the saliency values of all superpixels forming the saliency map of the real-time image; computing the similarity between each superpixel of the real-time image and the reference image with a fuzzy matching method; expanding regions of the saliency map of the real-time image with a region fusion and expansion method based on similarity and saliency to obtain the possible target regions of the real-time image; and screening the possible target regions of the real-time image to obtain the final target region of the current real-time image. The invention improves the accuracy of target area detection in complex environments and preserves real-time performance on low-computing-power devices.

Description

Image target area detection method based on visual attention mechanism
Technical Field
The invention belongs to the field of image target area detection in computer vision, and particularly relates to an image target area detection method based on a visual attention mechanism.
Background
Target area detection is an important image processing technology in computer vision. Obtaining the target area through detection allows computing resources to be concentrated, reduces interference from irrelevant information, and improves the efficiency and accuracy of image processing. Research on target area detection algorithms has long been a popular direction in the field of computer vision.
Early target region detection algorithms were based on manually constructed features: the information of image regions was described by hand-designed region feature descriptors, and detection was performed with simple schemes such as sliding-window detectors, e.g., the VJ detector proposed by P. Viola and M. Jones and the HOG detector proposed by N. Dalal and B. Triggs. On this basis, P. Felzenszwalb et al. proposed a detection model based on deformable parts, using a mixture model to detect objects that may undergo significant changes, and achieved strong results among traditional target area detection algorithms.
With the development of convolutional neural networks and deep learning, target region detection algorithms based on deep learning have entered a new stage. Deep-learning-based target area detection algorithms are divided, according to their processing flow, into two-stage and one-stage methods. The two-stage methods, which first generate candidate boxes and then detect them with a trained model, include the R-CNN and Fast R-CNN proposed by R. Girshick et al., the SPPNet proposed by K. He et al., and the Faster R-CNN detector proposed by S. Ren et al.; the one-stage methods, which apply a neural network to the whole picture for detection, include the YOLO proposed by R. Joseph et al. and the SSD proposed by W. Liu et al.
Target area detection can therefore be realized either with traditional features or with neural-network-based methods, but both approaches have problems: traditional target area detection algorithms require relatively complex feature description schemes and their detection accuracy is relatively low; neural-network-based target area detection algorithms require large datasets and a long training time, demand high computing power at detection time, and are therefore not suitable for low-computing-power mobile devices typified by augmented reality devices.
Disclosure of Invention
Aiming at the application problem of the current target area detection algorithm on low-computational-power mobile equipment, the invention provides an image target area detection method based on a visual attention mechanism.
In order to achieve the above purpose, the technical scheme of the invention is as follows.
The invention comprises the following steps:
1) Calculating a main color descriptor, a texture feature descriptor and a Fourier descriptor of the reference image;
2) Performing superpixel segmentation on a real-time image by using a simple linear iterative clustering algorithm to obtain a plurality of superpixels of the current real-time image, performing regional description on the current real-time image information based on each superpixel to obtain the average color of each superpixel in a quantized HSV color space and an adjacent superpixel set of each superpixel, calculating the salient values of each superpixel by using a bottom-up data driving method, and forming a salient map of the current real-time image by the salient values of each superpixel;
3) Calculating the similarity between each super pixel of the current real-time image and the reference image by using a fuzzy matching method based on the main color descriptor according to the main color descriptor of the reference image and the average color of each super pixel in the quantized HSV color space, and obtaining the similarity value of each super pixel of the current real-time image;
4) According to the similarity value of each super pixel of the current real-time image and the adjacent super pixel set of each super pixel, performing region expansion on the salient image of the current real-time image by using a region fusion expansion method based on the similarity and the salient value, and then obtaining a possible target region of the current real-time image;
5) And carrying out preliminary screening on a possible target area of the current real-time image based on the area, and further carrying out regional screening on the preliminary screening target area by utilizing texture feature descriptors and Fourier descriptors of the reference image to obtain a final target area of the current real-time image.
The step 1) specifically comprises the following steps:
1.1 Converting the colors of the reference image from the RGB color space to the HSV color space, and quantizing the HSV color space of the reference image to obtain color values of each pixel in the quantized HSV color space;
1.2) Calculating a color histogram of the reference image based on the color values of the pixels in the quantized HSV color space, calculating the percentage of each quantized color from the color histogram, and arranging the quantized colors in descending order of percentage; the first quantized color is taken as the first main color and recorded as (L_1, p_1), where L_1 represents the quantized value of the first main color and p_1 represents the percentage of the first main color in the color histogram; the n-th quantized color is then selected in turn, and when its percentage p_n in the color histogram satisfies p_n > 0.5·p_1, that quantized color is taken as the n-th main color and recorded as (L_n, p_n), where L_n represents the quantized value of the n-th main color; the selection stops when no remaining quantized color satisfies p_n > 0.5·p_1 or when the sum of the percentages of the selected main colors in the color histogram reaches the preset limit, where N represents the number of main colors; the N main colors form the main color descriptor D_DC;
1.3 A gray level co-occurrence matrix of the reference image is calculated, and corresponding angular second moment, inverse difference moment, entropy and contrast are calculated based on the gray level co-occurrence matrix and used as texture feature descriptors of the reference image;
1.4 The complex form of the reference image is subjected to discrete Fourier transformation and normalization to obtain a Fourier descriptor of the reference image.
The step 2) specifically comprises the following steps:
2.1) Performing superpixel segmentation on the real-time image through a simple linear iterative clustering algorithm to obtain a plurality of superpixels of the real-time image and the corresponding label map, wherein the superpixels form a superpixel set;
2.2) Processing the superpixel set based on the label map to obtain the pixel set, color features and position information of each superpixel in the superpixel set; wherein the color features are the average value of each color component of each superpixel in the CIELab color space and the average color in the quantized HSV color space; the position information is the geometric center of each superpixel and the adjacent superpixel set of each superpixel;
2.3) Taking the superpixels in the superpixel set that contain edge pixels p_e of the real-time image as an edge superpixel set E, and calculating the image edge main color of the current real-time image based on the edge superpixel set E;
2.4) Calculating the contrast between the current superpixel and every other superpixel based on the pixel set of each superpixel, the average value of each color component of each superpixel in the CIELab color space, and the geometric center of each superpixel; then, based on the average color of the current superpixel in the quantized HSV color space, and combining the contrast between the current superpixel and every other superpixel with the image edge main color, calculating the saliency value of the current superpixel relative to the background information and its contrast-based saliency value through the following formulas:
where Sal_E(S_k) represents the saliency value of superpixel S_k relative to the background information, | | represents the absolute value operation, min represents the minimum value operation, and Sal_C(S_k) represents the contrast-based saliency value of superpixel S_k; the formulas further involve the average color of superpixel S_k in the quantized HSV color space, the m-th quantized edge color of the image edge dominant colors, the luminance and the averages of the first and second color channels of superpixels S_k and S_i in the CIELab color space, the coordinates of the geometric centers of superpixels S_k and S_i in the current real-time image, and λ_pos, a coefficient that adjusts how much the spatial distance affects the contrast-based saliency value; i is the superpixel index in the superpixel set;
2.5) Repeating step 2.4) and traversing the remaining superpixels to calculate the saliency values of the remaining superpixels relative to the background information and their contrast-based saliency values;
2.6 Respectively carrying out normalization processing on the salient values of the super pixels relative to the background information and the salient values based on the contrast ratio, and carrying out linear fusion to obtain the final salient values of the super pixels, wherein the calculation formula is as follows:
Sal(S_k) = λ_Sal1·Sal'_E(S_k) + λ_Sal2·Sal'_C(S_k)
where Sal(S_k) represents the final saliency value of superpixel S_k, Sal'_E(S_k) represents the normalized saliency value of superpixel S_k relative to the background information, Sal'_C(S_k) represents the normalized contrast-based saliency value of superpixel S_k, λ_Sal1 is a first weight coefficient, λ_Sal2 is a second weight coefficient, and λ_Sal1 + λ_Sal2 = 1 is satisfied;
2.7) After normalizing the final saliency value of each superpixel to the [0, 255] interval, gray-scale values are assigned to the corresponding superpixels based on the normalized final saliency values, giving the saliency map of the current real-time image.
In the step 2.3), calculating the image edge main color of the current real-time image based on the edge super-pixel set E, specifically:
s1: calculating a color histogram of the edge superpixel set E based on the quantized HSV color space;
S2: the quantized edge colors L_E in the color histogram of the edge superpixel set E satisfy L_E ∈ [0, 71]; the sum of the percentages of the neighborhood colors {L_E−1, L_E, L_E+1} of the current quantized edge color L_E in the color histogram of the edge superpixel set E is taken as the percentage of the current quantized edge color L_E in the color histogram of the edge superpixel set E;
S3: repeating step S2 and traversing all quantized edge colors L_E to calculate their percentages in the color histogram of the edge superpixel set E;
S4: deleting from the color histogram the quantized edge colors that are identical to main colors in the main color descriptor, and taking the quantized edge colors whose percentage computed in S3 in the color histogram of the edge superpixel set E is greater than or equal to 20% as the image edge main color of the current real-time image.
The calculation formula of the similarity value of each super pixel of the current real-time image in the step 3) is as follows:
where Sim(S_k) represents the similarity value of superpixel S_k; the formula involves the average color of superpixel S_k in the current real-time image and L_DC ∈ D_DC, where L_DC represents a main color in the main color descriptor of the reference image and D_DC represents the main color descriptor of the reference image; | | denotes the absolute value operation, min() denotes the minimum value operation, and th_Sim is the similarity threshold.
The step 4) is specifically as follows:
4.1) Judging whether each superpixel of the current real-time image belongs to the target area: if the similarity value of a superpixel is 1 and its final saliency value in the saliency map of the current real-time image is greater than the initial saliency value threshold, the superpixel is taken as a target area to be determined; otherwise it is not a target area to be determined. All superpixels are traversed, and all target areas to be determined form the initial target area of the current real-time image;
4.2) Performing target area expansion on the adjacent superpixel sets of the superpixels of the initial target area: if the final saliency value of the current adjacent superpixel in the saliency map of the current real-time image is greater than the current saliency value threshold, the current adjacent superpixel belongs to the target area to be expanded; the adjacent superpixel sets of all superpixels of the initial target area are traversed, all target areas to be expanded form an expansion target area, and the current saliency value threshold is increased;
4.3) Performing target area expansion on the adjacent superpixel sets of the superpixels of the current expansion target area: if the final saliency value of the current adjacent superpixel in the saliency map of the current real-time image is greater than the current saliency value threshold, the current adjacent superpixel belongs to the target area to be expanded; the adjacent superpixel sets of all superpixels of the current expansion target area are traversed, a new expansion target area is obtained, and the current saliency value threshold is increased;
4.4 Repeating the step 4.3) until no new expansion target area is generated, and fusing the initial target area and all expansion target areas to form a possible target area of the current real-time image.
The threshold calculation formula for increasing the current saliency value threshold in steps 4.2) and 4.3) is specifically:
where th_Sal denotes the initial saliency value threshold and th_Sal^t denotes the saliency value threshold after the t-th expansion.
The step 5) specifically comprises the following steps:
5.1 The possible target area of the current real-time image consists of a plurality of connected areas, the pixel number of each connected area is calculated, and the connected areas with the pixel number more than the preset proportion of the total number of the pixels of the current real-time image are taken as primary screening target areas;
5.2 Calculating texture feature descriptors and Fourier descriptors of all connected areas of the preliminary screening target area, respectively calculating a first Euclidean distance between the texture feature descriptors of all connected areas of the preliminary screening target area and the texture feature descriptors of the reference image and a second Euclidean distance between the Fourier descriptors of all connected areas of the preliminary screening target area and the Fourier descriptors of the reference image, and calculating the difference degree between all connected areas of the preliminary screening target area and the reference image based on the first Euclidean distance and the second Euclidean distance of all connected areas of the preliminary screening target area, wherein the calculation formula is as follows:
where Diff(R_q) represents the degree of difference between the connected region R_q of the preliminary screening target area and the reference image; the formula combines the first Euclidean distance and the second Euclidean distance of the connected region R_q of the preliminary screening target area, with λ_Diff representing a third weight coefficient;
5.3 The connected region with the smallest difference degree is used as the final target region of the current real-time image.
Compared with the prior art, the invention has the following beneficial effects:
1. The method imitates the way the human visual system searches for an object: based on a visual attention mechanism, possible target areas are searched from two directions, bottom-up (saliency detection) and top-down (color matching based on the main color descriptor), which improves the accuracy of target area detection in complex environments.
2. The invention fuses, expands and screens target areas on the basis of saliency detection and of color matching based on the main color descriptor, and further screens them with the texture feature descriptor and the Fourier descriptor, thereby improving the detection accuracy.
3. The invention performs target area detection with the underlying visual feature descriptors of the images, which improves the computational efficiency of target area detection and ensures real-time performance on low-computing-power devices.
Drawings
Fig. 1 is a flow chart of a target area detection algorithm based on a visual attention mechanism according to the present invention.
Fig. 2 is an example flow chart and a result chart of a target area detection algorithm based on a visual attention mechanism according to the present invention.
Detailed Description
The present invention will be described in further detail below with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention clearer. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention will now be further described with reference to the examples and figures.
The technical scheme of the invention is as follows:
as shown in fig. 1 and 2, the present invention includes the steps of:
1) Calculating a main color descriptor, a texture feature descriptor and a Fourier descriptor of the reference image to describe the color, the texture and the shape of the reference image, respectively; the main color descriptor, the texture feature descriptor and the Fourier descriptor form the underlying visual feature descriptor. The reference image is an image of the target object to be detected in the real-time image.
Color, texture and shape features are extracted from the reference image at the pixel level; step 1) specifically comprises the following steps:
For the color features, a main color selection method based on the quantized HSV color space is provided, which describes the main color features of the reference image more accurately.
1.1 Converting the color of the reference image from an RGB color space to an HSV color space, and quantizing the HSV color space of the reference image through the following formula to obtain the color value of each pixel in the quantized HSV color space;
L=9H+3S+V
Where h, s, v represent hue, saturation, and brightness components of the pixel in the HSV color space, respectively, H, S, V represent hue, saturation, and brightness components of the pixel in the quantized HSV color space, respectively, and L represents the color value of the current pixel in the quantized HSV color space.
1.2) Calculating a color histogram of the reference image based on the color values of each pixel in the quantized HSV color space of the reference image, calculating the percentage of each quantized color from the color histogram, and arranging the quantized colors in descending order of percentage; the first quantized color (i.e., the quantized color with the highest percentage) is taken as the first main color and recorded as (L_1, p_1), where L_1 represents the quantized value of the first main color and p_1 represents its percentage in the color histogram; the n-th quantized color is then selected in turn, and when its percentage p_n in the color histogram satisfies p_n > 0.5·p_1, that quantized color is taken as the n-th main color and recorded as (L_n, p_n), where L_n represents the quantized value of the n-th main color; the selection stops when no remaining quantized color satisfies p_n > 0.5·p_1 or when the sum of the percentages of the selected main colors in the color histogram reaches the preset limit, where N represents the number of main colors; the N main colors form the main color descriptor D_DC:
D_DC = {(L_n, p_n), n = 1, 2, …, N}
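To make steps 1.1)–1.2) concrete, the following Python sketch outlines the quantization and dominant color selection. It is an illustrative reconstruction rather than the patented implementation: only the combination formula L = 9H + 3S + V is reproduced above, so the equal-width bin boundaries for H, S and V and the cumulative-percentage stopping limit (cum_limit) are assumptions.

```python
import cv2
import numpy as np

def dominant_color_descriptor(ref_bgr, max_colors=8, cum_limit=0.9):
    # Step 1.1: convert to HSV and quantize into 72 colors, L = 9H + 3S + V in [0, 71].
    # Equal-width bins for H (8 bins), S (3 bins) and V (3 bins) are an assumption.
    hsv = cv2.cvtColor(ref_bgr, cv2.COLOR_BGR2HSV)
    H = np.minimum(hsv[..., 0] // 23, 7)   # OpenCV hue range is [0, 179]
    S = np.minimum(hsv[..., 1] // 86, 2)   # saturation range [0, 255]
    V = np.minimum(hsv[..., 2] // 86, 2)   # value range [0, 255]
    L = 9 * H.astype(int) + 3 * S.astype(int) + V.astype(int)

    # Step 1.2: color histogram, percentages, descending order, keep colors with p_n > 0.5 * p_1.
    p = np.bincount(L.ravel(), minlength=72).astype(float)
    p /= p.sum()
    order = np.argsort(p)[::-1]
    first = int(order[0])
    descriptor = [(first, float(p[first]))]
    cum = p[first]
    for idx in order[1:max_colors]:
        if p[idx] <= 0.5 * p[first] or cum >= cum_limit:   # assumed stopping rule
            break
        descriptor.append((int(idx), float(p[idx])))
        cum += p[idx]
    return descriptor   # D_DC as a list of (L_n, p_n) pairs
```

A cropped reference patch of the target object would be passed as ref_bgr; the same quantization is reused later when the average quantized color of each superpixel of the live image is needed.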
1.3 A gray level co-occurrence matrix of the reference image is calculated, and corresponding angular second moment, inverse difference moment, entropy and contrast are calculated based on the gray level co-occurrence matrix and used as texture feature descriptors of the reference image;
1.4 The complex form of the reference image is subjected to discrete Fourier transformation and normalization to obtain a Fourier descriptor of the reference image.
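The texture and shape descriptors of steps 1.3)–1.4) can be sketched as follows. This is a plausible Python rendering under stated assumptions: the number of GLCM gray levels, the distance/angle pair, the number of Fourier coefficients and the normalization by the first harmonic are illustrative choices that the text above does not fix (graycomatrix/graycoprops are spelled greycomatrix/greycoprops in older scikit-image releases).

```python
import cv2
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def texture_descriptor(gray, levels=32):
    # Step 1.3: gray-level co-occurrence matrix, then angular second moment,
    # inverse difference moment (homogeneity), entropy and contrast.
    q = (gray.astype(np.uint32) * levels // 256).astype(np.uint8)
    glcm = graycomatrix(q, distances=[1], angles=[0], levels=levels,
                        symmetric=True, normed=True)
    p = glcm[:, :, 0, 0]
    asm = graycoprops(glcm, 'ASM')[0, 0]
    idm = graycoprops(glcm, 'homogeneity')[0, 0]
    contrast = graycoprops(glcm, 'contrast')[0, 0]
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return np.array([asm, idm, entropy, contrast])

def fourier_descriptor(mask, n_coeff=32):
    # Step 1.4: boundary points taken as complex numbers x + jy, discrete Fourier
    # transform, magnitudes normalized by the first harmonic for scale invariance.
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    pts = max(contours, key=cv2.contourArea).squeeze(1)
    z = pts[:, 0].astype(float) + 1j * pts[:, 1].astype(float)
    F = np.fft.fft(z)
    return np.abs(F[1:n_coeff + 1]) / (np.abs(F[1]) + 1e-12)
```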
2) Performing superpixel segmentation on a real-time image by using a Simple Linear Iterative Clustering (SLIC) algorithm to obtain a plurality of superpixels of the current real-time image, performing regional description on the current real-time image information based on each superpixel to obtain the average color of each superpixel in a quantized HSV color space and an adjacent superpixel set of each superpixel, calculating the salient values of each superpixel by using a bottom-up data driving method, and forming a salient map of the current real-time image by the salient values of each superpixel;
the step 2) is specifically as follows:
The bottom-up, data-driven method extracts the underlying visual features of an image, such as color, texture and shape, computes the contrast between pixels or regions of the image based on these features to obtain a saliency map for each feature, and obtains the final salient region by fusing the saliency maps of the individual features.
2.1) Superpixel segmentation is carried out on the real-time image with the simple linear iterative clustering algorithm to obtain a plurality of superpixels of the real-time image and the corresponding label map, and the superpixels form a superpixel set S = {S_i, i = 1, 2, …, K}, where K represents the total number of superpixels (K is 300 in this embodiment), S_i represents the i-th superpixel in the superpixel set S, and i represents the superpixel index;
2.2) Processing the superpixel set based on the label map to obtain the pixel set, color features and position information of each superpixel in the superpixel set; wherein the color features are the average value of each color component of each superpixel in the CIELab color space and the average color in the quantized HSV color space; the position information is the geometric center of each superpixel and the adjacent superpixel set of each superpixel;
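A minimal sketch of steps 2.1)–2.2), using the SLIC implementation from scikit-image, is given below. The compactness value and the 4-neighborhood adjacency test are assumptions made for illustration.

```python
import numpy as np
from skimage.segmentation import slic
from skimage.color import rgb2lab

def describe_superpixels(rgb, n_segments=300):
    # Step 2.1: SLIC segmentation; 'labels' is the label map of the live image.
    labels = slic(rgb, n_segments=n_segments, compactness=10, start_label=0)
    lab = rgb2lab(rgb)
    n = int(labels.max()) + 1

    # Step 2.2: per-superpixel pixel count, mean CIELab color and geometric center.
    counts = np.zeros(n)
    mean_lab = np.zeros((n, 3))
    centers = np.zeros((n, 2))
    for k in range(n):
        ys, xs = np.nonzero(labels == k)
        counts[k] = ys.size
        mean_lab[k] = lab[ys, xs].mean(axis=0)
        centers[k] = (ys.mean(), xs.mean())

    # Adjacent superpixel sets: labels that touch horizontally or vertically are neighbors.
    neighbours = [set() for _ in range(n)]
    for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
        diff = a != b
        for u, v in zip(a[diff].ravel(), b[diff].ravel()):
            neighbours[int(u)].add(int(v))
            neighbours[int(v)].add(int(u))
    return labels, counts, mean_lab, centers, neighbours
```

The average quantized HSV color of each superpixel can be obtained in the same loop by applying the quantization of step 1.1) to the live image and averaging L inside each superpixel.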
2.3) The superpixels in the superpixel set that contain edge pixels p_e of the real-time image are taken as the edge superpixel set E, i.e., every superpixel S_u in E has at least one edge pixel p_e in its pixel set P_u, where S_u denotes the u-th superpixel in the edge superpixel set E and P_u denotes the pixel set of the u-th superpixel in the edge superpixel set E. The image edge main color of the current real-time image is then calculated based on the edge superpixel set E;
in the step 2.3), calculating the image edge main color of the current real-time image based on the edge super-pixel set E, wherein the image edge main color specifically comprises the following steps:
s1: calculating a color histogram of the edge superpixel set E based on the quantized HSV color space;
S2: the quantized edge colors L_E in the color histogram of the edge superpixel set E satisfy L_E ∈ [0, 71]; the sum of the percentages of the neighborhood colors {L_E−1, L_E, L_E+1} of the current quantized edge color L_E in the color histogram of the edge superpixel set E is taken as the percentage of the current quantized edge color L_E in the color histogram of the edge superpixel set E;
S3: repeating step S2 and traversing all quantized edge colors L_E to calculate their percentages in the color histogram of the edge superpixel set E;
S4: the quantized edge colors identical to main colors in the main color descriptor are deleted from the color histogram; the quantized edge colors whose percentage computed in S3 in the color histogram of the edge superpixel set E is 20% or more are taken as the image edge main color D_EDC of the current real-time image, and the image edge main color is used to describe the background information of the real-time image;
where M is the number of colors in the image edge main color and satisfies M ∈ {0, 1, 2, 3, 4, 5}; each element of D_EDC is an m-th quantized edge color of the image edge dominant colors and satisfies the condition that neither the current quantized edge color nor the colors close to it are main colors of the reference image.
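The edge dominant color computation (S1–S4 above) might look like the following sketch. Building the histogram from each edge superpixel's average quantized color weighted by its pixel count, and excluding the ±1 neighbors of the reference main colors, are interpretation choices rather than details fixed by the text.

```python
import numpy as np

def edge_dominant_colors(quant_color, counts, edge_ids, ref_descriptor, max_m=5):
    # S1: color histogram of the edge superpixel set E in the 72-color quantized space.
    hist = np.zeros(72)
    for k in edge_ids:
        hist[int(quant_color[k])] += counts[k]
    p = hist / max(hist.sum(), 1.0)

    # S2-S3: the percentage of an edge color L_E is the summed percentage of {L_E-1, L_E, L_E+1}.
    smoothed = np.array([p[max(L - 1, 0):min(L + 1, 71) + 1].sum() for L in range(72)])

    # S4: drop colors coinciding with (or adjacent to) reference main colors, keep those >= 20%.
    ref = {L for L, _ in ref_descriptor}
    excluded = ref | {L - 1 for L in ref} | {L + 1 for L in ref}
    d_edc = [L for L in range(72) if smoothed[L] >= 0.20 and L not in excluded]
    return d_edc[:max_m]   # at most M = 5 edge dominant colors, D_EDC
```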
2.4) Calculating the contrast between the current superpixel and every other superpixel based on the pixel set of each superpixel, the average value of each color component of each superpixel in the CIELab color space, and the geometric center of each superpixel; then, based on the average color of the current superpixel in the quantized HSV color space, and combining the contrast between the current superpixel and every other superpixel with the image edge main color, calculating the saliency value of the current superpixel relative to the background information and its contrast-based saliency value through the following formulas:
where Sal_E(S_k) represents the saliency value of superpixel S_k relative to the background information, | | represents the absolute value operation, min represents the minimum value operation, and Sal_C(S_k) represents the contrast-based saliency value of superpixel S_k; the formulas further involve the average color of superpixel S_k in the quantized HSV color space, the m-th quantized edge color of the image edge dominant colors, the luminance and the averages of the first and second color channels of superpixels S_k and S_i in the CIELab color space (the first color channel ranges from dark green at low values through gray at medium values to bright pink at high values, and the second color channel ranges from bright blue at low values through gray at medium values to yellow at high values), the coordinates of the geometric centers of superpixels S_k and S_i in the current real-time image, and λ_pos, a coefficient that adjusts how much the spatial distance affects the contrast-based saliency value; i is the superpixel index in the superpixel set;
2.5) Repeating step 2.4) and traversing the remaining superpixels to calculate the saliency values of the remaining superpixels relative to the background information and their contrast-based saliency values;
2.6 Respectively carrying out normalization processing on the salient values of the super pixels relative to the background information and the salient values based on the contrast ratio, and carrying out linear fusion to obtain the final salient values of the super pixels, wherein the calculation formula is as follows:
Sal(S_k) = λ_Sal1·Sal'_E(S_k) + λ_Sal2·Sal'_C(S_k)
where Sal(S_k) represents the final saliency value of superpixel S_k, Sal'_E(S_k) represents the normalized saliency value of superpixel S_k relative to the background information, Sal'_C(S_k) represents the normalized contrast-based saliency value of superpixel S_k, λ_Sal1 is a first weight coefficient, λ_Sal2 is a second weight coefficient, and λ_Sal1 + λ_Sal2 = 1 is satisfied; in this embodiment, λ_Sal1 = 0.3 and λ_Sal2 = 0.7 are selected.
2.7) After normalizing the final saliency value of each superpixel to the [0, 255] interval, gray-scale values are assigned to the corresponding superpixels based on the normalized final saliency values, giving the saliency map of the current real-time image.
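Because the saliency formulas of steps 2.4)–2.6) are reproduced above only through their surrounding definitions, the sketch below uses plausible forms that match the written description: the background term measures the distance of a superpixel's quantized color to the nearest edge dominant color, and the contrast term accumulates CIELab color differences to all other superpixels, attenuated by spatial distance through λ_pos. Treat the exact expressions as assumptions, not the patented formulas.

```python
import numpy as np

def saliency_values(quant_color, mean_lab, centers, counts, edge_colors,
                    lam_pos=0.04, lam_sal1=0.3, lam_sal2=0.7):
    n = len(quant_color)

    # Saliency relative to background: |c_k - L_E^m| minimized over the edge dominant colors.
    sal_e = np.array([min(abs(int(c) - int(e)) for e in edge_colors) if len(edge_colors) else 0.0
                      for c in quant_color])

    # Contrast-based saliency: color distance to every other superpixel,
    # weighted by its size and attenuated by spatial distance (assumed form).
    sal_c = np.zeros(n)
    for k in range(n):
        d_lab = np.linalg.norm(mean_lab - mean_lab[k], axis=1)
        d_pos = np.linalg.norm(centers - centers[k], axis=1)
        sal_c[k] = np.sum(counts * d_lab * np.exp(-lam_pos * d_pos))

    def norm01(x):
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

    # Step 2.6: linear fusion with lam_sal1 + lam_sal2 = 1; step 2.7: map to [0, 255].
    fused = lam_sal1 * norm01(sal_e) + lam_sal2 * norm01(sal_c)
    return (255 * norm01(fused)).astype(np.uint8)
```

Painting every superpixel of the label map with its gray value then yields the saliency map of step 2.7).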
3) Aiming at the color matching problem, calculating the similarity between each super pixel of the current real-time image and the reference image by using a fuzzy matching method based on the main color descriptor according to the main color descriptor of the reference image and the average color of each super pixel in the quantized HSV color space, and obtaining the similarity value of each super pixel of the current real-time image; wherein, the super pixel with the similarity value of 1 is a region with similar color to the reference image;
the calculation formula of the similarity value of each super pixel of the current real-time image in the step 3) is as follows:
where Sim(S_k) represents the similarity value of superpixel S_k; the formula involves the average color of superpixel S_k in the current real-time image and L_DC ∈ D_DC, where L_DC represents a main color in the main color descriptor of the reference image and D_DC represents the main color descriptor of the reference image; | | denotes the absolute value operation, min() denotes the minimum value operation, and th_Sim is the similarity threshold. When the minimum difference between the average color of S_k and all main colors in D_DC is less than or equal to the threshold, S_k is considered to belong to the color-matched target area. A binary image can be obtained from the superpixel similarities: a superpixel is displayed as white when its similarity value is 1 and as black when its similarity value is 0, and the white portion of the image is the color matching region. In this embodiment, th_Sim = 2 is taken.
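The fuzzy color matching of step 3) reduces to a thresholded nearest-main-color test on the quantized colors; a short sketch with th_Sim = 2, as in this embodiment, follows.

```python
import numpy as np

def color_similarity(quant_color, dominant_descriptor, th_sim=2):
    # Sim(S_k) = 1 when the quantized average color of superpixel S_k differs from the
    # nearest main color L_DC in D_DC by at most th_Sim, otherwise 0.
    main_colors = [L for L, _ in dominant_descriptor]
    sim = np.zeros(len(quant_color), dtype=np.uint8)
    for k, c in enumerate(quant_color):
        if min(abs(int(c) - int(L)) for L in main_colors) <= th_sim:
            sim[k] = 1
    return sim   # painting superpixels with 255 * Sim gives the white/black matching image
```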
4) Aiming at the fusion problem of the saliency map and the color matching region, according to the similarity value of each super pixel of the current real-time image and the adjacent super pixel set of each super pixel, the region fusion expansion method based on the similarity and the saliency value is utilized to expand the region of the saliency map of the current real-time image, and then a possible target region of the current real-time image is obtained;
The step 4) is specifically as follows:
4.1) Whether each superpixel of the current real-time image belongs to the target area is determined: if the similarity value of a superpixel is 1 and its final saliency value in the saliency map of the current real-time image is greater than the initial saliency value threshold (0.2 in this embodiment), the superpixel is taken as a target area to be determined; otherwise it is not a target area to be determined. All superpixels are traversed, and all target areas to be determined form the initial target area of the current real-time image, which can be expressed as:
where Sim(S_j) represents the similarity value of superpixel S_j and describes the similarity of superpixel S_j to the reference image, and Sal(S_j) represents the final saliency value of superpixel S_j and describes the saliency of superpixel S_j in the real-time image; a superpixel S_j that is considered similar to the target object and has a high saliency belongs to the target area being searched for, and th_Sal is the threshold that determines whether the final saliency value satisfies the requirement.
4.2) The initial target area generally cannot contain the entire target object and is therefore extended based on the saliency map. Target area expansion is performed on the adjacent superpixel sets of the superpixels of the initial target area: if the final saliency value of the current adjacent superpixel in the saliency map of the current real-time image is greater than the current saliency value threshold, the current adjacent superpixel belongs to the target area to be expanded; the adjacent superpixel sets of all superpixels of the initial target area are traversed, all target areas to be expanded form an expansion target area, and the current saliency value threshold is increased;
4.3) Target area expansion is performed on the adjacent superpixel sets of the superpixels of the current expansion target area: if the final saliency value of the current adjacent superpixel in the saliency map of the current real-time image is greater than the current saliency value threshold, the current adjacent superpixel belongs to the target area to be expanded; the adjacent superpixel sets of all superpixels of the current expansion target area are traversed, a new expansion target area is obtained, and the current saliency value threshold is increased;
The threshold calculation formula for increasing the current saliency value threshold in steps 4.2) and 4.3) is specifically:
where th_Sal denotes the initial saliency value threshold and th_Sal^t denotes the saliency value threshold after the t-th expansion.
4.4) Repeating step 4.3) until no new expansion target area is generated; the initial target area and the expansion target areas obtained in every expansion are fused to form the possible target area R'_T of the current real-time image, i.e., R'_T is the union of the initial target area and all expansion target areas.
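Steps 4.1)–4.4) amount to seeded region growing over the superpixel adjacency graph; the sketch below follows that logic. The threshold update rule is not reproduced above, so raising the threshold by a fixed factor after each expansion round is an assumed schedule.

```python
def grow_target_region(sim, sal_gray, neighbours, th_sal=0.2, growth=1.2):
    sal = sal_gray / 255.0                        # normalized final saliency values
    # 4.1: initial target area = similar (Sim = 1) and sufficiently salient superpixels.
    region = {k for k in range(len(sim)) if sim[k] == 1 and sal[k] > th_sal}
    frontier, th = set(region), th_sal
    # 4.2-4.4: repeatedly absorb salient neighbors and raise the threshold,
    # until no new expansion target area is produced.
    while frontier:
        new = {nb for k in frontier for nb in neighbours[k]
               if nb not in region and sal[nb] > th}
        region |= new
        frontier = new
        th = min(th * growth, 1.0)                # assumed threshold update after each expansion
    return region                                 # superpixel indices of the possible target area
```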
5) And carrying out preliminary screening on a possible target area of the current real-time image based on the area, and further carrying out regional screening on the preliminary screening target area by utilizing texture feature descriptors and Fourier descriptors of the reference image to obtain a final target area of the current real-time image.
The step 5) is specifically as follows:
5.1) The possible target area of the current real-time image consists of a plurality of connected regions, and the number of pixels in a connected region is used to represent the size of the region; the pixel number of each connected region is calculated, and the connected regions whose pixel number exceeds a preset proportion of the total number of pixels of the current real-time image are taken as the preliminary screening target area; in a specific implementation, the preset proportion is 20%.
5.2) Calculating the texture feature descriptor and the Fourier descriptor of each connected region of the preliminary screening target area with the methods of step 1); respectively calculating the first Euclidean distance between the texture feature descriptor of each connected region of the preliminary screening target area and the texture feature descriptor of the reference image, and the second Euclidean distance between the Fourier descriptor of each connected region of the preliminary screening target area and the Fourier descriptor of the reference image; and calculating the degree of difference between each connected region of the preliminary screening target area and the reference image based on the first and second Euclidean distances of each connected region of the preliminary screening target area, with the following calculation formula:
where Diff(R_q) represents the degree of difference between the connected region R_q of the preliminary screening target area and the reference image; the formula combines the first Euclidean distance and the second Euclidean distance of the connected region R_q of the preliminary screening target area, with λ_Diff representing a third weight coefficient that adjusts the relative weight of the two distances; in this embodiment, λ_Diff = 0.5;
5.3) The connected region with the smallest degree of difference is taken as the final target region of the current real-time image.
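The screening of step 5) can be sketched as follows, reusing the descriptor routines of step 1). The 20% area ratio and λ_Diff = 0.5 follow this embodiment; the convex combination of the two Euclidean distances is an assumed form of the difference degree, and the regions argument is a hypothetical structure holding each connected region's pixel count, grayscale crop and binary mask.

```python
import numpy as np

def select_target_region(regions, ref_texture, ref_fourier,
                         texture_fn, fourier_fn, total_pixels,
                         area_ratio=0.20, lam_diff=0.5):
    # 5.1: keep connected regions covering at least area_ratio of the live image.
    candidates = [r for r in regions if r['n_pixels'] >= area_ratio * total_pixels]

    best, best_diff = None, np.inf
    for r in candidates:
        # 5.2: Euclidean distances between the region descriptors and the reference descriptors.
        d_t = np.linalg.norm(texture_fn(r['gray']) - ref_texture)
        d_f = np.linalg.norm(fourier_fn(r['mask']) - ref_fourier)
        diff = lam_diff * d_t + (1 - lam_diff) * d_f   # assumed fusion of the two distances
        # 5.3: the connected region with the smallest degree of difference is the final target.
        if diff < best_diff:
            best, best_diff = r, diff
    return best
```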
The above is only a technical idea of the present invention, and the protection scope of the present invention is not limited by the above, and any modification made on the basis of the technical scheme according to the technical idea of the present invention falls within the protection scope of the claims of the present invention.

Claims (8)

1. The image target area detection method based on the visual attention mechanism is characterized by comprising the following steps of:
1) Calculating a main color descriptor, a texture feature descriptor and a Fourier descriptor of the reference image;
2) Performing superpixel segmentation on a real-time image by using a simple linear iterative clustering algorithm to obtain a plurality of superpixels of the current real-time image, performing regional description on the current real-time image information based on each superpixel to obtain the average color of each superpixel in a quantized HSV color space and an adjacent superpixel set of each superpixel, calculating the salient values of each superpixel by using a bottom-up data driving method, and forming a salient map of the current real-time image by the salient values of each superpixel;
3) Calculating the similarity between each super pixel of the current real-time image and the reference image by using a fuzzy matching method based on the main color descriptor according to the main color descriptor of the reference image and the average color of each super pixel in the quantized HSV color space, and obtaining the similarity value of each super pixel of the current real-time image;
4) According to the similarity value of each super pixel of the current real-time image and the adjacent super pixel set of each super pixel, performing region expansion on the salient image of the current real-time image by using a region fusion expansion method based on the similarity and the salient value, and then obtaining a possible target region of the current real-time image;
5) And carrying out preliminary screening on a possible target area of the current real-time image based on the area, and further carrying out regional screening on the preliminary screening target area by utilizing texture feature descriptors and Fourier descriptors of the reference image to obtain a final target area of the current real-time image.
2. The method for detecting an image target area based on a visual attention mechanism according to claim 1, wherein the step 1) specifically comprises:
1.1 Converting the colors of the reference image from the RGB color space to the HSV color space, and quantizing the HSV color space of the reference image to obtain color values of each pixel in the quantized HSV color space;
1.2) Calculating a color histogram of the reference image based on the color values of the pixels in the quantized HSV color space, calculating the percentage of each quantized color from the color histogram, and arranging the quantized colors in descending order of percentage; the first quantized color is taken as the first main color and recorded as (L_1, p_1), where L_1 represents the quantized value of the first main color and p_1 represents the percentage of the first main color in the color histogram; the n-th quantized color is then selected in turn, and when its percentage p_n in the color histogram satisfies p_n > 0.5·p_1, that quantized color is taken as the n-th main color and recorded as (L_n, p_n), where L_n represents the quantized value of the n-th main color; the selection stops when no remaining quantized color satisfies p_n > 0.5·p_1 or when the sum of the percentages of the selected main colors in the color histogram reaches the preset limit, where N represents the number of main colors; the N main colors form the main color descriptor D_DC;
1.3 A gray level co-occurrence matrix of the reference image is calculated, and corresponding angular second moment, inverse difference moment, entropy and contrast are calculated based on the gray level co-occurrence matrix and used as texture feature descriptors of the reference image;
1.4 The complex form of the reference image is subjected to discrete Fourier transformation and normalization to obtain a Fourier descriptor of the reference image.
3. The method for detecting an image target area based on a visual attention mechanism according to claim 1, wherein said step 2) specifically comprises:
2.1) Performing superpixel segmentation on the real-time image through a simple linear iterative clustering algorithm to obtain a plurality of superpixels of the real-time image and the corresponding label map, wherein the superpixels form a superpixel set;
2.2) Processing the superpixel set based on the label map to obtain the pixel set, color features and position information of each superpixel in the superpixel set; wherein the color features are the average value of each color component of each superpixel in the CIELab color space and the average color in the quantized HSV color space; the position information is the geometric center of each superpixel and the adjacent superpixel set of each superpixel;
2.3) Taking the superpixels in the superpixel set that contain edge pixels p_e of the real-time image as an edge superpixel set E, and calculating the image edge main color of the current real-time image based on the edge superpixel set E;
2.4) Calculating the contrast between the current superpixel and every other superpixel based on the pixel set of each superpixel, the average value of each color component of each superpixel in the CIELab color space, and the geometric center of each superpixel; then, based on the average color of the current superpixel in the quantized HSV color space, and combining the contrast between the current superpixel and every other superpixel with the image edge main color, calculating the saliency value of the current superpixel relative to the background information and its contrast-based saliency value through the following formulas:
where Sal_E(S_k) represents the saliency value of superpixel S_k relative to the background information, | | represents the absolute value operation, min represents the minimum value operation, and Sal_C(S_k) represents the contrast-based saliency value of superpixel S_k; the formulas further involve the average color of superpixel S_k in the quantized HSV color space, the m-th quantized edge color of the image edge dominant colors, the luminance and the averages of the first and second color channels of superpixels S_k and S_i in the CIELab color space, the coordinates of the geometric centers of superpixels S_k and S_i in the current real-time image, and λ_pos, a coefficient that adjusts how much the spatial distance affects the contrast-based saliency value; i is the superpixel index in the superpixel set;
2.5) Repeating step 2.4) and traversing the remaining superpixels to calculate the saliency values of the remaining superpixels relative to the background information and their contrast-based saliency values;
2.6 Respectively carrying out normalization processing on the salient values of the super pixels relative to the background information and the salient values based on the contrast ratio, and carrying out linear fusion to obtain the final salient values of the super pixels, wherein the calculation formula is as follows:
Sal(S_k) = λ_Sal1·Sal'_E(S_k) + λ_Sal2·Sal'_C(S_k)
where Sal(S_k) represents the final saliency value of superpixel S_k, Sal'_E(S_k) represents the normalized saliency value of superpixel S_k relative to the background information, Sal'_C(S_k) represents the normalized contrast-based saliency value of superpixel S_k, λ_Sal1 is a first weight coefficient, λ_Sal2 is a second weight coefficient, and λ_Sal1 + λ_Sal2 = 1 is satisfied;
2.7) After normalizing the final saliency value of each superpixel to the [0, 255] interval, gray-scale values are assigned to the corresponding superpixels based on the normalized final saliency values, giving the saliency map of the current real-time image.
4. A visual attention mechanism-based image target area detection method as recited in claim 3, wherein the calculating of the image edge dominant color of the current real-time image based on the edge super-pixel set E in the step 2.3) specifically includes:
s1: calculating a color histogram of the edge superpixel set E based on the quantized HSV color space;
S2: the quantized edge colors L_E in the color histogram of the edge superpixel set E satisfy L_E ∈ [0, 71]; the sum of the percentages of the neighborhood colors {L_E−1, L_E, L_E+1} of the current quantized edge color L_E in the color histogram of the edge superpixel set E is taken as the percentage of the current quantized edge color L_E in the color histogram of the edge superpixel set E;
S3: repeating step S2 and traversing all quantized edge colors L_E to calculate their percentages in the color histogram of the edge superpixel set E;
S4: deleting from the color histogram the quantized edge colors that are identical to main colors in the main color descriptor, and taking the quantized edge colors whose percentage computed in S3 in the color histogram of the edge superpixel set E is greater than or equal to 20% as the image edge main color of the current real-time image.
5. The method for detecting an image target area based on a visual attention mechanism according to claim 1, wherein the calculation formula of the similarity value of each super pixel of the current real-time image in the step 3) is as follows:
where Sim(S_k) represents the similarity value of superpixel S_k; the formula involves the average color of superpixel S_k in the current real-time image and L_DC ∈ D_DC, where L_DC represents a main color in the main color descriptor of the reference image and D_DC represents the main color descriptor of the reference image; | | denotes the absolute value operation, min() denotes the minimum value operation, and th_Sim is the similarity threshold.
6. The method for detecting an image target area based on a visual attention mechanism according to claim 1, wherein said step 4) specifically comprises:
4.1) Judging whether each superpixel of the current real-time image belongs to the target area: if the similarity value of a superpixel is 1 and its final saliency value in the saliency map of the current real-time image is greater than the initial saliency value threshold, the superpixel is taken as a target area to be determined; otherwise it is not a target area to be determined. All superpixels are traversed, and all target areas to be determined form the initial target area of the current real-time image;
4.2) Performing target area expansion on the adjacent superpixel sets of the superpixels of the initial target area: if the final saliency value of the current adjacent superpixel in the saliency map of the current real-time image is greater than the current saliency value threshold, the current adjacent superpixel belongs to the target area to be expanded; the adjacent superpixel sets of all superpixels of the initial target area are traversed, all target areas to be expanded form an expansion target area, and the current saliency value threshold is increased;
4.3) Performing target area expansion on the adjacent superpixel sets of the superpixels of the current expansion target area: if the final saliency value of the current adjacent superpixel in the saliency map of the current real-time image is greater than the current saliency value threshold, the current adjacent superpixel belongs to the target area to be expanded; the adjacent superpixel sets of all superpixels of the current expansion target area are traversed, a new expansion target area is obtained, and the current saliency value threshold is increased;
4.4 Repeating the step 4.3) until no new expansion target area is generated, and fusing the initial target area and all expansion target areas to form a possible target area of the current real-time image.
7. The method for detecting an image target area based on a visual attention mechanism according to claim 6, wherein the threshold calculation formulas for increasing the current saliency value threshold in the steps 4.2) and 4.3) are specifically:
where th_Sal denotes the initial saliency value threshold and th_Sal^t denotes the saliency value threshold after the t-th expansion.
8. The method for detecting an image target area based on a visual attention mechanism according to claim 1, wherein said step 5) specifically comprises:
5.1 The possible target area of the current real-time image consists of a plurality of connected areas, the pixel number of each connected area is calculated, and the connected areas with the pixel number more than the preset proportion of the total number of the pixels of the current real-time image are taken as primary screening target areas;
5.2 Calculating texture feature descriptors and Fourier descriptors of all connected areas of the preliminary screening target area, respectively calculating a first Euclidean distance between the texture feature descriptors of all connected areas of the preliminary screening target area and the texture feature descriptors of the reference image and a second Euclidean distance between the Fourier descriptors of all connected areas of the preliminary screening target area and the Fourier descriptors of the reference image, and calculating the difference degree between all connected areas of the preliminary screening target area and the reference image based on the first Euclidean distance and the second Euclidean distance of all connected areas of the preliminary screening target area, wherein the calculation formula is as follows:
where Diff(R_q) represents the degree of difference between the connected region R_q of the preliminary screening target area and the reference image; the formula combines the first Euclidean distance and the second Euclidean distance of the connected region R_q of the preliminary screening target area, with λ_Diff representing a third weight coefficient;
5.3 The connected region with the smallest difference degree is used as the final target region of the current real-time image.
CN202210021568.6A 2022-01-10 2022-01-10 Image target area detection method based on visual attention mechanism Active CN114359323B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210021568.6A CN114359323B (en) 2022-01-10 2022-01-10 Image target area detection method based on visual attention mechanism


Publications (2)

Publication Number Publication Date
CN114359323A CN114359323A (en) 2022-04-15
CN114359323B (en) 2024-07-05

Family

ID=81109197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210021568.6A Active CN114359323B (en) 2022-01-10 2022-01-10 Image target area detection method based on visual attention mechanism

Country Status (1)

Country Link
CN (1) CN114359323B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115439688B (en) * 2022-09-01 2023-06-16 哈尔滨工业大学 Weak supervision object detection method based on surrounding area sensing and association
CN116645368B (en) * 2023-07-27 2023-10-03 青岛伟东包装有限公司 Online visual detection method for edge curl of casting film
CN117173175B (en) * 2023-11-02 2024-02-09 湖南格尔智慧科技有限公司 Image similarity detection method based on super pixels

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107464256A (en) * 2017-07-05 2017-12-12 河海大学 A kind of target detection differentiates the correlating method of amendment with possibility
WO2020133170A1 (en) * 2018-12-28 2020-07-02 深圳市大疆创新科技有限公司 Image processing method and apparatus


Also Published As

Publication number Publication date
CN114359323A (en) 2022-04-15


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: 316021 Zhoushan campus of Zhejiang University, No.1 Zheda Road, Dinghai District, Zhoushan City, Zhejiang Province
Applicant after: ZHEJIANG University
Country or region after: China
Address before: 310058 Yuhang Tang Road, Xihu District, Hangzhou, Zhejiang 866
Applicant before: ZHEJIANG University
Country or region before: China
GR01 Patent grant