CN114359323A - Image target area detection method based on visual attention mechanism

Image target area detection method based on visual attention mechanism

Info

Publication number
CN114359323A
CN114359323A
Authority
CN
China
Prior art keywords
color
super
pixel
image
value
Prior art date
Legal status
Pending
Application number
CN202210021568.6A
Other languages
Chinese (zh)
Inventor
黄方昊
江佳诚
杨霄
陈正
聂勇
唐建中
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202210021568.6A
Publication of CN114359323A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an image target area detection method based on a visual attention mechanism. The method comprises the following steps: first, calculating the bottom-layer visual feature descriptors of a reference image; second, performing superpixel segmentation on the real-time image with the SLIC method, describing the current real-time image information region by region, calculating the saliency value of each superpixel with a bottom-up data-driven method, and forming the saliency map of the real-time image from the saliency values of the superpixels; then, calculating the similarity between each superpixel of the real-time image and the reference image with a fuzzy matching method; then, expanding the saliency map of the real-time image with a region fusion and expansion method based on similarity and saliency values to obtain the possible target region of the real-time image; and finally, screening the possible target region of the real-time image to obtain the final target region of the current real-time image. The invention improves the accuracy of target area detection in complex environments while ensuring real-time operation on low-computing-power devices.

Description

Image target area detection method based on visual attention mechanism
Technical Field
The invention relates to methods for detecting the target area of an image in the field of computer vision, and in particular to an image target area detection method based on a visual attention mechanism.
Background
Target area detection is an important image processing technology in computer vision. Obtaining the target area through detection allows computing resources to be concentrated, reduces the interference of irrelevant information, and improves the efficiency and accuracy of image processing. Research on target area detection algorithms has long been a popular direction in the field of computer vision.
Early target region detection algorithms were built on handcrafted features: the information of an image region is described by a manually designed feature description method and detection relies on simple schemes such as sliding-window detectors, for example the VJ detector proposed by P. Viola and M. Jones and the HOG detector proposed by N. Dalal and B. Triggs. On this basis, P. Felzenszwalb et al. proposed a detection model based on deformable parts, which uses a mixture model to detect and handle objects that may exhibit significant appearance changes and achieves good results among traditional target region detection algorithms.
With the development of convolutional neural networks and deep learning, target region detection algorithms based on deep learning have entered a new stage. According to the processing flow they are further divided into two-stage and one-stage detection methods. Two-stage methods, which first generate candidate boxes and then detect them with a trained model, include R-CNN and Fast R-CNN proposed by R. Girshick et al., SPPNet proposed by K. He et al., and the Faster R-CNN detector proposed by S. Ren et al. One-stage methods, which apply the neural network to the whole picture for detection, include YOLO proposed by J. Redmon et al. and SSD proposed by W. Liu et al.
Target area detection can thus be realized with methods based on either traditional features or neural networks, but both have drawbacks: traditional target area detection algorithms require relatively complex feature description schemes and their detection accuracy is low, while neural-network-based detection algorithms all require large datasets and long training times, demand high computing power at detection time, and are therefore ill-suited to low-computing-power mobile devices such as augmented reality devices.
Disclosure of Invention
To address the difficulty of applying current target area detection algorithms on low-computing-power mobile devices, the invention provides an image target area detection method based on a visual attention mechanism.
In order to achieve the purpose, the technical scheme of the invention comprises the following specific contents:
the invention comprises the following steps:
1) calculating a main color descriptor, a texture feature descriptor and a Fourier descriptor of a reference image;
2) performing superpixel segmentation on a real-time image by using a simple linear iterative clustering algorithm to obtain a plurality of superpixels of the current real-time image, performing regional description on current real-time image information based on each superpixel to obtain the average color of each superpixel in a quantized HSV color space and an adjacent superpixel set of each superpixel, calculating the significant value of each superpixel by using a bottom-up data driving method, and forming a significant image of the current real-time image by using the significant value of each superpixel;
3) calculating the similarity between each super pixel of the current real-time image and the reference image by using a fuzzy matching method based on the main color descriptor according to the average color of the main color descriptor and each super pixel of the reference image in the quantized HSV color space, and obtaining the similarity value of each super pixel of the current real-time image;
4) according to the similarity value of each super pixel of the current real-time image and an adjacent super pixel set of each super pixel, carrying out region expansion on a saliency map of the current real-time image by using a region fusion expansion method based on the similarity and the saliency value to obtain a possible target region of the current real-time image;
5) and carrying out preliminary screening on a possible target region of the current real-time image based on the region area, and further carrying out regional screening on the preliminary screening target region by utilizing a textural feature descriptor and a Fourier descriptor of a reference image to obtain a final target region of the current real-time image.
The step 1) is specifically as follows:
1.1) converting the color of the reference image from an RGB color space to an HSV color space, and quantizing the HSV color space of the reference image to obtain the color value of each pixel in the reference image in the quantized HSV color space;
1.2) calculating a color histogram of the reference image based on the color values of all pixels of the reference image in the quantized HSV color space, computing the percentage of each quantized color from the histogram and sorting the quantized colors in descending order; the first quantized color is taken as the first main color and recorded as (L_1, p_1), where L_1 is the first main color quantization value and p_1 is the percentage of the first main color in the color histogram; the nth quantized color is then selected in turn, and if its percentage p_n in the color histogram satisfies p_n > 0.5·p_1, it is taken as the nth main color and recorded as (L_n, p_n), where L_n is the nth main color quantization value; the selection stops when p_n > 0.5·p_1 is no longer satisfied or when the cumulative percentage of the selected main colors in the color histogram reaches a preset upper limit. Here N denotes the number of main colors, and the selected main colors together constitute the main color descriptor D_DC;
1.3) calculating a gray level co-occurrence matrix of the reference image, calculating corresponding angle second moment, inverse difference moment, entropy and contrast based on the gray level co-occurrence matrix and using the corresponding angle second moment, inverse difference moment, entropy and contrast as texture feature descriptors of the reference image;
1.4) obtaining a Fourier descriptor of the reference image after discrete Fourier transform and normalization of the complex form of the reference image.
The step 2) is specifically as follows:
2.1) performing superpixel segmentation on the real-time image through a simple linear iterative clustering algorithm to obtain a plurality of superpixels of the real-time image and corresponding label images, and forming a superpixel set by the plurality of superpixels;
2.2) processing the super-pixel set based on the label graph to obtain the pixel set, the color characteristic and the position information of each super-pixel in the super-pixel set; wherein the color features are an average value of each color component of each super-pixel in a CIELab color space and an average color in a quantized HSV color space; the position information is a geometric center of each super pixel and an adjacent super pixel set of each super pixel;
2.3) taking the superpixels in the superpixel set that contain edge pixels p_e of the real-time image as an edge superpixel set E, and calculating the image edge main colors of the current real-time image based on the edge superpixel set E;
2.4) calculating the contrast between the current superpixel and every other superpixel based on the pixel set of each superpixel, the average values of its color components in the CIELab color space and its geometric center; then, based on the average color of the current superpixel in the quantized HSV color space, and combining the contrast between the current superpixel and every other superpixel with the image edge main colors, calculating the saliency value of the current superpixel with respect to the background information and its contrast-based saliency value:

Sal_E(S_k) = min_m | L̄(S_k) − L_E^m |

where Sal_E(S_k) is the saliency value of superpixel S_k with respect to the background information, | · | denotes the absolute value, min denotes the minimum over the image edge main colors, L̄(S_k) is the average color of S_k in the quantized HSV color space and L_E^m is the mth quantized edge color among the image edge main colors; the contrast-based saliency value Sal_C(S_k) is obtained by accumulating, over all other superpixels S_i, the difference between the CIELab averages (l̄_k, ā_k, b̄_k) of S_k and (l̄_i, ā_i, b̄_i) of S_i (lightness and the first and second color channels), attenuated according to the spatial distance between the geometric centers (x_k, y_k) of S_k and (x_i, y_i) of S_i in the current real-time image; λ_pos is the coefficient that adjusts how strongly the spatial distance affects the contrast-based saliency value, and i is the superpixel index in the superpixel set;
2.5) repeating the step 2.4), traversing the residual superpixels, and calculating and obtaining significant values of the residual superpixels relative to background information and significant values based on contrast;
2.6) normalizing the saliency value with respect to the background information and the contrast-based saliency value of each superpixel and linearly fusing them to obtain the final saliency value of each superpixel:

Sal(S_k) = λ_Sal1·Sal'_E(S_k) + λ_Sal2·Sal'_C(S_k)

where Sal(S_k) is the final saliency value of superpixel S_k, Sal'_E(S_k) is its normalized saliency value with respect to the background information, Sal'_C(S_k) is its normalized contrast-based saliency value, λ_Sal1 is a first weight coefficient and λ_Sal2 is a second weight coefficient satisfying λ_Sal1 + λ_Sal2 = 1;
2.7) normalizing the final saliency value of each superpixel to the interval [0, 255] and assigning it as the gray value of the corresponding superpixel to obtain the saliency map of the current real-time image.
The step 2.3) of calculating the image edge dominant color of the current real-time image based on the edge superpixel set E specifically comprises the following steps:
s1: calculating a color histogram of the edge superpixel set E based on the quantized HSV color space;
s2: for each quantized edge color L_E in the color histogram of the edge superpixel set E, with L_E ∈ [0, 71], taking the sum of the percentages of its neighborhood colors {L_E − 1, L_E, L_E + 1} in that histogram as the percentage of the current quantized edge color L_E in the color histogram of the edge superpixel set E;
s3: repeating step S2 until the percentages of all quantized edge colors L_E in the color histogram of the edge superpixel set E have been calculated;
s4: deleting from the histogram the quantized edge colors that are identical to a main color in the main color descriptor, and taking the quantized edge colors whose percentage in the color histogram of the edge superpixel set E obtained in step S3 is greater than or equal to 20% as the image edge main colors of the current real-time image.
The similarity value of each superpixel of the current real-time image in step 3) is calculated as:

Sim(S_k) = 1 if min_{L_DC ∈ D_DC} | L̄(S_k) − L_DC | ≤ th_Sim, and Sim(S_k) = 0 otherwise

where Sim(S_k) is the similarity value of superpixel S_k, L̄(S_k) is the average color of superpixel S_k of the current real-time image in the quantized HSV color space, L_DC is a main color in the main color descriptor D_DC of the reference image, | · | denotes the absolute value, min denotes the minimum and th_Sim is the similarity threshold.
The step 4) is specifically as follows:
4.1) judging whether each super pixel of the current real-time image is a target area, if the similarity value of each super pixel is 1 and the final significant value of the current super pixel in the significant image of the current real-time image is larger than the initial significant value threshold, taking the current super pixel as an undetermined target area, otherwise, not determining the undetermined target area; traversing all super pixels, and forming an initial target area of the current real-time image by all the target areas to be determined;
4.2) carrying out target area expansion on the adjacent superpixel set of each superpixel of the initial target area, if the final significant value of the current adjacent superpixel in the significant image of the current real-time image is greater than the current significant value threshold value, the current adjacent superpixel belongs to the target area to be expanded, traversing the adjacent superpixel set of each superpixel of the initial target area, forming an expansion target area by all target areas to be expanded, and improving the current significant value threshold value;
4.3) carrying out target area expansion on the adjacent superpixel set of each superpixel of the current expanded target area, if the final significant value of the current adjacent superpixel in the significant image of the current real-time image is greater than the current significant value threshold value, the current adjacent superpixel belongs to the target area to be expanded, traversing the adjacent superpixel set of each superpixel of the initial target area, obtaining a new expanded target area and improving the current significant value threshold value;
4.4) repeating the step 4.3) until no new expansion target area is generated, and fusing the initial target area and all the expansion target areas to form a possible target area of the current real-time image.
In steps 4.2) and 4.3) the current saliency threshold is raised after each expansion: th_Sal denotes the initial saliency threshold and th_Sal^(t) denotes the saliency threshold after the tth expansion, which increases with the number of expansions t.
The step 5) is specifically as follows:
5.1) the possible target area of the current real-time image consists of a plurality of connected areas, the pixel number of each connected area is calculated, and the connected areas with the pixel number accounting for more than the preset proportion of the total pixel number of the current real-time image are taken as primary screening target areas;
5.2) calculating the texture feature descriptor and the Fourier descriptor of each connected region of the preliminary screening target region, computing the first Euclidean distance between the texture feature descriptor of each connected region and that of the reference image and the second Euclidean distance between the Fourier descriptor of each connected region and that of the reference image, and calculating the difference between each connected region of the preliminary screening target region and the reference image from these two distances:

Diff(R_q) = λ_Diff·d_T(R_q) + (1 − λ_Diff)·d_F(R_q)

where Diff(R_q) is the difference between connected region R_q of the preliminary screening target region and the reference image, d_T(R_q) is the first Euclidean distance of connected region R_q, d_F(R_q) is its second Euclidean distance and λ_Diff is a third weight coefficient;
and 5.3) taking the connected region with the minimum difference as the final target region of the current real-time image.
Compared with the prior art, the invention has the following beneficial effects:
1. the method for simulating the human visual system to search the object is based on the visual attention mechanism, and searches possible target areas from two angles of bottom-up (significance detection) and top-down (color matching based on the main color descriptor), so that the accuracy of target area detection in a complex environment is improved.
2. The method performs target area fusion, expansion and screening on the basis of significance detection and color matching based on the main color descriptor, and then performs further screening by using the texture feature descriptor and the Fourier descriptor, thereby improving the detection accuracy.
3. The invention uses the bottom layer visual characteristic descriptor of the image to detect the target area, improves the detection and calculation efficiency of the target area and ensures the real-time performance on low-computing-power equipment.
Drawings
Fig. 1 is a flow chart of a target area detection algorithm based on a visual attention mechanism proposed by the present invention.
Fig. 2 is an example flow and result diagram of the target area detection algorithm based on visual attention mechanism proposed in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention will now be further described with reference to the following examples and drawings:
the implementation technical scheme of the invention is as follows:
as shown in fig. 1 and 2, the present invention includes the following steps:
1) calculating the main color descriptor, texture feature descriptor and Fourier descriptor of the reference image to describe its color, texture and shape respectively; together these form the bottom-layer visual feature descriptor. The reference image is an image of the target object to be located in the real-time image.
Extracting color, texture and shape features in a reference image based on pixels, wherein the step 1) specifically comprises the following steps:
for color features, a main color selection method based on a quantized HSV color space is provided, and the main color features of a reference image are described more accurately.
1.1) converting the color of the reference image from the RGB color space to the HSV color space and quantizing the HSV color space of the reference image to obtain the color value of each pixel of the reference image in the quantized HSV color space: the hue component h is non-uniformly quantized into 8 levels H ∈ {0, 1, …, 7}, and the saturation and value components s and v are each quantized into 3 levels S, V ∈ {0, 1, 2}; the quantized color value of each pixel is then

L = 9H + 3S + V

where h, s, v are the hue, saturation and value components of the pixel in the HSV color space, H, S, V are the corresponding quantized components, and L ∈ [0, 71] is the color value of the current pixel in the quantized HSV color space.
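A minimal sketch of this quantization in Python is given below. The patent fixes only the level counts and the combination rule L = 9H + 3S + V; the specific bin edges used here (a non-uniform hue split and saturation/value cut points at 0.2 and 0.7) are illustrative assumptions.

```python
import numpy as np
import cv2

def quantize_hsv(bgr_image):
    """Quantize an image into the 72-level HSV color space (L = 9H + 3S + V).

    The 8/3/3 level split follows L = 9H + 3S + V with L in [0, 71];
    the exact bin edges below are illustrative assumptions.
    """
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    h = hsv[..., 0].astype(np.float32) * 2.0      # OpenCV hue is [0, 180); rescale to degrees
    s = hsv[..., 1].astype(np.float32) / 255.0
    v = hsv[..., 2].astype(np.float32) / 255.0

    # Hue: 8 sectors (assumed, non-uniform edges; the last sector wraps around to red)
    h_edges = [20, 40, 75, 155, 190, 270, 295, 316]
    H = np.digitize(h, h_edges) % 8               # values in {0, ..., 7}

    # Saturation and value: 3 levels each (assumed cut points 0.2 and 0.7)
    S = np.digitize(s, [0.2, 0.7])                # values in {0, 1, 2}
    V = np.digitize(v, [0.2, 0.7])

    return 9 * H + 3 * S + V                      # per-pixel color value L in [0, 71]
```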
1.2) calculating a color histogram of the reference image based on the color values of all pixels of the reference image in the quantized HSV color space, computing the percentage of each quantized color from the histogram and sorting the quantized colors in descending order; the first quantized color (i.e. the quantized color with the highest percentage) is taken as the first main color and recorded as (L_1, p_1), where L_1 is the first main color quantization value and p_1 is the percentage of the first main color in the color histogram; the nth quantized color is then selected in turn, and if its percentage p_n in the color histogram satisfies p_n > 0.5·p_1, it is taken as the nth main color and recorded as (L_n, p_n), where L_n is the nth main color quantization value; the selection stops when p_n > 0.5·p_1 is no longer satisfied or when the cumulative percentage of the selected main colors in the color histogram reaches a preset upper limit. Here N denotes the number of main colors, and the main colors constitute the main color descriptor

D_DC = {(L_n, p_n), n = 1, 2, …, N}
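A sketch of this dominant-color selection, assuming the quantize_hsv helper above and a cumulative-percentage cap (0.9 here, an assumed value since the patent leaves the cap to the formula image):

```python
import numpy as np

def dominant_color_descriptor(L_ref, cum_cap=0.9):
    """Build the main color descriptor D_DC = [(L_n, p_n), ...] from the
    quantized color values L_ref of the reference image (ints in [0, 71])."""
    hist = np.bincount(L_ref.ravel(), minlength=72).astype(np.float64)
    perc = hist / hist.sum()                      # percentage of each quantized color
    order = np.argsort(perc)[::-1]                # descending by percentage

    d_dc = [(int(order[0]), float(perc[order[0]]))]   # first main color (L_1, p_1)
    p1 = perc[order[0]]
    total = p1
    for idx in order[1:]:
        if perc[idx] <= 0.5 * p1 or total >= cum_cap:
            break                                 # stop: p_n > 0.5*p_1 violated or cap reached
        d_dc.append((int(idx), float(perc[idx])))
        total += perc[idx]
    return d_dc
```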
1.3) calculating a gray level co-occurrence matrix of the reference image, calculating corresponding angle second moment, inverse difference moment, entropy and contrast based on the gray level co-occurrence matrix and using the corresponding angle second moment, inverse difference moment, entropy and contrast as texture feature descriptors of the reference image;
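A sketch of the texture feature descriptor using a gray-level co-occurrence matrix; scikit-image provides the angular second moment ('ASM'), inverse difference moment ('homogeneity') and contrast, while entropy is computed directly from the matrix. The distance and angle settings are assumptions.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def texture_descriptor(gray_u8):
    """GLCM-based texture descriptor: [ASM, inverse difference moment, entropy, contrast]."""
    glcm = graycomatrix(gray_u8, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    asm = graycoprops(glcm, 'ASM').mean()            # angular second moment
    idm = graycoprops(glcm, 'homogeneity').mean()    # inverse difference moment
    contrast = graycoprops(glcm, 'contrast').mean()

    ent = 0.0
    for a in range(glcm.shape[-1]):                  # entropy, averaged over angles
        p = glcm[..., 0, a]
        nz = p[p > 0]
        ent += -np.sum(nz * np.log2(nz))
    entropy = ent / glcm.shape[-1]
    return np.array([asm, idm, entropy, contrast])
```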
1.4) obtaining a Fourier descriptor of the reference image after discrete Fourier transform and normalization of the complex form of the reference image.
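A sketch of the Fourier descriptor, assuming that the "complex form" refers to the object contour sampled as complex numbers x + jy; dropping the DC term removes the translation component and dividing by the first non-DC coefficient magnitude normalizes for scale. The contour extraction and the number of retained coefficients are assumptions.

```python
import numpy as np
import cv2

def fourier_descriptor(mask_u8, n_coeffs=32):
    """Fourier descriptor of the largest contour in a binary mask (uint8, 0/255)."""
    contours, _ = cv2.findContours(mask_u8, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    contour = max(contours, key=cv2.contourArea).squeeze(1)    # (N, 2) boundary points
    z = contour[:, 0].astype(np.float64) + 1j * contour[:, 1]  # complex form x + jy
    spectrum = np.fft.fft(z)
    coeffs = np.abs(spectrum[1:n_coeffs + 1])                  # drop DC (translation)
    return coeffs / (np.abs(spectrum[1]) + 1e-12)              # scale normalization
```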
2) Performing superpixel segmentation on a real-time image by using a Simple Linear Iterative Clustering (SLIC) algorithm to obtain a plurality of superpixels of the current real-time image, performing regional description on current real-time image information based on each superpixel to obtain the average color of each superpixel in a quantized HSV color space and an adjacent superpixel set of each superpixel, calculating the significant value of each superpixel by using a bottom-up data driving method, and forming a significant image of the current real-time image by using the significant value of each superpixel;
the step 2) is specifically as follows:
the bottom-up data driving method extracts bottom visual features of an image, such as color, texture, shape and the like, calculates contrast between image pixels or regions based on the features to obtain a saliency map of corresponding features, and fuses the saliency maps of the features to obtain a final saliency region.
2.1) performing superpixel segmentation on the real-time image with the simple linear iterative clustering algorithm to obtain a plurality of superpixels of the real-time image and the corresponding label map; the superpixels form a superpixel set

S = {S_i, i = 1, 2, …, K}

where K is the total number of superpixels (K = 300 in this embodiment), S_i is the ith superpixel in the superpixel set S and i is the superpixel index;
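A sketch covering step 2.1) and the regional description of step 2.2) below, using scikit-image's SLIC implementation; the 4-neighbour adjacency computation and the compactness value are illustrative assumptions.

```python
import numpy as np
from skimage.segmentation import slic
from skimage.color import rgb2lab

def segment_superpixels(rgb_image, n_segments=300):
    """SLIC superpixel segmentation; returns the label map plus per-superpixel
    CIELab mean color, geometric center and adjacency sets (steps 2.1-2.2)."""
    labels = slic(rgb_image, n_segments=n_segments, compactness=10, start_label=0)
    lab = rgb2lab(rgb_image)
    n = labels.max() + 1

    mean_lab = np.zeros((n, 3))
    centers = np.zeros((n, 2))
    for k in range(n):
        ys, xs = np.nonzero(labels == k)
        mean_lab[k] = lab[ys, xs].mean(axis=0)        # average l, a, b of superpixel k
        centers[k] = (xs.mean(), ys.mean())           # geometric center (x, y)

    # Adjacent superpixel sets: labels that touch horizontally or vertically
    adjacency = [set() for _ in range(n)]
    for a, b in zip(labels[:, :-1].ravel(), labels[:, 1:].ravel()):
        if a != b:
            adjacency[a].add(b); adjacency[b].add(a)
    for a, b in zip(labels[:-1, :].ravel(), labels[1:, :].ravel()):
        if a != b:
            adjacency[a].add(b); adjacency[b].add(a)
    return labels, mean_lab, centers, adjacency
```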
2.2) processing the super-pixel set based on the label graph to obtain the pixel set, the color characteristic and the position information of each super-pixel in the super-pixel set; wherein the color features are an average value of each color component of each super-pixel in a CIELab color space and an average color in a quantized HSV color space; the position information is a geometric center of each super pixel and an adjacent super pixel set of each super pixel;
2.3) taking the superpixels in the superpixel set that contain edge pixels p_e of the real-time image as the edge superpixel set E:

E = {S_u | p_e ∈ P_u}

where S_u is the uth superpixel in the edge superpixel set E and P_u is the pixel set of the uth superpixel in the edge superpixel set E. The image edge main colors of the current real-time image are then calculated based on the edge superpixel set E;
calculating the image edge main color of the current real-time image based on the edge super-pixel set E in the step 2.3), specifically:
s1: calculating a color histogram of the edge superpixel set E based on the quantized HSV color space;
s2: for each quantized edge color L_E in the color histogram of the edge superpixel set E, with L_E ∈ [0, 71], taking the sum of the percentages of its neighborhood colors {L_E − 1, L_E, L_E + 1} in that histogram as the percentage of the current quantized edge color L_E in the color histogram of the edge superpixel set E;
s3: repeating step S2 until the percentages of all quantized edge colors L_E in the color histogram of the edge superpixel set E have been calculated;
s4: deleting from the histogram the quantized edge colors that are identical to a main color in the main color descriptor, and taking the quantized edge colors whose percentage in the color histogram of the edge superpixel set E obtained in step S3 is greater than or equal to 20% as the image edge main colors D_EDC of the current real-time image; the image edge main colors describe the background information of the real-time image:

D_EDC = {L_E^m, m = 1, 2, …, M}

where M is the number of image edge main colors and satisfies M ∈ {0, 1, 2, 3, 4, 5}, and L_E^m is the mth quantized edge color among the image edge main colors; each retained quantized edge color and the colors close to it must not be a main color of the reference image, i.e. {L_E^m − 1, L_E^m, L_E^m + 1} contains no quantization value of the main color descriptor D_DC.
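A sketch of steps S1-S4, assuming the per-pixel quantized color values from the quantization above and the SLIC label map; superpixels touching the image border are taken as the edge set, and the neighborhood constraint follows the description just given.

```python
import numpy as np

def edge_dominant_colors(L_quant, labels, d_dc, min_perc=0.20):
    """Image edge main colors D_EDC (steps S1-S4): histogram of border superpixels,
    neighborhood-summed percentages, 20% threshold, and removal of colors that
    coincide with (or neighbor) the reference main colors d_dc."""
    # S1: superpixels containing image-edge pixels form the edge superpixel set E
    border_labels = np.unique(np.concatenate([labels[0], labels[-1],
                                              labels[:, 0], labels[:, -1]]))
    edge_mask = np.isin(labels, border_labels)
    hist = np.bincount(L_quant[edge_mask], minlength=72).astype(np.float64)
    perc = hist / hist.sum()

    ref_colors = {L for L, _ in d_dc}
    dominant = []
    for L_E in range(72):
        # S2-S3: percentage of the neighborhood {L_E - 1, L_E, L_E + 1}
        neigh = perc[max(L_E - 1, 0): min(L_E + 1, 71) + 1].sum()
        # S4: keep colors covering >= 20% whose neighborhood avoids the reference main colors
        if neigh >= min_perc and L_E not in ref_colors \
                and not ({L_E - 1, L_E + 1} & ref_colors):
            dominant.append(L_E)
    return dominant
```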
2.4) calculating the contrast between the current superpixel and every other superpixel based on the pixel set of each superpixel, the average values of its color components in the CIELab color space and its geometric center; then, based on the average color of the current superpixel in the quantized HSV color space, and combining the contrast between the current superpixel and every other superpixel with the image edge main colors, calculating the saliency value of the current superpixel with respect to the background information and its contrast-based saliency value:

Sal_E(S_k) = min_m | L̄(S_k) − L_E^m |

where Sal_E(S_k) is the saliency value of superpixel S_k with respect to the background information, | · | denotes the absolute value, min denotes the minimum over the image edge main colors, L̄(S_k) is the average color of S_k in the quantized HSV color space and L_E^m is the mth quantized edge color among the image edge main colors; the contrast-based saliency value Sal_C(S_k) is obtained by accumulating, over all other superpixels S_i, the difference between the CIELab averages (l̄_k, ā_k, b̄_k) of S_k and (l̄_i, ā_i, b̄_i) of S_i, where the first color channel runs from dark green (low channel values) through gray to bright pinkish red (high channel values) and the second from bright blue (low channel values) through gray to yellow (high channel values), attenuated according to the spatial distance between the geometric centers (x_k, y_k) of S_k and (x_i, y_i) of S_i in the current real-time image; λ_pos is the coefficient that adjusts how strongly the spatial distance affects the contrast-based saliency value, and i is the superpixel index in the superpixel set;
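A sketch of step 2.4), using the reconstructed forms above: Sal_E as the minimum absolute difference to the edge main colors, and Sal_C as a sum of CIELab color differences attenuated by spatial distance. The exponential attenuation form and the λ_pos value are assumptions.

```python
import numpy as np

def saliency_values(mean_hsv_L, mean_lab, centers, edge_colors, lam_pos=0.25):
    """Per-superpixel background saliency Sal_E and contrast saliency Sal_C (step 2.4).

    mean_hsv_L : (K,) average quantized HSV color value of each superpixel
    mean_lab   : (K, 3) average CIELab color of each superpixel
    centers    : (K, 2) geometric centers, normalized to [0, 1]
    edge_colors: list of image edge main colors
    """
    mean_hsv_L = np.asarray(mean_hsv_L, dtype=np.float64)
    K = len(mean_hsv_L)
    edge = np.asarray(edge_colors, dtype=np.float64)
    # Sal_E: minimum absolute difference to any image edge main color
    sal_e = (np.abs(mean_hsv_L[:, None] - edge[None, :]).min(axis=1)
             if edge.size else np.zeros(K))

    # Sal_C: CIELab contrast to all other superpixels, weighted by spatial proximity
    color_d = np.linalg.norm(mean_lab[:, None, :] - mean_lab[None, :, :], axis=-1)
    spatial_d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    weight = np.exp(-spatial_d / lam_pos)          # assumed attenuation form
    np.fill_diagonal(weight, 0.0)
    sal_c = (color_d * weight).sum(axis=1)
    return sal_e, sal_c
```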
2.5) repeating the step 2.4), traversing the residual superpixels, and calculating and obtaining significant values of the residual superpixels relative to background information and significant values based on contrast;
2.6) normalizing the saliency value with respect to the background information and the contrast-based saliency value of each superpixel and linearly fusing them to obtain the final saliency value of each superpixel:

Sal(S_k) = λ_Sal1·Sal'_E(S_k) + λ_Sal2·Sal'_C(S_k)

where Sal(S_k) is the final saliency value of superpixel S_k, Sal'_E(S_k) is its normalized saliency value with respect to the background information, Sal'_C(S_k) is its normalized contrast-based saliency value, λ_Sal1 is a first weight coefficient and λ_Sal2 is a second weight coefficient satisfying λ_Sal1 + λ_Sal2 = 1; in this embodiment λ_Sal1 = 0.3 and λ_Sal2 = 0.7.
2.7) normalizing the final saliency value of each superpixel to the interval [0, 255] and assigning it as the gray value of the corresponding superpixel to obtain the saliency map of the current real-time image.
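A sketch of steps 2.5)-2.7): normalization, linear fusion with λ_Sal1 = 0.3 and λ_Sal2 = 0.7 from this embodiment, and gray-value assignment to build the saliency map.

```python
import numpy as np

def fuse_saliency(sal_e, sal_c, labels, lam1=0.3, lam2=0.7):
    """Fuse background and contrast saliency (2.6) and render the saliency map (2.7)."""
    def normalize(x):
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

    sal = lam1 * normalize(sal_e) + lam2 * normalize(sal_c)   # final per-superpixel saliency
    gray = np.round(normalize(sal) * 255).astype(np.uint8)    # map to [0, 255]
    return sal, gray[labels]                                  # saliency map as an image
```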
3) Aiming at the color matching problem, according to the average color of the main color descriptor and each superpixel of the reference image in the quantized HSV color space, calculating the similarity between each superpixel of the current real-time image and the reference image by using a fuzzy matching method based on the main color descriptor, and obtaining the similarity value of each superpixel of the current real-time image; wherein, the super pixel with the similarity value of 1 is a region with the color similar to that of the reference image;
the calculation formula of the similarity value of each super pixel of the current real-time image in the step 3) is as follows:
Figure BDA0003462810630000103
wherein, Sim (S)k) Representing a super-pixel SkThe similarity value of (a) is calculated,
Figure BDA0003462810630000104
representing a superpixel S in a current real-time imagekAverage color of LDC∈DDC,LDCRepresenting the dominant color in a descriptor of the dominant color of the reference image, DDCA primary color descriptor representing a reference image; | | denotes an absolute value taking operation, min () denotes a minimum value taking operation, thSimIs the similarity threshold. When in use
Figure BDA0003462810630000105
And DDCWhen the minimum value of the difference values of all the main colors is less than or equal to the threshold value, the color is considered as SkBelonging to a target area of color matching. A binary image can be obtained through the similarity of the super pixels, and when the similarity value is 1, the binary image is displayed as white in the image, and when the similarity value is 0, the binary image is displayed as black in the image. The white part in the image is the color matching region. In this example, get thSim=2。
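A sketch of the fuzzy color matching of step 3), using the indicator form above with th_Sim = 2 from this embodiment:

```python
import numpy as np

def color_similarity(mean_hsv_L, d_dc, th_sim=2):
    """Per-superpixel similarity to the reference main colors (step 3):
    1 if the closest main color is within th_sim, else 0."""
    ref = np.array([L for L, _ in d_dc], dtype=np.float64)
    diff = np.abs(np.asarray(mean_hsv_L, dtype=np.float64)[:, None] - ref[None, :])
    return (diff.min(axis=1) <= th_sim).astype(np.uint8)      # binary similarity values
```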
4) Aiming at the fusion problem of the saliency map and the color matching region, according to the similarity value of each super pixel of the current real-time image and the adjacent super pixel set of each super pixel, carrying out region expansion on the saliency map of the current real-time image by using a region fusion expansion method based on the similarity and the saliency value, and then obtaining a possible target region of the current real-time image;
the step 4) is specifically as follows:
4.1) judging whether each superpixel of the current real-time image belongs to the target area: if the similarity value of the current superpixel is 1 and its final saliency value in the saliency map of the current real-time image is greater than the initial saliency threshold (0.2 in this embodiment), the current superpixel is taken as an undetermined target area, otherwise it is not; traversing all superpixels, all undetermined target areas together form the initial target area of the current real-time image:

R_T^(0) = { S_j | Sim(S_j) = 1 and Sal(S_j) > th_Sal }

where Sim(S_j) is the similarity value of superpixel S_j and describes how similar S_j is to the reference image, Sal(S_j) is the final saliency value of S_j and describes its degree of saliency in the real-time image, and th_Sal is the threshold used to judge whether the final saliency value is high enough; a superpixel S_j that is similar to the target object and sufficiently salient in the real-time image is considered to belong to the sought target area.
4.2) the initial target area usually cannot contain the whole target object, and is expanded based on the saliency map. Carrying out target area expansion on an adjacent super-pixel set of each super-pixel of the initial target area, if the final significant value of the current adjacent super-pixel in a significant image of the current real-time image is greater than a current significant value threshold value, enabling the current adjacent super-pixel to belong to a target area to be expanded, traversing the adjacent super-pixel set of each super-pixel of the initial target area, forming an expanded target area by all target areas to be expanded, and improving the current significant value threshold value;
4.3) carrying out target area expansion on the adjacent superpixel set of each superpixel of the current expanded target area, if the final significant value of the current adjacent superpixel in the significant image of the current real-time image is greater than the current significant value threshold value, the current adjacent superpixel belongs to the target area to be expanded, traversing the adjacent superpixel set of each superpixel of the initial target area, obtaining a new expanded target area and improving the current significant value threshold value;
the threshold calculation formula for increasing the current significant value threshold in the steps 4.2) and 4.3) is specifically as follows:
Figure BDA0003462810630000113
therein, thSalA threshold value representing the initial significance value is indicated,
Figure BDA0003462810630000114
after the t-th expansionA significance threshold.
4.4) repeating step 4.3) until no new expansion target area is generated, and fusing the initial target area and all expansion target areas to form the possible target area R'_T of the current real-time image:

R'_T = R_T^(0) ∪ R_T^(1) ∪ … ∪ R_T^(n)

where R_T^(0) is the initial target area and R_T^(t) is the expansion target area obtained in the tth expansion.
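A sketch of step 4): seed superpixels from similarity and saliency, then iteratively expand across adjacent superpixels while raising the saliency threshold after every round. The initial threshold 0.2 comes from this embodiment; the additive growth rule is an assumed placeholder for the threshold formula.

```python
def expand_target_region(sim, sal, adjacency, th_sal=0.2, grow=0.1):
    """Region fusion and expansion (steps 4.1-4.4). Returns the set of superpixel
    indices forming the possible target region R'_T.

    sim: binary similarity per superpixel; sal: normalized final saliency in [0, 1];
    adjacency: list of adjacent-superpixel sets."""
    # 4.1: initial target area R_T^(0)
    region = {k for k in range(len(sim)) if sim[k] == 1 and sal[k] > th_sal}
    frontier, th = set(region), th_sal
    # 4.2-4.4: expand across neighbors until no new superpixel is added
    while frontier:
        th = th + grow                              # assumed rule for raising the threshold
        new = {j for k in frontier for j in adjacency[k]
               if j not in region and sal[j] > th}
        region |= new
        frontier = new
    return region
```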
5) And carrying out preliminary screening on a possible target region of the current real-time image based on the region area, and further carrying out regional screening on the preliminary screening target region by utilizing a textural feature descriptor and a Fourier descriptor of a reference image to obtain a final target region of the current real-time image.
The step 5) is specifically as follows:
5.1) the possible target area of the current real-time image consists of a plurality of connected areas, the size of one connected area is represented by the number of pixels in the connected area, the number of pixels in each connected area is calculated, and the connected area with the number of pixels more than the preset proportion of the total number of pixels of the current real-time image is taken as a primary screening target area; in a specific implementation, the predetermined ratio is 20%.
5.2) calculating the texture feature descriptor and the Fourier descriptor of each connected region of the preliminary screening target region with the methods of step 1), computing the first Euclidean distance between the texture feature descriptor of each connected region and that of the reference image and the second Euclidean distance between the Fourier descriptor of each connected region and that of the reference image, and calculating the difference between each connected region of the preliminary screening target region and the reference image from these two distances:

Diff(R_q) = λ_Diff·d_T(R_q) + (1 − λ_Diff)·d_F(R_q)

where Diff(R_q) is the difference between connected region R_q of the preliminary screening target region and the reference image, d_T(R_q) is the first Euclidean distance of connected region R_q, d_F(R_q) is its second Euclidean distance and λ_Diff is a third weight coefficient that adjusts the relative weight of the two distances; in this embodiment λ_Diff = 0.5;
5.3) taking the connected region with the smallest difference as the final target region R_T of the current real-time image, i.e. R_T = argmin over R_q of Diff(R_q).
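A sketch of step 5): area-based pre-screening of connected regions followed by descriptor-based screening with λ_Diff = 0.5. The weighted-sum form of Diff follows the reconstruction above, the texture_descriptor and fourier_descriptor helpers from the earlier sketches are assumed to be in scope, and the connected-component labelling relies on SciPy.

```python
import numpy as np
from scipy import ndimage

def select_final_region(region_mask, gray_u8, ref_tex, ref_fd,
                        min_ratio=0.20, lam_diff=0.5):
    """Screen connected regions of the possible target area and keep the one
    with the smallest difference to the reference image (steps 5.1-5.3)."""
    labeled, n = ndimage.label(region_mask)
    total = region_mask.size
    best, best_diff = None, np.inf
    for q in range(1, n + 1):
        comp = labeled == q
        if comp.sum() < min_ratio * total:                   # 5.1: drop small regions
            continue
        mask_u8 = (comp * 255).astype(np.uint8)
        tex = texture_descriptor(np.where(comp, gray_u8, 0).astype(np.uint8))
        fd = fourier_descriptor(mask_u8)                     # shape of the region
        d_t = np.linalg.norm(tex - ref_tex)                  # first Euclidean distance
        d_f = np.linalg.norm(fd - ref_fd)                    # second Euclidean distance
        diff = lam_diff * d_t + (1 - lam_diff) * d_f         # 5.2: weighted difference
        if diff < best_diff:
            best, best_diff = comp, diff                     # 5.3: minimum difference wins
    return best
```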
The above-mentioned contents are only technical ideas of the present invention, and the protection scope of the present invention is not limited thereby, and any modifications made on the basis of the technical ideas proposed by the present invention fall within the protection scope of the claims of the present invention.

Claims (8)

1. An image target area detection method based on a visual attention mechanism is characterized by comprising the following steps:
1) calculating a main color descriptor, a texture feature descriptor and a Fourier descriptor of a reference image;
2) performing superpixel segmentation on a real-time image by using a simple linear iterative clustering algorithm to obtain a plurality of superpixels of the current real-time image, performing regional description on current real-time image information based on each superpixel to obtain the average color of each superpixel in a quantized HSV color space and an adjacent superpixel set of each superpixel, calculating the significant value of each superpixel by using a bottom-up data driving method, and forming a significant image of the current real-time image by using the significant value of each superpixel;
3) calculating the similarity between each super pixel of the current real-time image and the reference image by using a fuzzy matching method based on the main color descriptor according to the average color of the main color descriptor and each super pixel of the reference image in the quantized HSV color space, and obtaining the similarity value of each super pixel of the current real-time image;
4) according to the similarity value of each super pixel of the current real-time image and an adjacent super pixel set of each super pixel, carrying out region expansion on a saliency map of the current real-time image by using a region fusion expansion method based on the similarity and the saliency value to obtain a possible target region of the current real-time image;
5) and carrying out preliminary screening on a possible target region of the current real-time image based on the region area, and further carrying out regional screening on the preliminary screening target region by utilizing a textural feature descriptor and a Fourier descriptor of a reference image to obtain a final target region of the current real-time image.
2. The method for detecting the image target area based on the visual attention mechanism according to claim 1, wherein the step 1) is specifically as follows:
1.1) converting the color of the reference image from an RGB color space to an HSV color space, and quantizing the HSV color space of the reference image to obtain the color value of each pixel in the reference image in the quantized HSV color space;
1.2) calculating a color histogram of the reference image based on the color values of all pixels of the reference image in the quantized HSV color space, computing the percentage of each quantized color from the histogram and sorting the quantized colors in descending order; the first quantized color is taken as the first main color and recorded as (L_1, p_1), where L_1 is the first main color quantization value and p_1 is the percentage of the first main color in the color histogram; the nth quantized color is then selected in turn, and if its percentage p_n in the color histogram satisfies p_n > 0.5·p_1, it is taken as the nth main color and recorded as (L_n, p_n), where L_n is the nth main color quantization value; the selection stops when p_n > 0.5·p_1 is no longer satisfied or when the cumulative percentage of the selected main colors in the color histogram reaches a preset upper limit, where N denotes the number of main colors and the main colors constitute the main color descriptor D_DC;
1.3) calculating a gray level co-occurrence matrix of the reference image, calculating corresponding angle second moment, inverse difference moment, entropy and contrast based on the gray level co-occurrence matrix and using the corresponding angle second moment, inverse difference moment, entropy and contrast as texture feature descriptors of the reference image;
1.4) obtaining a Fourier descriptor of the reference image after discrete Fourier transform and normalization of the complex form of the reference image.
3. The method for detecting the image target area based on the visual attention mechanism according to claim 1, wherein the step 2) is specifically as follows:
2.1) performing superpixel segmentation on the real-time image through a simple linear iterative clustering algorithm to obtain a plurality of superpixels of the real-time image and corresponding label images, and forming a superpixel set by the plurality of superpixels;
2.2) processing the super-pixel set based on the label graph to obtain the pixel set, the color characteristic and the position information of each super-pixel in the super-pixel set; wherein the color features are an average value of each color component of each super-pixel in a CIELab color space and an average color in a quantized HSV color space; the position information is a geometric center of each super pixel and an adjacent super pixel set of each super pixel;
2.3) taking the superpixels in the superpixel set that contain edge pixels p_e of the real-time image as an edge superpixel set E, and calculating the image edge main colors of the current real-time image based on the edge superpixel set E;
2.4) calculating the contrast between the current superpixel and every other superpixel based on the pixel set of each superpixel, the average values of its color components in the CIELab color space and its geometric center; then, based on the average color of the current superpixel in the quantized HSV color space, and combining the contrast between the current superpixel and every other superpixel with the image edge main colors, calculating the saliency value of the current superpixel with respect to the background information and its contrast-based saliency value:

Sal_E(S_k) = min_m | L̄(S_k) − L_E^m |

where Sal_E(S_k) is the saliency value of superpixel S_k with respect to the background information, | · | denotes the absolute value, min denotes the minimum over the image edge main colors, L̄(S_k) is the average color of S_k in the quantized HSV color space and L_E^m is the mth quantized edge color among the image edge main colors; the contrast-based saliency value Sal_C(S_k) is obtained by accumulating, over all other superpixels S_i, the difference between the CIELab averages (l̄_k, ā_k, b̄_k) of S_k and (l̄_i, ā_i, b̄_i) of S_i (lightness and the first and second color channels), attenuated according to the spatial distance between the geometric centers (x_k, y_k) of S_k and (x_i, y_i) of S_i in the current real-time image; λ_pos is the coefficient that adjusts how strongly the spatial distance affects the contrast-based saliency value, and i is the superpixel index in the superpixel set;
2.5) repeating the step 2.4), traversing the residual superpixels, and calculating and obtaining significant values of the residual superpixels relative to background information and significant values based on contrast;
2.6) normalizing the saliency value with respect to the background information and the contrast-based saliency value of each superpixel and linearly fusing them to obtain the final saliency value of each superpixel:

Sal(S_k) = λ_Sal1·Sal'_E(S_k) + λ_Sal2·Sal'_C(S_k)

where Sal(S_k) is the final saliency value of superpixel S_k, Sal'_E(S_k) is its normalized saliency value with respect to the background information, Sal'_C(S_k) is its normalized contrast-based saliency value, λ_Sal1 is a first weight coefficient and λ_Sal2 is a second weight coefficient satisfying λ_Sal1 + λ_Sal2 = 1;
2.7) normalizing the final saliency value of each superpixel to the interval [0, 255] and assigning it as the gray value of the corresponding superpixel to obtain the saliency map of the current real-time image.
4. The method according to claim 3, wherein the step 2.3) of calculating the dominant color of the image edge of the current real-time image based on the edge superpixel set E specifically comprises:
s1: calculating a color histogram of the edge superpixel set E based on the quantized HSV color space;
s2: for each quantized edge color L_E in the color histogram of the edge superpixel set E, with L_E ∈ [0, 71], taking the sum of the percentages of its neighborhood colors {L_E − 1, L_E, L_E + 1} in that histogram as the percentage of the current quantized edge color L_E in the color histogram of the edge superpixel set E;
s3: repeating step S2 until the percentages of all quantized edge colors L_E in the color histogram of the edge superpixel set E have been calculated;
s4: deleting from the histogram the quantized edge colors that are identical to a main color in the main color descriptor, and taking the quantized edge colors whose percentage in the color histogram of the edge superpixel set E obtained in step S3 is greater than or equal to 20% as the image edge main colors of the current real-time image.
5. The method according to claim 1, wherein the similarity value of each superpixel of the current real-time image in step 3) is calculated as:

Sim(S_k) = 1 if min_{L_DC ∈ D_DC} | L̄(S_k) − L_DC | ≤ th_Sim, and Sim(S_k) = 0 otherwise

where Sim(S_k) is the similarity value of superpixel S_k, L̄(S_k) is the average color of superpixel S_k of the current real-time image in the quantized HSV color space, L_DC is a main color in the main color descriptor D_DC of the reference image, | · | denotes the absolute value, min denotes the minimum and th_Sim is the similarity threshold.
6. The method for detecting the image target area based on the visual attention mechanism according to claim 1, wherein the step 4) is specifically as follows:
4.1) judging whether each super pixel of the current real-time image is a target area, if the similarity value of each super pixel is 1 and the final significant value of the current super pixel in the significant image of the current real-time image is larger than the initial significant value threshold, taking the current super pixel as an undetermined target area, otherwise, not determining the undetermined target area; traversing all super pixels, and forming an initial target area of the current real-time image by all the target areas to be determined;
4.2) carrying out target area expansion on the adjacent superpixel set of each superpixel of the initial target area, if the final significant value of the current adjacent superpixel in the significant image of the current real-time image is greater than the current significant value threshold value, the current adjacent superpixel belongs to the target area to be expanded, traversing the adjacent superpixel set of each superpixel of the initial target area, forming an expansion target area by all target areas to be expanded, and improving the current significant value threshold value;
4.3) carrying out target area expansion on the adjacent superpixel set of each superpixel of the current expanded target area, if the final significant value of the current adjacent superpixel in the significant image of the current real-time image is greater than the current significant value threshold value, the current adjacent superpixel belongs to the target area to be expanded, traversing the adjacent superpixel set of each superpixel of the initial target area, obtaining a new expanded target area and improving the current significant value threshold value;
4.4) repeating the step 4.3) until no new expansion target area is generated, and fusing the initial target area and all the expansion target areas to form a possible target area of the current real-time image.
7. The method for detecting an image target area based on a visual attention mechanism according to claim 6, wherein in steps 4.2) and 4.3) the current saliency threshold is raised after each expansion, th_Sal denoting the initial saliency threshold and th_Sal^(t) denoting the saliency threshold after the tth expansion, which increases with the number of expansions t.
8. The method for detecting the image target area based on the visual attention mechanism according to claim 1, wherein the step 5) is specifically as follows:
5.1) the possible target area of the current real-time image consists of a plurality of connected areas, the pixel number of each connected area is calculated, and the connected areas with the pixel number accounting for more than the preset proportion of the total pixel number of the current real-time image are taken as primary screening target areas;
5.2) calculating the texture feature descriptor and the Fourier descriptor of each connected region of the preliminary screening target region, computing the first Euclidean distance between the texture feature descriptor of each connected region and that of the reference image and the second Euclidean distance between the Fourier descriptor of each connected region and that of the reference image, and calculating the difference between each connected region of the preliminary screening target region and the reference image from these two distances:

Diff(R_q) = λ_Diff·d_T(R_q) + (1 − λ_Diff)·d_F(R_q)

where Diff(R_q) is the difference between connected region R_q of the preliminary screening target region and the reference image, d_T(R_q) is the first Euclidean distance of connected region R_q, d_F(R_q) is its second Euclidean distance and λ_Diff is a third weight coefficient;
and 5.3) taking the connected region with the minimum difference as the final target region of the current real-time image.
CN202210021568.6A 2022-01-10 2022-01-10 Image target area detection method based on visual attention mechanism Pending CN114359323A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210021568.6A CN114359323A (en) 2022-01-10 2022-01-10 Image target area detection method based on visual attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210021568.6A CN114359323A (en) 2022-01-10 2022-01-10 Image target area detection method based on visual attention mechanism

Publications (1)

Publication Number Publication Date
CN114359323A true CN114359323A (en) 2022-04-15

Family

ID=81109197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210021568.6A Pending CN114359323A (en) 2022-01-10 2022-01-10 Image target area detection method based on visual attention mechanism

Country Status (1)

Country Link
CN (1) CN114359323A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115439688A (en) * 2022-09-01 2022-12-06 哈尔滨工业大学 Weak supervision object detection method based on surrounding area perception and association
CN115439688B (en) * 2022-09-01 2023-06-16 哈尔滨工业大学 Weak supervision object detection method based on surrounding area sensing and association
CN116645368A (en) * 2023-07-27 2023-08-25 青岛伟东包装有限公司 Online visual detection method for edge curl of casting film
CN116645368B (en) * 2023-07-27 2023-10-03 青岛伟东包装有限公司 Online visual detection method for edge curl of casting film
CN117173175A (en) * 2023-11-02 2023-12-05 湖南格尔智慧科技有限公司 Image similarity detection method based on super pixels
CN117173175B (en) * 2023-11-02 2024-02-09 湖南格尔智慧科技有限公司 Image similarity detection method based on super pixels

Similar Documents

Publication Publication Date Title
Wang et al. Fuzzy-based algorithm for color recognition of license plates
Soriano et al. Adaptive skin color modeling using the skin locus for selecting training pixels
CN108537239B (en) Method for detecting image saliency target
CN114359323A (en) Image target area detection method based on visual attention mechanism
CN106023257B (en) A kind of method for tracking target based on rotor wing unmanned aerial vehicle platform
CN111666834A (en) Forest fire automatic monitoring and recognizing system and method based on image recognition technology
US7606414B2 (en) Fusion of color space data to extract dominant color
CN106960182B (en) A kind of pedestrian's recognition methods again integrated based on multiple features
JP4098021B2 (en) Scene identification method, apparatus, and program
Almogdady et al. A flower recognition system based on image processing and neural networks
Mythili et al. Color image segmentation using ERKFCM
CN103366178A (en) Method and device for carrying out color classification on target image
Feng et al. A color image segmentation method based on region salient color and fuzzy c-means algorithm
CN111310768B (en) Saliency target detection method based on robustness background prior and global information
Chen et al. Adaptive fuzzy color segmentation with neural network for road detections
CN105678318A (en) Traffic label matching method and apparatus
CN113159043A (en) Feature point matching method and system based on semantic information
CN111274964B (en) Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle
Tian et al. Outdoor shadow detection by combining tricolor attenuation and intensity
CN110910497B (en) Method and system for realizing augmented reality map
Wang Image matting with transductive inference
Hu et al. Automatic spectral video matting
CN114820707A (en) Calculation method for camera target automatic tracking
CN114581475A (en) Laser stripe segmentation method based on multi-scale saliency features
KhabiriKhatiri et al. Road Traffic Sign Detection and Recognition using Adaptive Color Segmentation and Deep Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: 316021 Zhoushan campus of Zhejiang University, No.1 Zheda Road, Dinghai District, Zhoushan City, Zhejiang Province

Applicant after: ZHEJIANG University

Address before: 310058 Yuhang Tang Road, Xihu District, Hangzhou, Zhejiang 866

Applicant before: ZHEJIANG University

Country or region before: China