CN114359323B - Image target area detection method based on visual attention mechanism - Google Patents

Image target area detection method based on visual attention mechanism

Info

Publication number
CN114359323B
CN114359323B CN202210021568.6A
Authority
CN
China
Prior art keywords
color
super
superpixel
image
pixel
Prior art date
Legal status
Active
Application number
CN202210021568.6A
Other languages
Chinese (zh)
Other versions
CN114359323A (en)
Inventor
黄方昊
江佳诚
杨霄
陈正
聂勇
唐建中
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN202210021568.6A
Publication of CN114359323A
Application granted
Publication of CN114359323B

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an image target area detection method based on a visual attention mechanism. The method comprises the following steps: computing the underlying visual feature descriptors of a reference image; performing superpixel segmentation of the real-time image with the SLIC method and describing the current real-time image information region by region; computing the saliency value of every superpixel with a bottom-up, data-driven method, the saliency values of all superpixels forming the saliency map of the real-time image; computing the similarity between each superpixel of the real-time image and the reference image with a fuzzy matching method; expanding regions of the saliency map of the real-time image with a region fusion and expansion method based on similarity and saliency to obtain the possible target regions of the real-time image; and screening the possible target regions of the real-time image to obtain the final target region of the current real-time image. The invention improves the accuracy of target area detection in complex environments and preserves real-time performance on low-computing-power devices.

Description

Image target area detection method based on visual attention mechanism
Technical Field
The invention belongs to the field of image target area detection in computer vision, and particularly relates to an image target area detection method based on a visual attention mechanism.
Background
Target area detection is an important image processing technology in computer vision. Obtaining the target area through detection allows computing resources to be concentrated, reduces interference from irrelevant information, and improves the efficiency and accuracy of image processing. Research on target area detection algorithms has long been a popular direction in the field of computer vision.
Early target region detection algorithms were based on manually constructed features: the information of image regions was described by hand-designed region feature descriptors, and detection was performed with simple schemes such as sliding-window detectors, e.g., the VJ detector proposed by P. Viola and M. Jones and the HOG detector proposed by N. Dalal and B. Triggs. On this basis, P. Felzenszwalb et al. proposed a detection model based on deformable parts, using a mixture model to detect objects that may undergo significant changes, and achieved strong results among traditional target area detection algorithms.
With the development of convolutional neural networks and deep learning, target region detection algorithms based on deep learning have entered a new stage. Deep-learning-based target area detection algorithms are divided, according to their processing flow, into two-stage and one-stage methods. The two-stage methods, which first generate candidate boxes and then detect them with a trained model, include the R-CNN and Fast R-CNN proposed by R. Girshick et al., the SPPNet proposed by K. He et al., and the Faster R-CNN detector proposed by S. Ren et al.; the one-stage methods, which apply a neural network to the whole picture for detection, include the YOLO proposed by R. Joseph et al. and the SSD proposed by W. Liu et al.
Target area detection can therefore be realized either with traditional features or with neural-network-based methods, but both approaches have problems: traditional target area detection algorithms require relatively complex feature description schemes and their detection accuracy is relatively low; neural-network-based target area detection algorithms require large datasets and a long training time, demand high computing power at detection time, and are therefore not suitable for low-computing-power mobile devices typified by augmented reality devices.
Disclosure of Invention
Aiming at the application problem of the current target area detection algorithm on low-computational-power mobile equipment, the invention provides an image target area detection method based on a visual attention mechanism.
In order to achieve the above purpose, the technical scheme of the invention is as follows.
The invention comprises the following steps:
1) Calculating a main color descriptor, a texture feature descriptor and a Fourier descriptor of the reference image;
2) Performing superpixel segmentation on a real-time image by using a simple linear iterative clustering algorithm to obtain a plurality of superpixels of the current real-time image, performing regional description on the current real-time image information based on each superpixel to obtain the average color of each superpixel in a quantized HSV color space and an adjacent superpixel set of each superpixel, calculating the salient values of each superpixel by using a bottom-up data driving method, and forming a salient map of the current real-time image by the salient values of each superpixel;
3) Calculating the similarity between each super pixel of the current real-time image and the reference image by using a fuzzy matching method based on the main color descriptor according to the main color descriptor of the reference image and the average color of each super pixel in the quantized HSV color space, and obtaining the similarity value of each super pixel of the current real-time image;
4) According to the similarity value of each super pixel of the current real-time image and the adjacent super pixel set of each super pixel, performing region expansion on the salient image of the current real-time image by using a region fusion expansion method based on the similarity and the salient value, and then obtaining a possible target region of the current real-time image;
5) And carrying out preliminary screening on a possible target area of the current real-time image based on the area, and further carrying out regional screening on the preliminary screening target area by utilizing texture feature descriptors and Fourier descriptors of the reference image to obtain a final target area of the current real-time image.
The step 1) specifically comprises the following steps:
1.1 Converting the colors of the reference image from the RGB color space to the HSV color space, and quantizing the HSV color space of the reference image to obtain color values of each pixel in the quantized HSV color space;
1.2) Calculating a color histogram of the reference image based on the color values of the pixels in the quantized HSV color space, calculating the percentage of each quantized color from the color histogram, and arranging the quantized colors in descending order of percentage; the first quantized color is taken as the first main color and recorded as (L_1, p_1), where L_1 represents the quantized value of the first main color and p_1 represents the percentage of the first main color in the color histogram; the n-th quantized color is then selected in turn, and when its percentage p_n in the color histogram satisfies p_n > 0.5·p_1, that quantized color is taken as the n-th main color and recorded as (L_n, p_n), where L_n represents the quantized value of the n-th main color; the selection stops when no remaining quantized color satisfies p_n > 0.5·p_1 or when the sum of the percentages of the selected main colors in the color histogram reaches the preset limit, where N represents the number of main colors; the N main colors form the main color descriptor D_DC;
1.3 A gray level co-occurrence matrix of the reference image is calculated, and corresponding angular second moment, inverse difference moment, entropy and contrast are calculated based on the gray level co-occurrence matrix and used as texture feature descriptors of the reference image;
1.4 The complex form of the reference image is subjected to discrete Fourier transformation and normalization to obtain a Fourier descriptor of the reference image.
The step 2) specifically comprises the following steps:
2.1) Performing superpixel segmentation on the real-time image through a simple linear iterative clustering algorithm to obtain a plurality of superpixels of the real-time image and the corresponding label map, wherein the superpixels form a superpixel set;
2.2) Processing the superpixel set based on the label map to obtain the pixel set, color features and position information of each superpixel in the superpixel set; wherein the color features are the average value of each color component of each superpixel in the CIELab color space and the average color in the quantized HSV color space; the position information is the geometric center of each superpixel and the adjacent superpixel set of each superpixel;
2.3) Taking the superpixels in the superpixel set that contain edge pixels p_e of the real-time image as an edge superpixel set E, and calculating the image edge main color of the current real-time image based on the edge superpixel set E;
2.4) Calculating the contrast between the current superpixel and every other superpixel based on the pixel set of each superpixel, the average value of each color component of each superpixel in the CIELab color space, and the geometric center of each superpixel; then, based on the average color of the current superpixel in the quantized HSV color space, and combining the contrast between the current superpixel and every other superpixel with the image edge main color, calculating the saliency value of the current superpixel relative to the background information and its contrast-based saliency value through the following formulas:
where Sal_E(S_k) represents the saliency value of superpixel S_k relative to the background information, | | represents the absolute value operation, min represents the minimum value operation, and Sal_C(S_k) represents the contrast-based saliency value of superpixel S_k; the formulas further involve the average color of superpixel S_k in the quantized HSV color space, the m-th quantized edge color of the image edge dominant colors, the luminance and the averages of the first and second color channels of superpixels S_k and S_i in the CIELab color space, the coordinates of the geometric centers of superpixels S_k and S_i in the current real-time image, and λ_pos, a coefficient that adjusts how much the spatial distance affects the contrast-based saliency value; i is the superpixel index in the superpixel set;
2.5) Repeating step 2.4) and traversing the remaining superpixels to calculate the saliency values of the remaining superpixels relative to the background information and their contrast-based saliency values;
2.6 Respectively carrying out normalization processing on the salient values of the super pixels relative to the background information and the salient values based on the contrast ratio, and carrying out linear fusion to obtain the final salient values of the super pixels, wherein the calculation formula is as follows:
Sal(S_k) = λ_Sal1·Sal'_E(S_k) + λ_Sal2·Sal'_C(S_k)
where Sal(S_k) represents the final saliency value of superpixel S_k, Sal'_E(S_k) represents the normalized saliency value of superpixel S_k relative to the background information, Sal'_C(S_k) represents the normalized contrast-based saliency value of superpixel S_k, λ_Sal1 is a first weight coefficient, λ_Sal2 is a second weight coefficient, and λ_Sal1 + λ_Sal2 = 1 is satisfied;
2.7) After normalizing the final saliency value of each superpixel to the [0, 255] interval, gray-scale values are assigned to the corresponding superpixels based on the normalized final saliency values, giving the saliency map of the current real-time image.
In the step 2.3), calculating the image edge main color of the current real-time image based on the edge super-pixel set E, specifically:
s1: calculating a color histogram of the edge superpixel set E based on the quantized HSV color space;
S2: the quantized edge colors L_E in the color histogram of the edge superpixel set E satisfy L_E ∈ [0, 71]; the sum of the percentages of the neighborhood colors {L_E−1, L_E, L_E+1} of the current quantized edge color L_E in the color histogram of the edge superpixel set E is taken as the percentage of the current quantized edge color L_E in the color histogram of the edge superpixel set E;
S3: repeating step S2 and traversing all quantized edge colors L_E to calculate their percentages in the color histogram of the edge superpixel set E;
S4: deleting from the color histogram the quantized edge colors that are identical to main colors in the main color descriptor, and taking the quantized edge colors whose percentage computed in S3 in the color histogram of the edge superpixel set E is greater than or equal to 20% as the image edge main color of the current real-time image.
The calculation formula of the similarity value of each super pixel of the current real-time image in the step 3) is as follows:
where Sim(S_k) represents the similarity value of superpixel S_k; the formula involves the average color of superpixel S_k in the current real-time image and L_DC ∈ D_DC, where L_DC represents a main color in the main color descriptor of the reference image and D_DC represents the main color descriptor of the reference image; | | denotes the absolute value operation, min() denotes the minimum value operation, and th_Sim is the similarity threshold.
The step 4) is specifically as follows:
4.1) Judging whether each superpixel of the current real-time image belongs to the target area: if the similarity value of a superpixel is 1 and its final saliency value in the saliency map of the current real-time image is greater than the initial saliency value threshold, the superpixel is taken as a target area to be determined; otherwise it is not a target area to be determined. All superpixels are traversed, and all target areas to be determined form the initial target area of the current real-time image;
4.2) Performing target area expansion on the adjacent superpixel sets of the superpixels of the initial target area: if the final saliency value of the current adjacent superpixel in the saliency map of the current real-time image is greater than the current saliency value threshold, the current adjacent superpixel belongs to the target area to be expanded; the adjacent superpixel sets of all superpixels of the initial target area are traversed, all target areas to be expanded form an expansion target area, and the current saliency value threshold is increased;
4.3) Performing target area expansion on the adjacent superpixel sets of the superpixels of the current expansion target area: if the final saliency value of the current adjacent superpixel in the saliency map of the current real-time image is greater than the current saliency value threshold, the current adjacent superpixel belongs to the target area to be expanded; the adjacent superpixel sets of all superpixels of the current expansion target area are traversed, a new expansion target area is obtained, and the current saliency value threshold is increased;
4.4 Repeating the step 4.3) until no new expansion target area is generated, and fusing the initial target area and all expansion target areas to form a possible target area of the current real-time image.
The threshold calculation formula for increasing the current saliency value threshold in steps 4.2) and 4.3) is specifically:
where th_Sal denotes the initial saliency value threshold and th_Sal^t denotes the saliency value threshold after the t-th expansion.
The step 5) specifically comprises the following steps:
5.1 The possible target area of the current real-time image consists of a plurality of connected areas, the pixel number of each connected area is calculated, and the connected areas with the pixel number more than the preset proportion of the total number of the pixels of the current real-time image are taken as primary screening target areas;
5.2 Calculating texture feature descriptors and Fourier descriptors of all connected areas of the preliminary screening target area, respectively calculating a first Euclidean distance between the texture feature descriptors of all connected areas of the preliminary screening target area and the texture feature descriptors of the reference image and a second Euclidean distance between the Fourier descriptors of all connected areas of the preliminary screening target area and the Fourier descriptors of the reference image, and calculating the difference degree between all connected areas of the preliminary screening target area and the reference image based on the first Euclidean distance and the second Euclidean distance of all connected areas of the preliminary screening target area, wherein the calculation formula is as follows:
where Diff(R_q) represents the degree of difference between the connected region R_q of the preliminary screening target area and the reference image; the formula combines the first Euclidean distance and the second Euclidean distance of the connected region R_q of the preliminary screening target area, with λ_Diff representing a third weight coefficient;
5.3 The connected region with the smallest difference degree is used as the final target region of the current real-time image.
Compared with the prior art, the invention has the following beneficial effects:
1. The method imitates the way the human visual system searches for an object: based on a visual attention mechanism, possible target areas are searched from two directions, bottom-up (saliency detection) and top-down (color matching based on the main color descriptor), which improves the accuracy of target area detection in complex environments.
2. The invention fuses, expands and screens target areas on the basis of saliency detection and of color matching based on the main color descriptor, and further screens them with the texture feature descriptor and the Fourier descriptor, thereby improving the detection accuracy.
3. The invention performs target area detection with the underlying visual feature descriptors of the images, which improves the computational efficiency of target area detection and ensures real-time performance on low-computing-power devices.
Drawings
Fig. 1 is a flow chart of a target area detection algorithm based on a visual attention mechanism according to the present invention.
Fig. 2 is an example flow chart and a result chart of a target area detection algorithm based on a visual attention mechanism according to the present invention.
Detailed Description
The present invention will be described in further detail below with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention clearer. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention will now be further described with reference to the examples and figures.
The technical scheme of the invention is as follows:
as shown in fig. 1 and 2, the present invention includes the steps of:
1) Calculating a main color descriptor, a texture feature descriptor and a Fourier descriptor of the reference image to describe the color, the texture and the shape of the reference image, respectively; the main color descriptor, the texture feature descriptor and the Fourier descriptor form the underlying visual feature descriptor. The reference image is an image of the target object to be detected in the real-time image.
Color, texture and shape features are extracted from the reference image at the pixel level; step 1) specifically comprises the following steps:
For the color features, a main color selection method based on the quantized HSV color space is provided, which describes the main color features of the reference image more accurately.
1.1 Converting the color of the reference image from an RGB color space to an HSV color space, and quantizing the HSV color space of the reference image through the following formula to obtain the color value of each pixel in the quantized HSV color space;
L=9H+3S+V
Where h, s, v represent hue, saturation, and brightness components of the pixel in the HSV color space, respectively, H, S, V represent hue, saturation, and brightness components of the pixel in the quantized HSV color space, respectively, and L represents the color value of the current pixel in the quantized HSV color space.
1.2) Calculating a color histogram of the reference image based on the color values of each pixel in the quantized HSV color space of the reference image, calculating the percentage of each quantized color from the color histogram, and arranging the quantized colors in descending order of percentage; the first quantized color (i.e., the quantized color with the highest percentage) is taken as the first main color and recorded as (L_1, p_1), where L_1 represents the quantized value of the first main color and p_1 represents its percentage in the color histogram; the n-th quantized color is then selected in turn, and when its percentage p_n in the color histogram satisfies p_n > 0.5·p_1, that quantized color is taken as the n-th main color and recorded as (L_n, p_n), where L_n represents the quantized value of the n-th main color; the selection stops when no remaining quantized color satisfies p_n > 0.5·p_1 or when the sum of the percentages of the selected main colors in the color histogram reaches the preset limit, where N represents the number of main colors; the N main colors form the main color descriptor D_DC:
D_DC = {(L_n, p_n), n = 1, 2, …, N}
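To make steps 1.1)–1.2) concrete, the following Python sketch outlines the quantization and dominant color selection. It is an illustrative reconstruction rather than the patented implementation: only the combination formula L = 9H + 3S + V is reproduced above, so the equal-width bin boundaries for H, S and V and the cumulative-percentage stopping limit (cum_limit) are assumptions.

```python
import cv2
import numpy as np

def dominant_color_descriptor(ref_bgr, max_colors=8, cum_limit=0.9):
    # Step 1.1: convert to HSV and quantize into 72 colors, L = 9H + 3S + V in [0, 71].
    # Equal-width bins for H (8 bins), S (3 bins) and V (3 bins) are an assumption.
    hsv = cv2.cvtColor(ref_bgr, cv2.COLOR_BGR2HSV)
    H = np.minimum(hsv[..., 0] // 23, 7)   # OpenCV hue range is [0, 179]
    S = np.minimum(hsv[..., 1] // 86, 2)   # saturation range [0, 255]
    V = np.minimum(hsv[..., 2] // 86, 2)   # value range [0, 255]
    L = 9 * H.astype(int) + 3 * S.astype(int) + V.astype(int)

    # Step 1.2: color histogram, percentages, descending order, keep colors with p_n > 0.5 * p_1.
    p = np.bincount(L.ravel(), minlength=72).astype(float)
    p /= p.sum()
    order = np.argsort(p)[::-1]
    first = int(order[0])
    descriptor = [(first, float(p[first]))]
    cum = p[first]
    for idx in order[1:max_colors]:
        if p[idx] <= 0.5 * p[first] or cum >= cum_limit:   # assumed stopping rule
            break
        descriptor.append((int(idx), float(p[idx])))
        cum += p[idx]
    return descriptor   # D_DC as a list of (L_n, p_n) pairs
```

A cropped reference patch of the target object would be passed as ref_bgr; the same quantization is reused later when the average quantized color of each superpixel of the live image is needed.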
1.3 A gray level co-occurrence matrix of the reference image is calculated, and corresponding angular second moment, inverse difference moment, entropy and contrast are calculated based on the gray level co-occurrence matrix and used as texture feature descriptors of the reference image;
1.4 The complex form of the reference image is subjected to discrete Fourier transformation and normalization to obtain a Fourier descriptor of the reference image.
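The texture and shape descriptors of steps 1.3)–1.4) can be sketched as follows. This is a plausible Python rendering under stated assumptions: the number of GLCM gray levels, the distance/angle pair, the number of Fourier coefficients and the normalization by the first harmonic are illustrative choices that the text above does not fix (graycomatrix/graycoprops are spelled greycomatrix/greycoprops in older scikit-image releases).

```python
import cv2
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def texture_descriptor(gray, levels=32):
    # Step 1.3: gray-level co-occurrence matrix, then angular second moment,
    # inverse difference moment (homogeneity), entropy and contrast.
    q = (gray.astype(np.uint32) * levels // 256).astype(np.uint8)
    glcm = graycomatrix(q, distances=[1], angles=[0], levels=levels,
                        symmetric=True, normed=True)
    p = glcm[:, :, 0, 0]
    asm = graycoprops(glcm, 'ASM')[0, 0]
    idm = graycoprops(glcm, 'homogeneity')[0, 0]
    contrast = graycoprops(glcm, 'contrast')[0, 0]
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return np.array([asm, idm, entropy, contrast])

def fourier_descriptor(mask, n_coeff=32):
    # Step 1.4: boundary points taken as complex numbers x + jy, discrete Fourier
    # transform, magnitudes normalized by the first harmonic for scale invariance.
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    pts = max(contours, key=cv2.contourArea).squeeze(1)
    z = pts[:, 0].astype(float) + 1j * pts[:, 1].astype(float)
    F = np.fft.fft(z)
    return np.abs(F[1:n_coeff + 1]) / (np.abs(F[1]) + 1e-12)
```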
2) Performing superpixel segmentation on a real-time image by using a Simple Linear Iterative Clustering (SLIC) algorithm to obtain a plurality of superpixels of the current real-time image, performing regional description on the current real-time image information based on each superpixel to obtain the average color of each superpixel in a quantized HSV color space and an adjacent superpixel set of each superpixel, calculating the salient values of each superpixel by using a bottom-up data driving method, and forming a salient map of the current real-time image by the salient values of each superpixel;
the step 2) is specifically as follows:
The bottom-up, data-driven method extracts the underlying visual features of an image, such as color, texture and shape, computes the contrast between pixels or regions of the image based on these features to obtain a saliency map for each feature, and obtains the final salient region by fusing the saliency maps of the individual features.
2.1) Superpixel segmentation is carried out on the real-time image with the simple linear iterative clustering algorithm to obtain a plurality of superpixels of the real-time image and the corresponding label map, and the superpixels form a superpixel set S = {S_i, i = 1, 2, …, K}, where K represents the total number of superpixels (K is 300 in this embodiment), S_i represents the i-th superpixel in the superpixel set S, and i represents the superpixel index;
2.2) Processing the superpixel set based on the label map to obtain the pixel set, color features and position information of each superpixel in the superpixel set; wherein the color features are the average value of each color component of each superpixel in the CIELab color space and the average color in the quantized HSV color space; the position information is the geometric center of each superpixel and the adjacent superpixel set of each superpixel;
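A minimal sketch of steps 2.1)–2.2), using the SLIC implementation from scikit-image, is given below. The compactness value and the 4-neighborhood adjacency test are assumptions made for illustration.

```python
import numpy as np
from skimage.segmentation import slic
from skimage.color import rgb2lab

def describe_superpixels(rgb, n_segments=300):
    # Step 2.1: SLIC segmentation; 'labels' is the label map of the live image.
    labels = slic(rgb, n_segments=n_segments, compactness=10, start_label=0)
    lab = rgb2lab(rgb)
    n = int(labels.max()) + 1

    # Step 2.2: per-superpixel pixel count, mean CIELab color and geometric center.
    counts = np.zeros(n)
    mean_lab = np.zeros((n, 3))
    centers = np.zeros((n, 2))
    for k in range(n):
        ys, xs = np.nonzero(labels == k)
        counts[k] = ys.size
        mean_lab[k] = lab[ys, xs].mean(axis=0)
        centers[k] = (ys.mean(), xs.mean())

    # Adjacent superpixel sets: labels that touch horizontally or vertically are neighbors.
    neighbours = [set() for _ in range(n)]
    for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
        diff = a != b
        for u, v in zip(a[diff].ravel(), b[diff].ravel()):
            neighbours[int(u)].add(int(v))
            neighbours[int(v)].add(int(u))
    return labels, counts, mean_lab, centers, neighbours
```

The average quantized HSV color of each superpixel can be obtained in the same loop by applying the quantization of step 1.1) to the live image and averaging L inside each superpixel.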
2.3) The superpixels in the superpixel set that contain edge pixels p_e of the real-time image are taken as the edge superpixel set E, i.e., every superpixel S_u in E has at least one edge pixel p_e in its pixel set P_u, where S_u denotes the u-th superpixel in the edge superpixel set E and P_u denotes the pixel set of the u-th superpixel in the edge superpixel set E. The image edge main color of the current real-time image is then calculated based on the edge superpixel set E;
in the step 2.3), calculating the image edge main color of the current real-time image based on the edge super-pixel set E, wherein the image edge main color specifically comprises the following steps:
s1: calculating a color histogram of the edge superpixel set E based on the quantized HSV color space;
S2: the quantized edge colors L_E in the color histogram of the edge superpixel set E satisfy L_E ∈ [0, 71]; the sum of the percentages of the neighborhood colors {L_E−1, L_E, L_E+1} of the current quantized edge color L_E in the color histogram of the edge superpixel set E is taken as the percentage of the current quantized edge color L_E in the color histogram of the edge superpixel set E;
S3: repeating step S2 and traversing all quantized edge colors L_E to calculate their percentages in the color histogram of the edge superpixel set E;
S4: the quantized edge colors identical to main colors in the main color descriptor are deleted from the color histogram; the quantized edge colors whose percentage computed in S3 in the color histogram of the edge superpixel set E is 20% or more are taken as the image edge main color D_EDC of the current real-time image, and the image edge main color is used to describe the background information of the real-time image;
where M is the number of colors in the image edge main color and satisfies M ∈ {0, 1, 2, 3, 4, 5}; each element of D_EDC is an m-th quantized edge color of the image edge dominant colors and satisfies the condition that neither the current quantized edge color nor the colors close to it are main colors of the reference image.
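The edge dominant color computation (S1–S4 above) might look like the following sketch. Building the histogram from each edge superpixel's average quantized color weighted by its pixel count, and excluding the ±1 neighbors of the reference main colors, are interpretation choices rather than details fixed by the text.

```python
import numpy as np

def edge_dominant_colors(quant_color, counts, edge_ids, ref_descriptor, max_m=5):
    # S1: color histogram of the edge superpixel set E in the 72-color quantized space.
    hist = np.zeros(72)
    for k in edge_ids:
        hist[int(quant_color[k])] += counts[k]
    p = hist / max(hist.sum(), 1.0)

    # S2-S3: the percentage of an edge color L_E is the summed percentage of {L_E-1, L_E, L_E+1}.
    smoothed = np.array([p[max(L - 1, 0):min(L + 1, 71) + 1].sum() for L in range(72)])

    # S4: drop colors coinciding with (or adjacent to) reference main colors, keep those >= 20%.
    ref = {L for L, _ in ref_descriptor}
    excluded = ref | {L - 1 for L in ref} | {L + 1 for L in ref}
    d_edc = [L for L in range(72) if smoothed[L] >= 0.20 and L not in excluded]
    return d_edc[:max_m]   # at most M = 5 edge dominant colors, D_EDC
```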
2.4) Calculating the contrast between the current superpixel and every other superpixel based on the pixel set of each superpixel, the average value of each color component of each superpixel in the CIELab color space, and the geometric center of each superpixel; then, based on the average color of the current superpixel in the quantized HSV color space, and combining the contrast between the current superpixel and every other superpixel with the image edge main color, calculating the saliency value of the current superpixel relative to the background information and its contrast-based saliency value through the following formulas:
where Sal_E(S_k) represents the saliency value of superpixel S_k relative to the background information, | | represents the absolute value operation, min represents the minimum value operation, and Sal_C(S_k) represents the contrast-based saliency value of superpixel S_k; the formulas further involve the average color of superpixel S_k in the quantized HSV color space, the m-th quantized edge color of the image edge dominant colors, the luminance and the averages of the first and second color channels of superpixels S_k and S_i in the CIELab color space (the first color channel ranges from dark green at low values through gray at medium values to bright pink at high values, and the second color channel ranges from bright blue at low values through gray at medium values to yellow at high values), the coordinates of the geometric centers of superpixels S_k and S_i in the current real-time image, and λ_pos, a coefficient that adjusts how much the spatial distance affects the contrast-based saliency value; i is the superpixel index in the superpixel set;
2.5) Repeating step 2.4) and traversing the remaining superpixels to calculate the saliency values of the remaining superpixels relative to the background information and their contrast-based saliency values;
2.6 Respectively carrying out normalization processing on the salient values of the super pixels relative to the background information and the salient values based on the contrast ratio, and carrying out linear fusion to obtain the final salient values of the super pixels, wherein the calculation formula is as follows:
Sal(S_k) = λ_Sal1·Sal'_E(S_k) + λ_Sal2·Sal'_C(S_k)
where Sal(S_k) represents the final saliency value of superpixel S_k, Sal'_E(S_k) represents the normalized saliency value of superpixel S_k relative to the background information, Sal'_C(S_k) represents the normalized contrast-based saliency value of superpixel S_k, λ_Sal1 is a first weight coefficient, λ_Sal2 is a second weight coefficient, and λ_Sal1 + λ_Sal2 = 1 is satisfied; in this embodiment, λ_Sal1 = 0.3 and λ_Sal2 = 0.7 are selected.
2.7) After normalizing the final saliency value of each superpixel to the [0, 255] interval, gray-scale values are assigned to the corresponding superpixels based on the normalized final saliency values, giving the saliency map of the current real-time image.
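Because the saliency formulas of steps 2.4)–2.6) are reproduced above only through their surrounding definitions, the sketch below uses plausible forms that match the written description: the background term measures the distance of a superpixel's quantized color to the nearest edge dominant color, and the contrast term accumulates CIELab color differences to all other superpixels, attenuated by spatial distance through λ_pos. Treat the exact expressions as assumptions, not the patented formulas.

```python
import numpy as np

def saliency_values(quant_color, mean_lab, centers, counts, edge_colors,
                    lam_pos=0.04, lam_sal1=0.3, lam_sal2=0.7):
    n = len(quant_color)

    # Saliency relative to background: |c_k - L_E^m| minimized over the edge dominant colors.
    sal_e = np.array([min(abs(int(c) - int(e)) for e in edge_colors) if len(edge_colors) else 0.0
                      for c in quant_color])

    # Contrast-based saliency: color distance to every other superpixel,
    # weighted by its size and attenuated by spatial distance (assumed form).
    sal_c = np.zeros(n)
    for k in range(n):
        d_lab = np.linalg.norm(mean_lab - mean_lab[k], axis=1)
        d_pos = np.linalg.norm(centers - centers[k], axis=1)
        sal_c[k] = np.sum(counts * d_lab * np.exp(-lam_pos * d_pos))

    def norm01(x):
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

    # Step 2.6: linear fusion with lam_sal1 + lam_sal2 = 1; step 2.7: map to [0, 255].
    fused = lam_sal1 * norm01(sal_e) + lam_sal2 * norm01(sal_c)
    return (255 * norm01(fused)).astype(np.uint8)
```

Painting every superpixel of the label map with its gray value then yields the saliency map of step 2.7).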
3) Aiming at the color matching problem, calculating the similarity between each super pixel of the current real-time image and the reference image by using a fuzzy matching method based on the main color descriptor according to the main color descriptor of the reference image and the average color of each super pixel in the quantized HSV color space, and obtaining the similarity value of each super pixel of the current real-time image; wherein, the super pixel with the similarity value of 1 is a region with similar color to the reference image;
the calculation formula of the similarity value of each super pixel of the current real-time image in the step 3) is as follows:
where Sim(S_k) represents the similarity value of superpixel S_k; the formula involves the average color of superpixel S_k in the current real-time image and L_DC ∈ D_DC, where L_DC represents a main color in the main color descriptor of the reference image and D_DC represents the main color descriptor of the reference image; | | denotes the absolute value operation, min() denotes the minimum value operation, and th_Sim is the similarity threshold. When the minimum difference between the average color of S_k and all main colors in D_DC is less than or equal to the threshold, S_k is considered to belong to the color-matched target area. A binary image can be obtained from the superpixel similarities: a superpixel is displayed as white when its similarity value is 1 and as black when its similarity value is 0, and the white portion of the image is the color matching region. In this embodiment, th_Sim = 2 is taken.
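The fuzzy color matching of step 3) reduces to a thresholded nearest-main-color test on the quantized colors; a short sketch with th_Sim = 2, as in this embodiment, follows.

```python
import numpy as np

def color_similarity(quant_color, dominant_descriptor, th_sim=2):
    # Sim(S_k) = 1 when the quantized average color of superpixel S_k differs from the
    # nearest main color L_DC in D_DC by at most th_Sim, otherwise 0.
    main_colors = [L for L, _ in dominant_descriptor]
    sim = np.zeros(len(quant_color), dtype=np.uint8)
    for k, c in enumerate(quant_color):
        if min(abs(int(c) - int(L)) for L in main_colors) <= th_sim:
            sim[k] = 1
    return sim   # painting superpixels with 255 * Sim gives the white/black matching image
```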
4) Aiming at the fusion problem of the saliency map and the color matching region, according to the similarity value of each super pixel of the current real-time image and the adjacent super pixel set of each super pixel, the region fusion expansion method based on the similarity and the saliency value is utilized to expand the region of the saliency map of the current real-time image, and then a possible target region of the current real-time image is obtained;
The step 4) is specifically as follows:
4.1) Whether each superpixel of the current real-time image belongs to the target area is determined: if the similarity value of a superpixel is 1 and its final saliency value in the saliency map of the current real-time image is greater than the initial saliency value threshold (0.2 in this embodiment), the superpixel is taken as a target area to be determined; otherwise it is not a target area to be determined. All superpixels are traversed, and all target areas to be determined form the initial target area of the current real-time image, which can be expressed as:
where Sim(S_j) represents the similarity value of superpixel S_j and describes the similarity of superpixel S_j to the reference image, and Sal(S_j) represents the final saliency value of superpixel S_j and describes the saliency of superpixel S_j in the real-time image; a superpixel S_j that is considered similar to the target object and has a high saliency belongs to the target area being searched for, and th_Sal is the threshold that determines whether the final saliency value satisfies the requirement.
4.2) The initial target area generally cannot contain the entire target object and is therefore extended based on the saliency map. Target area expansion is performed on the adjacent superpixel sets of the superpixels of the initial target area: if the final saliency value of the current adjacent superpixel in the saliency map of the current real-time image is greater than the current saliency value threshold, the current adjacent superpixel belongs to the target area to be expanded; the adjacent superpixel sets of all superpixels of the initial target area are traversed, all target areas to be expanded form an expansion target area, and the current saliency value threshold is increased;
4.3) Target area expansion is performed on the adjacent superpixel sets of the superpixels of the current expansion target area: if the final saliency value of the current adjacent superpixel in the saliency map of the current real-time image is greater than the current saliency value threshold, the current adjacent superpixel belongs to the target area to be expanded; the adjacent superpixel sets of all superpixels of the current expansion target area are traversed, a new expansion target area is obtained, and the current saliency value threshold is increased;
The threshold calculation formula for increasing the current saliency value threshold in steps 4.2) and 4.3) is specifically:
where th_Sal denotes the initial saliency value threshold and th_Sal^t denotes the saliency value threshold after the t-th expansion.
4.4) Repeating step 4.3) until no new expansion target area is generated; the initial target area and the expansion target areas obtained in every expansion are fused to form the possible target area R'_T of the current real-time image, i.e., R'_T is the union of the initial target area and all expansion target areas.
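Steps 4.1)–4.4) amount to seeded region growing over the superpixel adjacency graph; the sketch below follows that logic. The threshold update rule is not reproduced above, so raising the threshold by a fixed factor after each expansion round is an assumed schedule.

```python
def grow_target_region(sim, sal_gray, neighbours, th_sal=0.2, growth=1.2):
    sal = sal_gray / 255.0                        # normalized final saliency values
    # 4.1: initial target area = similar (Sim = 1) and sufficiently salient superpixels.
    region = {k for k in range(len(sim)) if sim[k] == 1 and sal[k] > th_sal}
    frontier, th = set(region), th_sal
    # 4.2-4.4: repeatedly absorb salient neighbors and raise the threshold,
    # until no new expansion target area is produced.
    while frontier:
        new = {nb for k in frontier for nb in neighbours[k]
               if nb not in region and sal[nb] > th}
        region |= new
        frontier = new
        th = min(th * growth, 1.0)                # assumed threshold update after each expansion
    return region                                 # superpixel indices of the possible target area
```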
5) And carrying out preliminary screening on a possible target area of the current real-time image based on the area, and further carrying out regional screening on the preliminary screening target area by utilizing texture feature descriptors and Fourier descriptors of the reference image to obtain a final target area of the current real-time image.
The step 5) is specifically as follows:
5.1) The possible target area of the current real-time image consists of a plurality of connected regions, and the number of pixels in a connected region is used to represent the size of the region; the pixel number of each connected region is calculated, and the connected regions whose pixel number exceeds a preset proportion of the total number of pixels of the current real-time image are taken as the preliminary screening target area; in a specific implementation, the preset proportion is 20%.
5.2) Calculating the texture feature descriptor and the Fourier descriptor of each connected region of the preliminary screening target area with the methods of step 1); respectively calculating the first Euclidean distance between the texture feature descriptor of each connected region of the preliminary screening target area and the texture feature descriptor of the reference image, and the second Euclidean distance between the Fourier descriptor of each connected region of the preliminary screening target area and the Fourier descriptor of the reference image; and calculating the degree of difference between each connected region of the preliminary screening target area and the reference image based on the first and second Euclidean distances of each connected region of the preliminary screening target area, with the following calculation formula:
where Diff(R_q) represents the degree of difference between the connected region R_q of the preliminary screening target area and the reference image; the formula combines the first Euclidean distance and the second Euclidean distance of the connected region R_q of the preliminary screening target area, with λ_Diff representing a third weight coefficient that adjusts the relative weight of the two distances; in this embodiment, λ_Diff = 0.5;
5.3) The connected region with the smallest degree of difference is taken as the final target region of the current real-time image.
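The screening of step 5) can be sketched as follows, reusing the descriptor routines of step 1). The 20% area ratio and λ_Diff = 0.5 follow this embodiment; the convex combination of the two Euclidean distances is an assumed form of the difference degree, and the regions argument is a hypothetical structure holding each connected region's pixel count, grayscale crop and binary mask.

```python
import numpy as np

def select_target_region(regions, ref_texture, ref_fourier,
                         texture_fn, fourier_fn, total_pixels,
                         area_ratio=0.20, lam_diff=0.5):
    # 5.1: keep connected regions covering at least area_ratio of the live image.
    candidates = [r for r in regions if r['n_pixels'] >= area_ratio * total_pixels]

    best, best_diff = None, np.inf
    for r in candidates:
        # 5.2: Euclidean distances between the region descriptors and the reference descriptors.
        d_t = np.linalg.norm(texture_fn(r['gray']) - ref_texture)
        d_f = np.linalg.norm(fourier_fn(r['mask']) - ref_fourier)
        diff = lam_diff * d_t + (1 - lam_diff) * d_f   # assumed fusion of the two distances
        # 5.3: the connected region with the smallest degree of difference is the final target.
        if diff < best_diff:
            best, best_diff = r, diff
    return best
```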
The above is only a technical idea of the present invention, and the protection scope of the present invention is not limited by the above, and any modification made on the basis of the technical scheme according to the technical idea of the present invention falls within the protection scope of the claims of the present invention.

Claims (8)

1. The image target area detection method based on the visual attention mechanism is characterized by comprising the following steps of:
1) Calculating a main color descriptor, a texture feature descriptor and a Fourier descriptor of the reference image;
2) Performing superpixel segmentation on a real-time image by using a simple linear iterative clustering algorithm to obtain a plurality of superpixels of the current real-time image, performing regional description on the current real-time image information based on each superpixel to obtain the average color of each superpixel in a quantized HSV color space and an adjacent superpixel set of each superpixel, calculating the salient values of each superpixel by using a bottom-up data driving method, and forming a salient map of the current real-time image by the salient values of each superpixel;
3) Calculating the similarity between each super pixel of the current real-time image and the reference image by using a fuzzy matching method based on the main color descriptor according to the main color descriptor of the reference image and the average color of each super pixel in the quantized HSV color space, and obtaining the similarity value of each super pixel of the current real-time image;
4) According to the similarity value of each super pixel of the current real-time image and the adjacent super pixel set of each super pixel, performing region expansion on the salient image of the current real-time image by using a region fusion expansion method based on the similarity and the salient value, and then obtaining a possible target region of the current real-time image;
5) And carrying out preliminary screening on a possible target area of the current real-time image based on the area, and further carrying out regional screening on the preliminary screening target area by utilizing texture feature descriptors and Fourier descriptors of the reference image to obtain a final target area of the current real-time image.
2. The method for detecting an image target area based on a visual attention mechanism according to claim 1, wherein the step 1) specifically comprises:
1.1 Converting the colors of the reference image from the RGB color space to the HSV color space, and quantizing the HSV color space of the reference image to obtain color values of each pixel in the quantized HSV color space;
1.2) Calculating a color histogram of the reference image based on the color values of the pixels in the quantized HSV color space, calculating the percentage of each quantized color from the color histogram, and arranging the quantized colors in descending order of percentage; the first quantized color is taken as the first main color and recorded as (L_1, p_1), where L_1 represents the quantized value of the first main color and p_1 represents the percentage of the first main color in the color histogram; the n-th quantized color is then selected in turn, and when its percentage p_n in the color histogram satisfies p_n > 0.5·p_1, that quantized color is taken as the n-th main color and recorded as (L_n, p_n), where L_n represents the quantized value of the n-th main color; the selection stops when no remaining quantized color satisfies p_n > 0.5·p_1 or when the sum of the percentages of the selected main colors in the color histogram reaches the preset limit, where N represents the number of main colors; the N main colors form the main color descriptor D_DC;
1.3 A gray level co-occurrence matrix of the reference image is calculated, and corresponding angular second moment, inverse difference moment, entropy and contrast are calculated based on the gray level co-occurrence matrix and used as texture feature descriptors of the reference image;
1.4 The complex form of the reference image is subjected to discrete Fourier transformation and normalization to obtain a Fourier descriptor of the reference image.
3. The method for detecting an image target area based on a visual attention mechanism according to claim 1, wherein said step 2) specifically comprises:
2.1) Performing superpixel segmentation on the real-time image through a simple linear iterative clustering algorithm to obtain a plurality of superpixels of the real-time image and the corresponding label map, wherein the superpixels form a superpixel set;
2.2) Processing the superpixel set based on the label map to obtain the pixel set, color features and position information of each superpixel in the superpixel set; wherein the color features are the average value of each color component of each superpixel in the CIELab color space and the average color in the quantized HSV color space; the position information is the geometric center of each superpixel and the adjacent superpixel set of each superpixel;
2.3) Taking the superpixels in the superpixel set that contain edge pixels p_e of the real-time image as an edge superpixel set E, and calculating the image edge main color of the current real-time image based on the edge superpixel set E;
2.4) Calculating the contrast between the current superpixel and every other superpixel based on the pixel set of each superpixel, the average value of each color component of each superpixel in the CIELab color space, and the geometric center of each superpixel; then, based on the average color of the current superpixel in the quantized HSV color space, and combining the contrast between the current superpixel and every other superpixel with the image edge main color, calculating the saliency value of the current superpixel relative to the background information and its contrast-based saliency value through the following formulas:
where Sal_E(S_k) represents the saliency value of superpixel S_k relative to the background information, | | represents the absolute value operation, min represents the minimum value operation, and Sal_C(S_k) represents the contrast-based saliency value of superpixel S_k; the formulas further involve the average color of superpixel S_k in the quantized HSV color space, the m-th quantized edge color of the image edge dominant colors, the luminance and the averages of the first and second color channels of superpixels S_k and S_i in the CIELab color space, the coordinates of the geometric centers of superpixels S_k and S_i in the current real-time image, and λ_pos, a coefficient that adjusts how much the spatial distance affects the contrast-based saliency value; i is the superpixel index in the superpixel set;
2.5) Repeating step 2.4) and traversing the remaining superpixels to calculate the saliency values of the remaining superpixels relative to the background information and their contrast-based saliency values;
2.6 Respectively carrying out normalization processing on the salient values of the super pixels relative to the background information and the salient values based on the contrast ratio, and carrying out linear fusion to obtain the final salient values of the super pixels, wherein the calculation formula is as follows:
Sal(S_k) = λ_Sal1·Sal'_E(S_k) + λ_Sal2·Sal'_C(S_k)
where Sal(S_k) represents the final saliency value of superpixel S_k, Sal'_E(S_k) represents the normalized saliency value of superpixel S_k relative to the background information, Sal'_C(S_k) represents the normalized contrast-based saliency value of superpixel S_k, λ_Sal1 is a first weight coefficient, λ_Sal2 is a second weight coefficient, and λ_Sal1 + λ_Sal2 = 1 is satisfied;
2.7) After normalizing the final saliency value of each superpixel to the [0, 255] interval, gray-scale values are assigned to the corresponding superpixels based on the normalized final saliency values, giving the saliency map of the current real-time image.
4. A visual attention mechanism-based image target area detection method as recited in claim 3, wherein the calculating of the image edge dominant color of the current real-time image based on the edge super-pixel set E in the step 2.3) specifically includes:
s1: calculating a color histogram of the edge superpixel set E based on the quantized HSV color space;
S2: the quantized edge colors L_E in the color histogram of the edge superpixel set E satisfy L_E ∈ [0, 71]; the sum of the percentages of the neighborhood colors {L_E−1, L_E, L_E+1} of the current quantized edge color L_E in the color histogram of the edge superpixel set E is taken as the percentage of the current quantized edge color L_E in the color histogram of the edge superpixel set E;
S3: repeating step S2 and traversing all quantized edge colors L_E to calculate their percentages in the color histogram of the edge superpixel set E;
S4: deleting from the color histogram the quantized edge colors that are identical to main colors in the main color descriptor, and taking the quantized edge colors whose percentage computed in S3 in the color histogram of the edge superpixel set E is greater than or equal to 20% as the image edge main color of the current real-time image.
5. The method for detecting an image target area based on a visual attention mechanism according to claim 1, wherein the calculation formula of the similarity value of each super pixel of the current real-time image in the step 3) is as follows:
where Sim(S_k) represents the similarity value of superpixel S_k; the formula involves the average color of superpixel S_k in the current real-time image and L_DC ∈ D_DC, where L_DC represents a main color in the main color descriptor of the reference image and D_DC represents the main color descriptor of the reference image; | | denotes the absolute value operation, min() denotes the minimum value operation, and th_Sim is the similarity threshold.
6. The method for detecting an image target area based on a visual attention mechanism according to claim 1, wherein said step 4) specifically comprises:
4.1) Judging whether each superpixel of the current real-time image belongs to the target area: if the similarity value of a superpixel is 1 and its final saliency value in the saliency map of the current real-time image is greater than the initial saliency value threshold, the superpixel is taken as a target area to be determined; otherwise it is not a target area to be determined. All superpixels are traversed, and all target areas to be determined form the initial target area of the current real-time image;
4.2) Performing target area expansion on the adjacent superpixel sets of the superpixels of the initial target area: if the final saliency value of the current adjacent superpixel in the saliency map of the current real-time image is greater than the current saliency value threshold, the current adjacent superpixel belongs to the target area to be expanded; the adjacent superpixel sets of all superpixels of the initial target area are traversed, all target areas to be expanded form an expansion target area, and the current saliency value threshold is increased;
4.3) Performing target area expansion on the adjacent superpixel sets of the superpixels of the current expansion target area: if the final saliency value of the current adjacent superpixel in the saliency map of the current real-time image is greater than the current saliency value threshold, the current adjacent superpixel belongs to the target area to be expanded; the adjacent superpixel sets of all superpixels of the current expansion target area are traversed, a new expansion target area is obtained, and the current saliency value threshold is increased;
4.4 Repeating the step 4.3) until no new expansion target area is generated, and fusing the initial target area and all expansion target areas to form a possible target area of the current real-time image.
7. The method for detecting an image target area based on a visual attention mechanism according to claim 6, wherein the threshold calculation formulas for increasing the current saliency value threshold in the steps 4.2) and 4.3) are specifically:
where th_Sal denotes the initial saliency value threshold and th_Sal^t denotes the saliency value threshold after the t-th expansion.
8. The method for detecting an image target area based on a visual attention mechanism according to claim 1, wherein said step 5) specifically comprises:
5.1 The possible target area of the current real-time image consists of a plurality of connected areas, the pixel number of each connected area is calculated, and the connected areas with the pixel number more than the preset proportion of the total number of the pixels of the current real-time image are taken as primary screening target areas;
5.2 Calculating texture feature descriptors and Fourier descriptors of all connected areas of the preliminary screening target area, respectively calculating a first Euclidean distance between the texture feature descriptors of all connected areas of the preliminary screening target area and the texture feature descriptors of the reference image and a second Euclidean distance between the Fourier descriptors of all connected areas of the preliminary screening target area and the Fourier descriptors of the reference image, and calculating the difference degree between all connected areas of the preliminary screening target area and the reference image based on the first Euclidean distance and the second Euclidean distance of all connected areas of the preliminary screening target area, wherein the calculation formula is as follows:
where Diff(R_q) represents the degree of difference between the connected region R_q of the preliminary screening target area and the reference image; the formula combines the first Euclidean distance and the second Euclidean distance of the connected region R_q of the preliminary screening target area, with λ_Diff representing a third weight coefficient;
5.3 The connected region with the smallest difference degree is used as the final target region of the current real-time image.
CN202210021568.6A 2022-01-10 2022-01-10 Image target area detection method based on visual attention mechanism Active CN114359323B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210021568.6A CN114359323B (en) 2022-01-10 2022-01-10 Image target area detection method based on visual attention mechanism


Publications (2)

Publication Number Publication Date
CN114359323A CN114359323A (en) 2022-04-15
CN114359323B (en) 2024-07-05

Family

ID=81109197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210021568.6A Active CN114359323B (en) 2022-01-10 2022-01-10 Image target area detection method based on visual attention mechanism

Country Status (1)

Country Link
CN (1) CN114359323B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115439688B (en) * 2022-09-01 2023-06-16 哈尔滨工业大学 Weak supervision object detection method based on surrounding area sensing and association
CN116645368B (en) * 2023-07-27 2023-10-03 青岛伟东包装有限公司 Online visual detection method for edge curl of casting film
CN117173175B (en) * 2023-11-02 2024-02-09 湖南格尔智慧科技有限公司 Image similarity detection method based on super pixels

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107464256A (en) * 2017-07-05 2017-12-12 河海大学 A kind of target detection differentiates the correlating method of amendment with possibility
WO2020133170A1 (en) * 2018-12-28 2020-07-02 深圳市大疆创新科技有限公司 Image processing method and apparatus


Also Published As

Publication number Publication date
CN114359323A (en) 2022-04-15


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: 316021 Zhoushan campus of Zhejiang University, No.1 Zheda Road, Dinghai District, Zhoushan City, Zhejiang Province
Applicant after: ZHEJIANG University
Country or region after: China
Address before: 310058 Yuhang Tang Road, Xihu District, Hangzhou, Zhejiang 866
Applicant before: ZHEJIANG University
Country or region before: China
GR01 Patent grant