CN114359323A - Image target area detection method based on visual attention mechanism

Image target area detection method based on visual attention mechanism

Info

Publication number
CN114359323A
CN114359323A
Authority
CN
China
Prior art keywords
color
super
pixel
image
value
Prior art date
Legal status
Pending
Application number
CN202210021568.6A
Other languages
Chinese (zh)
Inventor
黄方昊
江佳诚
杨霄
陈正
聂勇
唐建中
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202210021568.6A
Publication of CN114359323A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an image target area detection method based on a visual attention mechanism. The method comprises the following steps: first, calculating the bottom-layer visual feature descriptors of a reference image; second, performing superpixel segmentation on the real-time image with the SLIC method, describing the current real-time image information region by region, calculating the saliency value of each superpixel with a bottom-up data-driven method, and forming the saliency map of the real-time image from the saliency values of the superpixels; then, calculating the similarity between each superpixel of the real-time image and the reference image with a fuzzy matching method; then, expanding the saliency map of the real-time image with a region fusion and expansion method based on similarity and saliency values to obtain the possible target region of the real-time image; and finally, screening the possible target region of the real-time image to obtain the final target region of the current real-time image. The invention improves the accuracy of target area detection in complex environments while ensuring real-time operation on low-computing-power devices.

Description

Image target area detection method based on visual attention mechanism
Technical Field
The invention relates to methods for detecting the target area of an image in the field of computer vision, and in particular to an image target area detection method based on a visual attention mechanism.
Background
Target area detection is an important image processing technology in computer vision. Obtaining the target area through detection allows computing resources to be concentrated, reduces the interference of irrelevant information, and improves the efficiency and accuracy of image processing. Research on target area detection algorithms has long been a popular direction in the field of computer vision.
Early target region detection algorithms were built on handcrafted features: the information of an image region is described by a manually designed feature description method and detection relies on simple schemes such as sliding-window detectors, for example the VJ detector proposed by P. Viola and M. Jones and the HOG detector proposed by N. Dalal and B. Triggs. On this basis, P. Felzenszwalb et al. proposed a detection model based on deformable parts, which uses a mixture model to detect and handle objects that may exhibit significant appearance changes and achieves good results among traditional target region detection algorithms.
With the development of convolutional neural networks and deep learning, target region detection algorithms based on deep learning have entered a new stage. According to the processing flow they are further divided into two-stage and one-stage detection methods. Two-stage methods, which first generate candidate boxes and then detect them with a trained model, include R-CNN and Fast R-CNN proposed by R. Girshick et al., SPPNet proposed by K. He et al., and the Faster R-CNN detector proposed by S. Ren et al. One-stage methods, which apply the neural network to the whole picture for detection, include YOLO proposed by J. Redmon et al. and SSD proposed by W. Liu et al.
Target area detection can thus be realized with methods based on either traditional features or neural networks, but both have drawbacks: traditional target area detection algorithms require relatively complex feature description schemes and their detection accuracy is low, while neural-network-based detection algorithms all require large datasets and long training times, demand high computing power at detection time, and are therefore ill-suited to low-computing-power mobile devices such as augmented reality devices.
Disclosure of Invention
To address the difficulty of applying current target area detection algorithms on low-computing-power mobile devices, the invention provides an image target area detection method based on a visual attention mechanism.
In order to achieve the purpose, the technical scheme of the invention comprises the following specific contents:
the invention comprises the following steps:
1) calculating a main color descriptor, a texture feature descriptor and a Fourier descriptor of a reference image;
2) performing superpixel segmentation on a real-time image by using a simple linear iterative clustering algorithm to obtain a plurality of superpixels of the current real-time image, performing regional description on current real-time image information based on each superpixel to obtain the average color of each superpixel in a quantized HSV color space and an adjacent superpixel set of each superpixel, calculating the significant value of each superpixel by using a bottom-up data driving method, and forming a significant image of the current real-time image by using the significant value of each superpixel;
3) calculating the similarity between each super pixel of the current real-time image and the reference image by using a fuzzy matching method based on the main color descriptor according to the average color of the main color descriptor and each super pixel of the reference image in the quantized HSV color space, and obtaining the similarity value of each super pixel of the current real-time image;
4) according to the similarity value of each super pixel of the current real-time image and an adjacent super pixel set of each super pixel, carrying out region expansion on a saliency map of the current real-time image by using a region fusion expansion method based on the similarity and the saliency value to obtain a possible target region of the current real-time image;
5) and carrying out preliminary screening on a possible target region of the current real-time image based on the region area, and further carrying out regional screening on the preliminary screening target region by utilizing a textural feature descriptor and a Fourier descriptor of a reference image to obtain a final target region of the current real-time image.
The step 1) is specifically as follows:
1.1) converting the color of the reference image from an RGB color space to an HSV color space, and quantizing the HSV color space of the reference image to obtain the color value of each pixel in the reference image in the quantized HSV color space;
1.2) calculating a color histogram of the reference image based on the color values of all pixels of the reference image in the quantized HSV color space, computing the percentage of each quantized color from the histogram and sorting the quantized colors in descending order; the first quantized color is taken as the first main color and recorded as (L_1, p_1), where L_1 is the first main color quantization value and p_1 is the percentage of the first main color in the color histogram; the nth quantized color is then selected in turn, and if its percentage p_n in the color histogram satisfies p_n > 0.5·p_1, it is taken as the nth main color and recorded as (L_n, p_n), where L_n is the nth main color quantization value; the selection stops when p_n > 0.5·p_1 is no longer satisfied or when the cumulative percentage of the selected main colors in the color histogram reaches a preset upper limit. Here N denotes the number of main colors, and the selected main colors together constitute the main color descriptor D_DC;
1.3) calculating a gray level co-occurrence matrix of the reference image, calculating corresponding angle second moment, inverse difference moment, entropy and contrast based on the gray level co-occurrence matrix and using the corresponding angle second moment, inverse difference moment, entropy and contrast as texture feature descriptors of the reference image;
1.4) obtaining a Fourier descriptor of the reference image after discrete Fourier transform and normalization of the complex form of the reference image.
The step 2) is specifically as follows:
2.1) performing superpixel segmentation on the real-time image through a simple linear iterative clustering algorithm to obtain a plurality of superpixels of the real-time image and corresponding label images, and forming a superpixel set by the plurality of superpixels;
2.2) processing the super-pixel set based on the label graph to obtain the pixel set, the color characteristic and the position information of each super-pixel in the super-pixel set; wherein the color features are an average value of each color component of each super-pixel in a CIELab color space and an average color in a quantized HSV color space; the position information is a geometric center of each super pixel and an adjacent super pixel set of each super pixel;
2.3) taking the superpixels in the superpixel set that contain edge pixels p_e of the real-time image as an edge superpixel set E, and calculating the image edge main colors of the current real-time image based on the edge superpixel set E;
2.4) calculating the contrast between the current superpixel and every other superpixel based on the pixel set of each superpixel, the average values of its color components in the CIELab color space and its geometric center; then, based on the average color of the current superpixel in the quantized HSV color space, and combining the contrast between the current superpixel and every other superpixel with the image edge main colors, calculating the saliency value of the current superpixel with respect to the background information and its contrast-based saliency value:

Sal_E(S_k) = min_m | L̄(S_k) − L_E^m |

where Sal_E(S_k) is the saliency value of superpixel S_k with respect to the background information, | · | denotes the absolute value, min denotes the minimum over the image edge main colors, L̄(S_k) is the average color of S_k in the quantized HSV color space and L_E^m is the mth quantized edge color among the image edge main colors; the contrast-based saliency value Sal_C(S_k) is obtained by accumulating, over all other superpixels S_i, the difference between the CIELab averages (l̄_k, ā_k, b̄_k) of S_k and (l̄_i, ā_i, b̄_i) of S_i (lightness and the first and second color channels), attenuated according to the spatial distance between the geometric centers (x_k, y_k) of S_k and (x_i, y_i) of S_i in the current real-time image; λ_pos is the coefficient that adjusts how strongly the spatial distance affects the contrast-based saliency value, and i is the superpixel index in the superpixel set;
2.5) repeating the step 2.4), traversing the residual superpixels, and calculating and obtaining significant values of the residual superpixels relative to background information and significant values based on contrast;
2.6) normalizing the saliency value with respect to the background information and the contrast-based saliency value of each superpixel and linearly fusing them to obtain the final saliency value of each superpixel:

Sal(S_k) = λ_Sal1·Sal'_E(S_k) + λ_Sal2·Sal'_C(S_k)

where Sal(S_k) is the final saliency value of superpixel S_k, Sal'_E(S_k) is its normalized saliency value with respect to the background information, Sal'_C(S_k) is its normalized contrast-based saliency value, λ_Sal1 is a first weight coefficient and λ_Sal2 is a second weight coefficient satisfying λ_Sal1 + λ_Sal2 = 1;
2.7) normalizing the final saliency value of each superpixel to the interval [0, 255] and assigning it as the gray value of the corresponding superpixel to obtain the saliency map of the current real-time image.
The step 2.3) of calculating the image edge dominant color of the current real-time image based on the edge superpixel set E specifically comprises the following steps:
s1: calculating a color histogram of the edge superpixel set E based on the quantized HSV color space;
s2: for each quantized edge color L_E in the color histogram of the edge superpixel set E, with L_E ∈ [0, 71], taking the sum of the percentages of its neighborhood colors {L_E − 1, L_E, L_E + 1} in that histogram as the percentage of the current quantized edge color L_E in the color histogram of the edge superpixel set E;
s3: repeating step S2 until the percentages of all quantized edge colors L_E in the color histogram of the edge superpixel set E have been calculated;
s4: deleting from the histogram the quantized edge colors that are identical to a main color in the main color descriptor, and taking the quantized edge colors whose percentage in the color histogram of the edge superpixel set E obtained in step S3 is greater than or equal to 20% as the image edge main colors of the current real-time image.
The similarity value of each superpixel of the current real-time image in step 3) is calculated as:

Sim(S_k) = 1 if min_{L_DC ∈ D_DC} | L̄(S_k) − L_DC | ≤ th_Sim, and Sim(S_k) = 0 otherwise

where Sim(S_k) is the similarity value of superpixel S_k, L̄(S_k) is the average color of superpixel S_k of the current real-time image in the quantized HSV color space, L_DC is a main color in the main color descriptor D_DC of the reference image, | · | denotes the absolute value, min denotes the minimum and th_Sim is the similarity threshold.
The step 4) is specifically as follows:
4.1) judging whether each super pixel of the current real-time image is a target area, if the similarity value of each super pixel is 1 and the final significant value of the current super pixel in the significant image of the current real-time image is larger than the initial significant value threshold, taking the current super pixel as an undetermined target area, otherwise, not determining the undetermined target area; traversing all super pixels, and forming an initial target area of the current real-time image by all the target areas to be determined;
4.2) carrying out target area expansion on the adjacent superpixel set of each superpixel of the initial target area, if the final significant value of the current adjacent superpixel in the significant image of the current real-time image is greater than the current significant value threshold value, the current adjacent superpixel belongs to the target area to be expanded, traversing the adjacent superpixel set of each superpixel of the initial target area, forming an expansion target area by all target areas to be expanded, and improving the current significant value threshold value;
4.3) carrying out target area expansion on the adjacent superpixel set of each superpixel of the current expanded target area, if the final significant value of the current adjacent superpixel in the significant image of the current real-time image is greater than the current significant value threshold value, the current adjacent superpixel belongs to the target area to be expanded, traversing the adjacent superpixel set of each superpixel of the initial target area, obtaining a new expanded target area and improving the current significant value threshold value;
4.4) repeating the step 4.3) until no new expansion target area is generated, and fusing the initial target area and all the expansion target areas to form a possible target area of the current real-time image.
In steps 4.2) and 4.3) the current saliency threshold is raised after each expansion: th_Sal denotes the initial saliency threshold and th_Sal^(t) denotes the saliency threshold after the tth expansion, which increases with the number of expansions t.
The step 5) is specifically as follows:
5.1) the possible target area of the current real-time image consists of a plurality of connected areas, the pixel number of each connected area is calculated, and the connected areas with the pixel number accounting for more than the preset proportion of the total pixel number of the current real-time image are taken as primary screening target areas;
5.2) calculating the texture feature descriptor and the Fourier descriptor of each connected region of the preliminary screening target region, computing the first Euclidean distance between the texture feature descriptor of each connected region and that of the reference image and the second Euclidean distance between the Fourier descriptor of each connected region and that of the reference image, and calculating the difference between each connected region of the preliminary screening target region and the reference image from these two distances:

Diff(R_q) = λ_Diff·d_T(R_q) + (1 − λ_Diff)·d_F(R_q)

where Diff(R_q) is the difference between connected region R_q of the preliminary screening target region and the reference image, d_T(R_q) is the first Euclidean distance of connected region R_q, d_F(R_q) is its second Euclidean distance and λ_Diff is a third weight coefficient;
and 5.3) taking the connected region with the minimum difference as the final target region of the current real-time image.
Compared with the prior art, the invention has the following beneficial effects:
1. the method for simulating the human visual system to search the object is based on the visual attention mechanism, and searches possible target areas from two angles of bottom-up (significance detection) and top-down (color matching based on the main color descriptor), so that the accuracy of target area detection in a complex environment is improved.
2. The method performs target area fusion, expansion and screening on the basis of significance detection and color matching based on the main color descriptor, and then performs further screening by using the texture feature descriptor and the Fourier descriptor, thereby improving the detection accuracy.
3. The invention uses the bottom layer visual characteristic descriptor of the image to detect the target area, improves the detection and calculation efficiency of the target area and ensures the real-time performance on low-computing-power equipment.
Drawings
Fig. 1 is a flow chart of a target area detection algorithm based on a visual attention mechanism proposed by the present invention.
Fig. 2 is an example flow and result diagram of the target area detection algorithm based on visual attention mechanism proposed in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention will now be further described with reference to the following examples and drawings:
the implementation technical scheme of the invention is as follows:
as shown in fig. 1 and 2, the present invention includes the following steps:
1) calculating the main color descriptor, texture feature descriptor and Fourier descriptor of the reference image to describe its color, texture and shape respectively; together these form the bottom-layer visual feature descriptor. The reference image is an image of the target object to be located in the real-time image.
Extracting color, texture and shape features in a reference image based on pixels, wherein the step 1) specifically comprises the following steps:
for color features, a main color selection method based on a quantized HSV color space is provided, and the main color features of a reference image are described more accurately.
1.1) converting the color of the reference image from the RGB color space to the HSV color space and quantizing the HSV color space of the reference image to obtain the color value of each pixel of the reference image in the quantized HSV color space: the hue component h is non-uniformly quantized into 8 levels H ∈ {0, 1, …, 7}, and the saturation and value components s and v are each quantized into 3 levels S, V ∈ {0, 1, 2}; the quantized color value of each pixel is then

L = 9H + 3S + V

where h, s, v are the hue, saturation and value components of the pixel in the HSV color space, H, S, V are the corresponding quantized components, and L ∈ [0, 71] is the color value of the current pixel in the quantized HSV color space.
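A minimal sketch of this quantization in Python is given below. The patent fixes only the level counts and the combination rule L = 9H + 3S + V; the specific bin edges used here (a non-uniform hue split and saturation/value cut points at 0.2 and 0.7) are illustrative assumptions.

```python
import numpy as np
import cv2

def quantize_hsv(bgr_image):
    """Quantize an image into the 72-level HSV color space (L = 9H + 3S + V).

    The 8/3/3 level split follows L = 9H + 3S + V with L in [0, 71];
    the exact bin edges below are illustrative assumptions.
    """
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    h = hsv[..., 0].astype(np.float32) * 2.0      # OpenCV hue is [0, 180); rescale to degrees
    s = hsv[..., 1].astype(np.float32) / 255.0
    v = hsv[..., 2].astype(np.float32) / 255.0

    # Hue: 8 sectors (assumed, non-uniform edges; the last sector wraps around to red)
    h_edges = [20, 40, 75, 155, 190, 270, 295, 316]
    H = np.digitize(h, h_edges) % 8               # values in {0, ..., 7}

    # Saturation and value: 3 levels each (assumed cut points 0.2 and 0.7)
    S = np.digitize(s, [0.2, 0.7])                # values in {0, 1, 2}
    V = np.digitize(v, [0.2, 0.7])

    return 9 * H + 3 * S + V                      # per-pixel color value L in [0, 71]
```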
1.2) calculating a color histogram of the reference image based on the color values of all pixels of the reference image in the quantized HSV color space, computing the percentage of each quantized color from the histogram and sorting the quantized colors in descending order; the first quantized color (i.e. the quantized color with the highest percentage) is taken as the first main color and recorded as (L_1, p_1), where L_1 is the first main color quantization value and p_1 is the percentage of the first main color in the color histogram; the nth quantized color is then selected in turn, and if its percentage p_n in the color histogram satisfies p_n > 0.5·p_1, it is taken as the nth main color and recorded as (L_n, p_n), where L_n is the nth main color quantization value; the selection stops when p_n > 0.5·p_1 is no longer satisfied or when the cumulative percentage of the selected main colors in the color histogram reaches a preset upper limit. Here N denotes the number of main colors, and the main colors constitute the main color descriptor

D_DC = {(L_n, p_n), n = 1, 2, …, N}
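A sketch of this dominant-color selection, assuming the quantize_hsv helper above and a cumulative-percentage cap (0.9 here, an assumed value since the patent leaves the cap to the formula image):

```python
import numpy as np

def dominant_color_descriptor(L_ref, cum_cap=0.9):
    """Build the main color descriptor D_DC = [(L_n, p_n), ...] from the
    quantized color values L_ref of the reference image (ints in [0, 71])."""
    hist = np.bincount(L_ref.ravel(), minlength=72).astype(np.float64)
    perc = hist / hist.sum()                      # percentage of each quantized color
    order = np.argsort(perc)[::-1]                # descending by percentage

    d_dc = [(int(order[0]), float(perc[order[0]]))]   # first main color (L_1, p_1)
    p1 = perc[order[0]]
    total = p1
    for idx in order[1:]:
        if perc[idx] <= 0.5 * p1 or total >= cum_cap:
            break                                 # stop: p_n > 0.5*p_1 violated or cap reached
        d_dc.append((int(idx), float(perc[idx])))
        total += perc[idx]
    return d_dc
```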
1.3) calculating a gray level co-occurrence matrix of the reference image, calculating corresponding angle second moment, inverse difference moment, entropy and contrast based on the gray level co-occurrence matrix and using the corresponding angle second moment, inverse difference moment, entropy and contrast as texture feature descriptors of the reference image;
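A sketch of the texture feature descriptor using a gray-level co-occurrence matrix; scikit-image provides the angular second moment ('ASM'), inverse difference moment ('homogeneity') and contrast, while entropy is computed directly from the matrix. The distance and angle settings are assumptions.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def texture_descriptor(gray_u8):
    """GLCM-based texture descriptor: [ASM, inverse difference moment, entropy, contrast]."""
    glcm = graycomatrix(gray_u8, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    asm = graycoprops(glcm, 'ASM').mean()            # angular second moment
    idm = graycoprops(glcm, 'homogeneity').mean()    # inverse difference moment
    contrast = graycoprops(glcm, 'contrast').mean()

    ent = 0.0
    for a in range(glcm.shape[-1]):                  # entropy, averaged over angles
        p = glcm[..., 0, a]
        nz = p[p > 0]
        ent += -np.sum(nz * np.log2(nz))
    entropy = ent / glcm.shape[-1]
    return np.array([asm, idm, entropy, contrast])
```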
1.4) obtaining a Fourier descriptor of the reference image after discrete Fourier transform and normalization of the complex form of the reference image.
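A sketch of the Fourier descriptor, assuming that the "complex form" refers to the object contour sampled as complex numbers x + jy; dropping the DC term removes the translation component and dividing by the first non-DC coefficient magnitude normalizes for scale. The contour extraction and the number of retained coefficients are assumptions.

```python
import numpy as np
import cv2

def fourier_descriptor(mask_u8, n_coeffs=32):
    """Fourier descriptor of the largest contour in a binary mask (uint8, 0/255)."""
    contours, _ = cv2.findContours(mask_u8, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    contour = max(contours, key=cv2.contourArea).squeeze(1)    # (N, 2) boundary points
    z = contour[:, 0].astype(np.float64) + 1j * contour[:, 1]  # complex form x + jy
    spectrum = np.fft.fft(z)
    coeffs = np.abs(spectrum[1:n_coeffs + 1])                  # drop DC (translation)
    return coeffs / (np.abs(spectrum[1]) + 1e-12)              # scale normalization
```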
2) Performing superpixel segmentation on a real-time image by using a Simple Linear Iterative Clustering (SLIC) algorithm to obtain a plurality of superpixels of the current real-time image, performing regional description on current real-time image information based on each superpixel to obtain the average color of each superpixel in a quantized HSV color space and an adjacent superpixel set of each superpixel, calculating the significant value of each superpixel by using a bottom-up data driving method, and forming a significant image of the current real-time image by using the significant value of each superpixel;
the step 2) is specifically as follows:
the bottom-up data driving method extracts bottom visual features of an image, such as color, texture, shape and the like, calculates contrast between image pixels or regions based on the features to obtain a saliency map of corresponding features, and fuses the saliency maps of the features to obtain a final saliency region.
2.1) performing superpixel segmentation on the real-time image with the simple linear iterative clustering algorithm to obtain a plurality of superpixels of the real-time image and the corresponding label map; the superpixels form a superpixel set

S = {S_i, i = 1, 2, …, K}

where K is the total number of superpixels (K = 300 in this embodiment), S_i is the ith superpixel in the superpixel set S and i is the superpixel index;
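A sketch covering step 2.1) and the regional description of step 2.2) below, using scikit-image's SLIC implementation; the 4-neighbour adjacency computation and the compactness value are illustrative assumptions.

```python
import numpy as np
from skimage.segmentation import slic
from skimage.color import rgb2lab

def segment_superpixels(rgb_image, n_segments=300):
    """SLIC superpixel segmentation; returns the label map plus per-superpixel
    CIELab mean color, geometric center and adjacency sets (steps 2.1-2.2)."""
    labels = slic(rgb_image, n_segments=n_segments, compactness=10, start_label=0)
    lab = rgb2lab(rgb_image)
    n = labels.max() + 1

    mean_lab = np.zeros((n, 3))
    centers = np.zeros((n, 2))
    for k in range(n):
        ys, xs = np.nonzero(labels == k)
        mean_lab[k] = lab[ys, xs].mean(axis=0)        # average l, a, b of superpixel k
        centers[k] = (xs.mean(), ys.mean())           # geometric center (x, y)

    # Adjacent superpixel sets: labels that touch horizontally or vertically
    adjacency = [set() for _ in range(n)]
    for a, b in zip(labels[:, :-1].ravel(), labels[:, 1:].ravel()):
        if a != b:
            adjacency[a].add(b); adjacency[b].add(a)
    for a, b in zip(labels[:-1, :].ravel(), labels[1:, :].ravel()):
        if a != b:
            adjacency[a].add(b); adjacency[b].add(a)
    return labels, mean_lab, centers, adjacency
```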
2.2) processing the super-pixel set based on the label graph to obtain the pixel set, the color characteristic and the position information of each super-pixel in the super-pixel set; wherein the color features are an average value of each color component of each super-pixel in a CIELab color space and an average color in a quantized HSV color space; the position information is a geometric center of each super pixel and an adjacent super pixel set of each super pixel;
2.3) taking the superpixels in the superpixel set that contain edge pixels p_e of the real-time image as the edge superpixel set E:

E = {S_u | p_e ∈ P_u}

where S_u is the uth superpixel in the edge superpixel set E and P_u is the pixel set of the uth superpixel in the edge superpixel set E. The image edge main colors of the current real-time image are then calculated based on the edge superpixel set E;
calculating the image edge main color of the current real-time image based on the edge super-pixel set E in the step 2.3), specifically:
s1: calculating a color histogram of the edge superpixel set E based on the quantized HSV color space;
s2: for each quantized edge color L_E in the color histogram of the edge superpixel set E, with L_E ∈ [0, 71], taking the sum of the percentages of its neighborhood colors {L_E − 1, L_E, L_E + 1} in that histogram as the percentage of the current quantized edge color L_E in the color histogram of the edge superpixel set E;
s3: repeating step S2 until the percentages of all quantized edge colors L_E in the color histogram of the edge superpixel set E have been calculated;
s4: deleting from the histogram the quantized edge colors that are identical to a main color in the main color descriptor, and taking the quantized edge colors whose percentage in the color histogram of the edge superpixel set E obtained in step S3 is greater than or equal to 20% as the image edge main colors D_EDC of the current real-time image; the image edge main colors describe the background information of the real-time image:

D_EDC = {L_E^m, m = 1, 2, …, M}

where M is the number of image edge main colors and satisfies M ∈ {0, 1, 2, 3, 4, 5}, and L_E^m is the mth quantized edge color among the image edge main colors; each retained quantized edge color and the colors close to it must not be a main color of the reference image, i.e. {L_E^m − 1, L_E^m, L_E^m + 1} contains no quantization value of the main color descriptor D_DC.
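A sketch of steps S1-S4, assuming the per-pixel quantized color values from the quantization above and the SLIC label map; superpixels touching the image border are taken as the edge set, and the neighborhood constraint follows the description just given.

```python
import numpy as np

def edge_dominant_colors(L_quant, labels, d_dc, min_perc=0.20):
    """Image edge main colors D_EDC (steps S1-S4): histogram of border superpixels,
    neighborhood-summed percentages, 20% threshold, and removal of colors that
    coincide with (or neighbor) the reference main colors d_dc."""
    # S1: superpixels containing image-edge pixels form the edge superpixel set E
    border_labels = np.unique(np.concatenate([labels[0], labels[-1],
                                              labels[:, 0], labels[:, -1]]))
    edge_mask = np.isin(labels, border_labels)
    hist = np.bincount(L_quant[edge_mask], minlength=72).astype(np.float64)
    perc = hist / hist.sum()

    ref_colors = {L for L, _ in d_dc}
    dominant = []
    for L_E in range(72):
        # S2-S3: percentage of the neighborhood {L_E - 1, L_E, L_E + 1}
        neigh = perc[max(L_E - 1, 0): min(L_E + 1, 71) + 1].sum()
        # S4: keep colors covering >= 20% whose neighborhood avoids the reference main colors
        if neigh >= min_perc and L_E not in ref_colors \
                and not ({L_E - 1, L_E + 1} & ref_colors):
            dominant.append(L_E)
    return dominant
```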
2.4) calculating the contrast between the current superpixel and every other superpixel based on the pixel set of each superpixel, the average values of its color components in the CIELab color space and its geometric center; then, based on the average color of the current superpixel in the quantized HSV color space, and combining the contrast between the current superpixel and every other superpixel with the image edge main colors, calculating the saliency value of the current superpixel with respect to the background information and its contrast-based saliency value:

Sal_E(S_k) = min_m | L̄(S_k) − L_E^m |

where Sal_E(S_k) is the saliency value of superpixel S_k with respect to the background information, | · | denotes the absolute value, min denotes the minimum over the image edge main colors, L̄(S_k) is the average color of S_k in the quantized HSV color space and L_E^m is the mth quantized edge color among the image edge main colors; the contrast-based saliency value Sal_C(S_k) is obtained by accumulating, over all other superpixels S_i, the difference between the CIELab averages (l̄_k, ā_k, b̄_k) of S_k and (l̄_i, ā_i, b̄_i) of S_i, where the first color channel runs from dark green (low channel values) through gray to bright pinkish red (high channel values) and the second from bright blue (low channel values) through gray to yellow (high channel values), attenuated according to the spatial distance between the geometric centers (x_k, y_k) of S_k and (x_i, y_i) of S_i in the current real-time image; λ_pos is the coefficient that adjusts how strongly the spatial distance affects the contrast-based saliency value, and i is the superpixel index in the superpixel set;
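A sketch of step 2.4), using the reconstructed forms above: Sal_E as the minimum absolute difference to the edge main colors, and Sal_C as a sum of CIELab color differences attenuated by spatial distance. The exponential attenuation form and the λ_pos value are assumptions.

```python
import numpy as np

def saliency_values(mean_hsv_L, mean_lab, centers, edge_colors, lam_pos=0.25):
    """Per-superpixel background saliency Sal_E and contrast saliency Sal_C (step 2.4).

    mean_hsv_L : (K,) average quantized HSV color value of each superpixel
    mean_lab   : (K, 3) average CIELab color of each superpixel
    centers    : (K, 2) geometric centers, normalized to [0, 1]
    edge_colors: list of image edge main colors
    """
    mean_hsv_L = np.asarray(mean_hsv_L, dtype=np.float64)
    K = len(mean_hsv_L)
    edge = np.asarray(edge_colors, dtype=np.float64)
    # Sal_E: minimum absolute difference to any image edge main color
    sal_e = (np.abs(mean_hsv_L[:, None] - edge[None, :]).min(axis=1)
             if edge.size else np.zeros(K))

    # Sal_C: CIELab contrast to all other superpixels, weighted by spatial proximity
    color_d = np.linalg.norm(mean_lab[:, None, :] - mean_lab[None, :, :], axis=-1)
    spatial_d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    weight = np.exp(-spatial_d / lam_pos)          # assumed attenuation form
    np.fill_diagonal(weight, 0.0)
    sal_c = (color_d * weight).sum(axis=1)
    return sal_e, sal_c
```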
2.5) repeating the step 2.4), traversing the residual superpixels, and calculating and obtaining significant values of the residual superpixels relative to background information and significant values based on contrast;
2.6) normalizing the saliency value with respect to the background information and the contrast-based saliency value of each superpixel and linearly fusing them to obtain the final saliency value of each superpixel:

Sal(S_k) = λ_Sal1·Sal'_E(S_k) + λ_Sal2·Sal'_C(S_k)

where Sal(S_k) is the final saliency value of superpixel S_k, Sal'_E(S_k) is its normalized saliency value with respect to the background information, Sal'_C(S_k) is its normalized contrast-based saliency value, λ_Sal1 is a first weight coefficient and λ_Sal2 is a second weight coefficient satisfying λ_Sal1 + λ_Sal2 = 1; in this embodiment λ_Sal1 = 0.3 and λ_Sal2 = 0.7.
2.7) normalizing the final saliency value of each superpixel to the interval [0, 255] and assigning it as the gray value of the corresponding superpixel to obtain the saliency map of the current real-time image.
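A sketch of steps 2.5)-2.7): normalization, linear fusion with λ_Sal1 = 0.3 and λ_Sal2 = 0.7 from this embodiment, and gray-value assignment to build the saliency map.

```python
import numpy as np

def fuse_saliency(sal_e, sal_c, labels, lam1=0.3, lam2=0.7):
    """Fuse background and contrast saliency (2.6) and render the saliency map (2.7)."""
    def normalize(x):
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

    sal = lam1 * normalize(sal_e) + lam2 * normalize(sal_c)   # final per-superpixel saliency
    gray = np.round(normalize(sal) * 255).astype(np.uint8)    # map to [0, 255]
    return sal, gray[labels]                                  # saliency map as an image
```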
3) Aiming at the color matching problem, according to the average color of the main color descriptor and each superpixel of the reference image in the quantized HSV color space, calculating the similarity between each superpixel of the current real-time image and the reference image by using a fuzzy matching method based on the main color descriptor, and obtaining the similarity value of each superpixel of the current real-time image; wherein, the super pixel with the similarity value of 1 is a region with the color similar to that of the reference image;
the calculation formula of the similarity value of each super pixel of the current real-time image in the step 3) is as follows:
Figure BDA0003462810630000103
wherein, Sim (S)k) Representing a super-pixel SkThe similarity value of (a) is calculated,
Figure BDA0003462810630000104
representing a superpixel S in a current real-time imagekAverage color of LDC∈DDC,LDCRepresenting the dominant color in a descriptor of the dominant color of the reference image, DDCA primary color descriptor representing a reference image; | | denotes an absolute value taking operation, min () denotes a minimum value taking operation, thSimIs the similarity threshold. When in use
Figure BDA0003462810630000105
And DDCWhen the minimum value of the difference values of all the main colors is less than or equal to the threshold value, the color is considered as SkBelonging to a target area of color matching. A binary image can be obtained through the similarity of the super pixels, and when the similarity value is 1, the binary image is displayed as white in the image, and when the similarity value is 0, the binary image is displayed as black in the image. The white part in the image is the color matching region. In this example, get thSim=2。
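A sketch of the fuzzy color matching of step 3), using the indicator form above with th_Sim = 2 from this embodiment:

```python
import numpy as np

def color_similarity(mean_hsv_L, d_dc, th_sim=2):
    """Per-superpixel similarity to the reference main colors (step 3):
    1 if the closest main color is within th_sim, else 0."""
    ref = np.array([L for L, _ in d_dc], dtype=np.float64)
    diff = np.abs(np.asarray(mean_hsv_L, dtype=np.float64)[:, None] - ref[None, :])
    return (diff.min(axis=1) <= th_sim).astype(np.uint8)      # binary similarity values
```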
4) Aiming at the fusion problem of the saliency map and the color matching region, according to the similarity value of each super pixel of the current real-time image and the adjacent super pixel set of each super pixel, carrying out region expansion on the saliency map of the current real-time image by using a region fusion expansion method based on the similarity and the saliency value, and then obtaining a possible target region of the current real-time image;
the step 4) is specifically as follows:
4.1) judging whether each superpixel of the current real-time image belongs to the target area: if the similarity value of the current superpixel is 1 and its final saliency value in the saliency map of the current real-time image is greater than the initial saliency threshold (0.2 in this embodiment), the current superpixel is taken as an undetermined target area, otherwise it is not; traversing all superpixels, all undetermined target areas together form the initial target area of the current real-time image:

R_T^(0) = { S_j | Sim(S_j) = 1 and Sal(S_j) > th_Sal }

where Sim(S_j) is the similarity value of superpixel S_j and describes how similar S_j is to the reference image, Sal(S_j) is the final saliency value of S_j and describes its degree of saliency in the real-time image, and th_Sal is the threshold used to judge whether the final saliency value is high enough; a superpixel S_j that is similar to the target object and sufficiently salient in the real-time image is considered to belong to the sought target area.
4.2) the initial target area usually cannot contain the whole target object, and is expanded based on the saliency map. Carrying out target area expansion on an adjacent super-pixel set of each super-pixel of the initial target area, if the final significant value of the current adjacent super-pixel in a significant image of the current real-time image is greater than a current significant value threshold value, enabling the current adjacent super-pixel to belong to a target area to be expanded, traversing the adjacent super-pixel set of each super-pixel of the initial target area, forming an expanded target area by all target areas to be expanded, and improving the current significant value threshold value;
4.3) carrying out target area expansion on the adjacent superpixel set of each superpixel of the current expanded target area, if the final significant value of the current adjacent superpixel in the significant image of the current real-time image is greater than the current significant value threshold value, the current adjacent superpixel belongs to the target area to be expanded, traversing the adjacent superpixel set of each superpixel of the initial target area, obtaining a new expanded target area and improving the current significant value threshold value;
the threshold calculation formula for increasing the current significant value threshold in the steps 4.2) and 4.3) is specifically as follows:
Figure BDA0003462810630000113
therein, thSalA threshold value representing the initial significance value is indicated,
Figure BDA0003462810630000114
after the t-th expansionA significance threshold.
4.4) repeating step 4.3) until no new expansion target area is generated, and fusing the initial target area and all expansion target areas to form the possible target area R'_T of the current real-time image:

R'_T = R_T^(0) ∪ R_T^(1) ∪ … ∪ R_T^(n)

where R_T^(0) is the initial target area and R_T^(t) is the expansion target area obtained in the tth expansion.
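A sketch of step 4): seed superpixels from similarity and saliency, then iteratively expand across adjacent superpixels while raising the saliency threshold after every round. The initial threshold 0.2 comes from this embodiment; the additive growth rule is an assumed placeholder for the threshold formula.

```python
def expand_target_region(sim, sal, adjacency, th_sal=0.2, grow=0.1):
    """Region fusion and expansion (steps 4.1-4.4). Returns the set of superpixel
    indices forming the possible target region R'_T.

    sim: binary similarity per superpixel; sal: normalized final saliency in [0, 1];
    adjacency: list of adjacent-superpixel sets."""
    # 4.1: initial target area R_T^(0)
    region = {k for k in range(len(sim)) if sim[k] == 1 and sal[k] > th_sal}
    frontier, th = set(region), th_sal
    # 4.2-4.4: expand across neighbors until no new superpixel is added
    while frontier:
        th = th + grow                              # assumed rule for raising the threshold
        new = {j for k in frontier for j in adjacency[k]
               if j not in region and sal[j] > th}
        region |= new
        frontier = new
    return region
```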
5) And carrying out preliminary screening on a possible target region of the current real-time image based on the region area, and further carrying out regional screening on the preliminary screening target region by utilizing a textural feature descriptor and a Fourier descriptor of a reference image to obtain a final target region of the current real-time image.
The step 5) is specifically as follows:
5.1) the possible target area of the current real-time image consists of a plurality of connected areas, the size of one connected area is represented by the number of pixels in the connected area, the number of pixels in each connected area is calculated, and the connected area with the number of pixels more than the preset proportion of the total number of pixels of the current real-time image is taken as a primary screening target area; in a specific implementation, the predetermined ratio is 20%.
5.2) calculating the texture feature descriptor and the Fourier descriptor of each connected region of the preliminary screening target region with the methods of step 1), computing the first Euclidean distance between the texture feature descriptor of each connected region and that of the reference image and the second Euclidean distance between the Fourier descriptor of each connected region and that of the reference image, and calculating the difference between each connected region of the preliminary screening target region and the reference image from these two distances:

Diff(R_q) = λ_Diff·d_T(R_q) + (1 − λ_Diff)·d_F(R_q)

where Diff(R_q) is the difference between connected region R_q of the preliminary screening target region and the reference image, d_T(R_q) is the first Euclidean distance of connected region R_q, d_F(R_q) is its second Euclidean distance and λ_Diff is a third weight coefficient that adjusts the relative weight of the two distances; in this embodiment λ_Diff = 0.5;
5.3) taking the connected region with the smallest difference as the final target region R_T of the current real-time image, i.e. R_T = argmin over R_q of Diff(R_q).
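A sketch of step 5): area-based pre-screening of connected regions followed by descriptor-based screening with λ_Diff = 0.5. The weighted-sum form of Diff follows the reconstruction above, the texture_descriptor and fourier_descriptor helpers from the earlier sketches are assumed to be in scope, and the connected-component labelling relies on SciPy.

```python
import numpy as np
from scipy import ndimage

def select_final_region(region_mask, gray_u8, ref_tex, ref_fd,
                        min_ratio=0.20, lam_diff=0.5):
    """Screen connected regions of the possible target area and keep the one
    with the smallest difference to the reference image (steps 5.1-5.3)."""
    labeled, n = ndimage.label(region_mask)
    total = region_mask.size
    best, best_diff = None, np.inf
    for q in range(1, n + 1):
        comp = labeled == q
        if comp.sum() < min_ratio * total:                   # 5.1: drop small regions
            continue
        mask_u8 = (comp * 255).astype(np.uint8)
        tex = texture_descriptor(np.where(comp, gray_u8, 0).astype(np.uint8))
        fd = fourier_descriptor(mask_u8)                     # shape of the region
        d_t = np.linalg.norm(tex - ref_tex)                  # first Euclidean distance
        d_f = np.linalg.norm(fd - ref_fd)                    # second Euclidean distance
        diff = lam_diff * d_t + (1 - lam_diff) * d_f         # 5.2: weighted difference
        if diff < best_diff:
            best, best_diff = comp, diff                     # 5.3: minimum difference wins
    return best
```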
The above-mentioned contents are only technical ideas of the present invention, and the protection scope of the present invention is not limited thereby, and any modifications made on the basis of the technical ideas proposed by the present invention fall within the protection scope of the claims of the present invention.

Claims (8)

1. An image target area detection method based on a visual attention mechanism is characterized by comprising the following steps:
1) calculating a main color descriptor, a texture feature descriptor and a Fourier descriptor of a reference image;
2) performing superpixel segmentation on a real-time image by using a simple linear iterative clustering algorithm to obtain a plurality of superpixels of the current real-time image, performing regional description on current real-time image information based on each superpixel to obtain the average color of each superpixel in a quantized HSV color space and an adjacent superpixel set of each superpixel, calculating the significant value of each superpixel by using a bottom-up data driving method, and forming a significant image of the current real-time image by using the significant value of each superpixel;
3) calculating the similarity between each super pixel of the current real-time image and the reference image by using a fuzzy matching method based on the main color descriptor according to the average color of the main color descriptor and each super pixel of the reference image in the quantized HSV color space, and obtaining the similarity value of each super pixel of the current real-time image;
4) according to the similarity value of each super pixel of the current real-time image and an adjacent super pixel set of each super pixel, carrying out region expansion on a saliency map of the current real-time image by using a region fusion expansion method based on the similarity and the saliency value to obtain a possible target region of the current real-time image;
5) and carrying out preliminary screening on a possible target region of the current real-time image based on the region area, and further carrying out regional screening on the preliminary screening target region by utilizing a textural feature descriptor and a Fourier descriptor of a reference image to obtain a final target region of the current real-time image.
2. The method for detecting the image target area based on the visual attention mechanism according to claim 1, wherein the step 1) is specifically as follows:
1.1) converting the color of the reference image from an RGB color space to an HSV color space, and quantizing the HSV color space of the reference image to obtain the color value of each pixel in the reference image in the quantized HSV color space;
1.2) calculating a color histogram of the reference image based on the color values of all pixels of the reference image in the quantized HSV color space, computing the percentage of each quantized color from the histogram and sorting the quantized colors in descending order; the first quantized color is taken as the first main color and recorded as (L_1, p_1), where L_1 is the first main color quantization value and p_1 is the percentage of the first main color in the color histogram; the nth quantized color is then selected in turn, and if its percentage p_n in the color histogram satisfies p_n > 0.5·p_1, it is taken as the nth main color and recorded as (L_n, p_n), where L_n is the nth main color quantization value; the selection stops when p_n > 0.5·p_1 is no longer satisfied or when the cumulative percentage of the selected main colors in the color histogram reaches a preset upper limit, where N denotes the number of main colors and the main colors constitute the main color descriptor D_DC;
1.3) calculating a gray level co-occurrence matrix of the reference image, calculating corresponding angle second moment, inverse difference moment, entropy and contrast based on the gray level co-occurrence matrix and using the corresponding angle second moment, inverse difference moment, entropy and contrast as texture feature descriptors of the reference image;
1.4) obtaining a Fourier descriptor of the reference image after discrete Fourier transform and normalization of the complex form of the reference image.
3. The method for detecting the image target area based on the visual attention mechanism according to claim 1, wherein the step 2) is specifically as follows:
2.1) performing superpixel segmentation on the real-time image through a simple linear iterative clustering algorithm to obtain a plurality of superpixels of the real-time image and corresponding label images, and forming a superpixel set by the plurality of superpixels;
2.2) processing the super-pixel set based on the label graph to obtain the pixel set, the color characteristic and the position information of each super-pixel in the super-pixel set; wherein the color features are an average value of each color component of each super-pixel in a CIELab color space and an average color in a quantized HSV color space; the position information is a geometric center of each super pixel and an adjacent super pixel set of each super pixel;
2.3) taking the superpixels in the superpixel set that contain edge pixels p_e of the real-time image as an edge superpixel set E, and calculating the image edge main colors of the current real-time image based on the edge superpixel set E;
2.4) calculating the contrast between the current superpixel and every other superpixel based on the pixel set of each superpixel, the average values of its color components in the CIELab color space and its geometric center; then, based on the average color of the current superpixel in the quantized HSV color space, and combining the contrast between the current superpixel and every other superpixel with the image edge main colors, calculating the saliency value of the current superpixel with respect to the background information and its contrast-based saliency value:

Sal_E(S_k) = min_m | L̄(S_k) − L_E^m |

where Sal_E(S_k) is the saliency value of superpixel S_k with respect to the background information, | · | denotes the absolute value, min denotes the minimum over the image edge main colors, L̄(S_k) is the average color of S_k in the quantized HSV color space and L_E^m is the mth quantized edge color among the image edge main colors; the contrast-based saliency value Sal_C(S_k) is obtained by accumulating, over all other superpixels S_i, the difference between the CIELab averages (l̄_k, ā_k, b̄_k) of S_k and (l̄_i, ā_i, b̄_i) of S_i (lightness and the first and second color channels), attenuated according to the spatial distance between the geometric centers (x_k, y_k) of S_k and (x_i, y_i) of S_i in the current real-time image; λ_pos is the coefficient that adjusts how strongly the spatial distance affects the contrast-based saliency value, and i is the superpixel index in the superpixel set;
2.5) repeating the step 2.4), traversing the residual superpixels, and calculating and obtaining significant values of the residual superpixels relative to background information and significant values based on contrast;
2.6) normalizing the saliency value with respect to the background information and the contrast-based saliency value of each superpixel and linearly fusing them to obtain the final saliency value of each superpixel:

Sal(S_k) = λ_Sal1·Sal'_E(S_k) + λ_Sal2·Sal'_C(S_k)

where Sal(S_k) is the final saliency value of superpixel S_k, Sal'_E(S_k) is its normalized saliency value with respect to the background information, Sal'_C(S_k) is its normalized contrast-based saliency value, λ_Sal1 is a first weight coefficient and λ_Sal2 is a second weight coefficient satisfying λ_Sal1 + λ_Sal2 = 1;
2.7) normalizing the final saliency value of each superpixel to the interval [0, 255] and assigning it as the gray value of the corresponding superpixel to obtain the saliency map of the current real-time image.
4. The method according to claim 3, wherein the step 2.3) of calculating the dominant color of the image edge of the current real-time image based on the edge superpixel set E specifically comprises:
s1: calculating a color histogram of the edge superpixel set E based on the quantized HSV color space;
s2: for each quantized edge color L_E in the color histogram of the edge superpixel set E, with L_E ∈ [0, 71], taking the sum of the percentages of its neighborhood colors {L_E − 1, L_E, L_E + 1} in that histogram as the percentage of the current quantized edge color L_E in the color histogram of the edge superpixel set E;
s3: repeating step S2 until the percentages of all quantized edge colors L_E in the color histogram of the edge superpixel set E have been calculated;
s4: deleting from the histogram the quantized edge colors that are identical to a main color in the main color descriptor, and taking the quantized edge colors whose percentage in the color histogram of the edge superpixel set E obtained in step S3 is greater than or equal to 20% as the image edge main colors of the current real-time image.
5. The method according to claim 1, wherein the similarity value of each superpixel of the current real-time image in step 3) is calculated as:

Sim(S_k) = 1 if min_{L_DC ∈ D_DC} | L̄(S_k) − L_DC | ≤ th_Sim, and Sim(S_k) = 0 otherwise

where Sim(S_k) is the similarity value of superpixel S_k, L̄(S_k) is the average color of superpixel S_k of the current real-time image in the quantized HSV color space, L_DC is a main color in the main color descriptor D_DC of the reference image, | · | denotes the absolute value, min denotes the minimum and th_Sim is the similarity threshold.
6. The method for detecting the image target area based on the visual attention mechanism according to claim 1, wherein the step 4) is specifically as follows:
4.1) judging whether each super pixel of the current real-time image is a target area, if the similarity value of each super pixel is 1 and the final significant value of the current super pixel in the significant image of the current real-time image is larger than the initial significant value threshold, taking the current super pixel as an undetermined target area, otherwise, not determining the undetermined target area; traversing all super pixels, and forming an initial target area of the current real-time image by all the target areas to be determined;
4.2) carrying out target area expansion on the adjacent superpixel set of each superpixel of the initial target area, if the final significant value of the current adjacent superpixel in the significant image of the current real-time image is greater than the current significant value threshold value, the current adjacent superpixel belongs to the target area to be expanded, traversing the adjacent superpixel set of each superpixel of the initial target area, forming an expansion target area by all target areas to be expanded, and improving the current significant value threshold value;
4.3) carrying out target area expansion on the adjacent superpixel set of each superpixel of the current expanded target area, if the final significant value of the current adjacent superpixel in the significant image of the current real-time image is greater than the current significant value threshold value, the current adjacent superpixel belongs to the target area to be expanded, traversing the adjacent superpixel set of each superpixel of the initial target area, obtaining a new expanded target area and improving the current significant value threshold value;
4.4) repeating the step 4.3) until no new expansion target area is generated, and fusing the initial target area and all the expansion target areas to form a possible target area of the current real-time image.
7. The method for detecting an image target area based on a visual attention mechanism according to claim 6, wherein in steps 4.2) and 4.3) the current saliency threshold is raised after each expansion, th_Sal denoting the initial saliency threshold and th_Sal^(t) denoting the saliency threshold after the tth expansion, which increases with the number of expansions t.
8. The method for detecting the image target area based on the visual attention mechanism according to claim 1, wherein the step 5) is specifically as follows:
5.1) the possible target area of the current real-time image consists of a plurality of connected areas, the pixel number of each connected area is calculated, and the connected areas with the pixel number accounting for more than the preset proportion of the total pixel number of the current real-time image are taken as primary screening target areas;
5.2) calculating the texture feature descriptor and the Fourier descriptor of each connected region of the preliminary screening target region, computing the first Euclidean distance between the texture feature descriptor of each connected region and that of the reference image and the second Euclidean distance between the Fourier descriptor of each connected region and that of the reference image, and calculating the difference between each connected region of the preliminary screening target region and the reference image from these two distances:

Diff(R_q) = λ_Diff·d_T(R_q) + (1 − λ_Diff)·d_F(R_q)

where Diff(R_q) is the difference between connected region R_q of the preliminary screening target region and the reference image, d_T(R_q) is the first Euclidean distance of connected region R_q, d_F(R_q) is its second Euclidean distance and λ_Diff is a third weight coefficient;
and 5.3) taking the connected region with the minimum difference as the final target region of the current real-time image.
CN202210021568.6A 2022-01-10 2022-01-10 Image target area detection method based on visual attention mechanism Pending CN114359323A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210021568.6A CN114359323A (en) 2022-01-10 2022-01-10 Image target area detection method based on visual attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210021568.6A CN114359323A (en) 2022-01-10 2022-01-10 Image target area detection method based on visual attention mechanism

Publications (1)

Publication Number Publication Date
CN114359323A true CN114359323A (en) 2022-04-15

Family

ID=81109197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210021568.6A Pending CN114359323A (en) 2022-01-10 2022-01-10 Image target area detection method based on visual attention mechanism

Country Status (1)

Country Link
CN (1) CN114359323A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115439688A (en) * 2022-09-01 2022-12-06 哈尔滨工业大学 Weak supervision object detection method based on surrounding area perception and association
CN115439688B (en) * 2022-09-01 2023-06-16 哈尔滨工业大学 Weak supervision object detection method based on surrounding area sensing and association
CN116645368A (en) * 2023-07-27 2023-08-25 青岛伟东包装有限公司 Online visual detection method for edge curl of casting film
CN116645368B (en) * 2023-07-27 2023-10-03 青岛伟东包装有限公司 Online visual detection method for edge curl of casting film
CN117173175A (en) * 2023-11-02 2023-12-05 湖南格尔智慧科技有限公司 Image similarity detection method based on super pixels
CN117173175B (en) * 2023-11-02 2024-02-09 湖南格尔智慧科技有限公司 Image similarity detection method based on super pixels

Similar Documents

Publication Publication Date Title
Wang et al. Fuzzy-based algorithm for color recognition of license plates
Soriano et al. Adaptive skin color modeling using the skin locus for selecting training pixels
CN108537239B (en) Method for detecting image saliency target
CN114359323A (en) Image target area detection method based on visual attention mechanism
CN106023257B (en) A kind of method for tracking target based on rotor wing unmanned aerial vehicle platform
CN111666834A (en) Forest fire automatic monitoring and recognizing system and method based on image recognition technology
US7606414B2 (en) Fusion of color space data to extract dominant color
CN106960182B (en) A kind of pedestrian's recognition methods again integrated based on multiple features
JP4098021B2 (en) Scene identification method, apparatus, and program
Almogdady et al. A flower recognition system based on image processing and neural networks
Mythili et al. Color image segmentation using ERKFCM
CN103366178A (en) Method and device for carrying out color classification on target image
Feng et al. A color image segmentation method based on region salient color and fuzzy c-means algorithm
CN111310768B (en) Saliency target detection method based on robustness background prior and global information
Chen et al. Adaptive fuzzy color segmentation with neural network for road detections
CN105678318A (en) Traffic label matching method and apparatus
CN113159043A (en) Feature point matching method and system based on semantic information
CN111274964B (en) Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle
Tian et al. Outdoor shadow detection by combining tricolor attenuation and intensity
CN110910497B (en) Method and system for realizing augmented reality map
Wang Image matting with transductive inference
Hu et al. Automatic spectral video matting
CN114820707A (en) Calculation method for camera target automatic tracking
CN114581475A (en) Laser stripe segmentation method based on multi-scale saliency features
KhabiriKhatiri et al. Road Traffic Sign Detection and Recognition using Adaptive Color Segmentation and Deep Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: 316021 Zhoushan campus of Zhejiang University, No.1 Zheda Road, Dinghai District, Zhoushan City, Zhejiang Province

Applicant after: ZHEJIANG University

Address before: 310058 Yuhang Tang Road, Xihu District, Hangzhou, Zhejiang 866

Applicant before: ZHEJIANG University

Country or region before: China