WO2020211522A1 - Method and device for detecting a salient region of an image - Google Patents
Method and device for detecting a salient region of an image
- Publication number
- WO2020211522A1 (PCT/CN2020/076000)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- feature
- image
- brightness
- pixel
- map
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/136—Segmentation; Edge detection involving thresholding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/90—Determination of colour characteristics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
Definitions
- the present disclosure relates to the field of image processing technology, and in particular to a method and device for detecting a salient region of an image, a computer device, and a computer-readable storage medium.
- the saliency of an image is an important visual feature of an image, and it reflects the degree to which the human eye attaches importance to certain areas of the image.
- in practice, it is often necessary to use a saliency detection algorithm to detect an image and obtain its salient region. Saliency detection is mainly used in mobile phone camera software, target detection software, and image compression software.
- one way to obtain the saliency area of an image is to detect the saliency area of the image based on a pure mathematical calculation method.
- the detection accuracy of the saliency area is not high with this method, and the result differs from human visual perception.
- Another way to obtain the saliency area of the image is to detect the saliency area of the image based on the deep learning method.
- this method depends on the selected training samples, imposes high hardware requirements, and has poor real-time performance. Therefore, how to improve the speed and accuracy of image saliency region detection is a technical problem to be solved urgently.
- the first aspect of the embodiments of the present disclosure provides a method for detecting a salient area of an image, including:
- the saliency area is obtained.
- the detection method further includes:
- the obtaining the saliency area according to the saliency map includes: obtaining the saliency area according to the binary image.
- the acquiring the saliency region according to the binary image includes:
- the connected domain is regarded as a saliency area.
- performing connected domain labeling on pixels of the binary image, and combining pixels with the same connected domain label into one connected domain includes:
- if the gray value of a pixel is 1 and the pixel is unmarked, it is used as a seed pixel, the connected domain label of the seed pixel is marked as N, and the 8 neighborhood pixels of the seed pixel are traversed to determine whether any unmarked pixel among the 8 neighborhood pixels has a gray value of 1;
- if the gray value of at least one unmarked pixel among the 8 neighborhood pixels of the seed pixel is 1, the connected domain label of each unmarked pixel with a gray value of 1 among those 8 neighborhood pixels is marked as N, and all such pixels are used as new seed pixels;
- the 8 neighborhood pixels of the pixel with coordinates (x, y) are the pixels with coordinates (x-1, y-1), (x-1, y), (x-1, y+1), (x, y-1), (x, y+1), (x+1, y-1), (x+1, y), and (x+1, y+1);
- x is the row index of the pixel in the binary image, and
- y is the number of columns of the pixel in the binary image.
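The seed-filling connected-domain labeling described in the claims above can be sketched as follows. This is an illustrative Python implementation, not code from the patent; the function name and the breadth-first traversal order are assumptions, but the 8-neighborhood rule matches the coordinates listed above.

```python
from collections import deque

def label_connected_domains(binary):
    """Label 8-connected regions of 1-valued pixels in a binary image.

    `binary` is a list of rows of 0/1 ints; returns a label map of the
    same shape where each connected domain gets a distinct label N >= 1.
    """
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 1
    for x in range(rows):
        for y in range(cols):
            if binary[x][y] == 1 and labels[x][y] == 0:
                # New seed pixel: flood-fill its 8-connected domain.
                queue = deque([(x, y)])
                labels[x][y] = next_label
                while queue:
                    cx, cy = queue.popleft()
                    for dx in (-1, 0, 1):
                        for dy in (-1, 0, 1):
                            nx, ny = cx + dx, cy + dy
                            if (0 <= nx < rows and 0 <= ny < cols
                                    and binary[nx][ny] == 1
                                    and labels[nx][ny] == 0):
                                labels[nx][ny] = next_label
                                queue.append((nx, ny))
                next_label += 1
    return labels
```

Pixels sharing a label form one connected domain, which the method then treats as a candidate salient region.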
- the extracting the brightness feature of the image and obtaining at least two first brightness feature maps by using the image pyramid and the first multi-scale calculation relational expression includes:
- for the three primary color components of each pixel in the image, the brightness feature corresponding to each pixel is calculated using I = (R + G + B) / 3 to obtain the brightness feature intermediate map, where the three primary color components of each pixel include the red component, the green component, and the blue component, R is the red component, G is the green component, B is the blue component, and I is the brightness feature;
- the extracting the hue feature of the image and using the image pyramid and the second multi-scale calculation relational expression to obtain at least two first hue feature maps includes:
- the hue feature corresponding to each pixel in the image is calculated using H = θ when B ≤ G and H = 360° − θ otherwise, where θ = arccos{[(R − G) + (R − B)] / [2·sqrt((R − G)² + (R − B)(G − B))]}, to obtain the hue feature intermediate map, where the three primary color components of each pixel include the red, green, and blue components, R is the red component, G is the green component, B is the blue component, and H is the hue feature;
- H(c,s) = |H(c) ⊖ H(s)|, where c is a center scale, s is a surround scale, and ⊖ denotes the cross-scale subtraction;
- the normalizing the at least two first brightness feature maps respectively, and fusing the at least two first brightness feature maps after the normalization process into a second brightness feature map, includes:
- for each first brightness feature map, traversing the brightness features of the first brightness feature map to obtain the first brightness feature maximum value I 1max and the first brightness feature minimum value I 1min of the first brightness feature map, and normalizing the brightness feature of the first brightness feature map to between 0 and P according to the formula I' = P·(I − I 1min )/(I 1max − I 1min ), where I represents the brightness feature value of a pixel in the first brightness feature map;
- the at least two first brightness feature maps after the normalization process are fused into a second brightness feature map in a weighted average manner.
- the normalizing the at least two first tone feature maps respectively, and fusing the at least two first tone feature maps after the normalization process into a second tone feature map, includes:
- for each first hue feature map, traversing the hue features of the first hue feature map to obtain the first hue feature maximum value H 1max and the first hue feature minimum value H 1min of the first hue feature map, and normalizing the hue feature of the first hue feature map to between 0 and P according to the formula H' = P·(H − H 1min )/(H 1max − H 1min ), where H represents the hue feature value of a pixel of the first hue feature map;
- the 8-neighborhood maximum value and the 8-neighborhood minimum value of the hue feature are obtained for each pixel.
- the at least two first tone feature maps after all the normalization processes are merged into a second tone feature map through a weighted average manner.
- a second aspect of the embodiments of the present disclosure provides a computer device, including a storage unit and a processing unit, wherein the storage unit stores a computer program that can run on the processing unit; when the processing unit executes the computer program, the above-described method for detecting the salient region of an image is implemented.
- a third aspect of the embodiments of the present disclosure provides a computer-readable storage medium that stores a computer program, where the computer program is executed by a processor to realize the above-mentioned method for detecting a salient region of an image.
- a fourth aspect of the embodiments of the present disclosure provides a device for detecting a salient region of an image, including:
- the extraction module is configured to extract the brightness feature of the image and obtain at least two first brightness feature maps by using the image pyramid and the first multi-scale calculation relation; the extraction module is further configured to extract the tone feature of the image and use the image pyramid and the second multi-scale calculation relation to obtain at least two first tone feature maps;
- the fusion module is configured to perform normalization processing on the at least two first brightness feature maps respectively, and fuse the at least two first brightness feature maps after the normalization processing into a second brightness feature map; the fusion module is further configured to perform normalization processing on the at least two first tone feature maps respectively, and fuse the at least two first tone feature maps after the normalization processing into a second tone feature map; the fusion module is further configured to merge the second brightness feature map and the second hue feature map into a saliency map; and
- the acquiring module is configured to acquire the saliency area according to the saliency map.
- the acquisition module is further configured to use an adaptive threshold binarization algorithm to perform binarization processing on the saliency map to obtain a binary image; and obtain the saliency region according to the binary image .
- the acquisition module is further configured to mark pixels of the binary image with connected domains, merge all pixels with the same connected domain label into one connected domain; and use the connected domain as a saliency area.
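The patent does not spell out its adaptive threshold binarization algorithm, so the sketch below substitutes a simple assumed rule, thresholding at k times the mean saliency value; the function name and the factor k are hypothetical.

```python
import numpy as np

def binarize_saliency(saliency, k=1.5):
    """Binarize a saliency map with an adaptive (image-dependent) threshold.

    The threshold k * mean(saliency) adapts to each map's overall level;
    pixels at or above it become 1 (foreground), the rest 0 (background).
    """
    saliency = np.asarray(saliency, dtype=np.float64)
    threshold = k * saliency.mean()
    return (saliency >= threshold).astype(np.uint8)
```

The resulting 0/1 image is the binary image on which connected-domain labeling is performed.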
- the extraction module is further configured to:
- for the three primary color components of each pixel in the image, the brightness feature corresponding to each pixel is calculated using I = (R + G + B) / 3 to obtain the brightness feature intermediate map, where the three primary color components of each pixel include the red component, the green component, and the blue component, R is the red component, G is the green component, B is the blue component, and I is the brightness feature;
- the extraction module is further configured to:
- the hue feature corresponding to each pixel is calculated from its three primary color components, and the hue feature intermediate image is obtained.
- the three primary color components of each pixel include the red, green, and blue components, R is the red component, G is the green component, B is the blue component, and H is the hue feature;
- H(c,s) = |H(c) ⊖ H(s)|, where c is a center scale, s is a surround scale, and ⊖ denotes the cross-scale subtraction;
- FIG. 1 is a flowchart of a method for detecting a salient region of an image provided by an embodiment of the disclosure.
- FIG. 2 is a flowchart of another method for detecting a salient region of an image provided by an embodiment of the disclosure.
- FIG. 3 is a flowchart of another method for detecting a salient region of an image provided by an embodiment of the disclosure.
- FIG. 4 is a flowchart of another method for detecting a salient region of an image provided by an embodiment of the disclosure.
- FIG. 5 is a flowchart of another method for detecting a salient region of an image provided by an embodiment of the disclosure.
- FIG. 6 is a flowchart of another method for detecting a salient region of an image provided by an embodiment of the disclosure.
- FIG. 7 is a flowchart of yet another method for detecting a salient region of an image provided by an embodiment of the disclosure.
- FIG. 8 is a flowchart of yet another method for detecting a salient region of an image provided by an embodiment of the disclosure.
- FIG. 9 is a schematic diagram of modules of an apparatus for detecting a saliency region of an image provided by an embodiment of the disclosure.
- FIG. 10 is a block diagram of a computer device provided by an embodiment of the disclosure.
- the embodiment of the present disclosure provides a method for detecting a salient region of an image, as shown in FIG. 1, including:
- brightness refers to the perception of the brightness of the light source or object by the human eye.
- the brightness feature can show the brightness of the color.
- the brightness information of the color of each pixel of the image can be obtained, that is, the brightness feature of each pixel of the image can be obtained.
- An effective and simple conceptual structure for representing images at multiple resolutions is the image pyramid.
- the image pyramid was originally used for machine vision and image compression.
- An image pyramid is a series of image collections arranged in a pyramid shape and gradually reduced in resolution.
- the Gaussian pyramid is essentially a multi-scale representation of an image: the same image is Gaussian-blurred and down-sampled multiple times to produce multiple images at different scales for subsequent processing.
- an image from which the brightness feature has been extracted is Gaussian-blurred multiple times and down-sampled based on the Gaussian pyramid to generate multiple images with brightness feature information at different scales, which are then operated on according to the first multi-scale calculation relational expression to obtain at least two first brightness feature maps.
- the first multi-scale calculation relational expression is obtained, for example, according to the central peripheral difference calculation method, and is used to calculate the contrast information in a plurality of images with brightness feature information at different scales generated by the Gaussian pyramid.
- the Gaussian pyramid corresponds to a series of Gaussian filters whose cut-off frequency changes by a factor of 2 from one layer to the next.
- the hue is the comprehensive effect of the various spectral components reflected by the object under sunlight on the human eye, that is, the type of color.
- the color of the same hue refers to a series of colors with similar combination ratios of the three primary colors in the color components, which show a vivid color tendency in appearance. For example, vermilion, scarlet, and pink are all red hues.
- hue features can be used to describe color attributes, such as yellow, orange, or red.
- the color attribute information of each pixel of the image can be obtained, that is, the hue feature of each pixel of the image can be obtained.
- the image pyramid is essentially the same as the image pyramid used to obtain the first brightness feature map, and is a multi-scale representation of the image.
- an image from which the tonal feature has been extracted can be Gaussian-blurred multiple times and down-sampled to generate multiple images with tonal feature information at different scales, and calculations are then performed according to the second multi-scale calculation relation to obtain at least two first tone feature maps.
- the second multi-scale calculation relational expression is also obtained according to the central peripheral difference calculation method, and is used to calculate the contrast information in multiple images with tone feature information at different scales generated by the Gaussian pyramid.
- S30 Perform normalization processing on the at least two first brightness feature maps, respectively, and merge the at least two first brightness feature maps after the normalization process into a second brightness feature map.
- the normalization process is to unify the brightness characteristics of each first brightness feature map to the same order of magnitude to provide higher-precision data information for subsequent processing.
- S40 Perform normalization processing on at least two first tone feature maps, and merge the at least two first tone feature maps after the normalization processing into a second tone feature map.
- the normalization process is to unify the tone characteristics of each first tone feature map to the same order of magnitude, and provide higher-precision data information for subsequent processing.
- when two or more first tone feature maps are obtained according to S20, an interpolation operation needs to be performed first: the smaller-scale first tone feature map is enlarged so that it has the same scale as the larger-scale first tone feature map, and the maps are then merged to form a second tone feature map.
- the second brightness feature map and the second hue feature map can be fused by a weighted average method: for the pixels at corresponding row and column positions in the two maps, the brightness feature and the hue feature are weighted and averaged, so that the second brightness feature map and the second hue feature map are fused into a saliency map.
- the weight value in the weighted average method can be set as required, which is not limited in the present disclosure.
- for example, the brightness feature of the pixel in the first row and first column of the second brightness feature map is added to the hue feature of the pixel in the first row and first column of the second hue feature map, and the sum is divided by 2; the result of this calculation is the value of the pixel in the first row and first column of the saliency map.
- Other pixels can be deduced by analogy, so that they can be merged into a saliency map.
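The add-and-divide-by-2 example above is the equal-weight case of a weighted average; a minimal sketch (function name assumed) is:

```python
def fuse_maps(map_a, map_b, w_a=0.5, w_b=0.5):
    """Fuse two equally sized feature maps pixel by pixel by weighted average.

    With w_a = w_b = 0.5 this reproduces the add-then-divide-by-2 example:
    each output pixel is the mean of the two corresponding input pixels.
    """
    return [
        [w_a * a + w_b * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(map_a, map_b)
    ]
```

As the text notes, the weights can be chosen as required; unequal weights simply bias the saliency map toward brightness or hue.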
- the visual attention mechanism of the human visual system allows people to gradually exclude relatively unimportant information from complex scenes, select the important and necessary information as the target of attention, and process it preferentially. As described above, the present disclosure simulates this mechanism: it excludes other information from the image and specifically extracts the brightness feature and the hue feature; the first brightness feature maps are normalized and then fused into the second brightness feature map, the first tone feature maps are normalized and then fused into the second tone feature map, and the second brightness feature map and the second tone feature map are fused into a saliency map. This staged processing and fusion makes the obtained saliency map better approximate the targets that the human eye attends to.
- the area in the saliency map that produces visual contrast and attracts attention is called the salient region.
- the salient area can also be called the foreground, and the remaining areas are called the background.
- the use of the saliency area can eliminate the interference of other background areas in the saliency map and directly approach the user's detection intention, which is beneficial to the improvement of detection performance.
- the embodiments of the present disclosure provide a method for detecting the saliency region of an image.
- the first brightness feature map is generated by using the image pyramid and the first multi-scale calculation relationship
- the first hue feature map is generated using the image pyramid and the second multi-scale calculation relationship
- the first brightness feature map is normalized and then fused into the second brightness feature map
- the first color feature map is normalized and then fused into the second color feature map
- the second brightness feature map and the second color feature map are fused into a saliency map.
- the saliency area can be extracted from the saliency map.
- the brightness feature of the image is extracted, and at least two first brightness feature maps are obtained by using the image pyramid and the first multi-scale calculation relationship, as shown in FIG. 2, including:
- from the three primary color components of each pixel in the image, the brightness feature corresponding to each pixel is calculated, and the brightness feature intermediate image is obtained.
- the three primary color components of each pixel include red, green and blue components.
- R is the red component
- G is the green component
- B is the blue component.
- I is the brightness characteristic.
- the RGB color space is not well suited to describing images and cannot correctly express the differences between colors as perceived by the human eye; brightness, hue, and saturation are used instead.
- the HIS color space is more consistent with the human eye's perception of colors. Therefore, the RGB color space can be non-linearly transformed into the HIS color space, which can describe images more naturally and intuitively.
- HIS color space uses H (Hue), I (Intensity), and S (Saturation) to represent hue, brightness and saturation.
- according to the conversion from the RGB color space into the HIS color space, the brightness is related to the red, green, and blue components by I = (R + G + B) / 3. It can be seen that, given an image in RGB color format, the brightness characteristic of each pixel can be calculated with this formula.
- the brightness characteristics of all pixels constitute the middle image of the brightness characteristics.
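Assuming the HIS brightness relation I = (R + G + B) / 3 discussed above, the brightness feature intermediate map can be sketched as (function name assumed):

```python
import numpy as np

def brightness_intermediate_map(rgb):
    """Compute the brightness feature I = (R + G + B) / 3 per pixel.

    `rgb` is an H x W x 3 array of R, G, B components; the result is an
    H x W map whose values form the brightness feature intermediate image.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    return (rgb[..., 0] + rgb[..., 1] + rgb[..., 2]) / 3.0
```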
- S102 Input the brightness feature intermediate image into the image pyramid to obtain sub-brightness feature intermediate images at M scales, where the image pyramid is a Gaussian pyramid with M layers and M ≥ 7.
- the brightness feature intermediate image is input into a Gaussian pyramid, and a series of Gaussian filters are used to filter and sample the brightness feature intermediate image.
- the 0th layer is the original image of the luminance feature intermediate image, and the size remains unchanged.
- the large-scale luminance feature intermediate image of the 0th layer is convolved with the Gaussian filter to obtain the small-scale luminance feature intermediate image of the first layer.
- the other layers are followed by analogy.
- the size of the Gaussian kernel of the Gaussian filter determines the degree of blurring of the image: the smaller the Gaussian kernel, the lighter the blurring; the larger the Gaussian kernel, the heavier the blurring.
- the size of the Gaussian kernel can be selected as required. For example, a Gaussian kernel of 5 by 5 pixels may be used to filter and sample the luminance feature intermediate image.
- the more Gaussian filters included in the Gaussian pyramid, the more levels of the Gaussian pyramid, and the more intermediate images of brightness features of different scales are obtained.
- the higher the Gaussian pyramid level the smaller the scale of the corresponding brightness feature middle image and the lower the resolution.
- the number of levels of the Gaussian pyramid is determined according to the size of the input brightness feature intermediate image: the larger the input image, the more levels the Gaussian pyramid is given; the smaller the input image, the fewer levels.
- the following provides a method for inputting the brightness feature intermediate image into a Gaussian pyramid to obtain the sub-luminance feature intermediate image of M scales to clearly describe the implementation process.
- the level of the Gaussian pyramid is determined to be 9 layers, that is, the 0th to the 8th layer.
- the 0th layer is the original image of the brightness feature middle image, and the size is unchanged.
- first layer: the level-0 original of the brightness feature intermediate image is filtered and sampled with the Gaussian filter so that its length and width are each halved and its area becomes 1/4, making its size 1/2 times that of the original brightness feature intermediate image.
- second layer: the 1/2-size brightness feature intermediate image obtained at the first layer is filtered and sampled with the Gaussian filter so that its length and width are each halved and its area becomes 1/4, making its size 1/4 times that of the original brightness feature intermediate image.
- the process from the 3rd layer to the 8th layer is repeated in turn with reference to the above method, so as to obtain 6 more sub-brightness feature intermediate images at 1/8, 1/16, 1/32, 1/64, 1/128, and 1/256 times the original brightness feature intermediate image.
- in this way, sub-brightness feature intermediate images at 9 scales can be obtained, namely 1, 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128, and 1/256 times the input brightness feature intermediate image.
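The filter-and-subsample walkthrough above can be sketched as follows. This is a simplified stand-in, not the patent's exact filter: a separable 5-tap binomial kernel approximates the 5-by-5 Gaussian filter mentioned earlier, and edge padding is an assumption.

```python
import numpy as np

def gaussian_blur(img, kernel=(1, 4, 6, 4, 1)):
    """Separable 5x5 blur with a binomial approximation of a Gaussian kernel."""
    k = np.asarray(kernel, dtype=np.float64)
    k /= k.sum()
    pad = len(k) // 2
    # Pad with replicated edge pixels, then convolve rows and columns.
    padded = np.pad(img, pad, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def gaussian_pyramid(img, levels=9):
    """Build a 9-level pyramid: level 0 is the input; each subsequent level
    is blurred and down-sampled by 2 in each dimension (1, 1/2, ..., 1/256)."""
    pyramid = [np.asarray(img, dtype=np.float64)]
    for _ in range(1, levels):
        blurred = gaussian_blur(pyramid[-1])
        pyramid.append(blurred[::2, ::2])
    return pyramid
```

For a 256-by-256 input this yields levels of size 256, 128, ..., down to 1 pixel, matching the 1 through 1/256 scale sequence above.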
- I(c,s) = |I(c) ⊖ I(s)|
- the calculation method of the first multi-scale calculation relation I(c,s) is called the center-surround difference operation; it is designed according to the physiological structure of the human eye and is used to calculate the contrast information I(c,s) between a center scale c and a surround scale s.
- the human eye's receptive field reacts strongly to the features with high contrast in the input of visual information, such as the situation where the center is bright and the periphery is dark, which belongs to the visual information with large contrast.
- the sub-brightness feature intermediate image with a larger scale has more detailed information, while the smaller-scale sub-brightness feature intermediate image, owing to the filtering and sampling operations, better reflects the local background information. Therefore, a larger-scale sub-brightness feature intermediate image and a smaller-scale sub-brightness feature intermediate image are subjected to a cross-scale subtraction operation to obtain the contrast between the local center and the surrounding background information.
- the algorithm of the cross-scale subtraction operation is as follows: the smaller-scale sub-brightness feature intermediate image representing the surrounding background information is linearly interpolated so that it has the same size as the larger-scale sub-brightness feature intermediate image representing the central information, and a pixel-by-pixel subtraction operation, that is, the center-surround difference operation, is then performed; such a cross-scale operation is represented by the symbol ⊖.
- 9 levels of Gaussian pyramids are used to obtain 9 sub-luminance feature intermediate images.
- I(2), I(3), and I(4) are selected as the sub-brightness feature intermediate images representing the central information, where c ∈ {2, 3, 4}; that is, the second, third, and fourth layers are selected.
- for I(2,5), the second layer is selected as the sub-brightness feature intermediate image representing the central information, and the fifth layer as the sub-brightness feature intermediate image representing the surrounding background information. The fifth-layer intermediate image is interpolated and, after being enlarged to the same size as the second-layer intermediate image, the brightness features of pixels at corresponding row and column positions in the second-layer and fifth-layer intermediate images are subtracted in turn, thereby obtaining a first brightness feature map.
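The cross-scale subtraction I(c,s) can be sketched as follows; nearest-neighbour up-sampling is used here as a simple stand-in for the linear interpolation described in the text, and the function name is assumed.

```python
import numpy as np

def center_surround(pyramid, c, s):
    """Cross-scale operation I(c, s) = |I(c) - I(s)|: up-sample pyramid
    level s (surround) to the size of level c (center), then take the
    element-wise absolute difference."""
    center = pyramid[c]
    surround = pyramid[s]
    # Nearest-neighbour up-sampling to the center level's size.
    rows = np.linspace(0, surround.shape[0] - 1, center.shape[0]).round().astype(int)
    cols = np.linspace(0, surround.shape[1] - 1, center.shape[1]).round().astype(int)
    upsampled = surround[np.ix_(rows, cols)]
    return np.abs(center - upsampled)

# In the 9-level example, the six first brightness feature maps come from
# centers c in {2, 3, 4} paired with surrounds such as s = 5 for I(2,5).
```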
- S30 normalizes the first brightness feature map, and fuses all the normalized first brightness feature maps into a second brightness feature map, as shown in FIG. 3, including:
- the maximum value of the brightness feature is set to P.
- for each first brightness feature map, the brightness features of the first brightness feature map are traversed to obtain the first brightness feature maximum value I 1max and the first brightness feature minimum value I 1min of the first brightness feature map; according to the formula I' = P·(I − I 1min )/(I 1max − I 1min ), the brightness feature of the first brightness feature map is normalized to between 0 and P, where I represents the brightness feature value of a pixel in the first brightness feature map.
- the brightness information of the color of each pixel of the image can be obtained, that is, the brightness feature value of each pixel of the image can be obtained.
- each first brightness feature map is traversed to obtain the first brightness feature value of each pixel, and the first brightness feature maximum value and first brightness feature minimum value of that map are found; according to the formula I' = P·(I − I 1min )/(I 1max − I 1min ), the brightness feature of the first brightness feature map is normalized to between 0 and P. Based on this, the six first brightness feature maps can be unified to the same order of magnitude, the amplitude difference is eliminated, and the accuracy is improved.
- in a first brightness feature map, each pixel has at most 8 surrounding pixels in its neighborhood.
- the brightness feature values of the 8 neighboring pixels are compared with each other, and the value with the largest brightness feature is taken as the 8-neighbor maximum value of the pixel, and the value with the smallest brightness feature is taken as the 8-neighbor minimum value of the pixel.
- the pixel is also compared as one of the 8 neighborhood pixels of other pixels.
- S107 Average the maximum value of all 8 neighborhoods and the minimum value of all 8 neighborhoods to obtain the brightness feature average value Q.
- for the brightness feature values of the 6 first brightness feature maps normalized to between 0 and P as obtained in the S105 example, the brightness feature value of each pixel of each first brightness feature map is multiplied in turn by (P − Q)².
- the potential saliency area in each first brightness feature map can be enlarged, so that the brightness feature at the location of the potential saliency area is more prominent than the background area.
- S109 Traverse the brightness features of the first brightness feature map to obtain the second brightness feature maximum value I 2max and the second brightness feature minimum value I 2min ; according to the formula I'' = (I − I 2min )/(I 2max − I 2min ), the brightness feature of the first brightness feature map is normalized to between 0 and 1.
- for each first brightness feature map obtained in S108 by multiplying the brightness feature value of each pixel by (P − Q)², the map is traversed to obtain the value of each pixel, and the second brightness feature maximum value and second brightness feature minimum value corresponding to that first brightness feature map are found.
- the brightness characteristics of each pixel in the first brightness feature map are normalized to be between 0 and 1, so as to further improve the accuracy of the six first brightness feature maps.
- the six first brightness feature maps obtained in S109 are merged into a second brightness feature map through a weighted average method, which improves the accuracy of the potential saliency region.
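Steps S105 through S109 can be combined into one normalization sketch. P = 255 and the definition of Q as the mean of the 8-neighborhood maxima and minima are taken from the steps above; everything else (function name, edge handling for border pixels) is an assumption.

```python
import numpy as np

def normalize_feature_map(fmap, P=255.0):
    """Normalization sketch following S105-S109:
    1. scale the map to [0, P];
    2. take each pixel's 8-neighborhood max and min, and average them
       over the whole map to get Q;
    3. amplify the map by (P - Q) ** 2 to make potential salient
       locations stand out against the background;
    4. rescale the result to [0, 1]."""
    fmap = np.asarray(fmap, dtype=np.float64)
    lo, hi = fmap.min(), fmap.max()
    scaled = (fmap - lo) / (hi - lo) * P if hi > lo else np.zeros_like(fmap)

    # 8-neighborhood max/min via edge-padded shifted views.
    padded = np.pad(scaled, 1, mode="edge")
    shifts = [padded[1 + dx:padded.shape[0] - 1 + dx,
                     1 + dy:padded.shape[1] - 1 + dy]
              for dx in (-1, 0, 1) for dy in (-1, 0, 1)
              if (dx, dy) != (0, 0)]
    neigh_max = np.maximum.reduce(shifts)
    neigh_min = np.minimum.reduce(shifts)
    Q = (neigh_max.mean() + neigh_min.mean()) / 2.0

    amplified = scaled * (P - Q) ** 2
    lo2, hi2 = amplified.min(), amplified.max()
    return (amplified - lo2) / (hi2 - lo2) if hi2 > lo2 else amplified
```

The six maps produced this way would then be fused by the weighted average of S110.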
- the tonal feature of the image is extracted, and at least two first tonal feature maps are obtained by using the image pyramid and the second multi-scale calculation relation, as shown in FIG. 4, including:
- from the three primary color components of each pixel, the hue feature corresponding to each pixel is calculated, and the hue feature intermediate image is obtained.
- the three primary color components of each pixel include red, green and blue components, R is the red component, G is the green component, and B is the blue component , H is the hue feature.
- according to the conversion from the RGB color space into the HIS color space, the hue is related to the red, green, and blue components by H = θ when B ≤ G and H = 360° − θ otherwise, where θ = arccos{[(R − G) + (R − B)] / [2·sqrt((R − G)² + (R − B)(G − B))]}. It can be seen that, given an image in RGB color format, the tonal characteristic of each pixel can be calculated with this formula.
- the hue characteristics of all pixels constitute an intermediate image of hue characteristics.
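The RGB-to-HSI hue computation described above can be sketched in a few lines. This is a minimal illustration of the standard conversion, not the patent's exact (unreproduced) formula image; the function name `hsi_hue` and the convention of returning 0 for achromatic pixels are choices made here.

```python
import math

def hsi_hue(r, g, b):
    """Standard RGB -> HSI hue in degrees (a sketch; the patent's own
    formula image is not reproduced in the text)."""
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:  # achromatic pixel (r == g == b): hue is undefined, use 0
        return 0.0
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    return theta if b <= g else 360.0 - theta

# Pure red, green, and blue land at 0, 120, and 240 degrees respectively.
hue_red = hsi_hue(255, 0, 0)
hue_green = hsi_hue(0, 255, 0)
hue_blue = hsi_hue(0, 0, 255)
```

Computing the hue of every pixel this way yields the hue feature intermediate image described in the text.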
- the tone feature intermediate image is input into a Gaussian pyramid, and a series of Gaussian filters are used to filter and sample the tone feature intermediate image.
- the 0th layer is the original image of the tone feature intermediate image, and the size remains unchanged.
- the large-scale tone feature intermediate image of the 0th layer is convolved with the Gaussian filter to obtain the small-scale tone feature intermediate image of the first layer.
- The remaining layers follow by analogy.
- The size of the Gaussian kernel of the Gaussian filter determines the degree of blurring of the image: the smaller the Gaussian kernel, the lighter the blurring; the larger the Gaussian kernel, the heavier the blurring.
- The size of the Gaussian kernel can be selected as required; for example, a 5×5 Gaussian kernel can be used to filter and sample the tone feature intermediate image.
- The more Gaussian filters the Gaussian pyramid contains, the more levels it has, and the more tone feature intermediate images of different scales are obtained.
- The higher the Gaussian pyramid level, the smaller the scale of the corresponding tone feature intermediate image and the lower its resolution.
- The number of levels of the Gaussian pyramid is determined by the size of the input tone feature intermediate image: the larger the input image, the more levels are set; the smaller the input image, the fewer levels are set.
- The following provides a method of inputting the tone feature intermediate image into a Gaussian pyramid to obtain sub-tone feature intermediate images at M scales, in order to clearly describe the implementation process.
- the level of the Gaussian pyramid is 9 layers, that is, the 0th to the 8th layer.
- the 0th layer is the original image of the tone feature intermediate image, and the size remains unchanged.
- For the first layer, the layer-0 tone feature intermediate image is filtered with the Gaussian filter and downsampled, so that its length and width are each halved and its area becomes 1/4; its size thus becomes 1/2 of the original tone feature intermediate image.
- For the second layer, the half-size image obtained at the first layer is again filtered and downsampled, halving its length and width and reducing its area to 1/4, so that its size becomes 1/4 of the original tone feature intermediate image.
- Layers 3 through 8 proceed in turn by the same method, yielding images that are 1/8, 1/16, 1/32, 1/64, 1/128, and 1/256 of the original size.
- In this way, sub-tone feature intermediate images at 9 scales are obtained, namely 1, 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128, and 1/256 times the input tone feature intermediate image.
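The pyramid construction above can be sketched as follows. This is a hedged illustration: the 5-tap binomial kernel stands in for the patent's unspecified Gaussian filter, and images smaller than the kernel are simply downsampled without blurring.

```python
import numpy as np

def blur_and_downsample(img, kernel=(1, 4, 6, 4, 1)):
    """One pyramid step: separable binomial (Gaussian-like) blur, then
    drop every other row and column so width and height are halved."""
    k = np.asarray(kernel, dtype=float)
    k /= k.sum()
    if min(img.shape) >= len(k):  # images smaller than the kernel: downsample only
        # filter columns, then rows (separable convolution, 'same' size)
        img = np.apply_along_axis(lambda m: np.convolve(m, k, mode='same'), 0, img)
        img = np.apply_along_axis(lambda m: np.convolve(m, k, mode='same'), 1, img)
    return img[::2, ::2]

def gaussian_pyramid(img, levels=9):
    """Level 0 is the input image itself; each further level is half the size."""
    pyr = [np.asarray(img, dtype=float)]
    for _ in range(levels - 1):
        pyr.append(blur_and_downsample(pyr[-1]))
    return pyr

pyr = gaussian_pyramid(np.ones((256, 256)), levels=9)
sizes = [p.shape for p in pyr]
```

For a 256×256 input, the 9 levels come out at side lengths 256, 128, 64, ..., 1, matching the 1 to 1/256 scale sequence in the text.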
- Through the second multi-scale calculation relation H(c, s) = |H(c) Θ H(s)|, the at least two first tone feature maps are calculated.
- This is called the center-surround difference calculation; it is designed according to the physiological structure of the human eye and is used to compute the contrast information H(c, s) of the image.
- The receptive fields of the human eye respond strongly to high-contrast features in the visual input, for example a green center with a red surround, which is likewise visual information with large contrast.
- The larger-scale sub-tone feature intermediate image contains more detail, while the smaller-scale sub-tone feature intermediate image, because of the filtering and sampling operations, better reflects the local background. Therefore, a cross-scale subtraction between a larger-scale sub-tone feature intermediate image and a smaller-scale one yields the contrast between the local center and the surrounding background.
- The cross-scale subtraction operation works as follows: the small-scale sub-tone feature intermediate image representing the surrounding background is enlarged by linear interpolation until it has the same size as the larger-scale sub-tone feature intermediate image representing the center, and then a pixel-by-pixel subtraction, that is, the center-surround difference operation, is performed.
- The cross-scale operation is represented by the symbol Θ.
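As an illustration of the cross-scale (center-surround) subtraction, the sketch below upsamples the coarse map to the fine map's size and takes an absolute pixel-wise difference. Nearest-neighbour upsampling via `np.kron` is used here for brevity; the text itself calls for linear interpolation.

```python
import numpy as np

def center_surround_diff(center, surround):
    """H(c, s) = |H(c) Θ H(s)|: enlarge the coarse (surround) map to the
    size of the fine (center) map, then take the pixel-wise absolute
    difference as the local contrast."""
    fy = center.shape[0] // surround.shape[0]
    fx = center.shape[1] // surround.shape[1]
    up = np.kron(surround, np.ones((fy, fx)))  # nearest-neighbour upsampling
    return np.abs(center - up)

center = np.array([[0.9, 0.9], [0.1, 0.1]])  # fine map: two tonal regions
surround = np.array([[0.5]])                 # one coarse "background" value
contrast = center_surround_diff(center, surround)
```

Every pixel here deviates from the 0.5 background by 0.4, so the contrast map is uniform; in practice only regions that differ from their surround light up.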
- Using the 9-layer Gaussian pyramid, 9 sub-tone feature intermediate maps are obtained.
- Taking H(2, 5) as an example: the second layer is selected as the sub-tone feature intermediate map representing the center information, and the fifth layer as the sub-tone feature intermediate map representing the surrounding background. The fifth-layer map is enlarged by interpolation until it has the same size as the second-layer map, and then the tone features of the pixels in the same row and same column of the second- and fifth-layer maps are subtracted in turn, thereby obtaining a first tone feature map.
- S40 normalizes the first tone feature map, and merges all the normalized first tone feature maps into a second tone feature map, as shown in FIG. 5, including:
- the color attribute information of each pixel of the image can be obtained, that is, the tone feature value of each pixel of the image can be obtained.
- Each first tone feature map is traversed to obtain the first tone feature value of each of its pixels, and the first tone feature maximum value H 1max and the first tone feature minimum value H 1min of that map are found.
- According to the formula H' = P·(H − H 1min)/(H 1max − H 1min),
- the tone features of the first tone feature map are normalized to between 0 and P. On this basis, the six first tone feature maps are unified to the same order of magnitude, the amplitude differences are eliminated, and the accuracy is improved.
- The tone feature values of these 8 neighborhood pixels are compared with one another: the largest tone feature value is the 8-neighborhood maximum of the pixel, and the smallest is the 8-neighborhood minimum. Each pixel also takes part in such comparisons as one of the 8 neighborhood pixels of other pixels.
- S207 Average all 8-neighborhood maxima and all 8-neighborhood minima to obtain the tone feature average value Q.
- In the six first tone feature maps whose tone feature values have been normalized to between 0 and P, the tone feature value of each pixel of each first tone feature map is multiplied in turn by (P − Q)².
- the potential saliency area in each first color feature map can be enlarged, so that the color feature at the location of the potential saliency area is more prominent relative to the background area.
- Each first tone feature map is traversed separately to obtain the second tone feature maximum value and the second tone feature minimum value of each map.
- The hue feature of each pixel of the first hue feature map is then normalized to between 0 and 1, so as to further improve the accuracy of the six first hue feature maps.
- the six first hue feature maps obtained in S209 are merged into a second hue feature map through a weighted average method, which improves the accuracy of the potential saliency region.
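The normalization, (P − Q)² enhancement, and weighted-average fusion steps described above can be sketched as follows, assuming P = 1 and equal fusion weights; the helper names `normalize_and_enhance` and `fuse` are ours, not the patent's.

```python
import numpy as np

P = 1.0  # the chosen feature maximum value (the text's P)

def normalize_and_enhance(fmap):
    """Rescale a feature map to [0, P], then multiply by (P - Q)^2, where Q
    averages the 8-neighbourhood maxima and minima of interior pixels.
    Maps with many comparable peaks are suppressed; maps with one strong
    peak are left prominent.  Assumes a map of at least 3x3 pixels."""
    lo, hi = fmap.min(), fmap.max()
    f = P * (fmap - lo) / (hi - lo) if hi > lo else np.zeros_like(fmap, dtype=float)
    maxima, minima = [], []
    for y in range(1, f.shape[0] - 1):
        for x in range(1, f.shape[1] - 1):
            hood = np.delete(f[y - 1:y + 2, x - 1:x + 2].ravel(), 4)  # the 8 neighbours
            maxima.append(hood.max())
            minima.append(hood.min())
    if not maxima:
        return f
    Q = (np.mean(maxima) + np.mean(minima)) / 2.0
    return f * (P - Q) ** 2

def fuse(maps, weights=None):
    """Weighted average of already-normalized feature maps into one map."""
    maps = np.asarray(maps, dtype=float)
    w = np.ones(len(maps)) / len(maps) if weights is None else np.asarray(weights, float)
    return np.tensordot(w, maps, axes=1)

a = np.array([[0., 0., 0.], [0., 4., 0.], [0., 0., 0.]])  # one isolated peak
m = normalize_and_enhance(a)
fused = fuse([m, m])  # averaging two identical maps returns the same map
```

With a single clean peak, Q is 0 here, so the enhancement factor (P − Q)² is 1 and the peak keeps its full amplitude after normalization.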
- the method for detecting the saliency area of the image further includes:
- the adaptive threshold binarization method can be the maximum between-class variance method, namely the Otsu method (OTSU).
- the maximum between-class variance method is used to binarize the saliency map.
- the saliency map is divided into two parts: the background and the foreground.
- the foreground is the part to be segmented according to the threshold, and the boundary value between the foreground and the background is the required threshold.
- Multiple values are preset as candidates for the threshold; the candidates are traversed, and for each candidate the between-class variance between the corresponding background and foreground is calculated.
- The candidate value at which the between-class variance reaches its maximum is the required threshold, and segmentation with this value works best.
- The foreground segmented according to this threshold is the saliency region.
- For example, let the segmentation threshold between the foreground and background of the saliency map be T. The proportion of foreground pixels in the entire saliency map is denoted W1 and their average gray level μ1; the proportion of background pixels is denoted W2 and their average gray level μ2.
- The total average gray level of the saliency map is denoted μ, and the between-class variance is denoted g.
- Suppose the size of the saliency map is L×N, the number of pixels whose gray value is less than the threshold T is N1, and the number of pixels whose gray value is greater than T is N2. Then N1 + N2 = L×N and W1 + W2 = 1.
- The total average gray level satisfies μ = W1·μ1 + W2·μ2, and the between-class variance is g = W1·(μ1 − μ)² + W2·(μ2 − μ)² = W1·W2·(μ1 − μ2)²  (3).
- Multiple values are preset as candidates for the threshold T; by traversal, the maximum value of the between-class variance g in formula (3) is found, and the corresponding T value is the optimal threshold sought.
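A minimal sketch of the Otsu search described above: exhaustively traverse candidate thresholds and keep the one maximizing g = W1·W2·(μ1 − μ2)². The `>=` convention for the foreground class is a choice made here.

```python
import numpy as np

def otsu_threshold(gray):
    """Try every candidate threshold T and keep the one that maximizes the
    between-class variance g = W1*W2*(mu1 - mu2)^2, which is equivalent to
    W1*(mu1 - mu)^2 + W2*(mu2 - mu)^2."""
    best_t, best_g = 0, -1.0
    flat = gray.ravel().astype(float)
    for t in range(1, 256):
        fg = flat[flat >= t]   # foreground: gray value >= T
        bg = flat[flat < t]    # background: gray value <  T
        if fg.size == 0 or bg.size == 0:
            continue           # degenerate split: one class is empty
        w1, w2 = fg.size / flat.size, bg.size / flat.size
        g = w1 * w2 * (fg.mean() - bg.mean()) ** 2
        if g > best_g:
            best_g, best_t = g, t
    return best_t

# Two well-separated populations: the threshold falls between them.
img = np.array([[10, 12, 11], [200, 210, 205], [10, 205, 12]])
t = otsu_threshold(img)
binary = (img >= t).astype(int)
```

Here the first threshold that cleanly separates the dark pixels (10-12) from the bright ones (200-210) maximizes g, so `t` is 13 and the binary image marks the four bright pixels as foreground.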
- S60 obtains the saliency area according to the saliency map, including:
- By binarizing the saliency map with the adaptive threshold binarization method, the difference between the foreground and the background in the saliency map becomes more obvious, and the saliency region can be obtained more quickly and accurately.
- the saliency region is acquired according to the binary image obtained after the saliency map is binarized in S61, as shown in FIG. 7, including:
- S610 Perform connected domain labeling on pixels of the binary image, and merge all pixels with the same connected domain label into one connected domain.
- the connected domain is labeled for the pixels of the binary image, and the pixels with the same connected domain label are regarded as a connected domain.
- The pixels serving as the background in the binary image can subsequently be removed, and only the pixels serving as the foreground are segmented, which reduces calculation and improves speed.
- the pixels with the same connected domain label are a connected domain, and a connected domain is regarded as a saliency area.
- the binary image is divided into multiple connected components, and multiple connected components are extracted as multiple saliency regions.
- pixels of the binary image are labeled with connected regions, and pixels with the same connected region label are merged into one connected region, as shown in FIG. 8, including:
- the gray value of the pixel in the binary image is 1 or 0.
- The initial label value starts from 2.
- S611 Traverse the pixels of the binary image line by line.
- S614 If the gray value is 1 and the pixel is not marked, take the pixel as a seed pixel and mark the connected-domain label of the seed pixel as N, N ≥ 2; then traverse the 8 neighborhood pixels of the seed pixel.
- S615 Determine whether the gray value of the unmarked pixel in the 8 neighborhood pixels of the seed pixel is 1.
- The 8 neighborhood pixels of the pixel with coordinates (x, y) are the pixels at coordinates (x−1, y−1), (x−1, y), (x−1, y+1), (x, y−1), (x, y+1), (x+1, y−1), (x+1, y), and (x+1, y+1), where x is the row number and y is the column number of the pixel in the binary image.
- If the gray value of at least one unmarked pixel among the 8 neighborhood pixels of the seed pixel is 1, the connected-domain labels of the unmarked pixels with gray value 1 among the 8 neighborhood pixels of the seed pixel are marked as N, all such traversed pixels are taken as seed pixels, and the traversal continues in a loop with the label value unchanged.
- The label value remaining unchanged means that it is still N.
- the following provides a method for labeling the connected domains of the pixels of the binary image, and merging the pixels with the same connected domain label into one connected domain to clearly describe the implementation process.
- the method of labeling the connected domains of the pixels of the binary image and merging the pixels with the same connected domain label into one connected domain includes:
- Step 1: Traverse the pixels of the binary image line by line.
- Step 2: Determine whether the pixel is unmarked and its gray value is 1.
- Step 3: When the pixel at coordinates (2, 3) is scanned, it is unmarked and its gray value is 1, so the pixel at (2, 3) is taken as the seed pixel, and its connected-domain label is marked as 2.
- Step 4: The 8 neighborhood pixels of the seed pixel at (2, 3) are the pixels at (1, 2), (1, 3), (1, 4), (2, 2), (2, 4), (3, 2), (3, 3), and (3, 4); traverse these 8 pixels.
- Step 5: Determine whether any of them is unmarked with a gray value of 1.
- Step 6: When the pixels at coordinates (3, 2) and (3, 3) are scanned, they are unmarked and their gray values are 1, while the gray values of the remaining pixels are all 0; the connected-domain labels of (3, 2) and (3, 3) are therefore marked as 2, and both pixels are taken as seed pixels.
- Step 7: In Step 6 the pixel at (3, 3) was scanned before the pixel at (3, 2), so the pixel at (3, 3) is used as the seed pixel first: its 8 neighborhood pixels are traversed to determine whether any unmarked pixel has a gray value of 1; the pixel at (3, 2) is then used as the seed pixel and its 8 neighborhood pixels are traversed in the same way.
- Step 8: In Step 7, when the pixel at (3, 3) is the seed pixel, the pixels at (2, 3) and (3, 2) among its 8 neighborhood pixels have already been marked as 2 and the gray values of the remaining pixels are all 0; when the pixel at (3, 2) is the seed pixel, the pixels at (2, 3) and (3, 3) have already been marked as 2 and the gray values of the remaining pixels are all 0. The labeling of this connected domain therefore ends, and the label value becomes 3.
- The label values of the pixels at (2, 3), (3, 2), and (3, 3) are all 2, forming one connected domain, and this connected domain is extracted as a saliency region.
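The seed-fill labeling walked through above can be sketched as follows. Labels start at 2 and 8-connectivity is used, matching the description; the queue-based traversal order is an implementation choice here.

```python
from collections import deque

def label_connected(binary):
    """Seed-fill labeling of 8-connected foreground (value 1) pixels.
    Labels start at N = 2 so they never collide with gray values 0/1."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    n = 2
    for y in range(h):
        for x in range(w):
            if binary[y][x] == 1 and labels[y][x] == 0:
                queue = deque([(y, x)])  # this pixel seeds a new region
                labels[y][x] = n
                while queue:
                    cy, cx = queue.popleft()
                    for dy in (-1, 0, 1):          # visit all 8 neighbours
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and binary[ny][nx] == 1
                                    and labels[ny][nx] == 0):
                                labels[ny][nx] = n
                                queue.append((ny, nx))
                n += 1                              # region finished: next label
    return labels

img = [[1, 0, 0, 0],
       [0, 0, 1, 0],
       [0, 1, 1, 0],
       [0, 0, 0, 0]]
labels = label_connected(img)
```

The isolated pixel at (0, 0) becomes connected domain 2, and the three mutually 8-adjacent pixels form connected domain 3; each domain would then be extracted as one saliency region.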
- the embodiment of the present disclosure also provides a detection device for a saliency area of an image, as shown in FIG. 9, including:
- the extraction module 10 is configured to extract the brightness feature of the image, and obtain at least two first brightness feature maps by using the image pyramid and the first multi-scale calculation relation formula.
- the extraction module 10 is further configured to extract the tone features of the image, and obtain at least two first tone feature maps by using the image pyramid and the second multi-scale calculation relationship.
- the fusion module 20 is configured to perform normalization processing on the at least two first brightness feature maps respectively, and fuse the at least two first brightness feature maps after the normalization processing into a second brightness feature map.
- the fusion module 20 is further configured to perform normalization processing on the at least two first tone feature maps respectively, and fuse the at least two first tone feature maps after the normalization processing into a second tone feature map .
- the fusion module 20 is further configured to merge the second brightness feature map and the second hue feature map into a saliency map.
- the acquiring module 30 is configured to acquire the saliency area according to the saliency map.
- the detection device of the image saliency area is integrated in the server.
- The extraction module 10 extracts the brightness features of the image and obtains at least two first brightness feature maps by using the image pyramid and the first multi-scale calculation relational expression.
- The extraction module 10 also extracts the tone features of the image and obtains at least two first tone feature maps by using the image pyramid and the second multi-scale calculation relational expression.
- The fusion module 20 normalizes the at least two first brightness feature maps respectively and fuses them into a second brightness feature map; it likewise normalizes the at least two first tone feature maps respectively and fuses them into a second tone feature map; the fusion module 20 then fuses the second brightness feature map and the second tone feature map into a saliency map.
- The acquisition module acquires the saliency region according to the saliency map. It can be seen that, because the saliency region is acquired from the brightness and tone information fused in the saliency map, the embodiments of the present disclosure can quickly obtain a target approximating what the human eye attends to; compared with the prior art, which requires training samples for each acquisition, this not only improves the accuracy of acquiring the saliency region but also improves the acquisition speed.
- the acquiring module 30 is further configured to use an adaptive threshold binarization algorithm to perform binarization processing on the saliency map to obtain a binary image.
- the acquiring module 30 is further configured to mark the connected domains of the pixels of the binary image, merge all pixels with the same connected domain label into one connected domain, and use the connected domain as a saliency area.
- The acquiring module 30 performs connected-domain labeling on the pixels of the binary image and merges all pixels with the same connected-domain label into one connected domain; the background pixels in the binary image can subsequently be removed, and only the foreground pixels are segmented, which reduces calculation and increases speed.
- The extraction module 10 is also configured to calculate, from the three primary color components of each pixel in the image, the brightness feature corresponding to each pixel, so as to obtain the brightness feature intermediate image.
- The three primary color components of each pixel include the red, green, and blue components: R is the red component, G is the green component, B is the blue component, and I is the brightness feature.
- the image pyramid is a Gaussian pyramid with M layers, and M ≥ 7.
- Through the first multi-scale calculation relational expression I(c, s) = |I(c) Θ I(s)|, the at least two first brightness feature maps are calculated, where I(c, s) represents a first brightness feature map, I(c) represents the sub-brightness feature intermediate map at the c-th scale, I(s) represents the sub-brightness feature intermediate map at the s-th scale, c ≥ 2, δ ≥ 3, 5 ≤ s ≤ M − 1, and s = c + δ.
- The extraction module 10 is also configured to calculate, from the three primary color components of each pixel in the image, the hue feature corresponding to each pixel, so as to obtain the hue feature intermediate image.
- The three primary color components of each pixel include the red, green, and blue components: R is the red component, G is the green component, B is the blue component, and H is the hue feature.
- the tone feature intermediate map is input into the image pyramid to obtain sub-tone feature intermediate maps at M scales, where the image pyramid is a Gaussian pyramid with M layers, and M ≥ 7.
- Through the second multi-scale calculation relational expression H(c, s) = |H(c) Θ H(s)|, the at least two first tone feature maps are calculated, where H(c, s) represents a first tone feature map, H(c) represents the sub-tone feature intermediate map at the c-th scale, H(s) represents the sub-tone feature intermediate map at the s-th scale, c ≥ 2, δ ≥ 3, 5 ≤ s ≤ M − 1, and s = c + δ.
- The extraction module 10 specifically extracts brightness features and hue features, which better reflect the essential attributes of the foreground objects in the image, so that the foreground objects can be more completely extracted as the saliency region.
- The extraction module 10 extracts the brightness features of the image and obtains at least two first brightness feature maps using the image pyramid and the first multi-scale calculation relation, and extracts the tone features of the image and obtains at least two first tone feature maps using the image pyramid and the second multi-scale calculation relation. The fusion module 20 normalizes the at least two first brightness feature maps respectively and fuses them into a second brightness feature map, normalizes the at least two first tone feature maps respectively and fuses them into a second tone feature map, and also fuses the second brightness feature map and the second tone feature map into a saliency map.
- The acquisition module acquires the saliency region according to the saliency map.
- An embodiment of the present disclosure also provides a computer device 1000, as shown in FIG. 10, including a storage unit 1010 and a processing unit 1020, and the storage unit 1010 stores a computer program that can be run on the processing unit 1020 and stores the marking result.
- the processing unit 1020 implements the above-mentioned method for detecting a salient region of an image when the computer program is executed.
- Embodiments of the present disclosure also provide a computer-readable storage medium that stores a computer program, and when the computer program is executed by a processor, the above-mentioned method for detecting a salient region of an image is realized.
- Computer-readable storage media include permanent/non-permanent, volatile/nonvolatile, removable/non-removable media, and information storage can be achieved by any method or technology.
- the information can be computer-readable instructions, data structures, program modules, or other data.
- Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
Abstract
Description
Claims (15)
- A method for detecting a saliency region of an image, comprising: extracting brightness features of the image, and obtaining at least two first brightness feature maps by using an image pyramid and a first multi-scale calculation relational expression; extracting tone features of the image, and obtaining at least two first tone feature maps by using the image pyramid and a second multi-scale calculation relational expression; normalizing the at least two first brightness feature maps respectively, and fusing the normalized at least two first brightness feature maps into a second brightness feature map; normalizing the at least two first tone feature maps respectively, and fusing the normalized at least two first tone feature maps into a second tone feature map; fusing the second brightness feature map and the second tone feature map into a saliency map; and acquiring the saliency region according to the saliency map.
- The method for detecting a saliency region of an image according to claim 1, further comprising: binarizing the saliency map by using an adaptive threshold binarization algorithm to obtain a binary image, wherein acquiring the saliency region according to the saliency map comprises: acquiring the saliency region according to the binary image.
- The method for detecting a saliency region of an image according to claim 2, wherein acquiring the saliency region according to the binary image comprises: performing connected-domain labeling on the pixels of the binary image, and merging all pixels with the same connected-domain label into one connected domain; and taking the connected domain as a saliency region.
- The method for detecting a saliency region of an image according to claim 3, wherein performing connected-domain labeling on the pixels of the binary image and merging the pixels with the same connected-domain label into one connected domain comprises: setting an initial label value to N, N ≥ 2; traversing each pixel of the binary image line by line, and judging whether the pixel is unlabeled and has a gray value of 1; if the gray value of the pixel is not 1, or if it is 1 but the pixel has been labeled, continuing to traverse the pixels of the binary image line by line; if the gray value of the pixel is 1 and the pixel is unlabeled, taking it as a seed pixel, marking the connected-domain label of the seed pixel as N, traversing the 8 neighborhood pixels of the seed pixel, and judging whether the gray value of any unlabeled pixel among the 8 neighborhood pixels of the seed pixel is 1; if the gray value of at least one unlabeled pixel among the 8 neighborhood pixels of the seed pixel is 1, marking the connected-domain labels of the unlabeled pixels with gray value 1 among the 8 neighborhood pixels of the seed pixel as N, and taking all traversed unlabeled pixels with gray value 1 among the 8 neighborhood pixels as seed pixels; and if none of the unlabeled pixels among the 8 neighborhood pixels of the seed pixel has a gray value of 1, ending the labeling of one connected domain, adding 1 to N, and continuing to traverse the pixels of the binary image line by line, wherein the 8 neighborhood pixels of a pixel with coordinates (x, y) are the pixels with coordinates (x−1, y−1), (x−1, y), (x−1, y+1), (x, y−1), (x, y+1), (x+1, y−1), (x+1, y), (x+1, y+1); x is the row number of the pixel in the binary image, and y is the column number of the pixel in the binary image.
- The method for detecting a saliency region of an image according to any one of claims 1-4, wherein extracting the brightness features of the image and obtaining at least two first brightness feature maps by using the image pyramid and the first multi-scale calculation relational expression comprises: calculating, from the three primary color components of each pixel in the image, the brightness feature corresponding to each pixel to obtain a brightness feature intermediate image, wherein the three primary color components of each pixel include a red component, a green component and a blue component, R is the red component, G is the green component, B is the blue component, and I is the brightness feature; inputting the brightness feature intermediate image into the image pyramid to obtain sub-brightness feature intermediate maps at M scales, wherein the image pyramid is a Gaussian pyramid with M layers, M ≥ 7; and calculating the at least two first brightness feature maps through the first multi-scale calculation relational expression I(c,s)=|I(c)ΘI(s)|, wherein I(c,s) represents a first brightness feature map, I(c) represents the sub-brightness feature intermediate map at the c-th scale, I(s) represents the sub-brightness feature intermediate map at the s-th scale, c≥2, δ≥3, 5≤s≤M−1, s=c+δ.
- The method for detecting a saliency region of an image according to any one of claims 1-4, wherein extracting the tone features of the image and obtaining at least two first tone feature maps by using the image pyramid and the second multi-scale calculation relational expression comprises: calculating, from the three primary color components of each pixel in the image, the hue feature corresponding to each pixel to obtain a hue feature intermediate image, wherein the three primary color components of each pixel include a red component, a green component and a blue component, R is the red component, G is the green component, B is the blue component, and H is the hue feature; inputting the hue feature intermediate image into the image pyramid to obtain sub-tone feature intermediate maps at M scales, wherein the image pyramid is a Gaussian pyramid with M layers, M ≥ 7; and calculating the at least two first tone feature maps through the second multi-scale calculation relational expression H(c,s)=|H(c)ΘH(s)|, wherein H(c,s) represents a first tone feature map, H(c) represents the sub-tone feature intermediate map at the c-th scale, H(s) represents the sub-tone feature intermediate map at the s-th scale, c≥2, δ≥3, 5≤s≤M−1, s=c+δ.
- The method for detecting a saliency region of an image according to claim 5, wherein normalizing the at least two first brightness feature maps respectively and fusing the normalized at least two first brightness feature maps into a second brightness feature map comprises: setting a brightness feature maximum value P; for each first brightness feature map, traversing the brightness features of the first brightness feature map to obtain the first brightness feature maximum value I 1max and the first brightness feature minimum value I 1min of the map, and normalizing, according to the formula, the brightness features of the first brightness feature map to between 0 and P, wherein I represents the brightness feature value of a pixel of the first brightness feature map; in the brightness-normalized first brightness feature map, for each pixel having an 8-neighborhood, obtaining the 8-neighborhood brightness feature maximum value and the 8-neighborhood brightness feature minimum value from the brightness feature values of its 8 neighborhood pixels; averaging all 8-neighborhood brightness feature maxima and all 8-neighborhood brightness feature minima to obtain a brightness feature average value Q; multiplying the brightness feature value of each pixel of the first brightness feature map by (P−Q)²; and fusing the normalized at least two first brightness feature maps into a second brightness feature map by weighted averaging.
- The method for detecting a saliency region of an image according to claim 6, wherein normalizing the at least two first tone feature maps respectively and fusing the normalized at least two first tone feature maps into one second tone feature map comprises: setting a tone feature maximum value P; for each first tone feature map, traversing the tone features of the first tone feature map to obtain the first tone feature maximum value H 1max and the first tone feature minimum value H 1min of the map, and normalizing, according to the formula, the tone features of the first tone feature map to between 0 and P, wherein H represents the tone feature value of a pixel of the first tone feature map; in the tone-normalized first tone feature map, for each pixel having an 8-neighborhood, obtaining the 8-neighborhood tone feature maximum value and the 8-neighborhood tone feature minimum value from the tone feature values of its 8 neighborhood pixels; averaging all 8-neighborhood tone feature maxima and all 8-neighborhood tone feature minima to obtain a tone feature average value Q; multiplying the tone feature value of each pixel of the first tone feature map by (P−Q)²; and fusing all normalized at least two first tone feature maps into one second tone feature map by weighted averaging.
- A computer device, comprising a storage unit and a processing unit, wherein the storage unit stores a computer program that can run on the processing unit; and the processing unit, when executing the computer program, implements the method for detecting a saliency region of an image according to any one of claims 1-8.
- A computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the method for detecting a saliency region of an image according to any one of claims 1-8 is implemented.
- A device for detecting a saliency region of an image, comprising: an extraction module configured to extract brightness features of the image and obtain at least two first brightness feature maps by using an image pyramid and a first multi-scale calculation relational expression, the extraction module being further configured to extract tone features of the image and obtain at least two first tone feature maps by using the image pyramid and a second multi-scale calculation relational expression; a fusion module configured to normalize the at least two first brightness feature maps respectively and fuse the normalized at least two first brightness feature maps into a second brightness feature map, the fusion module being further configured to normalize the at least two first tone feature maps respectively and fuse the normalized at least two first tone feature maps into a second tone feature map, and the fusion module being further configured to fuse the second brightness feature map and the second tone feature map into a saliency map; and an acquisition module configured to acquire the saliency region according to the saliency map.
- The device for detecting a saliency region of an image according to claim 11, wherein the acquisition module is further configured to binarize the saliency map by using an adaptive threshold binarization algorithm to obtain a binary image; and to acquire the saliency region according to the binary image.
- The device for detecting a saliency region of an image according to claim 12, wherein the acquisition module is further configured to perform connected-domain labeling on the pixels of the binary image and merge all pixels with the same connected-domain label into one connected domain; and to take the connected domain as a saliency region.
- The device for detecting a saliency region of an image according to claim 11, wherein the extraction module is further configured to: calculate, from the three primary color components of each pixel in the image, the brightness feature corresponding to each pixel to obtain a brightness feature intermediate image, wherein the three primary color components of each pixel include a red component, a green component and a blue component, R is the red component, G is the green component, B is the blue component, and I is the brightness feature; input the brightness feature intermediate image into the image pyramid to obtain sub-brightness feature intermediate maps at M scales, wherein the image pyramid is a Gaussian pyramid with M layers, M ≥ 7; and calculate the at least two first brightness feature maps through the first multi-scale calculation relational expression I(c,s)=|I(c)ΘI(s)|, wherein I(c,s) represents a first brightness feature map, I(c) represents the sub-brightness feature intermediate map at the c-th scale, I(s) represents the sub-brightness feature intermediate map at the s-th scale; c≥2, δ≥3, 5≤s≤M−1, s=c+δ.
- The device for detecting a saliency region of an image according to claim 11, wherein the extraction module is further configured to: calculate, from the three primary color components of each pixel in the image, the hue feature corresponding to each pixel to obtain a hue feature intermediate image, wherein the three primary color components of each pixel include a red component, a green component and a blue component, R is the red component, G is the green component, B is the blue component, and H is the hue feature; input the hue feature intermediate image into the image pyramid to obtain sub-tone feature intermediate maps at M scales, wherein the image pyramid is a Gaussian pyramid with M layers, M ≥ 7; and calculate the at least two first tone feature maps through the second multi-scale calculation relational expression H(c,s)=|H(c)ΘH(s)|, wherein H(c,s) represents a first tone feature map, H(c) represents the sub-tone feature intermediate map at the c-th scale, H(s) represents the sub-tone feature intermediate map at the s-th scale, c≥2, δ≥3, 5≤s≤M−1, s=c+δ.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910301296.3A CN110008969B (zh) | 2019-04-15 | 2019-04-15 | 图像显著性区域的检测方法和装置 |
CN201910301296.3 | 2019-04-15 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020211522A1 true WO2020211522A1 (zh) | 2020-10-22 |
Family
ID=67172018
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/076000 WO2020211522A1 (zh) | 2019-04-15 | 2020-02-20 | 图像的显著性区域的检测方法和装置 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110008969B (zh) |
WO (1) | WO2020211522A1 (zh) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110008969B (zh) * | 2019-04-15 | 2021-05-14 | 京东方科技集团股份有限公司 | 图像显著性区域的检测方法和装置 |
CN111444929B (zh) * | 2020-04-01 | 2023-05-09 | 北京信息科技大学 | 一种基于模糊神经网络的显著图计算方法及系统 |
CN112669306A (zh) * | 2021-01-06 | 2021-04-16 | 北京信息科技大学 | 一种基于显著图的太阳能电池片缺陷检测方法及系统 |
CN114022747B (zh) * | 2022-01-07 | 2022-03-15 | 中国空气动力研究与发展中心低速空气动力研究所 | 基于特征感知的显著目标提取方法 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150227816A1 (en) * | 2014-02-10 | 2015-08-13 | Huawei Technologies Co., Ltd. | Method and apparatus for detecting salient region of image |
CN108960247A (zh) * | 2017-05-22 | 2018-12-07 | 阿里巴巴集团控股有限公司 | 图像显著性检测方法、装置以及电子设备 |
CN109410171A (zh) * | 2018-09-14 | 2019-03-01 | 安徽三联学院 | 一种用于雨天图像的目标显著性检测方法 |
CN110008969A (zh) * | 2019-04-15 | 2019-07-12 | 京东方科技集团股份有限公司 | 图像显著性区域的检测方法和装置 |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100573523C (zh) * | 2006-12-30 | 2009-12-23 | 中国科学院计算技术研究所 | 一种基于显著区域的图像查询方法 |
US20100268301A1 (en) * | 2009-03-06 | 2010-10-21 | University Of Southern California | Image processing algorithm for cueing salient regions |
CN102521595B (zh) * | 2011-12-07 | 2014-01-15 | 中南大学 | 一种基于眼动数据和底层特征的图像感兴趣区域提取方法 |
CN107301420A (zh) * | 2017-06-30 | 2017-10-27 | 武汉大学 | 一种基于显著性分析的热红外影像目标探测方法 |
-
2019
- 2019-04-15 CN CN201910301296.3A patent/CN110008969B/zh not_active Expired - Fee Related
-
2020
- 2020-02-20 WO PCT/CN2020/076000 patent/WO2020211522A1/zh active Application Filing
Non-Patent Citations (1)
Title |
---|
WANG, WENHAO ET AL.: "Improved Multi-scale Saliency Detection Based on HSV Space", COMPUTER ENGINEERING & SCIENCE, vol. 39, no. 2, 28 February 2017 (2017-02-28), ISSN: 1007-130X * |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112329796B (zh) * | 2020-11-12 | 2023-05-23 | 北京环境特性研究所 | Infrared imaging cirrus cloud detection method and device based on visual saliency |
CN112329796A (zh) * | 2020-11-12 | 2021-02-05 | 北京环境特性研究所 | Infrared imaging cirrus cloud detection method and device based on visual saliency |
CN113009443A (zh) * | 2021-02-22 | 2021-06-22 | 南京邮电大学 | Sea-surface target detection method and device based on graph connectivity density |
CN113009443B (zh) * | 2021-02-22 | 2023-09-12 | 南京邮电大学 | Sea-surface target detection method and device based on graph connectivity density |
CN113362356A (zh) * | 2021-06-02 | 2021-09-07 | 杭州电子科技大学 | Salient contour extraction method based on a bilateral attention pathway |
CN113362356B (zh) * | 2021-06-02 | 2024-02-02 | 杭州电子科技大学 | Salient contour extraction method based on a bilateral attention pathway |
CN114022753A (zh) * | 2021-11-16 | 2022-02-08 | 北京航空航天大学 | Small aerial target detection algorithm based on saliency and edge analysis |
CN114022753B (zh) * | 2021-11-16 | 2024-05-14 | 北京航空航天大学 | Small aerial target detection algorithm based on saliency and edge analysis |
CN114332572A (zh) * | 2021-12-15 | 2022-04-12 | 南方医科大学 | Method for extracting multi-scale fused feature parameters of breast lesion ultrasound images based on a saliency-map-guided hierarchical dense feature fusion network |
CN114332572B (zh) * | 2021-12-15 | 2024-03-26 | 南方医科大学 | Method for extracting multi-scale fused feature parameters of breast lesion ultrasound images based on a saliency-map-guided hierarchical dense feature fusion network |
CN114998189B (zh) * | 2022-04-15 | 2024-04-16 | 电子科技大学 | Color display point defect detection method |
CN114998189A (zh) * | 2022-04-15 | 2022-09-02 | 电子科技大学 | Color display point defect detection method |
CN115578476A (zh) * | 2022-11-21 | 2023-01-06 | 山东省标筑建筑规划设计有限公司 | Efficient storage method for urban and rural planning data |
CN115598138A (zh) * | 2022-11-23 | 2023-01-13 | 惠州威尔高电子有限公司 | Defect detection method and system for power control circuit boards based on saliency detection |
CN115810113A (zh) * | 2023-02-10 | 2023-03-17 | 南京隼眼电子科技有限公司 | Salient feature extraction method and device for SAR images |
CN116051543A (zh) * | 2023-03-06 | 2023-05-02 | 山东锦霖钢材加工有限公司 | Defect identification method for steel peeling |
CN116051543B (zh) * | 2023-03-06 | 2023-06-16 | 山东锦霖钢材加工有限公司 | Defect identification method for steel peeling |
CN117455913A (zh) * | 2023-12-25 | 2024-01-26 | 卡松科技股份有限公司 | Intelligent detection method for hydraulic oil contamination based on image features |
CN117455913B (zh) * | 2023-12-25 | 2024-03-08 | 卡松科技股份有限公司 | Intelligent detection method for hydraulic oil contamination based on image features |
Also Published As
Publication number | Publication date |
---|---|
CN110008969A (zh) | 2019-07-12 |
CN110008969B (zh) | 2021-05-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020211522A1 (zh) | | Method and device for detecting salient regions of an image |
CN108805023B (zh) | | Image detection method and device, computer equipment, and storage medium |
TWI774659B (zh) | | Method and device for recognizing text in images |
CN108229490B (zh) | | Keypoint detection method, neural network training method, device, and electronic equipment |
Cornelis et al. | | Crack detection and inpainting for virtual restoration of paintings: The case of the Ghent Altarpiece |
CN111915704A (zh) | | Apple grading and identification method based on deep learning |
CN107784669A (zh) | | Method for light spot extraction and centroid determination |
US11151402B2 (en) | | Method of character recognition in written document |
US9256928B2 (en) | | Image processing apparatus, image processing method, and storage medium capable of determining a region corresponding to local light from an image |
JP2012008100A (ja) | | Image processing device, image processing method, and image processing program |
WO2015161794A1 (zh) | | Method for obtaining a thumbnail based on image saliency detection |
CN108389215B (zh) | | Edge detection method and device, computer storage medium, and terminal |
Ružić et al. | | Virtual restoration of the Ghent Altarpiece using crack detection and inpainting |
US10885326B2 (en) | | Character recognition method |
CN112101386B (zh) | | Text detection method and device, computer equipment, and storage medium |
Besheer et al. | | Modified invariant colour model for shadow detection |
JP3740351B2 (ja) | | Image processing device and method, and recording medium storing a program for executing the method |
US8885971B2 (en) | | Image processing apparatus, image processing method, and storage medium |
JP5870745B2 (ja) | | Image processing device, binarization threshold calculation method, and computer program |
CN108877030B (zh) | | Image processing method and device, terminal, and computer-readable storage medium |
JP2012252691A (ja) | | Method and device for extracting text stroke images from an image |
JP2024501642A (ja) | | Detection of annotated regions of interest in images |
JP2011076302A (ja) | | Contour extraction device, contour extraction method, and contour extraction program |
JP2010186246A (ja) | | Image processing device, method, and program |
CN114648751A (zh) | | Method and device for processing video subtitles, terminal, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20792070 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20792070 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 06/05/2022) |
|