WO2020133170A1 - Image processing method and device - Google Patents
Image processing method and device
- Publication number
- WO2020133170A1 (PCT/CN2018/124724)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target frame
- area
- target
- preset
- color channel
- Prior art date
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL (common parent of all entries below)
- G06T5/40—Image enhancement or restoration using histogram techniques
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06T7/00—Image analysis
- G06T7/11—Region-based segmentation
- G06T7/136—Segmentation; edge detection involving thresholding
- G06T7/187—Segmentation; edge detection involving region growing; involving region merging; involving connected component labelling
- G06T7/194—Segmentation; edge detection involving foreground-background segmentation
- G06T7/62—Analysis of geometric attributes of area, perimeter, diameter or volume
- G06T7/90—Determination of colour characteristics
- G06T2207/20221—Image fusion; image merging (indexing scheme for image analysis or enhancement; special algorithmic details; image combination)
Definitions
- the invention relates to the field of image processing, and in particular to an image processing method and device.
- visual saliency: when facing a scene, humans automatically process regions of interest and selectively ignore regions of non-interest. These regions of human interest are called saliency regions.
- image cropping is an important task in image editing, used to improve the aesthetic quality of an image. Its main goal is to improve the composition of the image, for example by emphasizing objects of interest, removing unwanted areas and obtaining a better color balance. In photography, many rules, such as the rule of thirds, the rule of visual balance and the rule of diagonal dominance, are explicitly defined for creating well-composed images.
- an automatic image cropping method can provide novice photographers and ordinary users with aesthetically pleasing cropping suggestions, and helps users save a great deal of time.
- existing automatic composition methods mainly fall into two types. The first performs saliency detection on the original image and uses the detection result to compute a saliency energy; it then relies only on over-segmenting the original image and applying constraints to determine the optimal composition area. It is difficult to find an accurate subject this way, unwanted interference may even be introduced, and a pleasing composition is hard to obtain.
- the other type is based on learning methods, attempting to automatically learn composition rules or crop scores from a large training set: the images in the training set are over-segmented, and each segmented crop is given a corresponding score that serves as its label for training the model. This approach avoids hand-designed composition rules and enables an end-to-end solution, but a lack of training data may leave the finally learned model with a poor cropping effect.
- the invention provides an image processing method and device. Specifically, the invention is realized through the following technical solutions:
- according to a first aspect of the invention, an image processing method is provided, comprising: performing saliency detection on an original image to obtain a saliency map; determining a target subject based on the saliency map; determining a composition area in the original image according to the target subject and preset rules; and determining the composition area as the target image.
- according to a second aspect of the invention, an image processing device is provided, comprising: a storage device for storing program instructions; and a processor that invokes the program instructions stored in the storage device and, when the program instructions are executed, is configured to: perform saliency detection on an original image to obtain a saliency map; determine a target subject based on the saliency map; determine a composition area in the original image according to the target subject and preset rules; and determine the composition area as the target image.
- according to a third aspect of the invention, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the steps of the image processing method of the first aspect.
- according to these technical solutions, the visually interesting area in the original image is first detected based on the saliency detection method to obtain the saliency map corresponding to the original image; the target subject is then determined based on the saliency map, effectively eliminating interference from a cluttered background; and the best composition area is then found according to the determined target subject and preset rules, so as to obtain a target image with better composition.
- FIG. 1 is a method flowchart of an image processing method in an embodiment of the invention
- FIG. 2 is a usage scene diagram of an image processing method in an embodiment of the invention.
- FIG. 3 is another usage scene diagram of the image processing method in an embodiment of the present invention.
- FIG. 4 is a flowchart of a specific method of the image processing method in the embodiment shown in FIG. 1;
- FIG. 5 is a flowchart of another specific method of the image processing method in the embodiment shown in FIG. 1;
- FIG. 6 is a flowchart of still another specific method of the image processing method in the embodiment shown in FIG. 1;
- FIG. 7 is a structural block diagram of an image processing apparatus in an embodiment of the present invention.
- in the existing automatic composition methods based on saliency detection, first, visual saliency is not necessarily accurate, which leads to inaccurate detection of the target subject; second, the saliency detection result alone is used to calculate the saliency energy while the target subject remains unclear, and it is difficult to find the exact subject merely by over-segmenting the original image and then applying constraints; at the same time, this greatly enlarges the traversal range, increases the amount of calculation, and may even introduce unwanted interference.
- the image processing method and device proposed in the embodiments of the present invention therefore first detect the region of visual interest based on saliency detection, then determine the target subject on this basis using a method based on the saliency distribution, effectively eliminating interference from cluttered backgrounds; finally, considering the aesthetics of the target subject in the final crop, the search range is limited, and the constraints are redesigned to find the best composition.
- FIG. 1 is a method flowchart of an image processing method in an embodiment of the invention. As shown in FIG. 1, the image processing method may include the following steps:
- Step S101 Perform saliency detection on the original image to obtain a saliency map
- the execution subject of the image processing method of the embodiment of the present invention is an image processing device.
- the original image may include an image acquired in real time by the image processing device and/or a local image of the image processing device.
- for example, in one embodiment, referring to FIG. 2, the image processing device communicates with a shooting device, and the original image may be an image collected by the shooting device in real time.
- in another embodiment, the image processing device is a part of the photographing device; referring to FIG. 3, the photographing device may further include an image sensor that communicates with the image processing device, and the original image may be an image collected by the image sensor in real time.
- in yet another embodiment, the original image is a local image of the image processing device, that is, an image stored in the image processing device in advance.
- in some embodiments, after acquiring the original image, the image processing device directly performs saliency detection on it to obtain the saliency map.
- in other embodiments, after acquiring the original image, the image processing device performs preliminary processing on it and then performs saliency detection on the preliminarily processed image to obtain the saliency map.
- optionally, before implementing step S101, the image processing device converts the color space of the original image to a specific color space, for example the Lab color space, so that the converted image is closer to human visual perception.
- it can be understood that the specific color space may also be the RGB color space or the YUV color space.
- optionally, before implementing step S101, the image processing apparatus also adjusts the size of the original image to a preset size to meet the requirements of image processing. For example, if the pixel size of the original image is 4000*3000, it can be adjusted to 480*360 to reduce the amount of subsequent calculation.
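As an illustration of this preprocessing, the following is a minimal sketch assuming OpenCV; the patent does not prescribe a library, and the BGR input format and the 480*360 working size are taken from the example above.

```python
import cv2

def preprocess(original_bgr, preset_size=(480, 360)):
    """Resize the original image to a preset working size and convert it
    to the Lab color space, as described above."""
    resized = cv2.resize(original_bgr, preset_size, interpolation=cv2.INTER_AREA)
    lab = cv2.cvtColor(resized, cv2.COLOR_BGR2Lab)
    return lab
```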
- specifically, in this embodiment, referring to FIG. 4, the implementation of step S101 may include steps S1011 to S1013.
- Step S1011 Perform pyramid decomposition of at least two layers on each color channel;
- if the image processing device does not convert the color space of the original image after acquiring it, the color channels in step S1011 are the color channels corresponding to the color space of the original image; if the device does convert the color space, the color channels are those corresponding to the color space of the converted image.
- optionally, in some embodiments the color channels include the three channels corresponding to the Lab color space; in other embodiments, the three channels corresponding to the RGB color space; in still other embodiments, the three channels corresponding to the YUV color space.
- it can be understood that the number of pyramid decomposition layers for each color channel may be 2, 3, 4, 5, 6, 7, 8, 9 or even more, chosen as needed.
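A minimal sketch of step S1011, assuming OpenCV's Gaussian pyramid; the three-layer default here is only one of the layer counts listed above.

```python
import cv2

def build_pyramids(lab_image, num_levels=3):
    """Decompose each color channel into a pyramid with num_levels layers.
    pyramids[level] is the list of the three channel images at that level."""
    pyramids = []
    channels = list(cv2.split(lab_image))
    for _ in range(num_levels):
        pyramids.append(channels)
        channels = [cv2.pyrDown(c) for c in channels]  # halve resolution per level
    return pyramids
```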
- Step S1012 Determine the first saliency map of each pyramid layer;
- in an embodiment, determining the first saliency map of each pyramid layer may include, but is not limited to, the following steps:
- (1) for each pyramid layer, perform superpixel segmentation on the image of each color channel in that layer to obtain the superpixel blocks of each color channel in that layer. A superpixel segmentation algorithm such as the SLIC algorithm, or another algorithm, can be used.
- (2) for each superpixel block of each color channel in each pyramid layer, determine the saliency response value of that superpixel block. In an embodiment, the histogram of the superpixel block is counted, and the differences between its histogram and the histograms of the other superpixel blocks of that color channel in that layer are determined; the first fusion weight of the superpixel block is also determined, and the saliency response value of the superpixel block is then determined from those histogram differences and the first fusion weight.
- optionally, for each superpixel block of each color channel in each pyramid layer, the difference between the histogram of the superpixel block and the histograms of the other superpixel blocks of that color channel in that layer is determined from the height of each bar of the superpixel block's histogram, the height of each bar of the other superpixel blocks' histograms, and a first preset parameter.
- the height of each bar characterizes the number of pixels within a specific pixel-value range.
- optionally, the first fusion weight is determined as follows: for each superpixel block of each color channel in each pyramid layer, determine the distance between the superpixel block and the other superpixel blocks of that color channel in that layer, and determine the first fusion weight of the superpixel block from those distances and a second preset coefficient. The second preset coefficient can be set as required.
- the distance between superpixel blocks may be determined in different ways. In one embodiment it is the Euclidean distance, computed from the coordinates of a particular location (such as the center) of each superpixel block in the image coordinate system. It can be understood that the Mahalanobis distance or other distance measures may also be used. In this step, the distance between the superpixel block and the other superpixel blocks of that color channel in that layer is the Euclidean distance between them.
- the saliency response value global_color_diff_i of the i-th superpixel block is calculated as follows:
- global_color_diff_i = Σ_{j=0, j≠i}^{n-1} exp(-dist_coord(i,j)/σ) × dist_hist(i,j)   (1)
- where n is the number of superpixel blocks of the color channel, a positive integer; i and j are natural numbers, and i ≤ n-1, j ≤ n-1;
- σ is the second preset parameter, an empirical parameter;
- dist_coord(i,j) is the Euclidean distance between the i-th superpixel block and the j-th superpixel block;
- dist_hist(i,j) is the difference between the histogram of the i-th superpixel block and the histogram of the j-th superpixel block, calculated as follows:
- dist_hist(i,j) = Σ_{k=0}^{m-1} ω_k × |hist_i[k] - hist_j[k]|   (2)
- where m is the number of bars of the histogram, a positive integer; k is a natural number indexing the bars, and k ≤ m-1;
- ω_k is the first preset parameter, also an empirical parameter; ω_k is related to k, that is, different weights can be set for different bars;
- hist_i[k] is the height of the k-th bar of the histogram of the i-th superpixel block, and hist_j[k] is the height of the k-th bar of the histogram of the j-th superpixel block.
- the Euclidean distance dist_coord(i,j) between the i-th superpixel block and the j-th superpixel block is calculated as follows:
- dist_coord(i,j) = ((center_x_i - center_x_j)^2 + (center_y_i - center_y_j)^2)^(1/2)   (3)
- where (center_x_i, center_y_i) are the center coordinates or center-of-gravity coordinates of the i-th superpixel block, and (center_x_j, center_y_j) are the center coordinates or center-of-gravity coordinates of the j-th superpixel block.
- the center coordinates of a superpixel block are obtained by dividing the sums of the x and y coordinates of all pixels in the superpixel block by the total number of pixels in the block.
- the center-of-gravity coordinates of a superpixel block are obtained by multiplying the x (respectively y) coordinate of each pixel in the block by the saliency value of that pixel, summing, and dividing by the total number of pixels in the block.
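As an illustration of formulas (1)-(3), the following NumPy sketch assumes a precomputed superpixel label map (for example from SLIC) and illustrative values for the bin count m, the bar weights ω_k and the parameter σ; none of these values are prescribed by the patent.

```python
import numpy as np

def superpixel_saliency(channel, labels, m_bins=16, omega=None, sigma=0.2):
    """Compute global_color_diff_i per formulas (1)-(3) for one color channel
    of one pyramid layer. `labels` assigns each pixel to a superpixel."""
    n = labels.max() + 1
    h, w = channel.shape
    hists = np.zeros((n, m_bins))
    centers = np.zeros((n, 2))
    ys, xs = np.mgrid[0:h, 0:w]
    for i in range(n):
        mask = labels == i
        hist, _ = np.histogram(channel[mask], bins=m_bins, range=(0, 255))
        hists[i] = hist / max(mask.sum(), 1)          # normalized bar heights
        centers[i] = (xs[mask].mean() / w, ys[mask].mean() / h)
    if omega is None:
        omega = np.ones(m_bins)                       # first preset parameter ω_k
    # formula (2): weighted histogram difference between every pair of blocks
    dist_hist = np.abs(hists[:, None, :] - hists[None, :, :]) @ omega
    # formula (3): Euclidean distance between superpixel centers
    dist_coord = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    # formula (1): distance-weighted sum of histogram differences
    weights = np.exp(-dist_coord / sigma)             # σ = second preset parameter
    np.fill_diagonal(weights, 0.0)                    # exclude j == i
    return (weights * dist_hist).sum(axis=1)
```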
- (3) for each color channel in each pyramid layer, normalize the saliency response value of each superpixel block according to the saliency response values of all superpixel blocks of that channel, and determine the second saliency map of the channel from the normalized values.
- the normalized value used in the second saliency map is calculated as follows:
- saliency_i = (global_color_diff_i - min_global_diff) / (max_global_diff - min_global_diff)   (4)
- where max_global_diff and min_global_diff are respectively the maximum and the minimum of the saliency response values of all superpixel blocks of the color channel containing the i-th superpixel block.
- taking the Lab color space as an example, each pyramid layer includes the second saliency map of the L channel, the second saliency map of the a channel and the second saliency map of the b channel; these three second saliency maps are directly combined to obtain the first saliency map of that pyramid layer.
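A sketch of formula (4) and the per-layer channel combination follows. Note the patent only says the three channel maps are combined directly; the per-pixel mean used here is an assumption about that combination rule.

```python
import numpy as np

def second_saliency_map(global_diff, labels):
    """Formula (4): min-max normalize the per-superpixel responses, then
    paint them back onto the pixel grid as the channel's second saliency map."""
    lo, hi = global_diff.min(), global_diff.max()
    normalized = (global_diff - lo) / (hi - lo + 1e-12)
    return normalized[labels]

def layer_first_saliency(channel_maps):
    """Combine the three per-channel second saliency maps of one layer;
    a per-pixel mean is assumed as the combination rule."""
    return np.mean(channel_maps, axis=0)
```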
- Step S1013 Fuse the first saliency maps of the at least two pyramid layers to obtain the saliency map.
- for example, with three pyramid layers, three first saliency maps are obtained and then fused to obtain the saliency map of the original image.
- the first saliency map of at least two layers of pyramids is fused based on the pyramid fusion algorithm to obtain a saliency map. It can be understood that the first saliency map of at least two layers of pyramids may also be fused based on other image fusion algorithms to obtain a saliency map.
- the image processing device fuses the first saliency maps of at least two layers of pyramids according to a preset second fusion weight of each layer of pyramids to obtain a saliency map.
- the size of the second fusion weight can be set as needed.
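Step S1013 can be sketched as a weighted fusion, assuming the per-layer maps are simply resized to a common scale before weighting; the patent also allows pyramid or other fusion algorithms in place of this.

```python
import cv2
import numpy as np

def fuse_pyramid_saliency(first_maps, weights):
    """Weighted fusion of the per-layer first saliency maps at the finest
    scale. `weights` are the preset second fusion weights of each layer."""
    h, w = first_maps[0].shape
    fused = np.zeros((h, w), np.float32)
    for fmap, wgt in zip(first_maps, weights):
        fused += wgt * cv2.resize(fmap.astype(np.float32), (w, h))
    return fused / sum(weights)
```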
- in this embodiment, each color channel is decomposed into at least two pyramid layers, the first saliency map of each layer is then determined based on superpixels, and all first saliency maps are finally fused with weights; the resulting saliency map has no obvious block artifacts, which facilitates the subsequent determination of the target subject.
- moreover, decomposing the original image into pyramids first and then performing saliency detection on each layer is equivalent to saliency detection on a multi-scale image: the small-scale images capture the saliency of the contours, while the large-scale images capture the image details. Fusing the saliency of all pyramid layers is therefore equivalent to fusing contour saliency with detail saliency, making the saliency detection effect better.
- Step S102 Determine the target subject based on the saliency map
- when determining the target subject based on the saliency map, the image processing apparatus may perform steps S1021 to S1023.
- Step S1021 Perform binarization processing on the saliency map to obtain multiple connected regions;
- when binarizing the saliency map to obtain multiple connected regions, the image processing device first segments the saliency map based on a preset algorithm to determine a segmentation threshold, and then binarizes the saliency map based on the segmentation threshold.
- the preset algorithm may be the Otsu algorithm or another image segmentation algorithm.
- when segmenting the saliency map, the image processing device first segments the foreground and background of the saliency map based on the preset algorithm to determine a first threshold, and then determines the segmentation threshold from the first threshold.
- taking the Otsu algorithm as the preset algorithm as an example, the first threshold is the optimal threshold auto_thresh obtained by the image processing device with the Otsu algorithm for segmenting the foreground and background of the saliency map.
- the image processing device determines the segmentation threshold as the sum of the first threshold and a preset threshold.
- the size of the preset threshold can be set as needed, for example any value from 0.15 to 0.40 in steps of 0.01, such as 0.2.
- the image processing apparatus may also select other binarization methods to perform binarization processing on the saliency map.
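A sketch of step S1021's segmentation-threshold scheme, assuming OpenCV's Otsu implementation and the example preset threshold of 0.2 (scaled here to the 8-bit range used for the thresholding).

```python
import cv2
import numpy as np

def binarize_saliency(saliency, preset_threshold=0.2):
    """Otsu gives the first threshold auto_thresh; the segmentation threshold
    is auto_thresh plus the preset threshold, as described above."""
    sal8 = np.uint8(255 * (saliency - saliency.min()) / (np.ptp(saliency) + 1e-12))
    auto_thresh, _ = cv2.threshold(sal8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    seg_thresh = auto_thresh + preset_threshold * 255
    _, binary = cv2.threshold(sal8, seg_thresh, 255, cv2.THRESH_BINARY)
    return binary
```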
- Step S1022 Count the areas of the connected areas to determine the connected area with the largest area and the connected area with the second largest area;
- optionally, before step S1022 the image processing device may also perform an opening operation on the binarized saliency map, that is, erode and then dilate the binarized saliency map, to remove small defects between the connected regions, for example thin connections between them.
- each connected region in the opened saliency map is then marked. For example, if the opened saliency map includes 5 connected regions, they can be marked 0, 1, 2, 3 and 4 by serial number, so that each serial number corresponds to one region.
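The opening operation and the region labeling can be sketched with OpenCV as follows; the 5*5 structuring element is an illustrative choice.

```python
import cv2

def labeled_regions(binary):
    """Open (erode then dilate) to cut thin bridges between regions, then
    label the connected regions and collect their areas."""
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
    opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    num, labels, stats, centroids = cv2.connectedComponentsWithStats(opened)
    areas = stats[1:, cv2.CC_STAT_AREA]   # skip label 0, the background
    return labels, areas
```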
- Step S1023 Determine the target subject according to the area of the connected area with the largest area and the area of the connected area with the second largest area.
- when determining the target subject from the area of the largest connected region and the area of the second-largest connected region, the image processing device first calculates the ratio of the area of the second-largest region to the area of the largest region (that is, area of the second-largest region / area of the largest region). It then compares this ratio with a preset ratio threshold; when the ratio ≥ the preset ratio threshold, both the largest and the second-largest connected regions are determined as the target subject's area. In this case the saliency map is considered to include two subjects (the subject corresponding to the largest region and the subject corresponding to the second-largest region), and the target subject includes both.
- when the ratio < the preset ratio threshold, only the largest connected region is determined as the target subject's area; in this case the saliency map is considered to contain a single subject (the one corresponding to the largest connected region), and the other connected regions may be interference. Finally, the target subject is determined according to the target subject's area.
- the size of the preset ratio threshold can be set according to requirements; optionally, the preset ratio threshold ≥ 25%, for example 30%, 35%, 40%, etc.
- optionally, steps S1022 and S1023 can be replaced by: counting the area of each connected region and determining the target subject from the area of the largest connected region and the areas of the other connected regions. Specifically, for each connected region other than the largest one, the ratio of its area to the area of the largest region is calculated (that is, area of the region / area of the largest region); each ratio is then compared with the preset ratio threshold, the regions whose ratio is greater than or equal to the threshold are determined as part of the target subject's area, and the regions whose ratio is smaller than the threshold are determined as non-target-subject areas. In this alternative embodiment, three or more subjects may exist in the saliency map, and all of them are determined as the target subject.
- when the image processing apparatus determines the target subject according to the target subject's area, it specifically determines the center-of-gravity position, width and height of the target subject from that area.
- the width and height of the target subject constitute the size of the target subject.
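Continuing the sketch with the `labels` and `areas` from the previous snippet, the subject-selection logic of steps S1022-S1023 might look as follows; the 30% ratio threshold is one of the example values above.

```python
import numpy as np

def target_subject(labels, areas, ratio_thresh=0.3):
    """Keep the largest region, plus the second largest when its area ratio
    reaches the preset threshold, then measure the subject's center of
    gravity, width and height from the merged mask."""
    order = np.argsort(areas)[::-1] + 1        # component labels sorted by area
    keep = [order[0]]
    if len(order) > 1 and areas[order[1] - 1] / areas[order[0] - 1] >= ratio_thresh:
        keep.append(order[1])                  # two-subject case
    mask = np.isin(labels, keep)
    ys, xs = np.nonzero(mask)
    width = xs.max() - xs.min() + 1
    height = ys.max() - ys.min() + 1
    gravity = (xs.mean(), ys.mean())
    return mask, gravity, width, height
```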
- in this embodiment, a binarization method is used to determine the position, width and height of the target subject, so that the subject found is more accurate; at the same time, the number of subsequent traversals is greatly reduced, which reduces the amount of calculation.
- Step S103 Determine the composition area in the original image according to the target subject and preset rules
- when implementing step S103, the image processing apparatus may perform steps S1031 to S1035.
- Step S1031 Determine the initial target frame according to the center-of-gravity position, width and height of the target subject;
- in step S1031, the image processing apparatus takes the center of gravity of the target subject as the center position of the initial target frame, determines the width of the initial target frame from the width of the target subject and a first preset scale factor, and determines the height of the initial target frame from the height of the target subject and a second preset scale factor.
- since the part of the target subject in the width direction is usually kept as completely as possible, the first preset coefficient is set greater than or equal to 1. The part of the target subject in the height direction can be partially cut off or completely retained, depending on the need; accordingly, in some embodiments the second preset coefficient is less than 1, and in other embodiments it is greater than or equal to 1. The following takes the case where both the first and the second preset coefficient are greater than or equal to 1 as an example to describe the initial target frame.
- suppose the target subject's area has size (w_0, h_0). If the first preset scale coefficient > 1 and the second preset coefficient = 1, the width of the initial target frame is larger than the width w_0 of the target subject, while its height is still the height h_0 of the target subject. If the first preset coefficient = 1 and the second preset coefficient > 1, the width of the initial target frame is still w_0, while its height is greater than h_0. If both coefficients are > 1, the width of the initial target frame is greater than w_0, and its height is also greater than h_0.
- optionally, the image processing device determines the width w of the initial target frame from a preset aspect ratio and the height of the initial target frame: if the preset aspect ratio is M:N, the width of the initial target frame is w = h × M/N. Optionally, the height h of the initial target frame is h_0. Optionally, M:N is 16:9, 7:5, 5:4, 5:3, 4:3, 3:2 or 1:1.
- the reason for keeping the height is that the target aspect ratio of shooting is usually no greater than 16:9, and a width w mapped from 16:9 will be larger than the width w_0 of the target subject, so the target subject is not cut off in the width direction at the beginning.
- the height h of the initial target frame may also be 0.7, 0.8, 0.9, 1.1, 1.2 or 1.3 times h_0; specifically, the height h of the initial target frame may be set as needed.
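A sketch of step S1031 under the "both coefficients ≥ 1" case described above; the scale-factor values, the 16:9 aspect ratio and the rule of combining the two width rules via max() are illustrative assumptions.

```python
def initial_frame(gravity, w0, h0, sx=1.2, sy=1.0, aspect=(16, 9)):
    """Initial target frame centered on the subject's center of gravity.
    sx/sy stand in for the first/second preset scale factors; the width is
    taken as the larger of the scaled subject width and the width mapped
    from the preset aspect ratio, so the subject is not cut off."""
    cx, cy = gravity
    h = sy * h0
    w = max(sx * w0, h * aspect[0] / aspect[1])
    return cx, cy, w, h
```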
- Step S1032 Change the size of the initial target frame according to a first preset step size and a first preset number of steps to obtain multiple target frames;
- the first preset step size characterizes the amount by which the size of the initial target frame is changed each time, and the first preset number of steps characterizes the number of times the size of the initial target frame is changed.
- in some embodiments, the image processing apparatus synchronously increases the width and height of the initial target frame according to the first preset step size and the first preset number of steps, and the target frames include the frame obtained after each increase.
- synchronously increasing the width and height of the initial target frame can be implemented in two ways. In the first feasible implementation, every increase is applied to the initial target frame itself, and the step of each increase is the first preset step size multiplied by the current number of increases (1 for the first increase, 2 for the second, and so on): the first time, the width and height of the initial target frame are both increased by first preset step size × 1; the second time, by first preset step size × 2; and so on, until the last increase of first preset step size × first preset number of steps.
- in the second feasible implementation, each increase is applied to the frame obtained by the previous increase, and the step of every increase is the first preset step size: the first time, the width and height of the initial target frame (w, h) are both increased by first preset step size × 1, giving a first frame of size (w + first preset step size × 1, h + first preset step size × 1); the second time, the width and height of the first frame are increased by first preset step size × 1, giving a second frame of size (w + first preset step size × 2, h + first preset step size × 2); and so on, until the width and height of the penultimate frame are increased by first preset step size × 1 one last time.
- in other embodiments, the image processing apparatus synchronously reduces the width and height of the initial target frame according to the first preset step size and the first preset number of steps, and the target frames include the frame obtained after each reduction. The reduction can likewise be implemented in the same two ways: either every reduction is applied to the initial target frame, with the step of the t-th reduction equal to first preset step size × t, or each reduction is applied to the previously obtained frame with a fixed step of first preset step size × 1, giving frames of size (w - first preset step size × 1, h - first preset step size × 1), (w - first preset step size × 2, h - first preset step size × 2), and so on.
- in still other embodiments, the image processing apparatus both synchronously increases and synchronously reduces the width and height of the initial target frame according to the first preset step size and the first preset number of steps; the target frames then include the frame obtained after each increase and the frame obtained after each reduction. The implementation is similar to the above embodiments and is not repeated here.
- for example, denote the first preset step size as stride1 and the first preset number of steps as steps1; the width and height of the initial target frame are each varied from -stride1×steps1 to +stride1×steps1, yielding multiple target frames of different sizes. With the initial target frame of size (w, h) determined in S1031, the target frames in this embodiment range from (w - stride1×steps1, h - stride1×steps1) to (w + stride1×steps1, h + stride1×steps1).
- in some embodiments, the frames whose sizes are varied include not only the initial target frame determined in S1031 but also one or more first target frames obtained by resizing it. In these embodiments, after determining the initial target frame from the center-of-gravity position, width and height of the target subject, and before changing sizes with the first preset step size and number of steps, the image processing device synchronously increases the width and height of the initial target frame by a third preset step size stride3 until the width and height of the first target frame reach a preset multiple of the width and height of the initial target frame. The image processing device then changes the sizes of the initial target frame and of each first target frame according to the first preset step size and the first preset number of steps to obtain the multiple target frames.
- the preset multiple is > 1, for example 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2 or 2.3, and can be set according to specific needs. There may be one or more first target frames; their number is determined by stride3 and the preset multiple.
- in one embodiment the preset multiple is set to 1.7 for composition reasons: from the perspective of the rule of thirds, a crop that only just contains the target subject's area determined in step S102 is not beautiful, whereas a target image in which the target subject's height occupies about 2/3 of the image height is relatively beautiful; setting the preset multiple to 1.7 therefore reserves space around the target subject in the target image that gives a better visual buffer.
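The frame enumeration of step S1032, including the optional stride3 expansion up to the preset multiple, might be sketched as follows; all step sizes here are illustrative values, not values prescribed by the patent.

```python
def candidate_frames(cx, cy, w, h, stride1=8, steps1=5, stride3=16, preset_multiple=1.7):
    """First grow the initial frame by stride3 until the preset multiple is
    reached (the first target frames), then vary every base frame from
    -stride1*steps1 to +stride1*steps1 in width and height."""
    bases = [(w, h)]
    bw, bh = w, h
    while bw < preset_multiple * w and bh < preset_multiple * h:
        bw, bh = bw + stride3, bh + stride3
        bases.append((bw, bh))
    frames = []
    for bw, bh in bases:
        for k in range(-steps1, steps1 + 1):
            frames.append((cx, cy, bw + k * stride1, bh + k * stride1))
    return frames
```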
- Step S1033 Traverse all target frames to obtain feature information of each target frame
- the feature information of the target frame may include the energy sum of all pixels in the target frame and/or the average gradient of pixels on at least one side of the target frame. It can be understood that the feature information is not limited to these two quantities and may also include other feature information of the target frame.
- the energy sum of all pixels in the target frame is the energy sum of the pixels in the area of the saliency map corresponding to the target frame, and is determined directly from the energy of each pixel in that area.
- optionally, before the traversal the image processing device statistically determines the mean μ and the variance σ of all pixels of the saliency map and uses them to determine the energy of each pixel of the saliency map, which reduces the amount of calculation during the traversal: the energy of pixels whose saliency value is less than the sum (μ + σ) is set to 0, and the energy of pixels whose saliency value is greater than or equal to (μ + σ) keeps the value of the original saliency map.
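The pre-computed energy map can be sketched as follows; the patent calls σ the variance, and the standard deviation is assumed here. An integral image is added so each frame's energy sum can be read in constant time during the traversal, which is one way to realize the stated reduction in calculation.

```python
import numpy as np

def energy_map(saliency):
    """Zero out pixels below mean + std; the rest keep their saliency value."""
    mu, sigma = saliency.mean(), saliency.std()   # std assumed for the patent's σ
    energy = np.where(saliency < mu + sigma, 0.0, saliency)
    integral = energy.cumsum(axis=0).cumsum(axis=1)  # O(1) window sums later
    return energy, integral
```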
- the average gradient of pixels of at least one side of the target frame is the average gradient of pixels of at least one side of the corresponding area of the target frame in the original image.
- the feature information of the target frame includes an average gradient of pixels on all four sides of the target frame.
- the feature information of the target frame includes an average gradient of pixels on three sides of the target frame.
- the feature information of the target frame includes an average gradient of pixels on both sides of the target frame.
- the feature information of the target frame includes the average gradient of pixels on one side of the target frame.
- whichever variant is used (the average gradient of pixels on three sides, on two sides or on one side of the target frame), the sides whose average gradients form the feature information are the same for all target frames. The following takes the average gradient of pixels on a single side as an example.
- when the feature information of the target frame includes the average gradient of pixels on a single side, the average gradient of the pixels on the top, bottom, left or right side of each target frame is obtained by dividing the sum of the gradients of all pixels on that side by the number of pixels on that side.
- the top, bottom, left and right sides of the target frame correspond to the up, down, left and right directions of the original image; the top and bottom sides are the long sides of the target frame, that is, the sides along the width direction, and the left and right sides are the wide sides of the target frame, that is, the sides along the height direction.
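A sketch of the single-side average gradient, assuming Sobel gradient magnitudes computed on a grayscale version of the original image; in practice the gradient map would be computed once and reused for every frame of the traversal.

```python
import cv2

def side_average_gradient(gray, x0, y0, x1, y1, side="top"):
    """Average gradient magnitude of the pixels on one side of the frame
    (x0, y0)-(x1, y1), with x1/y1 exclusive."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag = cv2.magnitude(gx, gy)
    lines = {"top": mag[y0, x0:x1], "bottom": mag[y1 - 1, x0:x1],
             "left": mag[y0:y1, x0], "right": mag[y0:y1, x1 - 1]}
    edge = lines[side]
    return float(edge.sum() / edge.size)   # gradient sum / pixel count
```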
- Step S1034 Determine the area to be composed according to the feature information of all target frames.
- in some embodiments, step S1034 specifically includes: determining the image region corresponding to the target frame whose feature information meets the preset rule as the area to be composed.
- this embodiment constrains the energy sum of all pixels in the target frame and/or the average gradient of pixels on at least one side of the target frame, making the boundary of the target frame more concise and thereby yielding a cleaner composition (that is, a cleaner target image).
- in one implementation of step S1034, for each target frame, the energy sum of all its pixels is compared with the energy sum of all pixels of every other target frame, and/or the average gradient of pixels on at least one of its sides is compared with the corresponding average gradient of every other target frame. If the energy sum of all pixels in the current target frame is greater than that of the other target frames, and/or the average gradient of pixels on at least one side of the current target frame is smaller than that of the other target frames, the image area corresponding to the current target frame is determined as the area to be composed.
- the objective function used in this implementation simultaneously considers maximizing the energy sum and minimizing the average gradient; by applying the constraints of maximum energy and minimum average gradient to the final crop, an area to be composed with better aesthetics and better target-subject integrity is obtained.
- in another implementation, step S1034 specifically includes: scoring all target frames according to the feature information of each target frame and a first preset strategy, and determining the image area corresponding to the target frame with the highest score as the area to be composed, so as to obtain an area to be composed with better aesthetics.
- scoring all target frames according to the feature information of each target frame and the first preset strategy specifically includes: for each target frame, determining a first score from the energy sum of all pixels in the target frame, determining a second score from the average gradient of pixels on at least one side of the target frame, and then determining the score of the target frame from the first score and the second score.
- the first score of the target frame is a value determined from the energy sum of all pixels in the target frame, for example by substituting the energy sum into a function that takes it as the independent variable; the second score is likewise a value determined from the average gradient of pixels on at least one side of the target frame, obtained by substituting that average gradient into a function that takes it as the independent variable.
- in one embodiment, the score of the target frame is the sum obtained by directly adding the first score and the second score.
- in another embodiment, the score of the target frame is a weighted sum of the first score and the second score, where the weights of the two scores are preset. According to the scene of the preset image, priority feature information can be determined and its corresponding score weight designed to be larger; for example, if the energy sum of all pixels in the target frame has priority, the weight of the first score is designed to be greater than the weight of the second score.
- in yet another implementation, the score of each target frame is determined from the energy sum of all pixels in the target frame, the average gradient of pixels on at least one side of the target frame, and a preset function whose independent variables are these two quantities.
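One possible form of such a preset scoring function, rewarding in-frame saliency energy and penalizing busy borders; the weights alpha and beta are assumed values, not specified by the patent.

```python
def frame_score(energy_sum, side_avg_gradients, alpha=1.0, beta=1.0):
    """Higher energy sum raises the score; higher average side gradients
    lower it. A frame would be selected with, e.g.:
    best = max(frames, key=lambda f: frame_score(*features(f)))"""
    mean_gradient = sum(side_avg_gradients) / len(side_avg_gradients)
    return alpha * energy_sum - beta * mean_gradient
```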
- Step S1035 Determine the composition area according to the area to be composed.
- in some embodiments, the area to be composed is directly taken as the best composition, and the composition area is the area to be composed.
- in other embodiments, step S1035 specifically includes: determining the center-of-gravity position of the target frame corresponding to the area to be composed, and changing the height of that target frame according to a second preset step size stride2 and a second preset number of steps steps2 to obtain multiple new target frames; then traversing all new target frames to obtain the feature information of each new target frame, determining a new area to be composed from the feature information of all new target frames, and determining the new area to be composed as the composition area.
- this implementation fixes the width of the area to be composed and the position of its abscissa, and adjusts only its height, so that the edges of the final composition area are neater and the target subject lies closer to a one-third position.
- determining the new area to be composed from the feature information of all new target frames specifically includes: determining the new target frame whose feature information meets the preset rules, and determining the image area corresponding to that new target frame as the new area to be composed.
- changing the height of the target frame corresponding to the area to be composed to obtain multiple new target frames follows the same principle as changing the height of the initial target frame with the first preset step size and the first preset number of steps in the above embodiments, and is not repeated here.
- the feature information of the new target frame includes the energy sum of all pixels in the new target frame and/or the average gradient of pixels on at least one side of the new target frame, and may also include other feature information.
- optionally, the average gradient of pixels on at least one side of the new target frame includes at least the average gradient of pixels on one wide side of the new target frame; further optionally, it is the average gradient of pixels on the two wide sides of the new target frame.
- when the feature information of the new target frame includes the average pixel gradient of at least one side, that side including at least one wide side, traversing all new target frames to obtain these average gradients follows the same principle as traversing all target frames to obtain the average gradient of pixels on at least one side of each target frame in the above embodiments, and is not repeated here.
- in one implementation, the average gradient of pixels on at least one side of each new target frame is compared with that of every other new target frame, and if the average gradient of the current new target frame is the smallest, the image area corresponding to the current new target frame is determined as the new area to be composed. Taking only the minimum average side gradient as the objective function while adjusting the height of the area to be composed makes the space around the target subject in the height direction tidier in the final composition area.
- during the adjustment, the center of gravity (x_1, y_1) of the area to be composed is used as the initial center, and only the vertical coordinate (that is, the height) is changed.
- in another implementation, all new target frames are scored according to the feature information of each new target frame and a second preset strategy, and the image area corresponding to the new target frame with the highest score is determined as the new area to be composed.
- scoring all new target frames according to the feature information of each new target frame and the second preset strategy specifically includes: for each new target frame, determining a third score from the energy sum of all pixels in the new target frame, determining a fourth score from the average gradient of pixels on at least one side of the new target frame, and then determining the score of the new target frame from the third score and the fourth score.
- the third score of the new target frame is a value determined from the energy sum of all pixels in the new target frame, for example by substituting the energy sum into a function that takes it as the independent variable; the fourth score is likewise a value determined from the average gradient of pixels on at least one side of the new target frame, obtained by substituting that average gradient into a function that takes it as the independent variable.
- in one embodiment, the score of the new target frame is the sum obtained by directly adding the third score and the fourth score; in another embodiment, it is a weighted sum of the third score and the fourth score, where the weights of the two scores are preset. Priority feature information can be determined and its corresponding score weight designed to be larger; for example, if the energy sum of all pixels in the new target frame has priority, the weight of the third score is designed to be greater than the weight of the fourth score.
- in yet another implementation, for each new target frame, the score is determined from the energy sum of all pixels in the new target frame, the average gradient of pixels on at least one side of the new target frame, and a preset function whose independent variables are these two quantities.
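The height-refinement pass of step S1035 can be sketched as follows; the stride2 and steps2 values are illustrative, and each new target frame would be re-scored with the side-gradient-only objective or the combined scoring function above.

```python
def refine_height(best_frame, stride2=4, steps2=5):
    """Second pass: keep the width and horizontal position of the best frame
    fixed and vary only its height, producing the new target frames."""
    cx, cy, w, h = best_frame
    return [(cx, cy, w, h + k * stride2) for k in range(-steps2, steps2 + 1)]
```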
- Step S104 Determine the composition area as the target image.
- specifically, the portion of the original image outside the composition area determined in step S103 is cut away, and the remaining image is the target image.
- in this way, an original image with poor composition can be input, and a target image with a clear subject, neat edges and the target subject close to a one-third position, that is, a relatively good composition, can be output, thereby improving the visual quality of the image.
- in summary, the visually interesting area in the original image is detected based on the saliency detection method to obtain the saliency map corresponding to the original image; the target subject is then determined based on the saliency map, effectively eliminating interference from a cluttered background; and the best composition area is then found according to the determined target subject and preset rules, so as to obtain a target image with better composition.
- corresponding to the above method, an embodiment of the present invention further provides an image processing device.
- FIG. 7 is a structural block diagram of an image processing apparatus according to an embodiment of the present invention.
- the image processing device may include a storage device and a processor.
- the storage device is used to store program instructions.
- the processor invokes the program instructions stored in the storage device and, when the program instructions are executed, is configured to: perform saliency detection on the original image to obtain a saliency map; determine the target subject based on the saliency map; determine the composition area in the original image according to the target subject and preset rules; and determine the composition area as the target image.
- the processor may implement the corresponding method as shown in the embodiments of FIG. 1, FIG. 4 to FIG. 6 of the present invention.
- for a description of the image processing apparatus in this embodiment, reference may be made to the image processing method in the above embodiments; details are not repeated here.
- the storage device may include volatile memory, such as random-access memory (RAM); it may also include non-volatile memory, such as flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); it may also include a combination of the above types of memory.
- the processor may be a central processing unit (CPU).
- the processor may further include a hardware chip.
- the hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD) or a combination thereof.
- the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL) or any combination thereof.
- an embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements the steps of the image processing method of the foregoing embodiment.
- the storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM) or a random access memory (Random Access Memory, RAM), etc.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Geometry (AREA)
- Image Analysis (AREA)
Abstract
An image processing method and device. The method comprises: performing saliency detection on an original image to obtain a saliency map; determining a target subject based on the saliency map; determining a composition area in the original image according to the target subject and preset rules; and determining the composition area as the target image. The invention first detects the visually interesting region in the original image based on a saliency detection method to obtain the saliency map corresponding to the original image, then determines the target subject based on the saliency map, effectively eliminating interference from cluttered backgrounds, and then finds the best composition area according to the determined target subject and preset rules, thereby obtaining a target image with better composition.
Description
The invention relates to the field of image processing, and in particular to an image processing method and device.
Visual saliency: when facing a scene, humans automatically process regions of interest and selectively ignore regions of non-interest; these regions of human interest are called saliency regions.
Image cropping is an important task in image editing, used to improve the aesthetic quality of an image. Its main goal is to improve the composition of the image, for example by emphasizing objects of interest, removing unwanted areas and obtaining a better color balance. In photography, many rules, such as the rule of thirds, the rule of visual balance and the rule of diagonal dominance, are explicitly defined for creating well-composed images. An automatic image cropping method can provide novice photographers and ordinary users with aesthetically pleasing cropping suggestions, and helps users save a great deal of time.
Existing automatic composition methods mainly fall into two types. One performs saliency detection on the original image, uses the detection result to compute a saliency energy, and finally relies only on over-segmenting the original image and applying constraints to determine the optimal composition area; it is difficult to find an accurate subject this way, unwanted interference may even be introduced, and a pleasing composition is hard to obtain. The other type is based on learning methods, attempting to automatically learn composition rules or crop scores from a large training set: the images in the training set are over-segmented, and each segmented crop is given a corresponding score that serves as its label for training the model. This approach avoids hand-designed composition rules and enables an end-to-end solution, but a lack of training data may leave the finally learned model with a poor cropping effect.
Summary of the Invention
The invention provides an image processing method and device.
Specifically, the invention is realized through the following technical solutions:
According to a first aspect of the invention, an image processing method is provided, the method comprising:
performing saliency detection on an original image to obtain a saliency map;
determining a target subject based on the saliency map;
determining a composition area in the original image according to the target subject and preset rules;
determining the composition area as the target image.
According to a second aspect of the invention, an image processing device is provided, the image processing device comprising:
a storage device for storing program instructions;
a processor that invokes the program instructions stored in the storage device and, when the program instructions are executed, is configured to:
perform saliency detection on an original image to obtain a saliency map;
determine a target subject based on the saliency map;
determine a composition area in the original image according to the target subject and preset rules;
determine the composition area as the target image.
According to a third aspect of the invention, a computer-readable storage medium is provided, on which a computer program is stored; when the program is executed by a processor, the steps of the image processing method of the first aspect are implemented.
It can be seen from the technical solutions provided by the above embodiments of the invention that the visually interesting region in the original image is first detected based on a saliency detection method to obtain the saliency map corresponding to the original image; the target subject is then determined based on the saliency map, effectively eliminating interference from cluttered backgrounds; and the best composition area is then found according to the determined target subject and preset rules, thereby obtaining a target image with better composition.
In order to explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of an image processing method in an embodiment of the present invention;
FIG. 2 is a diagram of one usage scenario of the image processing method in an embodiment of the present invention;
FIG. 3 is a diagram of another usage scenario of the image processing method in an embodiment of the present invention;
FIG. 4 is a flowchart of one specific implementation of the image processing method in the embodiment shown in FIG. 1;
FIG. 5 is a flowchart of another specific implementation of the image processing method in the embodiment shown in FIG. 1;
FIG. 6 is a flowchart of yet another specific implementation of the image processing method in the embodiment shown in FIG. 1;
FIG. 7 is a structural block diagram of an image processing apparatus in an embodiment of the present invention.
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present invention.
In existing automatic composition methods based on saliency detection, first, the visual saliency is not necessarily accurate, which leads to inaccurate detection of the target subject; second, the saliency detection result alone is used to compute a saliency energy while the target subject remains ambiguous; finally, relying solely on over-segmenting the original image and applying constraints makes it difficult to find the precise subject, greatly enlarges the traversal range and the amount of computation, and may even introduce unwanted interference.
Therefore, the image processing method and apparatus proposed in the embodiments of the present invention first detect visually interesting areas based on saliency detection; on this basis, a method based on the saliency distribution is used to determine the target subject, effectively excluding interference from cluttered backgrounds; finally, the aesthetics of the target subject in the final crop are taken into account to limit the search range, and constraints are designed to find the best composition.
The image processing method and apparatus of the present invention are described in detail below with reference to the drawings. In the absence of conflict, the features of the following embodiments and implementations can be combined with one another.
FIG. 1 is a flowchart of an image processing method in an embodiment of the present invention. As shown in FIG. 1, the image processing method may include the following steps:
Step S101: perform saliency detection on an original image to obtain a saliency map.
The execution subject of the image processing method of the embodiments of the present invention is an image processing apparatus.
The original image may include an image acquired in real time by the image processing apparatus and/or a local image of the image processing apparatus. For example, in one embodiment, referring to FIG. 2, the image processing apparatus communicates with a photographing device, and the original image may be an image captured in real time by the photographing device. In another embodiment, the image processing apparatus is part of a photographing device; referring to FIG. 3, the photographing device may further include an image sensor that communicates with the image processing apparatus, and the original image may be an image captured in real time by the image sensor. In yet another embodiment, the original image is a local image of the image processing apparatus, i.e., an image pre-stored in the image processing apparatus.
In some embodiments, after acquiring the original image, the image processing apparatus performs saliency detection directly on the original image to obtain the saliency map.
In other embodiments, after acquiring the original image, the image processing apparatus performs preliminary processing on the original image and then performs saliency detection on the resulting image to obtain the saliency map. For example, optionally, before implementing step S101, the image processing apparatus converts the color space of the original image into a specific color space, e.g., the Lab color space, so that the converted image is closer to human visual perception. It can be understood that the specific color space may also be the RGB color space or the YUV color space. Optionally, before implementing step S101, the image processing apparatus also resizes the original image to a preset size to meet the needs of image processing; for example, an original image of 4000*3000 pixels may be resized to 480*360 pixels to reduce the subsequent amount of computation, as in the sketch below.
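As a concrete illustration of this preliminary processing, the following is a minimal sketch assuming OpenCV as the image library; the function name, the preset size 480*360 (taken from the example above), and the choice of interpolation are illustrative assumptions, not part of the claimed method.

```python
import cv2

def preprocess(original_bgr, preset_size=(480, 360)):
    """Resize the original image to a preset size, then convert it to the
    Lab color space so that it is closer to human visual perception."""
    resized = cv2.resize(original_bgr, preset_size, interpolation=cv2.INTER_AREA)
    return cv2.cvtColor(resized, cv2.COLOR_BGR2LAB)
```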
Different saliency detection methods may be used to perform saliency detection on the original image. Specifically, in this embodiment, referring to FIG. 4, the implementation of step S101 may include steps S1011 to S1013.
Specifically, step S1011: perform at least two levels of pyramid decomposition on each color channel.
If the image processing apparatus does not convert the color space of the original image after acquiring it, the color channels in step S1011 are the color channels corresponding to the color space of the original image. If the image processing apparatus converts the color space of the original image after acquiring it, the color channels in step S1011 are the color channels corresponding to the color space of the image obtained after the conversion.
Optionally, in some embodiments the color channels include the three channels corresponding to the Lab color space; in other embodiments, the three channels corresponding to the RGB color space; and in still other embodiments, the three channels corresponding to the YUV color space.
It can be understood that, in the process of saliency detection, the number of pyramid levels into which the image processing apparatus decomposes each color channel may be 2, 3, 4, 5, 6, 7, 8, 9, or even more, and may be selected as needed.
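One plausible realization of this per-channel pyramid decomposition is a Gaussian pyramid built independently for each channel, as sketched below; the patent does not fix the pyramid type, so the use of cv2.pyrDown (Gaussian blur plus 2x downsampling) is an assumption.

```python
import cv2

def channel_pyramids(image, levels=3):
    """Build a pyramid of `levels` layers for each color channel of `image`
    (H x W x C). Returns pyramids[c][l]: channel c at pyramid level l."""
    pyramids = []
    for c in range(image.shape[2]):
        layer = image[:, :, c]
        pyramid = [layer]
        for _ in range(levels - 1):
            layer = cv2.pyrDown(layer)  # blur + downsample by a factor of 2
            pyramid.append(layer)
        pyramids.append(pyramid)
    return pyramids
```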
Step S1012: determine the first saliency map of each pyramid level.
In one embodiment, determining the first saliency map of each pyramid level may include, but is not limited to, the following steps:
(1) For each pyramid level, perform superpixel segmentation on the image of each color channel at that level to obtain the superpixel blocks of each color channel at that level.
A superpixel segmentation algorithm, such as the slic algorithm or another algorithm, may be used to perform superpixel segmentation on the image of each color channel at each pyramid level.
(2) For each superpixel block of each color channel at each pyramid level, determine the saliency response value of that superpixel block.
In one embodiment, for each superpixel block of each color channel at each pyramid level, the histogram of that superpixel block is computed, and the difference between the histogram of that superpixel block and the histograms of the other superpixel blocks of that color channel at that pyramid level is determined. In addition, for each such superpixel block, a first fusion weight of the superpixel block is determined, and the saliency response value of the superpixel block is determined according to said histogram differences and the first fusion weight.
Optionally, for each superpixel block of each color channel at each pyramid level, the difference between the histogram of that superpixel block and the histograms of the other superpixel blocks of that color channel at that level is determined according to the height of each bin of the histogram of that superpixel block, the height of each bin of the histograms of the other superpixel blocks, and a first preset parameter. The height of each bin characterizes the number of pixels within a specific pixel-value range.
Optionally, the first fusion weight is determined as follows: for each superpixel block of each color channel at each pyramid level, the distance between that superpixel block and the other superpixel blocks of that color channel at that level is determined, and the first fusion weight of that superpixel block is determined according to this distance and a second preset coefficient, where the second preset coefficient can be set as needed. Different methods can be used to determine the distances between the superpixel blocks of each color channel at each pyramid level. In one embodiment, this distance is the Euclidean distance, which can be computed from the coordinates of a specific position (e.g., the center) of each superpixel block in the image coordinate system. It can be understood that the Mahalanobis distance or another distance measure may also be used. In this step, the distance between a superpixel block and the other superpixel blocks of that color channel at that level is the Euclidean distance between them.
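Before turning to formula (1), the superpixel segmentation of step (1) above can be obtained, for example, with the SLIC implementation in recent versions of scikit-image; the parameter values below are illustrative, not taken from the patent.

```python
from skimage.segmentation import slic

# `channel` is a single-channel image, e.g., the L channel at one pyramid level.
# labels[y, x] gives the index of the superpixel block containing pixel (x, y).
labels = slic(channel, n_segments=200, compactness=0.1, channel_axis=None)
```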
In formula (1), which gives the saliency response value global_color_diff_i of the i-th superpixel block, n is a positive integer, i and j are natural numbers, i ≤ n−1, j ≤ n−1; dist_coord(i,j) is the Euclidean distance between the i-th and the j-th superpixel block; and dist_hist(i,j) is the difference between the histogram of the i-th superpixel block and the histogram of the j-th superpixel block.
It can be determined from formula (1) that the larger dist_coord(i,j) is, the smaller the contribution of the j-th superpixel block to the saliency response value of the i-th superpixel block, and the larger dist_hist(i,j) is, the larger that contribution. Moreover, global_color_diff_i equals the weighted average of the histogram differences between the i-th superpixel block and all the other superpixel blocks (the other superpixel blocks of that color channel at the pyramid level where the block is located), where the histogram difference is represented by dist_hist and the weighting is represented by exp(−dist_coord).
It can be understood that formula (1) may also be adapted as needed.
The difference dist_hist(i,j) between the histogram of the i-th superpixel block and the histogram of the j-th superpixel block is computed by formula (2).
In formula (2), m is a positive integer and k is a natural number indicating the bin index, k ≤ m−1; ω_k is the first preset parameter, an empirical parameter; optionally, ω_k is related to k, i.e., different bins can be given different weights; hist_i[k] is the height of the k-th bin of the histogram of the i-th superpixel block, and hist_j[k] is the height of the k-th bin of the histogram of the j-th superpixel block.
It can be understood that formula (2) may also be adapted as needed.
The Euclidean distance dist_coord(i,j) between the i-th and the j-th superpixel block is computed by formula (3):
dist_coord(i,j) = ((center_x_i − center_x_j)^2 + (center_y_i − center_y_j)^2)^{1/2}    (3)
In formula (3), (center_x_i, center_y_i) are the center coordinates or gravity-center coordinates of the i-th superpixel block; correspondingly, (center_x_j, center_y_j) are the center coordinates or gravity-center coordinates of the j-th superpixel block.
Here, the center coordinates of a superpixel block are the sum of the x (or y) coordinates of all pixels in the block divided by the total number of pixels in the block. The gravity-center coordinates of a superpixel block are the sum, over all pixels in the block, of each pixel's x coordinate (or y coordinate) multiplied by that pixel's saliency value, divided by the total number of pixels in the block.
It can be understood that if another distance measure is used to determine the distance between the i-th and the j-th superpixel block, formula (3) needs to be adapted accordingly.
(3) For each color channel at each pyramid level, determine the second saliency map of that color channel according to the saliency response values of all superpixels of that channel.
In one embodiment, this step specifically includes: for each color channel at each pyramid level, normalizing the saliency response value of each superpixel according to the saliency response values of all superpixels of that channel, and determining the second saliency map of the channel from the normalized saliency response values.
Optionally, the second saliency map is computed by formula (4), a min–max normalization in which max_global_diff is the maximum of the saliency response values of all superpixel blocks of the color channel to which the i-th superpixel block belongs, and min_global_diff is the minimum of those values.
It can be understood that formula (4) may also be adapted as needed.
(4) For each pyramid level, determine the first saliency map of that level according to the second saliency maps of all color channels at that level.
In this embodiment, after the second saliency maps of all color channels at a pyramid level are obtained, the first saliency map of that level is obtained by directly combining them. For example, if each pyramid level includes the second saliency map of the L channel, the second saliency map of the a channel, and the second saliency map of the b channel, directly combining these three maps yields the first saliency map of that level.
Step S1013: fuse the first saliency maps of the at least two pyramid levels to obtain the saliency map.
For example, for a 3-level pyramid, 3 first saliency maps are obtained, and fusing these 3 first saliency maps yields the saliency map of the original image.
Optionally, the first saliency maps of the at least two pyramid levels are fused based on a pyramid fusion algorithm to obtain the saliency map. It can be understood that the fusion may also be based on other image fusion algorithms.
In this embodiment, the image processing apparatus fuses the first saliency maps of the at least two pyramid levels according to preset second fusion weights of each level to obtain the saliency map, where the second fusion weights can be set as needed.
In this embodiment, each color channel is decomposed into at least two pyramid levels, the first saliency map of each level is determined based on superpixels, and all first saliency maps are finally fused by weighting; the resulting saliency map shows no obvious block artifacts, which facilitates the subsequent determination of the target subject.
Since the original image is first decomposed into a pyramid and saliency detection is then performed on each pyramid level, this amounts to saliency detection on a multi-scale image: small-scale images capture the saliency of contours, while large-scale images capture the saliency of image details. Fusing all pyramid levels therefore fuses the saliency of contours and of details, giving a better saliency detection result (a combined sketch of the response computation, normalization, and fusion follows).
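The following sketch puts formulas (1) to (4) and step S1013 together for one color channel at one pyramid level. The L1 form of the histogram difference in formula (2) and the scale parameter sigma of the exponential weight are assumptions where the original formulas are not fully recoverable; the normalized weighted average implements the description of global_color_diff_i above.

```python
import cv2
import numpy as np

def saliency_responses(hists, centers, bin_weights, sigma=0.25):
    """Saliency response of each superpixel block (formulas (1)-(3)).
    hists: (n, m) histograms; centers: (n, 2) center or gravity-center
    coordinates, assumed normalized to [0, 1]; bin_weights: (m,) the first
    preset parameters w_k."""
    diff = np.abs(hists[:, None, :] - hists[None, :, :])   # (n, n, m)
    dist_hist = (diff * bin_weights).sum(axis=2)           # formula (2), L1 form assumed
    d = centers[:, None, :] - centers[None, :, :]
    dist_coord = np.sqrt((d ** 2).sum(axis=2))             # formula (3)
    w = np.exp(-dist_coord / sigma)                        # first fusion weight
    np.fill_diagonal(w, 0.0)                               # a block is not compared with itself
    return (w * dist_hist).sum(axis=1) / (w.sum(axis=1) + 1e-12)

def normalize_responses(responses):
    """Min-max normalization of the responses (formula (4))."""
    lo, hi = responses.min(), responses.max()
    return (responses - lo) / (hi - lo + 1e-12)

def fuse_levels(first_saliency_maps, fusion_weights):
    """Weighted fusion of the per-level first saliency maps (step S1013);
    each level is upsampled back to the base resolution first."""
    base_h, base_w = first_saliency_maps[0].shape
    fused = np.zeros((base_h, base_w), dtype=np.float32)
    for sal, weight in zip(first_saliency_maps, fusion_weights):
        fused += weight * cv2.resize(sal.astype(np.float32), (base_w, base_h))
    return fused / max(sum(fusion_weights), 1e-12)
```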
Step S102: determine a target subject based on the saliency map.
Specifically, referring to FIG. 5, when determining the target subject based on the saliency map, the image processing apparatus may perform steps S1021 to S1023.
Specifically, step S1021: binarize the saliency map to obtain multiple connected regions.
In this embodiment, when binarizing the saliency map to obtain multiple connected regions, the image processing apparatus first segments the saliency map based on a preset algorithm to determine a segmentation threshold, and then binarizes the saliency map based on the segmentation threshold. The preset algorithm may be the Otsu algorithm or another image segmentation algorithm.
Optionally, when segmenting the saliency map, the image processing apparatus first segments the foreground and background of the saliency map based on the preset algorithm to determine a first threshold, and then determines the segmentation threshold according to the first threshold. Taking the Otsu algorithm as the preset algorithm for example, the first threshold is the optimal threshold auto_thresh obtained by the image processing apparatus based on the Otsu algorithm for separating the foreground and background of the saliency map.
Further, when determining the segmentation threshold according to the first threshold, the image processing apparatus determines it from the sum of the first threshold and a preset threshold. In this embodiment, the segmentation threshold is the sum of the first threshold and the preset threshold. The preset threshold can be set as needed; it may be 0.2, or 0.15, 0.16, 0.17, 0.18, 0.19, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.30, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, or 0.40.
It can be understood that the image processing apparatus may also binarize the saliency map in other ways.
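The Otsu-plus-offset thresholding described above can be sketched as follows, assuming the saliency map has been normalized to [0, 1]; the round-trip through 8-bit values exists only because OpenCV's Otsu implementation expects uint8 input.

```python
import cv2
import numpy as np

def binarize_saliency(saliency, preset_threshold=0.2):
    """Segmentation threshold = Otsu's auto_thresh + a preset threshold."""
    sal_u8 = (saliency * 255).astype(np.uint8)
    auto_thresh, _ = cv2.threshold(sal_u8, 0, 255,
                                   cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    seg_thresh = auto_thresh / 255.0 + preset_threshold
    return (saliency >= seg_thresh).astype(np.uint8)
```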
Step S1022: compute the area of each connected region, and determine the connected region with the largest area and the connected region with the second-largest area.
Optionally, after step S1021 and before step S1022, the image processing apparatus also performs an opening operation on the binarized saliency map, i.e., erosion followed by dilation, to remove small burrs between connected regions, for example the parts connecting different regions.
Further, after the opening operation, the image processing apparatus of this embodiment labels each connected region in the opened saliency map; for example, if the opened saliency map includes 5 connected regions, they can be labeled 0, 1, 2, 3, 4 by index, so that indices correspond to areas.
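The opening operation and the labeling of connected regions can be realized, for example, with OpenCV's morphology and connected-component routines; the 3x3 kernel below is an illustrative choice.

```python
import cv2
import numpy as np

mask = binarize_saliency(saliency)  # from the sketch above
kernel = np.ones((3, 3), np.uint8)
# Opening = erosion followed by dilation; removes small burrs between regions
opened = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
# Label the connected regions and collect their areas (label 0 is the background)
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(opened)
areas = stats[1:, cv2.CC_STAT_AREA]
```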
Step S1023: determine the target subject according to the area of the largest connected region and the area of the second-largest connected region.
The larger the area of a connected region, the more likely it is to be the region of the target subject. The target subject can therefore be determined from the areas of the largest and the second-largest connected regions with high accuracy. Moreover, once the target subject is clearly determined, subsequent composition is facilitated and cropping away the target subject can be avoided.
Specifically, when determining the target subject from these two areas, the image processing apparatus first computes the ratio of the second-largest area to the largest area (i.e., second-largest area / largest area). This ratio is then compared with a preset ratio threshold. When the ratio is greater than or equal to the preset ratio threshold, both the largest and the second-largest connected regions are determined as the region of the target subject; in this case the saliency map is considered to contain two subjects (the subject corresponding to the largest region and the subject corresponding to the second-largest region), and the target subject includes both. When the ratio is smaller than the preset ratio threshold, the largest connected region is determined as the region of the target subject; in this case the saliency map is considered to contain only one subject (corresponding to the largest region), and the other connected regions may be interference. Finally, the target subject is determined according to the region of the target subject.
The preset ratio threshold can be set as needed; optionally, 25% < preset ratio threshold < 50%, e.g., 30%, 35%, 40%, and so on.
In an alternative embodiment, steps S1022 and S1023 may be replaced with: computing the area of each connected region and determining the target subject according to the area of the largest connected region and the areas of the other connected regions (those other than the largest). Specifically, for each connected region other than the largest, the ratio of its area to the largest area is computed (i.e., that region's area / largest area); each ratio is compared with the preset ratio threshold; every connected region whose ratio is greater than or equal to the preset ratio threshold is determined as a region of the target subject, and every connected region whose ratio is smaller than the threshold is determined as a non-target-subject region. In this alternative embodiment, 3 or more subjects in the saliency map may be determined as the target subject.
In addition, when determining the target subject according to the region of the target subject, the image processing apparatus specifically determines the gravity-center position, width, and height of the target subject according to its region, where the width and height constitute the size of the target subject. Optionally, the gravity-center position determined in this step is denoted (x_0, y_0) and the size of the target subject is denoted (w_0, h_0), where w_0 is the width of the target subject and h_0 is its height.
On the basis of saliency detection, binarization is used to determine the position, width, and height of the target subject, so that the target subject found is more accurate; at the same time, the number of subsequent traversals, and hence the amount of computation, is greatly reduced.
Step S103: determine a composition area in the original image according to the target subject and preset rules.
In one embodiment, referring to FIG. 6, when implementing step S103 the image processing apparatus may perform steps S1031 to S1035.
Specifically, step S1031: determine an initial target frame according to the gravity-center position, width, and height of the target subject.
In step S1031, the image processing apparatus determines the gravity-center position of the target subject as the center position of the initial target frame, determines the width of the initial target frame according to the width of the target subject and a first preset scale coefficient, and determines the height of the initial target frame according to the height of the target subject and a second preset scale coefficient.
In actual composition, the extent of the target subject in the width direction is usually retained as far as possible, so the first preset coefficient is set greater than or equal to 1; the extent of the target subject in the height direction may be partially cropped or fully retained, as needed. Optionally, the second preset coefficient is smaller than 1; optionally, it is greater than or equal to 1.
The following embodiments describe the initial target frame taking the case where both the first and the second preset coefficient are greater than or equal to 1 as an example.
In a first feasible implementation, the first preset scale coefficient = 1 and the second preset coefficient = 1; the initial target frame is then the region of the target subject, and its size is (w_0, h_0).
In a second feasible implementation, the first preset scale coefficient > 1 and the second preset coefficient = 1; the width of the initial target frame is then larger than the width w_0 of the target subject, but its height remains the height h_0 of the target subject.
In a third feasible implementation, the first preset scale coefficient = 1 and the second preset coefficient > 1; the width of the initial target frame remains the width w_0 of the target subject, but its height is larger than the height h_0 of the target subject.
In a fourth feasible implementation, the first preset scale coefficient > 1 and the second preset coefficient > 1; the width of the initial target frame is larger than the width w_0 of the target subject, and its height is larger than the height h_0 of the target subject.
In one embodiment, after determining the height h of the initial target frame, the image processing apparatus determines the width w of the initial target frame according to a preset aspect ratio and the height of the initial target frame; for example, for a preset aspect ratio M:N, the width of the initial target frame is w = h × M / N.
Optionally, the height h of the initial target frame is h_0. Optionally, M:N is 16:9, 7:5, 5:4, 5:3, 4:3, 3:2, or 1:1.
Taking M:N as 16:9 and the height h of the initial target frame as h_0 for example, the reason for preserving the height is that the aspect ratio of the photographed target is usually smaller than 16:9; the width w mapped from 16:9 will be somewhat larger than the width w_0 of the target subject, so the target subject is not cut in the width direction from the start.
It can be understood that in other embodiments the height h of the initial target frame may also be 0.7, 0.8, 0.9, 1.1, 1.2, or 1.3 times h_0; the height h can be set as needed, as in the sketch below.
Step S1032: change the size of the initial target frame according to a first preset step length and a first preset number of steps to obtain multiple target frames.
The first preset step length characterizes the amount by which the size of the initial target frame changes each time, and the first preset number of steps characterizes the number of times the size is changed.
In some embodiments, the image processing apparatus synchronously increases the width and height of the initial target frame according to the first preset step length and number of steps, and the target frames include the frame obtained after each increase. Optionally, increasing the width and height of the initial target frame in a gradually increasing manner may include two implementations. In one feasible implementation, each increase is applied to the initial target frame itself, and the step of each increase is determined from the first preset step length and the current increase count (1 for the first increase, 2 for the second, and so on): the first time, the height and width of the initial target frame are both increased by first preset step length * 1; the second time, by first preset step length * 2; and so on, until the last time, when they are increased by first preset step length * first preset number of steps. In another feasible implementation, each increase is applied to the frame obtained by the previous increase, and the step of each increase is the first preset step length: the first time, the height and width of the initial target frame are both increased by first preset step length * 1 to obtain a first frame of size (w + first preset step length * 1, h + first preset step length * 1); the second time, the width and height of the first frame are increased by first preset step length * 1 to obtain a second frame of size (w + first preset step length * 2, h + first preset step length * 2); and so on, until the last time, when the width and height of the second-to-last frame are increased by first preset step length * 1.
In other embodiments, the image processing apparatus synchronously decreases the width and height of the initial target frame according to the first preset step length and number of steps, and the target frames include the frame obtained after each decrease. Optionally, decreasing the width and height of the initial target frame in a gradually decreasing manner may likewise include two implementations, symmetric to the two described above: either each decrease is applied to the initial target frame itself, with the step of each decrease determined from the first preset step length and the current decrease count, down to a total of first preset step length * first preset number of steps; or each decrease is applied to the frame obtained by the previous decrease, with each step equal to the first preset step length.
In still other examples, the image processing apparatus both synchronously increases and synchronously decreases the width and height of the initial target frame according to the first preset step length and number of steps, and the target frames include the frames obtained after each increase and after each decrease. The implementations of the increases and decreases are similar to those of the above embodiments and are not repeated here.
For example, with the first preset step length denoted stride1 and the first preset number of steps denoted steps1, the width and height of the initial target frame are each varied from −stride1*steps1 to +stride1*steps1, yielding multiple target frames of different sizes.
The values of stride1 and steps1 can be set as needed, e.g., stride1 = 1 and steps1 = 3, or otherwise.
In some examples, the initial target frame is exactly the one determined in S1031, with size (w, h); the target frames of this embodiment then cover the interval from (w − stride1*steps1, h − stride1*steps1) to (w + stride1*steps1, h + stride1*steps1).
In other examples, the initial target frame includes not only the frame determined in S1031 but also one or more first target frames obtained by resizing it. In this embodiment, after determining the initial target frame according to the gravity-center position, width, and height of the target subject, and before changing its size according to the first preset step length and number of steps, the image processing apparatus also synchronously enlarges the width and height of the initial target frame according to a third preset step length stride3 to obtain first target frames, until the width of the first target frame is a preset multiple of that of the initial target frame and its height is the same preset multiple of that of the initial target frame. In this embodiment, the image processing apparatus then changes the sizes of the initial target frame and of each first target frame obtained, according to the first preset step length and number of steps, to obtain the multiple target frames.
Here, the preset multiple > 1, e.g., 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, or 2.3, and can be set according to specific requirements.
There may be one or more first target frames in this embodiment; their number is determined by stride3 and the preset multiple, both of which can be set as needed. For example, with stride3 = 0.1 and preset multiple = 1.7, there are 7 first target frames, all of different sizes. Setting the preset multiple to 1.7 is a composition consideration: from the perspective of the rule of thirds, cropping out exactly the region of the target subject determined in step S102 is not aesthetically pleasing. In the final target image, the composition is relatively pleasing when the height of the target subject occupies about 2/3 of the image height, so a preset multiple of 1.7 leaves space around the target subject that gives a good visual buffer (the enumeration is sketched below).
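The enumeration of candidate target frames described in this step — first growing the initial frame by stride3 up to the preset multiple, then varying every frame from −stride1*steps1 to +stride1*steps1 — can be sketched as follows (sizes only; positions stay centered on the gravity center).

```python
def candidate_frames(w, h, stride1=1, steps1=3, stride3=0.1, preset_multiple=1.7):
    """Enumerate candidate (width, height) pairs for the target frames."""
    bases = [(w, h)]
    scale = 1.0
    while scale + stride3 <= preset_multiple + 1e-9:
        scale += stride3
        bases.append((w * scale, h * scale))  # the first target frames
    frames = []
    for bw, bh in bases:
        for d in range(-stride1 * steps1, stride1 * steps1 + 1, stride1):
            frames.append((bw + d, bh + d))   # synchronous width/height change
    return frames
```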
Step S1033: traverse all target frames to obtain feature information of each target frame.
The feature information of a target frame may include the energy sum of all pixels in the target frame and/or the average pixel gradient of at least one side of the target frame. It can be understood that the feature information of a target frame is not limited to these and may include other feature information of the target frame.
Here, the energy sum of all pixels in a target frame is the energy sum of the pixels of the region corresponding to the target frame in the saliency map.
In some examples, this energy sum is determined directly from the pixel energies of the region corresponding to the target frame in the saliency map.
In other examples, before implementing step S1033, the image processing apparatus also computes statistics of the saliency map to determine the mean μ and the variance σ of all pixels, and determines the energy of each pixel in the saliency map according to μ and σ, reducing the amount of computation during traversal. Optionally, when determining pixel energies from the mean and the variance, the energy of pixels in the saliency map smaller than the sum of the mean and the variance (μ+σ) is set to 0, while pixels greater than or equal to (μ+σ) keep their original saliency-map values as their energies.
The average pixel gradient of at least one side of a target frame is the average pixel gradient of said at least one side of the region corresponding to the target frame in the original image. Optionally, the feature information of a target frame includes the average pixel gradients of its four sides; optionally, of three sides; optionally, of two sides; optionally, of a single side. It should be noted that when the feature information includes the average pixel gradients of three, two, or one side, the feature information of all target frames consists of the average pixel gradients of the corresponding sides of all target frames. Taking a single side as an example, this embodiment obtains the average pixel gradient of the top, bottom, left, or right side of all target frames, namely the sum of the gradients of all pixels on that side divided by the number of pixels on that side. The top, bottom, left, and right sides of a target frame correspond to the up, down, left, and right directions of the original image; the top and bottom sides are the long sides of the target frame, i.e., the sides along the width direction, and the left and right sides are its vertical sides, i.e., the sides along the height direction.
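The two kinds of feature information can be sketched as below: the energy map zeroes out pixels below μ+σ as described, and the boundary gradient averages the gradient magnitude along the frame's four sides (the gradient operator and the (x, y, w, h) frame layout are our own illustrative choices).

```python
import numpy as np

def energy_map(saliency):
    """Pixels below mean + variance get energy 0; others keep their value."""
    mu, var = saliency.mean(), saliency.var()
    return np.where(saliency < mu + var, 0.0, saliency)

def frame_features(energy, gray, x, y, w, h):
    """Energy sum inside the frame (on the saliency map) and the average
    pixel gradient along the frame's four sides (on the original image)."""
    gy, gx = np.gradient(gray.astype(np.float32))
    grad = np.hypot(gx, gy)
    x0, y0, x1, y1 = int(x), int(y), int(x + w), int(y + h)
    energy_sum = energy[y0:y1, x0:x1].sum()
    border = np.concatenate([grad[y0, x0:x1], grad[y1 - 1, x0:x1],
                             grad[y0:y1, x0], grad[y0:y1, x1 - 1]])
    return energy_sum, border.mean()
```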
Step S1034: determine a region to be composed according to the feature information of all target frames.
In one embodiment, step S1034 specifically includes: determining the image region corresponding to the target frame whose feature information satisfies the preset rules as the region to be composed.
When the feature information of a target frame includes the energy sum of all pixels in the frame and/or the average pixel gradient of at least one side of the frame, this embodiment constrains these quantities according to the preset rules so that the boundary of the target frame is cleaner, thereby obtaining a composition (i.e., target image) with cleaner space.
In a specific implementation of step S1034, for each target frame, the energy sum of all pixels in that frame is compared with the energy sums of all pixels in each of the other target frames, and/or the average pixel gradient of at least one side of that frame is compared with those of each of the other target frames. If the energy sum of all pixels in the current target frame is greater than the energy sums of the other target frames, and/or the average pixel gradient of at least one side of the current frame is smaller than those of the other frames, the image region corresponding to the current target frame is determined as the region to be composed. The objective function used in this embodiment simultaneously considers maximizing the energy sum and minimizing the average gradient; by imposing maximum-energy and minimum-average-gradient constraints on the final crop, a region to be composed with good aesthetics and good integrity of the target subject is obtained.
In some cases there may be no target frame whose feature information satisfies the preset rules, i.e., no frame that simultaneously has the maximum energy sum and the minimum average gradient. Therefore, in another embodiment, step S1034 specifically includes: scoring all target frames according to the feature information of each frame and a first preset strategy, and determining the frame with the highest score as the region to be composed, thereby obtaining a region to be composed with good aesthetics.
Different strategies may be used to score the target frames. In one embodiment, scoring all target frames according to the feature information and the first preset strategy specifically includes: for each target frame, determining a first score according to the energy sum of all pixels in the frame, determining a second score according to the average pixel gradient of at least one side of the frame, and determining the score of the frame from the first and the second score. In this embodiment, the first score of a target frame is a value determined from the energy sum of all pixels in the frame; for example, substituting the energy sum into a function whose independent variable is the energy sum yields the first score. The second score of a target frame is a value determined from the average pixel gradient of at least one side; substituting that gradient into a function whose independent variable is the average pixel gradient yields the second score.
Further, in some embodiments the score of a target frame is the direct sum of its first and second scores. In other embodiments it is their weighted sum, with the weights of the first and second scores preset. The feature information to be prioritized may be determined from the scene of the preset image, and the corresponding score weight set larger; for example, if the energy sum of all pixels in the frame is prioritized, the weight of the first score is set larger than that of the second score.
In another embodiment, scoring all target frames according to the feature information and the first preset strategy specifically includes: for each target frame, determining its score according to the energy sum of all pixels in the frame, the average pixel gradient of at least one side of the frame, and a preset function whose independent variables include the energy sum of all pixels in the target frame and the average pixel gradient of at least one side of the target frame.
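One assumed form of the first preset strategy — normalize both quantities, reward a large energy sum, penalize a large boundary gradient, and pick the best frame — is sketched below, reusing frame_features from the previous sketch; the weights alpha and beta correspond to the preset weights of the first and second scores.

```python
import numpy as np

def best_frame(frames, energy, gray, alpha=1.0, beta=1.0):
    """frames: list of (x, y, w, h). Returns the highest-scoring frame."""
    feats = [frame_features(energy, gray, *f) for f in frames]
    max_e = max(e for e, _ in feats) + 1e-12
    max_g = max(g for _, g in feats) + 1e-12
    best, best_score = None, -np.inf
    for f, (e, g) in zip(frames, feats):
        score = alpha * (e / max_e) - beta * (g / max_g)
        if score > best_score:
            best, best_score = f, score
    return best
```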
Step S1035: determine the composition area according to the region to be composed.
There are several possible strategies for determining the composition area. In a first feasible implementation, the region to be composed is already the best composition, and the composition area is the region to be composed.
In a second feasible implementation, the space around the target subject in the height direction of the region to be composed may not be clean enough, so the region is further adjusted so that the space in the height direction of the finally determined composition area is cleaner. In a specific implementation, step S1035 includes the following steps: determining the gravity-center position of the target frame corresponding to the region to be composed, and changing the height of that target frame according to a second preset step length stride2 and a second preset number of steps steps2 to obtain multiple new target frames; then traversing all new target frames to obtain feature information of each new target frame, determining a new region to be composed from the feature information of all new target frames, and determining the new region to be composed as the composition area. This implementation fixes the width of the region to be composed and the position of its horizontal coordinate and adjusts only its height, so that the edges of the final composition area are cleaner and the position of the target subject is closer to a one-third position.
Different ways of determining the new region to be composed may be used as needed. For example, in one embodiment, determining the new region to be composed from the feature information of all new target frames specifically includes: determining the new target frame whose feature information satisfies the preset rules, and determining the image region corresponding to that new target frame as the new region to be composed.
The principle of changing the height of the target frame corresponding to the region to be composed according to stride2 and steps2 to obtain multiple new target frames is similar to that of changing the height of the initial target frame according to the first preset step length and number of steps in the above embodiments and is not repeated here.
The feature information of a new target frame includes the energy sum of all pixels in the new target frame and/or the average pixel gradient of at least one side of the new target frame, and may also include other feature information. Optionally, the average pixel gradient of at least one side of the new target frame includes at least the average pixel gradient of one of its vertical sides; optionally, it consists of the average pixel gradients of its two vertical sides.
Optionally, when the feature information of a new target frame includes the average pixel gradient of at least one side, with the at least one side including at least one vertical side, the principle of traversing all new target frames to obtain these gradients is similar to that of traversing all target frames above and is not repeated here. In this implementation, for each new target frame, the average pixel gradient of at least one side of that frame is compared with those of the other new target frames; if it is smaller than those of all the other new frames, the image region corresponding to the current new frame is determined as the region to be composed. Taking only the minimization of the average pixel gradient of at least one side as the objective and adjusting the height of the region to be composed makes the space around the target subject in the height direction of the resulting composition area cleaner. In this implementation, the gravity-center coordinates (x_1, y_1) of the region to be composed are used as the initial center, only the vertical coordinate (i.e., the height) is varied, and minimizing the average pixel gradient of at least one side is taken as the objective, so that the space around the target subject in the height direction of the final composition area is cleaner. Optionally, stride2 = 1 and steps2 = 6; of course, stride2 and steps2 may also be set to other values.
In another embodiment, all new target frames are scored according to the feature information of each new target frame and a second preset strategy, and the image region corresponding to the new target frame with the highest score is determined as the new region to be composed.
Different strategies may be used to score the new target frames. In one embodiment, the scoring specifically includes: for each new target frame, determining a third score according to the energy sum of all pixels in the new frame, determining a fourth score according to the average pixel gradient of at least one side of the new frame, and determining the score of the new frame from the third and the fourth score. In this embodiment, the third score of a new target frame is a value determined from the energy sum of all pixels in the new frame, e.g., by substituting the energy sum into a function whose independent variable is the energy sum; the fourth score is a value determined from the average pixel gradient of at least one side of the new frame, obtained analogously.
Further, in some embodiments the score of a new target frame is the direct sum of its third and fourth scores; in other embodiments it is their weighted sum, with the weights of the third and fourth scores preset. The feature information to be prioritized may be determined from the scene of the preset image and given a larger score weight; for example, if the energy sum of all pixels in the new frame is prioritized, the weight of the third score is set larger than that of the fourth.
In another embodiment, the scoring according to the feature information of each new target frame and the preset strategy specifically includes: for each new target frame, determining its score according to the energy sum of all pixels in the new frame, the average pixel gradient of at least one side of the new frame, and a preset function whose independent variables include the energy sum of all pixels in the new target frame and the average pixel gradient of at least one side of the new target frame.
Step S104: determine the composition area as the target image.
In this embodiment, the part of the original image outside the composition area (the composition area determined in step S103) is cropped away, and the resulting image is the target image.
Through the above process, from an input original image with poor composition, a relatively well-composed target image can be output in which the subject is clear, the edges are clean, and the position of the target subject in the image is close to a one-third position, thereby improving the visual quality of the image.
In the image processing method of the embodiments of the present invention, visually interesting areas in the original image are first detected based on a saliency detection method to obtain the saliency map corresponding to the original image; the target subject is then determined based on the saliency map, effectively excluding interference from cluttered backgrounds; and the best composition area is then sought according to the determined target subject and the preset rules, thereby obtaining a well-composed target image.
Corresponding to the image processing method of the above embodiments, an embodiment of the present invention further provides an image processing apparatus. FIG. 7 is a structural block diagram of an image processing apparatus provided by an embodiment of the present invention. Referring to FIG. 7, the image processing apparatus may include a storage device and a processor.
The storage device is used to store program instructions. The processor calls the program instructions stored in the storage device and, when the program instructions are executed, is configured to perform saliency detection on an original image to obtain a saliency map, determine a target subject based on the saliency map, determine a composition area in the original image according to the target subject and preset rules, and determine the composition area as a target image.
The processor may implement the corresponding methods shown in the embodiments of FIG. 1 and FIG. 4 to FIG. 6 of the present invention; for a description of the image processing apparatus of this embodiment, reference may be made to the image processing method of Embodiment 1 above, which is not repeated here.
In this embodiment, the storage device may include volatile memory, such as random-access memory (RAM); the storage device may also include non-volatile memory, such as flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the storage device may also include a combination of the aforementioned types of memory.
The processor may be a central processing unit (CPU). The processor may further include a hardware chip, which may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
In addition, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the image processing method of the above embodiments are implemented.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by instructing the relevant hardware through a computer program; the program may be stored in a computer-readable storage medium, and when executed may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
What is disclosed above covers only some embodiments of the present invention and certainly cannot be used to limit the scope of rights of the present invention; equivalent changes made according to the claims of the present invention therefore still fall within the scope covered by the present invention.
Claims (69)
- An image processing method, characterized in that the method comprises: performing saliency detection on an original image to obtain a saliency map; determining a target subject based on the saliency map; determining a composition area in the original image according to the target subject and preset rules; and determining the composition area as a target image.
- The method according to claim 1, characterized in that determining a target subject based on the saliency map comprises: binarizing the saliency map to obtain multiple connected regions; computing the area of each connected region and determining the connected region with the largest area and the connected region with the second-largest area; and determining the target subject according to the area of the largest connected region and the area of the second-largest connected region.
- The method according to claim 2, characterized in that determining the target subject according to the area of the largest connected region and the area of the second-largest connected region comprises: when the ratio of the area of the second-largest connected region to the area of the largest connected region is greater than or equal to a preset ratio threshold, determining both the largest and the second-largest connected regions as the region of the target subject; when the ratio is smaller than the preset ratio threshold, determining the largest connected region as the region of the target subject; and determining the target subject according to the region of the target subject.
- The method according to claim 3, characterized in that determining the target subject according to the region of the target subject comprises: determining the gravity-center position, width, and height of the target subject according to the region of the target subject.
- The method according to claim 4, characterized in that determining a composition area in the original image according to the target subject and preset rules comprises: determining an initial target frame according to the gravity-center position, width, and height of the target subject; changing the size of the initial target frame according to a first preset step length and a first preset number of steps to obtain multiple target frames; traversing all target frames to obtain feature information of each target frame; determining a region to be composed according to the feature information of all target frames; and determining the composition area according to the region to be composed.
- The method according to claim 5, characterized in that determining a region to be composed according to the feature information of all target frames comprises: determining, from the feature information of all target frames, a target frame whose feature information satisfies the preset rules, and determining the image region corresponding to that target frame as the region to be composed; or comprises: scoring each target frame according to the feature information of each target frame and a first preset strategy, and determining the image region corresponding to the target frame with the highest score as the region to be composed.
- The method according to claim 5, characterized in that the composition area is the region to be composed; or determining the composition area according to the region to be composed comprises: determining the gravity-center position of the target frame corresponding to the region to be composed; changing the height of that target frame according to a second preset step length and a second preset number of steps to obtain multiple new target frames; traversing all new target frames to obtain feature information of each new target frame; determining a new region to be composed according to the feature information of all new target frames; and determining the new region to be composed as the composition area.
- The method according to claim 7, characterized in that determining a new region to be composed according to the feature information of all new target frames comprises: determining a new target frame whose feature information satisfies the preset rules, and determining the image region corresponding to that new target frame as the new region to be composed; or comprises: scoring all new target frames according to the feature information of each new target frame and a second preset strategy, and determining the image region corresponding to the new target frame with the highest score as the new region to be composed.
- The method according to any one of claims 5 to 8, characterized in that the feature information of a target frame comprises the energy sum of all pixels in the target frame and/or the average pixel gradient of at least one side of the target frame.
- The method according to claim 9, characterized in that the energy sum of all pixels in the target frame is the energy sum of the pixels of the region corresponding to the target frame in the saliency map, and the average pixel gradient of at least one side of the target frame is the average pixel gradient of said at least one side of the region corresponding to the target frame in the original image.
- The method according to claim 9, characterized in that determining the image region corresponding to the target frame whose feature information satisfies the preset rules as the region to be composed comprises: if the energy sum of all pixels in the current target frame is greater than the energy sums of all pixels in each of the other target frames, and/or the average pixel gradient of at least one side of the current target frame is smaller than the average pixel gradient of said at least one side in each of the other target frames, determining the image region corresponding to the current target frame as the region to be composed.
- The method according to claim 9, characterized in that scoring each target frame according to the feature information of each target frame and a first preset strategy comprises: for each target frame, determining a first score according to the energy sum of all pixels in that target frame, determining a second score according to the average pixel gradient of at least one side of that target frame, and determining the score of that target frame according to the first score and the second score; or comprises: for each target frame, determining the score of that target frame according to the energy sum of all pixels in that target frame, the average pixel gradient of at least one side of that target frame, and a preset function, the independent variables of the preset function comprising the energy sum of all pixels in the target frame and the average pixel gradient of at least one side of the target frame.
- The method according to claim 12, characterized in that the score of the target frame is the value obtained by directly summing the first score and the second score of the target frame, or the value obtained by a weighted sum of the first score and the second score of the target frame.
- The method according to claim 9, characterized in that before traversing all target frames to obtain the feature information of each target frame, the method further comprises: determining the mean and variance of all pixels in the saliency map; and determining the energy of each pixel in the saliency map according to the mean and the variance.
- The method according to claim 14, characterized in that determining the energy of each pixel in the saliency map according to the mean and the variance comprises: setting the energy of pixels smaller than the sum of the mean and the variance to 0, and setting the energy of pixels greater than or equal to the sum of the mean and the variance to their values in the saliency map.
- The method according to claim 5, characterized in that determining an initial target frame according to the gravity-center position, width, and height of the target subject comprises: determining the gravity-center position of the target subject as the center position of the initial target frame; determining the width of the initial target frame according to the width of the target subject and a first preset scale coefficient; and determining the height of the initial target frame according to the height of the target subject and a second preset scale coefficient.
- The method according to claim 16, characterized in that the first preset scale coefficient and/or the second preset scale coefficient is 1.
- The method according to claim 16, characterized in that after determining the initial target frame according to the gravity-center position, width, and height of the target subject, and before changing the size of the initial target frame according to the first preset step length and the first preset number of steps to obtain multiple target frames, the method further comprises: synchronously enlarging the width and height of the initial target frame according to a third preset step length to obtain a first target frame, until the width of the first target frame is a preset multiple of that of the initial target frame and the height of the first target frame is the preset multiple of that of the initial target frame, where the preset multiple > 1; and changing the size of the initial target frame according to the first preset step length and the first preset number of steps to obtain multiple target frames specifically comprises: changing the sizes of the initial target frame and of each first target frame obtained, according to the first preset step length and the first preset number of steps, to obtain multiple target frames.
- The method according to claim 2, characterized in that binarizing the saliency map comprises: segmenting the saliency map based on a preset algorithm to determine a segmentation threshold; and binarizing the saliency map based on the segmentation threshold.
- The method according to claim 19, characterized in that segmenting the saliency map based on a preset algorithm to determine a segmentation threshold comprises: segmenting the foreground and background of the saliency map based on the preset algorithm to determine a first threshold; and determining the segmentation threshold according to the first threshold.
- The method according to claim 19 or 20, characterized in that the preset algorithm is the Otsu algorithm.
- The method according to claim 2, characterized in that after binarizing the saliency map to obtain multiple connected regions and before computing the area of each connected region, the method further comprises: performing an opening operation on the binarized saliency map.
- The method according to claim 1, characterized in that performing saliency detection on an original image to obtain a saliency map comprises: performing at least two levels of pyramid decomposition on each color channel; determining the first saliency map of each pyramid level; and fusing the first saliency maps of the at least two pyramid levels to obtain the saliency map.
- The method according to claim 23, characterized in that determining the first saliency map of each pyramid level comprises: for each pyramid level, performing superpixel segmentation on the image of each color channel at that level to obtain the superpixel blocks of each color channel at that level; for each superpixel block of each color channel at each pyramid level, determining the saliency response value of that superpixel block; for each color channel at each pyramid level, determining the second saliency map of that color channel according to the saliency response values of all superpixels of that channel; and for each pyramid level, determining the first saliency map of that level according to the second saliency maps of all color channels at that level.
- The method according to claim 24, characterized in that determining the saliency response value of each superpixel block of each color channel at each pyramid level comprises: computing the histogram of that superpixel block; determining the difference between the histogram of that superpixel block and the histograms of the other superpixel blocks of that color channel at that pyramid level; determining a first fusion weight of that superpixel block; and determining the saliency response value of that superpixel block according to said histogram differences and the first fusion weight.
- The method according to claim 25, characterized in that determining, for each superpixel block of each color channel at each pyramid level, the difference between the histogram of that superpixel block and the histograms of the other superpixel blocks of that color channel at that level comprises: determining the difference according to the height of each bin of the histogram of that superpixel block, the height of each bin of the histograms of the other superpixel blocks of that color channel at that level, and a first preset parameter.
- The method according to claim 25, characterized in that determining the first fusion weight of each superpixel block of each color channel at each pyramid level comprises: determining the distance between that superpixel block and the other superpixel blocks of that color channel at that pyramid level; and determining the first fusion weight of that superpixel block according to said distance and a second preset coefficient.
- The method according to claim 27, characterized in that determining the distance between a superpixel block and the other superpixel blocks of that color channel at that pyramid level comprises: determining the Euclidean distance between that superpixel block and the other superpixel blocks of that color channel at that level.
- The method according to claim 24, characterized in that determining, for each color channel at each pyramid level, the second saliency map of that color channel according to the saliency response values of all superpixels of that channel comprises: normalizing the saliency response value of each superpixel according to the saliency response values of all superpixels of that channel; and determining the second saliency map of that channel according to the normalized saliency response values.
- The method according to claim 23, characterized in that fusing the first saliency maps of the at least two pyramid levels to obtain the saliency map comprises: fusing the first saliency maps of the at least two pyramid levels according to preset second fusion weights of each pyramid level to obtain the saliency map.
- The method according to claim 23, characterized in that performing superpixel segmentation on the image of each color channel at each pyramid level comprises: performing superpixel segmentation on the image of each color channel at each pyramid level using the slic algorithm.
- The method according to claim 23, characterized in that the color channels comprise the three channels corresponding to the Lab color space; or the three channels corresponding to the RGB color space; or the three channels corresponding to the YUV color space.
- The method according to claim 1 or 32, characterized in that before performing saliency detection on the original image to obtain the saliency map, the method further comprises: converting the color space of the original image into a specific color space.
- The method according to claim 1, characterized in that before performing saliency detection on the original image to obtain the saliency map, the method further comprises: resizing the original image to a preset size.
- An image processing apparatus, characterized in that the image processing apparatus comprises: a storage device for storing program instructions; and a processor that calls the program instructions stored in the storage device and, when the program instructions are executed, is configured to: perform saliency detection on an original image to obtain a saliency map; determine a target subject based on the saliency map; determine a composition area in the original image according to the target subject and preset rules; and determine the composition area as a target image.
- The apparatus according to claim 35, characterized in that when determining the target subject based on the saliency map, the processor is specifically configured to: binarize the saliency map to obtain multiple connected regions; compute the area of each connected region and determine the connected region with the largest area and the connected region with the second-largest area; and determine the target subject according to the area of the largest connected region and the area of the second-largest connected region.
- The apparatus according to claim 36, characterized in that when determining the target subject according to the area of the largest connected region and the area of the second-largest connected region, the processor is specifically configured to: when the ratio of the area of the second-largest connected region to the area of the largest connected region is greater than or equal to a preset ratio threshold, determine both the largest and the second-largest connected regions as the region of the target subject; when the ratio is smaller than the preset ratio threshold, determine the largest connected region as the region of the target subject; and determine the target subject according to the region of the target subject.
- The apparatus according to claim 37, characterized in that when determining the target subject according to the region of the target subject, the processor is specifically configured to: determine the gravity-center position, width, and height of the target subject according to the region of the target subject.
- The apparatus according to claim 38, characterized in that when determining the composition area in the original image according to the target subject and the preset rules, the processor is specifically configured to: determine an initial target frame according to the gravity-center position, width, and height of the target subject; change the size of the initial target frame according to a first preset step length and a first preset number of steps to obtain multiple target frames; traverse all target frames to obtain feature information of each target frame; determine a region to be composed according to the feature information of all target frames; and determine the composition area according to the region to be composed.
- The apparatus according to claim 39, characterized in that when determining the region to be composed according to the feature information of all target frames, the processor is specifically configured to: determine, from the feature information of all target frames, a target frame whose feature information satisfies the preset rules, and determine the image region corresponding to that target frame as the region to be composed; or is specifically configured to: score each target frame according to the feature information of each target frame and a first preset strategy, and determine the image region corresponding to the target frame with the highest score as the region to be composed.
- The apparatus according to claim 39, characterized in that the composition area is the region to be composed; or, when determining the composition area according to the region to be composed, the processor is specifically configured to: determine the gravity-center position of the target frame corresponding to the region to be composed; change the height of that target frame according to a second preset step length and a second preset number of steps to obtain multiple new target frames; traverse all new target frames to obtain feature information of each new target frame; determine a new region to be composed according to the feature information of all new target frames; and determine the new region to be composed as the composition area.
- The apparatus according to claim 41, characterized in that when determining the new region to be composed according to the feature information of all new target frames, the processor is specifically configured to: determine a new target frame whose feature information satisfies the preset rules and determine the image region corresponding to that new target frame as the new region to be composed; or is specifically configured to: score all new target frames according to the feature information of each new target frame and a second preset strategy, and determine the image region corresponding to the new target frame with the highest score as the new region to be composed.
- The apparatus according to any one of claims 39 to 42, characterized in that the feature information of a target frame comprises the energy sum of all pixels in the target frame and/or the average pixel gradient of at least one side of the target frame.
- The apparatus according to claim 43, characterized in that the energy sum of all pixels in the target frame is the energy sum of the pixels of the region corresponding to the target frame in the saliency map, and the average pixel gradient of at least one side of the target frame is the average pixel gradient of said at least one side of the region corresponding to the target frame in the original image.
- The apparatus according to claim 43, characterized in that when determining the image region corresponding to the target frame whose feature information satisfies the preset rules as the region to be composed, the processor is specifically configured to: if the energy sum of all pixels in the current target frame is greater than the energy sums of all pixels in each of the other target frames, and/or the average pixel gradient of at least one side of the current target frame is smaller than the average pixel gradient of said at least one side in each of the other target frames, determine the image region corresponding to the current target frame as the region to be composed.
- The apparatus according to claim 43, characterized in that when scoring each target frame according to the feature information of each target frame and the first preset strategy, the processor is specifically configured to: for each target frame, determine a first score according to the energy sum of all pixels in that target frame, determine a second score according to the average pixel gradient of at least one side of that target frame, and determine the score of that target frame according to the first score and the second score; or is specifically configured to: for each target frame, determine the score of that target frame according to the energy sum of all pixels in that target frame, the average pixel gradient of at least one side of that target frame, and a preset function whose independent variables comprise the energy sum of all pixels in the target frame and the average pixel gradient of at least one side of the target frame.
- The apparatus according to claim 46, characterized in that the score of the target frame is the value obtained by directly summing the first score and the second score of the target frame, or the value obtained by a weighted sum of the first score and the second score of the target frame.
- The apparatus according to claim 43, characterized in that before traversing all target frames to obtain the feature information of each target frame, the processor is further configured to: determine the mean and variance of all pixels in the saliency map; and re-determine the energy of each pixel in the saliency map according to the mean and the variance.
- The apparatus according to claim 48, characterized in that when re-determining the energy of each pixel in the saliency map according to the mean and the variance, the processor is specifically configured to: set the energy of pixels smaller than the sum of the mean and the variance to 0, and set the energy of pixels greater than or equal to the sum of the mean and the variance to their values in the saliency map.
- The apparatus according to claim 39, characterized in that when determining the initial target frame according to the gravity-center position, width, and height of the target subject, the processor is specifically configured to: determine the gravity-center position of the target subject as the center position of the initial target frame; determine the width of the initial target frame according to the width of the target subject and a first preset scale coefficient; and determine the height of the initial target frame according to the height of the target subject and a second preset scale coefficient.
- The apparatus according to claim 50, characterized in that the first preset scale coefficient and/or the second preset scale coefficient is 1.
- The apparatus according to claim 50, characterized in that after determining the initial target frame according to the gravity-center position, width, and height of the target subject, and before changing the size of the initial target frame according to the first preset step length and the first preset number of steps to obtain multiple target frames, the processor is further configured to: synchronously enlarge the width and height of the initial target frame according to a third preset step length to obtain a first target frame, until the width of the first target frame is a preset multiple of that of the initial target frame and the height of the first target frame is the preset multiple of that of the initial target frame, where the preset multiple > 1; and when changing the size of the initial target frame according to the first preset step length and the first preset number of steps to obtain multiple target frames, the processor is specifically configured to: change the sizes of the initial target frame and of each first target frame obtained, according to the first preset step length and the first preset number of steps, to obtain multiple target frames.
- The apparatus according to claim 36, characterized in that when binarizing the saliency map, the processor is specifically configured to: segment the saliency map based on a preset algorithm to determine a segmentation threshold; and binarize the saliency map based on the segmentation threshold.
- The apparatus according to claim 53, characterized in that when segmenting the saliency map based on the preset algorithm to determine the segmentation threshold, the processor is specifically configured to: segment the foreground and background of the saliency map based on the preset algorithm to determine a first threshold; and determine the segmentation threshold according to the first threshold.
- The apparatus according to claim 53 or 54, characterized in that the preset algorithm is the Otsu algorithm.
- The apparatus according to claim 36, characterized in that after binarizing the saliency map to obtain multiple connected regions and before computing the area of each connected region, the processor is further configured to: perform an opening operation on the binarized saliency map.
- The apparatus according to claim 35, characterized in that when performing saliency detection on the original image to obtain the saliency map, the processor is specifically configured to: perform at least two levels of pyramid decomposition on each color channel; determine the first saliency map of each pyramid level; and fuse the first saliency maps of the at least two pyramid levels to obtain the saliency map.
- The apparatus according to claim 57, characterized in that when determining the first saliency map of each pyramid level, the processor is specifically configured to: for each pyramid level, perform superpixel segmentation on the image of each color channel at that level to obtain the superpixel blocks of each color channel at that level; for each superpixel block of each color channel at each pyramid level, determine the saliency response value of that superpixel block; for each color channel at each pyramid level, determine the second saliency map of that color channel according to the saliency response values of all superpixels of that channel; and for each pyramid level, determine the first saliency map of that level according to the second saliency maps of all color channels at that level.
- The apparatus according to claim 58, characterized in that when determining the saliency response value of each superpixel block of each color channel at each pyramid level, the processor is specifically configured to: compute the histogram of that superpixel block; determine the difference between the histogram of that superpixel block and the histograms of the other superpixel blocks of that color channel at that pyramid level; determine a first fusion weight of that superpixel block; and determine the saliency response value of that superpixel block according to said histogram differences and the first fusion weight.
- The apparatus according to claim 59, characterized in that when determining, for each superpixel block of each color channel at each pyramid level, the difference between the histogram of that superpixel block and the histograms of the other superpixel blocks of that color channel at that level, the processor is specifically configured to: determine the difference according to the height of each bin of the histogram of that superpixel block, the height of each bin of the histograms of the other superpixel blocks of that color channel at that level, and a first preset parameter.
- The apparatus according to claim 59, characterized in that when determining the first fusion weight of each superpixel block of each color channel at each pyramid level, the processor is specifically configured to: determine the distance between that superpixel block and the other superpixel blocks of that color channel at that pyramid level; and determine the first fusion weight of that superpixel block according to said distance and a second preset coefficient.
- The apparatus according to claim 61, characterized in that when determining the distance between a superpixel block and the other superpixel blocks of that color channel at that pyramid level, the processor is specifically configured to: determine the Euclidean distance between that superpixel block and the other superpixel blocks of that color channel at that level.
- The apparatus according to claim 58, characterized in that when determining, for each color channel at each pyramid level, the second saliency map of that color channel according to the saliency response values of all superpixels of that channel, the processor is specifically configured to: normalize the saliency response value of each superpixel according to the saliency response values of all superpixels of that channel; and determine the second saliency map of that channel according to the normalized saliency response values.
- The apparatus according to claim 57, characterized in that when fusing the first saliency maps of the at least two pyramid levels to obtain the saliency map, the processor is specifically configured to: fuse the first saliency maps of the at least two pyramid levels according to preset second fusion weights of each pyramid level to obtain the saliency map.
- The apparatus according to claim 58, characterized in that when performing superpixel segmentation on the image of each color channel at each pyramid level, the processor is specifically configured to: perform superpixel segmentation on the image of each color channel at each pyramid level using the slic algorithm.
- The apparatus according to claim 58, characterized in that the color channels comprise the three channels corresponding to the Lab color space; or the three channels corresponding to the RGB color space; or the three channels corresponding to the YUV color space.
- The apparatus according to claim 35 or 66, characterized in that before performing saliency detection on the original image to obtain the saliency map, the processor is further configured to: convert the color space of the original image into a specific color space.
- The apparatus according to claim 35, characterized in that before performing saliency detection on the original image to obtain the saliency map, the processor is further configured to: resize the original image to a preset size.
- A computer-readable storage medium on which a computer program is stored, characterized in that when the program is executed by a processor, the steps of the image processing method according to any one of claims 1 to 34 are implemented.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/124724 WO2020133170A1 (zh) | 2018-12-28 | 2018-12-28 | Image processing method and apparatus
CN201880068933.0A CN111279389A (zh) | 2018-12-28 | 2018-12-28 | Image processing method and apparatus
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/124724 WO2020133170A1 (zh) | 2018-12-28 | 2018-12-28 | Image processing method and apparatus
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020133170A1 true WO2020133170A1 (zh) | 2020-07-02 |
Family
ID=70999740
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/124724 WO2020133170A1 (zh) | 2018-12-28 | 2018-12-28 | 图像处理方法和装置 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111279389A (zh) |
WO (1) | WO2020133170A1 (zh) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112016548B (zh) * | 2020-10-15 | 2021-02-09 | 腾讯科技(深圳)有限公司 | Cover image display method and related apparatus |
CN113473137A (zh) * | 2021-06-29 | 2021-10-01 | Oppo广东移动通信有限公司 | Encoding method, terminal, and storage medium |
US20230153383A1 (en) * | 2021-11-18 | 2023-05-18 | International Business Machines Corporation | Data augmentation for machine learning |
CN114359561A (zh) * | 2022-01-10 | 2022-04-15 | 北京百度网讯科技有限公司 | Target detection method, and training method and apparatus for a target detection model |
CN116708995B (zh) * | 2023-08-01 | 2023-09-29 | 世优(北京)科技有限公司 | Photographic composition method and apparatus, and photographing device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105574866A (zh) * | 2015-12-15 | 2016-05-11 | 努比亚技术有限公司 | Method and apparatus for implementing image processing |
- 2018-12-28: PCT application PCT/CN2018/124724 filed (publication WO2020133170A1), status: active — Application Filing
- 2018-12-28: CN application CN201880068933.0A filed (publication CN111279389A), status: active — Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107133940A (zh) * | 2017-03-28 | 2017-09-05 | 深圳市金立通信设备有限公司 | Composition method and terminal |
CN107545576A (zh) * | 2017-07-31 | 2018-01-05 | 华南农业大学 | Image editing method based on composition rules |
CN108776970A (zh) * | 2018-06-12 | 2018-11-09 | 北京字节跳动网络技术有限公司 | Image processing method and apparatus |
CN108989665A (zh) * | 2018-06-26 | 2018-12-11 | Oppo(重庆)智能科技有限公司 | Image processing method and apparatus, mobile terminal, and computer-readable medium |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112348013A (zh) * | 2020-10-27 | 2021-02-09 | 上海眼控科技股份有限公司 | Target detection method and apparatus, computer device, and readable storage medium |
CN112270657A (zh) * | 2020-11-04 | 2021-01-26 | 成都寰蓉光电科技有限公司 | Sky-background-based target detection and tracking algorithm |
CN112489086A (zh) * | 2020-12-11 | 2021-03-12 | 北京澎思科技有限公司 | Target tracking method and apparatus, electronic device, and storage medium |
CN112907617B (zh) * | 2021-01-29 | 2024-02-20 | 深圳壹秘科技有限公司 | Video processing method and apparatus |
CN112907617A (zh) * | 2021-01-29 | 2021-06-04 | 深圳壹秘科技有限公司 | Video processing method and apparatus |
CN113469976A (zh) * | 2021-07-06 | 2021-10-01 | 浙江大华技术股份有限公司 | Object detection method and apparatus, and electronic device |
CN113643266A (zh) * | 2021-08-20 | 2021-11-12 | 百度在线网络技术(北京)有限公司 | Image detection method and apparatus, and electronic device |
CN113643266B (zh) * | 2021-08-20 | 2024-04-05 | 百度在线网络技术(北京)有限公司 | Image detection method and apparatus, and electronic device |
CN114359323A (zh) * | 2022-01-10 | 2022-04-15 | 浙江大学 | Image target area detection method based on a visual attention mechanism |
CN114359323B (zh) * | 2022-01-10 | 2024-07-05 | 浙江大学 | Image target area detection method based on a visual attention mechanism |
CN114841896A (zh) * | 2022-05-31 | 2022-08-02 | 歌尔股份有限公司 | Image processing method, device, and computer-readable storage medium |
CN116433672B (zh) * | 2023-06-15 | 2023-08-25 | 山东九思新材料科技有限责任公司 | Silicon wafer surface quality detection method based on image processing |
CN116433672A (zh) * | 2023-06-15 | 2023-07-14 | 山东九思新材料科技有限责任公司 | Silicon wafer surface quality detection method based on image processing |
CN116993745A (zh) * | 2023-09-28 | 2023-11-03 | 山东辉瑞管业有限公司 | Water supply pipe surface leakage detection method based on image processing |
CN116993745B (zh) * | 2023-09-28 | 2023-12-19 | 山东辉瑞管业有限公司 | Water supply pipe surface leakage detection method based on image processing |
Also Published As
Publication number | Publication date |
---|---|
CN111279389A (zh) | 2020-06-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020133170A1 (zh) | Image processing method and apparatus | |
JP6636154B2 (ja) | Face image processing method and apparatus, and storage medium | |
CN108446617B (zh) | Fast face detection method resistant to profile-face interference | |
WO2022161009A1 (zh) | Image processing method and apparatus, storage medium, and terminal | |
CN109918969B (zh) | Face detection method and apparatus, computer device, and computer-readable storage medium | |
CN103929596B (zh) | Method and apparatus for guiding shooting composition | |
CN105184763B (zh) | Image processing method and apparatus | |
WO2022078041A1 (zh) | Training method for an occlusion detection model and beautification method for face images | |
CN108537782B (zh) | Building image matching and fusion method based on contour extraction | |
US20170294000A1 (en) | Sky editing based on image composition | |
CN108446694B (zh) | Target detection method and apparatus | |
CN107452010A (zh) | Automatic matting algorithm and apparatus | |
CN108665463A (zh) | Cervical cell image segmentation method based on a generative adversarial network | |
JP2002109525A (ja) | Method for changing an image processing path based on image saliency and appeal | |
CN110909724B (zh) | Thumbnail generation method for multi-target images | |
CN105869159A (zh) | Image segmentation method and apparatus | |
EP4252148A1 (en) | Lane line detection method based on deep learning, and apparatus | |
CN112579823B (zh) | Video summary generation method and system based on feature fusion and an incremental sliding window | |
US10491804B2 (en) | Focus window determining method, apparatus, and device | |
WO2018082308A1 (zh) | Image processing method and terminal | |
CN108921856B (zh) | Image cropping method and apparatus, electronic device, and computer-readable storage medium | |
CN110268442B (zh) | Computer-implemented method for detecting a foreign object on a background object in an image, apparatus for detecting a foreign object on a background object in an image, and computer program product | |
WO2020042126A1 (zh) | Focusing apparatus and method, and related device | |
WO2022261828A1 (zh) | Image processing method and apparatus, electronic device, and computer-readable storage medium | |
CN106295639A (zh) | Virtual reality terminal and target image extraction method and apparatus | |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18944406; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE
| 122 | Ep: pct application non-entry in european phase | Ref document number: 18944406; Country of ref document: EP; Kind code of ref document: A1