WO2018233708A1 - Image saliency object detection method and apparatus (图像显著性物体检测方法和装置) - Google Patents

Image saliency object detection method and apparatus (图像显著性物体检测方法和装置)

Info

Publication number
WO2018233708A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature
map
feature maps
processed
Prior art date
Application number
PCT/CN2018/092514
Other languages
English (en)
French (fr)
Inventor
侯淇彬
程明明
白蔚
周迅溢
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to EP18820395.4A (published as EP3633611A4)
Publication of WO2018233708A1
Priority to US16/723,539 (published as US11430205B2)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/12Edge-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2136Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on sparsity criteria, e.g. with an overcomplete basis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/10Image enhancement or restoration using non-spatial domain filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/191Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/192Recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references
    • G06V30/194References adjustable by an adaptive method, e.g. learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Definitions

  • The present application relates to the field of computer image processing, and more particularly to an image saliency object detection method and apparatus.
  • Salient object detection is the process of detecting the region of an image that is most likely to attract human visual attention.
  • The existing image saliency object detection method is based on an existing convolutional neural network architecture, which is fine-tuned to realize saliency object detection on images. Specifically, as shown in FIG. 1, each convolutional layer in the convolutional neural network architecture is connected to a side output layer, and all of the side output layers are connected to a fusion layer. When an image is processed, the image to be processed passes through the convolution layers, which output feature maps of different resolutions.
  • The feature maps of different resolutions are separately sampled by the side output layers to obtain side output feature maps, and finally the fusion layer sums the side output feature maps obtained from the different levels to obtain the saliency detection result of the image to be processed, thereby realizing the detection of the salient objects of the image.
  • However, the fusion layer simply fuses the side output feature maps of the different levels directly, and the resulting saliency detection result of the image to be processed is not ideal.
  • The present application provides an image saliency object detection method and apparatus to improve the effect of image saliency object detection.
  • In a first aspect, a method for detecting an image saliency object is provided, the method comprising: performing convolution processing corresponding to at least two convolution layers on the image to be processed, respectively, to obtain at least two first feature maps of the image to be processed, wherein the resolution of each of the at least two first feature maps is smaller than the resolution of the image to be processed, and the resolutions of any two of the at least two first feature maps are different; processing the at least two first feature maps to obtain at least two second feature maps of the image to be processed; and splicing the at least two second feature maps to obtain a saliency map of the image to be processed.
  • At least one second feature map of the at least two second feature maps is obtained by superimposing a plurality of first feature maps of the at least two first feature maps, and the resolutions of any two of the at least two second feature maps are different. Furthermore, the resolution of the at least one second feature map is greater than or equal to the maximum resolution among the plurality of first feature maps from which it is obtained.
  • That is, the present application processes the at least two first feature maps of the image to be processed to obtain at least two second feature maps whose resolutions are greater than or equal to those of the first feature maps, with any two second feature maps differing in resolution, and then splices the at least two second feature maps of different resolutions, thereby obtaining a saliency map with a better effect.
  • In the superposition process, the lower-resolution first feature maps can assist in locating the most salient region of the higher-resolution first feature maps, while the higher-resolution first feature maps can reduce the sparseness and irregularity of the lower-resolution first feature maps, so that the second feature map obtained by superimposing the at least two first feature maps in a superposition set can better display the salient region in the image; a saliency map with a better effect can then be obtained by splicing the at least two second feature maps.
  • The convolution kernel used in the convolution processing may have a size of 1, and the effect of performing the convolution processing corresponding to the at least two convolution layers may be to extract the feature maps required for saliency segmentation from the image to be processed.
  • Performing superposition processing on the plurality of first feature maps includes: upsampling each first feature map whose resolution is smaller than the resolution of the at least one second feature map to be obtained, so that a third feature map corresponding to that first feature map is obtained, the resolution of the third feature map being equal to the resolution of the at least one second feature map to be obtained; and superimposing the upsampled third feature maps with those first feature maps that were not upsampled, thereby obtaining the at least one second feature map.
  • In the superposition process, the resolution of some first feature maps may be smaller than the resolution of the second feature map to be obtained. In this case, first upsampling the smaller-resolution first feature maps keeps the resolutions of all maps to be superimposed consistent, which ensures the effect of the superposition processing.
  • It should be understood that first feature maps that are not upsampled may not exist; for example, all of the plurality of first feature maps may be upsampled, and the resulting third feature maps are then superimposed to obtain the at least one second feature map, as sketched below.
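  • As an illustrative sketch of this upsample-then-superimpose step (not the patented implementation; the use of PyTorch, bilinear interpolation, and the assumption that all maps share a channel count are ours):

        import torch
        import torch.nn.functional as F

        def superimpose(first_feature_maps, target_hw):
            """Upsample every first feature map below the target resolution,
            then superimpose (sum) them into one second feature map.
            All maps are assumed to have the same number of channels."""
            third_maps = []
            for fmap in first_feature_maps:  # each fmap: (N, C, H, W)
                if fmap.shape[-2:] != target_hw:
                    # lower-resolution maps become upsampled third feature maps
                    fmap = F.interpolate(fmap, size=target_hw,
                                         mode='bilinear', align_corners=False)
                third_maps.append(fmap)
            # pixel-wise superposition of the equally sized maps
            return torch.stack(third_maps, dim=0).sum(dim=0)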
  • The weights used in the method are trained based on the difference between a saliency map of a training image and a reference saliency map corresponding to the training image.
  • Splicing the at least two second feature maps of the image to be processed to obtain its saliency map includes: splicing the at least two second feature maps according to a weight corresponding to each of the at least two second feature maps, to obtain the saliency map of the image to be processed.
  • Specifically, the weight corresponding to each second feature map may be multiplied by the pixel values of that second feature map, the weighted pixel values may then be added, and the summed result is taken as the pixel value of the saliency map of the image to be processed, thereby obtaining the saliency map.
  • The weight of each of the at least two second feature maps is determined by training according to the difference between the saliency map of the training image and the reference saliency map corresponding to the training image.
  • Superimposing the upsampled third feature maps with the non-upsampled first feature maps to obtain the at least one second feature map may include: superimposing the upsampled third feature maps with the non-upsampled first feature maps of the plurality of first feature maps, and then performing convolution and pooling processing on the result to obtain the at least one second feature map.
  • Splicing the at least two second feature maps to obtain the saliency map of the image to be processed may include: performing convolution processing on the at least two second feature maps to obtain features of the at least two second feature maps; and splicing the features of the at least two second feature maps to obtain the saliency map of the image to be processed.
  • The size of the convolution kernel used when performing convolution processing on the at least two second feature maps may be 1. This convolution processing can further extract the features in the second feature maps, so that the local features in the processed image are more distinguishable, resulting in better saliency detection.
  • Performing convolution processing on the images before splicing can further extract their features and use the extracted feature maps as the basis of the subsequent splicing, which reduces the complexity of the splicing; the feature extraction can also eliminate features of little value, thereby improving the effect of the resulting saliency map, as sketched below.
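  • A hedged sketch of this 1×1-convolution-then-splice step (module and parameter names are illustrative, not from the patent):

        import torch
        import torch.nn as nn

        class SpliceFusion(nn.Module):
            """1x1 convolutions condense each second feature map; the results
            are spliced (concatenated) and reduced to a saliency map."""
            def __init__(self, in_channels, num_maps, mid_channels=16):
                super().__init__()
                self.reduce = nn.ModuleList(
                    nn.Conv2d(in_channels, mid_channels, kernel_size=1)
                    for _ in range(num_maps))
                self.fuse = nn.Conv2d(mid_channels * num_maps, 1, kernel_size=1)

            def forward(self, second_feature_maps):
                # all second feature maps are assumed upsampled to one size
                feats = [conv(m) for conv, m
                         in zip(self.reduce, second_feature_maps)]
                return torch.sigmoid(self.fuse(torch.cat(feats, dim=1)))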
  • Optionally, the method further includes: performing guided filtering on the saliency map of the image to be processed according to the image to be processed, to obtain a segmented image of the image to be processed.
  • The image segmentation edges in the saliency map of the image to be processed can be further optimized by the guided filtering, and a segmented image with a better effect can be obtained.
  • Optionally, the saliency map is a first saliency map, and the resolution of the first saliency map is smaller than the resolution of the image to be processed. In that case, the third processing module is configured to: upsample the first saliency map to obtain a second saliency map having the same resolution as the image to be processed; and perform guided filtering on the second saliency map using the image to be processed, to obtain a segmented image of the image to be processed.
  • In a second aspect, an image saliency object detecting method is provided, the method comprising: performing convolution processing corresponding to at least two convolution layers on the image to be processed, respectively, to obtain at least two first feature maps of the image to be processed, wherein the resolution of each of the at least two first feature maps is smaller than the resolution of the image to be processed, and the resolutions of any two of the at least two first feature maps are different; performing superposition processing on the at least two first feature maps included in each superposition set among at least two sets, to obtain at least two second feature maps of the image to be processed, wherein the at least two sets respectively correspond to different resolutions, the at least two sets are in one-to-one correspondence with the at least two second feature maps, and the resolution of each first feature map included in a superposition set is less than or equal to the resolution of the second feature map corresponding to that superposition set; and splicing the at least two second feature maps to obtain a saliency map of the image to be processed.
  • Unlike the prior art, the present application does not directly superimpose the at least two first feature maps to obtain a final saliency map. Instead, at least two sets are first determined according to resolution, the first feature maps included in each superposition set are superimposed, and the second feature maps obtained from the sets are then spliced to obtain the saliency map of the image to be processed. In the processes of superposition and splicing, the characteristics of the feature maps with different resolutions are fully considered, and a saliency map with a better effect can be obtained.
  • In the superposition process, the lower-resolution first feature maps in a superposition set can assist in locating the most salient region of the higher-resolution first feature maps, and the higher-resolution first feature maps can reduce the sparseness and irregularity of the lower-resolution first feature maps, so that the second feature map finally obtained by superimposing the at least two first feature maps can better display the salient region in the image; a better saliency map can then be obtained by splicing the at least two second feature maps obtained from the at least two sets.
  • The foregoing superposition set refers to a set that includes at least two first feature maps. In addition to the superposition sets, the at least two sets may include other sets; for example, the at least two sets may also include a set containing only one first feature map.
  • For such a set, the included first feature map is not superimposed; instead, it may be directly determined as the second feature map corresponding to that set.
  • The resolution corresponding to each of the at least two sets refers to the resolution of the second feature map obtained by superimposing the first feature maps in that set.
  • The convolution kernel used in the convolution processing may have a size of 1. The function of performing the convolution processing corresponding to the at least two convolution layers may be to extract the feature maps required for saliency segmentation from the image to be processed; the extracted feature maps are then further processed to obtain the saliency map of the image to be processed.
  • Performing superposition processing on the at least two first feature maps included in a superposition set includes: upsampling each first feature map in the superposition set whose resolution is smaller than the resolution of the second feature map corresponding to the superposition set, thereby obtaining at least two third feature maps having the same resolution as the second feature map corresponding to the superposition set, the at least two third feature maps being in one-to-one correspondence with the at least two first feature maps; and superimposing the at least two third feature maps to obtain the second feature map corresponding to the superposition set.
  • The resolution of some first feature maps in the superposition set may be smaller than the resolution of the second feature map corresponding to the superposition set. In this case, upsampling the smaller-resolution first feature maps makes the resolutions of all first feature maps in the superposition set consistent, which ensures the effect of the superposition processing.
  • Superimposing the at least two third feature maps to obtain the second feature map corresponding to the superposition set includes: superimposing the at least two third feature maps according to the weights corresponding to the at least two third feature maps, to obtain the second feature map.
  • Specifically, the weight corresponding to each third feature map may be multiplied by the pixel values of that third feature map, the multiplied results may be summed, and the summed result is taken as the pixel value of the second feature map, thereby obtaining the second feature map.
  • The weight of each of the at least two third feature maps is obtained by training according to the difference between the saliency map of the training image and the reference saliency map corresponding to the training image.
  • The acquisition process of the saliency map of the training image may be consistent with the acquisition process of the saliency map of the image to be processed. Therefore, before processing the image to be processed, the saliency map of the training image may be acquired according to the flow of the method in the second aspect, and the weight of each third feature map may then be trained according to the difference between the saliency map of the training image and the reference saliency map corresponding to the training image, to obtain the weights of the third feature maps.
  • Splicing the at least two second feature maps of the image to be processed to obtain its saliency map includes: splicing the at least two second feature maps according to a weight corresponding to each of the at least two second feature maps, to obtain the saliency map of the image to be processed.
  • Specifically, the weight corresponding to each second feature map may be multiplied by the pixel values of that second feature map, the weighted pixel values may then be added, and the summed result is taken as the pixel value of the saliency map of the image to be processed, thereby obtaining the saliency map.
  • The weight of each of the at least two second feature maps is determined by training according to the difference between the saliency map of the training image and the reference saliency map corresponding to the training image.
  • The acquisition process of the saliency map of the training image may be consistent with that of the image to be processed; therefore, before processing the image to be processed, the saliency map of the training image may be acquired according to the flow of the method in the second aspect, and the weight of each second feature map may then be trained according to the difference between the saliency map of the training image and the reference saliency map, to obtain the weights of the second feature maps.
  • Performing superposition processing on the at least two first feature maps included in a superposition set to obtain the at least two second feature maps of the image to be processed includes: superimposing the at least two first feature maps included in the superposition set; performing convolution processing on the feature maps obtained after the superposition, wherein the convolution processing is used to extract the features of the superimposed feature maps; and performing pooling processing on the convolved feature maps to obtain the at least two second feature maps.
  • The size of the convolution kernel used when performing convolution processing on the feature maps obtained after the superposition may be 1.
  • Through this convolution processing, the superimposed feature maps can be integrated and condensed, and the high-value features in the feature maps are highlighted.
  • Further extracting the features of the superimposed maps and using the extracted features as the second feature maps can reduce the computation of subsequent processing.
  • In addition, the feature extraction can eliminate features of little value and improve the effect of the resulting saliency map.
  • Splicing the at least two second feature maps to obtain the saliency map of the image to be processed includes: performing convolution processing on the at least two second feature maps to obtain features of the at least two second feature maps; and splicing the features of the at least two second feature maps to obtain the saliency map of the image to be processed.
  • The size of the convolution kernel used when performing convolution processing on the at least two second feature maps may be 1. This convolution processing can further extract the features in the second feature maps, so that the local features in the processed image are more distinguishable, resulting in better saliency detection.
  • Performing convolution processing on the images before splicing can further extract their features and use the extracted feature maps as the basis of the subsequent splicing, which reduces the complexity of the splicing; the feature extraction can also eliminate features of little value, thereby improving the effect of the resulting saliency map.
  • Optionally, the method further includes: performing guided filtering on the saliency map of the image to be processed according to the image to be processed, to obtain a segmented image of the image to be processed.
  • The image segmentation edges in the saliency map of the image to be processed can be further optimized by the guided filtering, and a segmented image with a better effect can be obtained.
  • Optionally, the saliency map is a first saliency map, and the resolution of the first saliency map is smaller than the resolution of the image to be processed. In that case, the third processing module is configured to: upsample the first saliency map to obtain a second saliency map having the same resolution as the image to be processed; and perform guided filtering on the second saliency map using the image to be processed, to obtain a segmented image of the image to be processed.
  • In another aspect, an image saliency object detecting apparatus is provided, comprising means for performing the method of the first aspect or its various implementations.
  • In another aspect, an image saliency object detecting apparatus is provided, comprising a storage medium and a central processing unit, wherein the storage medium stores a computer executable program, and the central processing unit is connected to the storage medium and executes the computer executable program to implement the method of the first aspect or its various implementations.
  • In another aspect, a computer readable medium is provided, storing program code for device execution, the program code comprising instructions for performing the method of the first aspect or its various implementations.
  • FIG. 1 is a schematic diagram of a network architecture of a conventional image saliency object detecting method.
  • FIG. 2 is a schematic flowchart of an image saliency object detecting method according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a convolutional neural network architecture of an embodiment of the present application.
  • FIG. 4 is a schematic diagram of processing a first feature map.
  • FIG. 5 is a comparison diagram of the saliency map obtained by the embodiment of the present application and the saliency map obtained by other methods.
  • FIG. 6 is a schematic diagram of an image saliency object detecting method according to an embodiment of the present application.
  • FIG. 7 is a comparison diagram of the saliency map obtained by the embodiment of the present application and the saliency map obtained by other methods.
  • FIG. 8 is a schematic block diagram of an image saliency object detecting apparatus according to an embodiment of the present application.
  • FIG. 9 is a schematic block diagram of an image saliency object detecting apparatus according to an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of an image saliency object detecting method according to an embodiment of the present application.
  • The method of FIG. 2 can be performed in a convolutional neural network architecture, and the method 200 of FIG. 2 specifically includes the following steps.
  • 210: Convolution processing corresponding to at least two convolution layers is performed on the image to be processed, respectively, to obtain at least two first feature maps of the image to be processed, wherein the resolution of each of the at least two first feature maps is smaller than the resolution of the image to be processed, and the resolutions of any two of the at least two first feature maps are different.
  • 220: Superposition processing is performed on the at least two first feature maps included in each superposition set among at least two sets, to obtain at least two second feature maps of the image to be processed.
  • 230: The at least two second feature maps are spliced to obtain the saliency map of the image to be processed.
  • The size of the convolution kernel used in the convolution processing of the image to be processed may be 1, and the convolution processing may be performed to extract the feature maps required for saliency segmentation from the image to be processed; the extracted feature maps are then further processed to obtain the saliency map of the image to be processed.
  • The image to be processed may be an original image to be processed, or may be an image obtained by downsampling the original image. Downsampling the original image before acquiring its saliency map can reduce the resolution of the image and the complexity of subsequent processing.
  • The resolution of a first feature map may be smaller than the resolution of the image to be processed. For example, if the image to be processed has a resolution of 256×256, the resolution of a first feature map may be 128×128, 64×64, 32×32, 16×16, 8×8, and so on.
  • The image to be processed may be subjected to convolution processing in different convolution layers to obtain first feature maps having different resolutions. For example, an image to be processed with a resolution of 256×256 may be subjected to convolution processing in four convolution layers to obtain four first feature maps having resolutions of 64×64, 32×32, 16×16, and 8×8, respectively, as in the sketch below.
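  • As a minimal sketch of producing such multi-resolution first feature maps (a plain strided-convolution backbone and the channel counts are assumptions; the embodiment described later builds on ResNet-101):

        import torch.nn as nn

        class MultiResolutionExtractor(nn.Module):
            """Four convolution stages turn a 256x256 input into first
            feature maps of 64x64, 32x32, 16x16 and 8x8."""
            def __init__(self, channels=64):
                super().__init__()
                def stage(cin, cout, stride):
                    return nn.Sequential(
                        nn.Conv2d(cin, cout, 3, stride=stride, padding=1),
                        nn.ReLU(inplace=True))
                self.stage1 = stage(3, channels, 4)         # 256 -> 64
                self.stage2 = stage(channels, channels, 2)  # 64 -> 32
                self.stage3 = stage(channels, channels, 2)  # 32 -> 16
                self.stage4 = stage(channels, channels, 2)  # 16 -> 8

            def forward(self, x):
                f1 = self.stage1(x)
                f2 = self.stage2(f1)
                f3 = self.stage3(f2)
                f4 = self.stage4(f3)
                return [f1, f2, f3, f4]  # the four first feature maps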
  • The superposition set in the above step 220 refers to a set that includes at least two first feature maps among the at least two sets.
  • In addition to the superposition sets, the at least two sets may include other sets; for example, the at least two sets may also include a set containing only one first feature map.
  • For such a set, the included first feature map is not superimposed; instead, it may be directly determined as the second feature map corresponding to that set.
  • The resolution corresponding to each of the at least two sets refers to the resolution of the second feature map obtained by superimposing the first feature maps in that set.
  • In the superposition process, the lower-resolution first feature maps in a superposition set can assist in locating the most salient region of the higher-resolution first feature maps, and the higher-resolution first feature maps can reduce the sparseness and irregularity of the lower-resolution first feature maps, so that the second feature map finally obtained by superimposing the at least two first feature maps can better display the salient region in the image; a better saliency map can then be obtained by splicing the at least two second feature maps obtained from the at least two sets.
  • The manner of obtaining the at least two second feature maps from the at least two sets is described below with a specific case. For example, after performing convolution processing on the image to be processed, four first feature maps A, B, C, and D are obtained, with resolutions of 64×64, 32×32, 16×16, and 8×8, respectively, and the resolutions corresponding to set 1 to set 4 are 64×64, 32×32, 16×16, and 8×8, respectively. Then set 1 contains A, B, C, and D; set 2 contains B, C, and D; set 3 contains C and D; and set 4 contains only D. Each of set 1 to set 3 includes at least two first feature maps.
  • Therefore, set 1 to set 3 may be referred to as superposition sets. Since set 4 includes only one first feature map, set 4 does not belong to the superposition sets.
  • The second feature map corresponding to set 1 can be obtained by superimposing A, B, C, and D; the second feature map corresponding to set 2 can be obtained by superimposing B, C, and D; the second feature map corresponding to set 3 can be obtained by superimposing C and D; and for set 4, since only D is included, D can be directly determined as the second feature map corresponding to set 4. A sketch of this construction follows.
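  • The set construction in this example can be sketched as follows (illustrative only; equal channel counts and bilinear upsampling are assumptions):

        import torch.nn.functional as F

        def build_second_feature_maps(first_maps):
            """first_maps = [A, B, C, D], highest to lowest resolution.
            Set k holds maps k..4; superposition sets (two or more members)
            are upsampled to the set's resolution and summed."""
            second_maps = []
            for k in range(len(first_maps)):
                members = first_maps[k:]
                target_hw = first_maps[k].shape[-2:]  # set k's resolution
                if len(members) == 1:
                    second_maps.append(members[0])    # set 4: D used directly
                    continue
                ups = [F.interpolate(m, size=target_hw, mode='bilinear',
                                     align_corners=False)
                       if m.shape[-2:] != target_hw else m
                       for m in members]
                second_maps.append(sum(ups))          # superposition
            return second_maps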
  • Performing superposition processing on the at least two first feature maps included in a superposition set in the above step 220 specifically includes: upsampling each first feature map in the superposition set whose resolution is smaller than the resolution of the second feature map corresponding to the superposition set, to obtain at least two third feature maps having the same resolution as the second feature map corresponding to the superposition set, the at least two third feature maps being in one-to-one correspondence with the at least two first feature maps; and superimposing the at least two third feature maps to obtain the second feature map corresponding to the superposition set.
  • The resolution of some first feature maps in the superposition set may be smaller than the resolution of the second feature map corresponding to the superposition set. In this case, upsampling the smaller-resolution first feature maps makes the resolutions of all first feature maps in the superposition set consistent, which ensures the effect of the superposition processing.
  • Superimposing the at least two third feature maps to obtain the second feature map corresponding to the superposition set specifically includes: superimposing the at least two third feature maps according to the weights corresponding to the at least two third feature maps, to obtain the second feature map.
  • Specifically, the weight corresponding to each third feature map may be multiplied by the pixel values of that third feature map, the multiplied results may be summed, and the summed result is taken as the pixel value of the second feature map, thereby obtaining the second feature map.
  • In combination with a specific case: suppose three third feature maps X, Y, and Z are obtained, with weights of 30%, 30%, and 40%, respectively. Then, when X, Y, and Z are superimposed, 30% of the pixel values of X, 30% of the pixel values of Y, and 40% of the pixel values of Z may be added, and the result of the addition is taken as the pixel values of the second feature map W obtained by the superposition.
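  • Numerically, this weighted superposition is just a pixel-wise weighted sum, for example (toy values, not from the patent):

        import numpy as np

        # three equally sized third feature maps X, Y, Z
        X = np.full((4, 4), 100.0)
        Y = np.full((4, 4), 200.0)
        Z = np.full((4, 4), 50.0)

        # weights 30%, 30%, 40% as in the example above
        W = 0.3 * X + 0.3 * Y + 0.4 * Z
        print(W[0, 0])  # 0.3*100 + 0.3*200 + 0.4*50 = 110.0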
  • The weight of each third feature map may be trained according to the difference between the saliency map of the training image and the reference saliency map corresponding to the training image.
  • The acquisition process of the saliency map of the training image may be consistent with that of the image to be processed; therefore, before processing the image to be processed, the saliency map of the training image may be acquired according to the flow of the method in the above first aspect, and the weight of each third feature map may then be trained according to the difference between the saliency map of the training image and the reference saliency map, to obtain the weights of the third feature maps.
  • In the embodiment of the present application, the at least two first feature maps are not directly superimposed as in the prior art to obtain a final saliency map. Instead, at least two sets are determined according to resolution, the first feature maps included in each superposition set are superimposed, and the second feature maps obtained from the sets are then spliced to obtain the saliency map of the image to be processed. In the processes of superposition and splicing, the characteristics of the feature maps with different resolutions are fully considered, and a saliency map with a better effect can be obtained.
  • In the superposition process, the lower-resolution first feature maps in a superposition set can assist in locating the most salient region of the higher-resolution first feature maps, and the higher-resolution first feature maps can reduce the sparseness and irregularity of the lower-resolution first feature maps, so that the second feature map finally obtained by superimposing the at least two first feature maps can better display the salient region in the image; a better saliency map can then be obtained by splicing the at least two second feature maps obtained from the at least two sets.
  • Splicing the at least two second feature maps in the above step 230 to obtain the saliency map of the image to be processed specifically includes: splicing the at least two second feature maps according to a weight corresponding to each of the at least two second feature maps, to obtain the saliency map of the image to be processed.
  • Specifically, the weight corresponding to each second feature map may be multiplied by the pixel values of that second feature map, the weighted pixel values may then be added, and the summed result is taken as the pixel value of the saliency map, thereby obtaining the saliency map of the image to be processed.
  • The weight of each of the at least two second feature maps may be determined according to the difference between the saliency map of the training image and the reference saliency map corresponding to the training image.
  • The acquisition process of the saliency map of the training image may be identical to that of the image to be processed; therefore, before processing the image to be processed, the saliency map of the training image may be acquired according to the flow of the method in the above first aspect, and the weight of each second feature map may then be trained according to the difference between the saliency map of the training image and the reference saliency map, to obtain the weights of the second feature maps.
  • The reference saliency map corresponding to the training image may be a manually annotated saliency map, or a saliency map with a better effect recognized by a machine.
  • The difference between the saliency map of the training image and the reference saliency map corresponding to the training image can be represented by the function value of a loss function.
  • The function value of the loss function may be back-propagated through the convolutional neural network, and each weight adjusted accordingly.
  • Each weight may be adjusted in the direction in which the function value of the loss function decreases, until a global optimal solution is reached (the final adjustment result may be such that the function value of the loss function is minimized, or the function value of the loss function is less than a certain threshold).
  • If the output loss function of the m-th path is denoted ℓ_side^(m), then the output loss function of all M paths is L_side = Σ_{m=1}^{M} α_m · ℓ_side^(m), where α_m is the weight of the output loss of the m-th path. With L_fuse denoting the loss function when the path fusion module processes the 4 second feature maps output by the 4 paths, the final loss for processing the image to be processed is L_final = L_side + L_fuse.
  • The final loss function here can indicate the difference between the saliency map of the training image and the reference saliency map corresponding to the training image.
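  • A hedged sketch of this final loss (the patent does not fix the per-path loss form; binary cross-entropy on saliency maps in [0, 1] is our assumption):

        import torch.nn.functional as F

        def final_loss(path_maps, fused_map, reference, alphas):
            """L_final = sum_m alpha_m * l_side^(m) + L_fuse.
            path_maps and fused_map are saliency maps in [0, 1];
            reference is the reference saliency map."""
            l_side = sum(a * F.binary_cross_entropy(p, reference)
                         for a, p in zip(alphas, path_maps))
            l_fuse = F.binary_cross_entropy(fused_map, reference)
            return l_side + l_fuse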
  • Performing superposition processing on the at least two first feature maps included in a superposition set in the above step 220 specifically includes: superimposing the at least two first feature maps included in the superposition set; performing convolution processing on the feature maps obtained after the superposition, to obtain convolved feature maps, wherein the convolution processing is used to extract the features of the feature maps obtained after the superposition; and performing pooling processing on the convolved feature maps, to obtain the at least two second feature maps in step 220.
  • The size of the convolution kernel used when performing convolution processing on the feature maps obtained after the superposition may be 1. Through this convolution processing, the superimposed feature maps can be integrated and condensed, and the high-value features in the feature maps are highlighted.
  • Further extracting the features of the superimposed maps and using the extracted features as the second feature maps can reduce the computation of subsequent processing.
  • In addition, the feature extraction can eliminate features of little value and improve the effect of the resulting saliency map.
  • Splicing the at least two second feature maps of the image to be processed in the above step 230 includes: performing convolution processing on the at least two second feature maps to obtain the features of the at least two second feature maps; and splicing the features of the at least two second feature maps to obtain the saliency map of the image to be processed.
  • The convolution kernel may have a size of 1 when the convolution processing is performed on the at least two second feature maps; the convolution processing can further extract the features in the second feature maps, so that the local features in the processed image are more distinguishable, resulting in better saliency detection.
  • Performing convolution processing on the images before splicing can further extract their features and use the extracted feature maps as the basis of the subsequent splicing, which reduces the complexity of the splicing; the feature extraction can also eliminate features of little value, thereby improving the effect of the resulting saliency map.
  • In steps 220 and 230, performing superposition processing on the first feature maps included in the superposition sets is equivalent to superimposing the at least two first feature maps along different paths, and splicing the at least two second feature maps is equivalent to splicing the second feature maps obtained along the at least two different paths.
  • As shown in FIG. 3, the convolutional neural network architecture includes four levels (each level is equivalent to one convolution layer), four paths, and one path fusion module. Level 1 to level 4 respectively perform convolution processing on the image to be processed (the resolution of the image to be processed shown in FIG. 3 is 256×256) to obtain four first feature maps having resolutions of 64×64, 32×32, 16×16, and 8×8, respectively.
  • Next, the first feature maps in each path are processed along path 1 to path 4 respectively. Specifically: along path 1, the four first feature maps with resolutions of 64×64, 32×32, 16×16, and 8×8 are superimposed to obtain one second feature map; along path 2, the three first feature maps with resolutions of 32×32, 16×16, and 8×8 are superimposed to obtain one second feature map; along path 3, the two first feature maps with resolutions of 16×16 and 8×8 are superimposed to obtain one second feature map; and along path 4, the first feature map with a resolution of 8×8 is processed to obtain one second feature map (in path 4, the 8×8 first feature map can be directly determined as the second feature map corresponding to path 4).
  • In this way, four second feature maps are obtained along the four paths.
  • Finally, the path fusion module splices the second feature maps of path 1 to path 4 to obtain the saliency map of the image to be processed.
  • It should be understood that each first feature map in each path has a corresponding weight, and the second feature maps that the path fusion module acquires from the paths also have their own weights.
  • These weights can be trained according to the function value of the loss function.
  • Specifically, the function value of the loss function can be back-propagated along the architecture of FIG. 3, and the weights adjusted in the direction that causes the function value of the loss function to decrease, until a global optimal solution is reached (the final adjustment result may be such that the function value of the loss function is minimized, or the function value of the loss function is less than a certain threshold).
  • In addition, the path fusion module may first perform convolution processing on the second feature maps of path 1 to path 4, and then perform the splicing.
  • Taking path 1 as an example: the first feature maps of level 1 to level 4 are acquired (their resolutions are 64×64, 32×32, 16×16, and 8×8, respectively), and the first feature maps acquired from level 2 to level 4 are upsampled to the 64×64 resolution (the first feature map acquired from level 1 already has a resolution of 64×64 and is therefore not upsampled; it can be directly taken as a third feature map), so that four third feature maps with resolutions of 64×64 are obtained. Next, the four 64×64 third feature maps are superimposed to obtain one feature map; finally, the superimposed feature map is subjected to convolution and pooling processing, and an activation function, for example a linear rectification function (Rectified Linear Unit, ReLU), fine-tunes the result of the convolution and pooling processing, finally yielding the second feature map corresponding to path 1, as sketched below.
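  • The path-1 processing just described can be sketched as follows (kernel sizes and channel counts are assumptions, not from the patent):

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class PathModule(nn.Module):
            """Upsample the level feature maps to a common resolution,
            sum them, then convolve, pool and apply ReLU to obtain the
            path's second feature map."""
            def __init__(self, channels=64):
                super().__init__()
                self.conv = nn.Conv2d(channels, channels, 3, padding=1)
                self.pool = nn.MaxPool2d(3, stride=1, padding=1)

            def forward(self, level_maps, target_hw):
                ups = [F.interpolate(m, size=target_hw, mode='bilinear',
                                     align_corners=False)
                       if m.shape[-2:] != target_hw else m
                       for m in level_maps]
                fused = torch.stack(ups).sum(dim=0)          # superposition
                return F.relu(self.pool(self.conv(fused)))   # conv + pool + ReLU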
  • Convolving and splicing the at least two second feature maps of the image to be processed to obtain its saliency map specifically includes: upsampling the at least two second feature maps to obtain at least two feature maps with the same resolution, and then performing convolution and splicing on them to obtain the saliency map of the image to be processed.
  • Table 1 shows the comparison between the detection data of the image saliency object detection method of the embodiment of the present application and the detection data of other methods, where "uss" represents the detection data of the method of the present application, and RC [7], CHM [29], DSR [30], DRFI [22], MC [49], ELD [12], MDF [27], DS [12], RFCN [45], DHS [34], and DCL [28] correspond to the detection data of the other methods.
  • The higher the F_β value and the lower the Mean Absolute Error (MAE) value, the better the algorithm performance.
  • FIG. 5 shows the comparison between the saliency maps obtained from the original images by the image saliency object detection method of the embodiment of the present application and those obtained by other methods, where DCL, DHS, RFCN, DS, MDF, ELD, MC, DRFI, and DSR correspond to the saliency maps obtained by the other methods.
  • As can be seen from FIG. 5, the saliency map obtained by the method of the present application is closer to the true saliency map than those obtained by the other methods (the true saliency map can be obtained by manual labeling); therefore, the saliency map obtained by the method of the present application has a better effect.
  • After obtaining the saliency map of the image to be processed, a segmented image of the image to be processed can also be obtained by combining the image to be processed with its saliency map.
  • Specifically, guided filtering may be performed on the saliency map of the image to be processed according to the image to be processed, to obtain the segmented image of the image to be processed. It should be understood that the segmented image of the image to be processed may also be regarded as a kind of saliency map.
  • The image segmentation edges in the saliency map of the image to be processed can be further optimized by the guided filtering, and a segmented image with a better effect can be obtained.
  • When the resolution of the saliency map (the first saliency map) is smaller than the resolution of the image to be processed, the first saliency map may first be upsampled to obtain a second saliency map whose resolution is consistent with the image to be processed, and guided filtering is then performed on the second saliency map according to the image to be processed, to obtain the segmented image of the image to be processed.
  • For example, if the resolution of the saliency map of the image to be processed is 64×64 and the resolution of the image to be processed is 256×256, the resolution of the saliency map may be adjusted to 256×256 by upsampling, and guided filtering is then performed on the upsampled saliency map according to the image to be processed, to obtain the segmented image, as sketched below.
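  • A minimal single-channel guided-filter sketch (the standard guided-filter formulation; the box-filter implementation and the default r/eps values are ours, not the patent's):

        import numpy as np

        def box(img, r):
            """Mean filter with window radius r (edge-padded)."""
            k = 2 * r + 1
            pad = np.pad(img, r, mode='edge')
            out = np.zeros_like(img, dtype=np.float64)
            for dy in range(k):
                for dx in range(k):
                    out += pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
            return out / (k * k)

        def guided_filter(I, p, r=8, eps=1e-3):
            """Guide image I (grayscale, in [0, 1]) refines the edges of
            the upsampled saliency map p."""
            mean_I, mean_p = box(I, r), box(p, r)
            cov_Ip = box(I * p, r) - mean_I * mean_p
            var_I = box(I * I, r) - mean_I * mean_I
            a = cov_Ip / (var_I + eps)        # local linear coefficients
            b = mean_p - a * mean_I
            return box(a, r) * I + box(b, r)  # q = mean(a)*I + mean(b)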
  • FIG. 6 is a schematic diagram of an image saliency object detecting method according to an embodiment of the present application.
  • the method 300 specifically includes:
  • The original image here may be a photo containing a portrait, for example a photo taken by a mobile phone.
  • The original image may be downsampled to obtain an image with a resolution of 128×128.
  • The convolutional neural network model here can be as shown in FIG. 3. When training the convolutional neural network model, different data sets can be adopted according to different scenarios; for example, when performing portrait segmentation, the convolutional neural network model is trained with a portrait segmentation data set, and when performing vehicle segmentation, the convolutional neural network model can be trained with a vehicle segmentation data set.
  • Step 330: Upsample the saliency map obtained in step 320 to obtain a saliency map of the same size as the original image.
  • For example, if the resolution of the original image is 256×256 and the resolution of the saliency map obtained in step 320 is 128×128, the resolution of the saliency map can be adjusted from 128×128 to 256×256 by upsampling.
  • Step 340: Perform guided filtering on the saliency map obtained in step 330 according to the original image, to obtain a segmented image of the image to be processed.
  • Through the guided filtering, the image edges of the saliency map obtained in step 330 can be optimized, thereby obtaining a segmented image with a better effect.
  • Through the above steps, the segmented image of the original image can be obtained.
  • Next, portrait beautification, large-aperture effects, and the like can be applied to the original image according to its segmented image, so as to beautify the original image and improve its display effect.
  • The image saliency object detection method of the embodiments of the present application can implement the segmentation of objects in an image in various scenarios; for example, it can realize the segmentation of important objects such as persons, vehicles, and animals in an image.
  • the application of the image saliency object detection method in the embodiments of the present application in the two common scenarios of portrait segmentation and vehicle segmentation will be described in detail below with reference to the first and second embodiments.
  • The convolutional neural network model can be as shown in FIG. 3.
  • The portrait segmentation data set includes portrait images (pictures containing portraits) and the true saliency maps corresponding to the portrait images.
  • The input portrait image is downsampled, which can reduce the resolution of the image and the complexity of subsequent processing.
  • After the guided filtering, the filtered output image is q_i = ā_i·I_i + b̄_i (the standard guided-filter output, where ā and b̄ are the window-averaged linear coefficients), where r is the filter radius and eps is the smoothing parameter.
  • The guided filtering further optimizes the segmentation edges of the portrait, so that the edges of the segmented image are clearer.
  • In the prior art, the portrait edge cannot be accurately fitted, and false detections or missed detections may occur in some regions of the image; the method of the present application can accurately locate the portrait in a complicated scene and, at the same time, accurately fit the portrait edge, achieving a better segmentation effect.
  • By downsampling, an image with a lower resolution can be obtained, and basic portrait segmentation can then be realized on the lower-resolution image.
  • The method of the present application can automatically detect the portrait in an image without manual interaction, and realize the segmentation of the portrait.
  • The image segmentation results of the present application and of the prior art are shown in FIG. 7.
  • As can be seen from FIG. 7, the image saliency detection method of the embodiment of the present application can accurately distinguish the regions of objects with salient features in the image, and the effect of the saliency analysis is better.
  • For an input road scene picture I_h, downsampling is first performed to obtain a low-resolution picture I_l; the downsampled low-resolution picture is then processed by the trained convolutional neural network, which finally outputs a low-resolution vehicle segmentation image S_l.
  • It should be understood that the image saliency object detection method of the embodiment of the present application can also be applied to other scenarios: as long as the convolutional neural network is trained with the training data of the scenario and the image to be processed is then processed accordingly, better results can likewise be achieved.
  • The method of the present application can realize basic vehicle segmentation on a lower-resolution image and can ensure semantic accuracy in complex and varied background environments; by finally performing guided filtering against the high-resolution image, the degree of detail on the edges can be ensured. The method can automatically detect and segment the vehicles in an image without manual interaction, and can assist automated driving in making decisions.
  • The method can effectively segment vehicle edges and improve the ability to judge vehicle pose, vehicle distance, and the like.
  • The model shown in FIG. 3 is a network model based on the ResNet-101 architecture, in which there are 4 levels, 4 paths (a path here is equivalent to a set as described above), and a multipath fusion module, wherein the resolutions corresponding to level 1 to level 4 are 64×64, 32×32, 16×16, and 8×8, respectively.
  • Each of the four paths receives the feature maps of at least one of the four levels as input: path 1 receives the feature maps of 4 levels (level 1 to level 4), path 2 receives the feature maps of 3 levels (level 2 to level 4), path 3 receives the feature maps of 2 levels (level 3 to level 4), and path 4 receives the feature map of 1 level (level 4).
  • Levels 1 to 4 obtain feature maps of the corresponding resolutions from the image to be processed; specifically, they obtain feature maps with resolutions of 64×64, 32×32, 16×16, and 8×8 respectively.
  • Paths 1 to 4 each fuse the feature maps of at least one level. Taking path 1 as an example: path 1 receives the feature maps of levels 1 to 4 and upsamples them to obtain 4 images of the same resolution; these 4 images are summed to obtain a summed feature map; the summed feature map is then convolved and pooled, and a rectified-linear activation function is used to fine-tune the convolved and pooled feature map, finally yielding path 1's feature map.
  • The multipath fusion module fuses the feature maps of paths 1 to 4: it upsamples them to obtain four feature maps with a resolution of 64×64, performs convolution and concatenation on the four feature maps, and finally upsamples the result to the size of the image to be processed (resolution 128×128), obtaining the saliency map of the image to be processed.
  • The network architecture shown in FIG. 3 is only one possible architecture for the method of the embodiments of this application; several improvements and replacements may be made on its basis, for example changing the number of convolutional layers and paths or the correspondence between paths and convolutional layers, and the network architectures so obtained all fall within the protection scope of this application.
  • The image saliency object detection method of the embodiments of this application can be applied to segmenting important targets in a picture.
  • For example, the portrait in a picture is segmented from the other background objects, and the portrait and background are processed separately (for example, skin beautification for the portrait; blurring the background, enhancing background color, darkening the four corners of the background, and so on), finally achieving the artistic effect of highlighting and beautifying the portrait.
  • The method of the embodiments of this application can be applied to selfies in portrait mode and to the large-aperture effect, and also to portrait stylization, portrait beautification, and portrait background editing and synthesis (for example, generating ID photos or compositing group photos at scenic spots).
  • The portrait in the original picture may be stylized or beautified according to the saliency map obtained by the analysis, or the background in the original picture may be replaced.
  • The method of the embodiments of this application may also be applied to segmentation of objects of interest in an image, object recognition, and the like.
  • The image saliency object detection method of the embodiments of this application has been described in detail above with reference to FIG. 2 to FIG. 7; the image saliency object detection apparatus of the embodiments of this application is described below with reference to FIG. 8 and FIG. 9.
  • The apparatuses of FIG. 8 and FIG. 9 can perform the corresponding steps of the image saliency object detection method above; for brevity, repeated descriptions are appropriately omitted below.
  • FIG. 8 is a schematic block diagram of an image saliency object detection apparatus according to an embodiment of this application.
  • The apparatus 800 of FIG. 8 includes:
  • a convolution module 810, configured to perform, on the to-be-processed image, convolution processing corresponding to at least two convolutional layers respectively, to obtain at least two first feature maps of the to-be-processed image, where the resolutions of the at least two first feature maps are smaller than the resolution of the to-be-processed image and any two of the at least two first feature maps differ in resolution;
  • a superposition module 820, configured to superimpose at least two first feature maps contained in a superposition set among at least two sets, to obtain at least two second feature maps of the to-be-processed image, where the at least two sets correspond to different resolutions and are in one-to-one correspondence with the at least two second feature maps, and the resolution of each first feature map contained in the superposition set is less than or equal to the resolution of the second feature map corresponding to the superposition set; and
  • a concatenation module 830, configured to concatenate the at least two second feature maps to obtain a saliency map of the image to be processed.
  • The superposition module 820 is specifically configured to: upsample those first feature maps in the superposition set whose resolution is smaller than the resolution of the second feature map corresponding to the superposition set, to obtain at least two third feature maps with the same resolution as that second feature map, the at least two third feature maps being in one-to-one correspondence with the at least two first feature maps; and superimpose the at least two third feature maps to obtain the second feature map corresponding to the superposition set.
  • The superposition module 820 is specifically configured to: superimpose the at least two third feature maps according to their respective weights, to obtain the second feature map.
  • The weight of each third feature map is trained based on the difference between a saliency map of a training image and a reference saliency map corresponding to the training image.
  • The concatenation module 830 is specifically configured to: concatenate the at least two second feature maps according to their respective weights, to obtain the saliency map of the image to be processed.
  • The weight of each of the at least two second feature maps is determined based on the difference between a saliency map of a training image and a reference saliency map corresponding to the training image.
  • The superposition module 820 is specifically configured to: superimpose the at least two first feature maps contained in the superposition set among the at least two sets; convolve the at least two feature maps obtained by the superposition to obtain at least two convolved feature maps, the convolution being used to extract the features of the superimposed feature maps; and pool the at least two convolved feature maps to obtain the at least two second feature maps.
  • The concatenation module 830 is specifically configured to: convolve the at least two second feature maps to obtain their features; and concatenate the features of the at least two second feature maps to obtain the saliency map of the image to be processed.
  • The apparatus 800 further includes a filtering module 840, configured to perform guided filtering on the saliency map of the to-be-processed image based on the to-be-processed image, to obtain a segmented image of the to-be-processed image.
  • The saliency map is a first saliency map whose resolution is smaller than the resolution of the image to be processed, and the filtering module 840 is specifically configured to: upsample the first saliency map to obtain a second saliency map with the same resolution as the to-be-processed image; and perform guided filtering on the second saliency map based on the to-be-processed image, to obtain a segmented image of the to-be-processed image.
  • FIG. 9 is a schematic block diagram of an image saliency object detection apparatus according to an embodiment of this application.
  • The apparatus 900 of FIG. 9 includes:
  • a memory 910, configured to store a program; and
  • a processor 920, configured to execute the program stored in the memory 910. When the program in the memory 910 is executed, the processor 920 is specifically configured to: perform, on the to-be-processed image, convolution processing corresponding to at least two convolutional layers respectively, to obtain at least two first feature maps of the to-be-processed image, the resolutions of the at least two first feature maps being smaller than the resolution of the to-be-processed image and any two of the at least two first feature maps differing in resolution; superimpose at least two first feature maps contained in a superposition set among at least two sets, to obtain at least two second feature maps of the to-be-processed image, the at least two sets corresponding to different resolutions and being in one-to-one correspondence with the at least two second feature maps, and the resolution of each first feature map contained in the superposition set being less than or equal to the resolution of the second feature map corresponding to the superposition set; and concatenate the at least two second feature maps to obtain a saliency map of the to-be-processed image.
  • The processor 920 is specifically configured to: upsample those first feature maps in the superposition set whose resolution is smaller than the resolution of the second feature map corresponding to the superposition set, to obtain at least two third feature maps with the same resolution as that second feature map, the at least two third feature maps being in one-to-one correspondence with the at least two first feature maps; and superimpose the at least two third feature maps to obtain the second feature map corresponding to the superposition set.
  • The processor 920 is specifically configured to: superimpose the at least two third feature maps according to their respective weights, to obtain the second feature map.
  • The weight of each third feature map is trained based on the difference between a saliency map of a training image and a reference saliency map corresponding to the training image.
  • The processor 920 is specifically configured to: concatenate the at least two second feature maps according to their respective weights, to obtain the saliency map of the image to be processed.
  • The weight of each of the at least two second feature maps is determined based on the difference between a saliency map of a training image and a reference saliency map corresponding to the training image.
  • The processor 920 is specifically configured to: superimpose the at least two first feature maps contained in the superposition set among the at least two sets; convolve the at least two feature maps obtained by the superposition to obtain at least two convolved feature maps, the convolution being used to extract the features of the superimposed feature maps; and pool the at least two convolved feature maps to obtain the at least two second feature maps.
  • The processor 920 is specifically configured to: convolve the at least two second feature maps to obtain their features; and concatenate the features of the at least two second feature maps to obtain the saliency map of the image to be processed.
  • The processor 920 is further configured to: perform guided filtering on the saliency map of the to-be-processed image based on the to-be-processed image, to obtain a segmented image of the to-be-processed image.
  • The saliency map is a first saliency map whose resolution is smaller than the resolution of the image to be processed, and the processor 920 is specifically configured to: upsample the first saliency map to obtain a second saliency map with the same resolution as the to-be-processed image; and perform guided filtering on the second saliency map based on the to-be-processed image, to obtain a segmented image of the to-be-processed image.
  • The disclosed systems, apparatuses, and methods may be implemented in other manners.
  • The apparatus embodiments described above are merely illustrative; the division into units is only a logical function division, and there may be other division manners in actual implementation. For example, at least two units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • The mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
  • The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over at least two network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • The functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium.
  • The technical solutions of this application essentially, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product stored in a storage medium, including several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application.
  • The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Biology (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

An image saliency object detection method and apparatus. The method includes: performing, on an image to be processed, convolution processing corresponding to at least two convolutional layers respectively, to obtain at least two first feature maps of the image to be processed, where the resolutions of the at least two first feature maps are smaller than the resolution of the image to be processed and any two of the at least two first feature maps differ in resolution (210); superimposing the at least two first feature maps contained in a superposition set among at least two sets, to obtain at least two second feature maps of the image to be processed, where the at least two sets correspond to different resolutions and are in one-to-one correspondence with the at least two second feature maps, and the resolution of each first feature map contained in the superposition set is less than or equal to the resolution of the second feature map corresponding to the superposition set (220); and concatenating the at least two second feature maps to obtain a saliency map (230). The method improves the effect of salient object detection.

Description

Image saliency object detection method and apparatus
This application claims priority to Chinese Patent Application No. 201710488970.4, filed with the Chinese Patent Office on June 23, 2017 and entitled "Image saliency object detection method and apparatus", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of computer image processing, and more specifically, to an image saliency object detection method and apparatus.
Background
Salient object detection is the process of detecting, in an image, the object regions that most attract human visual attention. Existing image saliency object detection methods fine-tune an existing convolutional neural network architecture to detect salient objects in an image. Specifically, as shown in FIG. 1, a side output layer is connected behind each convolutional layer of the architecture, and a fusion layer is jointly connected behind all the side output layers. When an image is processed, the image to be processed passes through the convolutional layers, which output feature maps of different resolutions; the feature maps of different resolutions are then simply sampled by the side output layers to obtain side output feature maps; finally, the fusion layer sums the side output feature maps obtained from the different levels to obtain the saliency detection result of the image to be processed, thereby detecting the salient objects in the image. However, because the side output feature maps of different side output layers differ greatly in their saliency detection (shallow feature maps are too cluttered, while the feature maps extracted at deep layers lack regularity), directly and simply fusing the side output feature maps of different levels in the fusion layer yields an unsatisfactory saliency detection result for the image to be processed.
Summary
This application provides an image saliency object detection method and apparatus, to improve the effect of image saliency object detection.
According to a first aspect, an image saliency object detection method is provided, including: performing, on a to-be-processed image, convolution processing corresponding to at least two convolutional layers respectively, to obtain at least two first feature maps of the to-be-processed image, where the resolutions of the at least two first feature maps are smaller than the resolution of the to-be-processed image, and any two of the at least two first feature maps differ in resolution; processing the at least two first feature maps to obtain at least two second feature maps of the to-be-processed image; and concatenating the at least two second feature maps to obtain a saliency map of the to-be-processed image. At least one of the at least two second feature maps is obtained by superimposing a plurality of first feature maps among the at least two first feature maps, and any two of the at least two second feature maps differ in resolution. In addition, the resolution of the at least one second feature map is greater than or equal to the largest resolution among the plurality of first feature maps used to obtain it.
By processing the at least two first feature maps of the to-be-processed image, this application obtains at least two second feature maps whose resolutions are greater than or equal to those of the first feature maps used, with any two second feature maps differing in resolution; concatenating these feature maps of different resolutions yields a saliency map with a better effect.
For example, when superimposing at least two first feature maps, a lower-resolution first feature map can help locate the most salient region of a higher-resolution first feature map, and a higher-resolution first feature map can in turn remedy the sparsity and irregularity of a lower-resolution one. The second feature map obtained by the superposition therefore displays the salient regions of the image well, and concatenating the at least two second feature maps yields an effective saliency map.
It should be understood that, when the convolution processing corresponding to the at least two convolutional layers is performed on the to-be-processed image, the convolution kernel size may be 1, and the purpose of the convolution processing may be to extract from the to-be-processed image the feature maps needed for saliency segmentation.
With reference to the first aspect, in some implementations of the first aspect, superimposing the plurality of first feature maps among the at least two first feature maps includes: upsampling those of the plurality of first feature maps whose resolution is smaller than the resolution of the at least one second feature map to be obtained, to obtain third feature maps corresponding to those first feature maps, where the resolution of each third feature map equals the resolution of the at least one second feature map to be obtained; and superimposing the upsampled third feature maps with the non-upsampled first feature maps among the plurality of first feature maps, to obtain the at least one second feature map. It should be understood that the resolutions of some first feature maps may be smaller than those of some second feature maps; upsampling the lower-resolution first feature maps keeps all feature maps at the same resolution and guarantees the effect of the superposition.
In a specific implementation, there may be no non-upsampled first feature map among the plurality of first feature maps; for example, all of the feature maps may be upsampled, and the resulting third feature maps superimposed to obtain the at least one feature map.
With reference to the first aspect, in some implementations of the first aspect, superimposing the upsampled third feature maps with the non-upsampled first feature maps among the plurality of first feature maps to obtain the at least one second feature map includes: superimposing the upsampled third feature maps with the non-upsampled first feature maps according to the weight corresponding to each third feature map or first feature map, to obtain the at least one second feature map.
With reference to the first aspect, in some implementations of the first aspect, the weights are trained based on the difference between a saliency map of a training image and a reference saliency map corresponding to the training image.
With reference to the first aspect, in some implementations of the first aspect, concatenating the at least two second feature maps of the to-be-processed image to obtain the saliency map of the to-be-processed image includes: concatenating the at least two second feature maps according to their respective weights, to obtain the saliency map of the to-be-processed image.
Specifically, the weight corresponding to each second feature map may be multiplied by the pixel values of that second feature map, the products then summed, and the summed pixel values used as the pixel values of the saliency map of the to-be-processed image, thereby obtaining the saliency map of the to-be-processed image.
With reference to the first aspect, in some implementations of the first aspect, the weight of each of the at least two second feature maps is determined based on the difference between a saliency map of a training image and a reference saliency map corresponding to the training image.
With reference to the first aspect, in some implementations of the first aspect, superimposing the upsampled third feature maps with the non-upsampled first feature maps to obtain the at least one second feature map includes: superimposing, convolving, and pooling the upsampled third feature maps and the non-upsampled first feature maps to obtain the at least one second feature map.
With reference to the first aspect, in some implementations of the first aspect, concatenating the at least two second feature maps to obtain the saliency map of the to-be-processed image includes: performing convolution processing on the at least two second feature maps to obtain their features; and concatenating the features of the at least two second feature maps to obtain the saliency map of the to-be-processed image.
It should be understood that the convolution kernel size used when convolving the at least two second feature maps may be 1; the convolution further extracts the features of the second feature maps and makes local features in the processed image more distinguishable, thereby yielding a better saliency detection effect.
Convolving the images before concatenation further extracts image features and uses the extracted feature maps as the basis for the subsequent concatenation, which reduces the complexity of that concatenation; feature extraction also discards low-value features, improving the effect of the final saliency map.
With reference to the first aspect, in some implementations of the first aspect, the method further includes: performing guided filtering on the saliency map of the to-be-processed image based on the to-be-processed image, to obtain a segmented image of the to-be-processed image.
Guided filtering further refines the segmentation edges in the saliency map of the to-be-processed image, yielding a better segmented image.
With reference to the first aspect, in some implementations of the first aspect, the saliency map is a first saliency map whose resolution is smaller than that of the to-be-processed image, and the third processing module is specifically configured to: upsample the first saliency map to obtain a second saliency map with the same resolution as the to-be-processed image; and perform guided filtering on the second saliency map based on the to-be-processed image, to obtain the segmented image of the to-be-processed image.
According to a second aspect, an image saliency object detection method is provided, including: performing, on a to-be-processed image, convolution processing corresponding to at least two convolutional layers respectively, to obtain at least two first feature maps of the to-be-processed image, where the resolutions of the at least two first feature maps are smaller than that of the to-be-processed image and any two of the at least two first feature maps differ in resolution; superimposing the at least two first feature maps contained in a superposition set among at least two sets, to obtain at least two second feature maps of the to-be-processed image, where the at least two sets correspond to different resolutions and are in one-to-one correspondence with the at least two second feature maps, and the resolution of each first feature map contained in the superposition set is less than or equal to the resolution of the second feature map corresponding to the superposition set; and concatenating the at least two second feature maps to obtain a saliency map of the to-be-processed image.
After obtaining the at least two first feature maps through convolution, this application does not directly superimpose them into a final saliency map as in the prior art. Instead, at least two sets are first determined by resolution, the feature maps contained in each superposition set are superimposed, and the second feature maps obtained from the sets are then concatenated to obtain the saliency map of the to-be-processed image. Because the characteristics of feature maps at different resolutions are fully considered during superposition and concatenation, a better saliency map can be obtained.
Specifically, for example, when superimposing the at least two first feature maps contained in a superposition set, a lower-resolution first feature map in the set can help locate the most salient region of a higher-resolution first feature map, and a higher-resolution first feature map can remedy the sparsity and irregularity of a lower-resolution one. The second feature map obtained by superimposing the at least two first feature maps in the set therefore displays the salient regions of the image well, and concatenating the at least two second feature maps obtained from the at least two sets yields an effective saliency map.
It should be understood that a superposition set may be a set, among the at least two sets, that contains at least two first feature maps; besides superposition sets, the at least two sets may also contain other sets, for example a set containing only one first feature map. When a set contains only one first feature map, that first feature map is not superimposed but may be directly determined as the second feature map corresponding to the set.
It should also be understood that the resolution corresponding to each of the at least two sets may be the resolution of the second feature map obtained by superimposing the first feature maps in that set.
In addition, when the convolution processing corresponding to the at least two convolutional layers is performed on the to-be-processed image, the convolution kernel size may be 1, and the purpose of the convolution may be to extract from the to-be-processed image the feature maps needed for saliency segmentation; the extracted feature maps are then further processed to obtain the saliency map of the to-be-processed image.
With reference to the second aspect, in some implementations of the second aspect, superimposing the at least two first feature maps contained in the superposition set among the at least two sets includes: upsampling those first feature maps in the superposition set whose resolution is smaller than the resolution of the second feature map corresponding to the superposition set, to obtain at least two third feature maps with the same resolution as that second feature map, where the at least two third feature maps are in one-to-one correspondence with the at least two first feature maps; and superimposing the at least two third feature maps to obtain the second feature map corresponding to the superposition set.
It should be understood that the resolutions of some first feature maps in the superposition set may be smaller than the resolution of the set's corresponding second feature map; upsampling the lower-resolution first feature maps keeps all first feature maps in the superposition set at the same resolution, guaranteeing the effect of the superposition.
With reference to the second aspect, in some implementations of the second aspect, superimposing the at least two third feature maps to obtain the second feature map corresponding to the superposition set includes: superimposing the at least two third feature maps according to their respective weights, to obtain the second feature map.
Specifically, the weight corresponding to each third feature map may be multiplied by the pixel values of that third feature map, the products summed, and the sum used as the pixel values of the second feature map, thereby obtaining the second feature map.
With reference to the second aspect, in some implementations of the second aspect, the weight of each third feature map is trained based on the difference between a saliency map of a training image and a reference saliency map corresponding to the training image.
The procedure for obtaining the saliency map of a training image may be the same as that for the to-be-processed image. Therefore, before the to-be-processed image is processed, the saliency map of the training image may first be obtained according to the procedure of the method in the second aspect, and the weight of each third feature map may then be trained based on the difference between the training image's saliency map and the reference saliency map corresponding to the training image, thereby obtaining each third feature map's weight.
With reference to the second aspect, in some implementations of the second aspect, concatenating the at least two second feature maps of the to-be-processed image to obtain the saliency map of the to-be-processed image includes: concatenating the at least two second feature maps according to their respective weights, to obtain the saliency map of the to-be-processed image.
Specifically, the weight corresponding to each second feature map may be multiplied by the pixel values of that second feature map, the products summed, and the summed pixel values used as the pixel values of the saliency map of the to-be-processed image, thereby obtaining the saliency map of the to-be-processed image.
With reference to the second aspect, in some implementations of the second aspect, the weight of each of the at least two second feature maps is determined based on the difference between a saliency map of a training image and a reference saliency map corresponding to the training image.
The procedure for obtaining the saliency map of a training image may be the same as that for the to-be-processed image. Therefore, before the to-be-processed image is processed, the saliency map of the training image may first be obtained according to the procedure of the method in the second aspect, and the weight of each of the at least two second feature maps may then be trained based on the difference between the training image's saliency map and its reference saliency map, thereby obtaining each second feature map's weight.
With reference to the second aspect, in some implementations of the second aspect, superimposing the at least two first feature maps contained in the superposition set among the at least two sets to obtain the at least two second feature maps of the to-be-processed image includes: superimposing the at least two first feature maps contained in the superposition set; convolving the at least two feature maps obtained by the superposition to obtain at least two convolved feature maps, where the convolution is used to extract the features of the at least two superimposed feature maps; and pooling the at least two convolved feature maps to obtain the at least two second feature maps.
The convolution kernel size used when convolving the at least two feature maps obtained after the superposition may be 1; the convolution integrates and collects the superimposed feature maps, highlighting the high-value features in them.
Convolving and pooling the at least two feature maps obtained after the superposition further extracts the features of the superimposed images; using the extracted features as the second feature maps reduces the computation of subsequent processing, and feature extraction also discards low-value features, improving the effect of the final saliency map.
With reference to the second aspect, in some implementations of the second aspect, concatenating the at least two second feature maps to obtain the saliency map of the to-be-processed image includes: performing convolution processing on the at least two second feature maps to obtain their features; and concatenating the features of the at least two second feature maps to obtain the saliency map of the to-be-processed image.
It should be understood that the convolution kernel size used when convolving the at least two second feature maps may be 1; the convolution further extracts the features of the second feature maps and makes local features in the processed image more distinguishable, thereby yielding a better saliency detection effect.
Convolving the images before concatenation further extracts image features and uses the extracted feature maps as the basis for the subsequent concatenation, which reduces the complexity of that concatenation; feature extraction also discards low-value features, improving the effect of the final saliency map.
With reference to the second aspect, in some implementations of the second aspect, the method further includes: performing guided filtering on the saliency map of the to-be-processed image based on the to-be-processed image, to obtain a segmented image of the to-be-processed image.
Guided filtering further refines the segmentation edges in the saliency map of the to-be-processed image, yielding a better segmented image.
With reference to the second aspect, in some implementations of the second aspect, the saliency map is a first saliency map whose resolution is smaller than that of the to-be-processed image, and the third processing module is specifically configured to: upsample the first saliency map to obtain a second saliency map with the same resolution as the to-be-processed image; and perform guided filtering on the second saliency map based on the to-be-processed image, to obtain the segmented image of the to-be-processed image.
According to a third aspect, an image saliency object detection apparatus is provided, including modules configured to perform the method in the first aspect or any of its implementations.
According to a fourth aspect, an image saliency object detection apparatus is provided, including a storage medium and a central processing unit, where the storage medium stores a computer-executable program, and the central processing unit is connected to the storage medium and executes the computer-executable program to implement the method in the first aspect or any of its implementations.
According to a fifth aspect, a computer-readable medium is provided, storing program code for execution by a device, the program code including instructions for performing the method in the first aspect or any of its implementations.
Brief Description of Drawings
FIG. 1 is a schematic diagram of the network architecture of an existing image saliency object detection method.
FIG. 2 is a schematic flowchart of an image saliency object detection method according to an embodiment of this application.
FIG. 3 is a schematic diagram of a convolutional neural network architecture according to an embodiment of this application.
FIG. 4 is a schematic diagram of processing the first feature maps.
FIG. 5 compares saliency maps obtained by an embodiment of this application with saliency maps obtained by other methods.
FIG. 6 is a schematic diagram of an image saliency object detection method according to an embodiment of this application.
FIG. 7 compares saliency maps obtained by an embodiment of this application with saliency maps obtained by other methods.
FIG. 8 is a schematic block diagram of an image saliency object detection apparatus according to an embodiment of this application.
FIG. 9 is a schematic block diagram of an image saliency object detection apparatus according to an embodiment of this application.
Description of Embodiments
The following describes the technical solutions in this application with reference to the accompanying drawings.
FIG. 2 is a schematic flowchart of an image saliency object detection method according to an embodiment of this application. The method of FIG. 2 may be performed within the network architecture of a convolutional neural network. The method 200 of FIG. 2 specifically includes:
210. Perform, on a to-be-processed image, convolution processing corresponding to at least two convolutional layers respectively, to obtain at least two first feature maps of the to-be-processed image, where the resolutions of the at least two first feature maps are smaller than the resolution of the to-be-processed image, and any two of the at least two first feature maps differ in resolution.
The convolution kernel size used when convolving the to-be-processed image may be 1, and the purpose of the convolution may be to extract from the to-be-processed image the feature maps needed for saliency segmentation; the extracted feature maps are then further processed to obtain the saliency map of the to-be-processed image.
The to-be-processed image may be the original image to be processed, or an image obtained by downsampling the original image. Downsampling the original image before obtaining its saliency map reduces the image resolution and the complexity of subsequent processing.
The resolution of a first feature map may be smaller than that of the to-be-processed image. For example, if the to-be-processed image has a resolution of 256×256, the first feature maps may have resolutions of 128×128, 64×64, 32×32, 16×16, 8×8, and so on.
In addition, when convolving the to-be-processed image, the convolution may be performed at different convolutional layers to obtain first feature maps with different resolutions. For example, for a 256×256 to-be-processed image, convolution at 4 convolutional layers yields 4 first feature maps with resolutions of 64×64, 32×32, 16×16, and 8×8.
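As a concrete illustration of step 210, the following is a minimal PyTorch sketch that produces four first feature maps at the stated resolutions from a 256×256 input. The channel widths, strides, and kernel sizes are illustrative assumptions; the patent fixes only the output resolutions (and FIG. 3 uses a ResNet-101 backbone for this stage).

```python
import torch
import torch.nn as nn

class FirstFeatureMaps(nn.Module):
    """Sketch of step 210: four first feature maps at 64x64, 32x32,
    16x16 and 8x8 from a 256x256 input (all widths/strides assumed)."""

    def __init__(self, channels=64):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, stride=4, padding=1)  # 256 -> 64
        self.stages = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, stride=2, padding=1)
             for _ in range(3)])                                    # 32, 16, 8
        # 1x1 convolutions, as mentioned in the text, extract the
        # saliency features used as the "first feature maps"
        self.reduce = nn.ModuleList(
            [nn.Conv2d(channels, channels, kernel_size=1) for _ in range(4)])

    def forward(self, x):
        feats = [self.stem(x)]
        for stage in self.stages:
            feats.append(stage(feats[-1]))
        return [r(f) for r, f in zip(self.reduce, feats)]
```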
220. Superimpose the at least two first feature maps contained in a superposition set among at least two sets, to obtain at least two second feature maps of the to-be-processed image, where the at least two sets correspond to different resolutions and are in one-to-one correspondence with the at least two second feature maps, and the resolution of each first feature map contained in the superposition set is less than or equal to the resolution of the second feature map corresponding to the superposition set.
It should be understood that a superposition set in step 220 may be a set, among the at least two sets, that contains at least two first feature maps. Besides superposition sets, the at least two sets may also contain other sets, for example a set containing only one first feature map; in that case the single first feature map is not superimposed but may be directly determined as the set's corresponding second feature map. It should also be understood that the resolution corresponding to each of the at least two sets may be the resolution of the second feature map obtained by superimposing the first feature maps in that set.
Specifically, for example, when superimposing the at least two first feature maps contained in a superposition set, a lower-resolution first feature map in the set can help locate the most salient region of a higher-resolution first feature map, and a higher-resolution first feature map can remedy the sparsity and irregularity of a lower-resolution one; the second feature map obtained by the superposition therefore displays the salient regions of the image well, and concatenating the at least two second feature maps obtained from the at least two sets yields an effective saliency map.
The following illustrates how the at least two second feature maps of the at least two sets are obtained with a concrete case. For example, convolving the to-be-processed image yields four first feature maps A, B, C, and D with resolutions 64×64, 32×32, 16×16, and 8×8 respectively, and set 1 through set 4 correspond to resolutions 64×64, 32×32, 16×16, and 8×8. Then set 1 contains A, B, C, and D; set 2 contains B, C, and D; set 3 contains C and D; and set 4 contains only D. Sets 1 to 3 each contain at least two first feature maps and may therefore be called superposition sets; set 4 contains only one first feature map and is not a superposition set. For set 1, superimposing A, B, C, and D yields set 1's second feature map; for set 2, superimposing B, C, and D yields set 2's second feature map; for set 3, superimposing C and D yields set 3's second feature map; and for set 4, since it contains only D, D may be directly determined as set 4's second feature map.
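The sets in this example can be written down directly; the following small Python snippet, using the names A–D from the text, shows which sets qualify as superposition sets.

```python
# The sets of the example above, keyed by the resolution each set
# corresponds to; A-D are the four first feature maps from the text.
sets = {
    64: ['A', 'B', 'C', 'D'],   # superposition set 1
    32: ['B', 'C', 'D'],        # superposition set 2
    16: ['C', 'D'],             # superposition set 3
    8:  ['D'],                  # set 4: one map, used directly as its
}                               # second feature map

# A superposition set is any set containing at least two first feature maps.
superposition_sets = {r: maps for r, maps in sets.items() if len(maps) >= 2}
assert 8 not in superposition_sets
```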
Optionally, superimposing the at least two first feature maps contained in a superposition set in step 220 specifically includes: upsampling those first feature maps in the superposition set whose resolution is smaller than that of the set's corresponding second feature map, to obtain at least two third feature maps with the same resolution as that second feature map, the at least two third feature maps being in one-to-one correspondence with the at least two first feature maps; and superimposing the at least two third feature maps to obtain the second feature map corresponding to the superposition set.
It should be understood that the resolutions of some first feature maps in the superposition set may be smaller than the resolution of the set's corresponding second feature map; upsampling the lower-resolution first feature maps keeps all first feature maps in the superposition set at the same resolution, guaranteeing the effect of the superposition.
Optionally, superimposing the at least two third feature maps to obtain the superposition set's second feature map specifically includes: superimposing the at least two third feature maps according to their respective weights, to obtain the second feature map.
It should be understood that superimposing the at least two third feature maps according to their respective weights may specifically mean: multiplying the weight corresponding to each third feature map by the pixel values of that third feature map, summing the products, and using the sum as the pixel values of the second feature map, thereby obtaining the second feature map.
For example, after the three first feature maps contained in a superposition set are processed, three third feature maps X, Y, and Z are obtained, with weights of 30%, 30%, and 40% respectively. When superimposing X, Y, and Z, 30% of X's pixel values, 30% of Y's, and 40% of Z's may be summed, and the sum used as the pixel values of the superimposed second feature map W.
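The weighted superposition in this example is a plain pixel-wise weighted sum; a minimal NumPy sketch (with random stand-in feature maps) follows.

```python
import numpy as np

rng = np.random.default_rng(0)
X, Y, Z = (rng.random((64, 64)) for _ in range(3))  # three third feature maps

# Pixel-wise weighted sum with the 30% / 30% / 40% weights from the example.
W = 0.3 * X + 0.3 * Y + 0.4 * Z   # the superposed second feature map
```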
In addition, the weight of each third feature map may be trained based on the difference between a saliency map of a training image and a reference saliency map corresponding to the training image.
The procedure for obtaining the saliency map of a training image may be the same as that for the to-be-processed image. Therefore, before the to-be-processed image is processed, the saliency map of the training image may first be obtained according to the procedure of the method in the first aspect above, and the weight of each third feature map may then be trained based on the difference between the training image's saliency map and its reference saliency map, thereby obtaining each third feature map's weight.
230. Concatenate the at least two second feature maps to obtain the saliency map of the to-be-processed image.
In this application, after the at least two first feature maps are obtained through convolution, they are not directly superimposed into the final saliency map as in the prior art. Instead, at least two sets are first determined by resolution, the feature maps contained in each superposition set are superimposed, and the second feature maps obtained from the sets are then concatenated to obtain the saliency map of the to-be-processed image. Because the characteristics of feature maps at different resolutions are fully considered during superposition and concatenation, a better saliency map is obtained.
Specifically, for example, when superimposing the at least two first feature maps contained in a superposition set, a lower-resolution first feature map in the set can help locate the most salient region of a higher-resolution one, and a higher-resolution first feature map can remedy the sparsity and irregularity of a lower-resolution one; the resulting second feature map therefore displays the salient regions of the image well, and concatenating the at least two second feature maps obtained from the at least two sets yields an effective saliency map.
Optionally, in an embodiment, concatenating the at least two second feature maps in step 230 to obtain the saliency map of the to-be-processed image specifically includes: concatenating the at least two second feature maps according to their respective weights, to obtain the saliency map of the to-be-processed image.
It should be understood that concatenating the at least two second feature maps according to their respective weights may mean multiplying the weight corresponding to each second feature map by that map's pixel values, summing the products, and using the summed pixel values as the pixel values of the saliency map of the to-be-processed image, thereby obtaining the saliency map.
It should be understood that the weight of each of the at least two second feature maps may be determined based on the difference between a saliency map of a training image and a reference saliency map corresponding to the training image.
In addition, the procedure for obtaining the saliency map of a training image may be the same as that for the to-be-processed image. Therefore, before the to-be-processed image is processed, the saliency map of the training image may first be obtained according to the procedure of the method in the first aspect above, and the weight of each second feature map may then be trained based on the difference between the training image's saliency map and its reference saliency map, thereby obtaining each second feature map's weight.
It should be understood that the reference saliency map corresponding to a training image may be a manually annotated saliency map, or a machine-recognized saliency map of good quality. The difference between the training image's saliency map and its reference saliency map may be expressed by the value of a loss function. During weight training, the loss value may be back-propagated through the convolutional neural network and the weights adjusted; specifically, the weights may be adjusted in the direction that decreases the loss value, until a global optimum is reached (the final adjustment result may be that the loss value is minimized, or that the loss value is below a certain threshold).
Specifically, in the method 200 above there are 4 sets in total, and these 4 sets may correspond to paths 1 to 4 shown in FIG. 3. The output loss of the m-th path when processing the first feature maps corresponding to that path may be denoted

$$\ell_{side}^{(m)}, \quad m = 1, \dots, 4$$

(in the original publication this per-path loss is given by an equation rendered as an image, PCTCN2018092514-appb-000001, whose exact form is not recoverable here). The output loss over all paths is then

$$L_{side} = \sum_{m=1}^{4} \alpha_m \, \ell_{side}^{(m)},$$

where α_m is the weight of the m-th path's output loss (the second equation image, PCTCN2018092514-appb-000002, is restated here from this description). Denoting by L_fuse the loss function used when the path fusion module processes the 4 second feature maps output by the 4 paths, the final loss function for processing the to-be-processed image is L_final = L_fuse + L_side. This final loss function indicates the difference between the saliency map of the training image and the reference saliency map corresponding to the training image.
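For illustration, here is a training-loss sketch under the assumption that the per-path and fusion losses are pixel-wise binary cross-entropies; the patent leaves their exact form to equations not reproduced here.

```python
import torch.nn.functional as F

def final_loss(path_preds, fused_pred, target, alphas):
    """L_final = L_fuse + sum_m alpha_m * l_side^(m).

    `path_preds` are the 4 path outputs and `fused_pred` the fusion-module
    output, all assumed to be sigmoid probabilities at the resolution of
    the reference saliency map `target`. Binary cross-entropy is an
    assumed choice for the per-path and fusion losses."""
    l_side = sum(a * F.binary_cross_entropy(p, target)
                 for a, p in zip(alphas, path_preds))
    l_fuse = F.binary_cross_entropy(fused_pred, target)
    return l_fuse + l_side
```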
Optionally, in an embodiment, superimposing the at least two first feature maps contained in a superposition set among the at least two sets in step 220 specifically includes: superimposing the at least two first feature maps contained in the superposition set; convolving the at least two feature maps obtained by the superposition to obtain at least two convolved feature maps, where the convolution is used to extract the features of the superimposed feature maps; and pooling the at least two convolved feature maps to obtain the at least two second feature maps in step 220.
It should be understood that the convolution kernel size used when convolving the feature maps obtained after the superposition may be 1; convolving them integrates and collects the at least two feature maps and highlights the high-value features in them.
Convolving and pooling the at least two feature maps obtained after the superposition further extracts the features of the superimposed images; using the extracted features as the second feature maps reduces the computation of subsequent processing, and feature extraction also discards low-value features, improving the effect of the final saliency map.
Optionally, in an embodiment, concatenating the at least two second feature maps of the to-be-processed image in step 230 specifically includes: convolving the at least two second feature maps to obtain their features; and concatenating the features of the at least two second feature maps to obtain the saliency map of the to-be-processed image.
The convolution kernel size used when convolving the at least two second feature maps may be 1; the convolution further extracts the features of the second feature maps and makes local features in the processed image more distinguishable, thereby yielding a better saliency detection effect.
Convolving the images before concatenation further extracts image features and uses the extracted feature maps as the basis for the subsequent concatenation, which reduces the complexity of that concatenation; feature extraction also discards low-value features, improving the effect of the final saliency map.
It should be understood that, in steps 220 and 230, superimposing the first feature maps contained in the superposition sets among the at least two sets is equivalent to superimposing the first feature maps along different paths, and concatenating the at least two second feature maps is equivalent to concatenating the second feature maps obtained from at least two different paths. For example, as shown in FIG. 3, the convolutional neural network architecture includes 4 levels (each level corresponding to one convolutional layer), 4 paths, and one path fusion module. Levels 1 to 4 each convolve the to-be-processed image (whose resolution in FIG. 3 is 256×256) to obtain four first feature maps with resolutions 64×64, 32×32, 16×16, and 8×8. The first feature maps in each path are then processed along paths 1 to 4 as follows: along path 1, the four first feature maps with resolutions 64×64, 32×32, 16×16, and 8×8 are superimposed into one second feature map; along path 2, the three first feature maps with resolutions 32×32, 16×16, and 8×8 are superimposed into one second feature map; along path 3, the two first feature maps with resolutions 16×16 and 8×8 are superimposed into one second feature map; along path 4, the 8×8 first feature map is processed into one second feature map (in path 4 the 8×8 first feature map may be directly determined as path 4's second feature map). Four second feature maps are thus obtained along the four paths. The path fusion module then concatenates the second feature maps of paths 1 to 4 to obtain the saliency map of the to-be-processed image. It should further be understood that, when the first feature maps are processed along each of paths 1 to 4, each first feature map in a path has a corresponding weight, and when the path fusion module concatenates the second feature maps of paths 1 to 4, the second feature map obtained from each path also has its own weight. These weights may be trained according to the loss value: specifically, the loss value may be back-propagated along the architecture of FIG. 3 and the weights adjusted in the direction that decreases the loss, until a global optimum is reached (the final adjustment result may be that the loss value is minimized, or below a certain threshold).
In addition, in the architecture shown in FIG. 3, after the at least two first feature maps are superimposed along a path, the path's second feature map is not obtained directly; instead, the feature map obtained by the superposition is further convolved and pooled, and the path's corresponding second feature map is then obtained. Similarly, before concatenating the second feature maps of paths 1 to 4, the path fusion module may first convolve them and then concatenate.
It should be understood that, if a superposition set contains four first feature maps with resolutions 64×64, 32×32, 16×16, and 8×8, processing those four first feature maps is equivalent to processing them along path 1 in FIG. 3. The following describes in detail, with reference to FIG. 4, how the four first feature maps are processed along path 1.
As shown in FIG. 4, the first feature maps of levels 1 to 4 are obtained (with resolutions 64×64, 32×32, 16×16, and 8×8 respectively). The first feature maps obtained from levels 2 to 4 are upsampled to a resolution of 64×64 (because the level-1 first feature map is already 64×64, it is not upsampled; instead, it may be sampled normally to obtain its third feature map), finally yielding four third feature maps all at 64×64. Next, the four 64×64 third feature maps are superimposed into one fourth feature map. Finally, the fourth feature map is convolved and pooled; in addition, an activation function, for example a rectified linear unit (ReLU), may be used to fine-tune the convolved and pooled image, finally yielding path 1's corresponding second feature map.
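The per-path processing of FIG. 4 can be sketched as follows in PyTorch; the channel count, the 3×3 kernel, and the pooling window are illustrative assumptions, while the upsample–sum–convolve–pool–ReLU order follows the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PathModule(nn.Module):
    """Per-path processing of FIG. 4: upsample each level's first feature
    map to the path resolution, sum them, then convolve, pool and
    fine-tune with ReLU (channels, kernel and pooling window assumed)."""

    def __init__(self, channels=64, out_size=64):
        super().__init__()
        self.out_size = out_size
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, level_maps):
        third = [F.interpolate(m, size=(self.out_size, self.out_size),
                               mode='bilinear', align_corners=False)
                 for m in level_maps]                   # third feature maps
        fourth = torch.stack(third, dim=0).sum(dim=0)   # superposition
        pooled = F.max_pool2d(self.conv(fourth), 3, stride=1, padding=1)
        return F.relu(pooled)                           # path's second map
```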
Optionally, convolving and concatenating the at least two second feature maps of the to-be-processed image to obtain its saliency map specifically includes: upsampling the at least two second feature maps to obtain at least two feature maps of the same resolution; and convolving and concatenating these same-resolution feature maps to obtain the saliency map of the to-be-processed image.
Table 1 compares the detection data of the image saliency object detection method of the embodiments of this application with other methods, where "ours" denotes the detection data of this application's method, and RC[7], CHM[29], DSR[30], DRFI[22], MC[49], ELD[12], MDF[27], DS[12], RFCN[45], DHS[34], and DCL[28] correspond to the detection data of the other methods. A higher F_β value and a lower mean squared error (MAE) value indicate better algorithm performance. The data in Table 1 show that the F_β values of the method of the embodiments of this application are essentially all greater than those of the other methods, and its MAE values essentially all smaller than those of the other methods; the method of the embodiments of this application therefore achieves a better effect.
Table 1
[Table 1 is rendered as an image (PCTCN2018092514-appb-000003) in the original publication; the per-method F_β and MAE values are not recoverable here.]
In addition, FIG. 5 compares the saliency maps obtained by the image saliency object detection method of the embodiments of this application with those obtained by other methods on the original images, where DCL, DHS, RFCN, DS, MDF, ELD, MC, DRFI, and DSR correspond to the saliency maps obtained by the other methods. As FIG. 5 shows, compared with the other methods, the saliency maps obtained by this application's method are closer to the ground-truth saliency maps (which may be manually annotated); the saliency maps obtained by this application's method therefore have a better effect.
After the saliency map of the to-be-processed image is obtained, a segmented image of the to-be-processed image may further be obtained by combining the to-be-processed image and its saliency map. Specifically, guided filtering may be performed on the saliency map of the to-be-processed image based on the to-be-processed image, to obtain the segmented image of the to-be-processed image. It should be understood that the segmented image of the to-be-processed image may also be regarded as a kind of saliency distribution map.
Guided filtering further refines the segmentation edges in the saliency map of the to-be-processed image, yielding a better segmented image.
In addition, if the saliency map of the to-be-processed image is a first saliency map whose resolution is smaller than that of the to-be-processed image, the first saliency map may first be upsampled to obtain a second saliency map with the same resolution as the to-be-processed image, and guided filtering may then be performed on the second saliency map based on the to-be-processed image, to obtain the segmented image of the to-be-processed image.
For example, if the saliency map of the to-be-processed image has a resolution of 64×64 and the to-be-processed image 256×256, the saliency map's resolution may first be adjusted to 256×256, and guided filtering then performed on it based on the to-be-processed image, to obtain the segmented image of the to-be-processed image.
FIG. 6 is a schematic diagram of an image saliency object detection method according to an embodiment of this application. The method 300 specifically includes:
310. Obtain an original image.
The original image here may be a photo containing a portrait, for example a selfie taken with a mobile phone.
320. Downsample the original image, and output a low-resolution saliency map through the trained convolutional neural network model.
For example, if the original image has a resolution of 256×256, it may first be downsampled to obtain a 128×128 image.
It should be understood that the convolutional neural network model here may be as shown in FIG. 3, and that when training it, different data sets may be used for different scenarios; for example, a portrait segmentation data set may be used to train the model for portrait segmentation, and a vehicle segmentation data set for vehicle segmentation.
330. Upsample the saliency map obtained in step 320 to obtain a saliency map of the same size as the original image.
For example, if the original image is 256×256 and the saliency map from step 320 is 128×128, upsampling may raise the saliency map's resolution from 128×128 to 256×256.
340. Perform guided filtering on the saliency map finally obtained in step 330, based on the original image, to obtain a segmented image of the to-be-processed image.
It should be understood that guided filtering optimizes the image edges of the saliency map obtained in step 330, yielding a better segmented image.
Through steps 310 to 340, a segmented image of the original image is obtained. Based on this segmented image, the original image may then be processed with portrait beautification, a large-aperture effect, and the like, to beautify the original image and improve its display effect.
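Steps 310 to 340 amount to the following inference pipeline, sketched here with a hypothetical trained `model` and a `guided_filter` function assumed to be defined elsewhere (one possible guided-filter implementation is given under Example 1 below).

```python
import torch.nn.functional as F

def detect_and_segment(original, model, guided_filter):
    """Inference pipeline of method 300 (steps 310-340).

    `model` is the trained saliency network and `guided_filter` a
    function (saliency, guide) -> filtered map; both are assumptions
    standing in for components described in the surrounding text."""
    low = F.interpolate(original, scale_factor=0.5, mode='bilinear',
                        align_corners=False)       # 320: downsample (e.g. 2x)
    sal_low = model(low)                            # 320: low-res saliency map
    sal = F.interpolate(sal_low, size=original.shape[-2:], mode='bilinear',
                        align_corners=False)        # 330: back to input size
    return guided_filter(sal, original)             # 340: guided filtering
```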
It should be understood that the image saliency object detection method of the embodiments of this application can segment objects in images in a variety of scenarios, for example important targets such as people, vehicles, and animals in an image. The following uses Example 1 and Example 2 to describe in detail the application of the method to the two common scenarios of portrait segmentation and vehicle segmentation.
Example 1: portrait segmentation
410. Train a convolutional neural network model with a portrait segmentation data set.
The convolutional neural network model may be as shown in FIG. 3.
The portrait segmentation data set includes portrait pictures (pictures containing portraits) and the ground-truth saliency distribution maps corresponding to them. In addition, to improve the training effect, the pictures may also be mirrored, rotated, subjected to lighting changes, and processed in other ways, to avoid overfitting when training the convolutional neural network.
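A minimal sketch of the augmentations mentioned above (mirroring, rotation, lighting changes); the parameter ranges are illustrative assumptions.

```python
import numpy as np

def augment(img, mask, rng):
    """Mirroring / rotation / lighting jitter for an (image, ground truth)
    pair; assumes a uint8 HxWxC image and an HxW mask."""
    if rng.random() < 0.5:                        # horizontal mirror
        img, mask = img[:, ::-1], mask[:, ::-1]
    k = int(rng.integers(0, 4))                   # rotate by 0/90/180/270 deg
    img, mask = np.rot90(img, k), np.rot90(mask, k)
    gain = rng.uniform(0.8, 1.2)                  # simple lighting change
    img = np.clip(img.astype(np.float32) * gain, 0, 255).astype(np.uint8)
    return img, mask
```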
420. For an input portrait picture I_h, first downsample it to obtain a low-resolution picture I_l; then process the downsampled low-resolution picture through the trained convolutional neural network, finally outputting a low-resolution portrait segmentation image S_l.
Downsampling the input portrait picture first reduces the image resolution and the complexity of subsequent processing.
430. Upsample the portrait segmentation image S_l to obtain a picture S_h of the same size as the original portrait picture.
440. Perform guided filtering on the picture S_h based on the portrait picture I_h, to obtain the final portrait segmentation image.
Specifically, assume the guided filtering function above is f(·); the filtered output image is then

$$S = f(S_h, I_h;\, r, \text{eps}),$$

where r is the filter radius and eps is the smoothing parameter (the equation is rendered as an image, PCTCN2018092514-appb-000004, in the original publication; the form above restates it from the surrounding text, with I_h acting as the guide image). Guided filtering further refines the portrait segmentation edges, making the edges of the portrait segmentation image clearer.
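For reference, here is a self-contained single-channel guided filter in NumPy, using the classic box-filter formulation (He et al.); the default r and eps values are assumptions, since the text does not fix them.

```python
import numpy as np

def _box(x, r):
    """Mean over a (2r+1)x(2r+1) window, computed via an integral image."""
    h, w = x.shape
    c = np.cumsum(np.cumsum(np.pad(x, ((1, 0), (1, 0))), axis=0), axis=1)
    y0 = np.clip(np.arange(h) - r, 0, h); y1 = np.clip(np.arange(h) + r + 1, 0, h)
    x0 = np.clip(np.arange(w) - r, 0, w); x1 = np.clip(np.arange(w) + r + 1, 0, w)
    s = c[y1][:, x1] - c[y1][:, x0] - c[y0][:, x1] + c[y0][:, x0]
    return s / ((y1 - y0)[:, None] * (x1 - x0)[None, :])

def guided_filter(p, I, r=8, eps=1e-3):
    """f(.): filter the upsampled saliency map p = S_h using the grayscale
    guide I = I_h, with radius r and smoothing parameter eps."""
    mI, mp = _box(I, r), _box(p, r)
    a = (_box(I * p, r) - mI * mp) / (_box(I * I, r) - mI * mI + eps)
    b = mp - a * mI
    return _box(a, r) * I + _box(b, r)
```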
When segmenting portraits, the prior art cannot accurately fit the portrait edges, and false or missed detections occur in local regions of the image; the method of this application can accurately locate the portrait in complex scenes while fitting the portrait edges fairly accurately, achieving a better segmentation effect.
Specifically, downsampling the input picture first yields a lower-resolution image, on which basic portrait segmentation is then performed. In addition, the method of this application automatically detects the portrait in an image and segments it without manual interaction.
FIG. 7 shows the image segmentation results of this application's method and of the prior art. As FIG. 7 shows, compared with existing methods, the image saliency detection method of the embodiments of this application accurately distinguishes the object regions with salient features in an image, giving a better saliency analysis effect.
Example 2: vehicle segmentation
510. Train a convolutional neural network model with a vehicle segmentation data set.
520. For an input road scene picture I_h, first downsample it to obtain a low-resolution picture I_l; then process the downsampled low-resolution picture through the trained convolutional neural network, finally outputting a low-resolution vehicle segmentation image S_l.
530. Upsample the vehicle segmentation image S_l to obtain a picture S_h of the same size as the original road scene picture.
540. Perform guided filtering on the picture S_h based on the road scene picture I_h, to obtain the final vehicle segmentation image.
It should be understood that these are only two scenarios to which the image saliency object detection method of the embodiments of this application specifically applies; in essence, the method can also be applied to other scenarios. As long as the convolutional neural network is trained with the training data of that scenario, and the to-be-processed image is then processed accordingly, good results can likewise be achieved.
The method of this application performs basic vehicle segmentation on a lower-resolution image and maintains semantic accuracy in complex and variable background environments; finally, guided filtering on the high-resolution image guarantees the level of edge detail. Vehicles in an image can be detected and segmented automatically without manual interaction, which can assist automated driving in decision-making. Compared with other existing methods, the invention effectively segments vehicle edges and improves the ability to judge vehicle pose, vehicle distance, and the like.
The following describes the image saliency object detection method of the embodiments of this application in detail with reference to FIG. 3.
The model shown in FIG. 3 is a network model based on the ResNet-101 architecture. The network model has 4 levels in total, 4 paths (a path here is equivalent to a set above), and one multipath fusion module, where levels 1 to 4 correspond to resolutions of 64×64, 32×32, 16×16, and 8×8 respectively. Each of the 4 paths receives the feature maps of at least one of the 4 levels as input: path 1 receives the feature maps of 4 levels (levels 1 to 4), path 2 of 3 levels (levels 2 to 4), path 3 of 2 levels (levels 3 to 4), and path 4 of 1 level (level 4).
Assuming the to-be-processed image has a resolution of 256×256, the specific operations of the levels, the paths, and the path fusion module in the network model of FIG. 3 are described in detail below.
610. Levels 1 to 4 obtain feature maps of the corresponding resolutions from the to-be-processed image.
Specifically, levels 1 to 4 obtain feature maps with resolutions of 64×64, 32×32, 16×16, and 8×8 respectively from the to-be-processed image.
620. Paths 1 to 4 each fuse the feature maps of at least one level.
Specifically, taking path 1 as an example: path 1 receives the feature maps of levels 1 to 4 and upsamples them to obtain 4 images of the same resolution; these 4 same-resolution images are then summed to obtain a summed feature map; the summed feature map is then convolved and pooled, and a rectified-linear activation function is used to fine-tune the convolved and pooled feature map, finally yielding path 1's feature map.
630. The multipath fusion module fuses the feature maps of paths 1 to 4.
Specifically, the multipath fusion module upsamples the feature maps of paths 1 to 4 to obtain 4 feature maps with a resolution of 64×64; next, convolution and concatenation operations are performed on these 4 feature maps; finally, the feature map obtained by the convolution and concatenation is upsampled to the size of the to-be-processed image (resolution 128×128), yielding the saliency map of the to-be-processed image.
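A minimal PyTorch sketch of the multipath fusion module in step 630 follows; the channel counts, kernel sizes, and the exact convolve/concatenate ordering are illustrative assumptions, while the upsample-to-64×64, convolve-and-concatenate, and final upsample-to-128×128 steps follow the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultipathFusion(nn.Module):
    """Multipath fusion module of step 630: upsample the four path feature
    maps to 64x64, convolve and concatenate them, then upsample the result
    to the (128x128) output size."""

    def __init__(self, channels=64):
        super().__init__()
        self.per_path = nn.ModuleList(
            [nn.Conv2d(channels, channels, kernel_size=1) for _ in range(4)])
        self.fuse = nn.Conv2d(4 * channels, 1, 3, padding=1)

    def forward(self, path_maps):
        ups = [F.interpolate(conv(m), size=(64, 64), mode='bilinear',
                             align_corners=False)
               for conv, m in zip(self.per_path, path_maps)]
        fused = self.fuse(torch.cat(ups, dim=1))
        return torch.sigmoid(F.interpolate(fused, size=(128, 128),
                             mode='bilinear', align_corners=False))
```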
It should be understood that the network architecture shown in FIG. 3 is only one possible architecture for the method of the embodiments of this application. In fact, several improvements and replacements may also be made on the basis of the network architecture shown in FIG. 3, for example changing the number of convolutional layers and paths, or changing the correspondence between paths and convolutional layers; the network architectures obtained by these improvements and replacements all fall within the protection scope of this application.
The image saliency object detection method of the embodiments of this application can be applied to segmenting important targets in a picture. For example, in portrait mode, the portrait in a picture is segmented from the other background objects, and the portrait and background are processed differently (for example, skin beautification for the portrait; blurring the background, enhancing the background colors, darkening the four corners of the background, and so on), finally achieving the artistic effect of highlighting and beautifying the portrait. Specifically, the method of the embodiments of this application can be applied to selfies in portrait mode and to the large-aperture effect. The method of the embodiments of this application can also be applied to portrait stylization, portrait beautification, and portrait background editing and synthesis (for example, generating ID photos or compositing group photos at scenic spots). Specifically, after saliency analysis of an original picture yields its saliency map, the portrait in the original picture may be stylized or beautified according to the saliency map obtained by the analysis, or the background in the original picture may be replaced.
Optionally, the method of the embodiments of this application may also be applied to segmentation of objects of interest in an image, object recognition, and the like.
The image saliency object detection method of the embodiments of this application has been described in detail above with reference to FIG. 2 to FIG. 7; the image saliency object detection apparatus of the embodiments of this application is described below with reference to FIG. 8 and FIG. 9. It should be understood that the apparatuses in FIG. 8 and FIG. 9 can perform the corresponding steps of the image saliency object detection method above; for brevity, repeated descriptions are appropriately omitted below.
FIG. 8 is a schematic block diagram of an image saliency object detection apparatus according to an embodiment of this application. The apparatus 800 of FIG. 8 includes:
a convolution module 810, configured to perform, on a to-be-processed image, convolution processing corresponding to at least two convolutional layers respectively, to obtain at least two first feature maps of the to-be-processed image, where the resolutions of the at least two first feature maps are smaller than the resolution of the to-be-processed image, and any two of the at least two first feature maps differ in resolution;
a superposition module 820, configured to superimpose at least two first feature maps contained in a superposition set among at least two sets, to obtain at least two second feature maps of the to-be-processed image, where the at least two sets correspond to different resolutions and are in one-to-one correspondence with the at least two second feature maps, and the resolution of each first feature map contained in the superposition set is less than or equal to the resolution of the second feature map corresponding to the superposition set; and
a concatenation module 830, configured to concatenate the at least two second feature maps to obtain a saliency map of the to-be-processed image.
Optionally, in an embodiment, the superposition module 820 is specifically configured to: upsample those first feature maps in the superposition set whose resolution is smaller than the resolution of the second feature map corresponding to the superposition set, to obtain at least two third feature maps with the same resolution as that second feature map, the at least two third feature maps being in one-to-one correspondence with the at least two first feature maps; and superimpose the at least two third feature maps to obtain the second feature map corresponding to the superposition set.
Optionally, in an embodiment, the superposition module 820 is specifically configured to: superimpose the at least two third feature maps according to their respective weights, to obtain the second feature map.
Optionally, in an embodiment, the weight of each third feature map is trained based on the difference between a saliency map of a training image and a reference saliency map corresponding to the training image.
Optionally, in an embodiment, the concatenation module 830 is specifically configured to: concatenate the at least two second feature maps according to their respective weights, to obtain the saliency map of the to-be-processed image.
Optionally, in an embodiment, the weight of each of the at least two second feature maps is determined based on the difference between a saliency map of a training image and a reference saliency map corresponding to the training image.
Optionally, in an embodiment, the superposition module 820 is specifically configured to: superimpose the at least two first feature maps contained in the superposition set among the at least two sets; convolve the at least two feature maps obtained by the superposition to obtain at least two convolved feature maps, the convolution being used to extract the features of the superimposed feature maps; and pool the at least two convolved feature maps to obtain the at least two second feature maps.
Optionally, in an embodiment, the concatenation module 830 is specifically configured to: convolve the at least two second feature maps to obtain their features; and concatenate the features of the at least two second feature maps to obtain the saliency map of the to-be-processed image.
Optionally, in an embodiment, the apparatus 800 further includes: a filtering module 840, configured to perform guided filtering on the saliency map of the to-be-processed image based on the to-be-processed image, to obtain a segmented image of the to-be-processed image.
Optionally, in an embodiment, the saliency map is a first saliency map whose resolution is smaller than that of the to-be-processed image, and the filtering module 840 is specifically configured to: upsample the first saliency map to obtain a second saliency map with the same resolution as the to-be-processed image; and perform guided filtering on the second saliency map based on the to-be-processed image, to obtain the segmented image of the to-be-processed image.
FIG. 9 is a schematic block diagram of an image saliency object detection apparatus according to an embodiment of this application. The apparatus 900 of FIG. 9 includes:
a memory 910, configured to store a program; and
a processor 920, configured to execute the program stored in the memory 910. When the program in the memory 910 is executed, the processor 920 is specifically configured to: perform, on a to-be-processed image, convolution processing corresponding to at least two convolutional layers respectively, to obtain at least two first feature maps of the to-be-processed image, where the resolutions of the at least two first feature maps are smaller than the resolution of the to-be-processed image, and any two of the at least two first feature maps differ in resolution; superimpose at least two first feature maps contained in a superposition set among at least two sets, to obtain at least two second feature maps of the to-be-processed image, where the at least two sets correspond to different resolutions and are in one-to-one correspondence with the at least two second feature maps, and the resolution of each first feature map contained in the superposition set is less than or equal to the resolution of the second feature map corresponding to the superposition set; and concatenate the at least two second feature maps to obtain a saliency map of the to-be-processed image.
Optionally, in an embodiment, the processor 920 is specifically configured to: upsample those first feature maps in the superposition set whose resolution is smaller than the resolution of the second feature map corresponding to the superposition set, to obtain at least two third feature maps with the same resolution as that second feature map, the at least two third feature maps being in one-to-one correspondence with the at least two first feature maps; and superimpose the at least two third feature maps to obtain the second feature map corresponding to the superposition set.
Optionally, in an embodiment, the processor 920 is specifically configured to: superimpose the at least two third feature maps according to their respective weights, to obtain the second feature map.
Optionally, in an embodiment, the weight of each third feature map is trained based on the difference between a saliency map of a training image and a reference saliency map corresponding to the training image.
Optionally, in an embodiment, the processor 920 is specifically configured to: concatenate the at least two second feature maps according to their respective weights, to obtain the saliency map of the to-be-processed image.
Optionally, in an embodiment, the weight of each of the at least two second feature maps is determined based on the difference between a saliency map of a training image and a reference saliency map corresponding to the training image.
Optionally, in an embodiment, the processor 920 is specifically configured to: superimpose the at least two first feature maps contained in the superposition set among the at least two sets; convolve the at least two feature maps obtained by the superposition to obtain at least two convolved feature maps, the convolution being used to extract the features of the superimposed feature maps; and pool the at least two convolved feature maps to obtain the at least two second feature maps.
Optionally, in an embodiment, the processor 920 is specifically configured to: convolve the at least two second feature maps to obtain their features; and concatenate the features of the at least two second feature maps to obtain the saliency map of the to-be-processed image.
Optionally, in an embodiment, the processor 920 is further configured to: perform guided filtering on the saliency map of the to-be-processed image based on the to-be-processed image, to obtain a segmented image of the to-be-processed image.
Optionally, in an embodiment, the saliency map is a first saliency map whose resolution is smaller than that of the to-be-processed image, and the processor 920 is specifically configured to: upsample the first saliency map to obtain a second saliency map with the same resolution as the to-be-processed image; and perform guided filtering on the second saliency map based on the to-be-processed image, to obtain the segmented image of the to-be-processed image.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of this application.
A person skilled in the art may clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; for example, the division into units is merely a logical function division, and there may be other division manners in actual implementation — for example, at least two units or components may be combined or integrated into another system, or some features may be ignored or not performed. Furthermore, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over at least two network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (40)

  1. An image saliency object detection method, comprising:
    performing, on a to-be-processed image, convolution processing corresponding to at least two convolutional layers respectively, to obtain at least two first feature maps of the to-be-processed image, wherein resolutions of the at least two first feature maps are smaller than a resolution of the to-be-processed image, and any two of the at least two first feature maps differ in resolution;
    processing the at least two first feature maps to obtain at least two second feature maps of the to-be-processed image, wherein at least one of the at least two second feature maps is obtained by superimposing a plurality of first feature maps among the at least two first feature maps, any two of the at least two second feature maps differ in resolution, and a resolution of the at least one second feature map is greater than or equal to the largest resolution among the plurality of first feature maps used to obtain the at least one second feature map; and
    concatenating the at least two second feature maps to obtain a saliency map of the to-be-processed image.
  2. The method according to claim 1, wherein superimposing the plurality of first feature maps among the at least two first feature maps comprises:
    upsampling those of the plurality of first feature maps whose resolution is smaller than the resolution of the at least one second feature map to be obtained, to obtain third feature maps corresponding to the first feature maps, wherein a resolution of each third feature map equals the resolution of the at least one second feature map to be obtained; and
    superimposing the upsampled third feature maps with the non-upsampled first feature maps among the plurality of first feature maps, to obtain the at least one second feature map.
  3. The method according to claim 2, wherein superimposing the upsampled third feature maps with the non-upsampled first feature maps among the plurality of first feature maps to obtain the at least one second feature map comprises:
    superimposing the upsampled third feature maps with the non-upsampled first feature maps according to a weight corresponding to each third feature map or first feature map, to obtain the at least one second feature map.
  4. The method according to claim 3, wherein the weights are trained based on a difference between a saliency map of a training image and a reference saliency map corresponding to the training image.
  5. The method according to any one of claims 1 to 4, wherein concatenating the at least two second feature maps of the to-be-processed image to obtain the saliency map of the to-be-processed image comprises:
    concatenating the at least two second feature maps according to their respective weights, to obtain the saliency map of the to-be-processed image.
  6. The method according to claim 5, wherein the weight of each of the at least two second feature maps is determined based on a difference between a saliency map of a training image and a reference saliency map corresponding to the training image.
  7. The method according to any one of claims 2 to 4, wherein superimposing the upsampled third feature maps with the non-upsampled first feature maps to obtain the at least one second feature map comprises:
    superimposing, convolving, and pooling the upsampled third feature maps and the non-upsampled first feature maps to obtain the at least one second feature map.
  8. The method according to any one of claims 1 to 7, wherein concatenating the at least two second feature maps to obtain the saliency map of the to-be-processed image comprises:
    performing convolution processing on the at least two second feature maps to obtain features of the at least two second feature maps; and
    concatenating the features of the at least two second feature maps to obtain the saliency map of the to-be-processed image.
  9. The method according to any one of claims 1 to 8, further comprising:
    performing guided filtering on the saliency map of the to-be-processed image based on the to-be-processed image, to obtain a segmented image of the to-be-processed image.
  10. The method according to claim 9, wherein the saliency map is a first saliency map, a resolution of the first saliency map is smaller than the resolution of the to-be-processed image, and performing guided filtering on the saliency map of the to-be-processed image based on the to-be-processed image to obtain the segmented image of the to-be-processed image comprises:
    upsampling the first saliency map to obtain a second saliency map with the same resolution as the to-be-processed image; and
    performing guided filtering on the second saliency map based on the to-be-processed image, to obtain the segmented image of the to-be-processed image.
  11. An image saliency object detection apparatus, comprising:
    a convolution module, configured to perform, on a to-be-processed image, convolution processing corresponding to at least two convolutional layers respectively, to obtain at least two first feature maps of the to-be-processed image, wherein resolutions of the at least two first feature maps are smaller than a resolution of the to-be-processed image, and any two of the at least two first feature maps differ in resolution;
    a superposition module, configured to process the at least two first feature maps to obtain at least two second feature maps of the to-be-processed image, wherein at least one of the at least two second feature maps is obtained by superimposing a plurality of first feature maps among the at least two first feature maps, any two of the at least two second feature maps differ in resolution, and a resolution of the at least one second feature map is greater than or equal to the largest resolution among the plurality of first feature maps used to obtain the at least one second feature map; and
    a concatenation module, configured to concatenate the at least two second feature maps to obtain a saliency map of the to-be-processed image.
  12. The apparatus according to claim 11, wherein the superposition module is specifically configured to:
    upsample those of the plurality of first feature maps whose resolution is smaller than the resolution of the at least one second feature map to be obtained, to obtain third feature maps corresponding to the first feature maps, wherein a resolution of each third feature map equals the resolution of the at least one second feature map to be obtained; and
    superimpose the upsampled third feature maps with the non-upsampled first feature maps among the plurality of first feature maps, to obtain the at least one second feature map.
  13. The apparatus according to claim 12, wherein the superposition module is specifically configured to:
    superimpose the upsampled third feature maps with the non-upsampled first feature maps according to a weight corresponding to each third feature map or first feature map, to obtain the at least one second feature map.
  14. The apparatus according to claim 13, wherein the weights are trained based on a difference between a saliency map of a training image and a reference saliency map corresponding to the training image.
  15. The apparatus according to any one of claims 11 to 14, wherein the concatenation module is specifically configured to:
    concatenate the at least two second feature maps according to their respective weights, to obtain the saliency map of the to-be-processed image.
  16. The apparatus according to claim 15, wherein the weight of each of the at least two second feature maps is determined based on a difference between a saliency map of a training image and a reference saliency map corresponding to the training image.
  17. The apparatus according to any one of claims 12 to 14, wherein the superposition module is specifically configured to:
    superimpose, convolve, and pool the upsampled third feature maps and the non-upsampled first feature maps to obtain the at least one second feature map.
  18. The apparatus according to any one of claims 11 to 17, wherein the concatenation module is specifically configured to:
    perform convolution processing on the at least two second feature maps to obtain features of the at least two second feature maps; and
    concatenate the features of the at least two second feature maps to obtain the saliency map of the to-be-processed image.
  19. The apparatus according to any one of claims 11 to 18, further comprising:
    a filtering module, configured to perform guided filtering on the saliency map of the to-be-processed image based on the to-be-processed image, to obtain a segmented image of the to-be-processed image.
  20. The apparatus according to claim 19, wherein the saliency map is a first saliency map, a resolution of the first saliency map is smaller than the resolution of the to-be-processed image, and the filtering module is specifically configured to:
    upsample the first saliency map to obtain a second saliency map with the same resolution as the to-be-processed image; and
    perform guided filtering on the second saliency map based on the to-be-processed image, to obtain the segmented image of the to-be-processed image.
  21. An image saliency object detection method, comprising:
    performing, on a to-be-processed image, convolution processing corresponding to at least two convolutional layers respectively, to obtain at least two first feature maps of the to-be-processed image, wherein resolutions of the at least two first feature maps are smaller than a resolution of the to-be-processed image, and any two of the at least two first feature maps differ in resolution;
    superimposing at least two first feature maps contained in a superposition set among at least two sets, to obtain at least two second feature maps of the to-be-processed image, wherein the at least two sets correspond to different resolutions and are in one-to-one correspondence with the at least two second feature maps, and a resolution of each first feature map contained in the superposition set is less than or equal to a resolution of the second feature map corresponding to the superposition set; and
    concatenating the at least two second feature maps to obtain a saliency map of the to-be-processed image.
  22. The method according to claim 21, wherein superimposing the at least two first feature maps contained in the superposition set among the at least two sets comprises:
    upsampling those first feature maps in the superposition set whose resolution is smaller than the resolution of the second feature map corresponding to the superposition set, to obtain at least two third feature maps with the same resolution as the second feature map corresponding to the superposition set, wherein the at least two third feature maps are in one-to-one correspondence with the at least two first feature maps; and
    superimposing the at least two third feature maps to obtain the second feature map corresponding to the superposition set.
  23. The method according to claim 22, wherein superimposing the at least two third feature maps to obtain the second feature map corresponding to the superposition set comprises:
    superimposing the at least two third feature maps according to their respective weights, to obtain the second feature map.
  24. The method according to claim 23, wherein the weight of each third feature map is trained based on a difference between a saliency map of a training image and a reference saliency map corresponding to the training image.
  25. The method according to any one of claims 21 to 24, wherein concatenating the at least two second feature maps of the to-be-processed image to obtain the saliency map of the to-be-processed image comprises:
    concatenating the at least two second feature maps according to their respective weights, to obtain the saliency map of the to-be-processed image.
  26. The method according to claim 25, wherein the weight of each of the at least two second feature maps is determined based on a difference between a saliency map of a training image and a reference saliency map corresponding to the training image.
  27. The method according to any one of claims 21 to 26, wherein superimposing the at least two first feature maps contained in the superposition set among the at least two sets to obtain the at least two second feature maps of the to-be-processed image comprises:
    superimposing the at least two first feature maps contained in the superposition set among the at least two sets;
    performing convolution processing on the at least two feature maps obtained by the superposition, to obtain at least two convolved feature maps, wherein the convolution processing is used to extract features of the at least two feature maps obtained by the superposition; and
    performing pooling on the at least two feature maps obtained by the convolution processing, to obtain the at least two second feature maps.
  28. The method according to any one of claims 21 to 27, wherein concatenating the at least two second feature maps to obtain the saliency map of the to-be-processed image comprises:
    performing convolution processing on the at least two second feature maps to obtain features of the at least two second feature maps; and
    concatenating the features of the at least two second feature maps to obtain the saliency map of the to-be-processed image.
  29. The method according to any one of claims 21 to 28, further comprising:
    performing guided filtering on the saliency map of the to-be-processed image based on the to-be-processed image, to obtain a segmented image of the to-be-processed image.
  30. The method according to claim 29, wherein the saliency map is a first saliency map, a resolution of the first saliency map is smaller than the resolution of the to-be-processed image, and performing guided filtering on the saliency map of the to-be-processed image based on the to-be-processed image to obtain the segmented image of the to-be-processed image comprises:
    upsampling the first saliency map to obtain a second saliency map with the same resolution as the to-be-processed image; and
    performing guided filtering on the second saliency map based on the to-be-processed image, to obtain the segmented image of the to-be-processed image.
  31. An image saliency object detection apparatus, comprising:
    a convolution module, configured to perform, on a to-be-processed image, convolution processing corresponding to at least two convolutional layers respectively, to obtain at least two first feature maps of the to-be-processed image, wherein resolutions of the at least two first feature maps are smaller than a resolution of the to-be-processed image, and any two of the at least two first feature maps differ in resolution;
    a superposition module, configured to superimpose at least two first feature maps contained in a superposition set among at least two sets, to obtain at least two second feature maps of the to-be-processed image, wherein the at least two sets correspond to different resolutions and are in one-to-one correspondence with the at least two second feature maps, and a resolution of each first feature map contained in the superposition set is less than or equal to a resolution of the second feature map corresponding to the superposition set; and
    a concatenation module, configured to concatenate the at least two second feature maps to obtain a saliency map of the to-be-processed image.
  32. The apparatus according to claim 31, wherein the superposition module is specifically configured to:
    upsample those first feature maps in the superposition set whose resolution is smaller than the resolution of the second feature map corresponding to the superposition set, to obtain at least two third feature maps with the same resolution as the second feature map corresponding to the superposition set, wherein the at least two third feature maps are in one-to-one correspondence with the at least two first feature maps; and
    superimpose the at least two third feature maps to obtain the second feature map corresponding to the superposition set.
  33. The apparatus according to claim 32, wherein the superposition module is specifically configured to:
    superimpose the at least two third feature maps according to their respective weights, to obtain the second feature map.
  34. The apparatus according to claim 33, wherein the weight of each third feature map is trained based on a difference between a saliency map of a training image and a reference saliency map corresponding to the training image.
  35. The apparatus according to any one of claims 31 to 34, wherein the concatenation module is specifically configured to:
    concatenate the at least two second feature maps according to their respective weights, to obtain the saliency map of the to-be-processed image.
  36. The apparatus according to claim 35, wherein the weight of each of the at least two second feature maps is determined based on a difference between a saliency map of a training image and a reference saliency map corresponding to the training image.
  37. The apparatus according to any one of claims 31 to 36, wherein the superposition module is specifically configured to:
    superimpose the at least two first feature maps contained in the superposition set among the at least two sets;
    perform convolution processing on the at least two feature maps obtained by the superposition, to obtain at least two convolved feature maps, wherein the convolution processing is used to extract features of the at least two feature maps obtained by the superposition; and
    perform pooling on the at least two feature maps obtained by the convolution processing, to obtain the at least two second feature maps.
  38. The apparatus according to any one of claims 31 to 37, wherein the concatenation module is specifically configured to:
    perform convolution processing on the at least two second feature maps to obtain features of the at least two second feature maps; and
    concatenate the features of the at least two second feature maps to obtain the saliency map of the to-be-processed image.
  39. The apparatus according to any one of claims 31 to 38, further comprising:
    a filtering module, configured to perform guided filtering on the saliency map of the to-be-processed image based on the to-be-processed image, to obtain a segmented image of the to-be-processed image.
  40. The apparatus according to claim 39, wherein the saliency map is a first saliency map, a resolution of the first saliency map is smaller than the resolution of the to-be-processed image, and the filtering module is specifically configured to:
    upsample the first saliency map to obtain a second saliency map with the same resolution as the to-be-processed image; and
    perform guided filtering on the second saliency map based on the to-be-processed image, to obtain the segmented image of the to-be-processed image.
PCT/CN2018/092514 2017-06-23 2018-06-22 Image saliency object detection method and apparatus WO2018233708A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP18820395.4A EP3633611A4 (en) 2017-06-23 2018-06-22 METHOD AND DEVICE FOR DETECTING A RELIEF OBJECT IN AN IMAGE
US16/723,539 US11430205B2 (en) 2017-06-23 2019-12-20 Method and apparatus for detecting salient object in image

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710488970.4 2017-06-23
CN201710488970.4A CN109118459B (zh) 2017-06-23 2017-06-23 图像显著性物体检测方法和装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/723,539 Continuation US11430205B2 (en) 2017-06-23 2019-12-20 Method and apparatus for detecting salient object in image

Publications (1)

Publication Number Publication Date
WO2018233708A1 true WO2018233708A1 (zh) 2018-12-27

Family

ID=64732394

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/092514 WO2018233708A1 (zh) 2017-06-23 2018-06-22 图像显著性物体检测方法和装置

Country Status (4)

Country Link
US (1) US11430205B2 (zh)
EP (1) EP3633611A4 (zh)
CN (1) CN109118459B (zh)
WO (1) WO2018233708A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109118459A (zh) * 2017-06-23 2019-01-01 南开大学 Image saliency object detection method and apparatus
CN112166455A (zh) * 2019-09-26 2021-01-01 深圳市大疆创新科技有限公司 Image processing method and apparatus, movable platform, and machine-readable storage medium
CN112819006A (zh) * 2020-12-31 2021-05-18 北京声智科技有限公司 Image processing method and apparatus, and electronic device

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11568627B2 (en) 2015-11-18 2023-01-31 Adobe Inc. Utilizing interactive deep learning to select objects in digital visual media
US10192129B2 (en) 2015-11-18 2019-01-29 Adobe Systems Incorporated Utilizing interactive deep learning to select objects in digital visual media
CN106934397B (zh) * 2017-03-13 2020-09-01 北京市商汤科技开发有限公司 Image processing method and apparatus, and electronic device
US11190784B2 (en) * 2017-07-06 2021-11-30 Samsung Electronics Co., Ltd. Method for encoding/decoding image and device therefor
US11244195B2 (en) * 2018-05-01 2022-02-08 Adobe Inc. Iteratively applying neural networks to automatically identify pixels of salient objects portrayed in digital images
US10922589B2 (en) * 2018-10-10 2021-02-16 Ordnance Survey Limited Object-based convolutional neural network for land use classification
US10984532B2 (en) 2018-08-24 2021-04-20 Ordnance Survey Limited Joint deep learning for land cover and land use classification
WO2020080765A1 (en) 2018-10-19 2020-04-23 Samsung Electronics Co., Ltd. Apparatuses and methods for performing artificial intelligence encoding and artificial intelligence decoding on image
US11282208B2 (en) 2018-12-24 2022-03-22 Adobe Inc. Identifying target objects using scale-diverse segmentation neural networks
CN111833363B (zh) * 2019-04-17 2023-10-24 南开大学 Image edge and saliency detection method and apparatus
CN110097115B (zh) * 2019-04-28 2022-11-25 南开大学 Video salient object detection method based on an attention transfer mechanism
KR102436512B1 (ko) 2019-10-29 2022-08-25 삼성전자주식회사 Encoding method and apparatus therefor, and decoding method and apparatus therefor
CN111028237B (zh) * 2019-11-26 2023-06-06 中国科学院深圳先进技术研究院 Image segmentation method and apparatus, and terminal device
CN111462149B (zh) * 2020-03-05 2023-06-06 中国地质大学(武汉) Instance-level human parsing method based on visual saliency
CN111476806B (zh) * 2020-06-23 2020-10-23 腾讯科技(深圳)有限公司 Image processing method and apparatus, computer device, and storage medium
US11335004B2 (en) 2020-08-07 2022-05-17 Adobe Inc. Generating refined segmentation masks based on uncertain pixels
WO2022119506A1 (en) * 2020-12-03 2022-06-09 National University Of Singapore Method and system for training a neural network
CN112613363B (zh) * 2020-12-11 2024-04-05 浙江大华技术股份有限公司 Vehicle image partitioning method and apparatus, and storage medium
US11676279B2 (en) 2020-12-18 2023-06-13 Adobe Inc. Utilizing a segmentation neural network to process initial object segmentations and object user indicators within a digital image to generate improved object segmentations
US11875510B2 (en) 2021-03-12 2024-01-16 Adobe Inc. Generating refined segmentations masks via meticulous object segmentation
US20220301099A1 (en) * 2021-03-16 2022-09-22 Argo AI, LLC Systems and methods for generating object detection labels using foveated image magnification for autonomous driving
CN113160039B (zh) * 2021-04-28 2024-03-26 北京达佳互联信息技术有限公司 Image style transfer method and apparatus, electronic device, and storage medium
CN116740069B (zh) * 2023-08-15 2023-11-07 山东锋士信息技术有限公司 Surface defect detection method based on multi-scale salient information and bidirectional feature fusion

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104537393A (zh) * 2015-01-04 2015-04-22 大连理工大学 Traffic sign recognition method based on a multi-resolution convolutional neural network
CN104680508A (zh) * 2013-11-29 2015-06-03 华为技术有限公司 Convolutional neural network and target object detection method based on a convolutional neural network
CN104794504A (zh) * 2015-04-28 2015-07-22 浙江大学 Graphic pattern text detection method based on deep learning
CN106127725A (zh) * 2016-05-16 2016-11-16 北京工业大学 Millimeter-wave radar cloud image segmentation method based on a multi-resolution CNN
CN106157319A (zh) * 2016-07-28 2016-11-23 哈尔滨工业大学 Saliency detection method fusing region- and pixel-level information based on a convolutional neural network

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5361291A (en) * 1991-11-20 1994-11-01 General Electric Company Deconvolution filter for CT system
CN101329767B (zh) 2008-07-11 2011-11-16 西安交通大学 Learning-based automatic detection method for salient object sequences in video
CN101520894B (zh) 2009-02-18 2011-03-30 上海大学 Salient object extraction method based on regional saliency
CN101526955B (zh) 2009-04-01 2010-09-01 清华大学 Sketch-based automatic extraction method and system for network graphic primitives
JP4862930B2 (ja) * 2009-09-04 2012-01-25 カシオ計算機株式会社 Image processing apparatus, image processing method, and program
CN101650824B (zh) 2009-09-23 2011-12-28 清华大学 Content-sensitive image resizing method based on conformal energy
CN102184557B (zh) 2011-06-17 2012-09-12 电子科技大学 Salient region detection method for complex scenes
CN104463865A (zh) 2014-12-05 2015-03-25 浙江大学 Portrait segmentation method
CN106296638A (zh) * 2015-06-04 2017-01-04 欧姆龙株式会社 Saliency information acquisition apparatus and saliency information acquisition method
CN105590319B (zh) * 2015-12-18 2018-06-29 华南理工大学 Deep-learning image salient region detection method
CN105787482A (zh) 2016-02-26 2016-07-20 华北电力大学 Specific-target contour image segmentation method based on a deep convolutional neural network
CN106339984B (zh) * 2016-08-27 2019-09-13 中国石油大学(华东) Distributed image super-resolution method based on a k-means-driven convolutional neural network
CN106600538A (zh) * 2016-12-15 2017-04-26 武汉工程大学 Face super-resolution algorithm based on a regional deep convolutional neural network
CN109118459B (zh) 2017-06-23 2022-07-19 南开大学 Image saliency object detection method and apparatus

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104680508A (zh) * 2013-11-29 2015-06-03 华为技术有限公司 Convolutional neural network and target object detection method based on a convolutional neural network
CN104537393A (zh) * 2015-01-04 2015-04-22 大连理工大学 Traffic sign recognition method based on a multi-resolution convolutional neural network
CN104794504A (zh) * 2015-04-28 2015-07-22 浙江大学 Graphic pattern text detection method based on deep learning
CN106127725A (zh) * 2016-05-16 2016-11-16 北京工业大学 Millimeter-wave radar cloud image segmentation method based on a multi-resolution CNN
CN106157319A (zh) * 2016-07-28 2016-11-23 哈尔滨工业大学 Saliency detection method fusing region- and pixel-level information based on a convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3633611A4

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109118459A (zh) * 2017-06-23 2019-01-01 南开大学 Image saliency object detection method and apparatus
US11430205B2 (en) 2017-06-23 2022-08-30 Huawei Technologies Co., Ltd. Method and apparatus for detecting salient object in image
CN112166455A (zh) * 2019-09-26 2021-01-01 深圳市大疆创新科技有限公司 Image processing method and apparatus, movable platform, and machine-readable storage medium
WO2021056304A1 (zh) * 2019-09-26 2021-04-01 深圳市大疆创新科技有限公司 Image processing method and apparatus, movable platform, and machine-readable storage medium
CN112819006A (zh) * 2020-12-31 2021-05-18 北京声智科技有限公司 Image processing method and apparatus, and electronic device
CN112819006B (zh) * 2020-12-31 2023-12-22 北京声智科技有限公司 Image processing method and apparatus, and electronic device

Also Published As

Publication number Publication date
CN109118459A (zh) 2019-01-01
US20200143194A1 (en) 2020-05-07
CN109118459B (zh) 2022-07-19
EP3633611A1 (en) 2020-04-08
EP3633611A4 (en) 2020-06-10
US11430205B2 (en) 2022-08-30

Similar Documents

Publication Publication Date Title
WO2018233708A1 (zh) Image saliency object detection method and apparatus
CN108010031B (zh) Portrait segmentation method and mobile terminal
CN105323425B (zh) Scene motion correction in a fused image system
US9292928B2 (en) Depth constrained superpixel-based depth map refinement
CN111583097A (zh) Image processing method and apparatus, electronic device, and computer-readable storage medium
WO2021016873A1 (zh) Cascaded-neural-network-based attention detection method, computer apparatus, and computer-readable storage medium
US20110085741A1 (en) Methods and apparatus for editing images
KR20210058887A (ko) Image processing method and apparatus, electronic device, and storage medium
Lo et al. Joint trilateral filtering for depth map super-resolution
CN110532965B (zh) Age recognition method, storage medium, and electronic device
WO2022032998A1 (zh) Image processing method and apparatus, electronic device, storage medium, and program product
EP3836083B1 (en) Disparity estimation system and method, electronic device and computer program product
CN111080670A (zh) Image extraction method, apparatus, device, and storage medium
KR102262671B1 (ko) Method for applying a bokeh effect to a video image, and recording medium
CN115493612A (zh) Vehicle positioning method and apparatus based on visual SLAM
CN111095295B (zh) Object detection method and apparatus
WO2024041235A1 (zh) Image processing method, apparatus, and device, storage medium, and program product
CN110472490A (zh) Action recognition method and apparatus based on improved VGGNet, storage medium, and terminal
JP2019220174A (ja) Image processing using an artificial neural network
Wang et al. A encoder-decoder deblurring network combined with high-frequency a priori
JP6150419B2 (ja) Camera and program
JP5556494B2 (ja) Image processing apparatus, image processing method, and program
Adorno et al. Smartphone-based floor detection in unstructured and structured environments
EP4244807A1 (en) Systems and methods for concurrent depth representation and inpainting of images
JP5962810B2 (ja) Image processing apparatus, image processing method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18820395

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018820395

Country of ref document: EP

Effective date: 20200102