WO2020232672A1 - Image cropping method and apparatus, and photographing apparatus - Google Patents

Image cropping method and apparatus, and photographing apparatus

Info

Publication number
WO2020232672A1
WO2020232672A1 (application PCT/CN2019/087999, CN2019087999W)
Authority
WO
WIPO (PCT)
Prior art keywords
feature map
network
area
sub
original image
Prior art date
Application number
PCT/CN2019/087999
Other languages
French (fr)
Chinese (zh)
Inventor
曾辉
曹子晟
胡攀
Original Assignee
深圳市大疆创新科技有限公司 (SZ DJI Technology Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 (SZ DJI Technology Co., Ltd.)
Priority to CN201980009520.XA (published as CN111684488A)
Priority to PCT/CN2019/087999 (published as WO2020232672A1)
Publication of WO2020232672A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20016Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20132Image cropping

Definitions

  • the present invention relates to the field of image processing, in particular to an image cropping method, device and photographing device.
  • in many cases, the photos taken have unsatisfactory visual effects because the user had no time to compose the shot or lacks composition knowledge.
  • to address this, the photo can be re-composed by post-capture cropping to enhance its visual effect.
  • Toolkits currently on the market provide manual cropping functions.
  • manual cropping of photos requires users to have some knowledge of photographic composition, and when the number of photos is large the workload becomes very heavy.
  • the prior art also provides methods for automatically cropping photos.
  • the early automatic cropping algorithms are mainly designed based on the attention mechanism, aiming to capture the most important content or regions of interest in the photo.
  • this type of algorithm first obtains a saliency map of the photo and then uses a sliding window to select the most salient window as the cropping result; some algorithms also introduce face detection or visual interaction information as an aid.
  • because algorithms designed around the attention mechanism do not consider the overall composition of the photo, the cropping results they obtain often have mediocre visual quality.
  • the method based on aesthetic attributes focuses on the aesthetic attributes of photos and the composition rules commonly used in photography, models these attributes and rules through hand-designed features, and then learns an aesthetic classifier, such as a support vector machine, to choose the best result from many cropping candidates.
  • Data-driven algorithms mostly implement automatic cropping of photos by training an end-to-end deep neural network on the labeled data set.
  • most of these algorithms directly adopt standard neural network architectures that have been successful in other fields (such as image classification and object detection).
  • the deep models used contain hundreds of megabytes of parameters and require very powerful GPUs and high power consumption to run, which low-power products such as cameras cannot afford.
  • because the characteristics of photo cropping and composition themselves are not considered, the cropping results of these methods are not stable enough.
  • the invention provides an image cropping method, device and photographing device.
  • the present invention is implemented through the following technical solutions:
  • an image cropping method including:
  • the feature map after dimensionality reduction is cropped.
  • an image cropping device comprising:
  • Storage device for storing program instructions
  • One or more processors call program instructions stored in the storage device, and when the program instructions are executed, the one or more processors are individually or collectively configured to:
  • the feature map after dimensionality reduction is cropped.
  • a photographing device comprising:
  • Image acquisition module for obtaining original images
  • Storage device for storing program instructions
  • One or more processors call program instructions stored in the storage device, and when the program instructions are executed, the one or more processors are individually or collectively configured to:
  • the feature map after dimensionality reduction is cropped.
  • an unmanned aerial vehicle includes:
  • Image acquisition module for obtaining original images
  • Storage device for storing program instructions
  • One or more processors call program instructions stored in the storage device, and when the program instructions are executed, the one or more processors are individually or collectively configured to:
  • the feature map after dimensionality reduction is cropped.
  • a mobile terminal comprising:
  • Image acquisition module for obtaining original images
  • Storage device for storing program instructions
  • One or more processors call program instructions stored in the storage device, and when the program instructions are executed, the one or more processors are individually or collectively configured to:
  • the feature map after dimensionality reduction is cropped.
  • a handheld gimbal (pan/tilt) comprising:
  • Image acquisition module for obtaining original images
  • Storage device for storing program instructions
  • One or more processors call program instructions stored in the storage device, and when the program instructions are executed, the one or more processors are individually or collectively configured to:
  • the feature map after dimensionality reduction is cropped.
  • the present invention extracts feature maps from multiple pieces of original image information with different distance characteristics, that is, it extracts feature information of objects of various sizes, so that the feature information of the original image is obtained accurately and completely.
  • this helps to model the image cropping result precisely, so that the image cropping method can adapt to more complex application scenarios; in addition, reducing the dimensionality of the feature map and then cropping the dimensionality-reduced feature map improves cropping performance and
  • reduces the parameter size of image cropping, so that the image cropping method is suitable for chips with lower power consumption, which greatly improves practicability.
  • FIG. 1 is a method flowchart of an image cropping method in an embodiment of the present invention
  • FIG. 2 is a schematic diagram of the network structure of a convolutional neural network in an embodiment of the present invention
  • FIG. 3 is a schematic diagram of a specific network structure of a convolutional neural network in an embodiment of the present invention.
  • FIG. 4 is a flowchart of a method for cropping the dimensionality-reduced feature map in an embodiment of the present invention;
  • FIG. 5 is a schematic diagram of the division of feature maps in an embodiment of the present invention.
  • FIG. 6 is a flowchart of a specific method for cropping the dimensionality-reduced feature map in the embodiment shown in FIG. 5;
  • Fig. 7 is a structural block diagram of an image cropping device in an embodiment of the present invention.
  • FIG. 8 is a structural block diagram of a photographing device in an embodiment of the present invention.
  • Figure 9 is a structural block diagram of a drone in an embodiment of the present invention.
  • FIG. 10 is a structural block diagram of a mobile terminal in an embodiment of the present invention.
  • Fig. 11 is a structural block diagram of a handheld pan/tilt in an embodiment of the present invention.
  • Fig. 1 is a method flowchart of an image cropping method in an embodiment of the present invention. As shown in Fig. 1, the image cropping method of an embodiment of the present invention may include the following steps:
  • Objects in the actual application scene have multiple sizes. Even the same object has different sizes in the image due to different shooting distances.
  • the embodiment of the invention extracts multiple original image information with different distance characteristics from the same original image.
  • different strategies can be used to extract multiple pieces of original image information with different distance characteristics; for example, they can be extracted based on downsampling, upsampling, or a combination of downsampling and upsampling.
  • the original image is input into a pre-trained convolutional neural network; the original image is downsampled by the convolutional neural network to obtain multiple original image information with different distance characteristics.
  • a three-layer image pyramid is used as the input of the convolutional neural network, with each pyramid level downsampled by a factor of 2 relative to the previous one.
  • feature extraction is performed separately on the original image, the image obtained by downsampling the original image by a factor of 2, and the image obtained by downsampling it by a factor of 4, so as to obtain multiple pieces of original image information with different distance characteristics; that is,
  • the original image information of this embodiment includes the original image (1 in Figure 2), the 2x downsampled image, and the 4x downsampled image.
  • the network structure of the convolutional neural network will be described in detail in the following embodiments.
  • a feature map of the original image information is extracted on the distance scale of each original image information based on a pre-trained convolutional neural network.
  • the convolutional neural network 10 may include multiple sub-networks, the input of the sub-network is the original image information corresponding to the distance feature; each sub-network is used to extract the feature map of the original image information corresponding to the distance feature.
  • the neural network may include 2, 3, 4, 5 or other numbers of sub-networks.
  • the convolutional neural network 10 includes a first sub-network 11, a second sub-network 12 and a third sub-network 13, where the input of the first sub-network 11 is the original image, the input of the second sub-network 12 is the image obtained by downsampling the original image by a factor of 2, and the input of the third sub-network 13 is the image obtained by downsampling the original image by a factor of 4.
  • the standard network architectures used in the prior art (such as VGG16) perform multiple downsampling operations during feature extraction (usually 5 operations, for 32x total downsampling), which leads to a great loss of object spatial information.
  • if the original image size is 256*256, downsampling it 5 times yields a feature map with a spatial resolution of only 8*8, and cropping is then performed on this 8*8 feature map; the resolution corresponding to the cropping frame is even lower. At such a small spatial resolution most of the information has been lost, and the cropping result cannot be modeled accurately. Reducing the number of downsampling operations, on the other hand, shrinks the receptive field of the features, which is not enough to represent large objects.
  • for this reason, each sub-network of the embodiment of the present invention first downsamples the original image information of the corresponding distance feature and then upsamples it, while ensuring a sufficiently large receptive field and spatial resolution.
  • the sub-network of this embodiment may include multiple first network layers connected in sequence and at least one second network layer arranged after the first network layers; the first network layers are used for downsampling, the second network layers are used for upsampling, and the number of first network layers is greater than the number of second network layers.
  • in some embodiments, each sub-network includes multiple second network layers.
  • for example, each sub-network includes four first network layers connected in sequence and two second network layers connected in sequence after them; in other embodiments, each sub-network includes five first network layers connected in sequence and two second network layers connected in sequence after them. It can be understood that, in each sub-network, the numbers of first and second network layers can also be set to other values and are not limited to those in the embodiments listed above.
  • the first network layer can implement downsampling based on shuffle operation or convolution operation
  • the second network layer can realize upsampling based on a shuffle operation, a deconvolution operation, bicubic interpolation, nearest-neighbor interpolation or bilinear interpolation.
  • the first network layer includes a convolution layer and a pooling layer
  • the second network layer includes a deconvolution layer.
  • multiple sub-networks share weight parameters, that is, the convolution kernels of the convolutional layer and the pooling layer in each sub-network are the same.
  • in some embodiments, the stride of the first of the first network layers in a sub-network is different from the stride of the first of the first network layers in the other sub-networks.
  • optionally, the stride of the convolutional layer of the first of the first network layers in a sub-network is different from that in the other sub-networks, and/or the stride of the pooling layer of the first of the first network layers in a sub-network is different from that in the other sub-networks.
  • in each sub-network, the strides of the convolutional layer and pooling layer of the other first network layers (not the first one) are equal to the strides of the convolutional and pooling layers of the layers at the corresponding positions in the other sub-networks, and the stride of the deconvolution layer of a second network layer is equal to the stride of the deconvolution layer of the corresponding second network layer in the other sub-networks. That is, by adjusting the stride of the convolutional layer and/or pooling layer of the first of the first network layers in each sub-network, it can be ensured that the feature maps finally output by the sub-networks have the same distance size. Furthermore, in each sub-network, the strides of the convolutional and pooling layers of the other first network layers (not the first one) are also equal to the stride of the deconvolution layer of the second network layers.
  • for example, the stride of the convolutional layer and pooling layer of the first of the first network layers of the first sub-network 11 is 4, the stride of the convolutional layer and pooling layer of the first of the first network layers of the second sub-network 12 is 2, and the stride of the convolutional layer and pooling layer of the first of the first network layers of the third sub-network 13 is 1.
  • the strides of the convolutional layers and pooling layers of the other first network layers are all 2, and the stride of the deconvolution layer of each second network layer is also 2.
  • each sub-network is trained separately, and the weight parameters of each sub-network are determined independently; for a sub-network at a smaller distance scale, the number of parameters of the sub-network can be further reduced.
  • in some embodiments, the output of any first network layer of a sub-network can be used as the input of any first network layer and/or second network layer of any other sub-network; likewise, in some embodiments, the output of any second network layer of a sub-network can be used as the input of any first network layer and/or second network layer of any other sub-network.
  • when such a connection is made, the distance size of the feature map output by the one network layer is equal to the distance size of the feature map input by the other network layer.
  • in some embodiments, the output of the one network layer is superimposed on the input of the other network layer along the channel direction of the feature map.
  • when cropping is performed on the basis of the feature map, a fully connected layer with many parameters is generally required.
  • for example, a 7*7*512*4096 fully connected layer is the standard configuration of many object detection networks, and this setting is also adopted by some image cropping models.
  • the parameters of this single layer amount to roughly 392 MB (over one hundred million weights), which current camera systems cannot afford at all.
  • however, image cropping does not need to accurately recognize every piece of content in the original image, so the channel dimension of the feature map can be reduced to a very low level without loss of performance, thereby greatly reducing the parameters of the fully connected layer used for image cropping.
  • the feature map is input to the 1*1 convolutional layer 20 for dimensionality reduction processing, thereby reducing the dimensionality of the feature map (a brief parameter-count and 1*1-convolution sketch is given after this list).
  • the way of reducing the dimensionality of the feature map is not limited to using the 1*1 convolutional layer 20; other existing dimensionality reduction algorithms can also be used instead.
  • S104 Perform cropping processing on the feature map after dimensionality reduction.
  • Fig. 4 is a flow chart of a specific method for cropping the feature map after dimensionality reduction in an embodiment of the present invention. As shown in Fig. 4, cropping the feature map after dimensionality reduction may include:
  • S401 Divide the dimensionality-reduced feature map along its length and width directions to obtain multiple grid regions, each grid region including multiple pixels;
  • S402 Perform cropping processing on the dimensionality-reduced feature map according to the multiple grid regions and preset prior conditions.
  • after the feature map is divided, cropping only needs to treat the grid region as the smallest unit and does not need to be accurate to the pixel level, which reduces the amount of data processed during image cropping and further improves cropping performance.
  • in some embodiments, the dimensionality-reduced feature map is equally divided along its length direction and equally divided along its width direction to obtain multiple grid regions.
  • within each grid region, the feature information of the pixels is roughly the same, which helps to improve cropping accuracy.
  • for example, the dimensionality-reduced feature map can be divided into 16*16 or 12*12 grid regions; it can also be divided into, for example, 8*10 grid regions. It can be understood that, when step S401 is implemented, the dimensionality-reduced feature map may also be divided in a non-equal manner (see the grid-division sketch after this list).
  • in some embodiments, a meshing model divides the dimensionality-reduced feature map along its length and width directions to obtain multiple grid regions; it is understandable that a grid-division algorithm can also be used to divide the feature map in the same way.
  • FIG. 6 shows, in an embodiment of the present invention, an implementation of cropping the dimensionality-reduced feature map according to a plurality of grid regions and preset prior conditions; as shown in FIG. 6,
  • the implementation process of cropping the dimensionality-reduced feature map can include:
  • extracting the feature information of one pixel in each grid region, such as the center pixel, and using the feature information of the center pixel as the feature information of that grid region; optionally, extracting the feature information of some pixels (at least two pixels) and determining the feature information of the grid region from them, for example by taking the average of their feature information as the feature information of the grid region.
  • S602 According to the feature information of the grid regions, extract the feature information of some of the grid regions among the multiple grid regions, where the feature information of these grid regions includes at least the feature information of the region of interest in the original image;
  • the region of interest in the original image refers to the target region containing the main subject; an existing object detection algorithm can be used to determine the target region in the original image.
  • S603 Determine this part of the grid regions as a target cropping area.
  • cropping results of different resolutions and aspect ratios can be obtained to meet different needs of users.
  • in some embodiments, a plurality of target cropping areas with different resolutions and aspect ratios are obtained.
  • the multiple target cropping areas determined in S603 can be further filtered to remove those that do not meet the requirements.
  • in some embodiments, the process of cropping the dimensionality-reduced feature map further includes: determining that the two end points of one diagonal of the rectangular target cropping area lie in restricted areas of the feature map.
  • the two restricted areas are distributed on either side of the same diagonal of the feature map, so as to ensure as far as possible that the target cropping area contains the main subject; each restricted area includes at least one grid region.
  • in some embodiments, the restricted areas include a first restricted area (M1 in FIG. 5) and a second restricted area (M2 in FIG. 5).
  • the first restricted area is used to restrict the position of the upper-left corner of the target cropping area in the feature map, and the second restricted area is used to restrict the position of the lower-right corner of the target cropping area in the feature map; optionally, the restricted areas include a third restricted area and a fourth restricted area, where the third restricted area is used to restrict the position of the upper-right corner of the target cropping area in the feature map and the fourth restricted area is used to restrict the position of the lower-left corner.
  • each restricted area in this embodiment includes multiple grid regions, so that multiple target cropping areas can be obtained.
  • the center pixel of any grid region in the first restricted area is used as the upper-left vertex of a target cropping area, and the center pixel of any grid region in the second restricted area is used as its lower-right vertex; in this way, multiple target cropping areas are obtained, such as the rectangular area formed by the dotted line in Figure 5 (see the candidate-crop sketch after this list).
  • in some embodiments, the process of cropping the dimensionality-reduced feature map according to the multiple grid regions and preset prior conditions further includes: the aspect ratio of the target cropping area meets a preset aspect-ratio strategy.
  • optionally, the aspect ratio of the target cropping area is used to indicate that the target cropping area is a rectangular area, which meets conventional composition requirements.
  • for example, the aspect ratio of the rectangular target cropping area is 1:3, that is, the preset aspect-ratio strategy is that the aspect ratio of the target cropping area is 1:3; optionally, the aspect ratio of the target cropping area is used to indicate that the target cropping area is a square area.
  • in some embodiments, the process of cropping the dimensionality-reduced feature map further includes: the area proportion of the target cropping area is greater than or equal to a preset proportion threshold, where the area proportion of the target cropping area is the ratio of the area of the target cropping area to the area of the feature map.
  • the preset proportion threshold can be set as required, for example to 1/4, that is, the area proportion of the target cropping area is greater than or equal to 1/4, so that the cropping result better meets composition requirements.
  • target cropping areas that do not meet the requirements can be removed according to any two, or all three, of the above further screening strategies in combination.
  • the feature map after the dimension reduction is cropped in the fully connected layer 30 to improve the performance of image cropping.
  • in some embodiments, the image cropping method may further include: after the dimensionality-reduced feature map is cropped, feeding the feature information of the target cropping area back to the convolutional neural network 10; this feedback increases the complexity of the network and allows the processing result of the entire network to be optimized. Specifically, the feature information of the target cropping area is used as an input of the convolutional neural network 10.
  • the image cropping method of the embodiment of the present invention extracts feature maps of multiple pieces of original image information with different distance characteristics, that is, it extracts feature information of objects of various sizes, so as to obtain the feature information of the original image accurately and completely. This is beneficial for
  • accurately modeling the image cropping result, so that the method can adapt to more complex application scenarios; and, by reducing the dimensionality of the feature map and then cropping the dimensionality-reduced feature map, cropping performance is improved and
  • the parameter size is reduced, making the image cropping method suitable for low-power chips. The network structure of the embodiment of the present invention needs less than 10 MB of parameters while achieving performance comparable to large networks of several hundred megabytes such as VGG16, which greatly improves practicability.
  • an embodiment of the present invention also provides an image cropping device.
  • the image cropping device 100 includes a storage device 110 and one or more processors 120.
  • the storage device 110 is used to store program instructions; the one or more processors 120 call the program instructions stored in the storage device 110.
  • when the program instructions are executed, the one or more processors 120 are individually or collectively configured to: extract multiple pieces of original image information with different distance features; extract the feature map of the original image information on the distance scale of each piece of original image information; reduce the dimensionality of the feature map; and perform cropping processing on the dimensionality-reduced feature map.
  • the processor 120 may implement the image processing method in the embodiments shown in FIG. 1, FIG. 4, and FIG. 6 of the present invention.
  • the image cropping apparatus 100 in this embodiment can be described with reference to the image cropping method in the foregoing embodiment.
  • the image cropping device 100 of this embodiment can be a computer or other equipment with image processing capabilities, or a shooting device with a camera function, such as a camera, a video camera, a smartphone, a smart terminal, a shooting stabilizer, an unmanned aerial vehicle, and so on.
  • an embodiment of the present invention also provides a photographing device.
  • the photographing device 200 includes: an image acquisition module 210, a storage device 220, and one or more processors 230.
  • the image acquisition module 210 is used to obtain the original image; the storage device 220 is used to store program instructions; and the one or more processors 230 call the program instructions stored in the storage device 220.
  • when the program instructions are executed, the one or more processors 230 are individually or collectively configured to: extract multiple pieces of original image information with different distance features; extract the feature map of the original image information on the distance scale of each piece of original image information; reduce the dimensionality of the feature map; and perform cropping processing on the dimensionality-reduced feature map.
  • the image acquisition module 210 includes a lens and an imaging sensor matched with the lens, such as a CCD or CMOS image sensor.
  • the processor 230 may implement the image processing method in the embodiments shown in FIG. 1, FIG. 4, and FIG. 6 of the present invention.
  • the image cutting method in the foregoing embodiment may be referred to for description of the photographing apparatus 200 in this embodiment.
  • the photographing device 200 can be a camera with a photographing function, a video camera, a smartphone, a smart terminal, a photographing stabilizer (such as a handheld gimbal), an unmanned aerial vehicle (such as a drone), and so on.
  • the unmanned aerial vehicle 300 includes: an image acquisition module 310, a storage device 320 and one or more processors 330.
  • the image acquisition module 310 is used to obtain the original image; the storage device 320 is used to store program instructions; and the one or more processors 330 call the program instructions stored in the storage device 320. When the program instructions are executed, the one or more processors 330 are individually or collectively configured to: extract multiple pieces of original image information with different distance characteristics; extract the feature map of the original image information on the distance scale of each piece of original image information; reduce the dimensionality of the feature map; and perform cropping processing on the dimensionality-reduced feature map.
  • the image acquisition module 310 in this embodiment may be a camera, or may be a structure with a shooting function formed by a combination of a lens and an imaging sensor (such as CCD, CMOS, etc.).
  • the processor 330 may implement the image processing method in the embodiments shown in FIG. 1, FIG. 4, and FIG. 6 of the present invention.
  • the UAV 300 in this embodiment can be described with reference to the image cropping method in the foregoing embodiment.
  • the drone 300 in this embodiment of the present invention refers to an aerial photography drone, and other drones that do not have a camera function do not belong to the protection subject of this embodiment.
  • the UAV 300 may be a multi-rotor UAV or a fixed-wing UAV.
  • the embodiment of the present invention does not specifically limit the type of the UAV 300.
  • the image acquisition module 310 can be mounted on the fuselage (not shown) via a pan/tilt (not shown), and the image acquisition module 310 can be stabilized by the pan/tilt.
  • the pan/tilt may be a two-axis gimbal or a three-axis gimbal, which is not specifically limited in the embodiment of the present invention.
  • the mobile terminal 400 includes: an image acquisition module 410, a storage device 420, and one or more processors 430.
  • the image acquisition module 410 is used to obtain the original image; the storage device 420 is used to store program instructions; and the one or more processors 430 call the program instructions stored in the storage device 420. When the program instructions are executed, the one or more processors 430 are individually or collectively configured to: extract multiple pieces of original image information with different distance characteristics; extract the feature map of the original image information on the distance scale of each piece of original image information; reduce the dimensionality of the feature map; and perform cropping processing on the dimensionality-reduced feature map.
  • the image acquisition module 410 in this embodiment is a camera built in the mobile terminal 400.
  • the mobile terminal 400 may be a smart mobile terminal such as a mobile phone or a tablet computer.
  • the processor 430 may implement the image processing method of the embodiment shown in FIG. 1, FIG. 4, and FIG. 6 of the present invention.
  • the mobile terminal 400 of this embodiment can be described with reference to the image cropping method of the foregoing embodiment.
  • the embodiment of the present invention also provides a handheld pan/tilt.
  • the handheld pan/tilt 500 includes: an image acquisition module 510, a storage device 520, and one or more processors 530.
  • the image acquisition module 510 is used to obtain the original image; the storage device 520 is used to store program instructions; and the one or more processors 530 call the program instructions stored in the storage device 520. When the program instructions are executed, the one or more processors 530 are individually or collectively configured to: extract multiple pieces of original image information with different distance characteristics; extract the feature map of the original image information on the distance scale of each piece of original image information; reduce the dimensionality of the feature map; and perform cropping processing on the dimensionality-reduced feature map.
  • the image acquisition module 510 in this embodiment may be a camera, or a structure with a photographing function formed by a combination of a lens and an imaging sensor (such as CCD, CMOS, etc.).
  • the processor 530 can implement the image processing method of the embodiments shown in FIG. 1, FIG. 4, and FIG. 6 of the present invention.
  • the handheld pan/tilt 500 in this embodiment of the present invention refers to a pan/tilt with a camera function, and other pan/tilts without a camera function do not belong to the protection subject of this embodiment.
  • the foregoing storage device may include volatile memory, such as random-access memory (RAM); the storage device may also include non-volatile memory, such as flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the storage device may also include a combination of the foregoing types of memory.
  • the processor may be a central processing unit (CPU).
  • the processor can also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor, or the processor may also be any conventional processor or the like.
  • an embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the steps of the image cropping method in the foregoing embodiment are implemented. Specifically, when the program is executed by the processor, the following steps are implemented: extract multiple original image information with different distance characteristics; extract the feature map of the original image information on the distance scale of each original image information; reduce the dimension of the feature map; Cut the feature map after dimensionality reduction.
  • the computer-readable storage medium may be the internal storage unit of the pan/tilt head described in any of the foregoing embodiments, such as a hard disk or a memory.
  • the computer-readable storage medium may also be an external storage device of the pan/tilt, such as a plug-in hard disk, a smart media card (SMC), an SD card, a flash card, etc. equipped on the device.
  • the computer-readable storage medium may also include both an internal storage unit of the pan-tilt and an external storage device.
  • the computer-readable storage medium is used to store the computer program and other programs and data required by the pan/tilt, and can also be used to temporarily store data that has been output or will be output.
  • the program can be stored in a computer-readable storage medium, and when the program is executed, it may include the procedures of the above-mentioned method embodiments.
  • the storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), etc.
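The fully-connected-layer cost and the 1*1-convolution dimensionality reduction referred to in the definitions above can be checked with a short sketch. The use of PyTorch, the 512 input channels, the reduced channel count of 32 and the tensor sizes are our own illustrative assumptions; the source only specifies a 1*1 convolutional layer (20 in Figure 2) for the reduction.

```python
import torch
import torch.nn as nn

# A 7*7*512*4096 fully connected layer, as used in many detection networks:
fc_weights = 7 * 7 * 512 * 4096               # = 102,760,448 weights
print(fc_weights, fc_weights * 4 / 2**20)     # ~102.8M weights, ~392 MB as float32

# Reducing the channel dimension of the feature map with a 1x1 convolution
# shrinks any subsequent fully connected layer accordingly.
reduce_channels = nn.Conv2d(in_channels=512, out_channels=32, kernel_size=1)
feature_map = torch.rand(1, 512, 16, 16)      # hypothetical feature map
reduced = reduce_channels(feature_map)        # -> torch.Size([1, 32, 16, 16])
print(reduced.shape)
```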
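The grid division and per-region feature extraction (steps S401/S601 above) can be sketched as follows. The 16*16 grid and the centre-pixel choice come from the examples in the definitions; the framework and tensor layout are assumptions.

```python
import torch

def grid_features(feature_map: torch.Tensor, grid: int = 16) -> torch.Tensor:
    """Divide a (C, H, W) feature map into grid x grid regions and return
    the feature vector of each region's centre pixel, shape (grid, grid, C)."""
    c, h, w = feature_map.shape
    cell_h, cell_w = h // grid, w // grid
    out = torch.empty(grid, grid, c)
    for i in range(grid):
        for j in range(grid):
            # The centre pixel of the (i, j)-th grid region stands in for the
            # whole region; averaging the region's pixels is the other option
            # mentioned in the definitions.
            cy = i * cell_h + cell_h // 2
            cx = j * cell_w + cell_w // 2
            out[i, j] = feature_map[:, cy, cx]
    return out
```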
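Finally, a sketch of how candidate target cropping areas might be enumerated from two restricted corner areas and filtered by the aspect-ratio and area-proportion prior conditions. The default restricted areas, the grid size and the data structures are our own assumptions; only the 1/4 area threshold and the idea of corner restrictions follow the examples above.

```python
from itertools import product

def candidate_crops(grid=16, top_left_area=None, bottom_right_area=None,
                    aspect_ratio=None, min_area_fraction=0.25):
    """Enumerate crops (in grid units) whose top-left corner lies in
    top_left_area and whose bottom-right corner lies in bottom_right_area."""
    # Assumed defaults: top-left corners in the upper-left quarter of the grid,
    # bottom-right corners in the lower-right quarter.
    top_left_area = top_left_area or [(i, j) for i in range(grid // 4)
                                      for j in range(grid // 4)]
    bottom_right_area = bottom_right_area or [(i, j)
                                              for i in range(3 * grid // 4, grid)
                                              for j in range(3 * grid // 4, grid)]
    crops = []
    for (y0, x0), (y1, x1) in product(top_left_area, bottom_right_area):
        h, w = y1 - y0 + 1, x1 - x0 + 1
        if aspect_ratio is not None and abs(w / h - aspect_ratio) > 1e-6:
            continue                      # preset aspect-ratio strategy
        if (h * w) / (grid * grid) < min_area_fraction:
            continue                      # area proportion >= preset threshold
        crops.append((y0, x0, y1, x1))
    return crops

print(len(candidate_crops()))  # number of candidates passing the filters
```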

Abstract

Disclosed are an image cropping method and apparatus, and a photographing apparatus. The method comprises: extracting a plurality of pieces of original image information having different distance features (S101); extracting a feature map of the original image information at the distance scale of each piece of original image information (S102); reducing the dimension of the feature map (S103); and cropping the feature map that has been subjected to dimension reduction (S104). By extracting feature information of objects of various sizes, the feature information of an original image can be obtained accurately and fully, which facilitates precise modeling of the image cropping result, so that the image cropping method can be applied to more complex application scenarios. Moreover, the feature map is subjected to dimension reduction and the dimension-reduced feature map is then cropped, thereby improving the performance of image cropping and reducing the magnitude of the image cropping parameters. The image cropping method is therefore applicable to chips with relatively low power consumption, which greatly improves practicability.

Description

Image cropping method, device and photographing device
Technical field
The present invention relates to the field of image processing, and in particular to an image cropping method, device and photographing device.
Background art
In many cases, the photos taken have unsatisfactory visual effects because the user had no time to compose the shot or lacks composition knowledge. To address this, a photo can be re-composed by post-capture cropping to enhance its visual effect. Toolkits currently on the market provide manual cropping functions; however, manual cropping requires the user to have some knowledge of photographic composition, and when the number of photos is large the workload becomes very heavy.
The prior art also provides methods for automatically cropping photos. Early automatic cropping algorithms were mainly designed around an attention mechanism, aiming to capture the most important content or regions of interest in a photo. Typically, such algorithms first obtain a saliency map of the photo and then use a sliding window to select the most salient window as the cropping result; some algorithms also introduce face detection or visual interaction information as an aid. However, because attention-based algorithms do not consider the overall composition of the photo, their cropping results often have mediocre visual quality.
Methods based on aesthetic attributes focus on the aesthetic attributes of photos and the composition rules commonly used in photography, model these attributes and rules through hand-designed features, and then learn an aesthetic classifier, such as a support vector machine, to choose the best result from many cropping candidates. However, due to their inherent limitations, hand-designed features often fail to model the aesthetic attributes of photos accurately.
Data-driven algorithms mostly implement automatic photo cropping by training an end-to-end deep neural network on a labeled data set. Most of these algorithms directly adopt standard neural network architectures that have been successful in other fields (such as image classification and object detection). The deep models used contain hundreds of megabytes of parameters and require very powerful GPUs and high power consumption to run, which low-power products such as cameras cannot afford. In addition, because the characteristics of photo cropping and composition themselves are not considered, the cropping results of these methods are not stable enough.
Summary of the invention
The present invention provides an image cropping method, device and photographing device.
Specifically, the present invention is implemented through the following technical solutions:
According to a first aspect of the present invention, an image cropping method is provided, the method including:
extracting multiple pieces of original image information with different distance features;
extracting a feature map of the original image information on the distance scale of each piece of original image information;
reducing the dimensionality of the feature map;
performing cropping processing on the dimensionality-reduced feature map.
According to a second aspect of the present invention, an image cropping device is provided, the device including:
a storage device for storing program instructions;
one or more processors that call the program instructions stored in the storage device and, when the program instructions are executed, are individually or collectively configured to:
extract multiple pieces of original image information with different distance features;
extract a feature map of the original image information on the distance scale of each piece of original image information;
reduce the dimensionality of the feature map;
perform cropping processing on the dimensionality-reduced feature map.
According to a third aspect of the present invention, a photographing device is provided, the photographing device including:
an image acquisition module for obtaining an original image;
a storage device for storing program instructions;
one or more processors that call the program instructions stored in the storage device and, when the program instructions are executed, are individually or collectively configured to:
extract multiple pieces of original image information with different distance features;
extract a feature map of the original image information on the distance scale of each piece of original image information;
reduce the dimensionality of the feature map;
perform cropping processing on the dimensionality-reduced feature map.
According to a fourth aspect of the present invention, an unmanned aerial vehicle is provided, the unmanned aerial vehicle including:
an image acquisition module for obtaining an original image;
a storage device for storing program instructions;
one or more processors that call the program instructions stored in the storage device and, when the program instructions are executed, are individually or collectively configured to:
extract multiple pieces of original image information with different distance features;
extract a feature map of the original image information on the distance scale of each piece of original image information;
reduce the dimensionality of the feature map;
perform cropping processing on the dimensionality-reduced feature map.
According to a fifth aspect of the present invention, a mobile terminal is provided, the mobile terminal including:
an image acquisition module for obtaining an original image;
a storage device for storing program instructions;
one or more processors that call the program instructions stored in the storage device and, when the program instructions are executed, are individually or collectively configured to:
extract multiple pieces of original image information with different distance features;
extract a feature map of the original image information on the distance scale of each piece of original image information;
reduce the dimensionality of the feature map;
perform cropping processing on the dimensionality-reduced feature map.
According to a sixth aspect of the present invention, a handheld gimbal is provided, the handheld gimbal including:
an image acquisition module for obtaining an original image;
a storage device for storing program instructions;
one or more processors that call the program instructions stored in the storage device and, when the program instructions are executed, are individually or collectively configured to:
extract multiple pieces of original image information with different distance features;
extract a feature map of the original image information on the distance scale of each piece of original image information;
reduce the dimensionality of the feature map;
perform cropping processing on the dimensionality-reduced feature map.
It can be seen from the technical solutions provided by the above embodiments that the present invention extracts feature maps of multiple pieces of original image information with different distance characteristics, that is, it extracts feature information of objects of various sizes, so as to obtain the feature information of the original image accurately and completely. This is beneficial for accurately modeling the image cropping result, so that the image cropping method can adapt to more complex application scenarios. In addition, the dimensionality of the feature map is reduced before the dimensionality-reduced feature map is cropped, which improves cropping performance and reduces the parameter size of image cropping, making the method suitable for low-power chips and greatly improving its practicability.
Description of the drawings
In order to explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of an image cropping method in an embodiment of the present invention;
FIG. 2 is a schematic diagram of the network structure of a convolutional neural network in an embodiment of the present invention;
FIG. 3 is a schematic diagram of a specific network structure of a convolutional neural network in an embodiment of the present invention;
FIG. 4 is a flowchart of a method for cropping the dimensionality-reduced feature map in an embodiment of the present invention;
FIG. 5 is a schematic diagram of the division of a feature map in an embodiment of the present invention;
FIG. 6 is a flowchart of a specific method for cropping the dimensionality-reduced feature map in the embodiment shown in FIG. 5;
FIG. 7 is a structural block diagram of an image cropping device in an embodiment of the present invention;
FIG. 8 is a structural block diagram of a photographing device in an embodiment of the present invention;
FIG. 9 is a structural block diagram of an unmanned aerial vehicle in an embodiment of the present invention;
FIG. 10 is a structural block diagram of a mobile terminal in an embodiment of the present invention;
FIG. 11 is a structural block diagram of a handheld gimbal in an embodiment of the present invention.
Reference signs: 1: original image;
10: convolutional neural network; 11: first sub-network; 12: second sub-network; 13: third sub-network;
20: 1*1 convolutional layer;
30: fully connected layer.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present invention.
The image cropping method, device and photographing device of the present invention are described in detail below with reference to the accompanying drawings. In the case of no conflict, the features of the following embodiments and implementations can be combined with each other.
FIG. 1 is a flowchart of an image cropping method in an embodiment of the present invention. As shown in FIG. 1, the image cropping method of an embodiment of the present invention may include the following steps:
S101: extracting multiple pieces of original image information with different distance features;
Objects in an actual application scene come in many sizes, and even the same object appears at different sizes in an image because of different shooting distances. In order to accurately extract the feature information of objects of various sizes and thereby improve the accuracy of image cropping, the embodiment of the present invention extracts, from the same original image, multiple pieces of original image information with different distance characteristics.
For the same original image, different strategies can be used to extract multiple pieces of original image information with different distance characteristics; for example, they can be extracted based on downsampling, upsampling, or a combination of downsampling and upsampling.
As a feasible implementation, the original image is input into a pre-trained convolutional neural network, and the convolutional neural network downsamples the original image to obtain multiple pieces of original image information with different distance characteristics. Optionally, a three-layer image pyramid is used as the input of the convolutional neural network, with each pyramid level downsampled by a factor of 2 relative to the previous one. In this embodiment, feature extraction is performed separately on the original image, the image obtained by downsampling the original image by a factor of 2, and the image obtained by downsampling it by a factor of 4, so as to obtain multiple pieces of original image information with different distance characteristics; that is, the original image information of this embodiment includes the original image (1 in Figure 2), the 2x downsampled image, and the 4x downsampled image. The network structure of the convolutional neural network will be described in detail in the following embodiments.
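As a rough illustration of this three-level pyramid input, the following sketch builds the original image together with its 2x and 4x downsampled copies. PyTorch and bilinear resizing are our own choices here; the source does not prescribe a particular framework or resampling filter.

```python
import torch
import torch.nn.functional as F

def build_image_pyramid(image: torch.Tensor, levels: int = 3) -> list:
    """Return [original, 2x downsampled, 4x downsampled, ...] copies of image.

    image is an (N, C, H, W) tensor; each level halves the spatial size,
    mirroring the layer-by-layer 2x downsampling described above.
    """
    pyramid = [image]
    for _ in range(levels - 1):
        image = F.interpolate(image, scale_factor=0.5, mode="bilinear",
                              align_corners=False)
        pyramid.append(image)
    return pyramid

# Example: a 256x256 RGB image yields levels of size 256, 128 and 64.
levels = build_image_pyramid(torch.rand(1, 3, 256, 256))
print([lvl.shape[-1] for lvl in levels])  # [256, 128, 64]
```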
S102:在每个原始图像信息的距离尺度上提取原始图像信息的特征图;S102: Extract a feature map of the original image information on the distance scale of each original image information;
可以采用不同的算法或策略在每个原始图像信息的距离尺度上提取原始图像信息的特征图,从而获得原始图像的颜色特征、纹理特征、形状特征和/或空间关系特征等等。可选的,在某些实施例中,基于预先训练的卷积神经网络在每个原始图像信息的距离尺度上提取原始图像信息的特征图。Different algorithms or strategies can be used to extract the feature map of the original image information on the distance scale of each original image information, so as to obtain the color feature, texture feature, shape feature, and/or spatial relationship feature of the original image. Optionally, in some embodiments, a feature map of the original image information is extracted on the distance scale of each original image information based on a pre-trained convolutional neural network.
下面,阐述一种具体的卷积神经网络的网络结构。In the following, the network structure of a specific convolutional neural network is described.
As shown in FIG. 2, the convolutional neural network 10 may include multiple sub-networks; the input of each sub-network is the original image information of the corresponding distance feature, and each sub-network is used to extract a feature map of the original image information of the corresponding distance feature. The neural network may include 2, 3, 4, 5, or another number of sub-networks. In the embodiment shown in FIG. 2, the convolutional neural network 10 includes a first sub-network 11, a second sub-network 12, and a third sub-network 13, where the input of the first sub-network 11 is the original image, the input of the second sub-network 12 is the image obtained by down-sampling the original image by a factor of 2, and the input of the third sub-network 13 is the image obtained by down-sampling the original image by a factor of 4.
In the prior art, the standard network architectures used (such as VGG16) perform multiple rounds of down-sampling during feature extraction (typically 5 rounds, for an overall 32-fold reduction), which causes a severe loss of object spatial information. If the original image size is 256*256, then after 5 rounds of down-sampling (performed to improve efficiency) only a feature map with a spatial resolution of 8*8 remains, and the region of that 8*8 feature map corresponding to a cropping frame has an even lower resolution. At such a small spatial resolution most of the information has already been lost, and the cropping result cannot be modeled accurately. On the other hand, reducing the number of down-sampling steps shrinks the receptive field of the features, which then becomes insufficient to represent large objects. For this reason, each sub-network in the embodiment of the present invention first down-samples and then up-samples the original image information of the corresponding distance feature, ensuring both a sufficiently large receptive field and a sufficient spatial resolution. As shown in FIG. 3, the sub-network of this embodiment may include multiple first network layers connected in sequence and at least one second network layer arranged after the first network layers; the first network layers are used for down-sampling and the second network layers for up-sampling, and the number of first network layers is greater than the number of second network layers.
In a sub-network, the numbers of first network layers and second network layers can be set as required; optionally, each sub-network includes multiple second network layers. In the embodiment shown in FIG. 3, each sub-network includes 4 first network layers connected in sequence followed by 2 second network layers connected in sequence; in other embodiments, each sub-network includes 5 first network layers connected in sequence followed by 2 second network layers connected in sequence. It can be understood that the numbers of first and second network layers in each sub-network can also be set to other values and are not limited to those listed in the above embodiments.
The first network layer may implement down-sampling based on a shuffle operation or a convolution operation, and the second network layer may implement up-sampling based on a shuffle operation, a deconvolution operation, bicubic interpolation, nearest-neighbour interpolation, bilinear interpolation, or the like. As a specific implementation, the first network layer includes a convolution layer and a pooling layer, and the second network layer includes a deconvolution layer.
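The following sketch shows one sub-network with the FIG. 3 layout: four down-sampling first network layers (each a convolution layer plus a pooling layer) followed by two up-sampling second network layers (deconvolution layers). The channel width, kernel sizes and ReLU activations are assumptions of the sketch; the stride of the first layer is exposed as a parameter because, as described in the following paragraphs, it is the quantity that differs between sub-networks.

```python
import torch
import torch.nn as nn

class SubNetwork(nn.Module):
    """Four "first network layers" (conv + pooling) followed by two
    "second network layers" (deconvolution), as in FIG. 3.

    Channel width, kernel sizes and activations are assumptions; only the
    layer types and their counts come from the embodiment."""

    def __init__(self, in_ch: int = 3, width: int = 32, first_stride: int = 1):
        super().__init__()
        layers, ch = [], in_ch
        for i in range(4):                            # first network layers
            conv_stride = first_stride if i == 0 else 1
            layers += [
                nn.Conv2d(ch, width, kernel_size=3, stride=conv_stride, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=2, stride=2),  # down-sampling
            ]
            ch = width
        for _ in range(2):                            # second network layers
            layers += [
                nn.ConvTranspose2d(ch, width, kernel_size=2, stride=2),  # up-sampling
                nn.ReLU(inplace=True),
            ]
        self.body = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)
```

Choosing a larger first-layer stride for the sub-network that receives the full-resolution image and a smaller one for the sub-networks that receive the down-sampled pyramid levels makes the three output feature maps come out with the same spatial size, which is the requirement stated in the paragraphs below.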
Further, in some embodiments, the multiple sub-networks share weight parameters, i.e. the convolution layers and pooling layers in the sub-networks use the same kernels. To make the feature maps output by the multiple sub-networks have the same distance size, optionally the stride of the convolution layer in the first of the first network layers of a sub-network differs from the corresponding stride in the other sub-networks; optionally the stride of the pooling layer in the first of the first network layers of a sub-network differs from the corresponding stride in the other sub-networks; optionally both of these differ. In each sub-network, the strides of the convolution and pooling layers of the remaining first network layers (other than the first one) are equal to the strides of the layers at the corresponding positions in the other sub-networks, and the stride of the deconvolution layer of each second network layer is equal to the stride of the deconvolution layers of the second network layers in the other sub-networks. In other words, by adjusting only the stride of the convolution layer and/or pooling layer of the first of the first network layers in each sub-network, the feature maps finally output by the sub-networks are guaranteed to have the same distance size. Furthermore, in each sub-network, the strides of the convolution and pooling layers of the remaining first network layers (other than the first one) may also be equal to the stride of the deconvolution layers of the second network layers.
In the embodiment shown in FIG. 3, the stride of the convolution layer and the pooling layer in the first of the first network layers is 4 in the first sub-network 11, 2 in the second sub-network 12, and 0 in the third sub-network 13. In the first sub-network 11, the second sub-network 12 and the third sub-network 13, the strides of the convolution and pooling layers of the remaining first network layers are all 2, and the stride of the deconvolution layers of the second network layers is also 2.
在某些实施例中,针对每个子网络单独训练,确定各子网络的权重参数,对于距离尺寸较小的子网络,可以进一步减少该子网络的参数。In some embodiments, each sub-network is trained separately, and the weight parameter of each sub-network is determined. For a sub-network with a smaller distance size, the parameters of the sub-network can be further reduced.
To increase the complexity and the depth of the convolutional neural network 10, in some embodiments the output of any first network layer of a sub-network may serve as the input of any first network layer and/or second network layer of any other sub-network. In some embodiments, the output of any second network layer of a sub-network may serve as the input of any first network layer and/or second network layer of any other sub-network. In some embodiments, both of the above apply. In these embodiments, when the output of one network layer is used as the input of another network layer, the distance size of the feature map output by the former is equal to the distance size of the feature map taken as input by the latter. In a specific implementation, the output of one network layer is stacked onto the input of the other network layer along the channel direction of the feature map.
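A minimal sketch of the channel-direction stacking just described, assuming PyTorch tensors in (N, C, H, W) layout; the function name and the size assertion are our own and not part of the disclosure.

```python
import torch

def fuse_cross_scale(own_input: torch.Tensor, other_output: torch.Tensor) -> torch.Tensor:
    """Stack a feature map coming from another sub-network onto this layer's
    input along the channel dimension. Both tensors must already have the same
    spatial ("distance") size; how the extra channels are consumed is left to
    the receiving layer's convolution."""
    assert own_input.shape[-2:] == other_output.shape[-2:], "spatial sizes must match"
    return torch.cat([own_input, other_output], dim=1)  # (N, C1 + C2, H, W)
```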
S103:降低特征图的维度;S103: Reduce the dimension of the feature map;
In some classic vision tasks such as object detection, obtaining an accurate target window generally requires a fully connected layer with a very large number of parameters. For example, a 7*7*512*4096 fully connected layer is standard in many object-detection networks, and this setting has also been adopted by some image cropping models. However, the parameters of this single layer already amount to as much as 392 megabytes, which current camera systems simply cannot afford. Unlike object detection, image cropping does not need to precisely recognize every item in the original image, so the channel dimension of the feature map can be reduced to a very low value without loss of performance, which greatly reduces the parameters of the fully connected layer used for the cropping processing.
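The 392-megabyte figure can be reproduced from the stated 7*7*512*4096 layer shape if one assumes 32-bit floating-point weights; the 4-bytes-per-weight assumption is ours, since the text only gives the total.

```python
# 7*7*512*4096 fully connected layer, assuming 4 bytes (float32) per weight.
weights = 7 * 7 * 512 * 4096            # 102,760,448 parameters
size_mb = weights * 4 / (1024 ** 2)     # bytes -> megabytes
print(weights, f"{size_mb:.1f} MB")     # 102760448 392.0 MB
```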
作为一种可行的实现方式,结合图2和图3,将特征图输入1*1的卷积层20进行降维处理,从而降低特征图的维度。应当理解,降低特征图的维度的方式不限于采用1*1的卷积层20实现降维处理,也可用其它现有的降维算法来替代。As a feasible implementation manner, in combination with FIG. 2 and FIG. 3, the feature map is input to the 1*1 convolutional layer 20 for dimensionality reduction processing, thereby reducing the dimensionality of the feature map. It should be understood that the way of reducing the dimensionality of the feature map is not limited to using the 1*1 convolutional layer 20 to implement dimensionality reduction processing, and other existing dimensionality reduction algorithms can also be used instead.
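As a sketch of the dimensionality reduction performed by the 1*1 convolution layer 20, assuming an input feature map with 32 channels reduced to 8; both channel counts are illustrative only and not taken from the disclosure.

```python
import torch
import torch.nn as nn

# A 1*1 convolution that shrinks the channel dimension of the feature map,
# as in layer 20 above. The 32 -> 8 channel counts are illustrative only.
reduce = nn.Conv2d(in_channels=32, out_channels=8, kernel_size=1)

features = torch.randn(1, 32, 64, 80)   # (N, C, H, W) feature map
reduced = reduce(features)
print(reduced.shape)                    # torch.Size([1, 8, 64, 80])
```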
S104:对降维后的特征图进行裁剪处理。S104: Perform cropping processing on the feature map after dimensionality reduction.
图4是本发明一实施例中对降维后的特征图进行裁剪处理的具体方法流程图,如图4所示,对降维后的特征图进行裁剪处理可包括:Fig. 4 is a flow chart of a specific method for cropping the feature map after dimensionality reduction in an embodiment of the present invention. As shown in Fig. 4, cropping the feature map after dimensionality reduction may include:
S401:分别沿着降维后的特征图的长度及宽度方向对特征图进行划分,得到多个网格区域,每个网格区域包括多个像素;S401: Divide the feature map along the length and width directions of the feature map after dimension reduction to obtain multiple grid regions, each grid region including multiple pixels;
S402:根据多个网格区域以及预设先验条件，对降维后的特征图进行裁剪处理。S402: Perform cropping processing on the dimension-reduced feature map according to the multiple grid regions and preset prior conditions.
Since the human eye is not very sensitive to individual pixels, a pixel-level deviation is in practice not noticeable. Therefore, the accuracy required by the embodiment of the present invention does not need to reach the pixel level. Based on this, the dimension-reduced feature map is divided so that, during cropping, the grid region is used as the smallest unit; there is no need to be accurate to the pixel level, which reduces the amount of data processed during image cropping and further improves cropping performance.
Optionally, when implementing step S401, the feature map is divided into equal parts along the length and the width of the dimension-reduced feature map to obtain multiple grid regions; that is, the dimension-reduced feature map is divided evenly along its length direction and evenly along its width direction. In each grid region obtained by such equal division, the feature information of the pixels is roughly the same, which helps improve cropping accuracy. For example, the dimension-reduced feature map can be divided evenly into 16*16 or 12*12 grid regions; in the embodiment shown in FIG. 5, the dimension-reduced feature map is divided evenly into 8*10 grid regions. It can be understood that, when implementing step S401, the dimension-reduced feature map may also be divided in a non-equal manner.
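The following sketch divides a dimension-reduced feature map into equal grid regions along its length and width; the 8*10 grid matches the FIG. 5 example, while the tensor layout of the result and the function name are assumptions of the sketch.

```python
import torch

def divide_into_grid(fmap: torch.Tensor, rows: int, cols: int) -> torch.Tensor:
    """Split an (N, C, H, W) feature map into rows*cols equal grid regions.

    Returns a tensor of shape (N, rows, cols, C, H//rows, W//cols).
    Equal division is assumed, so H and W must be multiples of rows and cols.
    """
    n, c, h, w = fmap.shape
    assert h % rows == 0 and w % cols == 0, "equal division assumed"
    cells = fmap.reshape(n, c, rows, h // rows, cols, w // cols)
    return cells.permute(0, 2, 4, 1, 3, 5)   # (N, rows, cols, C, h_cell, w_cell)

grid = divide_into_grid(torch.randn(1, 8, 64, 80), rows=8, cols=10)
print(grid.shape)  # torch.Size([1, 8, 10, 8, 8, 8])
```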
Optionally, the dimension-reduced feature map is input into a pre-trained grid-division model, and the grid-division model divides the feature map along the length and width directions of the dimension-reduced feature map to obtain the multiple grid regions. It can be understood that the feature map may instead be divided along its length and width directions by a grid-division algorithm to obtain the multiple grid regions.
The preset prior conditions can be designed as required. FIG. 6 shows one implementation, in an embodiment of the present invention, of cropping the dimension-reduced feature map according to the multiple grid regions and the preset prior conditions. As shown in FIG. 6, this implementation may include:
S601:以网格区域为最小特征提取单位,提取每个网格区域的特征信息;S601: Using the grid area as the minimum feature extraction unit, extract feature information of each grid area;
Optionally, the feature information of any one pixel in each grid region, such as the centre pixel, is extracted and used as the feature information of that grid region; optionally, the feature information of some of the pixels (at least two pixels) in each grid region is extracted, and the feature information of the grid region is determined from the extracted feature information of those pixels, for example by taking their mean as the feature information of the grid region.
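A sketch of the two options above for reducing a grid region to a single piece of feature information; the cell tensor layout follows the grid-division sketch earlier, and the function name is our own.

```python
import torch

def grid_features(cells: torch.Tensor, how: str = "mean") -> torch.Tensor:
    """Reduce each grid region to one feature vector.

    `cells` has shape (N, rows, cols, C, h_cell, w_cell), e.g. the output of
    the divide_into_grid sketch above. "mean" averages the pixels of a cell;
    "center" takes the cell's centre pixel, matching the options described
    in the paragraph above."""
    if how == "mean":
        return cells.mean(dim=(-2, -1))          # (N, rows, cols, C)
    if how == "center":
        h, w = cells.shape[-2] // 2, cells.shape[-1] // 2
        return cells[..., h, w]                  # (N, rows, cols, C)
    raise ValueError("how must be 'mean' or 'center'")
```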
S602:根据网格区域的特征信息,提取多个网格区域中的部分网格区域的特征信息,部分网格区域的特征信息至少包括原始图像中兴趣区域的特征信息;S602: According to the feature information of the grid area, extract feature information of a part of the grid area among the multiple grid areas, and the feature information of the part of the grid area includes at least the feature information of the region of interest in the original image;
需要说明的是,原始图像中兴趣区域是指包含目标主体的目标区域,可选择现有目标检测算法确定原始图像中的目标区域。It should be noted that the region of interest in the original image refers to the target region containing the target subject, and the existing target detection algorithm can be selected to determine the target region in the original image.
S603:将部分网格区域确定为目标裁剪区域。S603: Determine a part of the grid area as a target cropping area.
Based on the above cropping approach, cropping results with different resolutions and aspect ratios can be obtained, meeting different user needs. Optionally, based on the above cropping approach, multiple rectangular target cropping regions with different resolutions and aspect ratios are obtained.
The multiple target cropping regions determined in S603 can be further screened to remove those that do not meet the requirements. Optionally, in some embodiments, the process of cropping the dimension-reduced feature map according to the multiple grid regions and the preset prior conditions further includes: determining, in the feature map, the restricted regions for the two end points of one diagonal of the rectangular target cropping region. The two restricted regions are distributed on the two sides of the same diagonal of the feature map, to ensure as far as possible that the target cropping region contains the target subject, and each restricted region includes at least one grid region. Optionally, the restricted regions include a first restricted region (M1 in FIG. 5) and a second restricted region (M2 in FIG. 5); the first restricted region is used to restrict the position of the upper-left corner of the target cropping region in the feature map, and the second restricted region is used to restrict the position of the lower-right corner of the target cropping region in the feature map. Optionally, the restricted regions include a third restricted region and a fourth restricted region; the third restricted region is used to restrict the position of the upper-right corner of the target cropping region in the feature map, and the fourth restricted region is used to restrict the position of the lower-left corner of the target cropping region in the feature map.
In this embodiment, each restricted region includes multiple grid regions, so that multiple target cropping regions can be obtained. In the embodiment shown in FIG. 5, the centre pixel of any grid region in the first restricted region is taken as the upper-left vertex of a target cropping region, and the centre pixel of any grid region in the second restricted region is taken as the lower-right vertex of that target cropping region, yielding multiple target cropping regions, such as the box formed by the dotted line in FIG. 5.
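A sketch of the candidate-box enumeration just described, pairing grid-region centres from the two restricted regions; the (x1, y1, x2, y2) coordinate convention and the geometric validity check are assumptions of the sketch.

```python
from itertools import product

def candidate_boxes(m1_cells, m2_cells):
    """Enumerate candidate crop boxes from the two restricted regions.

    m1_cells / m2_cells are lists of (x, y) centre coordinates of the grid
    regions inside the first and second restricted regions (M1 and M2 in
    FIG. 5). Each pairing of an M1 centre (upper-left corner) with an M2
    centre (lower-right corner) gives one candidate box; only geometrically
    valid pairs are kept."""
    boxes = []
    for (x1, y1), (x2, y2) in product(m1_cells, m2_cells):
        if x2 > x1 and y2 > y1:
            boxes.append((x1, y1, x2, y2))
    return boxes
```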
In some embodiments, the process of cropping the dimension-reduced feature map according to the multiple grid regions and the preset prior conditions further includes: requiring that the aspect ratio of the target cropping region satisfy a preset aspect-ratio strategy. Optionally, the aspect ratio of the target cropping region is used to indicate that the target cropping region is a rectangular region, which matches conventional composition requirements; typically, the aspect ratio of a rectangular target cropping region is 1:3, i.e. the preset aspect-ratio strategy is that the aspect ratio of the target cropping region is 1:3. Optionally, the aspect ratio of the target cropping region is used to indicate that the target cropping region is a square region.
In some embodiments, the process of cropping the dimension-reduced feature map according to the multiple grid regions and the preset prior conditions further includes: requiring that the area proportion of the target cropping region be greater than or equal to a preset proportion threshold, where the area proportion of the target cropping region is the ratio of the area of the target cropping region to the area of the feature map. The preset proportion threshold can be set as required, for example to 1/4, i.e. the area proportion of the target cropping region must be greater than or equal to 1/4, so that the cropping result better matches composition requirements.
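The aspect-ratio and area-proportion priors described above can be applied to the candidate boxes as a simple filter; the tolerance on the aspect ratio and the helper name are assumptions of the sketch, since the text only names the priors themselves.

```python
def filter_boxes(boxes, fmap_w, fmap_h,
                 aspect_ratio=None, ratio_tol=0.05, min_area_fraction=0.25):
    """Apply the prior conditions above to a list of (x1, y1, x2, y2) boxes.

    `aspect_ratio` is the preset width/height target (None to skip the check),
    `min_area_fraction` is the preset proportion threshold (1/4 in the text).
    """
    kept = []
    for x1, y1, x2, y2 in boxes:
        w, h = x2 - x1, y2 - y1
        if w <= 0 or h <= 0:
            continue
        if aspect_ratio is not None and abs(w / h - aspect_ratio) > ratio_tol:
            continue
        if w * h < min_area_fraction * fmap_w * fmap_h:
            continue
        kept.append((x1, y1, x2, y2))
    return kept
```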
在某些实施例中,可以根据上述三种进一步筛选策略中任意两个的组合或者上述三种进一步筛选策略的组合来去除不满足需求的目标裁剪区域。In some embodiments, the target cropping area that does not meet the requirements can be removed according to a combination of any two of the above three further screening strategies or a combination of the above three further screening strategies.
请再次结合图2以及图3,在某些实施例中,在全连接层30中对降维后的特征图进行裁剪处理,提高图像裁剪的性能。Please combine FIG. 2 and FIG. 3 again. In some embodiments, the feature map after the dimension reduction is cropped in the fully connected layer 30 to improve the performance of image cropping.
In addition, in some embodiments, the image cropping method may further include: after the dimension-reduced feature map has been cropped, feeding the feature information of the target cropping region back into the convolutional neural network 10. This feedback increases the complexity of the network and allows the processing result of the whole network to be optimized. Specifically, the feature information of the target cropping region is used as an input of the convolutional neural network 10.
The image cropping method of the embodiment of the present invention extracts feature maps from multiple pieces of original image information with different distance features, i.e. it extracts the feature information of objects of various sizes, so that the feature information of the original image is obtained accurately and fully; this helps model the image cropping result precisely and allows the method to adapt to more complex application scenarios. In addition, the feature map is dimension-reduced before the cropping processing, which improves cropping performance and reduces the parameter size of the cropping network, making the method suitable for low-power chips. It has been verified that the network structure of the embodiment of the present invention needs less than 10 megabytes of parameters to achieve performance comparable to large networks of several hundred megabytes such as VGG16, which greatly improves practicality.
对应于上述实施例的图像裁剪方法,本发明实施例还提供一种图像裁剪装置,参见图7,所述图像裁剪装置100包括:存储装置110和一个或多个处理器120。Corresponding to the image cropping method of the foregoing embodiment, an embodiment of the present invention also provides an image cropping device. Referring to FIG. 7, the image cropping device 100 includes a storage device 110 and one or more processors 120.
The storage device 110 is used to store program instructions; the one or more processors 120 call the program instructions stored in the storage device 110, and when the program instructions are executed, the one or more processors 120 are individually or jointly configured to: extract multiple pieces of original image information with different distance features; extract a feature map of the original image information on the distance scale of each piece of original image information; reduce the dimension of the feature map; and perform cropping processing on the dimension-reduced feature map.
处理器120可以实现如本发明图1、图4以及图6所示实施例的图像处理方法,可参见上述实施例的图像裁剪方法对本实施例的图像裁剪装置100进行说明。The processor 120 may implement the image processing method in the embodiments shown in FIG. 1, FIG. 4, and FIG. 6 of the present invention. The image cropping apparatus 100 in this embodiment can be described with reference to the image cropping method in the foregoing embodiment.
It should be noted that the image cropping device 100 of this embodiment may be a device with image processing capability such as a computer, or a photographing device with an imaging function, such as a camera, a video camera, a smart phone, a smart terminal, a shooting stabilizer, an unmanned aerial vehicle, and so on.
对应于上述实施例的图像裁剪方法,本发明实施例还提供一种拍摄装置,参见图8,拍摄装置200包括:图像采集模块210、存储装置220和一个或多个处理器230。Corresponding to the image cropping method of the foregoing embodiment, an embodiment of the present invention also provides a photographing device. Referring to FIG. 8, the photographing device 200 includes: an image acquisition module 210, a storage device 220, and one or more processors 230.
The image acquisition module 210 is used to obtain the original image; the storage device 220 is used to store program instructions; the one or more processors 230 call the program instructions stored in the storage device 220, and when the program instructions are executed, the one or more processors 230 are individually or jointly configured to: extract multiple pieces of original image information with different distance features; extract a feature map of the original image information on the distance scale of each piece of original image information; reduce the dimension of the feature map; and perform cropping processing on the dimension-reduced feature map.
可选的,图像采集模块210包括镜头和与镜头相配合的成像传感器,如CCD、CMOS等图像传感器。Optionally, the image acquisition module 210 includes a lens and an imaging sensor matched with the lens, such as image sensors such as CCD and CMOS.
处理器230可以实现如本发明图1、图4以及图6所示实施例的图像处理方法,可参见上述实施例的图像裁剪方法对本实施例的拍摄装置200进行说明。The processor 230 may implement the image processing method in the embodiments shown in FIG. 1, FIG. 4, and FIG. 6 of the present invention. The image cutting method in the foregoing embodiment may be referred to for description of the photographing apparatus 200 in this embodiment.
该拍摄装置200可为带有摄像功能的照相机,摄像机,智能手机,智能终端,拍摄稳定器(如手持云台),无人飞行器(如无人机)等等。The photographing device 200 can be a camera with a photographing function, a video camera, a smart phone, a smart terminal, a photographing stabilizer (such as a handheld PTZ), an unmanned aerial vehicle (such as a drone), and so on.
本发明实施例提供一种无人机,参见图9,所述无人机300包括:图像采集模块310、存储装置320和一个或多个处理器330。An embodiment of the present invention provides an unmanned aerial vehicle. Referring to FIG. 9, the unmanned aerial vehicle 300 includes: an image acquisition module 310, a storage device 320 and one or more processors 330.
The image acquisition module 310 is used to obtain the original image; the storage device 320 is used to store program instructions; the one or more processors 330 call the program instructions stored in the storage device 320, and when the program instructions are executed, the one or more processors 330 are individually or jointly configured to: extract multiple pieces of original image information with different distance features; extract a feature map of the original image information on the distance scale of each piece of original image information; reduce the dimension of the feature map; and perform cropping processing on the dimension-reduced feature map.
本实施例的图像采集模块310可以为相机,也可以为镜头和成像传感器(如CCD、CMOS等)组合形成的具有拍摄功能的结构。The image acquisition module 310 in this embodiment may be a camera, or may be a structure with a shooting function formed by a combination of a lens and an imaging sensor (such as CCD, CMOS, etc.).
处理器330可以实现如本发明图1、图4以及图6所示实施例的图像处理方法,可参见上述实施例的图像裁剪方法对本实施例的无人机300进行说明。The processor 330 may implement the image processing method in the embodiments shown in FIG. 1, FIG. 4, and FIG. 6 of the present invention. The UAV 300 in this embodiment can be described with reference to the image cropping method in the foregoing embodiment.
需要说明的是,本发明实施例的无人机300是指航拍无人机,其他不具有摄像功能的无人机不属于本实施例的保护主体。It should be noted that the drone 300 in this embodiment of the present invention refers to an aerial photography drone, and other drones that do not have a camera function do not belong to the protection subject of this embodiment.
所述无人机300可为多旋翼无人机,也可为固定翼无人机,本发明实施例对无人机300的类型不作具体限定。The UAV 300 may be a multi-rotor UAV or a fixed-wing UAV. The embodiment of the present invention does not specifically limit the type of the UAV 300.
Further, the image acquisition module 310 may be mounted on the fuselage (not shown) via a pan/tilt (not shown), and the pan/tilt stabilizes the image acquisition module 310; the pan/tilt may be a two-axis pan/tilt or a three-axis pan/tilt, which is not specifically limited in the embodiment of the present invention.
本发明实施例还提供一种移动终端,参见图10,所述移动终端400包括:图像采集模块410、存储装置420和一个或多个处理器430。An embodiment of the present invention also provides a mobile terminal. Referring to FIG. 10, the mobile terminal 400 includes: an image acquisition module 410, a storage device 420, and one or more processors 430.
The image acquisition module 410 is used to obtain the original image; the storage device 420 is used to store program instructions; the one or more processors 430 call the program instructions stored in the storage device 420, and when the program instructions are executed, the one or more processors 430 are individually or jointly configured to: extract multiple pieces of original image information with different distance features; extract a feature map of the original image information on the distance scale of each piece of original image information; reduce the dimension of the feature map; and perform cropping processing on the dimension-reduced feature map.
本实施例的图像采集模块410为移动终端400自带的摄像头。The image acquisition module 410 in this embodiment is a camera built in the mobile terminal 400.
该移动终端400可为手机或平板电脑等智能移动终端。The mobile terminal 400 may be a smart mobile terminal such as a mobile phone or a tablet computer.
处理器430可以实现如本发明图1、图4以及图6所示实施例的图像处理方法,可参见上述实施例的图像裁剪方法对本实施例的移动终端400进行说明。The processor 430 may implement the image processing method of the embodiment shown in FIG. 1, FIG. 4, and FIG. 6 of the present invention. The mobile terminal 400 of this embodiment can be described with reference to the image cropping method of the foregoing embodiment.
本发明实施例还提供一种手持云台,参见图11,所述手持云台500包括:图像采集模块510、存储装置520和一个或多个处理器530。The embodiment of the present invention also provides a handheld pan/tilt. Referring to FIG. 11, the handheld pan/tilt 500 includes: an image acquisition module 510, a storage device 520, and one or more processors 530.
The image acquisition module 510 is used to obtain the original image; the storage device 520 is used to store program instructions; the one or more processors 530 call the program instructions stored in the storage device 520, and when the program instructions are executed, the one or more processors 530 are individually or jointly configured to: extract multiple pieces of original image information with different distance features; extract a feature map of the original image information on the distance scale of each piece of original image information; reduce the dimension of the feature map; and perform cropping processing on the dimension-reduced feature map.
本实施例的图像采集模块510可以为相机,也可以为镜头和成像传感器(如CCD、CMOS等)组合形成的具有拍摄功能的结构。The image acquisition module 510 in this embodiment may be a camera, or a structure with a photographing function formed by a combination of a lens and an imaging sensor (such as CCD, CMOS, etc.).
The processor 530 can implement the image processing methods of the embodiments shown in FIG. 1, FIG. 4 and FIG. 6 of the present invention; the handheld pan/tilt 500 of this embodiment can be described with reference to the image cropping method of the foregoing embodiments.
需要说明的是,本发明实施例的手持云台500是指带有摄像功能的云台,其他不具有摄像功能的云台不属于本实施例的保护主体。It should be noted that the handheld pan/tilt 500 in this embodiment of the present invention refers to a pan/tilt with a camera function, and other pan/tilts without a camera function do not belong to the protection subject of this embodiment.
The foregoing storage device may include a volatile memory, such as a random-access memory (RAM); the storage device may also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the storage device 110 may also include a combination of the foregoing types of memory.
It should be understood that, in the embodiments of the present invention, the processor may be a central processing unit (CPU). The processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
In addition, an embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the image cropping method of the foregoing embodiments are implemented. Specifically, when the program is executed by the processor, the following steps are implemented: extracting multiple pieces of original image information with different distance features; extracting a feature map of the original image information on the distance scale of each piece of original image information; reducing the dimension of the feature map; and performing cropping processing on the dimension-reduced feature map.
所述计算机可读存储介质可以是前述任一实施例所述的云台的内部存储单元,例如硬盘或内存。所述计算机可读存储介质也可以是云台的外部存储设备,例如所述设备上配备的插接式硬盘、智能存储卡(Smart Media Card,SMC)、SD卡、闪存卡(Flash Card)等。进一步的,所述计算机可读存储介质还可以既包括云台的内部存储单元也包括外部存储设备。所述计算机可读存储介质用于存储所述计算机程序以及所述云台所需的其他程序和数据,还可以用于暂时地存储已经输出或者将要输出的数据。The computer-readable storage medium may be the internal storage unit of the pan/tilt head described in any of the foregoing embodiments, such as a hard disk or a memory. The computer-readable storage medium may also be an external storage device of the pan-tilt, such as a plug-in hard disk, a smart media card (SMC), an SD card, a flash card (Flash Card), etc. equipped on the device . Further, the computer-readable storage medium may also include both an internal storage unit of the pan-tilt and an external storage device. The computer-readable storage medium is used to store the computer program and other programs and data required by the pan/tilt, and can also be used to temporarily store data that has been output or will be output.
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的程序可存储于一计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,所述的存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)或随机存储记忆体(Random Access Memory,RAM)等。A person of ordinary skill in the art can understand that all or part of the processes in the above-mentioned embodiment methods can be implemented by instructing relevant hardware through a computer program. The program can be stored in a computer readable storage medium. During execution, it may include the procedures of the above-mentioned method embodiments. Wherein, the storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), etc.
以上所揭露的仅为本发明部分实施例而已,当然不能以此来限定本发明之权利范围,因此依本发明权利要求所作的等同变化,仍属本发明所涵盖的范围。The above-disclosed are only some embodiments of the present invention, which of course cannot be used to limit the scope of rights of the present invention. Therefore, equivalent changes made according to the claims of the present invention still fall within the scope of the present invention.

Claims (40)

  1. 一种图像裁剪方法,其特征在于,所述方法包括:An image cropping method, characterized in that the method includes:
    提取具有不同距离特征的多个原始图像信息;Extract multiple original image information with different distance features;
    在每个原始图像信息的距离尺度上提取所述原始图像信息的特征图;Extracting a feature map of the original image information on the distance scale of each original image information;
    降低所述特征图的维度;Reduce the dimension of the feature map;
    对所述降维后的特征图进行裁剪处理。The feature map after dimensionality reduction is cropped.
  2. 根据权利要求1所述的方法,其特征在于,所述在每个原始图像信息的距离尺度上提取所述原始图像信息的特征图,包括:The method according to claim 1, wherein the extracting the feature map of the original image information on the distance scale of each original image information comprises:
    基于预先训练的卷积神经网络在每个原始图像信息的距离尺度上提取所述原始图像信息的特征图。The feature map of the original image information is extracted on the distance scale of each original image information based on the pre-trained convolutional neural network.
  3. 根据权利要求2所述的方法,其特征在于,所述卷积神经网络包括多个子网络,所述子网络的输入为对应距离特征的原始图像信息;The method according to claim 2, wherein the convolutional neural network comprises a plurality of sub-networks, and the input of the sub-networks is original image information corresponding to the distance feature;
    每个子网络用于提取对应距离特征的原始图像信息的特征图。Each sub-network is used to extract the feature map of the original image information corresponding to the distance feature.
  4. 根据权利要求3所述的方法,其特征在于,所述子网络包括依次连接的多个第一网络层和设于所述第一网络层之后的至少一个第二网络层,所述第一网络层用于下采样,所述第二网络层用于上采样;The method according to claim 3, wherein the sub-network comprises a plurality of first network layers connected in sequence and at least one second network layer arranged after the first network layer, and the first network Layer is used for downsampling, and the second network layer is used for upsampling;
    并且,所述第一网络层的数量大于所述第二网络层的数量。And, the number of the first network layer is greater than the number of the second network layer.
  5. 根据权利要求4所述的方法,其特征在于,所述子网络的任一第一网络层的输出作为其他任一子网络的任一第一网络层和/或第二网络层的输入;和/或The method according to claim 4, wherein the output of any first network layer of the sub-network is used as the input of any first network layer and/or the second network layer of any other sub-network; and /or
    所述子网络的任一第二网络层的输出作为其他任一子网络的任一第一网络层和/或第二网络层的输入。The output of any second network layer of the sub-network is used as the input of any first network layer and/or the second network layer of any other sub-network.
  6. 根据权利要求4所述的方法,其特征在于,所述子网络包括多个所述第二网络层。The method according to claim 4, wherein the sub-network includes a plurality of the second network layers.
  7. 根据权利要求4所述的方法,其特征在于,所述第一网络层包括卷积层和池化层,所述第二网络层包括反卷积层。The method according to claim 4, wherein the first network layer includes a convolution layer and a pooling layer, and the second network layer includes a deconvolution layer.
  8. The method according to claim 7, wherein multiple sub-networks share weight parameters, and the stride of the convolution layer of the first one of the first network layers of a sub-network differs from the stride of the convolution layer of the first one of the first network layers of the other sub-networks, and/or the stride of the pooling layer of the first one of the first network layers of a sub-network differs from the stride of the pooling layer of the first one of the first network layers of the other sub-networks, so that the feature maps output by the multiple sub-networks have the same distance size.
  9. 根据权利要求2所述的方法,其特征在于,所述对所述降维后的特征图进行裁剪处理之后,所述方法还包括:The method according to claim 2, characterized in that, after the cutting process is performed on the feature map after dimensionality reduction, the method further comprises:
    将目标裁剪区域的特征信息反馈至所述卷积神经网络中。The feature information of the target cropped area is fed back to the convolutional neural network.
  10. 根据权利要求1或2所述的方法,其特征在于,所述提取具有不同距离特征的多个原始图像信息,包括:The method according to claim 1 or 2, wherein the extracting multiple original image information with different distance characteristics comprises:
    将原始图像输入预先训练的卷积神经网络中;Input the original image into the pre-trained convolutional neural network;
    由所述卷积神经网络对所述原始图像进行下采样处理,获得具有不同距离特征的多个原始图像信息。The convolutional neural network performs down-sampling processing on the original image to obtain multiple original image information with different distance characteristics.
  11. 根据权利要求1所述的方法,其特征在于,所述对所述降维后的特征图进行裁剪处理,包括:The method according to claim 1, wherein the cutting the feature map after dimensionality reduction comprises:
    分别沿着所述降维后的特征图的长度及宽度方向对所述特征图进行划分,得到多个网格区域,每个网格区域包括多个像素;Dividing the feature map along the length and width directions of the feature map after dimensionality reduction to obtain a plurality of grid regions, each grid region including a plurality of pixels;
    根据多个所述网格区域以及预设先验条件,对所述降维后的特征图进行裁剪处理。According to the plurality of grid regions and preset prior conditions, the feature map after the dimensionality reduction is cropped.
  12. 根据权利要求11所述的方法,其特征在于,所述分别沿着所述降维后的特征图的长度及宽度方向对所述特征图进行划分,得到多个网格区域,包括:The method according to claim 11, wherein the dividing the feature map along the length and width directions of the feature map after dimensionality reduction respectively to obtain multiple grid regions comprises:
    分别沿着所述降维后的特征图的长度及宽度方向对所述特征图进行等分,得到多个网格区域。The feature map is equally divided along the length and width directions of the feature map after dimensionality reduction to obtain multiple grid regions.
  13. 根据权利要求11所述的方法,其特征在于,所述根据多个所述网格区域以及预设先验条件,对所述降维后的特征图进行裁剪处理,包括:The method according to claim 11, wherein the cutting the dimensionality-reduced feature map according to a plurality of the grid regions and preset prior conditions comprises:
    以所述网格区域为最小特征提取单位,提取每个网格区域的特征信息;Taking the grid area as the minimum feature extraction unit, extracting feature information of each grid area;
    根据所述网格区域的特征信息,提取多个所述网格区域中的部分网格区域的特征信息,所述部分网格区域的特征信息至少包括原始图像中兴趣区域的特征信息;Extracting feature information of a part of the grid area of the plurality of grid areas according to the feature information of the grid area, the feature information of the part of the grid area at least including the feature information of the interest area in the original image;
    将所述部分网格区域确定为目标裁剪区域。The partial grid area is determined as the target cropping area.
  14. 根据权利要求13所述的方法,其特征在于,所述根据多个所述网格区域以及预设先验条件,对所述降维后的特征图进行裁剪处理,进一步包括以下至少一种:The method according to claim 13, wherein the cutting the dimensionality-reduced feature map according to a plurality of the grid regions and preset prior conditions further comprises at least one of the following:
    确定方形的目标裁剪区域的其中一条对角线的两个端点在所述特征图中的限制区域,其中,两个限制区域分布在所述特征图的同一对角线的两侧,且所述限制区域包括至少一个网格区域;It is determined that two end points of one of the diagonals of the square target cropping area are in the restricted area of the feature map, wherein the two restricted areas are distributed on both sides of the same diagonal of the feature map, and the The restricted area includes at least one grid area;
    所述目标裁剪区域的长宽比满足预设长宽比策略;The aspect ratio of the target cropping area satisfies a preset aspect ratio strategy;
    所述目标裁剪区域的面积占比大于或等于预设占比阈值,其中,所述目标裁剪区域的面积占比为所述目标裁剪区域的面积与所述特征图的面积的比例。The area proportion of the target cropping region is greater than or equal to a preset proportion threshold, wherein the area proportion of the target cropping region is a ratio of the area of the target cropping region to the area of the feature map.
  15. 根据权利要求14所述的方法,其特征在于,所述限制区域包括第一限制区域和第二限制区域,所述第一限制区域用于限制所述目标裁剪区域的左上角在所述特征图中的位置,所述第二限制区域用于限制所述目标裁剪区域的右下角在所述特征图中的位置。The method according to claim 14, wherein the restricted area includes a first restricted area and a second restricted area, and the first restricted area is used to restrict the upper left corner of the target crop area to be in the feature map. The second restriction area is used to restrict the position of the lower right corner of the target crop area in the feature map.
  16. 根据权利要求14所述的方法,其特征在于,所述目标裁剪区域的长宽比满足预设长宽比策略,包括:The method according to claim 14, wherein the aspect ratio of the target cropping area satisfies a preset aspect ratio strategy, comprising:
    所述目标裁剪区域的长宽比用于指示所述目标裁剪区域为长方形区域。The aspect ratio of the target cropping area is used to indicate that the target cropping area is a rectangular area.
  17. 根据权利要求1或2所述的方法,其特征在于,在全连接层中对所述降维后的特征图进行裁剪处理。The method according to claim 1 or 2, characterized in that the feature map after the dimension reduction is cropped in a fully connected layer.
  18. 根据权利要求1或2所述的方法,其特征在于,所述降低所述特征图的维度,包括:The method according to claim 1 or 2, wherein the reducing the dimension of the feature map comprises:
    将所述特征图输入1*1的卷积层进行降维处理。The feature map is input into a 1*1 convolutional layer for dimensionality reduction processing.
  19. 一种图像裁剪装置,其特征在于,所述装置包括:An image cropping device, characterized in that the device includes:
    存储装置,用于存储程序指令;Storage device for storing program instructions;
    一个或多个处理器,调用所述存储装置中存储的程序指令,当所述程序指令被执行时,所述一个或多个处理器单独地或共同地被配置成用于:One or more processors call program instructions stored in the storage device, and when the program instructions are executed, the one or more processors are individually or collectively configured to:
    提取具有不同距离特征的多个原始图像信息;Extract multiple original image information with different distance features;
    在每个原始图像信息的距离尺度上提取所述原始图像信息的特征图;Extracting a feature map of the original image information on the distance scale of each original image information;
    降低所述特征图的维度;Reduce the dimension of the feature map;
    对所述降维后的特征图进行裁剪处理。The feature map after dimensionality reduction is cropped.
  20. 根据权利要求19所述的装置,其特征在于,所述一个或多个处理器单独地或共同地被进一步配置成用于:The apparatus according to claim 19, wherein the one or more processors are separately or collectively further configured to:
    基于预先训练的卷积神经网络在每个原始图像信息的距离尺度上提取所述原始图像信息的特征图。The feature map of the original image information is extracted on the distance scale of each original image information based on the pre-trained convolutional neural network.
  21. 根据权利要求20所述的装置,其特征在于,所述卷积神经网络包括多个子网络,所述子网络的输入为对应距离特征的原始图像信息;The device according to claim 20, wherein the convolutional neural network comprises a plurality of sub-networks, and the input of the sub-networks is original image information corresponding to the distance feature;
    每个子网络用于提取对应距离特征的原始图像信息的特征图。Each sub-network is used to extract the feature map of the original image information corresponding to the distance feature.
  22. 根据权利要求21所述的装置,其特征在于,所述子网络包括依次连接的多个第一网络层和设于所述第一网络层之后的至少一个第二网络层,所述第一网络层用于下采样,所述第二网络层用于上采样;The device according to claim 21, wherein the sub-network comprises a plurality of first network layers connected in sequence and at least one second network layer disposed after the first network layer, and the first network Layer is used for downsampling, and the second network layer is used for upsampling;
    并且,所述第一网络层的数量大于所述第二网络层的数量。And, the number of the first network layer is greater than the number of the second network layer.
  23. 根据权利要求22所述的装置,其特征在于,所述子网络的任一第一网络层的输出作为其他任一子网络的任一第一网络层和/或第二网络层的输入;和/或The device according to claim 22, wherein the output of any first network layer of the sub-network is used as the input of any first network layer and/or the second network layer of any other sub-network; and /or
    所述子网络的任一第二网络层的输出作为其他任一子网络的任一第一网络层和/或第二网络层的输入。The output of any second network layer of the sub-network is used as the input of any first network layer and/or the second network layer of any other sub-network.
  24. 根据权利要求22所述的装置,其特征在于,所述子网络包括多个所述第二网络层。The apparatus according to claim 22, wherein the sub-network includes a plurality of the second network layers.
  25. 根据权利要求22所述的装置,其特征在于,所述第一网络层包括卷积层和池化层,所述第二网络层包括反卷积层。The apparatus according to claim 22, wherein the first network layer includes a convolutional layer and a pooling layer, and the second network layer includes a deconvolution layer.
  26. The device according to claim 25, wherein multiple sub-networks share weight parameters, and the stride of the convolution layer of the first one of the first network layers of a sub-network differs from the stride of the convolution layer of the first one of the first network layers of the other sub-networks, and/or the stride of the pooling layer of the first one of the first network layers of a sub-network differs from the stride of the pooling layer of the first one of the first network layers of the other sub-networks, so that the feature maps output by the multiple sub-networks have the same distance size.
  27. 根据权利要求20所述的装置,其特征在于,所述对所述降维后的特征图进行裁剪处理之后,所述一个或多个处理器单独地或共同地被进一步配置成用于:The apparatus according to claim 20, wherein after the cutting process is performed on the feature map after dimensionality reduction, the one or more processors are separately or collectively further configured to:
    将目标裁剪区域的特征信息反馈至所述卷积神经网络中。The feature information of the target cropped area is fed back to the convolutional neural network.
  28. 根据权利要求19或20所述的装置,其特征在于,所述一个或多个处理器单独地或共同地被进一步配置成用于:The device according to claim 19 or 20, wherein the one or more processors are separately or collectively further configured to:
    将原始图像输入预先训练的卷积神经网络中;Input the original image into the pre-trained convolutional neural network;
    由所述卷积神经网络对所述原始图像进行下采样处理,获得具有不同距离特征的多个原始图像信息。The convolutional neural network performs down-sampling processing on the original image to obtain multiple original image information with different distance characteristics.
  29. 根据权利要求19所述的装置,其特征在于,所述一个或多个处理器单独地或共同地被进一步配置成用于:The apparatus according to claim 19, wherein the one or more processors are separately or collectively further configured to:
    分别沿着所述降维后的特征图的长度及宽度方向对所述特征图进行划分,得到多个网格区域,每个网格区域包括多个像素;Dividing the feature map along the length and width directions of the feature map after dimensionality reduction to obtain a plurality of grid regions, each grid region including a plurality of pixels;
    根据多个所述网格区域以及预设先验条件,对所述降维后的特征图进行裁剪处理。According to the plurality of grid regions and preset prior conditions, the feature map after the dimensionality reduction is cropped.
  30. 根据权利要求29所述的装置,其特征在于,所述一个或多个处理器单独地或共同地被进一步配置成用于:The apparatus according to claim 29, wherein the one or more processors are separately or collectively further configured to:
    分别沿着所述降维后的特征图的长度及宽度方向对所述特征图进行等分,得到多个网格区域。The feature map is equally divided along the length and width directions of the feature map after dimensionality reduction to obtain multiple grid regions.
  31. 根据权利要求29所述的装置,其特征在于,所述一个或多个处理器单独地或共同地被进一步配置成用于:The apparatus according to claim 29, wherein the one or more processors are separately or collectively further configured to:
    以所述网格区域为最小特征提取单位,提取每个网格区域的特征信息;Taking the grid area as the minimum feature extraction unit, extracting feature information of each grid area;
    根据所述网格区域的特征信息,提取多个所述网格区域中的部分网格区域的特征信息,所述部分网格区域的特征信息至少包括原始图像中兴趣区域的特征信息;Extracting feature information of a part of the grid area of the plurality of grid areas according to the feature information of the grid area, the feature information of the part of the grid area at least including the feature information of the interest area in the original image;
    将所述部分网格区域确定为目标裁剪区域。The partial grid area is determined as the target cropping area.
  32. 根据权利要求31所述的装置,其特征在于,所述一个或多个处理器单独地或共同地被进一步配置成用于执行以下至少一种:The device according to claim 31, wherein the one or more processors are separately or collectively further configured to perform at least one of the following:
    确定方形的目标裁剪区域的其中一条对角线的两个端点在所述特征图中的限制区域,其中,两个限制区域分布在所述特征图的同一对角线的两侧,且所述限制区域包括至少一个网格区域;It is determined that two end points of one of the diagonals of the square target cropping area are in the restricted area of the feature map, wherein the two restricted areas are distributed on both sides of the same diagonal of the feature map, and the The restricted area includes at least one grid area;
    所述目标裁剪区域的长宽比满足预设长宽比策略;The aspect ratio of the target cropping area satisfies a preset aspect ratio strategy;
    所述目标裁剪区域的面积占比大于或等于预设占比阈值,其中,所述目标裁剪区域的面积占比为所述目标裁剪区域的面积与所述特征图的面积的比例。The area proportion of the target cropping region is greater than or equal to a preset proportion threshold, wherein the area proportion of the target cropping region is a ratio of the area of the target cropping region to the area of the feature map.
  33. 根据权利要求32所述的装置,其特征在于,所述限制区域包括第一限制区域和第二限制区域,所述第一限制区域用于限制所述目标裁剪区域的左上角在所述特征图中的位置,所述第二限制区域用于限制所述目标裁剪区域的右下角在所述特征图中的位置。The device according to claim 32, wherein the restricted area comprises a first restricted area and a second restricted area, and the first restricted area is used to restrict the upper left corner of the target cropping area to be in the feature map. The second restriction area is used to restrict the position of the lower right corner of the target crop area in the feature map.
  34. 根据权利要求32所述的装置,其特征在于,所述目标裁剪区域的长宽比满足预设长宽比策略,包括:The device according to claim 32, wherein the aspect ratio of the target cropping area satisfies a preset aspect ratio strategy, comprising:
    所述目标裁剪区域的长宽比用于指示所述目标裁剪区域为长方形区域。The aspect ratio of the target cropping area is used to indicate that the target cropping area is a rectangular area.
  35. The device according to claim 19 or 20, wherein the one or more processors are individually or collectively further configured to: perform the cropping processing on the dimension-reduced feature map in a fully connected layer.
  36. The apparatus according to claim 19 or 20, wherein the one or more processors, individually or collectively, are further configured to:
    input the feature map into a 1*1 convolutional layer for dimensionality reduction processing.
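Claim 36 uses a 1*1 convolutional layer for dimensionality reduction: a 1*1 convolution acts as a per-pixel linear projection across channels, so it reduces the channel count while leaving the spatial resolution unchanged. The channel counts in the sketch below are illustrative assumptions.

```python
import torch
import torch.nn as nn

# 1x1 convolution: reduces channels (256 -> 8) without changing H and W.
reduce_1x1 = nn.Conv2d(in_channels=256, out_channels=8, kernel_size=1)

feature_map = torch.randn(1, 256, 56, 56)   # (batch, channels, H, W)
reduced = reduce_1x1(feature_map)
print(reduced.shape)                        # torch.Size([1, 8, 56, 56])
```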
  37. A photographing apparatus, wherein the photographing apparatus comprises:
    an image acquisition module, configured to obtain an original image;
    a storage device, configured to store program instructions; and
    one or more processors, configured to call the program instructions stored in the storage device, wherein, when the program instructions are executed, the one or more processors, individually or collectively, are configured to implement the method according to any one of claims 1 to 18.
  38. An unmanned aerial vehicle, wherein the unmanned aerial vehicle comprises:
    an image acquisition module, configured to obtain an original image;
    a storage device, configured to store program instructions; and
    one or more processors, configured to call the program instructions stored in the storage device, wherein, when the program instructions are executed, the one or more processors, individually or collectively, are configured to implement the method according to any one of claims 1 to 18.
  39. A mobile terminal, wherein the mobile terminal comprises:
    an image acquisition module, configured to obtain an original image;
    a storage device, configured to store program instructions; and
    one or more processors, configured to call the program instructions stored in the storage device, wherein, when the program instructions are executed, the one or more processors, individually or collectively, are configured to implement the method according to any one of claims 1 to 18.
  40. A handheld gimbal, wherein the handheld gimbal comprises:
    an image acquisition module, configured to obtain an original image;
    a storage device, configured to store program instructions; and
    one or more processors, configured to call the program instructions stored in the storage device, wherein, when the program instructions are executed, the one or more processors, individually or collectively, are configured to implement the method according to any one of claims 1 to 18.
PCT/CN2019/087999 2019-05-22 2019-05-22 Image cropping method and apparatus, and photographing apparatus WO2020232672A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980009520.XA CN111684488A (en) 2019-05-22 2019-05-22 Image cropping method and device and shooting device
PCT/CN2019/087999 WO2020232672A1 (en) 2019-05-22 2019-05-22 Image cropping method and apparatus, and photographing apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/087999 WO2020232672A1 (en) 2019-05-22 2019-05-22 Image cropping method and apparatus, and photographing apparatus

Publications (1)

Publication Number Publication Date
WO2020232672A1 true WO2020232672A1 (en) 2020-11-26

Family

ID=72433306

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/087999 WO2020232672A1 (en) 2019-05-22 2019-05-22 Image cropping method and apparatus, and photographing apparatus

Country Status (2)

Country Link
CN (1) CN111684488A (en)
WO (1) WO2020232672A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113012293A (en) * 2021-03-22 2021-06-22 平安科技(深圳)有限公司 Stone carving model construction method, device, equipment and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114742791A (en) * 2022-04-02 2022-07-12 深圳市国电科技通信有限公司 Auxiliary defect detection method and device for printed circuit board assembly and computer equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060115922A1 (en) * 2004-09-22 2006-06-01 Fuji Photo Film Co., Ltd. Photo movie creating method, apparatus, and program
US20130108171A1 (en) * 2011-10-28 2013-05-02 Raymond William Ptucha Image Recomposition From Face Detection And Facial Features
CN106296760A (en) * 2015-05-21 2017-01-04 腾讯科技(深圳)有限公司 The method of cutting out of picture and device
CN106650737A (en) * 2016-11-21 2017-05-10 中国科学院自动化研究所 Image automatic cutting method
CN107610131A (en) * 2017-08-25 2018-01-19 百度在线网络技术(北京)有限公司 A kind of image cropping method and image cropping device
CN108154464A (en) * 2017-12-06 2018-06-12 中国科学院自动化研究所 The method and device of picture automatic cutting based on intensified learning
CN108510504A (en) * 2018-03-22 2018-09-07 北京航空航天大学 Image partition method and device
CN109448001A (en) * 2018-10-26 2019-03-08 山东世纪开元电子商务集团有限公司 A kind of picture automatic cutting method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019046239A (en) * 2017-09-04 2019-03-22 大日本印刷株式会社 Image processing apparatus, image processing method, program, and image data for synthesis
CN108376386A (en) * 2018-03-23 2018-08-07 深圳天琴医疗科技有限公司 A kind of construction method and device of the super-resolution model of image
CN109166130B (en) * 2018-08-06 2021-06-22 北京市商汤科技开发有限公司 Image processing method and image processing device

Also Published As

Publication number Publication date
CN111684488A (en) 2020-09-18

Similar Documents

Publication Publication Date Title
EP4050558A1 (en) Image fusion method and apparatus, storage medium, and electronic device
US10121229B2 (en) Self-portrait enhancement techniques
WO2018201809A1 (en) Double cameras-based image processing device and method
JP7002056B2 (en) 3D model generator and 3D model generation method
WO2020192706A1 (en) Object three-dimensional model reconstruction method and device
EP3454250A1 (en) Facial image processing method and apparatus and storage medium
US9196071B2 (en) Image splicing method and apparatus
US9571819B1 (en) Efficient dense stereo computation
WO2017016050A1 (en) Image preview method, apparatus and terminal
US20190251675A1 (en) Image processing method, image processing device and storage medium
WO2020113408A1 (en) Image processing method and device, unmanned aerial vehicle, system, and storage medium
CN109474780B (en) Method and device for image processing
WO2019041276A1 (en) Image processing method, and unmanned aerial vehicle and system
US9106838B2 (en) Automatic photographing method and system thereof
WO2020258286A1 (en) Image processing method and device, photographing device and movable platform
WO2020024112A1 (en) Photography processing method, device and storage medium
WO2020232672A1 (en) Image cropping method and apparatus, and photographing apparatus
CN111292413A (en) Image model processing method and device, storage medium and electronic device
KR102262671B1 (en) Method and storage medium for applying bokeh effect to video images
US20220392027A1 (en) Method for calibrating image distortion, apparatus, electronic device and storage medium
CN116681636A (en) Light infrared and visible light image fusion method based on convolutional neural network
CN114466133B (en) Photographing method and device
CN112333468B (en) Image processing method, device, equipment and storage medium
WO2021168804A1 (en) Image processing method, image processing apparatus and image processing system
CN116051736A (en) Three-dimensional reconstruction method, device, edge equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19929769

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19929769

Country of ref document: EP

Kind code of ref document: A1