WO2020232672A1 - Image cropping method and apparatus, and photographing apparatus - Google Patents

Image cropping method and apparatus, and photographing apparatus

Info

Publication number
WO2020232672A1
WO2020232672A1 (application PCT/CN2019/087999, CN2019087999W)
Authority
WO
WIPO (PCT)
Prior art keywords
feature map
network
area
sub
original image
Prior art date
Application number
PCT/CN2019/087999
Other languages
French (fr)
Chinese (zh)
Inventor
曾辉
曹子晟
胡攀
Original Assignee
深圳市大疆创新科技有限公司 (SZ DJI Technology Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 (SZ DJI Technology Co., Ltd.)
Priority to CN201980009520.XA (published as CN111684488A)
Priority to PCT/CN2019/087999 (published as WO2020232672A1)
Publication of WO2020232672A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20016Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20132Image cropping

Definitions

  • the present invention relates to the field of image processing, in particular to an image cropping method, device and photographing device.
  • in many cases, the photos taken have unsatisfactory visual effects because the user had no time to compose the shot or lacks composition knowledge.
  • to address this, the photo can be re-composed by post-capture cropping to enhance its visual effect.
  • Toolkits currently on the market provide manual cropping functions.
  • manual cropping of photos requires users to have some knowledge of photographic composition, and when the number of photos is large the workload becomes very heavy.
  • the prior art also provides methods for automatically cropping photos.
  • the early automatic cropping algorithms are mainly designed based on the attention mechanism, aiming to capture the most important content or regions of interest in the photo.
  • this type of algorithm first obtains a saliency map of the photo and then uses a sliding window to select the most salient window as the cropping result; some algorithms also introduce face detection or visual interaction information as an aid.
  • because algorithms designed around the attention mechanism do not consider the overall composition of the photo, the cropping results they obtain often have mediocre visual quality.
  • the method based on aesthetic attributes focuses on the aesthetic attributes of photos and the composition rules commonly used in photography, models these attributes and rules through hand-designed features, and then learns an aesthetic classifier, such as a support vector machine, to choose the best result from many cropping candidates.
  • Data-driven algorithms mostly implement automatic cropping of photos by training an end-to-end deep neural network on the labeled data set.
  • most of these algorithms directly adopt standard neural network architectures that have been successful in other fields (such as image classification and object detection).
  • the deep models used contain hundreds of megabytes of parameters and require very powerful GPUs and high power consumption to run, which low-power products such as cameras cannot afford.
  • because the characteristics of photo cropping and composition themselves are not considered, the cropping results of these methods are not stable enough.
  • the invention provides an image cropping method, device and photographing device.
  • the present invention is implemented through the following technical solutions:
  • an image cropping method including:
  • the feature map after dimensionality reduction is cropped.
  • an image cropping device comprising:
  • Storage device for storing program instructions
  • One or more processors call program instructions stored in the storage device, and when the program instructions are executed, the one or more processors are individually or collectively configured to:
  • the feature map after dimensionality reduction is cropped.
  • a photographing device comprising:
  • Image acquisition module for obtaining original images
  • Storage device for storing program instructions
  • One or more processors call program instructions stored in the storage device, and when the program instructions are executed, the one or more processors are individually or collectively configured to:
  • the feature map after dimensionality reduction is cropped.
  • an unmanned aerial vehicle includes:
  • Image acquisition module for obtaining original images
  • Storage device for storing program instructions
  • One or more processors call program instructions stored in the storage device, and when the program instructions are executed, the one or more processors are individually or collectively configured to:
  • the feature map after dimensionality reduction is cropped.
  • a mobile terminal comprising:
  • Image acquisition module for obtaining original images
  • Storage device for storing program instructions
  • One or more processors call program instructions stored in the storage device, and when the program instructions are executed, the one or more processors are individually or collectively configured to:
  • the feature map after dimensionality reduction is cropped.
  • a handheld gimbal (pan/tilt) comprising:
  • Image acquisition module for obtaining original images
  • Storage device for storing program instructions
  • One or more processors call program instructions stored in the storage device, and when the program instructions are executed, the one or more processors are individually or collectively configured to:
  • the feature map after dimensionality reduction is cropped.
  • the present invention extracts feature maps from multiple pieces of original image information with different distance characteristics, that is, it extracts feature information of objects of various sizes, so that the feature information of the original image is obtained accurately and completely.
  • this helps to model the image cropping result precisely, so that the image cropping method can adapt to more complex application scenarios; in addition, reducing the dimensionality of the feature map and then cropping the dimensionality-reduced feature map improves cropping performance and
  • reduces the parameter size of image cropping, so that the image cropping method is suitable for chips with lower power consumption, which greatly improves practicability.
  • FIG. 1 is a method flowchart of an image cropping method in an embodiment of the present invention
  • FIG. 2 is a schematic diagram of the network structure of a convolutional neural network in an embodiment of the present invention
  • FIG. 3 is a schematic diagram of a specific network structure of a convolutional neural network in an embodiment of the present invention.
  • FIG. 4 is a flowchart of a method for cropping the dimensionality-reduced feature map in an embodiment of the present invention;
  • FIG. 5 is a schematic diagram of the division of feature maps in an embodiment of the present invention.
  • FIG. 6 is a flowchart of a specific method for cropping the dimensionality-reduced feature map in the embodiment shown in FIG. 5;
  • Fig. 7 is a structural block diagram of an image cropping device in an embodiment of the present invention.
  • FIG. 8 is a structural block diagram of a photographing device in an embodiment of the present invention.
  • Figure 9 is a structural block diagram of a drone in an embodiment of the present invention.
  • FIG. 10 is a structural block diagram of a mobile terminal in an embodiment of the present invention.
  • Fig. 11 is a structural block diagram of a handheld pan/tilt in an embodiment of the present invention.
  • Fig. 1 is a method flowchart of an image cropping method in an embodiment of the present invention. As shown in Fig. 1, the image cropping method of an embodiment of the present invention may include the following steps:
  • Objects in the actual application scene have multiple sizes. Even the same object has different sizes in the image due to different shooting distances.
  • the embodiment of the invention extracts multiple original image information with different distance characteristics from the same original image.
  • different strategies can be used to extract multiple pieces of original image information with different distance characteristics; for example, they can be extracted based on downsampling, upsampling, or a combination of downsampling and upsampling.
  • the original image is input into a pre-trained convolutional neural network; the original image is downsampled by the convolutional neural network to obtain multiple original image information with different distance characteristics.
  • a three-layer image pyramid is used as the input of the convolutional neural network, with each pyramid level downsampled by a factor of 2 relative to the previous one.
  • feature extraction is performed separately on the original image, the image obtained by downsampling the original image by a factor of 2, and the image obtained by downsampling it by a factor of 4, so as to obtain multiple pieces of original image information with different distance characteristics; that is,
  • the original image information of this embodiment includes the original image (1 in Figure 2), the 2x downsampled image, and the 4x downsampled image.
  • the network structure of the convolutional neural network will be described in detail in the following embodiments.
  • a feature map of the original image information is extracted on the distance scale of each original image information based on a pre-trained convolutional neural network.
  • the convolutional neural network 10 may include multiple sub-networks, the input of the sub-network is the original image information corresponding to the distance feature; each sub-network is used to extract the feature map of the original image information corresponding to the distance feature.
  • the neural network may include 2, 3, 4, 5 or other numbers of sub-networks.
  • the convolutional neural network 10 includes a first sub-network 11, a second sub-network 12 and a third sub-network 13, where the input of the first sub-network 11 is the original image, the input of the second sub-network 12 is the image obtained by downsampling the original image by a factor of 2, and the input of the third sub-network 13 is the image obtained by downsampling the original image by a factor of 4.
  • the standard network architectures used in the prior art (such as VGG16) perform multiple downsampling operations during feature extraction (usually 5 operations, for 32x total downsampling), which leads to a great loss of object spatial information.
  • if the original image size is 256*256, downsampling it 5 times yields a feature map with a spatial resolution of only 8*8, and cropping is then performed on this 8*8 feature map; the resolution corresponding to the cropping frame is even lower. At such a small spatial resolution most of the information has been lost, and the cropping result cannot be modeled accurately. Reducing the number of downsampling operations, on the other hand, shrinks the receptive field of the features, which is not enough to represent large objects.
  • for this reason, each sub-network of the embodiment of the present invention first downsamples the original image information of the corresponding distance feature and then upsamples it, while ensuring a sufficiently large receptive field and spatial resolution.
  • the sub-network of this embodiment may include multiple first network layers connected in sequence and at least one second network layer arranged after the first network layers; the first network layers are used for downsampling, the second network layers are used for upsampling, and the number of first network layers is greater than the number of second network layers.
  • in some embodiments, each sub-network includes multiple second network layers.
  • for example, each sub-network includes four first network layers connected in sequence and two second network layers connected in sequence after them; in other embodiments, each sub-network includes five first network layers connected in sequence and two second network layers connected in sequence after them. It can be understood that, in each sub-network, the numbers of first and second network layers can also be set to other values and are not limited to those in the embodiments listed above.
  • the first network layer can implement downsampling based on shuffle operation or convolution operation
  • the second network layer can realize upsampling based on a shuffle operation, a deconvolution operation, bicubic interpolation, nearest-neighbor interpolation or bilinear interpolation.
  • the first network layer includes a convolution layer and a pooling layer
  • the second network layer includes a deconvolution layer.
  • multiple sub-networks share weight parameters, that is, the convolution kernels of the convolutional layer and the pooling layer in each sub-network are the same.
  • in some embodiments, the stride of the first of the first network layers in a sub-network is different from the stride of the first of the first network layers in the other sub-networks.
  • optionally, the stride of the convolutional layer of the first of the first network layers in a sub-network is different from that in the other sub-networks, and/or the stride of the pooling layer of the first of the first network layers in a sub-network is different from that in the other sub-networks.
  • in each sub-network, the strides of the convolutional layer and pooling layer of the other first network layers (not the first one) are equal to the strides of the convolutional and pooling layers of the layers at the corresponding positions in the other sub-networks, and the stride of the deconvolution layer of a second network layer is equal to the stride of the deconvolution layer of the corresponding second network layer in the other sub-networks. That is, by adjusting the stride of the convolutional layer and/or pooling layer of the first of the first network layers in each sub-network, it can be ensured that the feature maps finally output by the sub-networks have the same distance size. Furthermore, in each sub-network, the strides of the convolutional and pooling layers of the other first network layers (not the first one) are also equal to the stride of the deconvolution layer of the second network layers.
  • for example, the stride of the convolutional layer and pooling layer of the first of the first network layers of the first sub-network 11 is 4, the stride of the convolutional layer and pooling layer of the first of the first network layers of the second sub-network 12 is 2, and the stride of the convolutional layer and pooling layer of the first of the first network layers of the third sub-network 13 is 1.
  • the strides of the convolutional layers and pooling layers of the other first network layers are all 2, and the stride of the deconvolution layer of each second network layer is also 2.
  • each sub-network is trained separately, and the weight parameters of each sub-network are determined independently; for a sub-network at a smaller distance scale, the number of parameters of the sub-network can be further reduced.
  • in some embodiments, the output of any first network layer of a sub-network can be used as the input of any first network layer and/or second network layer of any other sub-network; likewise, in some embodiments, the output of any second network layer of a sub-network can be used as the input of any first network layer and/or second network layer of any other sub-network.
  • when such a connection is made, the distance size of the feature map output by the one network layer is equal to the distance size of the feature map input by the other network layer.
  • in some embodiments, the output of the one network layer is superimposed on the input of the other network layer along the channel direction of the feature map.
  • when cropping is performed on the basis of the feature map, a fully connected layer with many parameters is generally required.
  • for example, a 7*7*512*4096 fully connected layer is the standard configuration of many object detection networks, and this setting is also adopted by some image cropping models.
  • the parameters of this single layer amount to roughly 392 MB (over one hundred million weights), which current camera systems cannot afford at all.
  • however, image cropping does not need to accurately recognize every piece of content in the original image, so the channel dimension of the feature map can be reduced to a very low level without loss of performance, thereby greatly reducing the parameters of the fully connected layer used for image cropping.
  • the feature map is input to the 1*1 convolutional layer 20 for dimensionality reduction processing, thereby reducing the dimensionality of the feature map (a brief parameter-count and 1*1-convolution sketch is given after this list).
  • the way of reducing the dimensionality of the feature map is not limited to using the 1*1 convolutional layer 20; other existing dimensionality reduction algorithms can also be used instead.
  • S104 Perform cropping processing on the feature map after dimensionality reduction.
  • Fig. 4 is a flow chart of a specific method for cropping the feature map after dimensionality reduction in an embodiment of the present invention. As shown in Fig. 4, cropping the feature map after dimensionality reduction may include:
  • S401 Divide the dimensionality-reduced feature map along its length and width directions to obtain multiple grid regions, each grid region including multiple pixels;
  • S402 Perform cropping processing on the dimensionality-reduced feature map according to the multiple grid regions and preset prior conditions.
  • after the feature map is divided, cropping only needs to treat the grid region as the smallest unit and does not need to be accurate to the pixel level, which reduces the amount of data processed during image cropping and further improves cropping performance.
  • in some embodiments, the dimensionality-reduced feature map is equally divided along its length direction and equally divided along its width direction to obtain multiple grid regions.
  • within each grid region, the feature information of the pixels is roughly the same, which helps to improve cropping accuracy.
  • for example, the dimensionality-reduced feature map can be divided into 16*16 or 12*12 grid regions; it can also be divided into, for example, 8*10 grid regions. It can be understood that, when step S401 is implemented, the dimensionality-reduced feature map may also be divided in a non-equal manner (see the grid-division sketch after this list).
  • in some embodiments, a meshing model divides the dimensionality-reduced feature map along its length and width directions to obtain multiple grid regions; it is understandable that a grid-division algorithm can also be used to divide the feature map in the same way.
  • FIG. 6 shows, in an embodiment of the present invention, an implementation of cropping the dimensionality-reduced feature map according to a plurality of grid regions and preset prior conditions; as shown in FIG. 6,
  • the implementation process of cropping the dimensionality-reduced feature map can include:
  • extracting the feature information of one pixel in each grid region, such as the center pixel, and using the feature information of the center pixel as the feature information of that grid region; optionally, extracting the feature information of some pixels (at least two pixels) and determining the feature information of the grid region from them, for example by taking the average of their feature information as the feature information of the grid region.
  • S602 According to the feature information of the grid regions, extract the feature information of some of the grid regions among the multiple grid regions, where the feature information of these grid regions includes at least the feature information of the region of interest in the original image;
  • the region of interest in the original image refers to the target region containing the main subject; an existing object detection algorithm can be used to determine the target region in the original image.
  • S603 Determine this part of the grid regions as a target cropping area.
  • cropping results of different resolutions and aspect ratios can be obtained to meet different needs of users.
  • in some embodiments, a plurality of target cropping areas with different resolutions and aspect ratios are obtained.
  • the multiple target cropping areas determined in S603 can be further filtered to remove those that do not meet the requirements.
  • in some embodiments, the process of cropping the dimensionality-reduced feature map further includes: determining that the two end points of one diagonal of the rectangular target cropping area lie in restricted areas of the feature map.
  • the two restricted areas are distributed on either side of the same diagonal of the feature map, so as to ensure as far as possible that the target cropping area contains the main subject; each restricted area includes at least one grid region.
  • in some embodiments, the restricted areas include a first restricted area (M1 in FIG. 5) and a second restricted area (M2 in FIG. 5).
  • the first restricted area is used to restrict the position of the upper-left corner of the target cropping area in the feature map, and the second restricted area is used to restrict the position of the lower-right corner of the target cropping area in the feature map; optionally, the restricted areas include a third restricted area and a fourth restricted area, where the third restricted area is used to restrict the position of the upper-right corner of the target cropping area in the feature map and the fourth restricted area is used to restrict the position of the lower-left corner.
  • each restricted area in this embodiment includes multiple grid regions, so that multiple target cropping areas can be obtained.
  • the center pixel of any grid region in the first restricted area is used as the upper-left vertex of a target cropping area, and the center pixel of any grid region in the second restricted area is used as its lower-right vertex; in this way, multiple target cropping areas are obtained, such as the rectangular area formed by the dotted line in Figure 5 (see the candidate-crop sketch after this list).
  • in some embodiments, the process of cropping the dimensionality-reduced feature map according to the multiple grid regions and preset prior conditions further includes: the aspect ratio of the target cropping area meets a preset aspect-ratio strategy.
  • optionally, the aspect ratio of the target cropping area is used to indicate that the target cropping area is a rectangular area, which meets conventional composition requirements.
  • for example, the aspect ratio of the rectangular target cropping area is 1:3, that is, the preset aspect-ratio strategy is that the aspect ratio of the target cropping area is 1:3; optionally, the aspect ratio of the target cropping area is used to indicate that the target cropping area is a square area.
  • in some embodiments, the process of cropping the dimensionality-reduced feature map further includes: the area proportion of the target cropping area is greater than or equal to a preset proportion threshold, where the area proportion of the target cropping area is the ratio of the area of the target cropping area to the area of the feature map.
  • the preset proportion threshold can be set as required, for example to 1/4, that is, the area proportion of the target cropping area is greater than or equal to 1/4, so that the cropping result better meets composition requirements.
  • target cropping areas that do not meet the requirements can be removed according to any two, or all three, of the above further screening strategies in combination.
  • the feature map after the dimension reduction is cropped in the fully connected layer 30 to improve the performance of image cropping.
  • in some embodiments, the image cropping method may further include: after the dimensionality-reduced feature map is cropped, feeding the feature information of the target cropping area back to the convolutional neural network 10; this feedback increases the complexity of the network and allows the processing result of the entire network to be optimized. Specifically, the feature information of the target cropping area is used as an input of the convolutional neural network 10.
  • the image cropping method of the embodiment of the present invention extracts feature maps of multiple pieces of original image information with different distance characteristics, that is, it extracts feature information of objects of various sizes, so as to obtain the feature information of the original image accurately and completely. This is beneficial for
  • accurately modeling the image cropping result, so that the method can adapt to more complex application scenarios; and, by reducing the dimensionality of the feature map and then cropping the dimensionality-reduced feature map, cropping performance is improved and
  • the parameter size is reduced, making the image cropping method suitable for low-power chips. The network structure of the embodiment of the present invention needs less than 10 MB of parameters while achieving performance comparable to large networks of several hundred megabytes such as VGG16, which greatly improves practicability.
  • an embodiment of the present invention also provides an image cropping device.
  • the image cropping device 100 includes a storage device 110 and one or more processors 120.
  • the storage device 110 is used to store program instructions; the one or more processors 120 call the program instructions stored in the storage device 110.
  • when the program instructions are executed, the one or more processors 120 are individually or collectively configured to: extract multiple pieces of original image information with different distance features; extract the feature map of the original image information on the distance scale of each piece of original image information; reduce the dimensionality of the feature map; and perform cropping processing on the dimensionality-reduced feature map.
  • the processor 120 may implement the image processing method in the embodiments shown in FIG. 1, FIG. 4, and FIG. 6 of the present invention.
  • the image cropping apparatus 100 in this embodiment can be described with reference to the image cropping method in the foregoing embodiment.
  • the image cropping device 100 of this embodiment can be a computer or other equipment with image processing capabilities, or a shooting device with a camera function, such as a camera, a video camera, a smartphone, a smart terminal, a shooting stabilizer, an unmanned aerial vehicle, and so on.
  • an embodiment of the present invention also provides a photographing device.
  • the photographing device 200 includes: an image acquisition module 210, a storage device 220, and one or more processors 230.
  • the image acquisition module 210 is used to obtain the original image; the storage device 220 is used to store program instructions; and the one or more processors 230 call the program instructions stored in the storage device 220.
  • when the program instructions are executed, the one or more processors 230 are individually or collectively configured to: extract multiple pieces of original image information with different distance features; extract the feature map of the original image information on the distance scale of each piece of original image information; reduce the dimensionality of the feature map; and perform cropping processing on the dimensionality-reduced feature map.
  • the image acquisition module 210 includes a lens and an imaging sensor matched with the lens, such as a CCD or CMOS image sensor.
  • the processor 230 may implement the image processing method in the embodiments shown in FIG. 1, FIG. 4, and FIG. 6 of the present invention.
  • the image cutting method in the foregoing embodiment may be referred to for description of the photographing apparatus 200 in this embodiment.
  • the photographing device 200 can be a camera with a photographing function, a video camera, a smartphone, a smart terminal, a photographing stabilizer (such as a handheld gimbal), an unmanned aerial vehicle (such as a drone), and so on.
  • the unmanned aerial vehicle 300 includes: an image acquisition module 310, a storage device 320 and one or more processors 330.
  • the image acquisition module 310 is used to obtain the original image; the storage device 320 is used to store program instructions; and the one or more processors 330 call the program instructions stored in the storage device 320. When the program instructions are executed, the one or more processors 330 are individually or collectively configured to: extract multiple pieces of original image information with different distance characteristics; extract the feature map of the original image information on the distance scale of each piece of original image information; reduce the dimensionality of the feature map; and perform cropping processing on the dimensionality-reduced feature map.
  • the image acquisition module 310 in this embodiment may be a camera, or may be a structure with a shooting function formed by a combination of a lens and an imaging sensor (such as CCD, CMOS, etc.).
  • the processor 330 may implement the image processing method in the embodiments shown in FIG. 1, FIG. 4, and FIG. 6 of the present invention.
  • the UAV 300 in this embodiment can be described with reference to the image cropping method in the foregoing embodiment.
  • the drone 300 in this embodiment of the present invention refers to an aerial photography drone, and other drones that do not have a camera function do not belong to the protection subject of this embodiment.
  • the UAV 300 may be a multi-rotor UAV or a fixed-wing UAV.
  • the embodiment of the present invention does not specifically limit the type of the UAV 300.
  • the image acquisition module 310 can be mounted on the fuselage (not shown) via a pan/tilt (not shown), and the image acquisition module 310 can be stabilized by the pan/tilt.
  • the pan/tilt may be a two-axis gimbal or a three-axis gimbal, which is not specifically limited in the embodiment of the present invention.
  • the mobile terminal 400 includes: an image acquisition module 410, a storage device 420, and one or more processors 430.
  • the image acquisition module 410 is used to obtain the original image; the storage device 420 is used to store program instructions; and the one or more processors 430 call the program instructions stored in the storage device 420. When the program instructions are executed, the one or more processors 430 are individually or collectively configured to: extract multiple pieces of original image information with different distance characteristics; extract the feature map of the original image information on the distance scale of each piece of original image information; reduce the dimensionality of the feature map; and perform cropping processing on the dimensionality-reduced feature map.
  • the image acquisition module 410 in this embodiment is a camera built in the mobile terminal 400.
  • the mobile terminal 400 may be a smart mobile terminal such as a mobile phone or a tablet computer.
  • the processor 430 may implement the image processing method of the embodiment shown in FIG. 1, FIG. 4, and FIG. 6 of the present invention.
  • the mobile terminal 400 of this embodiment can be described with reference to the image cropping method of the foregoing embodiment.
  • the embodiment of the present invention also provides a handheld pan/tilt.
  • the handheld pan/tilt 500 includes: an image acquisition module 510, a storage device 520, and one or more processors 530.
  • the image acquisition module 510 is used to obtain the original image; the storage device 520 is used to store program instructions; and the one or more processors 530 call the program instructions stored in the storage device 520. When the program instructions are executed, the one or more processors 530 are individually or collectively configured to: extract multiple pieces of original image information with different distance characteristics; extract the feature map of the original image information on the distance scale of each piece of original image information; reduce the dimensionality of the feature map; and perform cropping processing on the dimensionality-reduced feature map.
  • the image acquisition module 510 in this embodiment may be a camera, or a structure with a photographing function formed by a combination of a lens and an imaging sensor (such as CCD, CMOS, etc.).
  • the processor 530 can implement the image processing method of the embodiments shown in FIG. 1, FIG. 4, and FIG. 6 of the present invention.
  • the handheld pan/tilt 500 in this embodiment of the present invention refers to a pan/tilt with a camera function, and other pan/tilts without a camera function do not belong to the protection subject of this embodiment.
  • the foregoing storage device may include volatile memory, such as random-access memory (RAM); the storage device may also include non-volatile memory, such as flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the storage device may also include a combination of the foregoing types of memory.
  • the processor may be a central processing unit (CPU).
  • the processor can also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor, or the processor may also be any conventional processor or the like.
  • an embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the steps of the image cropping method in the foregoing embodiment are implemented. Specifically, when the program is executed by the processor, the following steps are implemented: extract multiple original image information with different distance characteristics; extract the feature map of the original image information on the distance scale of each original image information; reduce the dimension of the feature map; Cut the feature map after dimensionality reduction.
  • the computer-readable storage medium may be the internal storage unit of the pan/tilt head described in any of the foregoing embodiments, such as a hard disk or a memory.
  • the computer-readable storage medium may also be an external storage device of the pan/tilt, such as a plug-in hard disk, a smart media card (SMC), an SD card, a flash card, etc. equipped on the device.
  • the computer-readable storage medium may also include both an internal storage unit of the pan-tilt and an external storage device.
  • the computer-readable storage medium is used to store the computer program and other programs and data required by the pan/tilt, and can also be used to temporarily store data that has been output or will be output.
  • the program can be stored in a computer-readable storage medium, and when the program is executed, it may include the procedures of the above-mentioned method embodiments.
  • the storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), etc.
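The fully-connected-layer cost and the 1*1-convolution dimensionality reduction referred to in the definitions above can be checked with a short sketch. The use of PyTorch, the 512 input channels, the reduced channel count of 32 and the tensor sizes are our own illustrative assumptions; the source only specifies a 1*1 convolutional layer (20 in Figure 2) for the reduction.

```python
import torch
import torch.nn as nn

# A 7*7*512*4096 fully connected layer, as used in many detection networks:
fc_weights = 7 * 7 * 512 * 4096               # = 102,760,448 weights
print(fc_weights, fc_weights * 4 / 2**20)     # ~102.8M weights, ~392 MB as float32

# Reducing the channel dimension of the feature map with a 1x1 convolution
# shrinks any subsequent fully connected layer accordingly.
reduce_channels = nn.Conv2d(in_channels=512, out_channels=32, kernel_size=1)
feature_map = torch.rand(1, 512, 16, 16)      # hypothetical feature map
reduced = reduce_channels(feature_map)        # -> torch.Size([1, 32, 16, 16])
print(reduced.shape)
```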
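The grid division and per-region feature extraction (steps S401/S601 above) can be sketched as follows. The 16*16 grid and the centre-pixel choice come from the examples in the definitions; the framework and tensor layout are assumptions.

```python
import torch

def grid_features(feature_map: torch.Tensor, grid: int = 16) -> torch.Tensor:
    """Divide a (C, H, W) feature map into grid x grid regions and return
    the feature vector of each region's centre pixel, shape (grid, grid, C)."""
    c, h, w = feature_map.shape
    cell_h, cell_w = h // grid, w // grid
    out = torch.empty(grid, grid, c)
    for i in range(grid):
        for j in range(grid):
            # The centre pixel of the (i, j)-th grid region stands in for the
            # whole region; averaging the region's pixels is the other option
            # mentioned in the definitions.
            cy = i * cell_h + cell_h // 2
            cx = j * cell_w + cell_w // 2
            out[i, j] = feature_map[:, cy, cx]
    return out
```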
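Finally, a sketch of how candidate target cropping areas might be enumerated from two restricted corner areas and filtered by the aspect-ratio and area-proportion prior conditions. The default restricted areas, the grid size and the data structures are our own assumptions; only the 1/4 area threshold and the idea of corner restrictions follow the examples above.

```python
from itertools import product

def candidate_crops(grid=16, top_left_area=None, bottom_right_area=None,
                    aspect_ratio=None, min_area_fraction=0.25):
    """Enumerate crops (in grid units) whose top-left corner lies in
    top_left_area and whose bottom-right corner lies in bottom_right_area."""
    # Assumed defaults: top-left corners in the upper-left quarter of the grid,
    # bottom-right corners in the lower-right quarter.
    top_left_area = top_left_area or [(i, j) for i in range(grid // 4)
                                      for j in range(grid // 4)]
    bottom_right_area = bottom_right_area or [(i, j)
                                              for i in range(3 * grid // 4, grid)
                                              for j in range(3 * grid // 4, grid)]
    crops = []
    for (y0, x0), (y1, x1) in product(top_left_area, bottom_right_area):
        h, w = y1 - y0 + 1, x1 - x0 + 1
        if aspect_ratio is not None and abs(w / h - aspect_ratio) > 1e-6:
            continue                      # preset aspect-ratio strategy
        if (h * w) / (grid * grid) < min_area_fraction:
            continue                      # area proportion >= preset threshold
        crops.append((y0, x0, y1, x1))
    return crops

print(len(candidate_crops()))  # number of candidates passing the filters
```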

Abstract

Disclosed are an image cropping method and apparatus, and a photographing apparatus. The method comprises: extracting a plurality of pieces of original image information having different distance features (S101); extracting a feature map of the original image information at the distance scale of each piece of original image information (S102); reducing the dimension of the feature map (S103); and cropping the feature map that has been subjected to dimension reduction (S104). By extracting feature information of objects of various sizes, the feature information of an original image can be obtained accurately and fully, which facilitates precise modeling of the image cropping result, so that the image cropping method can be applied to more complex application scenarios. Moreover, the feature map is subjected to dimension reduction and the dimension-reduced feature map is then cropped, thereby improving the performance of image cropping and reducing the magnitude of the image cropping parameters. The image cropping method is therefore applicable to chips with relatively low power consumption, which greatly improves practicability.

Description

Image cropping method, device and photographing device
Technical field
The present invention relates to the field of image processing, and in particular to an image cropping method, device and photographing device.
Background art
In many cases, the photos taken have unsatisfactory visual effects because the user had no time to compose the shot or lacks composition knowledge. To address this, a photo can be re-composed by post-capture cropping to enhance its visual effect. Toolkits currently on the market provide manual cropping functions; however, manual cropping requires the user to have some knowledge of photographic composition, and when the number of photos is large the workload becomes very heavy.
The prior art also provides methods for automatically cropping photos. Early automatic cropping algorithms were mainly designed around an attention mechanism, aiming to capture the most important content or regions of interest in a photo. Typically, such algorithms first obtain a saliency map of the photo and then use a sliding window to select the most salient window as the cropping result; some algorithms also introduce face detection or visual interaction information as an aid. However, because attention-based algorithms do not consider the overall composition of the photo, their cropping results often have mediocre visual quality.
Methods based on aesthetic attributes focus on the aesthetic attributes of photos and the composition rules commonly used in photography, model these attributes and rules through hand-designed features, and then learn an aesthetic classifier, such as a support vector machine, to choose the best result from many cropping candidates. However, due to their inherent limitations, hand-designed features often fail to model the aesthetic attributes of photos accurately.
Data-driven algorithms mostly implement automatic photo cropping by training an end-to-end deep neural network on a labeled data set. Most of these algorithms directly adopt standard neural network architectures that have been successful in other fields (such as image classification and object detection). The deep models used contain hundreds of megabytes of parameters and require very powerful GPUs and high power consumption to run, which low-power products such as cameras cannot afford. In addition, because the characteristics of photo cropping and composition themselves are not considered, the cropping results of these methods are not stable enough.
Summary of the invention
The present invention provides an image cropping method, device and photographing device.
Specifically, the present invention is implemented through the following technical solutions:
According to a first aspect of the present invention, an image cropping method is provided, the method including:
extracting multiple pieces of original image information with different distance features;
extracting a feature map of the original image information on the distance scale of each piece of original image information;
reducing the dimensionality of the feature map;
performing cropping processing on the dimensionality-reduced feature map.
According to a second aspect of the present invention, an image cropping device is provided, the device including:
a storage device for storing program instructions;
one or more processors that call the program instructions stored in the storage device and, when the program instructions are executed, are individually or collectively configured to:
extract multiple pieces of original image information with different distance features;
extract a feature map of the original image information on the distance scale of each piece of original image information;
reduce the dimensionality of the feature map;
perform cropping processing on the dimensionality-reduced feature map.
According to a third aspect of the present invention, a photographing device is provided, the photographing device including:
an image acquisition module for obtaining an original image;
a storage device for storing program instructions;
one or more processors that call the program instructions stored in the storage device and, when the program instructions are executed, are individually or collectively configured to:
extract multiple pieces of original image information with different distance features;
extract a feature map of the original image information on the distance scale of each piece of original image information;
reduce the dimensionality of the feature map;
perform cropping processing on the dimensionality-reduced feature map.
According to a fourth aspect of the present invention, an unmanned aerial vehicle is provided, the unmanned aerial vehicle including:
an image acquisition module for obtaining an original image;
a storage device for storing program instructions;
one or more processors that call the program instructions stored in the storage device and, when the program instructions are executed, are individually or collectively configured to:
extract multiple pieces of original image information with different distance features;
extract a feature map of the original image information on the distance scale of each piece of original image information;
reduce the dimensionality of the feature map;
perform cropping processing on the dimensionality-reduced feature map.
According to a fifth aspect of the present invention, a mobile terminal is provided, the mobile terminal including:
an image acquisition module for obtaining an original image;
a storage device for storing program instructions;
one or more processors that call the program instructions stored in the storage device and, when the program instructions are executed, are individually or collectively configured to:
extract multiple pieces of original image information with different distance features;
extract a feature map of the original image information on the distance scale of each piece of original image information;
reduce the dimensionality of the feature map;
perform cropping processing on the dimensionality-reduced feature map.
According to a sixth aspect of the present invention, a handheld gimbal is provided, the handheld gimbal including:
an image acquisition module for obtaining an original image;
a storage device for storing program instructions;
one or more processors that call the program instructions stored in the storage device and, when the program instructions are executed, are individually or collectively configured to:
extract multiple pieces of original image information with different distance features;
extract a feature map of the original image information on the distance scale of each piece of original image information;
reduce the dimensionality of the feature map;
perform cropping processing on the dimensionality-reduced feature map.
It can be seen from the technical solutions provided by the above embodiments that the present invention extracts feature maps of multiple pieces of original image information with different distance characteristics, that is, it extracts feature information of objects of various sizes, so as to obtain the feature information of the original image accurately and completely. This is beneficial for accurately modeling the image cropping result, so that the image cropping method can adapt to more complex application scenarios. In addition, the dimensionality of the feature map is reduced before the dimensionality-reduced feature map is cropped, which improves cropping performance and reduces the parameter size of image cropping, making the method suitable for low-power chips and greatly improving its practicability.
Description of the drawings
In order to explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of an image cropping method in an embodiment of the present invention;
FIG. 2 is a schematic diagram of the network structure of a convolutional neural network in an embodiment of the present invention;
FIG. 3 is a schematic diagram of a specific network structure of a convolutional neural network in an embodiment of the present invention;
FIG. 4 is a flowchart of a method for cropping the dimensionality-reduced feature map in an embodiment of the present invention;
FIG. 5 is a schematic diagram of the division of a feature map in an embodiment of the present invention;
FIG. 6 is a flowchart of a specific method for cropping the dimensionality-reduced feature map in the embodiment shown in FIG. 5;
FIG. 7 is a structural block diagram of an image cropping device in an embodiment of the present invention;
FIG. 8 is a structural block diagram of a photographing device in an embodiment of the present invention;
FIG. 9 is a structural block diagram of an unmanned aerial vehicle in an embodiment of the present invention;
FIG. 10 is a structural block diagram of a mobile terminal in an embodiment of the present invention;
FIG. 11 is a structural block diagram of a handheld gimbal in an embodiment of the present invention.
Reference signs: 1: original image;
10: convolutional neural network; 11: first sub-network; 12: second sub-network; 13: third sub-network;
20: 1*1 convolutional layer;
30: fully connected layer.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present invention.
The image cropping method, device and photographing device of the present invention are described in detail below with reference to the accompanying drawings. In the case of no conflict, the features of the following embodiments and implementations can be combined with each other.
FIG. 1 is a flowchart of an image cropping method in an embodiment of the present invention. As shown in FIG. 1, the image cropping method of an embodiment of the present invention may include the following steps:
S101: extracting multiple pieces of original image information with different distance features;
Objects in an actual application scene come in many sizes, and even the same object appears at different sizes in an image because of different shooting distances. In order to accurately extract the feature information of objects of various sizes and thereby improve the accuracy of image cropping, the embodiment of the present invention extracts, from the same original image, multiple pieces of original image information with different distance characteristics.
For the same original image, different strategies can be used to extract multiple pieces of original image information with different distance characteristics; for example, they can be extracted based on downsampling, upsampling, or a combination of downsampling and upsampling.
As a feasible implementation, the original image is input into a pre-trained convolutional neural network, and the convolutional neural network downsamples the original image to obtain multiple pieces of original image information with different distance characteristics. Optionally, a three-layer image pyramid is used as the input of the convolutional neural network, with each pyramid level downsampled by a factor of 2 relative to the previous one. In this embodiment, feature extraction is performed separately on the original image, the image obtained by downsampling the original image by a factor of 2, and the image obtained by downsampling it by a factor of 4, so as to obtain multiple pieces of original image information with different distance characteristics; that is, the original image information of this embodiment includes the original image (1 in Figure 2), the 2x downsampled image, and the 4x downsampled image. The network structure of the convolutional neural network will be described in detail in the following embodiments.
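As a rough illustration of this three-level pyramid input, the following sketch builds the original image together with its 2x and 4x downsampled copies. PyTorch and bilinear resizing are our own choices here; the source does not prescribe a particular framework or resampling filter.

```python
import torch
import torch.nn.functional as F

def build_image_pyramid(image: torch.Tensor, levels: int = 3) -> list:
    """Return [original, 2x downsampled, 4x downsampled, ...] copies of image.

    image is an (N, C, H, W) tensor; each level halves the spatial size,
    mirroring the layer-by-layer 2x downsampling described above.
    """
    pyramid = [image]
    for _ in range(levels - 1):
        image = F.interpolate(image, scale_factor=0.5, mode="bilinear",
                              align_corners=False)
        pyramid.append(image)
    return pyramid

# Example: a 256x256 RGB image yields levels of size 256, 128 and 64.
levels = build_image_pyramid(torch.rand(1, 3, 256, 256))
print([lvl.shape[-1] for lvl in levels])  # [256, 128, 64]
```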
S102:在每个原始图像信息的距离尺度上提取原始图像信息的特征图;S102: Extract a feature map of the original image information on the distance scale of each original image information;
可以采用不同的算法或策略在每个原始图像信息的距离尺度上提取原始图像信息的特征图,从而获得原始图像的颜色特征、纹理特征、形状特征和/或空间关系特征等等。可选的,在某些实施例中,基于预先训练的卷积神经网络在每个原始图像信息的距离尺度上提取原始图像信息的特征图。Different algorithms or strategies can be used to extract the feature map of the original image information on the distance scale of each original image information, so as to obtain the color feature, texture feature, shape feature, and/or spatial relationship feature of the original image. Optionally, in some embodiments, a feature map of the original image information is extracted on the distance scale of each original image information based on a pre-trained convolutional neural network.
下面,阐述一种具体的卷积神经网络的网络结构。In the following, the network structure of a specific convolutional neural network is described.
As shown in FIG. 2, the convolutional neural network 10 may include multiple sub-networks; the input of each sub-network is the original image information of the corresponding distance feature, and each sub-network is used to extract a feature map of the original image information of the corresponding distance feature. The neural network may include 2, 3, 4, 5, or another number of sub-networks. In the embodiment shown in FIG. 2, the convolutional neural network 10 includes a first sub-network 11, a second sub-network 12, and a third sub-network 13, where the input of the first sub-network 11 is the original image, the input of the second sub-network 12 is the image obtained by down-sampling the original image by a factor of 2, and the input of the third sub-network 13 is the image obtained by down-sampling the original image by a factor of 4.
In the prior art, the standard network architectures used (such as VGG16) perform multiple rounds of down-sampling during feature extraction (typically 5 rounds, for an overall 32-fold reduction), which causes a severe loss of object spatial information. If the original image size is 256*256, then after 5 rounds of down-sampling (performed to improve efficiency) only a feature map with a spatial resolution of 8*8 remains, and the region of that 8*8 feature map corresponding to a cropping frame has an even lower resolution. At such a small spatial resolution most of the information has already been lost, and the cropping result cannot be modeled accurately. On the other hand, reducing the number of down-sampling steps shrinks the receptive field of the features, which then becomes insufficient to represent large objects. For this reason, each sub-network in the embodiment of the present invention first down-samples and then up-samples the original image information of the corresponding distance feature, ensuring both a sufficiently large receptive field and a sufficient spatial resolution. As shown in FIG. 3, the sub-network of this embodiment may include multiple first network layers connected in sequence and at least one second network layer arranged after the first network layers; the first network layers are used for down-sampling and the second network layers for up-sampling, and the number of first network layers is greater than the number of second network layers.
In a sub-network, the numbers of first network layers and second network layers can be set as required; optionally, each sub-network includes multiple second network layers. In the embodiment shown in FIG. 3, each sub-network includes 4 first network layers connected in sequence followed by 2 second network layers connected in sequence; in other embodiments, each sub-network includes 5 first network layers connected in sequence followed by 2 second network layers connected in sequence. It can be understood that the numbers of first and second network layers in each sub-network can also be set to other values and are not limited to those listed in the above embodiments.
The first network layer may implement down-sampling based on a shuffle operation or a convolution operation, and the second network layer may implement up-sampling based on a shuffle operation, a deconvolution operation, bicubic interpolation, nearest-neighbour interpolation, bilinear interpolation, or the like. As a specific implementation, the first network layer includes a convolution layer and a pooling layer, and the second network layer includes a deconvolution layer.
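The following sketch shows one sub-network with the FIG. 3 layout: four down-sampling first network layers (each a convolution layer plus a pooling layer) followed by two up-sampling second network layers (deconvolution layers). The channel width, kernel sizes and ReLU activations are assumptions of the sketch; the stride of the first layer is exposed as a parameter because, as described in the following paragraphs, it is the quantity that differs between sub-networks.

```python
import torch
import torch.nn as nn

class SubNetwork(nn.Module):
    """Four "first network layers" (conv + pooling) followed by two
    "second network layers" (deconvolution), as in FIG. 3.

    Channel width, kernel sizes and activations are assumptions; only the
    layer types and their counts come from the embodiment."""

    def __init__(self, in_ch: int = 3, width: int = 32, first_stride: int = 1):
        super().__init__()
        layers, ch = [], in_ch
        for i in range(4):                            # first network layers
            conv_stride = first_stride if i == 0 else 1
            layers += [
                nn.Conv2d(ch, width, kernel_size=3, stride=conv_stride, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=2, stride=2),  # down-sampling
            ]
            ch = width
        for _ in range(2):                            # second network layers
            layers += [
                nn.ConvTranspose2d(ch, width, kernel_size=2, stride=2),  # up-sampling
                nn.ReLU(inplace=True),
            ]
        self.body = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)
```

Choosing a larger first-layer stride for the sub-network that receives the full-resolution image and a smaller one for the sub-networks that receive the down-sampled pyramid levels makes the three output feature maps come out with the same spatial size, which is the requirement stated in the paragraphs below.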
Further, in some embodiments, the multiple sub-networks share weight parameters, i.e. the convolution layers and pooling layers in the sub-networks use the same kernels. To make the feature maps output by the multiple sub-networks have the same distance size, optionally the stride of the convolution layer in the first of the first network layers of a sub-network differs from the corresponding stride in the other sub-networks; optionally the stride of the pooling layer in the first of the first network layers of a sub-network differs from the corresponding stride in the other sub-networks; optionally both of these differ. In each sub-network, the strides of the convolution and pooling layers of the remaining first network layers (other than the first one) are equal to the strides of the layers at the corresponding positions in the other sub-networks, and the stride of the deconvolution layer of each second network layer is equal to the stride of the deconvolution layers of the second network layers in the other sub-networks. In other words, by adjusting only the stride of the convolution layer and/or pooling layer of the first of the first network layers in each sub-network, the feature maps finally output by the sub-networks are guaranteed to have the same distance size. Furthermore, in each sub-network, the strides of the convolution and pooling layers of the remaining first network layers (other than the first one) may also be equal to the stride of the deconvolution layers of the second network layers.
In the embodiment shown in FIG. 3, the stride of the convolution layer and the pooling layer in the first of the first network layers is 4 in the first sub-network 11, 2 in the second sub-network 12, and 0 in the third sub-network 13. In the first sub-network 11, the second sub-network 12 and the third sub-network 13, the strides of the convolution and pooling layers of the remaining first network layers are all 2, and the stride of the deconvolution layers of the second network layers is also 2.
在某些实施例中,针对每个子网络单独训练,确定各子网络的权重参数,对于距离尺寸较小的子网络,可以进一步减少该子网络的参数。In some embodiments, each sub-network is trained separately, and the weight parameter of each sub-network is determined. For a sub-network with a smaller distance size, the parameters of the sub-network can be further reduced.
To increase the complexity and the depth of the convolutional neural network 10, in some embodiments the output of any first network layer of a sub-network may serve as the input of any first network layer and/or second network layer of any other sub-network. In some embodiments, the output of any second network layer of a sub-network may serve as the input of any first network layer and/or second network layer of any other sub-network. In some embodiments, both of the above apply. In these embodiments, when the output of one network layer is used as the input of another network layer, the distance size of the feature map output by the former is equal to the distance size of the feature map taken as input by the latter. In a specific implementation, the output of one network layer is stacked onto the input of the other network layer along the channel direction of the feature map.
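A minimal sketch of the channel-direction stacking just described, assuming PyTorch tensors in (N, C, H, W) layout; the function name and the size assertion are our own and not part of the disclosure.

```python
import torch

def fuse_cross_scale(own_input: torch.Tensor, other_output: torch.Tensor) -> torch.Tensor:
    """Stack a feature map coming from another sub-network onto this layer's
    input along the channel dimension. Both tensors must already have the same
    spatial ("distance") size; how the extra channels are consumed is left to
    the receiving layer's convolution."""
    assert own_input.shape[-2:] == other_output.shape[-2:], "spatial sizes must match"
    return torch.cat([own_input, other_output], dim=1)  # (N, C1 + C2, H, W)
```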
S103:降低特征图的维度;S103: Reduce the dimension of the feature map;
In some classic vision tasks such as object detection, obtaining an accurate target window generally requires a fully connected layer with a very large number of parameters. For example, a 7*7*512*4096 fully connected layer is standard in many object-detection networks, and this setting has also been adopted by some image cropping models. However, the parameters of this single layer already amount to as much as 392 megabytes, which current camera systems simply cannot afford. Unlike object detection, image cropping does not need to precisely recognize every item in the original image, so the channel dimension of the feature map can be reduced to a very low value without loss of performance, which greatly reduces the parameters of the fully connected layer used for the cropping processing.
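The 392-megabyte figure can be reproduced from the stated 7*7*512*4096 layer shape if one assumes 32-bit floating-point weights; the 4-bytes-per-weight assumption is ours, since the text only gives the total.

```python
# 7*7*512*4096 fully connected layer, assuming 4 bytes (float32) per weight.
weights = 7 * 7 * 512 * 4096            # 102,760,448 parameters
size_mb = weights * 4 / (1024 ** 2)     # bytes -> megabytes
print(weights, f"{size_mb:.1f} MB")     # 102760448 392.0 MB
```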
作为一种可行的实现方式,结合图2和图3,将特征图输入1*1的卷积层20进行降维处理,从而降低特征图的维度。应当理解,降低特征图的维度的方式不限于采用1*1的卷积层20实现降维处理,也可用其它现有的降维算法来替代。As a feasible implementation manner, in combination with FIG. 2 and FIG. 3, the feature map is input to the 1*1 convolutional layer 20 for dimensionality reduction processing, thereby reducing the dimensionality of the feature map. It should be understood that the way of reducing the dimensionality of the feature map is not limited to using the 1*1 convolutional layer 20 to implement dimensionality reduction processing, and other existing dimensionality reduction algorithms can also be used instead.
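As a sketch of the dimensionality reduction performed by the 1*1 convolution layer 20, assuming an input feature map with 32 channels reduced to 8; both channel counts are illustrative only and not taken from the disclosure.

```python
import torch
import torch.nn as nn

# A 1*1 convolution that shrinks the channel dimension of the feature map,
# as in layer 20 above. The 32 -> 8 channel counts are illustrative only.
reduce = nn.Conv2d(in_channels=32, out_channels=8, kernel_size=1)

features = torch.randn(1, 32, 64, 80)   # (N, C, H, W) feature map
reduced = reduce(features)
print(reduced.shape)                    # torch.Size([1, 8, 64, 80])
```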
S104:对降维后的特征图进行裁剪处理。S104: Perform cropping processing on the feature map after dimensionality reduction.
图4是本发明一实施例中对降维后的特征图进行裁剪处理的具体方法流程图,如图4所示,对降维后的特征图进行裁剪处理可包括:Fig. 4 is a flow chart of a specific method for cropping the feature map after dimensionality reduction in an embodiment of the present invention. As shown in Fig. 4, cropping the feature map after dimensionality reduction may include:
S401:分别沿着降维后的特征图的长度及宽度方向对特征图进行划分,得到多个网格区域,每个网格区域包括多个像素;S401: Divide the feature map along the length and width directions of the feature map after dimension reduction to obtain multiple grid regions, each grid region including multiple pixels;
S402:根据多个网格区域以及预设先验条件，对降维后的特征图进行裁剪处理。S402: Perform cropping processing on the dimension-reduced feature map according to the multiple grid regions and preset prior conditions.
Since the human eye is not very sensitive to individual pixels, a pixel-level deviation is in practice not noticeable. Therefore, the accuracy required by the embodiment of the present invention does not need to reach the pixel level. Based on this, the dimension-reduced feature map is divided so that, during cropping, the grid region is used as the smallest unit; there is no need to be accurate to the pixel level, which reduces the amount of data processed during image cropping and further improves cropping performance.
Optionally, when implementing step S401, the feature map is divided into equal parts along the length and the width of the dimension-reduced feature map to obtain multiple grid regions; that is, the dimension-reduced feature map is divided evenly along its length direction and evenly along its width direction. In each grid region obtained by such equal division, the feature information of the pixels is roughly the same, which helps improve cropping accuracy. For example, the dimension-reduced feature map can be divided evenly into 16*16 or 12*12 grid regions; in the embodiment shown in FIG. 5, the dimension-reduced feature map is divided evenly into 8*10 grid regions. It can be understood that, when implementing step S401, the dimension-reduced feature map may also be divided in a non-equal manner.
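The following sketch divides a dimension-reduced feature map into equal grid regions along its length and width; the 8*10 grid matches the FIG. 5 example, while the tensor layout of the result and the function name are assumptions of the sketch.

```python
import torch

def divide_into_grid(fmap: torch.Tensor, rows: int, cols: int) -> torch.Tensor:
    """Split an (N, C, H, W) feature map into rows*cols equal grid regions.

    Returns a tensor of shape (N, rows, cols, C, H//rows, W//cols).
    Equal division is assumed, so H and W must be multiples of rows and cols.
    """
    n, c, h, w = fmap.shape
    assert h % rows == 0 and w % cols == 0, "equal division assumed"
    cells = fmap.reshape(n, c, rows, h // rows, cols, w // cols)
    return cells.permute(0, 2, 4, 1, 3, 5)   # (N, rows, cols, C, h_cell, w_cell)

grid = divide_into_grid(torch.randn(1, 8, 64, 80), rows=8, cols=10)
print(grid.shape)  # torch.Size([1, 8, 10, 8, 8, 8])
```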
Optionally, the dimension-reduced feature map is input into a pre-trained grid-division model, and the grid-division model divides the feature map along the length and width directions of the dimension-reduced feature map to obtain the multiple grid regions. It can be understood that the feature map may instead be divided along its length and width directions by a grid-division algorithm to obtain the multiple grid regions.
The preset prior conditions can be designed as required. FIG. 6 shows one implementation, in an embodiment of the present invention, of cropping the dimension-reduced feature map according to the multiple grid regions and the preset prior conditions. As shown in FIG. 6, this implementation may include:
S601:以网格区域为最小特征提取单位,提取每个网格区域的特征信息;S601: Using the grid area as the minimum feature extraction unit, extract feature information of each grid area;
Optionally, the feature information of any one pixel in each grid region, such as the centre pixel, is extracted and used as the feature information of that grid region; optionally, the feature information of some of the pixels (at least two pixels) in each grid region is extracted, and the feature information of the grid region is determined from the extracted feature information of those pixels, for example by taking their mean as the feature information of the grid region.
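A sketch of the two options above for reducing a grid region to a single piece of feature information; the cell tensor layout follows the grid-division sketch earlier, and the function name is our own.

```python
import torch

def grid_features(cells: torch.Tensor, how: str = "mean") -> torch.Tensor:
    """Reduce each grid region to one feature vector.

    `cells` has shape (N, rows, cols, C, h_cell, w_cell), e.g. the output of
    the divide_into_grid sketch above. "mean" averages the pixels of a cell;
    "center" takes the cell's centre pixel, matching the options described
    in the paragraph above."""
    if how == "mean":
        return cells.mean(dim=(-2, -1))          # (N, rows, cols, C)
    if how == "center":
        h, w = cells.shape[-2] // 2, cells.shape[-1] // 2
        return cells[..., h, w]                  # (N, rows, cols, C)
    raise ValueError("how must be 'mean' or 'center'")
```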
S602:根据网格区域的特征信息,提取多个网格区域中的部分网格区域的特征信息,部分网格区域的特征信息至少包括原始图像中兴趣区域的特征信息;S602: According to the feature information of the grid area, extract feature information of a part of the grid area among the multiple grid areas, and the feature information of the part of the grid area includes at least the feature information of the region of interest in the original image;
需要说明的是,原始图像中兴趣区域是指包含目标主体的目标区域,可选择现有目标检测算法确定原始图像中的目标区域。It should be noted that the region of interest in the original image refers to the target region containing the target subject, and the existing target detection algorithm can be selected to determine the target region in the original image.
S603:将部分网格区域确定为目标裁剪区域。S603: Determine a part of the grid area as a target cropping area.
Based on the above cropping approach, cropping results with different resolutions and aspect ratios can be obtained, meeting different user needs. Optionally, based on the above cropping approach, multiple rectangular target cropping regions with different resolutions and aspect ratios are obtained.
The multiple target cropping regions determined in S603 can be further screened to remove those that do not meet the requirements. Optionally, in some embodiments, the process of cropping the dimension-reduced feature map according to the multiple grid regions and the preset prior conditions further includes: determining, in the feature map, the restricted regions for the two end points of one diagonal of the rectangular target cropping region. The two restricted regions are distributed on the two sides of the same diagonal of the feature map, to ensure as far as possible that the target cropping region contains the target subject, and each restricted region includes at least one grid region. Optionally, the restricted regions include a first restricted region (M1 in FIG. 5) and a second restricted region (M2 in FIG. 5); the first restricted region is used to restrict the position of the upper-left corner of the target cropping region in the feature map, and the second restricted region is used to restrict the position of the lower-right corner of the target cropping region in the feature map. Optionally, the restricted regions include a third restricted region and a fourth restricted region; the third restricted region is used to restrict the position of the upper-right corner of the target cropping region in the feature map, and the fourth restricted region is used to restrict the position of the lower-left corner of the target cropping region in the feature map.
In this embodiment, each restricted region includes multiple grid regions, so that multiple target cropping regions can be obtained. In the embodiment shown in FIG. 5, the centre pixel of any grid region in the first restricted region is taken as the upper-left vertex of a target cropping region, and the centre pixel of any grid region in the second restricted region is taken as the lower-right vertex of that target cropping region, yielding multiple target cropping regions, such as the box formed by the dotted line in FIG. 5.
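A sketch of the candidate-box enumeration just described, pairing grid-region centres from the two restricted regions; the (x1, y1, x2, y2) coordinate convention and the geometric validity check are assumptions of the sketch.

```python
from itertools import product

def candidate_boxes(m1_cells, m2_cells):
    """Enumerate candidate crop boxes from the two restricted regions.

    m1_cells / m2_cells are lists of (x, y) centre coordinates of the grid
    regions inside the first and second restricted regions (M1 and M2 in
    FIG. 5). Each pairing of an M1 centre (upper-left corner) with an M2
    centre (lower-right corner) gives one candidate box; only geometrically
    valid pairs are kept."""
    boxes = []
    for (x1, y1), (x2, y2) in product(m1_cells, m2_cells):
        if x2 > x1 and y2 > y1:
            boxes.append((x1, y1, x2, y2))
    return boxes
```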
In some embodiments, the process of cropping the dimension-reduced feature map according to the multiple grid regions and the preset prior conditions further includes: requiring that the aspect ratio of the target cropping region satisfy a preset aspect-ratio strategy. Optionally, the aspect ratio of the target cropping region is used to indicate that the target cropping region is a rectangular region, which matches conventional composition requirements; typically, the aspect ratio of a rectangular target cropping region is 1:3, i.e. the preset aspect-ratio strategy is that the aspect ratio of the target cropping region is 1:3. Optionally, the aspect ratio of the target cropping region is used to indicate that the target cropping region is a square region.
In some embodiments, the process of cropping the dimension-reduced feature map according to the multiple grid regions and the preset prior conditions further includes: requiring that the area proportion of the target cropping region be greater than or equal to a preset proportion threshold, where the area proportion of the target cropping region is the ratio of the area of the target cropping region to the area of the feature map. The preset proportion threshold can be set as required, for example to 1/4, i.e. the area proportion of the target cropping region must be greater than or equal to 1/4, so that the cropping result better matches composition requirements.
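The aspect-ratio and area-proportion priors described above can be applied to the candidate boxes as a simple filter; the tolerance on the aspect ratio and the helper name are assumptions of the sketch, since the text only names the priors themselves.

```python
def filter_boxes(boxes, fmap_w, fmap_h,
                 aspect_ratio=None, ratio_tol=0.05, min_area_fraction=0.25):
    """Apply the prior conditions above to a list of (x1, y1, x2, y2) boxes.

    `aspect_ratio` is the preset width/height target (None to skip the check),
    `min_area_fraction` is the preset proportion threshold (1/4 in the text).
    """
    kept = []
    for x1, y1, x2, y2 in boxes:
        w, h = x2 - x1, y2 - y1
        if w <= 0 or h <= 0:
            continue
        if aspect_ratio is not None and abs(w / h - aspect_ratio) > ratio_tol:
            continue
        if w * h < min_area_fraction * fmap_w * fmap_h:
            continue
        kept.append((x1, y1, x2, y2))
    return kept
```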
在某些实施例中,可以根据上述三种进一步筛选策略中任意两个的组合或者上述三种进一步筛选策略的组合来去除不满足需求的目标裁剪区域。In some embodiments, the target cropping area that does not meet the requirements can be removed according to a combination of any two of the above three further screening strategies or a combination of the above three further screening strategies.
请再次结合图2以及图3,在某些实施例中,在全连接层30中对降维后的特征图进行裁剪处理,提高图像裁剪的性能。Please combine FIG. 2 and FIG. 3 again. In some embodiments, the feature map after the dimension reduction is cropped in the fully connected layer 30 to improve the performance of image cropping.
In addition, in some embodiments, the image cropping method may further include: after the dimension-reduced feature map has been cropped, feeding the feature information of the target cropping region back into the convolutional neural network 10. This feedback increases the complexity of the network and allows the processing result of the whole network to be optimized. Specifically, the feature information of the target cropping region is used as an input of the convolutional neural network 10.
The image cropping method of the embodiment of the present invention extracts feature maps from multiple pieces of original image information with different distance features, i.e. it extracts the feature information of objects of various sizes, so that the feature information of the original image is obtained accurately and fully; this helps model the image cropping result precisely and allows the method to adapt to more complex application scenarios. In addition, the feature map is dimension-reduced before the cropping processing, which improves cropping performance and reduces the parameter size of the cropping network, making the method suitable for low-power chips. It has been verified that the network structure of the embodiment of the present invention needs less than 10 megabytes of parameters to achieve performance comparable to large networks of several hundred megabytes such as VGG16, which greatly improves practicality.
对应于上述实施例的图像裁剪方法,本发明实施例还提供一种图像裁剪装置,参见图7,所述图像裁剪装置100包括:存储装置110和一个或多个处理器120。Corresponding to the image cropping method of the foregoing embodiment, an embodiment of the present invention also provides an image cropping device. Referring to FIG. 7, the image cropping device 100 includes a storage device 110 and one or more processors 120.
The storage device 110 is used to store program instructions; the one or more processors 120 call the program instructions stored in the storage device 110, and when the program instructions are executed, the one or more processors 120 are individually or jointly configured to: extract multiple pieces of original image information with different distance features; extract a feature map of the original image information on the distance scale of each piece of original image information; reduce the dimension of the feature map; and perform cropping processing on the dimension-reduced feature map.
处理器120可以实现如本发明图1、图4以及图6所示实施例的图像处理方法,可参见上述实施例的图像裁剪方法对本实施例的图像裁剪装置100进行说明。The processor 120 may implement the image processing method in the embodiments shown in FIG. 1, FIG. 4, and FIG. 6 of the present invention. The image cropping apparatus 100 in this embodiment can be described with reference to the image cropping method in the foregoing embodiment.
It should be noted that the image cropping device 100 of this embodiment may be a device with image processing capability such as a computer, or a photographing device with an imaging function, such as a camera, a video camera, a smart phone, a smart terminal, a shooting stabilizer, an unmanned aerial vehicle, and so on.
对应于上述实施例的图像裁剪方法,本发明实施例还提供一种拍摄装置,参见图8,拍摄装置200包括:图像采集模块210、存储装置220和一个或多个处理器230。Corresponding to the image cropping method of the foregoing embodiment, an embodiment of the present invention also provides a photographing device. Referring to FIG. 8, the photographing device 200 includes: an image acquisition module 210, a storage device 220, and one or more processors 230.
The image acquisition module 210 is used to obtain the original image; the storage device 220 is used to store program instructions; the one or more processors 230 call the program instructions stored in the storage device 220, and when the program instructions are executed, the one or more processors 230 are individually or jointly configured to: extract multiple pieces of original image information with different distance features; extract a feature map of the original image information on the distance scale of each piece of original image information; reduce the dimension of the feature map; and perform cropping processing on the dimension-reduced feature map.
可选的,图像采集模块210包括镜头和与镜头相配合的成像传感器,如CCD、CMOS等图像传感器。Optionally, the image acquisition module 210 includes a lens and an imaging sensor matched with the lens, such as image sensors such as CCD and CMOS.
处理器230可以实现如本发明图1、图4以及图6所示实施例的图像处理方法,可参见上述实施例的图像裁剪方法对本实施例的拍摄装置200进行说明。The processor 230 may implement the image processing method in the embodiments shown in FIG. 1, FIG. 4, and FIG. 6 of the present invention. The image cutting method in the foregoing embodiment may be referred to for description of the photographing apparatus 200 in this embodiment.
该拍摄装置200可为带有摄像功能的照相机,摄像机,智能手机,智能终端,拍摄稳定器(如手持云台),无人飞行器(如无人机)等等。The photographing device 200 can be a camera with a photographing function, a video camera, a smart phone, a smart terminal, a photographing stabilizer (such as a handheld PTZ), an unmanned aerial vehicle (such as a drone), and so on.
本发明实施例提供一种无人机,参见图9,所述无人机300包括:图像采集模块310、存储装置320和一个或多个处理器330。An embodiment of the present invention provides an unmanned aerial vehicle. Referring to FIG. 9, the unmanned aerial vehicle 300 includes: an image acquisition module 310, a storage device 320 and one or more processors 330.
The image acquisition module 310 is used to obtain the original image; the storage device 320 is used to store program instructions; the one or more processors 330 call the program instructions stored in the storage device 320, and when the program instructions are executed, the one or more processors 330 are individually or jointly configured to: extract multiple pieces of original image information with different distance features; extract a feature map of the original image information on the distance scale of each piece of original image information; reduce the dimension of the feature map; and perform cropping processing on the dimension-reduced feature map.
本实施例的图像采集模块310可以为相机,也可以为镜头和成像传感器(如CCD、CMOS等)组合形成的具有拍摄功能的结构。The image acquisition module 310 in this embodiment may be a camera, or may be a structure with a shooting function formed by a combination of a lens and an imaging sensor (such as CCD, CMOS, etc.).
处理器330可以实现如本发明图1、图4以及图6所示实施例的图像处理方法,可参见上述实施例的图像裁剪方法对本实施例的无人机300进行说明。The processor 330 may implement the image processing method in the embodiments shown in FIG. 1, FIG. 4, and FIG. 6 of the present invention. The UAV 300 in this embodiment can be described with reference to the image cropping method in the foregoing embodiment.
需要说明的是,本发明实施例的无人机300是指航拍无人机,其他不具有摄像功能的无人机不属于本实施例的保护主体。It should be noted that the drone 300 in this embodiment of the present invention refers to an aerial photography drone, and other drones that do not have a camera function do not belong to the protection subject of this embodiment.
所述无人机300可为多旋翼无人机,也可为固定翼无人机,本发明实施例对无人机300的类型不作具体限定。The UAV 300 may be a multi-rotor UAV or a fixed-wing UAV. The embodiment of the present invention does not specifically limit the type of the UAV 300.
Further, the image acquisition module 310 may be mounted on the fuselage (not shown) via a pan/tilt (not shown), and the pan/tilt stabilizes the image acquisition module 310; the pan/tilt may be a two-axis pan/tilt or a three-axis pan/tilt, which is not specifically limited in the embodiment of the present invention.
本发明实施例还提供一种移动终端,参见图10,所述移动终端400包括:图像采集模块410、存储装置420和一个或多个处理器430。An embodiment of the present invention also provides a mobile terminal. Referring to FIG. 10, the mobile terminal 400 includes: an image acquisition module 410, a storage device 420, and one or more processors 430.
The image acquisition module 410 is used to obtain the original image; the storage device 420 is used to store program instructions; the one or more processors 430 call the program instructions stored in the storage device 420, and when the program instructions are executed, the one or more processors 430 are individually or jointly configured to: extract multiple pieces of original image information with different distance features; extract a feature map of the original image information on the distance scale of each piece of original image information; reduce the dimension of the feature map; and perform cropping processing on the dimension-reduced feature map.
本实施例的图像采集模块410为移动终端400自带的摄像头。The image acquisition module 410 in this embodiment is a camera built in the mobile terminal 400.
该移动终端400可为手机或平板电脑等智能移动终端。The mobile terminal 400 may be a smart mobile terminal such as a mobile phone or a tablet computer.
处理器430可以实现如本发明图1、图4以及图6所示实施例的图像处理方法,可参见上述实施例的图像裁剪方法对本实施例的移动终端400进行说明。The processor 430 may implement the image processing method of the embodiment shown in FIG. 1, FIG. 4, and FIG. 6 of the present invention. The mobile terminal 400 of this embodiment can be described with reference to the image cropping method of the foregoing embodiment.
本发明实施例还提供一种手持云台,参见图11,所述手持云台500包括:图像采集模块510、存储装置520和一个或多个处理器530。The embodiment of the present invention also provides a handheld pan/tilt. Referring to FIG. 11, the handheld pan/tilt 500 includes: an image acquisition module 510, a storage device 520, and one or more processors 530.
The image acquisition module 510 is used to obtain the original image; the storage device 520 is used to store program instructions; the one or more processors 530 call the program instructions stored in the storage device 520, and when the program instructions are executed, the one or more processors 530 are individually or jointly configured to: extract multiple pieces of original image information with different distance features; extract a feature map of the original image information on the distance scale of each piece of original image information; reduce the dimension of the feature map; and perform cropping processing on the dimension-reduced feature map.
本实施例的图像采集模块510可以为相机,也可以为镜头和成像传感器(如CCD、CMOS等)组合形成的具有拍摄功能的结构。The image acquisition module 510 in this embodiment may be a camera, or a structure with a photographing function formed by a combination of a lens and an imaging sensor (such as CCD, CMOS, etc.).
The processor 530 can implement the image processing methods of the embodiments shown in FIG. 1, FIG. 4 and FIG. 6 of the present invention; the handheld pan/tilt 500 of this embodiment can be described with reference to the image cropping method of the foregoing embodiments.
需要说明的是,本发明实施例的手持云台500是指带有摄像功能的云台,其他不具有摄像功能的云台不属于本实施例的保护主体。It should be noted that the handheld pan/tilt 500 in this embodiment of the present invention refers to a pan/tilt with a camera function, and other pan/tilts without a camera function do not belong to the protection subject of this embodiment.
The foregoing storage device may include a volatile memory, such as a random-access memory (RAM); the storage device may also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the storage device 110 may also include a combination of the foregoing types of memory.
It should be understood that, in the embodiments of the present invention, the processor may be a central processing unit (CPU). The processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
In addition, an embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the image cropping method of the foregoing embodiments are implemented. Specifically, when the program is executed by the processor, the following steps are implemented: extracting multiple pieces of original image information with different distance features; extracting a feature map of the original image information on the distance scale of each piece of original image information; reducing the dimension of the feature map; and performing cropping processing on the dimension-reduced feature map.
所述计算机可读存储介质可以是前述任一实施例所述的云台的内部存储单元,例如硬盘或内存。所述计算机可读存储介质也可以是云台的外部存储设备,例如所述设备上配备的插接式硬盘、智能存储卡(Smart Media Card,SMC)、SD卡、闪存卡(Flash Card)等。进一步的,所述计算机可读存储介质还可以既包括云台的内部存储单元也包括外部存储设备。所述计算机可读存储介质用于存储所述计算机程序以及所述云台所需的其他程序和数据,还可以用于暂时地存储已经输出或者将要输出的数据。The computer-readable storage medium may be the internal storage unit of the pan/tilt head described in any of the foregoing embodiments, such as a hard disk or a memory. The computer-readable storage medium may also be an external storage device of the pan-tilt, such as a plug-in hard disk, a smart media card (SMC), an SD card, a flash card (Flash Card), etc. equipped on the device . Further, the computer-readable storage medium may also include both an internal storage unit of the pan-tilt and an external storage device. The computer-readable storage medium is used to store the computer program and other programs and data required by the pan/tilt, and can also be used to temporarily store data that has been output or will be output.
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的程序可存储于一计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,所述的存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)或随机存储记忆体(Random Access Memory,RAM)等。A person of ordinary skill in the art can understand that all or part of the processes in the above-mentioned embodiment methods can be implemented by instructing relevant hardware through a computer program. The program can be stored in a computer readable storage medium. During execution, it may include the procedures of the above-mentioned method embodiments. Wherein, the storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), etc.
以上所揭露的仅为本发明部分实施例而已,当然不能以此来限定本发明之权利范围,因此依本发明权利要求所作的等同变化,仍属本发明所涵盖的范围。The above-disclosed are only some embodiments of the present invention, which of course cannot be used to limit the scope of rights of the present invention. Therefore, equivalent changes made according to the claims of the present invention still fall within the scope of the present invention.

Claims (40)

  1. 一种图像裁剪方法,其特征在于,所述方法包括:An image cropping method, characterized in that the method includes:
    提取具有不同距离特征的多个原始图像信息;Extract multiple original image information with different distance features;
    在每个原始图像信息的距离尺度上提取所述原始图像信息的特征图;Extracting a feature map of the original image information on the distance scale of each original image information;
    降低所述特征图的维度;Reduce the dimension of the feature map;
    对所述降维后的特征图进行裁剪处理。The feature map after dimensionality reduction is cropped.
  2. 根据权利要求1所述的方法,其特征在于,所述在每个原始图像信息的距离尺度上提取所述原始图像信息的特征图,包括:The method according to claim 1, wherein the extracting the feature map of the original image information on the distance scale of each original image information comprises:
    基于预先训练的卷积神经网络在每个原始图像信息的距离尺度上提取所述原始图像信息的特征图。The feature map of the original image information is extracted on the distance scale of each original image information based on the pre-trained convolutional neural network.
  3. 根据权利要求2所述的方法,其特征在于,所述卷积神经网络包括多个子网络,所述子网络的输入为对应距离特征的原始图像信息;The method according to claim 2, wherein the convolutional neural network comprises a plurality of sub-networks, and the input of the sub-networks is original image information corresponding to the distance feature;
    每个子网络用于提取对应距离特征的原始图像信息的特征图。Each sub-network is used to extract the feature map of the original image information corresponding to the distance feature.
  4. 根据权利要求3所述的方法,其特征在于,所述子网络包括依次连接的多个第一网络层和设于所述第一网络层之后的至少一个第二网络层,所述第一网络层用于下采样,所述第二网络层用于上采样;The method according to claim 3, wherein the sub-network comprises a plurality of first network layers connected in sequence and at least one second network layer arranged after the first network layer, and the first network Layer is used for downsampling, and the second network layer is used for upsampling;
    并且,所述第一网络层的数量大于所述第二网络层的数量。And, the number of the first network layer is greater than the number of the second network layer.
  5. 根据权利要求4所述的方法,其特征在于,所述子网络的任一第一网络层的输出作为其他任一子网络的任一第一网络层和/或第二网络层的输入;和/或The method according to claim 4, wherein the output of any first network layer of the sub-network is used as the input of any first network layer and/or the second network layer of any other sub-network; and /or
    所述子网络的任一第二网络层的输出作为其他任一子网络的任一第一网络层和/或第二网络层的输入。The output of any second network layer of the sub-network is used as the input of any first network layer and/or the second network layer of any other sub-network.
  6. 根据权利要求4所述的方法,其特征在于,所述子网络包括多个所述第二网络层。The method according to claim 4, wherein the sub-network includes a plurality of the second network layers.
  7. 根据权利要求4所述的方法,其特征在于,所述第一网络层包括卷积层和池化层,所述第二网络层包括反卷积层。The method according to claim 4, wherein the first network layer includes a convolution layer and a pooling layer, and the second network layer includes a deconvolution layer.
  8. The method according to claim 7, wherein multiple sub-networks share weight parameters, and the stride of the convolution layer of the first one of the first network layers of a sub-network differs from the stride of the convolution layer of the first one of the first network layers of the other sub-networks, and/or the stride of the pooling layer of the first one of the first network layers of a sub-network differs from the stride of the pooling layer of the first one of the first network layers of the other sub-networks, so that the feature maps output by the multiple sub-networks have the same distance size.
  9. 根据权利要求2所述的方法,其特征在于,所述对所述降维后的特征图进行裁剪处理之后,所述方法还包括:The method according to claim 2, characterized in that, after the cutting process is performed on the feature map after dimensionality reduction, the method further comprises:
    将目标裁剪区域的特征信息反馈至所述卷积神经网络中。The feature information of the target cropped area is fed back to the convolutional neural network.
  10. 根据权利要求1或2所述的方法,其特征在于,所述提取具有不同距离特征的多个原始图像信息,包括:The method according to claim 1 or 2, wherein the extracting multiple original image information with different distance characteristics comprises:
    将原始图像输入预先训练的卷积神经网络中;Input the original image into the pre-trained convolutional neural network;
    由所述卷积神经网络对所述原始图像进行下采样处理,获得具有不同距离特征的多个原始图像信息。The convolutional neural network performs down-sampling processing on the original image to obtain multiple original image information with different distance characteristics.
  11. 根据权利要求1所述的方法,其特征在于,所述对所述降维后的特征图进行裁剪处理,包括:The method according to claim 1, wherein the cutting the feature map after dimensionality reduction comprises:
    分别沿着所述降维后的特征图的长度及宽度方向对所述特征图进行划分,得到多个网格区域,每个网格区域包括多个像素;Dividing the feature map along the length and width directions of the feature map after dimensionality reduction to obtain a plurality of grid regions, each grid region including a plurality of pixels;
    根据多个所述网格区域以及预设先验条件,对所述降维后的特征图进行裁剪处理。According to the plurality of grid regions and preset prior conditions, the feature map after the dimensionality reduction is cropped.
  12. 根据权利要求11所述的方法,其特征在于,所述分别沿着所述降维后的特征图的长度及宽度方向对所述特征图进行划分,得到多个网格区域,包括:The method according to claim 11, wherein the dividing the feature map along the length and width directions of the feature map after dimensionality reduction respectively to obtain multiple grid regions comprises:
    分别沿着所述降维后的特征图的长度及宽度方向对所述特征图进行等分,得到多个网格区域。The feature map is equally divided along the length and width directions of the feature map after dimensionality reduction to obtain multiple grid regions.
  13. 根据权利要求11所述的方法,其特征在于,所述根据多个所述网格区域以及预设先验条件,对所述降维后的特征图进行裁剪处理,包括:The method according to claim 11, wherein the cutting the dimensionality-reduced feature map according to a plurality of the grid regions and preset prior conditions comprises:
    以所述网格区域为最小特征提取单位,提取每个网格区域的特征信息;Taking the grid area as the minimum feature extraction unit, extracting feature information of each grid area;
    根据所述网格区域的特征信息,提取多个所述网格区域中的部分网格区域的特征信息,所述部分网格区域的特征信息至少包括原始图像中兴趣区域的特征信息;Extracting feature information of a part of the grid area of the plurality of grid areas according to the feature information of the grid area, the feature information of the part of the grid area at least including the feature information of the interest area in the original image;
    将所述部分网格区域确定为目标裁剪区域。The partial grid area is determined as the target cropping area.
  14. 根据权利要求13所述的方法,其特征在于,所述根据多个所述网格区域以及预设先验条件,对所述降维后的特征图进行裁剪处理,进一步包括以下至少一种:The method according to claim 13, wherein the cutting the dimensionality-reduced feature map according to a plurality of the grid regions and preset prior conditions further comprises at least one of the following:
    确定方形的目标裁剪区域的其中一条对角线的两个端点在所述特征图中的限制区域,其中,两个限制区域分布在所述特征图的同一对角线的两侧,且所述限制区域包括至少一个网格区域;It is determined that two end points of one of the diagonals of the square target cropping area are in the restricted area of the feature map, wherein the two restricted areas are distributed on both sides of the same diagonal of the feature map, and the The restricted area includes at least one grid area;
    所述目标裁剪区域的长宽比满足预设长宽比策略;The aspect ratio of the target cropping area satisfies a preset aspect ratio strategy;
    所述目标裁剪区域的面积占比大于或等于预设占比阈值,其中,所述目标裁剪区域的面积占比为所述目标裁剪区域的面积与所述特征图的面积的比例。The area proportion of the target cropping region is greater than or equal to a preset proportion threshold, wherein the area proportion of the target cropping region is a ratio of the area of the target cropping region to the area of the feature map.
  15. 根据权利要求14所述的方法,其特征在于,所述限制区域包括第一限制区域和第二限制区域,所述第一限制区域用于限制所述目标裁剪区域的左上角在所述特征图中的位置,所述第二限制区域用于限制所述目标裁剪区域的右下角在所述特征图中的位置。The method according to claim 14, wherein the restricted area includes a first restricted area and a second restricted area, and the first restricted area is used to restrict the upper left corner of the target crop area to be in the feature map. The second restriction area is used to restrict the position of the lower right corner of the target crop area in the feature map.
  16. 根据权利要求14所述的方法,其特征在于,所述目标裁剪区域的长宽比满足预设长宽比策略,包括:The method according to claim 14, wherein the aspect ratio of the target cropping area satisfies a preset aspect ratio strategy, comprising:
    所述目标裁剪区域的长宽比用于指示所述目标裁剪区域为长方形区域。The aspect ratio of the target cropping area is used to indicate that the target cropping area is a rectangular area.
  17. 根据权利要求1或2所述的方法,其特征在于,在全连接层中对所述降维后的特征图进行裁剪处理。The method according to claim 1 or 2, characterized in that the feature map after the dimension reduction is cropped in a fully connected layer.
  18. 根据权利要求1或2所述的方法,其特征在于,所述降低所述特征图的维度,包括:The method according to claim 1 or 2, wherein the reducing the dimension of the feature map comprises:
    将所述特征图输入1*1的卷积层进行降维处理。The feature map is input into a 1*1 convolutional layer for dimensionality reduction processing.
  19. 一种图像裁剪装置,其特征在于,所述装置包括:An image cropping device, characterized in that the device includes:
    存储装置,用于存储程序指令;Storage device for storing program instructions;
    一个或多个处理器,调用所述存储装置中存储的程序指令,当所述程序指令被执行时,所述一个或多个处理器单独地或共同地被配置成用于:One or more processors call program instructions stored in the storage device, and when the program instructions are executed, the one or more processors are individually or collectively configured to:
    提取具有不同距离特征的多个原始图像信息;Extract multiple original image information with different distance features;
    在每个原始图像信息的距离尺度上提取所述原始图像信息的特征图;Extracting a feature map of the original image information on the distance scale of each original image information;
    降低所述特征图的维度;Reduce the dimension of the feature map;
    对所述降维后的特征图进行裁剪处理。The feature map after dimensionality reduction is cropped.
  20. 根据权利要求19所述的装置,其特征在于,所述一个或多个处理器单独地或共同地被进一步配置成用于:The apparatus according to claim 19, wherein the one or more processors are separately or collectively further configured to:
    基于预先训练的卷积神经网络在每个原始图像信息的距离尺度上提取所述原始图像信息的特征图。The feature map of the original image information is extracted on the distance scale of each original image information based on the pre-trained convolutional neural network.
  21. 根据权利要求20所述的装置,其特征在于,所述卷积神经网络包括多个子网络,所述子网络的输入为对应距离特征的原始图像信息;The device according to claim 20, wherein the convolutional neural network comprises a plurality of sub-networks, and the input of the sub-networks is original image information corresponding to the distance feature;
    每个子网络用于提取对应距离特征的原始图像信息的特征图。Each sub-network is used to extract the feature map of the original image information corresponding to the distance feature.
  22. 根据权利要求21所述的装置,其特征在于,所述子网络包括依次连接的多个第一网络层和设于所述第一网络层之后的至少一个第二网络层,所述第一网络层用于下采样,所述第二网络层用于上采样;The device according to claim 21, wherein the sub-network comprises a plurality of first network layers connected in sequence and at least one second network layer disposed after the first network layer, and the first network Layer is used for downsampling, and the second network layer is used for upsampling;
    并且,所述第一网络层的数量大于所述第二网络层的数量。And, the number of the first network layer is greater than the number of the second network layer.
  23. 根据权利要求22所述的装置,其特征在于,所述子网络的任一第一网络层的输出作为其他任一子网络的任一第一网络层和/或第二网络层的输入;和/或The device according to claim 22, wherein the output of any first network layer of the sub-network is used as the input of any first network layer and/or the second network layer of any other sub-network; and /or
    所述子网络的任一第二网络层的输出作为其他任一子网络的任一第一网络层和/或第二网络层的输入。The output of any second network layer of the sub-network is used as the input of any first network layer and/or the second network layer of any other sub-network.
  24. 根据权利要求22所述的装置,其特征在于,所述子网络包括多个所述第二网络层。The apparatus according to claim 22, wherein the sub-network includes a plurality of the second network layers.
  25. 根据权利要求22所述的装置,其特征在于,所述第一网络层包括卷积层和池化层,所述第二网络层包括反卷积层。The apparatus according to claim 22, wherein the first network layer includes a convolutional layer and a pooling layer, and the second network layer includes a deconvolution layer.
  26. The device according to claim 25, wherein multiple sub-networks share weight parameters, and the stride of the convolution layer of the first one of the first network layers of a sub-network differs from the stride of the convolution layer of the first one of the first network layers of the other sub-networks, and/or the stride of the pooling layer of the first one of the first network layers of a sub-network differs from the stride of the pooling layer of the first one of the first network layers of the other sub-networks, so that the feature maps output by the multiple sub-networks have the same distance size.
  27. 根据权利要求20所述的装置,其特征在于,所述对所述降维后的特征图进行裁剪处理之后,所述一个或多个处理器单独地或共同地被进一步配置成用于:The apparatus according to claim 20, wherein after the cutting process is performed on the feature map after dimensionality reduction, the one or more processors are separately or collectively further configured to:
    将目标裁剪区域的特征信息反馈至所述卷积神经网络中。The feature information of the target cropped area is fed back to the convolutional neural network.
  28. 根据权利要求19或20所述的装置,其特征在于,所述一个或多个处理器单独地或共同地被进一步配置成用于:The device according to claim 19 or 20, wherein the one or more processors are separately or collectively further configured to:
    将原始图像输入预先训练的卷积神经网络中;Input the original image into the pre-trained convolutional neural network;
    由所述卷积神经网络对所述原始图像进行下采样处理,获得具有不同距离特征的多个原始图像信息。The convolutional neural network performs down-sampling processing on the original image to obtain multiple original image information with different distance characteristics.
  29. 根据权利要求19所述的装置,其特征在于,所述一个或多个处理器单独地或共同地被进一步配置成用于:The apparatus according to claim 19, wherein the one or more processors are separately or collectively further configured to:
    分别沿着所述降维后的特征图的长度及宽度方向对所述特征图进行划分,得到多个网格区域,每个网格区域包括多个像素;Dividing the feature map along the length and width directions of the feature map after dimensionality reduction to obtain a plurality of grid regions, each grid region including a plurality of pixels;
    根据多个所述网格区域以及预设先验条件,对所述降维后的特征图进行裁剪处理。According to the plurality of grid regions and preset prior conditions, the feature map after the dimensionality reduction is cropped.
  30. 根据权利要求29所述的装置,其特征在于,所述一个或多个处理器单独地或共同地被进一步配置成用于:The apparatus according to claim 29, wherein the one or more processors are separately or collectively further configured to:
    分别沿着所述降维后的特征图的长度及宽度方向对所述特征图进行等分,得到多个网格区域。The feature map is equally divided along the length and width directions of the feature map after dimensionality reduction to obtain multiple grid regions.
  31. 根据权利要求29所述的装置,其特征在于,所述一个或多个处理器单独地或共同地被进一步配置成用于:The apparatus according to claim 29, wherein the one or more processors are separately or collectively further configured to:
    以所述网格区域为最小特征提取单位,提取每个网格区域的特征信息;Taking the grid area as the minimum feature extraction unit, extracting feature information of each grid area;
    根据所述网格区域的特征信息,提取多个所述网格区域中的部分网格区域的特征信息,所述部分网格区域的特征信息至少包括原始图像中兴趣区域的特征信息;Extracting feature information of a part of the grid area of the plurality of grid areas according to the feature information of the grid area, the feature information of the part of the grid area at least including the feature information of the interest area in the original image;
    将所述部分网格区域确定为目标裁剪区域。The partial grid area is determined as the target cropping area.
  32. 根据权利要求31所述的装置,其特征在于,所述一个或多个处理器单独地或共同地被进一步配置成用于执行以下至少一种:The device according to claim 31, wherein the one or more processors are separately or collectively further configured to perform at least one of the following:
    确定方形的目标裁剪区域的其中一条对角线的两个端点在所述特征图中的限制区域,其中,两个限制区域分布在所述特征图的同一对角线的两侧,且所述限制区域包括至少一个网格区域;It is determined that two end points of one of the diagonals of the square target cropping area are in the restricted area of the feature map, wherein the two restricted areas are distributed on both sides of the same diagonal of the feature map, and the The restricted area includes at least one grid area;
    所述目标裁剪区域的长宽比满足预设长宽比策略;The aspect ratio of the target cropping area satisfies a preset aspect ratio strategy;
    所述目标裁剪区域的面积占比大于或等于预设占比阈值,其中,所述目标裁剪区域的面积占比为所述目标裁剪区域的面积与所述特征图的面积的比例。The area proportion of the target cropping region is greater than or equal to a preset proportion threshold, wherein the area proportion of the target cropping region is a ratio of the area of the target cropping region to the area of the feature map.
  33. 根据权利要求32所述的装置,其特征在于,所述限制区域包括第一限制区域和第二限制区域,所述第一限制区域用于限制所述目标裁剪区域的左上角在所述特征图中的位置,所述第二限制区域用于限制所述目标裁剪区域的右下角在所述特征图中的位置。The device according to claim 32, wherein the restricted area comprises a first restricted area and a second restricted area, and the first restricted area is used to restrict the upper left corner of the target cropping area to be in the feature map. The second restriction area is used to restrict the position of the lower right corner of the target crop area in the feature map.
  34. 根据权利要求32所述的装置,其特征在于,所述目标裁剪区域的长宽比满足预设长宽比策略,包括:The device according to claim 32, wherein the aspect ratio of the target cropping area satisfies a preset aspect ratio strategy, comprising:
    所述目标裁剪区域的长宽比用于指示所述目标裁剪区域为长方形区域。The aspect ratio of the target cropping area is used to indicate that the target cropping area is a rectangular area.
  35. The device according to claim 19 or 20, wherein the one or more processors are individually or collectively further configured to: perform the cropping processing on the dimension-reduced feature map in a fully connected layer.
  36. The apparatus according to claim 19 or 20, wherein the one or more processors, individually or collectively, are further configured to:
    input the feature map into a 1*1 convolutional layer for dimensionality reduction processing.
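Claim 36 uses a 1*1 convolutional layer for dimensionality reduction: a 1*1 convolution acts as a per-pixel linear projection across channels, so it reduces the channel count while leaving the spatial resolution unchanged. The channel counts in the sketch below are illustrative assumptions.

```python
import torch
import torch.nn as nn

# 1x1 convolution: reduces channels (256 -> 8) without changing H and W.
reduce_1x1 = nn.Conv2d(in_channels=256, out_channels=8, kernel_size=1)

feature_map = torch.randn(1, 256, 56, 56)   # (batch, channels, H, W)
reduced = reduce_1x1(feature_map)
print(reduced.shape)                        # torch.Size([1, 8, 56, 56])
```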
  37. A photographing apparatus, wherein the photographing apparatus comprises:
    an image acquisition module, configured to obtain an original image;
    a storage device, configured to store program instructions; and
    one or more processors, configured to call the program instructions stored in the storage device, wherein, when the program instructions are executed, the one or more processors, individually or collectively, are configured to implement the method according to any one of claims 1 to 18.
  38. An unmanned aerial vehicle, wherein the unmanned aerial vehicle comprises:
    an image acquisition module, configured to obtain an original image;
    a storage device, configured to store program instructions; and
    one or more processors, configured to call the program instructions stored in the storage device, wherein, when the program instructions are executed, the one or more processors, individually or collectively, are configured to implement the method according to any one of claims 1 to 18.
  39. A mobile terminal, wherein the mobile terminal comprises:
    an image acquisition module, configured to obtain an original image;
    a storage device, configured to store program instructions; and
    one or more processors, configured to call the program instructions stored in the storage device, wherein, when the program instructions are executed, the one or more processors, individually or collectively, are configured to implement the method according to any one of claims 1 to 18.
  40. A handheld gimbal, wherein the handheld gimbal comprises:
    an image acquisition module, configured to obtain an original image;
    a storage device, configured to store program instructions; and
    one or more processors, configured to call the program instructions stored in the storage device, wherein, when the program instructions are executed, the one or more processors, individually or collectively, are configured to implement the method according to any one of claims 1 to 18.
PCT/CN2019/087999 2019-05-22 2019-05-22 Image cropping method and apparatus, and photographing apparatus WO2020232672A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980009520.XA CN111684488A (en) 2019-05-22 2019-05-22 Image cropping method and device and shooting device
PCT/CN2019/087999 WO2020232672A1 (en) 2019-05-22 2019-05-22 Image cropping method and apparatus, and photographing apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/087999 WO2020232672A1 (en) 2019-05-22 2019-05-22 Image cropping method and apparatus, and photographing apparatus

Publications (1)

Publication Number Publication Date
WO2020232672A1 true WO2020232672A1 (en) 2020-11-26

Family

ID=72433306

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/087999 WO2020232672A1 (en) 2019-05-22 2019-05-22 Image cropping method and apparatus, and photographing apparatus

Country Status (2)

Country Link
CN (1) CN111684488A (en)
WO (1) WO2020232672A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113012293A (en) * 2021-03-22 2021-06-22 平安科技(深圳)有限公司 Stone carving model construction method, device, equipment and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114742791A (en) * 2022-04-02 2022-07-12 深圳市国电科技通信有限公司 Auxiliary defect detection method and device for printed circuit board assembly and computer equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060115922A1 (en) * 2004-09-22 2006-06-01 Fuji Photo Film Co., Ltd. Photo movie creating method, apparatus, and program
US20130108171A1 (en) * 2011-10-28 2013-05-02 Raymond William Ptucha Image Recomposition From Face Detection And Facial Features
CN106296760A (en) * 2015-05-21 2017-01-04 腾讯科技(深圳)有限公司 The method of cutting out of picture and device
CN106650737A (en) * 2016-11-21 2017-05-10 中国科学院自动化研究所 Image automatic cutting method
CN107610131A (en) * 2017-08-25 2018-01-19 百度在线网络技术(北京)有限公司 A kind of image cropping method and image cropping device
CN108154464A (en) * 2017-12-06 2018-06-12 中国科学院自动化研究所 The method and device of picture automatic cutting based on intensified learning
CN108510504A (en) * 2018-03-22 2018-09-07 北京航空航天大学 Image partition method and device
CN109448001A (en) * 2018-10-26 2019-03-08 山东世纪开元电子商务集团有限公司 A kind of picture automatic cutting method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019046239A (en) * 2017-09-04 2019-03-22 大日本印刷株式会社 Image processing apparatus, image processing method, program, and image data for synthesis
CN108376386A (en) * 2018-03-23 2018-08-07 深圳天琴医疗科技有限公司 A kind of construction method and device of the super-resolution model of image
CN109166130B (en) * 2018-08-06 2021-06-22 北京市商汤科技开发有限公司 Image processing method and image processing device

Also Published As

Publication number Publication date
CN111684488A (en) 2020-09-18

Similar Documents

Publication Publication Date Title
EP4050558A1 (en) Image fusion method and apparatus, storage medium, and electronic device
US10121229B2 (en) Self-portrait enhancement techniques
WO2018201809A1 (en) Double cameras-based image processing device and method
JP7002056B2 (en) 3D model generator and 3D model generation method
WO2020192706A1 (en) Object three-dimensional model reconstruction method and device
EP3454250A1 (en) Facial image processing method and apparatus and storage medium
US9196071B2 (en) Image splicing method and apparatus
US9571819B1 (en) Efficient dense stereo computation
WO2017016050A1 (en) Image preview method, apparatus and terminal
US20190251675A1 (en) Image processing method, image processing device and storage medium
WO2020113408A1 (en) Image processing method and device, unmanned aerial vehicle, system, and storage medium
CN109474780B (en) Method and device for image processing
WO2019041276A1 (en) Image processing method, and unmanned aerial vehicle and system
US9106838B2 (en) Automatic photographing method and system thereof
WO2020258286A1 (en) Image processing method and device, photographing device and movable platform
WO2020024112A1 (en) Photography processing method, device and storage medium
WO2020232672A1 (en) Image cropping method and apparatus, and photographing apparatus
CN111292413A (en) Image model processing method and device, storage medium and electronic device
KR102262671B1 (en) Method and storage medium for applying bokeh effect to video images
US20220392027A1 (en) Method for calibrating image distortion, apparatus, electronic device and storage medium
CN116681636A (en) Light infrared and visible light image fusion method based on convolutional neural network
CN114466133B (en) Photographing method and device
CN112333468B (en) Image processing method, device, equipment and storage medium
WO2021168804A1 (en) Image processing method, image processing apparatus and image processing system
CN116051736A (en) Three-dimensional reconstruction method, device, edge equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19929769

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19929769

Country of ref document: EP

Kind code of ref document: A1