CN111684488A - Image cropping method and device and shooting device

Publication number
CN111684488A
Authority
CN
China
Prior art keywords
network
feature map
original image
area
sub
Prior art date
Legal status: Pending (the status listed is an assumption, not a legal conclusion)
Application number
CN201980009520.XA
Other languages
Chinese (zh)
Inventor
曾辉
曹子晟
胡攀
Current Assignee
SZ DJI Technology Co Ltd
Original Assignee
SZ DJI Technology Co Ltd
Application filed by SZ DJI Technology Co Ltd
Publication of CN111684488A

Classifications

    • G06T 7/0002 Image analysis: inspection of images, e.g. flaw detection
    • G06N 3/04 Neural networks: architecture, e.g. interconnection topology
    • G06N 3/045 Neural networks: combinations of networks
    • G06T 3/00 Geometric image transformation in the plane of the image
    • G06T 7/11 Segmentation: region-based segmentation
    • G06V 10/40 Extraction of image or video features
    • G06T 2207/10004 Image acquisition modality: still image; photographic image
    • G06T 2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; pyramid transform
    • G06T 2207/20081 Training; learning
    • G06T 2207/20132 Image cropping


Abstract

An image cropping method, an image cropping device and a shooting device are provided, the method comprising the following steps: extracting a plurality of pieces of original image information having different distance features (S101); extracting a feature map of each piece of original image information on its distance scale (S102); reducing the dimensionality of the feature maps (S103); and cropping the feature maps after dimension reduction (S104). By extracting feature information of objects of various sizes, the feature information of the original image is obtained accurately and fully, which helps model the image cropping result accurately and allows the method to adapt to more complex application scenes. In addition, the feature maps are reduced in dimensionality before cropping, which improves image cropping performance and reduces the parameter size of the cropping model, making the method suitable for low-power chips and greatly improving its practicality.

Description

Image cropping method and device and shooting device
Technical Field
The present invention relates to the field of image processing, and in particular, to an image cropping method and apparatus, and a shooting apparatus.
Background
In many cases, the visual effect of a photo is not ideal because, for example, the user had no time to compose the shot or lacks composition knowledge. This can be addressed by cropping the photo afterwards, recomposing it to improve the visual effect. Toolkits on the market currently provide manual cropping functions; however, manually cropping photos requires the user to have some knowledge of photographic composition, and when the number of photos is large the workload becomes very heavy.
The prior art also includes methods for cropping photos automatically. Early automatic cropping algorithms were mainly attention-based, aiming to capture the most salient content or region of interest in the photo. Such algorithms typically first compute a saliency map of the photo and then select the most salient window, in sliding-window fashion, as the cropping result; some algorithms also use face detection or visual interaction information as an aid. However, attention-based algorithms do not consider the overall composition of the photo, so the cropping results they produce often have only mediocre visual effect.
Aesthetics-based approaches focus on the aesthetic properties of photographs and the composition rules common in photography, model these properties and rules with hand-designed features, and then learn an aesthetic classifier, such as a support vector machine, to select the best result from a large number of cropping candidates. However, because of its inherent limitations, manual feature design tends to make it difficult to model the aesthetic attributes of photographs accurately.
Data-driven algorithms mostly implement automatic cropping by training an end-to-end deep neural network on labeled data sets. Most of these algorithms directly adopt standard neural network architectures that have been successful in other fields (such as image classification and object detection); the depth models used contain hundreds of megabytes of parameters and require a very powerful GPU and high power consumption to run, a load that low-power products such as cameras cannot bear. Further, since the characteristics of photo cropping and composition are not considered, the cropping results of these methods are not stable enough.
Disclosure of Invention
The invention provides an image cropping method and device and a shooting device.
Specifically, the invention is realized by the following technical scheme:
according to a first aspect of the present invention, there is provided an image cropping method, the method comprising:
extracting a plurality of original image information with different distance characteristics;
extracting a feature map of each original image information on a distance scale of the original image information;
reducing the dimension of the feature map;
and cropping the feature map after dimension reduction.
According to a second aspect of the present invention, there is provided an image cropping device, the device comprising:
storage means for storing program instructions;
one or more processors that invoke program instructions stored in the storage device, the one or more processors individually or collectively configured when the program instructions are executed to:
extracting a plurality of original image information with different distance characteristics;
extracting a feature map of each original image information on a distance scale of the original image information;
reducing the dimension of the feature map;
and cropping the feature map after dimension reduction.
According to a third aspect of the present invention, there is provided a photographing apparatus including:
the image acquisition module is used for acquiring an original image;
storage means for storing program instructions;
one or more processors that invoke program instructions stored in the storage device, the one or more processors individually or collectively configured when the program instructions are executed to:
extracting a plurality of original image information with different distance characteristics;
extracting a feature map of each original image information on a distance scale of the original image information;
reducing the dimension of the feature map;
and cropping the feature map after dimension reduction.
According to a fourth aspect of the invention, there is provided a drone comprising:
the image acquisition module is used for acquiring an original image;
storage means for storing program instructions;
one or more processors that invoke program instructions stored in the storage device, the one or more processors individually or collectively configured to, when the program instructions are executed:
extracting a plurality of original image information with different distance characteristics;
extracting a feature map of each original image information on a distance scale of the original image information;
reducing the dimension of the feature map;
and cropping the feature map after dimension reduction.
According to a fifth aspect of the present invention, there is provided a mobile terminal comprising:
the image acquisition module is used for acquiring an original image;
storage means for storing program instructions;
one or more processors that invoke program instructions stored in the storage device, the one or more processors individually or collectively configured to, when the program instructions are executed:
extracting a plurality of original image information with different distance characteristics;
extracting a feature map of each original image information on a distance scale of the original image information;
reducing the dimension of the feature map;
and cropping the feature map after dimension reduction.
According to a sixth aspect of the present invention, there is provided a handheld gimbal comprising:
the image acquisition module is used for acquiring an original image;
storage means for storing program instructions;
one or more processors that invoke program instructions stored in the storage device, the one or more processors individually or collectively configured to, when the program instructions are executed:
extracting a plurality of original image information with different distance characteristics;
extracting a feature map of each original image information on a distance scale of the original image information;
reducing the dimension of the feature map;
and cropping the feature map after dimension reduction.
According to the technical solution provided by the embodiments of the invention, feature maps are extracted from a plurality of pieces of original image information with different distance features, that is, feature information of objects of various sizes is extracted, so that the feature information of the original image is obtained accurately and fully; this helps model the image cropping result accurately, and the method can therefore adapt to more complex application scenes. Moreover, the feature maps are reduced in dimensionality and only then cropped, which improves image cropping performance and reduces the parameter size of the cropping model, making the method suitable for low-power chips and greatly improving its practicality.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a method of image cropping according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a network structure of a convolutional neural network according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating an exemplary network structure of a convolutional neural network according to an embodiment of the present invention;
FIG. 4 is a flowchart of a method for clipping a feature map after dimension reduction according to an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating the division of a feature map according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating a specific method for performing clipping processing on the feature map after dimension reduction in the embodiment shown in FIG. 5;
FIG. 7 is a block diagram of the image cropping device in an embodiment of the present invention;
FIG. 8 is a block diagram of the shooting device in an embodiment of the present invention;
FIG. 9 is a block diagram of the unmanned aerial vehicle in an embodiment of the present invention;
FIG. 10 is a block diagram of the mobile terminal in an embodiment of the present invention;
FIG. 11 is a block diagram of the handheld gimbal in an embodiment of the present invention.
Reference numerals: 1: an original image;
10: a convolutional neural network; 11: a first sub-network; 12: a second sub-network; 13: a third sub-network;
20: 1 x 1 convolutional layer;
30: a fully connected layer.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The image cropping method, the image cropping device and the image capturing device of the invention are described in detail below with reference to the accompanying drawings. The features of the following examples and embodiments may be combined with each other without conflict.
Fig. 1 is a flowchart of an image cropping method in an embodiment of the present invention, and as shown in fig. 1, the image cropping method in an embodiment of the present invention may include the following steps:
s101: extracting a plurality of original image information with different distance characteristics;
In order to accurately extract feature information of objects of various sizes and improve the accuracy of image cropping, the embodiment of the invention extracts a plurality of pieces of original image information with different distance features from the same original image.
For the same original image, different strategies can be used to extract multiple pieces of original image information with different distance features, for example, multiple pieces of original image information with different distance features are extracted based on down-sampling, up-sampling or a combination of down-sampling and up-sampling.
As a feasible implementation, the original image is input into a pre-trained convolutional neural network, and the convolutional neural network downsamples it to obtain a plurality of pieces of original image information with different distance features. Optionally, a three-level image pyramid, downsampled by a factor of 2 level by level, is used as the input of the convolutional neural network. In this embodiment, feature extraction is performed on the original image, on the image obtained by downsampling the original image by a factor of 2, and on the image obtained by downsampling it by a factor of 4; that is, the original image information of this embodiment comprises the original image (such as 1 in fig. 2), the 2x-downsampled image, and the 4x-downsampled image. The network structure of the convolutional neural network is described in detail in the following embodiments.
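For illustration, the three-level pyramid input can be sketched as follows. This is a minimal example assuming PyTorch, a hypothetical 256 x 256 input, and bilinear resampling; the embodiment does not fix the resampling kernel.

```python
import torch
import torch.nn.functional as F

def build_pyramid(image: torch.Tensor, levels: int = 3):
    """Return [original, 2x-downsampled, 4x-downsampled] views of the image."""
    pyramid = [image]
    for _ in range(levels - 1):
        # Downsample the previous level by a factor of 2.
        image = F.interpolate(image, scale_factor=0.5, mode="bilinear",
                              align_corners=False)
        pyramid.append(image)
    return pyramid

original = torch.randn(1, 3, 256, 256)   # hypothetical (N, C, H, W) input
p0, p1, p2 = build_pyramid(original)     # 256x256, 128x128 and 64x64 levels
```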
S102: extracting a feature map of the original image information on the distance scale of each original image information;
Different algorithms or strategies may be employed to extract the feature map of the original image information on the distance scale of each piece of original image information, thereby obtaining color features, texture features, shape features and/or spatial relationship features of the original image. Optionally, in some embodiments, the feature map of the original image information is extracted on the distance scale of each piece of original image information by a pre-trained convolutional neural network.
Next, a network structure of a specific convolutional neural network is explained.
As shown in fig. 2, the convolutional neural network 10 may include a plurality of sub-networks, the input of which is the original image information of the corresponding distance feature; each sub-network is used to extract a feature map of the original image information corresponding to the distance features. Wherein the neural network may comprise 2, 3, 4, 5 or other number of sub-networks, in the embodiment shown in fig. 2, the convolutional neural network 10 comprises a first sub-network 11, a second sub-network 12 and a third sub-network 13, wherein the input of the first sub-network 11 is the original image, the input of the second sub-network 12 is the image obtained by down-sampling the original image by 2 times, and the input of the third sub-network 13 is the image obtained by down-sampling the original image by 4 times.
In the prior art, a standard network architecture (such as VGG16) downsamples multiple times during feature extraction (typically 5 times, for an overall downsampling factor of 32), which loses a great deal of object spatial information. If a 256 x 256 original image is downsampled 5 times in this way, a feature map with a spatial resolution of only 8 x 8 is obtained, and a crop box placed on an 8 x 8 feature map has an even lower resolution. At such a small spatial resolution most of the information is lost, and the cropping result cannot be modeled accurately. If, instead, the number of downsampling steps is reduced, the receptive field of the features shrinks and large objects cannot be represented. In contrast, in the embodiment of the present invention each sub-network first downsamples and then upsamples the original image information of its corresponding distance feature, ensuring both a sufficiently large receptive field and a sufficient spatial resolution. As shown in fig. 3, the sub-network of this embodiment may include a plurality of first network layers connected in sequence and at least one second network layer arranged after the first network layers, the first network layers being used for downsampling and the second network layers for upsampling; moreover, the number of first network layers is greater than the number of second network layers.
Within a sub-network, the numbers of first and second network layers may be set as required; optionally, each sub-network includes a plurality of second network layers. In the embodiment shown in fig. 3, each sub-network includes 4 first network layers connected in sequence and 2 second network layers connected in sequence after them; in other embodiments, each sub-network includes 5 first network layers connected in sequence and 2 second network layers connected in sequence after them. It is to be understood that the numbers of first and second network layers in each sub-network may also be set differently and are not limited to those in the above embodiments.
The first network layer may implement downsampling based on a shuffle operation or a convolution operation, and the second network layer may implement upsampling based on a shuffle operation, a deconvolution operation, bicubic interpolation, nearest neighbor interpolation, bilinear interpolation, or the like. In one specific implementation, the first network layer includes a convolutional layer and a pooling layer, and the second network layer includes a deconvolution layer.
Further, in some embodiments the weight parameters are shared by the multiple sub-networks, i.e., the convolution kernels of the convolutional and pooling layers are the same in every sub-network. In order to make the distance sizes of the feature maps output by the sub-networks the same, optionally, the step size of the convolutional layer of a sub-network's first network layer differs from that of the other sub-networks; optionally, the step size of the pooling layer of a sub-network's first network layer differs from that of the other sub-networks; optionally, both the convolutional-layer and pooling-layer step sizes of a sub-network's first network layer differ from those of the other sub-networks. In each sub-network, the step sizes of the convolutional and pooling layers of the remaining first network layers (those after the first one) equal the step sizes of the corresponding layers in the other sub-networks, and the step size of the deconvolution layer of the second network layer equals that of the second network layers in the other sub-networks. That is, by adjusting the step size of the convolutional layer and/or pooling layer of the first network layer in each sub-network, the feature maps finally output by the sub-networks can be guaranteed to have the same distance size. Furthermore, in each sub-network, the step sizes of the convolutional and pooling layers of the remaining first network layers also equal the step size of the deconvolution layer of the second network layer.
In the embodiment shown in fig. 3, the step size of the convolutional and pooling layers of the first network layer of the first sub-network 11 is 4, that of the second sub-network 12 is 2, and that of the third sub-network 13 is 0 (i.e., that layer performs no downsampling). In the first sub-network 11, the second sub-network 12 and the third sub-network 13, the step sizes of the convolutional and pooling layers of the other first network layers are all 2, and the step size of the deconvolution layer of the second network layer is also 2.
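The layer arrangement and strides just described can be illustrated with the following sketch, assuming PyTorch. The channel widths, the 3 x 3 convolution size and the use of max pooling are illustrative assumptions, the "0" step size of the third sub-network's first layer is read as "no downsampling", and cross-branch weight sharing (described above) is omitted for brevity. With a 256 x 256 original image, all three branches emit feature maps of the same distance size:

```python
import torch
import torch.nn as nn

def first_layer(c_in: int, c_out: int, stride: int) -> nn.Sequential:
    # A "first network layer": convolution plus pooling; the pooling step
    # carries the layer's overall stride (stride 1 means no downsampling).
    pool = nn.MaxPool2d(stride, stride) if stride > 1 else nn.Identity()
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(), pool)

def make_subnetwork(first_stride: int, c: int = 32) -> nn.Sequential:
    layers = [first_layer(3, c, first_stride)]
    for _ in range(3):                       # remaining first layers, stride 2
        layers.append(first_layer(c, c, 2))
    for _ in range(2):                       # "second network layers": 2x deconv
        layers.append(nn.ConvTranspose2d(c, c, kernel_size=2, stride=2))
    return nn.Sequential(*layers)

inputs = [torch.randn(1, 3, s, s) for s in (256, 128, 64)]   # pyramid levels
for x, stride in zip(inputs, (4, 2, 1)):                     # per-branch strides
    print(make_subnetwork(stride)(x).shape)  # torch.Size([1, 32, 32, 32]) each
```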
In some embodiments, the weight parameters of each sub-network are instead obtained by training each sub-network separately, and for sub-networks with smaller distance sizes the parameters can be reduced further.
To increase the complexity and depth of the convolutional neural network 10, in some embodiments the output of any first network layer of a sub-network may be used as an input to any first network layer and/or second network layer of any other sub-network; in some embodiments the output of any second network layer of a sub-network may be used as such an input; and in some embodiments both kinds of connection are used. In the above embodiments, when the output of one network layer is used as the input of another, the distance size of the feature map output by the former equals the distance size of the feature map input to the latter. In a concrete implementation, the output of one network layer is superposed on the input of the other along the feature map channel dimension.
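A minimal sketch of such a cross-sub-network connection, assuming PyTorch: when the distance sizes match, one layer's output is concatenated onto another layer's input along the channel dimension.

```python
import torch

out_a = torch.randn(1, 32, 16, 16)  # output of a layer in one sub-network
in_b = torch.randn(1, 32, 16, 16)   # input feature map of a layer elsewhere
merged = torch.cat([out_a, in_b], dim=1)  # (1, 64, 16, 16), channel-wise
```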
S103: reducing the dimension of the feature map;
In some classical visual tasks such as object detection, a fully connected layer with many parameters is generally required to obtain an accurate object window. For example, a 7 x 7 x 512 x 4096 fully connected layer is standard in many object detection networks, and this setting is also used by some image cropping models. However, the parameters of this single layer amount to 392 megabytes, which is completely unbearable for current camera systems. Unlike tasks such as object detection, image cropping does not need to accurately identify every item of content in the original image, so the channel dimension of the feature map can be reduced to a low value without losing performance, thereby greatly reducing the parameters of the fully connected layer used in the image cropping process.
As a possible implementation, in conjunction with fig. 2 and fig. 3, the feature map is input into the 1 x 1 convolutional layer 20 for dimension reduction processing, so as to reduce the dimensionality of the feature map. It should be understood that the dimensionality of the feature map need not be reduced with the 1 x 1 convolutional layer 20; other existing dimension reduction algorithms may be used instead.
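The following sketch, assuming PyTorch with illustrative channel counts, shows the 1 x 1 reduction together with the parameter arithmetic behind the 392-megabyte figure above (7 x 7 x 512 x 4096 weights at 4 bytes each):

```python
import torch
import torch.nn as nn

fc_weights = 7 * 7 * 512 * 4096             # the standard detection-style layer
print(fc_weights, fc_weights * 4 / 2**20)   # 102760448 weights, ~392.0 MB

reduce = nn.Conv2d(512, 8, kernel_size=1)   # 512 -> 8 channels (illustrative)
feat = torch.randn(1, 512, 32, 32)
print(reduce(feat).shape)                   # torch.Size([1, 8, 32, 32])
```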
S104: cropping the feature map after dimension reduction.
Fig. 4 is a flowchart of a specific method for cropping the feature map after dimension reduction in an embodiment of the present invention. As shown in fig. 4, cropping the feature map after dimension reduction may include:
S401: dividing the feature map after dimension reduction along its length and width directions to obtain a plurality of grid regions, each grid region comprising a plurality of pixels;
S402: cropping the feature map after dimension reduction according to the grid regions and a preset prior condition.
Human eyes are not very sensitive to individual pixels; a pixel-level deviation is in practice not noticeable, so the required precision need not reach the pixel level. On this basis the feature map after dimension reduction is divided, and a grid region rather than a pixel is used as the minimum unit during cropping. This reduces the amount of data processed during image cropping and further improves image cropping performance.
Optionally, when step S401 is implemented, the feature map after dimension reduction is divided equally along its length direction and its width direction to obtain a plurality of grid regions; that is, the feature map is divided equally along its length direction and, separately, divided equally along its width direction. For example, the feature map after dimension reduction may be divided equally into 16 x 16 or 12 x 12 grid regions; in the embodiment shown in fig. 5, it is divided equally into 8 x 10 grid regions. It can be understood that, when step S401 is implemented, the feature map after dimension reduction may also be divided in a non-equal manner.
Optionally, the feature map after dimension reduction is input into a pre-trained mesh division model, which divides it along the length and width directions to obtain a plurality of grid regions; it can be understood that the feature map may also be divided along its length and width directions by a mesh division algorithm to obtain the grid regions.
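As an illustration of step S401, the sketch below (assuming PyTorch, with illustrative feature-map dimensions matching the 8 x 10 grid example) splits a reduced feature map into equal grid regions:

```python
import torch

feat = torch.randn(1, 8, 64, 80)    # reduced feature map (N, C, H, W)
rows, cols = 8, 10                  # target grid, as in the Fig. 5 example
ch, cw = feat.shape[2] // rows, feat.shape[3] // cols   # 8x8 pixels per cell

# unfold cuts non-overlapping ch x cw windows along H and then W:
cells = feat.unfold(2, ch, ch).unfold(3, cw, cw)
print(cells.shape)                  # torch.Size([1, 8, 8, 10, 8, 8])
```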
Fig. 6 shows an implementation of cropping the feature map after dimension reduction according to the plurality of grid regions and a preset prior condition in an embodiment of the present invention. As shown in fig. 6, this implementation may include:
S601: extracting feature information of each grid region, with the grid region as the minimum feature extraction unit;
Optionally, the feature information of one pixel in each grid region, such as the central pixel, is extracted and used as the feature information of that grid region; optionally, the feature information of several pixels (at least two) in each grid region is extracted, and the feature information of the grid region is determined from it, for example as the average of the feature information of those pixels.
S602: selecting, according to the feature information of the grid regions, a subset of the grid regions whose feature information at least includes the feature information of the region of interest in the original image;
It should be noted that the region of interest in the original image is a target region containing the target subject, and an existing target detection algorithm may be used to determine the target region in the original image.
S603: determining this subset of grid regions as the target cropping region.
Based on this cropping mode, cropping results with different resolutions and aspect ratios can be obtained, meeting different user requirements. Optionally, a plurality of square-shaped target cropping regions with different resolutions and different aspect ratios are obtained in this way.
The multiple target cropping regions determined in S603 may be filtered further to remove those that do not meet the requirements. Optionally, in some embodiments, cropping the feature map after dimension reduction according to the grid regions and the preset prior condition further includes: determining, in the feature map, restriction regions for the two end points of one diagonal of the square target cropping region. The two restriction regions are distributed on the two sides of the same diagonal of the feature map, which ensures that the target cropping region contains the target subject as far as possible; and each restriction region includes at least one grid region. Optionally, the restriction regions include a first restriction region (e.g., M1 in fig. 5), which constrains the position of the upper left corner of the target cropping region in the feature map, and a second restriction region (e.g., M2 in fig. 5), which constrains the position of the lower right corner; optionally, the restriction regions include a third restriction region, which constrains the position of the upper right corner, and a fourth restriction region, which constrains the position of the lower left corner.
Each restriction region of this embodiment includes a plurality of grid regions, so that a plurality of target cropping regions can be obtained. In the embodiment shown in fig. 5, the central pixel of any grid region in the first restriction region is taken as the upper left corner of the target cropping region, and the central pixel of any grid region in the second restriction region is taken as the lower right corner, yielding a plurality of target cropping regions, such as the regions outlined by dotted lines in fig. 5.
In some embodiments, cropping the feature map after dimension reduction according to the grid regions and the preset prior condition further includes: the aspect ratio of the target cropping region satisfies a preset aspect ratio policy. Optionally, the aspect ratio indicates that the target cropping region is a rectangular region meeting conventional composition requirements; generally, the aspect ratio of a rectangular target cropping region is 1:3, i.e., the preset aspect ratio policy is that the aspect ratio of the target cropping region is 1:3. Optionally, the aspect ratio indicates that the target cropping region is a square region.
In some embodiments, cropping the feature map after dimension reduction according to the grid regions and the preset prior condition further includes: the area ratio of the target cropping region is greater than or equal to a preset ratio threshold, where the area ratio of the target cropping region is the ratio of the area of the target cropping region to the area of the feature map. The preset ratio threshold can be set as required, for example 1/4; that is, the area ratio of the target cropping region is greater than or equal to 1/4, so that the cropping result better meets composition requirements.
In some embodiments, target cropping regions that do not meet the requirements may be removed according to any two, or all three, of the above further filtering strategies in combination.
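The prior conditions above can be combined as in the following sketch. The restriction regions M1 and M2, the grid size, and the thresholds (aspect ratio at most 3, area ratio at least 1/4) are illustrative assumptions, with grid cells as the coordinate unit:

```python
from itertools import product

ROWS, COLS = 8, 10                                        # grid dimensions
M1 = [(r, c) for r in range(0, 2) for c in range(0, 3)]   # upper-left corners
M2 = [(r, c) for r in range(6, 8) for c in range(7, 10)]  # lower-right corners

def candidate_crops(min_area_ratio: float = 0.25, max_aspect: float = 3.0):
    """Yield (r0, c0, r1, c1) boxes satisfying all three prior conditions."""
    for (r0, c0), (r1, c1) in product(M1, M2):
        h, w = r1 - r0 + 1, c1 - c0 + 1
        if max(h, w) / min(h, w) > max_aspect:
            continue                          # aspect-ratio policy violated
        if h * w < min_area_ratio * ROWS * COLS:
            continue                          # crop covers too little area
        yield r0, c0, r1, c1

print(next(candidate_crops()))                # e.g. (0, 0, 6, 7)
```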
Referring again to fig. 2 and fig. 3, in some embodiments the feature map after dimension reduction is cropped in the fully connected layer 30, so as to improve image cropping performance.
Furthermore, in some embodiments, the image cropping method may further include: after cropping the feature map after dimension reduction, feeding the feature information of the target cropping region back to the convolutional neural network 10. This feedback increases the complexity of the network and can optimize the processing result of the whole network. Specifically, the feature information of the target cropping region is used as an input of the convolutional neural network 10.
The image cropping method provided by the embodiments of the invention extracts feature maps from a plurality of pieces of original image information with different distance features, that is, it extracts feature information of objects of various sizes, so that the feature information of the original image is obtained accurately and fully; this helps model the image cropping result accurately, and the method can therefore adapt to more complex application scenes. Moreover, the feature maps are reduced in dimensionality and only then cropped, which improves image cropping performance and reduces the parameter size of the cropping model, making the method suitable for low-power chips. It has been verified that the network structure of the embodiment of the invention needs fewer than 10 megabytes of parameters to obtain performance equivalent to a large network of hundreds of megabytes such as VGG16, which greatly improves practicality.
Corresponding to the image cropping method of the above embodiment, an embodiment of the present invention further provides an image cropping device, and referring to fig. 7, the image cropping device 100 includes: a storage device 110 and one or more processors 120.
Wherein, the storage device 110 is used for storing program instructions; one or more processors 120 invoke the program instructions stored in the storage device 110, and when the program instructions are executed, the one or more processors 120 are individually or collectively configured to: extracting a plurality of original image information with different distance characteristics; extracting a feature map of the original image information on the distance scale of each original image information; reducing the dimension of the feature map; and cropping the feature map after dimension reduction.
The processor 120 may implement the image processing method according to the embodiments of the present invention shown in fig. 1, fig. 4 and fig. 6, and the image cropping device 100 of the present embodiment is described with reference to the image cropping method of the above embodiments.
It should be noted that the image cropping device 100 of the present embodiment may be a device with image processing capability, such as a computer, or may also be a shooting device with a shooting function, such as a camera, a video camera, a smart phone, an intelligent terminal, a shooting stabilizer, an unmanned aerial vehicle, and so on.
Corresponding to the image cropping method of the above embodiment, an embodiment of the present invention further provides a shooting apparatus, and referring to fig. 8, the shooting apparatus 200 includes: an image acquisition module 210, a storage device 220, and one or more processors 230.
The image acquisition module 210 is configured to obtain an original image; the storage device 220 is used for storing program instructions; one or more processors 230 invoke the program instructions stored in the storage device 220, and when the program instructions are executed, the one or more processors 230 are individually or collectively configured to: extracting a plurality of original image information with different distance characteristics; extracting a feature map of the original image information on the distance scale of each original image information; reducing the dimension of the feature map; and cropping the feature map after dimension reduction.
Optionally, the image capturing module 210 includes a lens and an imaging sensor, such as a CCD, a CMOS, or other image sensor, which is matched with the lens.
The processor 230 may implement the image processing method according to the embodiments of the present invention shown in fig. 1, fig. 4 and fig. 6, and the image capturing apparatus 200 of the present embodiment is described with reference to the image cropping method of the above embodiments.
The shooting device 200 may be a camera with a video recording function, a still camera, a smartphone, an intelligent terminal, a shooting stabilizer (such as a handheld gimbal), an unmanned aerial vehicle (such as a drone), and so on.
An embodiment of the present invention provides an unmanned aerial vehicle, referring to fig. 9, where the unmanned aerial vehicle 300 includes: an image acquisition module 310, a storage 320, and one or more processors 330.
The image acquisition module 310 is configured to obtain an original image; the storage device 320 is used for storing program instructions; one or more processors 330 invoke the program instructions stored in the storage device 320, and when the program instructions are executed, the one or more processors 330 are individually or collectively configured to: extracting a plurality of original image information with different distance characteristics; extracting a feature map of each original image information on a distance scale of the original image information; reducing the dimension of the feature map; and cropping the feature map after dimension reduction.
The image capturing module 310 of this embodiment may be a camera, or may be a structure formed by combining a lens and an imaging sensor (such as a CCD, a CMOS, or the like) and having a shooting function.
The processor 330 may implement the image processing method according to the embodiments shown in fig. 1, fig. 4, and fig. 6 of the present invention, and the unmanned aerial vehicle 300 of the present embodiment is described with reference to the image cropping method of the above embodiments.
It should be noted that the unmanned aerial vehicle 300 according to the embodiment of the present invention is an aerial photography unmanned aerial vehicle, and other unmanned aerial vehicles without a camera function do not belong to the protection subject of the embodiment.
The unmanned aerial vehicle 300 may be a multi-rotor unmanned aerial vehicle or a fixed-wing unmanned aerial vehicle, and the type of the unmanned aerial vehicle 300 is not particularly limited in the embodiment of the present invention.
Further, the image capturing module 310 may be mounted on the body (not shown) through a gimbal (not shown), which is used to stabilize the image capturing module 310; the gimbal may be a two-axis or three-axis gimbal, and the embodiment of the present invention is not limited in this respect.
An embodiment of the present invention further provides a mobile terminal, and referring to fig. 10, the mobile terminal 400 includes: an image acquisition module 410, a storage device 420, and one or more processors 430.
The image acquisition module 410 is configured to obtain an original image; the storage device 420 is used for storing program instructions; one or more processors 430 invoke the program instructions stored in the storage device 420, and when the program instructions are executed, the one or more processors 430 are individually or collectively configured to: extracting a plurality of original image information with different distance characteristics; extracting a feature map of each original image information on a distance scale of the original image information; reducing the dimension of the feature map; and cropping the feature map after dimension reduction.
The image capturing module 410 of the present embodiment is a camera of the mobile terminal 400.
The mobile terminal 400 may be an intelligent mobile terminal such as a mobile phone or a tablet computer.
The processor 430 may implement the image processing method according to the embodiments of the present invention shown in fig. 1, fig. 4, and fig. 6, and the mobile terminal 400 of the present embodiment will be described with reference to the image cropping method of the above-mentioned embodiments.
An embodiment of the present invention further provides a handheld gimbal. Referring to fig. 11, the handheld gimbal 500 includes: an image acquisition module 510, a storage device 520, and one or more processors 530.
The image acquisition module 510 is configured to obtain an original image; the storage device 520 is used for storing program instructions; one or more processors 530 invoke the program instructions stored in the storage device 520, and when the program instructions are executed, the one or more processors 530 are individually or collectively configured to: extracting a plurality of original image information with different distance characteristics; extracting a feature map of each original image information on a distance scale of the original image information; reducing the dimension of the feature map; and cropping the feature map after dimension reduction.
The image capturing module 510 of this embodiment may be a camera, or may be a structure formed by combining a lens and an imaging sensor (such as a CCD, a CMOS, or the like) and having a shooting function.
The processor 530 may implement the image processing method according to the embodiments shown in fig. 1, fig. 4 and fig. 6 of the present invention, and the handheld gimbal 500 of this embodiment is described with reference to the image cropping method of the above embodiments.
It should be noted that the handheld gimbal 500 according to the embodiment of the present invention refers to a gimbal with a camera function; other gimbals without a camera function do not belong to the protection subject of this embodiment.
The storage device may include a volatile memory, such as a random-access memory (RAM); the storage device may also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the storage device may also comprise a combination of the above kinds of memory.
It should be understood that, in the embodiment of the present invention, the processor may be a Central Processing Unit (CPU). The processor may also be another general purpose processor, a Digital Signal Processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
Furthermore, an embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the image cropping method of the above-described embodiments. Specifically, when executed by a processor, the program realizes the following steps: extracting a plurality of original image information with different distance characteristics; extracting a feature map of the original image information on the distance scale of each original image information; reducing the dimension of the feature map; and cropping the feature map after dimension reduction.
The computer readable storage medium may be an internal storage unit of the device of any of the foregoing embodiments, such as a hard disk or a memory. The computer readable storage medium may also be an external storage device of the device, such as a plug-in hard disk, a Smart Media Card (SMC), an SD Card, or a Flash memory Card (Flash Card) provided on the device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the device, and may also be used for temporarily storing data that has been or will be output.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above disclosure is intended to be illustrative of only some embodiments of the invention, and is not intended to limit the scope of the invention.

Claims (40)

1. An image cropping method, characterized in that it comprises:
extracting a plurality of original image information with different distance characteristics;
extracting a feature map of each original image information on a distance scale of the original image information;
reducing the dimension of the feature map;
and cropping the feature map after dimension reduction.
2. The method according to claim 1, wherein said extracting a feature map of each original image information on a distance scale of the original image information comprises:
and extracting a feature map of each original image information on the distance scale of the original image information based on a pre-trained convolutional neural network.
3. The method of claim 2, wherein the convolutional neural network comprises a plurality of sub-networks, the input of which is the original image information of the corresponding distance features;
each sub-network is used to extract a feature map of the original image information corresponding to the distance features.
4. The method according to claim 3, wherein the sub-network comprises a plurality of first network layers connected in sequence and at least one second network layer provided after the first network layers, the first network layers being used for down-sampling and the second network layers being used for up-sampling;
and the number of the first network layers is greater than the number of the second network layers.
5. The method of claim 4, wherein the output of any first network layer of the sub-network serves as the input of any first network layer and/or second network layer of any other sub-network; and/or
The output of any second network layer of the sub-network serves as the input of any first network layer and/or second network layer of any other sub-network.
6. The method of claim 4, wherein the sub-network comprises a plurality of the second network layers.
7. The method of claim 4, wherein the first network layer comprises a convolutional layer and a pooling layer, and wherein the second network layer comprises a deconvolution layer.
8. The method of claim 7, wherein a plurality of said sub-networks share a weight parameter, wherein the step size of the convolutional layer of the first network layer of said sub-network is different from the step size of the convolutional layer of the first network layer of other sub-networks, and/or wherein the step size of the pooling layer of the first network layer of said sub-network is different from the step size of the pooling layer of the first network layer of other sub-networks, such that the distance sizes of the signatures output from a plurality of said sub-networks are the same.
9. The method according to claim 2, wherein after performing the clipping process on the feature map after the dimension reduction, the method further comprises:
and feeding back the characteristic information of the target clipping area to the convolutional neural network.
10. The method according to claim 1 or 2, wherein the extracting a plurality of original image information having different distance features comprises:
inputting an original image into a pre-trained convolutional neural network;
and carrying out downsampling processing on the original image by the convolutional neural network to obtain a plurality of pieces of original image information with different distance characteristics.
11. The method according to claim 1, wherein the clipping the feature map after the dimension reduction includes:
dividing the feature map after dimension reduction along the length direction and the width direction of the feature map respectively to obtain a plurality of grid areas, wherein each grid area comprises a plurality of pixels;
and cutting the feature map after dimension reduction according to the grid areas and a preset prior condition.
12. The method according to claim 11, wherein the dividing the feature map along the length and width directions of the feature map after the dimension reduction to obtain a plurality of grid regions comprises:
and equally dividing the feature map respectively along the length direction and the width direction of the feature map after dimension reduction to obtain a plurality of grid areas.
13. The method according to claim 11, wherein the clipping processing on the feature map after the dimensionality reduction according to the plurality of grid regions and a preset prior condition includes:
extracting feature information of each grid area by taking the grid area as a minimum feature extraction unit;
extracting the feature information of partial grid areas in the plurality of grid areas according to the feature information of the grid areas, wherein the feature information of the partial grid areas at least comprises the feature information of interest areas in the original image;
and determining the partial grid area as a target cutting area.
14. The method according to claim 13, wherein the clipping processing is performed on the feature map after the dimension reduction according to the plurality of grid regions and a preset prior condition, and further includes at least one of:
determining a limiting area of two end points of one diagonal line of a square target clipping area in the feature map, wherein the two limiting areas are distributed on two sides of the same diagonal line of the feature map, and the limiting area comprises at least one grid area;
the length-width ratio of the target cutting area meets a preset length-width ratio strategy;
the area occupation ratio of the target cutting area is larger than or equal to a preset occupation ratio threshold, wherein the area occupation ratio of the target cutting area is the ratio of the area of the target cutting area to the area of the feature map.
15. The method according to claim 14, wherein the restriction region comprises a first restriction region and a second restriction region, the first restriction region is used for restricting the position of the upper left corner of the target clipping region in the feature map, and the second restriction region is used for restricting the position of the lower right corner of the target clipping region in the feature map.
16. The method of claim 14, wherein the aspect ratio of the target cropping area satisfying the preset aspect-ratio policy comprises:
the aspect ratio of the target cropping area indicating that the target cropping area is a rectangular area.
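The prior conditions of claims 14-16 could be checked on candidate boxes along these lines. The concrete ratio list, the 0.4 area threshold, and the rectangular shape of the restriction areas are all illustrative assumptions:

```python
def satisfies_priors(box, fmap_w, fmap_h, tl_area, br_area,
                     allowed_ratios=(1.0, 4 / 3, 3 / 2, 16 / 9),
                     min_area_ratio=0.4):
    # box / tl_area / br_area are (x0, y0, x1, y1) in feature-map pixels.
    x0, y0, x1, y1 = box

    def inside(px, py, area):
        ax0, ay0, ax1, ay1 = area
        return ax0 <= px <= ax1 and ay0 <= py <= ay1

    # Claim 15: the diagonal end points must fall in their restriction areas.
    if not (inside(x0, y0, tl_area) and inside(x1, y1, br_area)):
        return False
    # Claim 16: the aspect ratio must follow the preset policy.
    w, h = x1 - x0, y1 - y0
    if w <= 0 or h <= 0:
        return False
    ratio = max(w, h) / min(w, h)
    if not any(abs(ratio - r) < 0.05 for r in allowed_ratios):
        return False
    # Claim 14: the crop must cover at least a preset share of the map area.
    return (w * h) / (fmap_w * fmap_h) >= min_area_ratio
```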
17. The method according to claim 1 or 2, wherein the dimension-reduced feature map is cropped in a fully connected layer.
18. The method of claim 1 or 2, wherein the reducing of the dimensionality of the feature map comprises:
inputting the feature map into a 1×1 convolutional layer for dimension reduction.
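Claim 18 amounts to a channel-wise reduction; in PyTorch (an assumed framework, with illustrative channel counts) it is a single layer:

```python
import torch
import torch.nn as nn

# A 1x1 convolution leaves the spatial size untouched and only shrinks
# the channel dimension, here from 256 down to 32 channels.
reduce_dim = nn.Conv2d(in_channels=256, out_channels=32, kernel_size=1)

feature_map = torch.randn(1, 256, 64, 64)
reduced = reduce_dim(feature_map)          # shape: (1, 32, 64, 64)
```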
19. An image cropping apparatus, characterized in that it comprises:
a storage device for storing program instructions; and
one or more processors that invoke the program instructions stored in the storage device, the one or more processors being individually or collectively configured, when the program instructions are executed, to:
extract a plurality of pieces of original image information with different distance features;
extract a feature map of each piece of original image information at the distance scale of that original image information;
reduce the dimensionality of the feature map; and
crop the dimension-reduced feature map.
20. The apparatus of claim 19, wherein the one or more processors are further configured, individually or collectively, to:
extract a feature map of each piece of original image information at the distance scale of that original image information based on a pre-trained convolutional neural network.
21. The apparatus of claim 20, wherein the convolutional neural network comprises a plurality of sub-networks, the input of each sub-network being the original image information with the corresponding distance feature; and
each sub-network is used to extract a feature map of the original image information with its corresponding distance feature.
22. The apparatus of claim 21, wherein each sub-network comprises a plurality of first network layers connected in sequence and at least one second network layer disposed after the first network layers, the first network layers being used for down-sampling and the second network layers for up-sampling; and
the number of first network layers is greater than the number of second network layers.
23. The apparatus of claim 22, wherein the output of any first network layer of a sub-network serves as the input of any first network layer and/or second network layer of any other sub-network; and/or
the output of any second network layer of a sub-network serves as the input of any first network layer and/or second network layer of any other sub-network.
24. The apparatus of claim 22, wherein a sub-network comprises a plurality of the second network layers.
25. The apparatus of claim 22, wherein the first network layer comprises a convolutional layer and a pooling layer, and the second network layer comprises a deconvolution layer.
26. The apparatus of claim 25, wherein the plurality of sub-networks share weight parameters, the stride of the convolutional layer of a first network layer of one sub-network differing from the stride of the convolutional layer of a first network layer of another sub-network, and/or the stride of the pooling layer of a first network layer of one sub-network differing from the stride of the pooling layer of a first network layer of another sub-network, such that the feature maps output by the plurality of sub-networks have the same distance scale.
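A sketch of the architecture of claims 22-26: shared convolution weights applied with per-branch strides, so that inputs at different distance scales produce feature maps of the same size, followed by a deconvolution layer for up-sampling. PyTorch, the layer and channel counts, and the use of strided convolutions in place of the claimed convolution-plus-pooling pairs are all assumptions made to keep the weight sharing explicit.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedWeightSubnets(nn.Module):
    def __init__(self):
        super().__init__()
        # "First network layer" weights, shared by every sub-network.
        self.w1 = nn.Parameter(torch.randn(32, 3, 3, 3) * 0.05)
        self.w2 = nn.Parameter(torch.randn(64, 32, 3, 3) * 0.05)
        # One "second network layer" (deconvolution) for up-sampling,
        # so there are more down-sampling layers than up-sampling ones.
        self.deconv = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)

    def subnet(self, x, strides):
        # Same weights, branch-specific strides (claim 26).
        x = F.relu(F.conv2d(x, self.w1, stride=strides[0], padding=1))
        x = F.relu(F.conv2d(x, self.w2, stride=strides[1], padding=1))
        return self.deconv(x)

    def forward(self, full_res, half_res):
        # The full-resolution branch down-samples twice, the
        # half-resolution branch once, so both outputs match in size.
        return self.subnet(full_res, (2, 2)), self.subnet(half_res, (2, 1))

net = SharedWeightSubnets()
f_full, f_half = net(torch.randn(1, 3, 512, 512), torch.randn(1, 3, 256, 256))
assert f_full.shape == f_half.shape        # both (1, 32, 256, 256)
```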
27. The apparatus of claim 20, wherein after cropping the dimension-reduced feature map, the one or more processors are further configured, individually or collectively, to:
feed back feature information of the target cropping area to the convolutional neural network.
28. The apparatus of claim 19 or 20, wherein the one or more processors are further configured, individually or collectively, to:
input an original image into a pre-trained convolutional neural network; and
down-sample the original image with the convolutional neural network to obtain a plurality of pieces of original image information with different distance features.
29. The apparatus of claim 19, wherein the one or more processors are further configured, individually or collectively, to:
divide the dimension-reduced feature map along its length direction and its width direction to obtain a plurality of grid areas, wherein each grid area comprises a plurality of pixels; and
crop the dimension-reduced feature map according to the plurality of grid areas and a preset prior condition.
30. The apparatus of claim 29, wherein the one or more processors are further configured, individually or collectively, to:
equally divide the dimension-reduced feature map along its length direction and its width direction to obtain the plurality of grid areas.
31. The apparatus of claim 29, wherein the one or more processors are further configured, individually or collectively, to:
extract feature information of each grid area, with the grid area serving as the minimum feature-extraction unit;
select, according to the feature information of the grid areas, a partial grid area from the plurality of grid areas, wherein the feature information of the partial grid area at least comprises feature information of a region of interest in the original image; and
determine the partial grid area as the target cropping area.
32. The apparatus of claim 31, wherein the one or more processors are further configured, individually or collectively, to perform at least one of:
determining restriction areas for the two end points of a diagonal of a rectangular target cropping area in the feature map, wherein the two restriction areas are distributed on either side of the same diagonal of the feature map, and each restriction area comprises at least one grid area;
ensuring that the aspect ratio of the target cropping area satisfies a preset aspect-ratio policy; and
ensuring that the area ratio of the target cropping area is greater than or equal to a preset ratio threshold, wherein the area ratio of the target cropping area is the ratio of the area of the target cropping area to the area of the feature map.
33. The apparatus of claim 32, wherein the restriction areas comprise a first restriction area and a second restriction area, the first restriction area being configured to restrict the position of the upper-left corner of the target cropping area in the feature map, and the second restriction area being configured to restrict the position of the lower-right corner of the target cropping area in the feature map.
34. The apparatus of claim 32, wherein the aspect ratio of the target cropping area satisfying the preset aspect-ratio policy comprises:
the aspect ratio of the target cropping area indicating that the target cropping area is a rectangular area.
35. The apparatus of claim 19 or 20, wherein the one or more processors are further configured, individually or collectively, to: crop the dimension-reduced feature map in a fully connected layer.
36. The apparatus of claim 19 or 20, wherein the one or more processors are further configured, individually or collectively, to:
input the feature map into a 1×1 convolutional layer for dimension reduction.
37. A photographing apparatus, characterized in that it comprises:
an image acquisition module for acquiring an original image;
a storage device for storing program instructions; and
one or more processors that invoke the program instructions stored in the storage device, the one or more processors being individually or collectively configured to implement the method of any one of claims 1-18 when the program instructions are executed.
38. A drone, characterized in that it comprises:
an image acquisition module for acquiring an original image;
a storage device for storing program instructions; and
one or more processors that invoke the program instructions stored in the storage device, the one or more processors being individually or collectively configured to implement the method of any one of claims 1-18 when the program instructions are executed.
39. A mobile terminal, characterized in that it comprises:
an image acquisition module for acquiring an original image;
a storage device for storing program instructions; and
one or more processors that invoke the program instructions stored in the storage device, the one or more processors being individually or collectively configured to implement the method of any one of claims 1-18 when the program instructions are executed.
40. A handheld gimbal, characterized in that it comprises:
an image acquisition module for acquiring an original image;
a storage device for storing program instructions; and
one or more processors that invoke the program instructions stored in the storage device, the one or more processors being individually or collectively configured to implement the method of any one of claims 1-18 when the program instructions are executed.
CN201980009520.XA 2019-05-22 2019-05-22 Image cropping method and device and shooting device Pending CN111684488A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/087999 WO2020232672A1 (en) 2019-05-22 2019-05-22 Image cropping method and apparatus, and photographing apparatus

Publications (1)

Publication Number Publication Date
CN111684488A (en) 2020-09-18

Family

ID=72433306

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980009520.XA Pending CN111684488A (en) 2019-05-22 2019-05-22 Image cropping method and device and shooting device

Country Status (2)

Country Link
CN (1) CN111684488A (en)
WO (1) WO2020232672A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114742791A (en) * 2022-04-02 2022-07-12 深圳市国电科技通信有限公司 Auxiliary defect detection method and device for printed circuit board assembly and computer equipment

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113012293B (en) * 2021-03-22 2023-09-29 平安科技(深圳)有限公司 Stone carving model construction method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108376386A (en) * 2018-03-23 2018-08-07 深圳天琴医疗科技有限公司 A kind of construction method and device of the super-resolution model of image
CN109166130A (en) * 2018-08-06 2019-01-08 北京市商汤科技开发有限公司 A kind of image processing method and image processing apparatus
JP2019046239A (en) * 2017-09-04 2019-03-22 大日本印刷株式会社 Image processing apparatus, image processing method, program, and image data for synthesis

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006094053A (en) * 2004-09-22 2006-04-06 Fuji Photo Film Co Ltd Photo movie creating method and apparatus thereof, and photo movie creating program
US20130108171A1 (en) * 2011-10-28 2013-05-02 Raymond William Ptucha Image Recomposition From Face Detection And Facial Features
CN106296760B (en) * 2015-05-21 2021-05-14 腾讯科技(深圳)有限公司 Picture clipping method and device
CN106650737B (en) * 2016-11-21 2020-02-28 中国科学院自动化研究所 Automatic image cutting method
CN107610131B (en) * 2017-08-25 2020-05-12 百度在线网络技术(北京)有限公司 Image clipping method and image clipping device
CN108154464B (en) * 2017-12-06 2020-09-22 中国科学院自动化研究所 Method and device for automatically clipping picture based on reinforcement learning
CN108510504B (en) * 2018-03-22 2020-09-22 北京航空航天大学 Image segmentation method and device
CN109448001B (en) * 2018-10-26 2021-08-27 世纪开元智印互联科技集团股份有限公司 Automatic picture clipping method

Also Published As

Publication number Publication date
WO2020232672A1 (en) 2020-11-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20200918)