CN110349082B - Image area clipping method and device, storage medium and electronic device - Google Patents

Image area clipping method and device, storage medium and electronic device

Info

Publication number
CN110349082B
Authority
CN
China
Prior art keywords
image
target
area
center point
cut
Prior art date
Legal status
Active
Application number
CN201910584273.8A
Other languages
Chinese (zh)
Other versions
CN110349082A (en)
Inventor
高洵
沈招益
刘军煜
吴韬
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910584273.8A
Publication of CN110349082A
Application granted
Publication of CN110349082B
Legal status: Active
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/04 Context-preserving transformations, e.g. by using an importance map
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and apparatus for cropping an image region, together with a storage medium and an electronic device. The method comprises the following steps: acquiring a first center point of a first target region in a first region set corresponding to an image to be cropped, wherein each first region contains an object of a target type, and the first target region is the largest region in the first region set; acquiring a second center point of a second target region in a second region set corresponding to the image to be cropped, wherein each second region is a salient region obtained by performing saliency detection on the image to be cropped, and the second target region is either the second region in which the first center point lies or the largest region in the second region set; when the first center point is invalid, cropping an image of a target size from the image to be cropped based on the second center point; and when the first center point is valid, cropping an image of the target size from the image to be cropped based on both the first center point and the second center point.

Description

Image area clipping method and device, storage medium and electronic device
Technical Field
The present invention relates to the field of computers, and in particular, to a method and apparatus for clipping an image area, a storage medium, and an electronic apparatus.
Background
Currently, in some scenarios a picture (image) needs to be cropped to extract its main subject. For a picture containing an object of a target type, the cropped picture should contain that object as completely as possible.
For example, for images containing specific persons, face detection may be used to locate the picture subject. Faces in an image can be detected with face detection techniques such as MTCNN (Multi-task Cascaded Convolutional Networks) or FaceBoxes, and the subject position is then located based on the detected faces.
However, for a picture that contains no face, or in which face detection fails, the picture subject cannot be analyzed further. Moreover, analyzing only the face region ignores how the body region shifts the center of the picture subject, so the cropped picture can deviate considerably from the true subject.
As can be seen, the image cropping methods in the related art have poor applicability, and the cropped image easily deviates from the image subject.
Disclosure of Invention
Embodiments of the invention provide a method and apparatus for cropping an image region, a storage medium, and an electronic device, which at least solve the technical problems in the related art that image cropping has poor applicability and the cropped picture easily deviates from the picture subject.
According to an aspect of an embodiment of the present invention, there is provided a cropping method of an image area, including: acquiring a first center point of a first target area in a first area set corresponding to an image to be cut, wherein each first area contains an object belonging to a target type, and the first target area is the largest area in the first area set; acquiring second center points of second target areas in a second area set corresponding to the image to be cut, wherein each second area comprises a saliency area obtained by performing saliency detection on the image to be cut, and the second target area is a second area where the first center point in the second area set is located or is the largest area in the second area set; cutting out a cut image of a target size from the image to be cut based on the second center point under the condition that the first center point is an invalid point; and cutting out a cut image of the target size from the image to be cut based on the first center point and the second center point under the condition that the first center point is an effective point.
According to another aspect of the embodiment of the present invention, there is also provided a cropping device for an image area, including: the first acquisition unit is used for acquiring a first center point of a first target area in a first area set corresponding to the image to be cut, wherein each first area contains an object belonging to a target type, and the first target area is the largest area in the first area set; the second acquisition unit is used for acquiring second center points of second target areas in a second area set corresponding to the image to be cut, wherein each second area comprises a saliency area obtained by performing saliency detection on the image to be cut, and the second target area is a second area where the first center point in the second area set is located or is the largest area in the second area set; the first clipping unit is used for clipping a clipping image with a target size from the image to be clipped based on the second center point under the condition that the first center point is an invalid point; and the second clipping unit is used for clipping the clipping image with the target size from the image to be clipped based on the first center point and the second center point under the condition that the first center point is the effective point.
According to a further aspect of embodiments of the present invention, there is also provided a storage medium having stored therein a computer program, wherein the computer program is arranged to perform the above method when run.
According to still another aspect of the embodiments of the present application, there is also provided an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the above method by the computer program.
In the embodiments of the present application, subject cropping of an input picture combines object detection with picture saliency detection. A first center point corresponding to object detection and a second center point corresponding to saliency detection are determined for the image to be cropped. When the first center point is invalid, the image is cropped based on the second center point; when the first center point is valid, the image is cropped based on both the first and second center points. The method therefore adapts to different scenes while taking the relationship between the detected object and other objects into account, and a refinement strategy yields the most reasonable crop. This ensures the applicability of the cropping scheme, improves the rationality of the crop, and solves the problems in the related art of poor applicability and deviation between the cropped picture and the picture subject.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic illustration of an application environment of a clipping method for image regions according to an embodiment of the present invention;
FIG. 2 is a flow chart of an alternative image region cropping method according to an embodiment of the invention;
FIG. 3 is a schematic diagram of an alternative client display interface in accordance with an embodiment of the invention;
FIG. 4 is a schematic illustration of an alternative image region cropping method according to an embodiment of the invention;
FIG. 5 is a schematic illustration of another alternative image region cropping method according to an embodiment of the invention;
FIG. 6 is a schematic diagram of an alternative training object detection model according to an embodiment of the invention;
FIG. 7 is a schematic illustration of an alternative target detection model prediction in accordance with an embodiment of the invention;
FIG. 8 is a schematic diagram of yet another alternative image region cropping method according to an embodiment of the invention;
FIG. 9 is a flow chart of another alternative image region cropping method according to an embodiment of the invention;
FIG. 10 is a schematic view of an alternative image area cropping device according to an embodiment of the invention;
fig. 11 is a schematic structural view of an alternative electronic device according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Some terms used in the embodiments of the present invention are explained as follows:
(1) Saliency detection: extracting the salient region of an image (i.e., the region of human interest) with an intelligent algorithm;
(2) RAS (Reverse Attention for Salient object detection): a saliency detection model that introduces a reverse attention mechanism;
(3) Saliency feature map: a binary map marking the salient and non-salient regions of a picture;
(4) FaceBoxes: a real-time face detection model that runs on a CPU (Central Processing Unit);
(5) Face feature map: a binary map marking the face and non-face regions of a picture.
According to an aspect of an embodiment of the present invention, there is provided a clipping method of an image area. Alternatively, the above clipping method of the image area may be applied, but not limited to, in an application environment as shown in fig. 1. As shown in fig. 1, the terminal device 102 sends the initial image to the server 106 via the network 104 through a client of a target application, which may include, but is not limited to: instant messaging applications, picture processing applications, etc. The initial image is preprocessed by the server 106 to obtain the image to be cropped.
For an image to be cropped, the server 106 may perform the steps of: acquiring a first center point of a first target area in a first area set corresponding to an image to be cut, wherein each first area contains an object belonging to a target type, and the first target area is the largest area in the first area set; acquiring second center points of second target areas in a second area set corresponding to the image to be cut, wherein each second area comprises a saliency area obtained by performing saliency detection on the image to be cut, and the second target area is a second area where the first center point in the second area set is located or is the largest area in the second area set; cutting out a cut image of a target size from the image to be cut based on the second center point under the condition that the first center point is an invalid point; and cutting out a cut image of the target size from the image to be cut based on the first center point and the second center point under the condition that the first center point is an effective point.
After obtaining the cropped image, server 106 may send the cropped image to terminal device 102 via network 104, and terminal device 102 may display and/or save the cropped image.
Alternatively, in the present embodiment, the terminal device 102 may be a terminal device configured with the client, and may include, but is not limited to, at least one of the following: a mobile phone (e.g., android mobile phone, iOS mobile phone, etc.), a notebook computer, a tablet computer, a palm computer, a mobile internet device (Mobile Internet Devices, MID), a PAD, a desktop computer, etc. The network may include, but is not limited to: a wired network, a wireless network, wherein the wired network comprises: local area networks, metropolitan area networks, and wide area networks, the wireless network comprising: bluetooth, WIFI, and other networks that enable wireless communications. The server may be a single server or a server cluster composed of a plurality of servers. The above is merely an example, and the present embodiment is not limited thereto.
Alternatively, in this embodiment, the method may be performed by a server, by a terminal device, or by the server and the terminal device together; the server is used as the example in the following description. As shown in fig. 2, the above cropping method for an image region may include:
Step S202, a first center point of a first target area in a first area set corresponding to an image to be cut is obtained, wherein each first area contains an object belonging to a target type, and the first target area is the largest area in the first area set;
step S204, obtaining second center points of second target areas in a second area set corresponding to the image to be cut, wherein each second area comprises a saliency area obtained by performing saliency detection on the image to be cut, and the second target area is a second area where the first center point in the second area set is located or is the largest area in the second area set;
step S206, cutting out a cut image with a target size from the image to be cut based on the second center point under the condition that the first center point is an invalid point;
in step S208, in the case that the first center point is the effective point, a clipping image of the target size is clipped from the image to be clipped based on the first center point and the second center point.
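The branch between steps S206 and S208 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name is invented, and combining the two centers by their midpoint is an assumption (the embodiments below describe a more refined combination strategy):

```python
def crop_center(first_center, second_center):
    """Choose the crop anchor point per steps S206/S208.

    first_center:  center from object (e.g. face) detection, or None if invalid
    second_center: center from saliency detection

    The midpoint combination below is an illustrative assumption only.
    """
    if first_center is None:                     # S206: first center invalid
        return second_center
    (x1, y1), (x2, y2) = first_center, second_center   # S208: both valid
    return ((x1 + x2) / 2, (y1 + y2) / 2)
```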
Optionally, the cropping method may be applied to, but is not limited to, converting an image from landscape to portrait orientation, converting a picture to a different size, and other scenarios that require such cropping.
For example, the method may be applied to converting a landscape video cover image into a portrait one. At present most video covers are still landscape, but with the rise of vertical video, cover images must also fit portrait layouts; a single image then has to satisfy both landscape and portrait requirements, which calls for subject cropping of the picture content. For another example, social products typically use a fixed size for thumbnails, while user-uploaded pictures usually have a different aspect ratio, so the thumbnail can only display part of the picture. By selecting the picture subject, the thumbnail can display the picture's main content more effectively.
According to the embodiment, the main body cutting is carried out on the input picture by comprehensively utilizing the object detection and picture saliency main body detection technology, a first center point corresponding to the object detection and a second center point corresponding to the saliency detection in the image to be cut are respectively determined, when the first center point is invalid, the image cutting is carried out based on the second center point, when the first center point is valid, the image cutting is carried out based on the first center point and the second center point, so that the problems that the image cutting mode in the related technology has poor applicability and deviation is easy to occur between the cut picture and the picture main body are solved, the applicability of the image cutting mode is ensured, and the rationality of the image cutting is improved.
The clipping method of the image area will be described with reference to fig. 2.
Prior to step S202, the image to be cropped may be acquired. It may be the initial image itself, or an image obtained by preprocessing the initial image, where the preprocessing may include, but is not limited to, scaling.
Alternatively, in this embodiment, before acquiring the first center point of the first target area in the first area set corresponding to the image to be cropped, a specification parameter corresponding to the initial image may be acquired, where the specification parameter is used to represent parameter information required for cropping the image from the initial image; and under the condition that the specification parameters comprise scaling, scaling the initial image according to the scaling to obtain the image to be cut.
The initial image may be an image uploaded by the user through the client. When uploading the initial image, the user may also set the size specification (specification parameters) of the cropped image: for example, whether the image should first be scaled at a fixed ratio and then cropped to the specified length and width, or cropped strictly at the fixed length and width of the input size. The server may check whether the input specification is valid. Invalid inputs include at least one of the following: the input size is not positive; or a fixed-length-and-width crop is specified and the input size exceeds the length and width of the picture.
For example, on the client interface shown in fig. 3, the user may select an initial image and input a scale and a size at the time of cropping, and the image to be cropped is first determined by the server based on the scale.
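The validity checks described above can be sketched as follows (a minimal illustration; the function name and parameter layout are assumptions, not from the patent):

```python
def validate_crop_spec(img_w, img_h, crop_w, crop_h, fixed_size):
    """Reject invalid crop specifications as described in the text:
    non-positive sizes, or a fixed-size crop larger than the picture."""
    if crop_w <= 0 or crop_h <= 0:
        return False                      # input size is not positive
    if fixed_size and (crop_w > img_w or crop_h > img_h):
        return False                      # fixed crop exceeds the picture
    return True
```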
According to the embodiment, the initial image is scaled, so that different cutting requirements can be met, and the applicability of an image cutting mode is improved.
In step S202, a first center point of a first target region in a first region set corresponding to an image to be cropped is obtained, where each first region includes an object belonging to a target type, and the first target region is a largest region in the first region set.
For an image to be cropped, the center point (first center point) of the largest region (first target region) in the corresponding first region set, each of whose regions contains an object of the target type, may be acquired. The object of the target type may differ between images with different subjects. For example, for an image containing a person, the object of the target type may be a human face or another part of the human body; for an image containing animals, it may be the animal's face or another part of the animal.
The first set of regions may be determined prior to acquiring the first center point. Object detection (e.g., face detection) of a target type may be performed on the image to be cropped to obtain a target object feature map (e.g., face feature map), where the target object feature map may be used to mark a first region and a non-first region on the image to be cropped that includes the target type object, and the target object feature map may be a binary map. From the target object feature map, one or more target object regions may be determined.
For example, the object of the target type may be a human face, and the image to be cut is subjected to human face detection to obtain a human face feature binary image corresponding to the image to be cut. Based on the face feature binary image, a boundary region of a valued region (pixel values in the region are all 1, representing a face) having an area larger than a predetermined area value can be regarded as a face region.
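As a sketch of this step, the bounding boxes of sufficiently large connected value-1 regions in a binary feature map can be extracted as follows (a pure-Python illustration under assumed helper names; a real implementation would use an image-processing library):

```python
from collections import deque

def binary_map_regions(mask, min_area=1):
    """Return bounding boxes (x0, y0, x1, y1) of 4-connected regions of
    1-valued pixels whose pixel count exceeds min_area."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] == 1 and not seen[y][x]:
                # Breadth-first flood fill of one connected region.
                q = deque([(y, x)])
                seen[y][x] = True
                area, y0, y1, x0, x1 = 0, y, y, x, x
                while q:
                    cy, cx = q.popleft()
                    area += 1
                    y0, y1 = min(y0, cy), max(y1, cy)
                    x0, x1 = min(x0, cx), max(x1, cx)
                    for ny, nx in ((cy+1, cx), (cy-1, cx), (cy, cx+1), (cy, cx-1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 1 \
                                and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                if area > min_area:       # keep only regions above the threshold
                    boxes.append((x0, y0, x1, y1))
    return boxes
```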
It should be noted that, for an image containing a plurality of objects of the target type (e.g., a plurality of faces), there may be a plurality of first regions; objects of the target type may also overlap, in which case multiple objects fall within one target object region.
After obtaining the one or more target object regions, the first region set may be determined according to the one or more target object regions, and the first target region may be determined.
As an alternative implementation manner, each target object area may be taken as a first area, a first area set is obtained, and an area with the largest area in the first area set is taken as a first target area.
As another alternative embodiment, in the case that there are a plurality of target object areas, the target object areas with similar distances may be connected to obtain one or more first connection areas, and each first connection area may be used as a first area to obtain a first area set.
There are various ways of communicating the target object areas with similar distances. For example, the target object feature map may be expanded, and the target object regions having intersections after the expansion processing may be connected to obtain one or more first connection regions.
Optionally, in this embodiment, before acquiring a first center point of a first target area in a first area set corresponding to an image to be cut, performing target object detection on the image to be cut, to acquire a target object area, where the target object detection is used to detect an object of a target type included in the image to be cut; when the target object areas are multiple, performing expansion processing on the multiple target object areas by using the first expansion coefficients to obtain multiple first expansion areas; merging the first expansion areas with the intersection among the plurality of first expansion areas to obtain a first area set; a first target region is determined from the first set of regions.
Each target object region may be dilated using the first expansion coefficient. The dilation may be: extending the target object region outward by the length expansion value and the width expansion value corresponding to the first expansion coefficient. The first expansion coefficient may be a fixed value or a relative value, and its length and width expansion values may be the same or different. The first expansion coefficient may be: the ratio of the size of the cropped image to the size of the target object region (an area ratio, length ratio, width ratio, etc.); a predetermined value positively correlated with the size of the target object region (regions of different areas correspond to different predetermined values); or a preset fixed value.
For example, the size of the cropped image is: 60 x 80, the size of the target object region is: 20 x 20, the first expansion coefficient is: the ratio of the area of the clipping image to the area of the target object area is the expansion value of each side of the target object area is: (60×80)/(20×20) =12, and the sides of the target object region are respectively extended by 12 pixel values (each side may also be respectively extended by 6 pixel values). The first expansion coefficient is: and cutting out the ratio of the edge of the image to the corresponding edge of the target object area, wherein the long expansion value of the target object area is as follows: 60/20=3, the length of the target object region extends outwards by 3 pixel values, the wide expansion value of the target object region is: 80/20=4, extending the width outward by 4 pixel values.
For another example, the size of the target object region is 20 x 20, and the first expansion coefficient is given as pairs in which the first parameter is a size range of the target object region and the second parameter is the corresponding expansion value: (0, 100): 5; (100, 400): 10; (400, 800): 15; (800, cropped image size): 20. Since the size of the target object region is 400 (20 x 20), the corresponding expansion value is 10, and each side of the target object region is extended outward by 10 pixel values.
For another example, the size of the target object region is 20 x 40. If the first expansion coefficient is the fixed value 10, each side of the region is extended outward by 10 pixel values. If the first expansion coefficient is the fixed pair (5, 10), the length of the region is extended outward by 5 pixel values and the width by 10 pixel values. If the first expansion coefficient is (5, 5 x (width/length)), the length is likewise extended outward by 5 pixel values and the width by 10 pixel values.
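The expansion arithmetic in the examples above can be sketched as a small helper (an illustration only; clamping to the image bounds is an added assumption, not stated in the text):

```python
def expand_box(box, dx, dy, img_w, img_h):
    """Expand a box (x0, y0, x1, y1) outward by dx horizontally and dy
    vertically, clamped to the image bounds (clamping is an assumption)."""
    x0, y0, x1, y1 = box
    return (max(0, x0 - dx), max(0, y0 - dy),
            min(img_w - 1, x1 + dx), min(img_h - 1, y1 + dy))

# Ratio-based coefficient from the text: crop 60 x 80, region 20 x 20 ->
# length expansion 60/20 = 3 px, width expansion 80/20 = 4 px.
```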
After each of the plurality of target object regions is dilated, a plurality of first expansion regions is obtained. The first expansion regions that intersect (i.e., are connected) are merged to obtain the first region set, and the region with the largest area in the set is determined as the first target region.
For example, as shown in fig. 4 and 5, for the determined 8 face regions, expansion processing is performed on each target object region (as shown in the left half of fig. 4) to obtain 8 first expansion regions (as shown in the right half of fig. 4), the first expansion regions having intersections among the 8 first expansion regions are combined to obtain 4 first regions as shown in fig. 5, and the largest first region is determined as the first target region.
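The merging of intersecting expansion regions, as illustrated in fig. 4 and 5, can be sketched with axis-aligned boxes (an illustration; function names are assumptions):

```python
def boxes_intersect(a, b):
    """True if two boxes (x0, y0, x1, y1) overlap or touch."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1

def merge_intersecting(boxes):
    """Repeatedly union any two intersecting boxes until none intersect,
    yielding the connected first regions described above."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if boxes_intersect(boxes[i], boxes[j]):
                    a, b = boxes[i], boxes[j]
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    boxes.pop(j)
                    merged = True
                    break
            if merged:
                break
    return boxes
```

The largest box returned (by area) would then serve as the first target region.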
According to this embodiment, dilating the multiple target object regions obtained by detecting target objects in the image to be cropped and selecting the largest connected region as the first target region connects target object regions that do not overlap but are close to each other. The cropped image therefore contains as many objects of the target type as possible, improving the accuracy of the crop.
After the first target area is determined, a center point of the first target area (first center point) may be determined from the first target area.
The manner in which the first center point is determined is described below in connection with alternative examples. In this example, the object of the target type is a human face.
Face detection is performed on the image to be cropped. The FaceBoxes algorithm may be used: it achieves real-time face detection on a CPU while maintaining accuracy. FaceBoxes consists of Rapidly Digested Convolutional Layers and Multiple Scale Convolutional Layers. The former focus on efficiency and ensure that FaceBoxes can detect in real time on a CPU; the latter focus on accuracy, producing rich receptive fields from different layers so as to handle faces at different scales. With this algorithm, faces can be detected in an average of 150 ms per image on a single-core CPU.
If a face is detected (the face feature binary map is valid), the feature map is dilated and the center point of the largest connected region is obtained. The dilation merges several nearby faces into one connected region, so the resulting center point takes all of those faces into account. Determining the center of the crop region from this point allows as many faces as possible to be displayed.
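Dilation of a binary feature map with a square structuring element can be sketched as follows (a pure-Python illustration; a production system would use an image-processing library's morphological dilation):

```python
def dilate(mask, r=1):
    """Binary dilation with a (2r+1) x (2r+1) square structuring element:
    every 1-pixel spreads to its r-neighborhood, merging nearby regions."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for ny in range(max(0, y - r), min(h, y + r + 1)):
                    for nx in range(max(0, x - r), min(w, x + r + 1)):
                        out[ny][nx] = 1
    return out
```

After dilation, the largest connected region and its center point can be found with a connected-component pass over the dilated map.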
In step S204, a second center point of a second target region in a second region set corresponding to the image to be cropped is obtained, where each second region includes a saliency region obtained by performing saliency detection on the image to be cropped, and the second target region is the second region in the second region set in which the first center point is located, or the largest region in the second region set.
For an image to be cropped, a second center point of a second target region in a second set of regions corresponding thereto may be acquired.
The second region set may first be determined before the second center point of the second target region is acquired. Saliency detection may be performed on the image to be cropped to obtain a saliency feature map, which marks the salient regions and non-salient regions of the image to be cropped; the saliency feature map may be a binary map. From the saliency feature map, one or more salient regions may be determined.
For example, saliency detection is performed on the image to be cropped to obtain a corresponding saliency feature binary map. Based on this binary map, the boundary region of a valued region (a region in which all pixel values are 1, representing salient content) whose area is larger than a predetermined area value may be regarded as one saliency region.
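Extracting the saliency regions above an area threshold from the binary map can be sketched as follows; the function name, the bounding-box representation, and the 4-connectivity are illustrative assumptions, not the patent's code.

```python
from collections import deque

def salient_region_boxes(bin_map, min_area=1):
    """Bounding boxes (x0, y0, x1, y1) of 4-connected regions of 1-pixels
    whose area exceeds min_area, scanned top-to-bottom, left-to-right."""
    h, w = len(bin_map), len(bin_map[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if bin_map[y][x] and not seen[y][x]:
                q = deque([(y, x)])
                seen[y][x] = True
                area, x0, y0, x1, y1 = 0, x, y, x, y
                while q:
                    cy, cx = q.popleft()
                    area += 1
                    x0, y0 = min(x0, cx), min(y0, cy)
                    x1, y1 = max(x1, cx), max(y1, cy)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and bin_map[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                if area > min_area:   # drop regions at or below the area threshold
                    boxes.append((x0, y0, x1, y1))
    return boxes
```

Regions whose area does not exceed the threshold (e.g. isolated noise pixels) are discarded, leaving only the candidate saliency regions.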
When performing saliency detection, the saliency detection model, or a submodel of the saliency detection model, may be used to perform saliency detection on the image to be cropped.
Optionally, in this embodiment, before acquiring the second center point of the second target region in the second region set corresponding to the image to be cropped, the image to be cropped may be input into a target detection model to obtain the side output saliency maps output by the target side output layers of the target detection model, where the target detection model is used for performing saliency detection on the image to be cropped and includes a second number of side output layers, the target side output layers are the front first number of side output layers of the target detection model, and the first number is smaller than the second number. A saliency feature map corresponding to the image to be cropped is obtained from the side output saliency maps, and the salient regions are determined from the saliency feature map.
To ensure both the accuracy and the timeliness of saliency detection, the complete target detection model (the saliency detection model for performing saliency detection on the image to be cropped) may be used during training, with the model parameters determined so as to ensure detection accuracy; at inference time, a submodel of the target detection model is used to perform saliency detection on the image to be cropped, ensuring timeliness.
The target detection model may be a convolutional neural network (the target neural network) comprising a plurality of convolutional layers, which may be divided into a plurality of stages, with a side output layer attached to the last convolutional layer of each stage. One result, called the global saliency map, is output after the convolutional neural network and the side output layer of each stage. Since the global saliency map is only 1/N the size of the input picture (the image to be cropped), where N may be a positive integer greater than or equal to 2 (for example, 32), residual features can be learned at each side output to gradually increase the resolution. Residual learning places attention on details of the not-yet-determined salient region by erasing the currently predicted salient region from the side output features, so that a highly accurate, high-resolution saliency feature map can be obtained using this network.
When the target detection model is used for prediction, continuing to use the original network yields a saliency feature map rich in detail, but the detection time is very long. To balance efficiency, the original target neural network can be simplified into a detection network that uses only the side output saliency maps of the front first number of side output layers. This sacrifices some accuracy in salient-feature edge details, but balances effect and efficiency, shortens the detection time, and ensures that saliency detection can run on a CPU.
A saliency feature map corresponding to the image to be cropped is obtained from the side output saliency maps of the first number of side output layers, and the salient regions are determined from the saliency feature map.
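The patent does not specify how the first-number side output maps are fused into a single saliency feature map; a minimal sketch under the assumption of nearest-neighbour upsampling, per-pixel averaging, and thresholding (the function names, the fusion rule, and the threshold are all illustrative) could look like this:

```python
def upsample(m, s):
    """Nearest-neighbour upsample a 2-D map by an integer stride factor s."""
    return [[v for v in row for _ in range(s)] for row in m for _ in range(s)]

def fuse_side_outputs(side_maps, strides, thresh=0.5):
    """Upsample each side output map to input resolution, average them
    per pixel, and binarise at `thresh` to form the saliency feature map."""
    ups = [upsample(m, s) for m, s in zip(side_maps, strides)]
    h, w = len(ups[0]), len(ups[0][0])
    n = len(ups)
    return [[1 if sum(u[y][x] for u in ups) / n >= thresh else 0
             for x in range(w)] for y in range(h)]
```

In the actual RAS network the fusion is learned via residual refinement rather than plain averaging; this sketch only illustrates the idea of combining maps of different strides into one binary feature map.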
According to this embodiment, using the side output saliency maps of the first number of side output layers of the target detection model during saliency detection, and obtaining the saliency feature map from them, balances the accuracy and the timeliness of saliency detection.
The manner in which saliency is detected is described below in connection with an alternative example. In this example, the target detection model used for saliency detection may be the RAS model.
As shown in fig. 6, the original RAS model is used during model training. The original RAS structure is based on VGG16; the network adds a side output layer to the last convolutional layer of each stage of VGG16 (conv1_2, conv2_2, conv3_3, conv4_3, conv5_3), with strides 1, 2, 4, 8 and 16 respectively (the first column of fig. 6 shows only the output of each stage).
A result, called the global saliency map, is output through the VGG network and the side output layer of each stage. Since the global saliency map is only 1/32 the size of the input picture, residual features can be learned at each side output to step up the resolution. Residual learning places attention on details of the not-yet-determined salient region by erasing the currently predicted salient region from the side output features. A high-accuracy, high-resolution saliency feature map can be obtained using this network.
As shown in fig. 7, a simplified RAS model that uses only the first three side output layers is used for prediction; this sacrifices some accuracy in salient-feature edge details, balances effect and efficiency, and ensures that saliency detection can run on a CPU.
After obtaining the one or more salient regions, a second set of regions may be determined based on the one or more salient regions, thereby determining a second target region.
As an alternative embodiment, each salient region may be taken as a second region, resulting in a second set of regions.
As another alternative embodiment, in the case that there are a plurality of saliency regions, saliency regions that are close to one another may be connected to obtain one or more second connected regions, and each second connected region may be used as a second region to obtain the second region set.
There are various ways of connecting regions that are close to one another. For example, the saliency feature map may be expanded, and the saliency regions that intersect after the expansion processing may be connected to obtain one or more second connected regions.
Optionally, in this embodiment, before acquiring the second center point of the second target region in the second region set corresponding to the image to be cropped, saliency detection may be performed on the image to be cropped to obtain the saliency regions; in the case that there are a plurality of saliency regions, expansion processing is performed on each saliency region using a second expansion coefficient to obtain a plurality of second expansion regions; the second expansion regions that intersect one another are merged to obtain the second region set; in the case that the first center point is located in the second region set, the second region in which the first center point is located is determined as the second target region; in the case that the first center point is located outside the second region set, the largest region in the second region set is determined as the second target region.
Each saliency region may be separately expanded using a second expansion coefficient. The expansion processing may be: the saliency region is expanded outward by the length expansion value and width expansion value corresponding to the second expansion coefficient. The second expansion coefficient may be a fixed value or a relative value, and the corresponding length and width expansion values may be the same or different. The second expansion coefficient may be: the ratio of the size of the cropped image to the size of the saliency region (area ratio, length ratio, width ratio, etc.), a predetermined value positively correlated with the size of the saliency region (regions of different areas correspond to different predetermined values), or a single predetermined value.
The second expansion coefficient is defined in a manner similar to the first expansion coefficient, and the corresponding expansion values are determined similarly; a detailed description is omitted here.
After expansion processing is performed on each of the plurality of saliency regions, a plurality of second expansion regions is obtained. The second expansion regions that intersect one another (i.e., are connected) are merged to obtain the second region set.
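When the regions are represented as bounding boxes, the expand-then-merge step can be sketched with a small union-find over boxes whose expanded versions intersect. This is an illustrative sketch; the box representation, the relative expansion coefficient, and the function names are assumptions rather than the patent's implementation.

```python
def expand(box, coeff):
    """Expand box (x0, y0, x1, y1) outward by coeff * its own width/height."""
    x0, y0, x1, y1 = box
    dx, dy = (x1 - x0) * coeff, (y1 - y0) * coeff
    return (x0 - dx, y0 - dy, x1 + dx, y1 + dy)

def intersects(a, b):
    """Axis-aligned box intersection test."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def merge_regions(boxes, coeff=0.2):
    """Group boxes whose expanded versions intersect, and return the
    bounding box of each group (computed from the original boxes)."""
    exp = [expand(b, coeff) for b in boxes]
    parent = list(range(len(boxes)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if intersects(exp[i], exp[j]):
                parent[find(i)] = find(j)
    groups = {}
    for i, b in enumerate(boxes):
        groups.setdefault(find(i), []).append(b)
    return [(min(b[0] for b in g), min(b[1] for b in g),
             max(b[2] for b in g), max(b[3] for b in g))
            for g in groups.values()]
```

Two nearby boxes whose expanded forms touch are merged into one region, while a distant box stays separate, mirroring how close saliency regions are connected into one second region.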
After the second region set is obtained, the second target region may be determined from it. In the case that the first center point is located in the second region set, the second region in which the first center point is located is determined as the second target region; in the case that the first center point is located outside the second region set, the largest region in the second region set is determined as the second target region, and the center point of the second target region is taken as the second center point.
For example, as shown in fig. 5, each region shown in fig. 5 is a second connected region. If the first center point is at position A, region 3 is determined to be the second target region. If the first center point is at position B, region 2 is determined to be the second target region.
After the second target area is determined, a center point of the second target area may be determined as a second center point.
According to this embodiment, expansion processing is performed on the saliency regions obtained by saliency detection on the image to be cropped, and the largest connected region is selected as the second target region. Saliency regions that do not overlap but are relatively close can thus be connected, so that the cropped image contains as many salient objects as possible, improving the accuracy of image cropping.
After the first center point and the second center point are acquired, the center point of the cropped image can be determined, so that the image to be cropped can be cropped according to this center point and the target size to obtain the cropped image.
The validity of the first and second center points may differ between images; the possible cases are: both the first center point and the second center point are valid; the first center point is valid and the second center point is invalid; the first center point is invalid and the second center point is valid; both are invalid. For the different cases, the center point of the cropped image is determined in different ways, and the cropped image is cropped accordingly.
In step S206, in the case where the first center point is an invalid point, a cropped image of the target size is cropped from the image to be cropped based on the second center point.
When the first center point is an invalid point, it can be determined that the image to be cropped does not contain an object of the target type, and a cropped image of the target size can be cropped from the image to be cropped based on the second center point.
Optionally, in this embodiment, cropping the cropped image of the target size from the image to be cropped based on the second center point may include: in the case that the second center point is an invalid point, cropping a cropped image of the target size from the image to be cropped centered on the third center point of the image to be cropped (its geometric center); and in the case that the second center point is a valid point, cropping a cropped image of the target size from the image to be cropped centered on the second center point.
If both the first center point and the second center point are invalid points, a cropped image of the target size is cropped from the image to be cropped centered on the center point of the image to be cropped.
If the first center point is an invalid point and the second center point is a valid point, a cropped image of the target size can be cropped from the image to be cropped centered on the second center point, so that the cropping effect is at least as good as picture-subject cropping using saliency detection alone.
According to this embodiment, when the first center point is an invalid point, the image is cropped in different ways according to whether the second center point is valid, ensuring the accuracy of image cropping.
In step S208, in the case where the first center point is a valid point, a cropped image of the target size is cropped from the image to be cropped based on the first center point and the second center point.
When the first center point is a valid point, it can be determined that the image to be cropped contains an object of the target type, and a cropped image of the target size can be cropped from the image to be cropped based on the first center point and the second center point.
Optionally, in this embodiment, cropping the cropped image of the target size from the image to be cropped based on the first center point and the second center point may include: in the case that the second center point is an invalid point, cropping a cropped image of the target size centered on the first center point; in the case that the second center point is a valid point and the first center point is located in the second target region, weighting and summing the first center point and the second center point to obtain a target point, and cropping a cropped image of the target size centered on the target point; and in the case that the second center point is a valid point and the first center point is located outside the second target region, cropping a cropped image of the target size centered on the second center point.
If the first center point is a valid point and the second center point is an invalid point, a cropped image of the target size is cropped from the image to be cropped centered on the first center point.
If both the first and second center points are valid points, the two may be weighted and summed to obtain a target point, and a cropped image of the target size cropped centered on that target point; alternatively, the cropped image may be cropped centered on the first center point, or centered on the second center point.
Optionally, if both the first and second center points are valid points, whether the first center point is located in the second target region may further be determined, and the center point of the cropped image determined according to the positional relationship between the first center point and the second target region.
If the first center point is located in the second target region, it can be determined that the salient region and the target object region overlap substantially (the object of the target type is a salient object); the first and second center points can be weighted and summed to obtain a target point, and a cropped image of the target size cropped from the image to be cropped centered on that target point. The weights may be set as desired: the weight of the first center point may be increased if the cropped image should contain more objects of the target type, and the weight of the second center point increased if it should contain more of the salient region.
If the first center point is located outside the second target region, it can be determined that the salient region and the target object region overlap little (the object of the target type is not a salient object) and that the object of the target type is not the main part of the image to be cropped; a cropped image of the target size may then be cropped from the image to be cropped centered on the second center point.
According to this embodiment, when the first center point is a valid point, image cropping is performed in different ways according to whether the second center point is valid and to the position of the first center point relative to the second target region, ensuring the accuracy of image cropping.
The manner in which the image is cropped is described below in connection with an alternative example. In this example, a set of adaptive cropping strategies combining face detection and saliency detection is used. Images to be cropped can be classified into the following categories:
category 1: the picture contains a face, and the salient part of the picture overlaps with the face subject (or the face-concentrated region);
category 2: the picture contains a face, but the salient part of the picture does not overlap with the face subject (or the face-concentrated region);
category 3: the picture contains a face, but the overall tone and lighting of the picture are flat, so there is no salient region;
category 4: the picture does not contain a face but contains other objects;
category 5: the picture contains no salient region at all.
Of the above categories, those in which the face detection algorithm can function are categories 1, 2 and 3, and those in which the saliency detection algorithm can function are categories 1, 2 and 4. Cropping the image with the two algorithms combined therefore covers almost all types of images. In addition, the image cropping method in this example lets the face detection and saliency detection algorithms complement each other: when one algorithm misses or produces a false detection, the other can correct it to some extent. For example, if face detection misses a face, the cropping strategy can still crop out the subject region according to the salient human-body region.
In implementation, the face detection algorithm produces a face feature map, from which the centroid f_center of the largest region (the first center point) is computed; the saliency detection algorithm produces a saliency feature map, from which, according to the face detection result, the centroid s_center of the salient region containing the face is computed (if there is no face, only the centroid of the largest region is computed). The coordinates of f_center are (f_center_x, f_center_y), and the coordinates of s_center are (s_center_x, s_center_y).
A unified weight ω is set to trade off the influence of f_center and s_center on the final crop center coordinates (center_x, center_y), as shown in formula (1):

center_x = ω · f_center_x + (1 − ω) · s_center_x, center_y = ω · f_center_y + (1 − ω) · s_center_y. (1)

The weight parameter ω is proportional to the face area f_area in which f_center is located (the area of the first target region) divided by the salient area s_area in which s_center is located (the area of the second target region), as shown in formula (2). Considering all cases, the value of ω is given piecewise by formula (3), where fs_area is the overlapping area of the face region and the salient region. The effect is that the larger the proportion of the face, the more the overall crop center shifts toward the face. With the initial value of s_center set to the exact center of the full picture, the first case of formula (3) handles picture category 3, the second case handles category 1, and the third case handles categories 2, 4 and 5.
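A sketch of the weighted center computation follows. It implements formula (1) with ω taken proportional to f_area / s_area as in formula (2); the proportionality constant k, the clamping of ω to [0, 1], and the fallback when s_area is zero are assumptions, since the exact piecewise definition of formula (3) is not reproduced here.

```python
def crop_center(f_center, s_center, f_area, s_area, k=1.0):
    """Formula (1): center = w*f_center + (1-w)*s_center, with
    w proportional to f_area / s_area (formula (2)), clamped to [0, 1].
    k, the clamping, and the s_area == 0 fallback are assumptions."""
    w = min(1.0, k * f_area / s_area) if s_area else 1.0
    return (w * f_center[0] + (1 - w) * s_center[0],
            w * f_center[1] + (1 - w) * s_center[1])
```

With a face region half the size of the salient region, the crop center lands midway between the two centroids.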
After the cropped image is obtained, the server may send it to the client that uploaded the initial image, to be displayed on the terminal device running the client.
The image region cropping method described above is illustrated below with an alternative example. In this example, the object of the target type is a human face, and the target object region is a face region.
In the related art, picture-subject cropping can be performed using face detection (mode one). For pictures that contain no face, or in which face detection fails, the picture subject cannot be analyzed further. Moreover, analyzing only the face region ignores the offset that the body region imposes on the center of the picture subject, so the cropped subject region cannot contain as much of the body as possible. When face regions do not overlap but are close together, one face is easily mistaken for the center of the picture subject, and the other faces are cropped incompletely.
In addition to picture-subject cropping using face detection, picture-subject cropping may also be performed in the following ways:
mode two: picture-subject cropping using face detection and background recognition
Face detection may first be performed on the picture, similarly to mode one. When face detection fails, an image segmentation algorithm is used to compute the left, right, upper and lower background portions and calibrate the background of the picture. The frame remaining after the background portion is removed is then marked as the subject portion of the picture.
The image segmentation algorithm uses the idea of merging: starting from the pixels of the image, adjacent pixels with high similarity are continually merged; it is a graph-based image segmentation algorithm. When the picture is very complex, the background recognition effect is weak and the picture subject still cannot be determined; it is also difficult to distinguish multiple complex objects in the picture. In addition, because this method tries to include as much of the subject region as possible, when too much of the background portion is deleted or the requested specification does not fit, the specified size can be produced only by padding the gaps.
Mode three: picture main body clipping using saliency detection
The picture subject can be cropped effectively using deep-learning-based saliency detection.
This mode has two shortcomings. On the one hand, face information is ignored: for a reclining person, for example, the crop may center on the middle of the body rather than the face. On the other hand, the saliency detection algorithms available at this stage can only run in real time on a GPU, and their efficiency is low in a CPU-only environment.
The image region cropping method in this example requires no manual intervention, is suitable for adaptively cropping the same picture to meet scenes with different size specifications, and can also serve as a basis for extracting material for poster images. After face detection and saliency detection are performed on the picture, a face feature map and a saliency feature map are obtained. Depending on the detection results for different pictures, the picture subject region can be located as follows: by analyzing the face feature map, the geometric center of the most important face region can be found; after the saliency feature map is expanded, the geometric center of the largest connected region can be selected; or the two feature maps can be combined to find the salient region in which the main face region is located and comprehensively determine the optimal geometric center.
Compared with modes one to three, the image region cropping method in this example has the following advantages:
Compared with mode one, the image region cropping method in this example uses the saliency feature map to crop pictures that contain no face; for pictures containing a face, the center is adjusted by combining face detection with the saliency feature map; and for the multi-face problem, the face regions are expanded first so that adjacent faces are connected into one face region.
Compared with mode two, the image region cropping method in this example directly adopts deep-learning-based saliency detection without background recognition, simplifies the existing deep learning network to obtain a more efficient algorithm that focuses only on the salient region, and directly offers free choice of the specification style.
Compared with mode three, the image region cropping method in this example considers the relationship between the face, the human body, and other objects at the same time, uses a refinement strategy to obtain the most reasonable cropping region, and improves running efficiency in a CPU environment while maintaining the effect.
The comparison of the image region cropping modes may be as shown in table 1, and the recognition results of mode one, mode three, and mode four (the image region cropping method in this example) may be as shown in fig. 8.
TABLE 1
As shown in fig. 9, the clipping method of the image region in the present example may include the steps of:
and step 1, inputting a picture.
For an input picture, the size specification of the cropped picture can be set: for example, the picture may be scaled to the specified length and width before cropping, or cropping may be performed strictly at the fixed input length and width. The input specification is checked to judge whether it is a valid input. Invalid inputs include a non-positive input size, and a fixed length-width crop whose input size is larger than the length and width of the picture itself.
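The validity check described above can be sketched as follows; the function name and parameters are illustrative assumptions.

```python
def valid_spec(crop_w, crop_h, img_w, img_h, fixed=True):
    """Reject non-positive crop sizes and, for fixed-size crops,
    sizes that exceed the picture's own length and width."""
    if crop_w <= 0 or crop_h <= 0:
        return False
    if fixed and (crop_w > img_w or crop_h > img_h):
        return False
    return True
```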
Step 2: face detection is performed and a face region feature binary map is output.
Face detection may be performed on the input picture using the FaceBoxes model. After face detection is finished, a face region feature binary map is output.
Step 3: adjacent face regions are connected, and the center point of the largest connected region is determined.
If a face is detected (i.e., the binary map is valid), the feature map is expanded, and the center point f_center of the largest connected region is then found. The expansion processing merges several nearby faces into one whole connected region, so that the resulting center point takes all of those faces into account.
Step 4: saliency detection is performed and a saliency feature binary map is output.
Saliency detection may be performed on the input picture using the reverse-attention-based saliency detection method (RAS), and a saliency feature binary map is output.
Step 5: nearby regions are connected, and the center point of a specific connected region is determined.
The saliency feature map may be expanded for a reason similar to that of step 3: adjacent saliency regions that would otherwise be unconnected should be integrated. Unlike step 3, when face detection detects a face, the center point of the largest connected region is not used; instead, the center point s_center of the connected region in which f_center is located is found. When face detection detects no face, the center point s_center of the largest connected region, or of the connected region closest to f_center, is found.
Step 6: the crop center is determined according to the obtained f_center and s_center.
The following cases can be handled:
case 1: neither f_center nor s_center is valid; the final crop center is the center point of the full picture;
case 2: f_center has no valid value and s_center has a value; the final crop center center = s_center;
case 3: f_center and s_center both have values but f_center is not in the salient region where s_center is located; the final crop center center = s_center;
case 4: f_center and s_center both have values and f_center is within the salient region where s_center is located; the final crop center center = w × f_center + (1 − w) × s_center, where w is a weighting coefficient;
case 5: f_center has a value and s_center has no valid value; the final crop center center = f_center.
Step 7: a cropping operation is performed on the picture to obtain the cropped picture.
A picture of the size specification input by the user may be cropped from the input picture (or the scaled picture) with the determined crop center as the center point.
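Computing the final crop window around the chosen center, shifted as needed so it stays inside the picture, can be sketched as follows; the function name and the shift-to-fit behaviour are assumptions (the patent only requires a crop of the target size centered on the crop center).

```python
def crop_rect(center, crop_w, crop_h, img_w, img_h):
    """Crop window of crop_w x crop_h centred on `center`, shifted
    (not shrunk) to stay inside the img_w x img_h picture."""
    x0 = min(max(center[0] - crop_w / 2, 0), img_w - crop_w)
    y0 = min(max(center[1] - crop_h / 2, 0), img_h - crop_h)
    return (int(x0), int(y0), int(x0 + crop_w), int(y0 + crop_h))
```

A center near the picture border simply pushes the window flush against that border, so the output always has exactly the requested size.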
With the image region cropping method in this example, adaptive cropping of pictures can greatly facilitate multi-style web page picture processing, video cover processing, and AI-designed posters, letting picture operators and designers process large numbers of pictures more conveniently. Meanwhile, in terms of display effect, showing the user a suitably cropped picture can considerably improve the experience of using related products.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present invention is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present invention. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present invention.
According to another aspect of the embodiment of the present invention, there is also provided an image region clipping device for implementing the above image region clipping method, as shown in fig. 10, the device including:
(1) A first obtaining unit 1002, configured to obtain a first center point of a first target area in a first area set corresponding to an image to be cropped, where each first area includes an object belonging to a target type, and the first target area is a largest area in the first area set;
(2) A second obtaining unit 1004, configured to obtain a second center point of a second target area in a second area set corresponding to the image to be cropped, where each second area includes a saliency area obtained by performing saliency detection on the image to be cropped, and the second target area is a second area where the first center point in the second area set is located, or is a largest area in the second area set;
(3) A first clipping unit 1006 configured to clip a clipping image of a target size from the image to be clipped based on the second center point, in the case where the first center point is an invalid point;
(4) And a second clipping unit 1008 configured to clip a clipping image of a target size from the image to be clipped based on the first center point and the second center point, if the first center point is a valid point.
Optionally, the image region cropping device may be applied to, but is not limited to, the process of converting a landscape picture to portrait orientation, the process of resizing a picture, and other scenarios that require cropping by the device.
Alternatively, the first obtaining unit 1002 may be configured to perform the above step S202, the second obtaining unit 1004 may be configured to perform the above step S204, the first clipping unit 1006 may be configured to perform the above step S206, and the second clipping unit 1008 may be configured to perform the above step S208.
According to this embodiment, subject cropping is performed on the input picture by jointly using object detection and picture saliency detection. A first center point corresponding to object detection and a second center point corresponding to saliency detection in the image to be cropped are determined respectively; when the first center point is invalid, the image is cropped based on the second center point, and when the first center point is valid, the image is cropped based on the first center point and the second center point. This solves the problems in the related art that image cropping methods have poor applicability and that the cropped picture tends to deviate from the picture subject, thereby ensuring the applicability of the cropping method and improving the rationality of the cropping.
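The fallback logic summarized above can be sketched as a single dispatch function; the (x, y) tuple representation of points, the (x1, y1, x2, y2) box for the second target area, and the `fuse` callable for the weighted-summation branch are illustrative assumptions:

```python
def select_crop_center(f_center, s_center, image_center, s_region, fuse):
    """Mirror the described branches: fall back across the object-detection
    center, the saliency center, and the image center; fuse the two centers
    only when f_center lies inside the second target area `s_region`."""
    def inside(p, box):
        return box[0] <= p[0] <= box[2] and box[1] <= p[1] <= box[3]

    if f_center is None:                      # first center point invalid
        return s_center if s_center is not None else image_center
    if s_center is None:                      # only the first center is valid
        return f_center
    # Both valid: weighted sum inside the region, saliency center outside.
    return fuse(f_center, s_center) if inside(f_center, s_region) else s_center
```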
As an alternative embodiment, the first clipping unit 1006 includes:
(1) The first clipping module is used for clipping the clipping image of the target size from the image to be clipped by taking the third center point of the image to be clipped as the center under the condition that the second center point is an invalid point;
(2) And the second clipping module is used for clipping the clipping image of the target size from the image to be clipped by taking the second center point as the center under the condition that the second center point is an effective point.
According to this embodiment, when the first center point is an invalid point, the image is cropped in different manners according to whether the second center point is valid, so that the accuracy of image cropping can be ensured.
As an alternative embodiment, the second clipping unit 1008 includes:
(1) The third clipping module is used for clipping the clipping image of the target size from the image to be clipped by taking the first central point as the center under the condition that the second central point is an invalid point;
(2) The fourth clipping module is configured to perform weighted summation on the first center point and the second center point to obtain a target point when the second center point is an effective point and the first center point is located in the second target area; cutting out the cutting image of the target size from the image to be cut by taking the target point as a center;
(3) And the fifth clipping module is used for clipping the clipping image with the target size from the image to be clipped by taking the second central point as the center under the condition that the second central point is an effective point and the first central point is positioned outside the second target area.
According to this embodiment, when the first center point is a valid point, image cropping is performed in different manners according to whether the second center point is valid and to the position of the first center point relative to the second target area, so that the accuracy of image cropping can be ensured.
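As an illustration of the weighted summation performed by the fourth clipping module, the sketch below fuses the two center points. The default weight of 0.5, and the idea that `w_object` would be raised when the cropped image must retain more target-type objects, are assumptions consistent with the positive-correlation requirement described above:

```python
def fuse_centers(f_center, s_center, w_object=0.5):
    """Weighted sum of the object-detection center and the saliency center.
    `w_object` is a hypothetical weight; per the positive-correlation rule,
    it would grow with the required number of target-type objects."""
    fx, fy = f_center
    sx, sy = s_center
    w_sal = 1.0 - w_object
    return (round(w_object * fx + w_sal * sx),
            round(w_object * fy + w_sal * sy))
```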
As an alternative embodiment, the apparatus further comprises:
(1) The first detection unit is used for carrying out target object detection on the image to be cut before a first center point of a first target area in a first area set corresponding to the image to be cut is acquired, and a target object area is acquired, wherein the target object detection is used for detecting an object of a target type contained in the image to be cut;
(2) The first expansion processing unit is used for respectively carrying out expansion processing on the plurality of target object areas by using the first expansion coefficients to obtain a plurality of first expansion areas when the target object areas are a plurality of;
(3) The first merging unit is used for merging the first expansion areas with the intersection among the plurality of first expansion areas to obtain a first area set;
(4) And the first determining unit is used for determining a first target area from the first area set.
According to this embodiment, expansion processing is performed on the multiple target object areas obtained by target object detection on the image to be cropped, and the largest connected area is selected as the first target area, so that target object areas that do not overlap but are close to each other can be connected. The cropped image thus contains as many target-type objects as possible, improving the accuracy of image cropping.
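The expansion-and-merge procedure used by these units can be sketched as follows, assuming axis-aligned (x1, y1, x2, y2) boxes and treating the expansion coefficient as a fraction of each box's width and height (an assumption; the document does not define the coefficient's exact form):

```python
def expand(box, coeff, img_w, img_h):
    """Dilate a box by `coeff` of its own size, clamped to the image."""
    x1, y1, x2, y2 = box
    dx = int((x2 - x1) * coeff)
    dy = int((y2 - y1) * coeff)
    return (max(x1 - dx, 0), max(y1 - dy, 0),
            min(x2 + dx, img_w), min(y2 + dy, img_h))

def intersects(a, b):
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def merge_regions(boxes):
    """Repeatedly merge any two intersecting boxes into their bounding box,
    so nearby (post-expansion) regions become one connected region."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if intersects(boxes[i], boxes[j]):
                    a, b = boxes[i], boxes[j]
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes

def largest(boxes):
    """Pick the largest area as the first (or second) target area."""
    return max(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))
```

The same routine would apply to the saliency areas with the second expansion coefficient.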
As an alternative embodiment, the apparatus further comprises:
(1) The second detection unit is used for detecting the saliency of the image to be cut before acquiring a second center point of a second target area in a second area set corresponding to the image to be cut, and acquiring a saliency area;
(2) The second expansion processing unit is used for respectively carrying out expansion processing on the plurality of salient regions by using a second expansion coefficient to obtain a plurality of second expansion regions when the salient regions are a plurality of;
(3) The second merging unit is used for merging the second expansion areas with the intersection among the plurality of second expansion areas to obtain a second area set;
(4) A second determining unit, configured to determine, when the first center point is located in the second area set, a second area in which the first center point is located in the second area set as a second target area;
(5) And a third determining unit configured to determine a maximum region in the second region set as the second target region in a case where the first center point is located outside the second region set.
According to this embodiment, expansion processing is performed on the saliency areas obtained by saliency detection on the image to be cropped, and the largest connected area is selected as the second target area, so that saliency areas that do not overlap but are close to each other can be connected. The cropped image thus contains as many salient objects as possible, improving the accuracy of image cropping.
As an alternative embodiment, the apparatus further comprises:
(1) The input unit is used for inputting the image to be cropped into a target detection model before acquiring the second center point of the second target area in the second area set corresponding to the image to be cropped, and acquiring the side output saliency maps output by the target side output layers of the target detection model, where the target detection model is used for performing saliency detection on the image to be cropped and includes a second number of side output layers, the target side output layers are the first number of side output layers at the front of the target detection model, and the first number is smaller than the second number;
(2) The third acquisition unit is used for outputting a saliency map according to the side, and acquiring a saliency feature map corresponding to the image to be cut;
(3) And a fourth determining unit, configured to determine a salient region from the salient feature map.
According to this embodiment, only the side output saliency maps of the first number of side output layers of the target detection model are used during saliency detection, and the saliency feature map is obtained from these side outputs, so that both the accuracy and the timeliness of saliency detection can be balanced.
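A rough sketch of combining the early side outputs, representing each side-output saliency map as a 2-D list of floats in [0, 1]. Averaging as the fusion operation and thresholding to localize salient regions are both assumptions; the document only states that the saliency feature map is obtained "according to" the side output maps:

```python
def fuse_side_outputs(side_maps, first_n):
    """Average the first `first_n` side-output saliency maps; using only
    the early side outputs trades some accuracy for speed (assumption)."""
    maps = side_maps[:first_n]
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[y][x] for m in maps) / len(maps)
             for x in range(w)] for y in range(h)]

def threshold(sal_map, t=0.5):
    """Binarize the fused map; connected foreground pixels would then be
    grouped into salient regions."""
    return [[1 if v >= t else 0 for v in row] for row in sal_map]
```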
As an alternative embodiment, the apparatus further comprises:
(1) A fourth obtaining unit, configured to obtain a specification parameter corresponding to the initial image before obtaining a first center point of a first target area in a first area set corresponding to the image to be cut, where the specification parameter is used to represent parameter information required for cutting the image to be cut from the initial image;
(2) And the scaling unit is used for scaling the initial image according to the scaling ratio under the condition that the specification parameters comprise the scaling ratio to obtain the image to be cut.
According to the embodiment, the initial image is scaled, so that different cutting requirements can be met, and the applicability of an image cutting mode is improved.
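A minimal sketch of applying the scaling ratio from the specification parameters to obtain the size of the image to be cropped; the dictionary layout with a "scale" key is a hypothetical representation of those parameters:

```python
def prepare_image_size(spec, init_w, init_h):
    """Apply the optional scaling ratio in the specification parameters
    (hypothetical dict layout) to the initial image size."""
    scale = spec.get("scale")
    if scale is None:
        return init_w, init_h
    # Keep at least one pixel per side after scaling.
    return max(1, round(init_w * scale)), max(1, round(init_h * scale))
```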
According to a further aspect of embodiments of the present invention there is also provided a storage medium having stored therein a computer program, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
Alternatively, in the present embodiment, the above-described storage medium may be configured to store a computer program for performing the steps of:
s1, acquiring a first center point of a first target area in a first area set corresponding to an image to be cut, wherein each first area contains an object belonging to a target type, and the first target area is the largest area in the first area set;
s2, obtaining second center points of second target areas in a second area set corresponding to the image to be cut, wherein each second area comprises a saliency area obtained by performing saliency detection on the image to be cut, and the second target area is a second area where the first center point in the second area set is located or is the largest area in the second area set;
s3, cutting out a cut image with a target size from the image to be cut based on the second center point under the condition that the first center point is an invalid point;
And S4, cutting out a cut image with the target size from the image to be cut based on the first center point and the second center point under the condition that the first center point is an effective point.
Alternatively, in this embodiment, it will be understood by those skilled in the art that all or part of the steps in the methods of the above embodiments may be performed by a program instructing terminal device hardware, where the program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
According to still another aspect of the embodiments of the present invention, there is also provided an electronic device for implementing the above image region cropping method. As shown in fig. 11, the electronic device includes: a processor 1102, a memory 1104, a transmission device 1106, and the like. The memory has a computer program stored therein, and the processor is arranged to perform the steps of any of the method embodiments described above by means of the computer program.
Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:
S1, acquiring a first center point of a first target area in a first area set corresponding to an image to be cut, wherein each first area contains an object belonging to a target type, and the first target area is the largest area in the first area set;
s2, obtaining second center points of second target areas in a second area set corresponding to the image to be cut, wherein each second area comprises a saliency area obtained by performing saliency detection on the image to be cut, and the second target area is a second area where the first center point in the second area set is located or is the largest area in the second area set;
s3, cutting out a cut image with a target size from the image to be cut based on the second center point under the condition that the first center point is an invalid point;
and S4, cutting out a cut image with the target size from the image to be cut based on the first center point and the second center point under the condition that the first center point is an effective point.
Alternatively, it will be appreciated by those skilled in the art that the configuration shown in fig. 11 is merely illustrative; the electronic device may be the terminal device 102 or the server 106 shown in fig. 1. Fig. 11 does not limit the structure of the electronic device. For example, the electronic device may also include more or fewer components (e.g., network interfaces) than shown in fig. 11, or have a different configuration from that shown in fig. 11.
The memory 1104 may be used to store software programs and modules, such as program instructions/modules corresponding to the image region clipping method and apparatus in the embodiment of the present invention, and the processor 1102 executes the software programs and modules stored in the memory 1104, thereby executing various functional applications and data processing, that is, implementing the image region clipping method. Memory 1104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1104 may further include memory remotely located relative to the processor 1102, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 1106 is used to receive or transmit data via a network. Specific examples of the network described above may include wired networks and wireless networks. In one example, the transmission device 1106 includes a network adapter (Network Interface Controller, NIC) that may be connected to other network devices and routers via a network cable to communicate with the internet or a local area network. In one example, the transmission device 1106 is a Radio Frequency (RF) module for communicating wirelessly with the internet.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
The integrated units in the above embodiments may be stored in the above-described computer-readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, comprising several instructions for causing one or more computer devices (which may be personal computers, servers or network devices, etc.) to perform all or part of the steps of the method described in the embodiments of the present application.
In the foregoing embodiments of the present application, the description of each embodiment has its own emphasis; for portions not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described apparatus embodiments are merely exemplary; for example, the division of the units is merely a logical function division, and there may be another division manner in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units or modules, and may be in electrical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make several improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also fall within the protection scope of the present invention.

Claims (13)

1. A method of cropping an image region, comprising:
acquiring a first center point of a first target area in a first area set corresponding to an image to be cut, wherein each first area comprises an object belonging to a target type, and the first target area is the largest area in the first area set;
Acquiring a second center point of a second target area in a second area set corresponding to the image to be cut, wherein each second area comprises a saliency area obtained by performing saliency detection on the image to be cut, and the second target area is the second area where the first center point in the second area set is located or is the largest area in the second area set;
cutting out a cut image of a target size from the image to be cut based on the second center point under the condition that the first center point is an invalid point;
cutting out the cut image of the target size from the image to be cut by taking the first center point as a center under the condition that the first center point is an effective point and the second center point is an ineffective point;
when the first central point is an effective point, the second central point is an effective point, and the first central point is located in the second target area, carrying out weighted summation on the first central point and the second central point to obtain a target point, wherein the weighted weight and the number requirement of the target types contained in the cut image are in positive correlation, or the weighted weight and the number requirement of the saliency areas contained in the cut image are in positive correlation; cutting out the cutting image of the target size from the image to be cut by taking the target point as a center;
And cutting out the cut image with the target size from the image to be cut by taking the second center point as the center under the condition that the first center point is an effective point and the second center point is an effective point and the first center point is positioned outside the second target area.
2. The method of claim 1, wherein cropping the cropped image of the target size from the image to be cropped based on the second center point comprises:
cutting out the cut image of the target size from the image to be cut by taking the third center point of the image to be cut as a center under the condition that the second center point is an invalid point;
and under the condition that the second center point is an effective point, cutting out the cut image of the target size from the image to be cut by taking the second center point as a center.
3. The method of claim 1, wherein prior to acquiring the first center point of the first target region in the first set of regions corresponding to the image to be cropped, the method further comprises:
performing target object detection on the image to be cut to obtain a target object area, wherein the target object detection is used for detecting the object of the target type contained in the image to be cut;
When the target object areas are a plurality of, performing expansion processing on the plurality of target object areas by using a first expansion coefficient to obtain a plurality of first expansion areas;
merging first expansion areas with intersections in the plurality of first expansion areas to obtain a first area set;
and determining the first target area from the first area set.
4. The method of claim 1, wherein prior to acquiring the second center point of the second target region in the second set of regions corresponding to the image to be cropped, the method further comprises:
performing significance detection on the image to be cut to obtain the significance region;
when the number of the salient regions is multiple, performing expansion processing on the salient regions by using a second expansion coefficient to obtain a plurality of second expansion regions;
merging second expansion areas with intersections in the plurality of second expansion areas to obtain a second area set;
determining a second region in the second region set, in which the first center point is located, as the second target region, when the first center point is located in the second region set;
And determining the largest region in the second region set as the second target region when the first center point is located outside the second region set.
5. The method of claim 1, wherein prior to acquiring the second center point of the second target region in the second set of regions corresponding to the image to be cropped, the method further comprises:
inputting the image to be cut into a target detection model, and acquiring a side output saliency map output by a target side output layer of the target detection model, wherein the target detection model is used for performing saliency detection on the image to be cut, the target detection model comprises a second number of side output layers, the target side output layers are a first number of side output layers before the target detection model, and the first number is smaller than the second number;
according to the side output saliency map, obtaining a saliency feature map corresponding to the image to be cut;
and determining the salient region from the salient feature map.
6. The method of any of claims 1-5, wherein prior to acquiring the first center point of the first target region in the first set of regions corresponding to the image to be cropped, the method further comprises:
Acquiring specification parameters corresponding to an initial image, wherein the specification parameters are used for representing parameter information required by cutting out the cut image from the initial image;
and under the condition that the specification parameters comprise scaling, scaling the initial image according to the scaling to obtain the image to be cut.
7. A cropping device for an image area, comprising:
the first acquisition unit is used for acquiring a first center point of a first target area in a first area set corresponding to an image to be cut, wherein each first area contains an object belonging to a target type, and the first target area is the largest area in the first area set;
the second obtaining unit is used for obtaining a second center point of a second target area in a second area set corresponding to the image to be cut, wherein each second area comprises a saliency area obtained by performing saliency detection on the image to be cut, and the second target area is a second area where the first center point in the second area set is located or is a maximum area in the second area set;
The first clipping unit is used for clipping a clipping image with a target size from the image to be clipped based on the second center point under the condition that the first center point is an invalid point;
the second clipping unit is used for clipping a clipping image with a target size from the image to be clipped based on the first center point and the second center point under the condition that the first center point is an effective point;
the second clipping unit includes:
the third clipping module is used for clipping the clipping image of the target size from the image to be clipped by taking the first central point as the center under the condition that the second central point is an invalid point;
the fourth clipping module is configured to perform weighted summation on the first center point and the second center point to obtain a target point when the first center point is an effective point and the second center point is an effective point and the first center point is located in the second target area, where the weighted weight and the number requirement of the target types included in the clipping image are in a positive correlation, or the weighted weight and the number requirement of the saliency areas included in the clipping image are in a positive correlation; cutting out the cutting image of the target size from the image to be cut by taking the target point as a center;
And the fifth clipping module is used for clipping the clipping image with the target size from the image to be clipped by taking the second central point as the center under the condition that the second central point is an effective point and the first central point is positioned outside the second target area.
8. The apparatus of claim 7, wherein the first clipping unit comprises:
the first clipping module is used for clipping the clipping image of the target size from the image to be clipped by taking the third center point of the image to be clipped as the center under the condition that the second center point is an invalid point;
and the second clipping module is used for clipping the clipping image of the target size from the image to be clipped by taking the second center point as the center under the condition that the second center point is an effective point.
9. The apparatus of claim 7, wherein the apparatus further comprises:
the first detection unit is used for detecting a target object of the image to be cut before the first center point of the first target area in the first area set corresponding to the image to be cut is acquired, and a target object area is acquired, wherein the target object detection is used for detecting an object of the target type contained in the image to be cut;
The first expansion processing unit is used for respectively carrying out expansion processing on the plurality of target object areas by using a first expansion coefficient to obtain a plurality of first expansion areas when the target object areas are a plurality of;
a first merging unit, configured to merge first expansion areas with intersections in the plurality of first expansion areas, so as to obtain the first area set;
and the first determining unit is used for determining the first target area from the first area set.
10. The apparatus of claim 7, wherein the apparatus further comprises:
the second detection unit is used for performing saliency detection on the image to be cut before acquiring the second center point of the second target area in the second area set corresponding to the image to be cut, and acquiring the saliency area;
the second expansion processing unit is used for respectively carrying out expansion processing on the plurality of salient regions by using a second expansion coefficient to obtain a plurality of second expansion regions when the salient regions are a plurality of;
a second merging unit, configured to merge second expansion areas with intersections in the plurality of second expansion areas, so as to obtain the second area set;
A second determining unit, configured to determine, when the first center point is located in the second area set, a second area in the second area set where the first center point is located as the second target area;
and a third determining unit configured to determine a maximum region in the second region set as the second target region, in a case where the first center point is located outside the second region set.
11. The apparatus according to any one of claims 7 to 10, further comprising:
the input unit is used for inputting the image to be cut into a target detection model before acquiring the second center point of the second target area in the second area set corresponding to the image to be cut, and acquiring a side output saliency map output by a target side output layer of the target detection model, wherein the target detection model is used for carrying out saliency detection on the image to be cut and comprises a second number of side output layers, and the target side output layers are a first number of side output layers before the target detection model, and the first number is smaller than the second number;
The third acquisition unit is used for outputting a saliency map according to the side, and acquiring a saliency feature map corresponding to the image to be cut;
and a fourth determining unit, configured to determine the salient region from the salient feature map.
12. A storage medium having a computer program stored therein, wherein the computer program is arranged to perform the method of any of claims 1 to 6 when run.
13. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method according to any of the claims 1 to 6 by means of the computer program.
CN201910584273.8A 2019-06-28 2019-06-28 Image area clipping method and device, storage medium and electronic device Active CN110349082B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910584273.8A CN110349082B (en) 2019-06-28 2019-06-28 Image area clipping method and device, storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910584273.8A CN110349082B (en) 2019-06-28 2019-06-28 Image area clipping method and device, storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN110349082A CN110349082A (en) 2019-10-18
CN110349082B true CN110349082B (en) 2023-08-29

Family

ID=68177641

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910584273.8A Active CN110349082B (en) 2019-06-28 2019-06-28 Image area clipping method and device, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN110349082B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111638929A (en) * 2020-05-26 2020-09-08 广州视源电子科技股份有限公司 Startup image processing method, device, equipment and storage medium
CN111784773A (en) * 2020-07-02 2020-10-16 清华大学 Image processing method and device and neural network training method and device
CN111914698B (en) * 2020-07-16 2023-06-27 北京紫光展锐通信技术有限公司 Human body segmentation method, segmentation system, electronic equipment and storage medium in image
CN112329511A (en) * 2020-08-06 2021-02-05 扬州大学 Residual feature pyramid-based portrait segmentation method
CN112017193A (en) * 2020-08-24 2020-12-01 杭州趣维科技有限公司 Image cropping device and method based on visual saliency and aesthetic score
CN112528827B (en) * 2020-12-03 2023-04-07 和远智能科技股份有限公司 Automatic detection method for crack loss of high-speed rail contact network power supply equipment
CN113709386A (en) * 2021-03-19 2021-11-26 腾讯科技(深圳)有限公司 Image processing method, image processing device, computer equipment and computer readable storage medium
CN113570626B (en) * 2021-09-27 2022-01-07 腾讯科技(深圳)有限公司 Image cropping method and device, computer equipment and storage medium
CN117213717B (en) * 2023-11-09 2024-01-30 江苏省计量科学研究院(江苏省能源计量数据中心) Online metering system and method for pressure gauge

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8363984B1 (en) * 2010-07-13 2013-01-29 Google Inc. Method and system for automatically cropping images
CN103914689A (en) * 2014-04-09 2014-07-09 百度在线网络技术(北京)有限公司 Picture cropping method and device based on face recognition
CN106778757A (en) * 2016-12-12 2017-05-31 哈尔滨工业大学 Scene text detection method based on text conspicuousness
CN107146197A (en) * 2017-03-31 2017-09-08 北京奇艺世纪科技有限公司 Thumbnail generation method and device
CN107369131A (en) * 2017-07-04 2017-11-21 华中科技大学 Saliency detection method and device for images, storage medium and processor
CN108597003A (en) * 2018-04-20 2018-09-28 腾讯科技(深圳)有限公司 Article cover generation method, device, processing server and storage medium
CN108776970A (en) * 2018-06-12 2018-11-09 北京字节跳动网络技术有限公司 Image processing method and device
CN109165660A (en) * 2018-06-20 2019-01-08 扬州大学 Salient object detection method based on convolutional neural networks
CN109712164A (en) * 2019-01-17 2019-05-03 上海携程国际旅行社有限公司 Intelligent image cropping method, system, device and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2011253982B9 (en) * 2011-12-12 2015-07-16 Canon Kabushiki Kaisha Method, system and apparatus for determining a subject and a distractor in an image
US9292756B2 (en) * 2013-12-10 2016-03-22 Dropbox, Inc. Systems and methods for automated image cropping
US9626584B2 (en) * 2014-10-09 2017-04-18 Adobe Systems Incorporated Image cropping suggestion using multiple saliency maps
US10373312B2 (en) * 2016-11-06 2019-08-06 International Business Machines Corporation Automated skin lesion segmentation using deep side layers
US10437878B2 (en) * 2016-12-28 2019-10-08 Shutterstock, Inc. Identification of a salient portion of an image
US10977509B2 (en) * 2017-03-27 2021-04-13 Samsung Electronics Co., Ltd. Image processing method and apparatus for object detection

Similar Documents

Publication Publication Date Title
CN110349082B (en) Image area clipping method and device, storage medium and electronic device
CN109508681B (en) Method and device for generating human body key point detection model
CN108121931B (en) Two-dimensional code data processing method and device and mobile terminal
CN107808111B (en) Method and apparatus for pedestrian detection and attitude estimation
US9454712B2 (en) Saliency map computation
CN109146892A (en) Image cropping method and device based on aesthetics
CN108876804B (en) Matting model training and image matting method, device and system and storage medium
CN111507333B (en) Image correction method and device, electronic equipment and storage medium
CN111008935B (en) Face image enhancement method, device, system and storage medium
CN109214366A (en) Local target re-identification method, apparatus and system
CN112561973A (en) Method and device for training image registration model and electronic equipment
CN111696196A (en) Three-dimensional face model reconstruction method and device
CN109118490A (en) Image segmentation network generation method and image segmentation method
EP4143787A1 (en) Photometric-based 3d object modeling
CN110765882A (en) Video tag determination method, device, server and storage medium
CN115239861A (en) Face data enhancement method and device, computer equipment and storage medium
CN110796250A (en) Convolution processing method and system applied to convolutional neural network and related components
CN113179421A (en) Video cover selection method and device, computer equipment and storage medium
CN108665459A (en) Image blur detection method, computing device and readable storage medium
CN106415606B (en) Edge-based recognition system and method
CN112419342A (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN116977674A (en) Image matching method, related device, storage medium and program product
CN108734712B (en) Background segmentation method and device and computer storage medium
JP6754717B2 (en) Object candidate area estimation device, object candidate area estimation method, and object candidate area estimation program
CN113052923A (en) Tone mapping method, tone mapping apparatus, electronic device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant