CN112017193A - Image cropping device and method based on visual saliency and aesthetic score - Google Patents


Info

Publication number
CN112017193A
Authority
CN
China
Prior art keywords
image
frame
cropping
module
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010858270.1A
Other languages
Chinese (zh)
Inventor
吕亚奇
熊永春
李云夕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Quwei Science & Technology Co ltd
Original Assignee
Hangzhou Quwei Science & Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Quwei Science & Technology Co ltd
Priority claimed from CN202010858270.1A
Publication of CN112017193A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30168 Image quality inspection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)
  • Editing Of Facsimile Originals (AREA)

Abstract

An image cropping device and method based on visual saliency and aesthetic score. The device comprises an operation module, a saliency detection module, a cropping processing module, an aesthetic quality evaluation module and a display module; the saliency detection module and the aesthetic quality evaluation module are deep convolutional neural networks. By providing the cropping processing module, obtaining an initial cropping frame from the salient target frame and the cropping aspect ratio, and transforming the width, height, and center-point x and y coordinates of a single cropping frame in turn, the salient target frame does not need to be traversed and cropping is accelerated.

Description

Image cropping device and method based on visual saliency and aesthetic score
Technical Field
The invention relates to the field of image analysis, in particular to an image cropping device and method based on visual saliency and aesthetic scores.
Background
With the development of intelligent devices, the demands placed on them keep rising, gradually shifting from merely automatic processing to automatic and efficient processing. To handle large numbers of pictures quickly, a variety of picture-processing software has been designed that can crop, beautify and otherwise process pictures automatically. Existing picture cropping methods fall into three main categories:
The first category crops directly around the center point of the image. This method has poor applicability: when the target cropping area is not located at the center of the picture, the cropping result is unsatisfactory.
The second category automatically crops images based on recognized face information or traditional saliency algorithms. Such algorithms recognize complex scene images poorly, must traverse the salient region of the image before a cropping result can be output, which is slow, and fail outright when the image contains no salient target.
The third category is image cropping models obtained by deep learning. Limited by the number of training samples, existing models generalize poorly, and the aspect ratio of the cropping area cannot be specified arbitrarily, so they struggle to meet cropping requirements of arbitrary proportions.
All three methods have drawbacks, so an image cropping method that flexibly adapts to any cropping proportion, tolerates faults well and applies broadly is urgently needed.
Disclosure of Invention
The invention aims to remedy the defects of the prior art by providing an image cropping device and method based on visual saliency and aesthetic score that can crop large numbers of images efficiently and quickly and is convenient to use.
An image cropping device based on visual saliency and aesthetic score comprises an operation module, a saliency detection module, a cropping processing module, an aesthetic quality evaluation module and a display module. The operation module is electrically connected with the saliency detection module, the cropping processing module and the display module; it transmits the initial image information to the saliency detection module and the operation instruction to the cropping processing module over connecting lines. The saliency detection module identifies the salient region of the image. The cropping processing module frames crop regions on the image according to the salient region and the operation instruction, and the framed image is sent to the aesthetic quality evaluation module over a connecting line. After training, the aesthetic quality evaluation module can score the image inside each cropping frame; the image with the highest aesthetic quality score is cropped along its frame, and the resulting image is sent to the display module as the final cropped image. The display module is capable of displaying the final cropped image.
Further, the display module simultaneously displays the initial image transmitted by the operation module and the final cropped image transmitted by the aesthetic quality evaluation module; the operation module receives the initial image to be cropped and an operation instruction input by an operator, the operation instruction including the cropping aspect ratio.
Further, the significance detection module and the aesthetic quality evaluation module are deep convolutional neural networks.
An image cropping method based on visual saliency and aesthetic score comprises the following steps:
Step S1: the operation module receives the initial image and the cropping aspect ratio, sends the initial image to the saliency detection module, and sends the cropping aspect ratio to the cropping processing module;
Step S2: the saliency detection module performs salient-region detection on the initial image to obtain an initial image with a salient target frame, and sends it to the cropping processing module;
Step S3: the cropping processing module obtains an initial image with an initial cropping frame according to the salient target frame and the cropping aspect ratio, and generates an initial image with a group of candidate cropping frames (containing at least one candidate cropping frame) based on the initial cropping frame; each candidate cropping frame is combined with the initial image and cropped accordingly to obtain a group of candidate cropped images, which are sent to the aesthetic quality evaluation module;
Step S4: the aesthetic quality evaluation module evaluates the aesthetic quality score of each candidate cropped image and sends the candidate cropped image with the highest score to the display module as the final cropped image;
Step S5: the display module receives the final cropped image from the aesthetic quality evaluation module and displays it together with the initial image.
The saliency detection module and the aesthetic quality evaluation module must be trained first.
Further, the salient target frame in step S2 is denoted b_salient and is obtained from equation (1):
b_salient = S(I_input)    (1)
where I_input is the three-dimensional matrix representation of the initial image, and S is the operator obtained after the saliency detection module is trained.
Further, the step of generating candidate cropping frames in step S3 and cropping according to them comprises:
S31: determining the cropping aspect ratio r_w/h and the salient target frame b_salient;
S32: taking the center of the salient target frame as the origin and combining the cropping aspect ratio r_w/h, obtaining an initial cropping frame b_init whose range contains the salient target frame;
S33: generating a group of candidate cropping frames from the obtained initial cropping frame b_init;
S34: matching each candidate cropping frame with the initial image and cropping to obtain the candidate cropped images.
Further, in step S32, to obtain the initial cropping frame, first define h_salient, w_salient, x_salient, y_salient as the height, width, and center-point x and y coordinates of the salient target frame b_salient; then calculate the initial cropping frame b_init from b_salient and the cropping aspect ratio r_w/h, as in equation (2):

h_init = h_salient,  w_init = r_w/h · h_salient,  x_init = x_salient,  y_init = y_salient    (2)

where h_init, w_init, x_init, y_init are the height, width, and center-point x and y coordinates of the initial cropping frame b_init.
If w_init ≥ w_salient, output the initial cropping frame b_init; otherwise update its width, height and center-point data according to equation (3):

w_init = w_salient,  h_init = w_salient / r_w/h,  x_init = x_salient,  y_init = y_salient    (3)

and output the initial cropping frame b_init.
Further, the group of candidate cropping frames in S33 is generated as follows:
S331: transform the height h_init of the initial cropping frame within a set height-transformation-ratio range to obtain n1 cropping frames, the height transformation ratio of each frame being obtained from the ratio of the range to (n1 − 1);
S332: transform the width w_init of the cropping frames obtained in step S331 within a set width-transformation-ratio range to obtain n1 × n2 cropping frames, the width transformation ratio of each frame being obtained from the ratio of the range to (n2 − 1);
S333: transform the center-point x_init of the cropping frames obtained in step S332 within a set center-point-transformation-ratio range to obtain n1 × n2 × n3 cropping frames, the transformation ratio of each frame being obtained from the ratio of the range to (n3 − 1);
S334: transform the center-point y_init of the cropping frames obtained in step S333 within the set center-point-transformation-ratio range to obtain n1 × n2 × n3 × n4 cropping frames, the transformation ratio of each frame being obtained from the ratio of the range to (n4 − 1);
S335: randomly pick n cropping frames from the n1 × n2 × n3 × n4 frames obtained in step S334 as the candidate cropping frames.
Further, in step S4 each candidate cropped image is input to the aesthetic quality evaluation module to obtain an aesthetic quality score q_k, as in equation (4):

q_k = A(I_crop_k),  k ∈ {1, …, n}    (4)

where I_crop_k is the three-dimensional matrix representation of the k-th candidate cropped image, n is the number of candidate cropping frames, and A is the operator obtained after the aesthetic quality evaluation module is trained.
Further, in S33, alternatively each of the parameters h_init, w_init, x_init, y_init may be randomly transformed once, in turn, within a set ratio range to obtain one candidate cropping frame; repeating the operation n times yields n candidate cropping frames.
The beneficial effects of the invention are:
By providing the cropping processing module, obtaining the initial cropping frame from the salient target frame and the cropping aspect ratio, and independently transforming the width, height, and center-point x and y coordinates of the cropping frame in turn, the salient target frame need not be traversed and cropping is accelerated;
By providing and training the saliency detection module and the aesthetic quality evaluation module, the salient target frame is determined automatically and the candidate cropped images are scored and selected, giving the algorithm good robustness;
The display module can output the initial image and the final cropped image simultaneously for easy comparison, and can display the intermediate processing stages so the process can be checked and corrected;
The invention marks the images at different stages, making it easy to tell which stage the current image belongs to; by inspecting the cropping process, the stage at which an image went wrong can be found and the corresponding module corrected in time.
Drawings
FIG. 1 is a block flow diagram of a first embodiment of the present invention;
FIG. 2 is a schematic diagram of an initial image according to a first embodiment of the present invention;
FIG. 3 is a schematic diagram of an initial image with a frame of a salient object according to a first embodiment of the present invention;
FIG. 4 is a diagram illustrating an initial cropped image with an initial cropping frame according to a first embodiment of the present invention;
FIG. 5 is a diagram illustrating an initial image with a set of candidate cropping frames according to a first embodiment of the present invention;
FIG. 6 is a diagram illustrating an initial image with a candidate cropping frame according to a first embodiment of the present invention;
FIG. 7 is a final cropped image according to a first embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
The first embodiment is as follows:
an image cropping device based on visual saliency and aesthetic scores comprises an operation module, a saliency detection module, a cropping processing module, an aesthetic quality evaluation module and a display module.
The operation module receives the initial image to be cropped and an operation instruction input by an operator, the instruction including the cropping aspect ratio. The operation module is electrically connected with the saliency detection module, the cropping processing module and the display module; it transmits the received initial image information to the saliency detection module and the operation instruction to the cropping processing module over connecting lines. In this embodiment the saliency detection module is a deep convolutional neural network, electrically connected with the operation module, the cropping processing module and the display module. After training, the saliency detection module can identify the salient region of an image; ideally the salient region is the minimal upright (not tilted) box region containing the salient target. The cropping processing module frames crop regions on the image according to the salient region and the operation instruction, crops the framed image along each cropping frame to obtain candidate cropped images, and sends them to the aesthetic quality evaluation module over a connecting line. The cropping processing module is electrically connected with the saliency detection module, the aesthetic quality evaluation module and the display module. The aesthetic quality evaluation module is a deep convolutional neural network that, after training, can score the candidate cropped images; the one with the highest aesthetic quality score is sent to the display module as the final cropped image.
The aesthetic quality evaluation module is electrically connected with the cropping processing module and the display module. The display module can display the final cropped image; in this embodiment it also displays the initial image input through the operation module for comparison, and can display the intermediate processing stages, facilitating backtracking and inspection.
As shown in FIGS. 1 to 7, an image cropping method based on visual saliency and aesthetic score, carried out by the above image cropping device, comprises the following steps:
Step S1: the operation module receives the initial image and the cropping aspect ratio, sends the initial image to the saliency detection module, and sends the cropping aspect ratio to the cropping processing module;
Step S2: the saliency detection module performs salient-region detection on the initial image to obtain an initial image with a salient target frame, and sends it to the cropping processing module;
Step S3: the cropping processing module obtains an initial image with an initial cropping frame according to the salient target frame and the cropping aspect ratio, and generates an initial image with a group of candidate cropping frames (containing at least one candidate cropping frame) based on the initial cropping frame; each candidate cropping frame is combined with the initial image and cropped accordingly to obtain a group of candidate cropped images, which are sent to the aesthetic quality evaluation module;
Step S4: the aesthetic quality evaluation module evaluates the aesthetic quality score of each candidate cropped image and sends the candidate cropped image with the highest score to the display module as the final cropped image;
Step S5: the display module receives the final cropped image from the aesthetic quality evaluation module and displays it together with the initial image.
The saliency detection module and the aesthetic quality evaluation module must be trained first.
As shown in FIG. 3, the salient target frame in step S2 is denoted b_salient and is obtained from equation (1):
b_salient = S(I_input)    (1)
where I_input is the three-dimensional matrix representation of the initial image, and S is the operator obtained after the saliency detection module is trained. In this embodiment the saliency detection module is trained on a private data set comprising 20000 color images with marked salient target frames; the training is conventional deep-learning-based target detection training.
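The operator S in equation (1) is a trained deep network. As a rough illustration of what it produces, namely a minimal upright box around the salient target, such a box can be derived from a thresholded saliency map. The sketch below is an assumption for illustration (the threshold value and the map format are not part of the patent):

```python
import numpy as np

def salient_box(saliency_map, thresh=0.5):
    """Illustrative stand-in for the trained operator S: given a
    saliency map with values in [0, 1], return the minimal upright
    box (w, h, x_center, y_center) covering all salient pixels.
    The threshold is an assumed illustration parameter."""
    ys, xs = np.nonzero(saliency_map >= thresh)
    if ys.size == 0:
        return None  # no salient target: cropping would fail
    x0, x1 = xs.min(), xs.max()
    y0, y1 = ys.min(), ys.max()
    w, h = x1 - x0 + 1, y1 - y0 + 1
    return w, h, (x0 + x1) / 2.0, (y0 + y1) / 2.0
```

A deep saliency network replaces this thresholding in the patent; only the box format (width, height, center coordinates) carries over.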
As shown in FIGS. 4 and 5, the step of generating candidate cropping frames in step S3 and cropping according to them comprises:
S31: determining the cropping aspect ratio r_w/h and the salient target frame b_salient;
S32: taking the center of the salient target frame as the origin and combining the cropping aspect ratio r_w/h, obtaining an initial cropping frame b_init whose range contains the salient target frame;
S33: generating a group of candidate cropping frames from the obtained initial cropping frame b_init;
S34: matching each candidate cropping frame with the initial image and cropping to obtain the candidate cropped images.
The initial cropping frame obtained in S32 contains the salient target frame along the height or width direction. First define h_salient, w_salient, x_salient, y_salient as the height, width, and center-point x and y coordinates of the salient target frame b_salient; then calculate the initial cropping frame b_init from b_salient and the cropping aspect ratio r_w/h, as in equation (2):

h_init = h_salient,  w_init = r_w/h · h_salient,  x_init = x_salient,  y_init = y_salient    (2)

where h_init, w_init, x_init, y_init are the height, width, and center-point x and y coordinates of the initial cropping frame b_init.
If w_init ≥ w_salient, output the initial cropping frame b_init; otherwise update its width, height and center-point data according to equation (3):

w_init = w_salient,  h_init = w_salient / r_w/h,  x_init = x_salient,  y_init = y_salient    (3)

and output the initial cropping frame b_init.
It should be noted that in some other embodiments the initial cropping frame data can be obtained from equation (3) first; if h_init ≥ h_salient the frame is output, otherwise it is updated according to equation (2) and then output.
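The computation of the initial cropping frame via equations (2) and (3) can be sketched as follows. This is a minimal sketch under the reconstruction of the equations given in this document; the function and variable names are illustrative, not the patent's:

```python
def initial_crop_box(w_s, h_s, x_s, y_s, r):
    """Initial cropping frame from a salient box (w_s, h_s) centered
    at (x_s, y_s), for a cropping aspect ratio r = width/height.
    Follows the equation-(2)-then-(3) ordering; the reverse ordering
    mentioned in the text is symmetric."""
    # Equation (2): match the salient height, derive the width from r.
    h_init, w_init = h_s, r * h_s
    x_init, y_init = x_s, y_s
    # If that width cannot cover the salient box, update per
    # equation (3): match the salient width, derive the height.
    if w_init < w_s:
        w_init, h_init = w_s, w_s / r
    return w_init, h_init, x_init, y_init
```

Either branch yields a frame with aspect ratio r whose extent is at least that of the salient box in both directions.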
In S33, the group of candidate cropping frames is generated as follows:
S331: transform the height h_init of the initial cropping frame within a set height-transformation-ratio range to obtain n1 cropping frames, the height transformation ratio of each frame being obtained from the ratio of the range to (n1 − 1);
S332: transform the width w_init of the cropping frames obtained in step S331 within a set width-transformation-ratio range to obtain n1 × n2 cropping frames, the width transformation ratio of each frame being obtained from the ratio of the range to (n2 − 1);
S333: transform the center-point x_init of the cropping frames obtained in step S332 within a set center-point-transformation-ratio range to obtain n1 × n2 × n3 cropping frames, the transformation ratio of each frame being obtained from the ratio of the range to (n3 − 1);
S334: transform the center-point y_init of the cropping frames obtained in step S333 within the set center-point-transformation-ratio range to obtain n1 × n2 × n3 × n4 cropping frames, the transformation ratio of each frame being obtained from the ratio of the range to (n4 − 1);
S335: randomly pick n cropping frames from the n1 × n2 × n3 × n4 frames obtained in step S334 as the candidate cropping frames.
In this embodiment the height-, width- and center-point-transformation-ratio ranges are all [−20%, 20%], and n1, n2, n3 and n4 are all 5, meaning that step S334 yields 5 × 5 × 5 × 5 = 625 cropping frames, from which 20 are randomly selected as the candidate cropping frames of this embodiment. Taking the height transformation in S331 as an example, with n1 = 5 the transformation ratios are spaced by the range divided by (n1 − 1), i.e. 40% / 4 = 10%:

ratio_i = −20% + i × 10%,  i = 0, 1, 2, 3, 4

so the height transformation ratios in S331 are −20%, −10%, 0, 10% and 20%, and step S331 produces five cropping frames whose heights are 80%, 90%, 100%, 110% and 120% of the initial cropping-frame height.
When generating candidate cropping frames, each of the parameters h_init, w_init, x_init, y_init may instead be randomly transformed once, in turn, within a set ratio range to obtain one candidate cropping frame, and the operation repeated n times to obtain n candidate cropping frames. Say the transformation ratio ranges of h_init, w_init, x_init, y_init are all [−20%, 20%]: the height of the initial cropping frame is first randomly transformed once within [80%, 120%] of its value, then the width within [80%, 120%], then the center-point x coordinate within [80%, 120%], and finally the center-point y coordinate within [80%, 120%], yielding one candidate cropping frame; repeating the random transformation 20 times yields 20 random candidate cropping frames.
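The random variant just described can be sketched as follows. The seed and the proportional-shift interpretation of the center-point transform are assumptions for illustration:

```python
import random

def random_candidates(box, n=20, ratio=0.2, seed=0):
    """Sketch of the random variant: each of h, w, x, y is perturbed
    once by a uniform ratio in [-ratio, +ratio]; repeating n times
    yields n candidate crop frames.  `box` is (w, h, x_center, y_center)."""
    rng = random.Random(seed)
    w, h, x, y = box
    frames = []
    for _ in range(n):
        dh, dw, dx, dy = (rng.uniform(-ratio, ratio) for _ in range(4))
        frames.append((w * (1 + dw), h * (1 + dh), x + w * dx, y + h * dy))
    return frames
```

Unlike the grid of S331–S334, this samples the four-dimensional transform space directly, so only n frames are ever materialized.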
In step S34 a candidate cropping frame may extend beyond the boundary of the initial image; during cropping, the portion beyond the boundary is filled with white pixels.
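White-pixel filling for frames that extend beyond the image boundary (step S34) can be sketched with numpy. Rounding the frame to integer pixel coordinates is an assumption of this sketch:

```python
import numpy as np

def crop_with_white_fill(image, w, h, xc, yc):
    """Crop a (w, h) frame centered at (xc, yc) from `image`; any
    region of the frame outside the image is filled with white (255)."""
    H, W = image.shape[:2]
    w, h = int(round(w)), int(round(h))
    x0 = int(round(xc - w / 2.0))
    y0 = int(round(yc - h / 2.0))
    out = np.full((h, w) + image.shape[2:], 255, dtype=image.dtype)
    # Intersection between the crop frame and the image.
    sx0, sy0 = max(x0, 0), max(y0, 0)
    sx1, sy1 = min(x0 + w, W), min(y0 + h, H)
    if sx0 < sx1 and sy0 < sy1:
        out[sy0 - y0:sy1 - y0, sx0 - x0:sx1 - x0] = image[sy0:sy1, sx0:sx1]
    return out
```

This works unchanged for grayscale (H, W) and color (H, W, C) arrays, since the trailing shape is carried over.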
As shown in FIG. 6, in step S4 each candidate cropped image is input to the aesthetic quality evaluation module to obtain an aesthetic quality score q_k, as in equation (4):

q_k = A(I_crop_k),  k ∈ {1, …, n}    (4)

where I_crop_k is the three-dimensional matrix representation of the k-th candidate cropped image and n, which is 20 in this embodiment, is the number of candidate cropping frames; A is the operator obtained after the aesthetic quality evaluation module is trained. In this embodiment the aesthetic quality evaluation module is trained on a private data set comprising q manually scored images; the training is conventional deep-learning-based training.
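The selection in step S4, scoring each candidate with the trained operator A of equation (4) and keeping the maximum, can be sketched as below. Here `score_fn` is a hypothetical stand-in for the trained evaluator A, not the patent's network:

```python
def pick_final_crop(candidate_crops, score_fn):
    """Step S4: compute q_k = score_fn(crop_k) for each candidate and
    return the highest-scoring crop together with its score."""
    scores = [score_fn(crop) for crop in candidate_crops]
    best = max(range(len(scores)), key=scores.__getitem__)
    return candidate_crops[best], scores[best]
```

Any callable mapping a cropped image to a scalar score can be plugged in, which is what makes the evaluator swappable for retraining.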
While the images are processed in the saliency detection module, the cropping processing module and the aesthetic quality evaluation module, they are shown in the display module. In this embodiment the images at different stages are marked by combining colors with solid and dashed lines: the salient target frame is a solid red line, the initial cropping frame a solid green line, the candidate cropping frames light dashed boxes, and the candidate cropping frame with the highest aesthetic quality score a solid yellow line. In some other embodiments the images at different stages can be marked in other ways, such as text labels.
In implementation, the saliency detection module and the aesthetic quality evaluation module are trained first. After training, the initial image to be cropped is input to the operation module; candidate cropping frames are obtained through detection by the saliency detection module and cropping by the cropping processing module; finally the aesthetic quality evaluation module scores the candidates, and the image in the highest-scoring candidate cropping frame is output to the display module as the final cropped image, realizing automatic cropping. The way candidate cropping frames are generated in the cropping processing module enables fast, large-scale processing of images.
The above description is only one specific example of the present invention and should not be construed as limiting it in any way. In light of this disclosure, persons skilled in the relevant art may make various modifications and changes in form and detail without departing from the principles and structure of the invention, all of which fall within the scope of the appended claims.

Claims (10)

1. An image cropping device based on visual saliency and aesthetic score, characterized by comprising an operation module, a saliency detection module, a cropping processing module, an aesthetic quality evaluation module and a display module; the operation module is electrically connected with the saliency detection module, the cropping processing module and the display module, transmits the initial image information to the saliency detection module over a connecting line, and transmits an operation instruction to the cropping processing module over a connecting line; the saliency detection module identifies the salient region of the image, the salient region being the minimal box region containing the salient target; the cropping processing module frames crop regions on the image according to the salient region and the operation instruction, and the framed image is sent to the aesthetic quality evaluation module over a connecting line; after training, the aesthetic quality evaluation module can score the images inside the cropping frames; the image with the highest aesthetic quality score is cropped along its frame and the resulting image is sent to the display module as the final cropped image; the display module is capable of displaying the final cropped image.
2. The image cropping device based on visual saliency and aesthetic score as claimed in claim 1, characterized in that said display module simultaneously displays the initial image transmitted by the operation module and the final cropped image transmitted by the aesthetic quality evaluation module; the operation module receives an initial image to be cut and an operation instruction input by an operator, wherein the operation instruction comprises the cut aspect ratio.
3. The image cropping device based on visual saliency and aesthetic scores of claim 1, characterized in that said saliency detection module and aesthetic quality evaluation module are deep convolutional neural networks.
4. An image cropping method based on visual saliency and aesthetic scores, comprising the steps of:
step S1: the operation module receives the initial image and the cropped aspect ratio, sends the initial image to the significance detection module and sends the cropped aspect ratio to the cropping processing module;
step S2: the saliency detection module receives the initial image to perform saliency region detection to obtain an initial image with a saliency target frame, and sends the initial image with the saliency target frame to the cropping processing module;
step S3: the cropping processing module obtains an initial image with an initial cropping frame according to the significant target frame and the cropped aspect ratio, and generates an initial image with a group of candidate cropping frames based on the initial cropping frame; the group of candidate cropping frames at least comprises one candidate cropping frame; combining each candidate cutting frame with the initial image, and cutting according to the candidate cutting frames to obtain a group of candidate cutting images; sending the candidate cropped image to an aesthetic quality evaluation module;
step S4: the aesthetic quality evaluation module evaluates the aesthetic quality score of each candidate cropped image, and sends the candidate cropped image with the highest aesthetic quality score as a final cropped image to the display module;
step S5: the display module receives the final cut image sent by the aesthetic quality evaluation module and displays the final cut image and the initial image simultaneously;
wherein the saliency detection module and the aesthetic quality evaluation module must be trained first.
5. The image cropping method based on visual saliency and aesthetic score as claimed in claim 4, characterized in that the salient object frame in step S2, denoted b_salient, is derived from equation (1):
b_salient = S(I_input) (1)
where I_input denotes the three-dimensional matrix representation of the initial image, and S is the operator obtained after the saliency detection module is trained.
6. The image cropping method based on visual saliency and aesthetic score as claimed in claim 4, characterized in that in step S3 the candidate cropping frames are generated, and the cropping according to the candidate cropping frames is performed, by the following steps:
S31: determining the cropped aspect ratio r_wh = w/h and the salient object frame b_salient;
S32: taking the center of the salient object frame as the origin and combining the cropped aspect ratio r_wh, obtaining an initial cropping frame b_init whose image range contains the salient object frame;
S33: generating a group of candidate cropping frames from the obtained initial cropping frame b_init;
S34: matching each candidate cropping frame with the initial image and cropping to obtain the candidate cropped images.
7. The image cropping method based on visual saliency and aesthetic score as claimed in claim 6, characterized in that, to obtain the initial cropping frame in S32, first define h_salient, w_salient, x_salient, y_salient as the height, width, and x and y coordinates of the center point of the salient object frame b_salient, respectively; then calculate the initial cropping frame b_init from the salient object frame b_salient and the cropped aspect ratio r_wh, as shown in equation (2):
h_init = h_salient, w_init = r_wh · h_salient, x_init = x_salient, y_init = y_salient (2)
where h_init, w_init, x_init, y_init are the height, width, and x and y coordinates of the center point of the initial cropping frame b_init, respectively;
if w_init ≥ w_salient is satisfied, output the initial cropping frame b_init; otherwise, update the width, height and center-point data of the initial cropping frame b_init according to equation (3):
w_init = w_salient, h_init = w_salient / r_wh, x_init = x_salient, y_init = y_salient (3)
and output the initial cropping frame b_init.
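Equations (2) and (3) survive in the source only as image placeholders, so the computation below is a reconstruction from the surrounding text: the initial frame matches the salient height (or, if that frame would be too narrow, the salient width) and derives the other dimension from r_wh, keeping the same center.

```python
def initial_crop_frame(h_s, w_s, x_s, y_s, r_wh):
    """Compute the initial cropping frame b_init (equations (2)-(3)):
    the smallest frame with aspect ratio r_wh = width/height, centred on
    the salient object frame (h_s, w_s, x_s, y_s), that fully contains it."""
    # Equation (2): take the salient height and derive the width from r_wh.
    h_init, w_init = h_s, r_wh * h_s
    x_init, y_init = x_s, y_s
    # Equation (3): if that width cannot cover the salient frame,
    # take the salient width instead and derive the height from r_wh.
    if w_init < w_s:
        w_init, h_init = w_s, w_s / r_wh
    return h_init, w_init, x_init, y_init
```

For example, a salient frame 20 high and 40 wide with r_wh = 1 triggers the equation-(3) branch and yields a 40×40 frame at the same center.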
8. The image cropping method based on visual saliency and aesthetic score as claimed in claim 4, characterized in that in said step S33 a group of candidate cropping frames is generated by the following steps:
S331: transforming h_init of the initial cropping frame b_init within a set height-transformation ratio range to obtain n1 cropping frames, the height-transformation ratio of each cropping frame being obtained from the ratio of the height-transformation ratio range to (n1-1);
S332: transforming w_init of the cropping frames obtained in step S331 within a set width-transformation ratio range to obtain n1×n2 cropping frames, the width-transformation ratio of each cropping frame being obtained from the ratio of the width-transformation ratio range to (n2-1);
S333: transforming x_init of the cropping frames obtained in step S332 within a set center-point transformation ratio range to obtain n1×n2×n3 cropping frames, the center-point transformation ratio of each cropping frame being obtained from the ratio of the center-point transformation ratio range to (n3-1);
S334: transforming y_init of the cropping frames obtained in step S333 within the set center-point transformation ratio range to obtain n1×n2×n3×n4 cropping frames, the center-point transformation ratio of each cropping frame being obtained from the ratio of the center-point transformation ratio range to (n4-1);
S335: randomly selecting n cropping frames from the n1×n2×n3×n4 cropping frames obtained in step S334 as the candidate cropping frames.
9. The image cropping method based on visual saliency and aesthetic score as claimed in claim 8, characterized in that in step S4 each candidate cropped image is input into the aesthetic quality evaluation module to obtain its aesthetic quality score q_k, as shown in equation (4):
q_k = A(I_crop^k) (4)
where I_crop^k represents the three-dimensional form of the k-th candidate cropped image, k ∈ {1, …, n}, n represents the number of candidate cropping frames, and A is the operator obtained after the aesthetic quality evaluation module is trained.
10. The image cropping method based on visual saliency and aesthetic score as claimed in claim 4, characterized in that in S33, each of the parameters h_init, w_init, x_init and y_init of the cropping frame is randomly transformed once, in sequence, within a set proportion range to obtain one candidate cropping frame, and the operation is repeated n times to obtain n candidate cropping frames.
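The claim-10 variant replaces the exhaustive grid of claim 8 with one random transform per parameter per candidate. A sketch, with illustrative `scale` and `shift` ranges (the claim does not specify them):

```python
import random

def random_candidate_frames(h0, w0, x0, y0, n, scale=(0.9, 1.1), shift=(-0.05, 0.05)):
    """For each of n candidates, randomly transform h_init, w_init, x_init
    and y_init once each within the set proportion ranges (claim 10)."""
    frames = []
    for _ in range(n):
        h = h0 * random.uniform(*scale)          # scale the height
        w = w0 * random.uniform(*scale)          # scale the width
        x = x0 + w0 * random.uniform(*shift)     # shift the center x
        y = y0 + h0 * random.uniform(*shift)     # shift the center y
        frames.append((h, w, x, y))
    return frames
```

This trades the grid's coverage guarantees for constant memory and exactly n candidates per image.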
CN202010858270.1A 2020-08-24 2020-08-24 Image cropping device and method based on visual saliency and aesthetic score Pending CN112017193A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010858270.1A CN112017193A (en) 2020-08-24 2020-08-24 Image cropping device and method based on visual saliency and aesthetic score


Publications (1)

Publication Number Publication Date
CN112017193A true CN112017193A (en) 2020-12-01

Family

ID=73505712

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010858270.1A Pending CN112017193A (en) 2020-08-24 2020-08-24 Image cropping device and method based on visual saliency and aesthetic score

Country Status (1)

Country Link
CN (1) CN112017193A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105956999A (en) * 2016-04-28 2016-09-21 努比亚技术有限公司 Thumbnail generating device and method
CN106681606A (en) * 2016-12-06 2017-05-17 宇龙计算机通信科技(深圳)有限公司 Picture processing method and terminal
CN107545576A (en) * 2017-07-31 2018-01-05 华南农业大学 Image edit method based on composition rule
CN109146892A (en) * 2018-07-23 2019-01-04 北京邮电大学 A kind of image cropping method and device based on aesthetics
CN110349082A (en) * 2019-06-28 2019-10-18 腾讯科技(深圳)有限公司 Method of cutting out and device, the storage medium and electronic device of image-region
CN110909724A (en) * 2019-10-08 2020-03-24 华北电力大学 Multi-target image thumbnail generation method


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022227752A1 (en) * 2021-04-26 2022-11-03 荣耀终端有限公司 Photographing method and device
WO2023075936A1 (en) * 2021-10-29 2023-05-04 Microsoft Technology Licensing, Llc. Ai-based aesthetical image modification
US11961261B2 (en) 2021-10-29 2024-04-16 Microsoft Technology Licensing, Llc AI-based aesthetical image modification
CN115082673A (en) * 2022-06-14 2022-09-20 阿里巴巴(中国)有限公司 Image processing method, device, equipment and storage medium
CN116543004A (en) * 2023-07-05 2023-08-04 荣耀终端有限公司 Image cutting method, device, terminal equipment and computer readable storage medium
CN116543004B (en) * 2023-07-05 2024-04-19 荣耀终端有限公司 Image cutting method, device, terminal equipment and computer readable storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination