CN106650737B - Automatic image cropping method

Info

Publication number: CN106650737B
Application number: CN201611041091.9A
Authority: CN
Inventors: 黄凯奇; 赫然; 考月英
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2016-11-21
Filing date: 2016-11-21
Publication date: 2020-02-28
Anticipated expiration: 2036-11-21
Also published as: CN106650737A

Abstract

The invention relates to an automatic image cropping method. The method comprises: extracting an aesthetic response map and a gradient energy map of an image to be cropped; densely extracting candidate cropped images from the image to be cropped; screening the candidate cropped images based on the aesthetic response map; and estimating composition scores of the screened candidate cropped images based on the aesthetic response map and the gradient energy map, and determining the candidate cropped image with the highest score as the cropped image. The scheme uses the aesthetic response map to identify the regions of the picture that influence its aesthetic quality and to determine which parts to retain, so that the high aesthetic quality of the cropped image is preserved as far as possible; it also uses the gradient energy map to analyze the gradient distribution, and evaluates the composition score of the cropped image from both maps. The embodiment of the invention compensates for shortcomings in how image composition is represented and addresses the technical problem of how to improve the robustness and precision of automatic image cropping.

Description

Automatic image cropping method

Technical Field

The invention relates to the technical fields of pattern recognition, machine learning and computer vision, and in particular to an automatic image cropping method.

Background

With the rapid development of computer technology and digital media technology, demands and expectations in fields such as computer vision, artificial intelligence and machine perception keep rising. Automatic image cropping, a very important and common task in automatic image editing, is receiving increasing attention. Automatic image cropping is expected to remove redundant regions and emphasize regions of interest, thereby improving the overall composition and aesthetic quality of an image. An effective automatic image cropping method not only frees people from tedious work but can also offer image editing suggestions to non-professionals.

Because image cropping is a highly subjective task, hand-crafted rules have difficulty accounting for all the influencing factors. Conventional automatic image cropping typically uses saliency maps to identify the main or interesting regions of the image, and then finds the crop region through established rules, either by minimizing an energy function or by learning a classifier. However, established rules are not comprehensive enough for a task as subjective as image cropping, and the resulting precision rarely meets users' requirements.

In view of the above, the present invention is particularly proposed.

Disclosure of Invention

To solve the above problems in the prior art, namely the technical problem of how to improve the robustness and precision of automatic image cropping, an automatic image cropping method is provided.

In order to realize the purpose, the following technical scheme is provided:

an automatic image cropping method, the method comprising:

extracting an aesthetic response map and a gradient energy map of an image to be cropped;

densely extracting candidate cropped images from the image to be cropped;

screening the candidate cropped images based on the aesthetic response map;

and estimating composition scores of the screened candidate cropped images based on the aesthetic response map and the gradient energy map, and determining the candidate cropped image with the highest score as the cropped image.

Further, extracting the aesthetic response map and the gradient energy map of the image to be cropped specifically includes:

extracting the aesthetic response map of the image to be cropped by using a deep convolutional neural network and a class response mapping method, according to the following formula:

$$M(x, y) = \sum_{k=1}^{K} w_k \, f_k(x, y)$$

wherein M(x, y) represents the aesthetic response value at spatial location (x, y); K represents the total number of channels of the feature map of the last convolutional layer of the deep convolutional neural network; k indexes the k-th channel; f_k(x, y) represents the feature value of the k-th channel at spatial location (x, y); and w_k represents the weight connecting the pooled feature map of the k-th channel to the high-aesthetic category;

and smoothing the image to be cropped and calculating the gradient value of each pixel to obtain the gradient energy map.

Further, the deep convolutional neural network is trained as follows:

arranging a convolutional layer at the bottom layer of the deep convolutional neural network structure;

pooling each feature map to a single value by global average pooling after the last convolutional layer of the deep convolutional neural network structure;

connecting a fully connected layer, whose number of outputs equals the number of aesthetic quality classification categories, and a loss function.

Further, screening the candidate cropped images based on the aesthetic response map specifically includes:

calculating an aesthetic retention score of each candidate cropped image by the following formula:

$$S_a(C) = \frac{\sum_{(i,j) \in C} A_{(i,j)}}{\sum_{(i,j) \in I} A_{(i,j)}}$$

wherein S_a(C) represents the aesthetic retention score of the candidate cropped image; C represents the candidate cropped image; (i, j) represents a pixel position; I represents the original image; and A_(i,j) represents the aesthetic response value at position (i, j);

sorting all candidate cropped images in descending order of aesthetic retention score;

and selecting the subset of candidate cropped images with the highest scores.

Further, estimating the composition scores of the screened candidate cropped images based on the aesthetic response map and the gradient energy map, and determining the candidate cropped image with the highest score as the cropped image, specifically includes:

establishing a composition model based on the aesthetic response map and the gradient energy map;

and estimating the composition scores of the screened candidate cropped images by using the composition model, and determining the candidate cropped image with the highest score as the cropped image.

Further, the composition model is obtained by:

establishing a training image set based on the aesthetic response map and the gradient energy map;

labeling the training images with aesthetic quality categories;

training a deep convolutional neural network with the labeled training images;

for the labeled training images, extracting spatial pyramid features of the aesthetic response map and the gradient energy map by using the trained deep convolutional neural network;

concatenating the extracted spatial pyramid features;

and training a classifier on the concatenated features, so that composition rules are learned automatically, to obtain the composition model.

The embodiment of the invention provides an automatic image cropping method. The method comprises: extracting an aesthetic response map and a gradient energy map of an image to be cropped; densely extracting candidate cropped images from the image to be cropped; screening the candidate cropped images based on the aesthetic response map; and estimating composition scores of the screened candidate cropped images based on the aesthetic response map and the gradient energy map, and determining the candidate cropped image with the highest score as the cropped image. The scheme uses the aesthetic response map to identify the regions of the picture that influence its aesthetic quality and to determine which parts to retain, so that the high aesthetic quality of the cropped image is preserved as far as possible; it also uses the gradient energy map to analyze the gradient distribution, and evaluates the composition score of the cropped image from both maps. The embodiment of the invention compensates for shortcomings in how image composition is represented and addresses the technical problem of how to improve the robustness and precision of automatic image cropping. The embodiment of the invention can be applied in many fields involving automatic image cropping, including image editing, photography and image retargeting.

Drawings

FIG. 1 is a flow chart of an automatic image cropping method according to an embodiment of the invention;

FIG. 2 is a schematic diagram of a deep convolutional neural network according to an embodiment of the present invention;

FIG. 3a is a schematic diagram of an image to be cropped according to an embodiment of the present invention;

FIG. 3b is a schematic diagram of a cropped image according to an embodiment of the present invention.

Detailed Description

The technical problems solved, the technical solutions adopted and the technical effects achieved by the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings and specific embodiments. The described embodiments are only some, not all, of the embodiments of the present application. All other equivalent or obviously modified embodiments that a person skilled in the art obtains from the embodiments in this application without inventive effort fall within the scope of protection of the invention. The embodiments of the invention can be implemented in many different ways as defined and covered by the claims.

Deep learning has developed rapidly and achieved good results in many fields. The embodiment of the invention uses deep learning to learn automatically which regions matter most for image cropping, so that the cropping rules are learned automatically and comprehensively and high-aesthetic regions are retained as far as possible during cropping.

Therefore, the embodiment of the invention provides an automatic image cropping method. FIG. 1 shows an exemplary flow of the automatic image cropping method. As shown in FIG. 1, the method may include:

S100: Extract an aesthetic response map and a gradient energy map of the image to be cropped.

Specifically, this step may include:

S101: Extract the aesthetic response map of the image to be cropped by using a deep convolutional neural network and a class response mapping method, according to the following formula:

$$M(x, y) = \sum_{k=1}^{K} w_k \, f_k(x, y)$$

wherein M(x, y) represents the aesthetic response value at spatial location (x, y); K represents the total number of channels of the feature map f of the last convolutional layer of the trained deep convolutional neural network; k indexes the k-th channel; f_k(x, y) represents the feature value of the k-th channel at spatial location (x, y); and w_k represents the weight connecting the pooled feature map of the k-th channel to the high-aesthetic category.
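The response map defined in S101 is a per-location weighted sum of the last-layer feature channels, which has the same form as class activation mapping. A minimal NumPy sketch, assuming the feature maps and the high-aesthetic class weights have already been read out of a trained network (the function name `aesthetic_response_map` and the array layout are illustrative and do not come from the patent):

```python
import numpy as np

def aesthetic_response_map(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Compute M(x, y) = sum_k w_k * f_k(x, y).

    features: (K, H, W) feature maps of the last convolutional layer (assumed layout).
    weights:  (K,) weights connecting each pooled channel to the high-aesthetic class.
    """
    if features.ndim != 3 or weights.shape != (features.shape[0],):
        raise ValueError("expected features of shape (K, H, W) and weights of shape (K,)")
    # Contract the channel axis; the result has shape (H, W).
    return np.tensordot(weights, features, axes=([0], [0]))

# Toy input: K = 2 channels on a 2 x 2 grid.
f = np.array([[[1.0, 2.0], [3.0, 4.0]],
              [[0.0, 1.0], [0.0, 1.0]]])
w = np.array([0.5, 2.0])
M = aesthetic_response_map(f, w)
```

The map comes out at feature-map resolution. The text does not say how it is brought to image resolution, so any resizing step would be an additional assumption.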

When the aesthetic response map is extracted through the above step, the deep convolutional neural network can be trained according to actual needs. The training may be performed as follows:

Step 1: Arrange a convolutional layer at the bottom layer of the deep convolutional neural network structure.

Step 2: After the last convolutional layer of the deep convolutional neural network structure, pool each feature map to a single value by global average pooling.

Step 3: Connect a fully connected layer, whose number of outputs equals the number of aesthetic quality classification categories, and a loss function.

Fig. 2 schematically shows a deep convolutional neural network structure.

Through steps 1 to 3, a deep convolutional neural network model can be trained for the aesthetic quality classification task. Then, using the network trained for this task together with the class response mapping method and the above formula, the aesthetic response map M of the image to be cropped for the high-aesthetic category can be calculated.
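The head described in steps 2 and 3 can be sketched in NumPy to show why the fully connected weights can be reused for the response map: with global average pooling, the class score equals the spatial mean of the weighted sum of channels. This sketch assumes two classes (low and high aesthetic quality, with index 1 as the high class) and a softmax cross-entropy loss; the text only says "loss function", so both choices are assumptions, as are all names below.

```python
import numpy as np

def gap_head_logits(features: np.ndarray, fc_weights: np.ndarray, fc_bias: np.ndarray) -> np.ndarray:
    """Global average pooling followed by one fully connected layer.

    features:   (K, H, W) output of the last convolutional layer.
    fc_weights: (num_classes, K); fc_bias: (num_classes,).
    """
    pooled = features.mean(axis=(1, 2))       # one value per channel, shape (K,)
    return fc_weights @ pooled + fc_bias      # shape (num_classes,)

def softmax_cross_entropy(logits: np.ndarray, label: int) -> float:
    """Assumed loss: numerically stable softmax cross-entropy for one sample."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[label])

rng = np.random.default_rng(0)
features = rng.random((4, 3, 3))              # K = 4 channels on a 3 x 3 grid
fc_weights = rng.standard_normal((2, 4))      # 2 classes: 0 = low, 1 = high (assumed)
fc_bias = np.zeros(2)

logits = gap_head_logits(features, fc_weights, fc_bias)
loss = softmax_cross_entropy(logits, label=1)

# Response map for the high-aesthetic class, built from the same weights.
high_class_map = np.tensordot(fc_weights[1], features, axes=([0], [0]))
```

With a zero bias, `logits[1]` equals `high_class_map.mean()`, so locations with large values in the map are the ones that push the image toward the high-aesthetic class.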

S102: Smooth the image to be cropped and calculate the gradient value of each pixel to obtain the gradient energy map.
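A sketch of S102 in NumPy. The text names neither the smoothing filter nor the gradient operator, so this example assumes a 3 x 3 box filter with edge padding, central differences via `numpy.gradient`, and the gradient magnitude as the energy; the function names are illustrative.

```python
import numpy as np

def box_smooth(gray: np.ndarray, radius: int = 1) -> np.ndarray:
    """Mean filter over a (2*radius+1)^2 neighbourhood with edge padding (assumed filter)."""
    padded = np.pad(gray, radius, mode="edge")
    size = 2 * radius + 1
    out = np.zeros_like(gray, dtype=float)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + gray.shape[0], dx:dx + gray.shape[1]]
    return out / (size * size)

def gradient_energy_map(gray: np.ndarray, radius: int = 1) -> np.ndarray:
    """Smooth the image, then return the per-pixel gradient magnitude."""
    smoothed = box_smooth(gray.astype(float), radius)
    gy, gx = np.gradient(smoothed)            # derivatives along rows, then columns
    return np.hypot(gx, gy)

flat = np.full((5, 5), 7.0)                   # constant image: no gradient anywhere
step = np.zeros((5, 6))
step[:, 3:] = 1.0                             # vertical step edge between columns 2 and 3
E_flat = gradient_energy_map(flat)
E_step = gradient_energy_map(step)
```

The flat image yields zero energy everywhere, while the step image concentrates its energy in the columns around the edge.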

S110: Densely extract candidate cropped images from the image to be cropped.

Here, candidate crop windows may be extracted densely by sliding windows of all sizes smaller than the image size, and the candidate cropped images are extracted through the candidate crop windows.
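Dense extraction can be sketched as a generator over window sizes and positions. The `(top, left, height, width)` tuple layout, the explicit list of sizes and the stride parameter are assumptions made for illustration; the text only requires sliding windows of all sizes smaller than the image.

```python
from typing import Iterable, Iterator, Tuple

Window = Tuple[int, int, int, int]  # (top, left, height, width), an assumed layout

def dense_windows(img_h: int, img_w: int,
                  sizes: Iterable[Tuple[int, int]], stride: int) -> Iterator[Window]:
    """Yield every window of the given sizes that fits inside the image."""
    if stride <= 0:
        raise ValueError("stride must be positive")
    for h, w in sizes:
        if h > img_h or w > img_w:
            continue  # window does not fit, skip this size
        for top in range(0, img_h - h + 1, stride):
            for left in range(0, img_w - w + 1, stride):
                yield (top, left, h, w)

# Toy example: a 6 x 8 image, two window sizes, stride 2.
windows = list(dense_windows(6, 8, sizes=[(4, 4), (6, 6)], stride=2))
```

Because the function is a generator, candidates can be scored as they are produced instead of being held in memory all at once.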

S120: Screen the candidate cropped images based on the aesthetic response map.

Specifically, this step may include:

S121: Calculate an aesthetic retention score of each candidate cropped image by the following formula:

$$S_a(C) = \frac{\sum_{(i,j) \in C} A_{(i,j)}}{\sum_{(i,j) \in I} A_{(i,j)}}$$

wherein S_a(C) represents the aesthetic retention score of the candidate cropped image; C represents the candidate cropped image; (i, j) represents a pixel position; I represents the original image; and A_(i,j) represents the aesthetic response value at position (i, j).

This step yields an aesthetic retention model. Screening the candidate crop windows with the aesthetic retention model gives the candidate windows with higher aesthetic retention scores.
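A sketch of an aesthetic retention score, taking it to be the fraction of the total aesthetic response in the original image I that lies inside the candidate crop C. Since many windows are scored against the same map, the example uses a summed-area table so that each window costs four lookups; that optimization and all names here are assumptions, not part of the patent text.

```python
import numpy as np

def integral_image(a: np.ndarray) -> np.ndarray:
    """Summed-area table with a leading row and column of zeros."""
    ii = np.zeros((a.shape[0] + 1, a.shape[1] + 1), dtype=float)
    ii[1:, 1:] = a.cumsum(axis=0).cumsum(axis=1)
    return ii

def retention_score(ii: np.ndarray, window) -> float:
    """Share of the total response inside window = (top, left, height, width)."""
    top, left, h, w = window
    inside = (ii[top + h, left + w] - ii[top, left + w]
              - ii[top + h, left] + ii[top, left])
    total = ii[-1, -1]
    return float(inside / total) if total > 0 else 0.0

# Toy aesthetic response map with its mass in the centre.
A = np.array([[0.0, 0.0, 0.0, 0.0],
              [0.0, 3.0, 1.0, 0.0],
              [0.0, 1.0, 3.0, 0.0],
              [0.0, 0.0, 0.0, 0.0]])
ii = integral_image(A)
score_center = retention_score(ii, (1, 1, 2, 2))   # covers all of the response
score_corner = retention_score(ii, (0, 0, 2, 2))   # covers only the value 3 at (1, 1)
```

The centre window keeps the whole response and the corner window keeps 3 of 8 units, so the centre window ranks higher in the screening step.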

S122: Sort all candidate cropped images in descending order of aesthetic retention score.

S123: Select the subset of candidate cropped images with the highest scores.

For example, in practical applications the candidate cropped images from the top 10000 candidate crop windows may be retained.
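Steps S122 and S123 amount to a top-N selection. A short sketch using the standard library, assuming each candidate is stored as a `(score, window)` pair; that pairing and the function name are illustrative.

```python
import heapq
from typing import List, Tuple

Window = Tuple[int, int, int, int]  # (top, left, height, width), an assumed layout

def top_candidates(scored: List[Tuple[float, Window]],
                   keep: int = 10000) -> List[Tuple[float, Window]]:
    """Return the `keep` highest-scoring (score, window) pairs, best first."""
    return heapq.nlargest(keep, scored, key=lambda pair: pair[0])

scored = [(0.40, (0, 0, 2, 2)),
          (0.95, (1, 1, 2, 2)),
          (0.10, (2, 2, 2, 2)),
          (0.70, (0, 1, 3, 3))]
kept = top_candidates(scored, keep=2)
```

`heapq.nlargest` avoids a full sort when only a small share of a large candidate set is kept; sorting the whole list in descending order and slicing gives the same result.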

S130: Estimate composition scores of the screened candidate cropped images based on the aesthetic response map and the gradient energy map, and determine the candidate cropped image with the highest score as the cropped image.

Specifically, this step may be implemented by steps S131 and S132.

S131: Establish a composition model based on the aesthetic response map and the gradient energy map.

When the composition model is established in this step, it can be trained according to the actual situation. For training, images with good composition can be used as positive samples and images with composition defects as negative samples.

The composition model may be trained as follows:

Step a: Establish a training image set based on the aesthetic response map and the gradient energy map.

Step b: Label the training images with aesthetic quality categories.

Step c: Train the deep convolutional neural network with the labeled training images.

For the training process in this step, refer to steps 1 to 3; the details are not repeated here.

Step d: For the labeled training images, extract spatial pyramid features of the aesthetic response map and the gradient energy map by using the trained deep convolutional neural network.

Step e: Concatenate the extracted spatial pyramid features.

Step f: Train a classifier on the concatenated features, so that composition rules are learned automatically, to obtain the composition model.

The classifier may be, for example, a support vector machine classifier.
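Steps d and e can be sketched as follows. The text does not give the pyramid levels or the pooling operator, so this example assumes levels of 1 x 1, 2 x 2 and 4 x 4 cells with average pooling over each cell, applied directly to the two maps; the classifier of step f is left out, and all names are illustrative.

```python
import numpy as np

def spatial_pyramid(feature_map: np.ndarray, levels=(1, 2, 4)) -> np.ndarray:
    """Average-pool a 2D map over an n x n grid for each pyramid level n."""
    h, w = feature_map.shape
    feats = []
    for n in levels:
        rows = np.linspace(0, h, n + 1).astype(int)   # cell boundaries along rows
        cols = np.linspace(0, w, n + 1).astype(int)   # cell boundaries along columns
        for r in range(n):
            for c in range(n):
                cell = feature_map[rows[r]:rows[r + 1], cols[c]:cols[c + 1]]
                feats.append(cell.mean() if cell.size else 0.0)
    return np.array(feats, dtype=float)

def composition_feature(aesthetic_crop: np.ndarray, gradient_crop: np.ndarray,
                        levels=(1, 2, 4)) -> np.ndarray:
    """Concatenate the pyramid features of both maps into one vector."""
    return np.concatenate([spatial_pyramid(aesthetic_crop, levels),
                           spatial_pyramid(gradient_crop, levels)])

aesthetic_crop = np.arange(16.0).reshape(4, 4)   # toy aesthetic response values
gradient_crop = np.ones((4, 4))                  # toy gradient energy values
feature = composition_feature(aesthetic_crop, gradient_crop)
```

Each map contributes 1 + 4 + 16 = 21 values, so the concatenated vector has 42 entries regardless of crop size, which is what lets a fixed-input classifier such as a support vector machine score crops of different sizes.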

S132: Estimate the composition scores of the screened candidate cropped images by using the composition model, and determine the candidate cropped image with the highest score as the cropped image.

FIG. 3a shows an exemplary image to be cropped; FIG. 3b shows an exemplary cropped image.

The invention is further illustrated below by a preferred embodiment.

Step A: Feed the image data set labeled with aesthetic quality categories into the deep convolutional neural network to train the aesthetic quality classification model.

Step B: Input the image data set labeled with composition categories into the trained deep convolutional neural network, extract the feature maps of the last convolutional layer, calculate the aesthetic response map, also calculate the gradient energy map, and then train the composition model with a support vector machine classifier.

Step C: Extract the aesthetic response map and the gradient energy map of the image to be tested.

The extraction in this step can follow the method used in the training stage.

Step D: Densely collect candidate crop windows from the image to be tested.

For example, on a 1000 x 1000 image to be tested, windows are collected with a sliding window at intervals of 30 pixels.
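To get a feel for how many candidates such a sweep produces, the following sketch counts windows on a 1000 x 1000 image with a 30-pixel stride. The text fixes only the image size and the stride; the assumption that window height and width also vary independently in 30-pixel steps is made here for illustration.

```python
def positions_per_axis(image_len: int, window_len: int, stride: int) -> int:
    """Number of window positions along one axis."""
    if window_len > image_len:
        return 0
    return (image_len - window_len) // stride + 1

def count_windows(image_h: int, image_w: int, stride: int, size_step: int) -> int:
    """Count windows when height and width each range over multiples of size_step
    that are smaller than the image (an assumed size schedule)."""
    per_h = sum(positions_per_axis(image_h, s, stride)
                for s in range(size_step, image_h, size_step))
    per_w = sum(positions_per_axis(image_w, s, stride)
                for s in range(size_step, image_w, size_step))
    return per_h * per_w   # height and width choices are independent

single_size = positions_per_axis(1000, 400, 30) ** 2   # one 400 x 400 window size
all_sizes = count_windows(1000, 1000, stride=30, size_step=30)
```

Under these assumptions a single 400 x 400 window size gives 441 positions and the full sweep gives 314721 windows, far more than the 10000 that the screening step keeps.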

Step E: Screen the candidate crop windows by using the aesthetic retention model.

In this step, the aesthetic retention model is used to calculate the aesthetic retention scores of the densely collected candidate crop windows, and the subset of candidate crop windows with the highest aesthetic retention scores is selected, for example 10000 candidate crop windows.

Step F: Evaluate the screened candidate crop windows by using the composition model.

In this step, the composition model trained in the training stage is used to evaluate the composition scores of the screened candidate crop windows, and the window with the highest score is taken as the final crop window, which gives the cropped image.
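The final selection in Step F reduces to an argmax over the screened windows. In the sketch below the composition model is represented by an arbitrary scoring callable; `toy_score`, which simply favours larger windows, is a hypothetical stand-in and not the trained support vector machine.

```python
from typing import Callable, Iterable, List, Tuple

Window = Tuple[int, int, int, int]  # (top, left, height, width), an assumed layout

def select_crop(windows: Iterable[Window],
                composition_score: Callable[[Window], float]) -> Window:
    """Return the window with the highest composition score."""
    best = max(windows, key=composition_score, default=None)
    if best is None:
        raise ValueError("no candidate windows to choose from")
    return best

def crop(image_rows: List[List[int]], window: Window) -> List[List[int]]:
    """Cut the window out of an image stored as a list of pixel rows."""
    top, left, h, w = window
    return [row[left:left + w] for row in image_rows[top:top + h]]

def toy_score(window: Window) -> float:
    """Hypothetical stand-in for the composition model: prefers larger windows."""
    return float(window[2] * window[3])

image = [[r * 4 + c for c in range(4)] for r in range(4)]   # 4 x 4 toy image
screened = [(0, 0, 2, 2), (1, 1, 3, 3), (0, 2, 2, 2)]
best_window = select_crop(screened, toy_score)
cropped = crop(image, best_window)
```

In an actual pipeline `composition_score` would compute the two maps for the window, build the concatenated pyramid feature, and return the classifier's score for the positive (good composition) class.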

In summary, the method provided by the embodiment of the present invention uses the aesthetic response map and the gradient energy map to retain the aesthetic quality of the image and respect its composition rules as far as possible, and thereby achieves more robust and more precise automatic image cropping, which in turn demonstrates the effectiveness of the aesthetic response map and the gradient energy map for automatic image cropping.

Although the method provided by the embodiment of the present invention is described in the foregoing order, those skilled in the art will understand that, to achieve the effect of the embodiment, the steps may also be performed in a different order, for example in parallel or in reverse order, and such simple changes all fall within the protection scope of the present invention.

The above description is only an embodiment of the present invention, and the protection scope of the present invention is not limited thereto. Any modification or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention falls within the protection scope of the present invention; the protection scope of the present invention is therefore defined by the claims.

Claims

1. An automatic image cropping method, characterized in that the method comprises:

estimating composition scores of the screened candidate cropped images based on the aesthetic response map and the gradient energy map, and determining the candidate cropped image with the highest score as the cropped image;

wherein extracting the aesthetic response map and the gradient energy map of the image to be cropped specifically comprises:

wherein M(x, y) represents the aesthetic response value at spatial location (x, y); K represents the total number of channels of the feature map of the last convolutional layer of the deep convolutional neural network; k indexes the k-th channel; f_k(x, y) represents the feature value of the k-th channel at spatial location (x, y); and w_k represents the weight connecting the pooled feature map of the k-th channel to the high-aesthetic category;

2. The method of claim 1, wherein the deep convolutional neural network is trained by:

3. The method of claim 1, wherein screening the candidate cropped images based on the aesthetic response map comprises:

and selecting the subset of candidate cropped images with the highest scores.

4. The method according to claim 1, wherein estimating the composition scores of the screened candidate cropped images based on the aesthetic response map and the gradient energy map, and determining the candidate cropped image with the highest score as the cropped image, comprises:

5. The method of claim 4, wherein the composition model is obtained by:

labeling the training images with aesthetic quality categories;

concatenating the extracted spatial pyramid features;