Automatic picture cropping method
Technical Field
The invention relates to the technical field of deep learning and computer vision, and in particular to an automatic picture cropping method.
Background
In the field of image printing, when the aspect ratio of a picture differs from the aspect ratio of the print to be produced, the picture must be preprocessed. The preprocessing principle is as follows: first, attempt to crop the picture to the printing size, then judge whether the cropped picture preserves the integrity of the original picture, such as the integrity of figures, landmark buildings, special characters, and the composition. If the integrity of the original picture can be preserved, the picture may be cropped; otherwise, blank processing is required, where "blanking" means that the original picture is not cropped and the vacant areas of the photographic paper are filled with white pixels during printing. Cropping is preferred for all pictures; a picture is left blank only if cropping would damage its integrity.
This process consumes a large amount of manpower when performed by manual cropping, and the cost of manual cropping grows with the number of pictures to be printed. Because picture cropping is a highly subjective task and fixed rules can hardly cover all influencing factors, an effective automatic picture cropping method can both greatly reduce manual operation and improve the cropping speed.
Existing automatic picture cropping methods mainly fall into three categories. The first is an automatic cropping method based on picture recognition (CN 104392202 A), which performs face recognition first, falls back to background recognition if the picture contains no face, and then locates the main part of the picture that needs to be preserved. The second method (CN 106650737 A) extracts an aesthetic response map and a gradient energy map of the image to be cropped, densely samples candidate cropped images, screens the candidates based on the aesthetic response map, estimates composition scores of the screened candidates based on the aesthetic response map and the gradient energy map, and selects the candidate with the highest score as the cropping result. The third is an automatic picture cropping method based on reinforcement learning (CN 108154464 A), which uses a reinforcement learning model to extract local features from the current cropping window, concatenates them with the global features of the picture to be cropped to obtain a new feature vector used as the current observation, combines the model's historical observations with the current observation as the current state representation, and sequentially applies cropping actions to the picture according to a cropping policy and the current state representation to obtain the cropping result.
These three methods are only suitable for their own application fields and are not suitable for the picture printing field. The contents of pictures to be printed are complex and varied: a picture containing a face may show only part of the face, or the head may carry hair ornaments, hats, and the like; characters, dates, human gestures, and landmark buildings may also appear in the picture, and such areas must not be cut off during printing, so the cropping position cannot be determined correctly by face detection or background recognition alone. The second and third methods place no constraint on the size of the final cropping frame, which can be adjusted arbitrarily according to the detection result, and therefore do not meet the cropping requirements of the image printing field.
Disclosure of Invention
The invention aims to solve the above problems and provides an automatic picture cropping method which crops pictures automatically, supports cropping results of any output size, automatically judges whether a picture should be cropped or left blank, and applies blank processing to pictures that cannot be cropped.
To achieve this purpose, the invention adopts the following technical scheme:
an automatic picture cropping method comprises the following specific steps:
calculating the picture ratio from the height and width of the picture to be processed;
setting the output size of the picture to be processed, and determining the output ratio from the output size;
determining and calculating the size of the cropping frame according to the picture ratio and the output ratio;
judging whether the picture ratio of the picture to be processed matches the required output ratio; if not, reducing or enlarging the picture to be processed, and otherwise performing no operation;
predicting the category of the picture to be processed with the trained picture classification model;
detecting the salient region of the picture to be processed with the trained saliency prediction model according to its category, and extracting a saliency feature map of the picture;
distinguishing the more important region from the less important region according to the saliency feature map, and obtaining the minimum circumscribed rectangle of the more important region;
judging whether the minimum circumscribed rectangle can be covered by the cropping frame, and performing cropping or blank processing according to the result; if cropping is performed, scanning the saliency feature map with the cropping frame to further fine-tune the position of the cropping frame in the picture;
and outputting the cropping result.
The specific method for judging whether the picture ratio of the picture to be processed matches the required output ratio is as follows:
setting a fluctuation range around the output ratio and judging whether the picture ratio falls within it; if the picture ratio is outside the fluctuation range, reducing or enlarging the picture according to its ratio and uniformly scaling the longer side to a fixed size; otherwise, performing no reducing or enlarging operation.
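The ratio check and scaling step above can be sketched as follows; the tolerance and fixed long-side size are illustrative assumptions, not values given by the invention, and a real system would use a proper resizing routine:

```python
import numpy as np

def normalize_scale(img, out_ratio, tolerance=0.05, long_side=1024):
    """Check whether the picture ratio (height/width) falls within the
    fluctuation range of the output ratio; if not, rescale so that the
    longer side has a fixed size. `tolerance` and `long_side` are
    illustrative values."""
    h, w = img.shape[:2]
    ratio = h / w
    if abs(ratio - out_ratio) <= tolerance:
        return img  # ratio already acceptable: no scaling needed
    # Scale the longer side to `long_side`, keeping the aspect ratio.
    scale = long_side / max(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    # Nearest-neighbour resize in pure NumPy (cv2.resize or PIL would
    # give better interpolation in practice).
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    return img[rows][:, cols]
```

A 2000x1000 picture (ratio 2.0) requested at a 3:2 output ratio would be scaled so its longer side becomes 1024 pixels.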
The specific method for distinguishing the more important region from the less important region according to the saliency feature map of the picture to be processed is as follows:
setting an image binarization threshold; the area whose saliency exceeds the threshold is the more important region, and the remaining area is the less important region.
The principle for judging whether the minimum circumscribed rectangle can be covered by the cropping frame and performing cropping or blank processing is as follows:
if the minimum circumscribed rectangle cannot be covered by the cropping frame, the picture to be processed is not cropped; instead, by comparing the picture ratio with the output ratio, white areas are filled on the left and right sides or on the top and bottom of the picture to reach the output size, which is the blank operation; if the minimum circumscribed rectangle can be covered by the cropping frame, the cropping operation is performed.
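The crop-or-blank decision and the white-padding step can be sketched as follows, assuming a grayscale image where 255 is white; the function and variable names are illustrative:

```python
import numpy as np

def crop_or_pad(img, bbox, crop_h, crop_w):
    """If the minimum circumscribed rectangle `bbox`
    (x_min, y_min, x_max, y_max) fits inside a crop box of size
    (crop_h, crop_w), cropping is possible; otherwise the picture is
    left uncropped and white pixels are padded on the left/right or
    top/bottom until the output ratio is reached (the blank operation)."""
    h, w = img.shape[:2]
    x0, y0, x1, y1 = bbox
    if (x1 - x0 + 1) <= crop_w and (y1 - y0 + 1) <= crop_h:
        return "crop", img  # the important region can be fully covered
    target = crop_h / crop_w
    if h / w < target:  # picture relatively too wide: pad top and bottom
        pad = int(round(w * target)) - h
        img = np.pad(img, ((pad // 2, pad - pad // 2), (0, 0)),
                     constant_values=255)
    else:               # picture relatively too tall: pad left and right
        pad = int(round(h / target)) - w
        img = np.pad(img, ((0, 0), (pad // 2, pad - pad // 2)),
                     constant_values=255)
    return "blank", img
```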
When the cropping operation is performed, the method for further fine-tuning the position of the cropping frame in the picture to be cropped is as follows:
scanning the saliency feature map of the picture to be processed with the cropping frame;
letting the cropping frame scan every position of the saliency feature map at which the more important region is completely contained in the frame;
calculating the saliency Score of the cropping frame at each position using formula (1):
Score = Σ_{j=h1..h2} Σ_{i=w1..w2} S(i, j)    (1)
where, under the condition that the cropping frame contains the more important region, w1 and w2 denote the horizontal variable range of the cropping frame; h1 and h2 denote the vertical variable range of the cropping frame; i and j denote the horizontal and vertical coordinates in the picture to be processed; and S(i, j) is the value of the saliency feature map at (i, j);
and determining the position with the highest Score as the final position of the cropping frame.
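A brute-force sketch of this scan, under the reading of formula (1) as the sum of saliency values covered by the cropping frame (the clamping of the scan range to positions containing the important region is an assumption consistent with the step above):

```python
import numpy as np

def best_crop_position(saliency, bbox, crop_h, crop_w):
    """Slide the crop box over every admissible position, compute the
    Score of formula (1) (sum of covered saliency values), and return
    the top-left corner (x, y) of the highest-scoring position. Only
    positions fully containing the important-region rectangle `bbox`
    (x_min, y_min, x_max, y_max) are considered; returns None if no
    such position exists."""
    H, W = saliency.shape
    x0, y0, x1, y1 = bbox
    # w1..w2 / h1..h2: horizontal / vertical range over which the box
    # may move while still containing the important region.
    w1, w2 = max(0, x1 - crop_w + 1), min(x0, W - crop_w)
    h1, h2 = max(0, y1 - crop_h + 1), min(y0, H - crop_h)
    best, best_pos = -1.0, None
    for y in range(h1, h2 + 1):
        for x in range(w1, w2 + 1):
            score = saliency[y:y + crop_h, x:x + crop_w].sum()
            if score > best:
                best, best_pos = score, (x, y)
    return best_pos
```

An integral image would reduce each window sum to four lookups if the scan is too slow.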
The invention has the beneficial effects that:
the method crops pictures without manual intervention and is suitable for outputting cropping results of any size;
it automatically judges whether a picture should be cropped or left blank, and applies blank processing to pictures that cannot be cropped;
different cropping strategies are adopted for picture contents of different styles, meeting diverse picture cropping requirements;
a deep-learning-based picture classification model is effectively combined with a saliency detection model, solving the problem that rich picture content makes it difficult to formulate a unified cropping rule.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a modeling flow diagram of the present invention;
FIG. 3 is a schematic diagram of cropping of a picture to be processed;
FIG. 4 is a schematic illustration of a blank of a picture to be processed;
FIG. 5 shows an original picture (I) in the embodiment;
FIG. 6 is the saliency feature map of the original picture (I);
FIG. 7 shows an original picture (II) in the embodiment;
FIG. 8 is the saliency feature map of the original picture (II);
FIG. 9 shows an original picture (III) in the embodiment;
FIG. 10 is the saliency feature map of the original picture (III).
Detailed Description
The invention is further described with reference to the following figures and examples.
Image classification refers to distinguishing targets of different categories according to the different characteristics reflected in image information. Saliency detection refers to extracting the salient region of an image by algorithmically simulating human visual characteristics: when facing a scene, a human automatically processes the regions of interest and selectively ignores the regions of no interest, and the regions of interest are called salient regions. Using a supervised learning method, training data sets are collected and labeled, a network structure is designed to learn model parameters from the prepared training data, and the model then predicts a result when new data arrives.
As shown in fig. 2, the specific method for constructing the image classification model includes:
S1, selecting picture data, manually screening and classifying it to construct a data set, and manually determining the number of categories and the picture types contained in each category;
S2, constructing the picture classification model with the VGG16 model as the basic network classification framework; the VGG16 network is a classical classification network well known in the art;
S3, training the constructed picture classification model with the constructed data set, and calculating the probability of each category with the softmax function, which can be expressed as:
p(i) = exp(z_i) / Σ_{k=1..K} exp(z_k)
where z_i denotes the output of the i-th neuron of the last layer of the picture classification model, K is the number of prediction categories, p(i) denotes the probability of predicting the picture to be processed as the i-th category, and z_k denotes the output of the k-th neuron of the last layer of the picture classification model;
S4, stopping training and outputting the picture classification model when the softmax loss of the picture classification model falls below a set threshold.
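The softmax function above can be computed directly; subtracting the maximum output before exponentiating is a standard numerical-stability trick not mentioned in the text:

```python
import numpy as np

def softmax(z):
    """Convert last-layer outputs z_1..z_K of the classification model
    into category probabilities p(i) = exp(z_i) / sum_k exp(z_k)."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()
```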
The specific method for constructing the saliency prediction model is as follows:
S1, selecting picture data, manually screening and classifying it to construct a data set, and manually defining the salient region of each picture according to the category to which it belongs;
S2, manually labeling the pictures at the pixel level according to their categories;
S3, constructing the saliency prediction model with the VGG16 model as the basic network framework; the VGG16 network is a classical classification network well known in the art;
S4, training the designed saliency detection network with the constructed data set, and determining the salient region of a picture with the softmax function, which can be expressed as:
p(i) = exp(z_i) / Σ_{k=1..K} exp(z_k)
where z_i denotes the output of the neuron at the i-th position of the last layer of the saliency prediction model, K is the number of image pixels, and p(i) denotes the probability that the pixel at the i-th position of the picture to be processed is predicted as a salient region;
S5, stopping training and outputting the saliency prediction model when its softmax loss falls below a set threshold.
As shown in fig. 1, an automatic picture cropping method specifically comprises the steps of:
calculating the picture ratio from the height and width of the picture to be processed;
setting the output size of the picture to be processed, and determining the output ratio from the output size;
determining and calculating the size of the cropping frame according to the picture ratio and the output ratio;
judging whether the picture ratio of the picture to be processed matches the required output ratio: setting a fluctuation range around the output ratio and judging whether the picture ratio falls within it; if not, reducing or enlarging the picture according to its ratio and uniformly scaling the longer side to a fixed size; otherwise, performing no reducing or enlarging operation;
predicting the category of the picture to be processed with the trained picture classification model;
detecting the salient region of the picture to be processed with the trained saliency prediction model according to its category, and extracting the saliency feature map of the picture, as shown in figs. 5 to 10;
distinguishing the more important region from the less important region according to the saliency feature map: setting an image binarization threshold by the image binarization method, the area above the threshold being the more important region and the remaining area the less important region, and obtaining the minimum circumscribed rectangle of the more important region; the minimum circumscribed rectangle is the rectangle bounded by the maximum abscissa, minimum abscissa, maximum ordinate, and minimum ordinate over the vertices of a given two-dimensional shape, and setting a binarization threshold by the image binarization method is prior art;
judging whether the minimum circumscribed rectangle can be covered by the cropping frame, and performing cropping or blank processing according to the result; if cropping is performed, scanning the saliency feature map with the cropping frame to further fine-tune the position of the cropping frame in the picture; if the minimum circumscribed rectangle cannot be covered by the cropping frame, the picture is not cropped, and by comparing the picture ratio with the output ratio, white areas are filled on the left and right sides or on the top and bottom of the picture to reach the output size, which is the blank operation; if the minimum circumscribed rectangle can be covered by the cropping frame, the cropping operation is performed, as shown in figs. 3 and 4;
and outputting the cropping result.
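The crop-frame size in the steps above can be derived from the two ratios. One plausible reading, assumed here rather than stated in the text, is the largest rectangle of the output ratio that fits inside the picture:

```python
def crop_box_size(img_h, img_w, out_h, out_w):
    """Derive the crop-box size (box_h, box_w) from the picture ratio
    and the output ratio: the largest rectangle with the output ratio
    that fits inside the picture. This interpretation is an assumption."""
    img_ratio = img_h / img_w
    out_ratio = out_h / out_w
    if img_ratio >= out_ratio:
        # picture relatively taller: the box spans the full width
        return round(img_w * out_ratio), img_w
    # picture relatively wider: the box spans the full height
    return img_h, round(img_h / out_ratio)
```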
When the cropping operation is performed, the method for further fine-tuning the position of the cropping frame in the picture to be cropped is as follows:
scanning the saliency feature map of the picture to be processed with the cropping frame;
letting the cropping frame scan every position of the saliency feature map at which the more important region is completely contained in the frame;
calculating the saliency Score of the cropping frame at each position using formula (1):
Score = Σ_{j=h1..h2} Σ_{i=w1..w2} S(i, j)    (1)
where, under the condition that the cropping frame contains the more important region, w1 and w2 denote the horizontal variable range of the cropping frame; h1 and h2 denote the vertical variable range of the cropping frame; i and j denote the horizontal and vertical coordinates in the picture to be processed; and S(i, j) is the value of the saliency feature map at (i, j);
and determining the position with the highest Score as the final position of the cropping frame.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, they are not intended to limit the scope of the present invention, and those skilled in the art should understand that various modifications and variations can be made on the basis of the technical solution of the invention without inventive effort.