CN112839167A - Image processing method, image processing device, electronic equipment and computer readable medium

Image processing method, image processing device, electronic equipment and computer readable medium

Info

Publication number
CN112839167A
CN112839167A
Authority
CN
China
Prior art keywords
image
sub
images
aesthetic
score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011613934.4A
Other languages
Chinese (zh)
Other versions
CN112839167B (en)
Inventor
尹康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oppo Chongqing Intelligent Technology Co Ltd
Original Assignee
Oppo Chongqing Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo Chongqing Intelligent Technology Co Ltd filed Critical Oppo Chongqing Intelligent Technology Co Ltd
Priority to CN202011613934.4A priority Critical patent/CN112839167B/en
Publication of CN112839167A publication Critical patent/CN112839167A/en
Application granted granted Critical
Publication of CN112839167B publication Critical patent/CN112839167B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/64 Computer-aided capture of images, e.g. transfer from script file into camera, check of taken image quality, advice or proposal for image composition or decision on when to take image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007 Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/70 Denoising; Smoothing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/61 Control of cameras or camera modules based on recognised objects
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/61 Control of cameras or camera modules based on recognised objects
    • H04N23/611 Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/62 Control of parameters via user interfaces
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/63 Control of cameras or camera modules by using electronic viewfinders
    • H04N23/631 Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/63 Control of cameras or camera modules by using electronic viewfinders
    • H04N23/631 Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters
    • H04N23/632 Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters for displaying or modifying preview images prior to image capturing, e.g. variety of image resolutions or capturing parameters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/80 Camera processing pipelines; Components thereof
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20021 Dividing image into blocks, subimages or windows
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Library & Information Science (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image processing method, an image processing device, electronic equipment and a computer readable medium, which relate to the technical field of images. The method comprises the following steps: acquiring an image to be processed; cutting the image to be processed according to a preset proportion based on the size parameter of the image to be processed to obtain a plurality of sub-images; determining an aesthetic score for each of the sub-images through a pre-trained evaluation model; and determining a target sub-image from the plurality of sub-images based on the aesthetic score of each sub-image. The cropping approach of the application yields more candidate images, which are not limited to images that necessarily include the target subject. Because the aesthetic scores evaluate the aesthetic quality of the sub-images, searching for the target image across the aesthetic scores of the sub-images makes the set of reference sub-images richer, and determining the target sub-image with an evaluation model is more reasonable than mechanically evaluating the aesthetics of an image with only a few rules.

Description

Image processing method, image processing device, electronic equipment and computer readable medium
Technical Field
The present application relates to the field of image technologies, and in particular, to an image processing method and apparatus, an electronic device, and a computer-readable medium.
Background
The commonly used intelligent picture cropping method is divided into two stages: the first step is to detect the position of a subject (such as a main person or a main building) in the picture, and the second step is to select, around the detected subject, a sub-region that accords with a shooting rule (such as the "rule of thirds") and crop it, so as to select a picture with rich aesthetic feeling. However, the human perception system is quite complex, and a few simple preset rules cannot reflect the aesthetic feeling of the user, so the picture selected by this method is not accurate enough.
Disclosure of Invention
The application provides an image processing method, an image processing device, an electronic device and a computer readable medium, so as to overcome the above-mentioned drawbacks.
In a first aspect, an embodiment of the present application provides an image processing method, including: acquiring an image to be processed; cutting the image to be processed according to a preset proportion based on the size parameter of the image to be processed to obtain a plurality of sub-images; determining an aesthetic score for each of the sub-images through a pre-trained evaluation model; determining a target sub-image from the plurality of sub-images based on the aesthetic score of each sub-image.
In a second aspect, an embodiment of the present application further provides an image processing apparatus, including: the device comprises an acquisition unit, a determination unit, an evaluation unit and a processing unit. And the acquisition unit is used for acquiring the image to be processed. And the determining unit is used for cutting the image to be processed according to a preset proportion to obtain a plurality of sub-images based on the size parameters of the image to be processed. An evaluation unit for determining the aesthetic score of each of the sub-images by a pre-trained evaluation model. A processing unit for determining a target sub-image from a plurality of said sub-images based on the aesthetic score of each of said sub-images.
In a third aspect, an embodiment of the present application further provides an electronic device, including: one or more processors; a memory; one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the above-described method.
In a fourth aspect, the present application also provides a computer-readable storage medium, where a program code executable by a processor is stored, and when executed by the processor, the program code causes the processor to execute the above method.
According to the image processing method, the image processing device, the electronic equipment and the computer readable medium, the image is first cut, that is, a plurality of sub-images are obtained by cutting the image to be processed according to a preset proportion based on its size parameter. Compared with extracting a target subject from the image and then cutting the image according to the position of the subject, this cutting mode yields more images, which are not limited to images that include the target subject. Then, the aesthetic score of each sub-image is determined through a pre-trained evaluation model, and a target sub-image is determined from the plurality of sub-images based on the aesthetic score of each sub-image. Since the aesthetic scores evaluate the aesthetic feeling of the sub-images, searching for the target sub-image across the aesthetic scores of the sub-images makes the set of reference sub-images richer, and determining the target sub-image with an evaluation model is more reasonable than mechanically evaluating the aesthetics of an image with only a few rules.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic diagram illustrating an application scenario provided by an embodiment of the present application;
FIG. 2 is a flow chart of a method of image processing according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating an image preview interface provided by an embodiment of the application;
FIG. 4 is a schematic diagram illustrating image segmentation provided by embodiments of the present application;
FIG. 5 is a flow chart of a method of image processing according to another embodiment of the present application;
FIG. 6 is a schematic diagram illustrating a model training process provided by an embodiment of the present application;
FIG. 7 is a schematic diagram illustrating candidate regions provided by an embodiment of the present application;
FIG. 8 is a schematic diagram illustrating coordinates of candidate regions provided by an embodiment of the present application;
fig. 9 shows a block diagram of an image processing apparatus according to an embodiment of the present application;
fig. 10 shows a block diagram of an electronic device provided in an embodiment of the present application;
fig. 11 illustrates a storage unit for storing or carrying program codes for implementing an image processing method according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
With the increasingly powerful image capturing capabilities of portable mobile devices such as mobile phones, casually recording everyday moments by taking pictures has become a daily habit for ordinary people. However, due to a lack of professional photographic knowledge and skill, pictures taken by ordinary people often have defects in color, composition and other elements, and therefore much time is usually spent on post-processing to meet the requirements of social sharing. In this context, intelligent picture-cropping algorithms have begun to attract attention in the industry; their task can be summarized as follows: given an original picture as input, output the optimal sub-region of the original picture according to a certain measurement rule, and cut out the other parts.
The commonly used intelligent picture cropping method is divided into two stages: the first step is to detect the position of a subject (such as a main person or a main building) in the picture, and the second step is to select, around the detected subject, a sub-region that accords with a shooting rule (such as the "rule of thirds") and crop it.
Specifically, a target subject in the image is identified and classified, and specifically, the target subject may be obtained by using a target detection algorithm or a target extraction algorithm. Specifically, all contour line information in the image acquired by the image acquisition device is extracted through a target extraction or clustering algorithm, and then the category of the object corresponding to each contour line is found in a pre-learned model, wherein the learning model uses a matching database, and a plurality of contour line information and the category corresponding to each contour line information are stored in the matching database, wherein the categories include human bodies, animals, mountains, rivers, lake surfaces, buildings, roads and the like.
For example, when the target subject is an animal, the contour of the target subject and characteristic information, such as the ears, horns and limbs, may be collected. When the target subject is a human body, face feature extraction may be performed on the target subject, where the method of extracting face features may include a knowledge-based characterization algorithm, or a characterization method based on algebraic features or statistical learning. In addition, when the target subject is a wide landscape such as a lake, continuous mountains or grassland, it is possible to determine whether the target subject contains a long horizontal line, that is, a horizon; if it does, the target subject is judged to be a wide landscape. Of course, whether the target subject is a landscape may also be determined by color; for example, when green or khaki is detected in a relatively concentrated area, the target subject is judged to be a landscape or a desert. Similarly, other objects such as rivers, buildings and roads can also be detected by the above detection algorithms, which will not be described again here.
After the target subject is determined in the image acquired by the image acquisition device, the composition mode is determined according to the target subject, specifically, the category of the target subject may be determined, and then the composition mode is determined according to the category of the target subject. Specifically, a corresponding relationship between the categories and the composition modes is preset, and the corresponding relationship includes a plurality of categories and the composition modes corresponding to each category.
For example, a rule-of-thirds composition is adopted for a wide landscape, a symmetrical composition for a building, a diagonal composition for a dynamic object (a person or an animal), and an S-shaped composition for a path or a river. Multiple other composition methods, such as a vertical composition and an X-shaped composition, may also be included. Specifically, once the composition methods corresponding to different types of target subjects are predetermined, the composition auxiliary pattern corresponding to the composition method and its auxiliary display position within the image preview interface are determined, so that a reasonable composition for guiding the user can be obtained.
However, the inventor found in research that the above method has two major problems: first, the most aesthetically pleasing sub-region in a picture does not necessarily include the so-called main subject; second, shooting rules are objective standards summarized by photographers, whereas the so-called "aesthetic feeling" is a subjective perception, and there is no strict correspondence between the two. Therefore, with the above method the extracted, supposedly aesthetic image is merely cropped according to predetermined rules, and the result is too stylized and mechanized to really meet the aesthetic requirements of the user.
Therefore, in order to overcome the above drawbacks, embodiments of the present application provide an image processing method, an apparatus, an electronic device and a computer-readable medium, in which a target sub-image is determined from a plurality of sub-images based on the aesthetic score of each sub-image, so that the set of reference sub-images is richer, and determining the target sub-image with an evaluation model is more reasonable than mechanically evaluating the aesthetics of an image with only a few rules.
To facilitate understanding of the embodiments of the present application, an application scenario that may be used in the embodiments of the present application is described first, as shown in fig. 1, fig. 1 illustrates an application scenario provided in the embodiments of the present application, where a server 10 and a user terminal 20 are located in a wireless network or a wired network, and data interaction between the server 10 and the user terminal 20 is enabled.
In some embodiments, the user logs in through an account at the user terminal 20, and all information corresponding to the account can be stored in the storage space of the server 10. The server 10 may be an individual server, or a server cluster, or a local server, or a cloud server.
Specifically, the user terminal 20 may be a terminal used by a user, through which the user views an image, and may be a device for capturing an image by the user, and in some embodiments, an image capturing device is disposed in the user terminal. The server 10 may store pictures in the user terminal 20, and in some embodiments, the server 10 may be configured to train the model or algorithm involved in the embodiment of the present application, and in addition, the server 10 may also migrate the trained model or algorithm to the user terminal, and of course, the user terminal 20 may also directly train the model or algorithm involved in the embodiment of the present application. Specifically, in the embodiments of the present application, the execution subject of each method step in the embodiments of the present application is not limited.
Referring to fig. 2, fig. 2 shows an image processing method provided in the embodiment of the present application, where an execution subject of the method may be the server or the user terminal, and specifically, the method includes: s201 to S204.
S201: and acquiring an image to be processed.
As an embodiment, the image to be processed may be an image captured by the user terminal within a camera application. Specifically, the user uses a camera application in the user terminal to capture an image, and once shooting is completed the photographed image is taken as the image to be processed, so that a reasonable and aesthetically pleasing image can be automatically cropped for the user from the image he has just taken.
As another embodiment, the image to be processed may also be a preview image of a camera application of the user terminal, that is, an image captured by a camera of the user terminal for display in a preview interface of the camera application. As shown in fig. 3, fig. 3 is a schematic diagram of an image preview interface provided in an embodiment of the present application. Specifically, the image preview interface corresponds to a camera application installed in a mobile terminal, which is used for starting a camera (e.g. a front or rear camera) of the mobile terminal and controlling the started camera to take a picture; when the user starts the camera application, the image preview interface is displayed on the screen of the mobile terminal. As shown in fig. 3, the image preview interface is used to display a preview image captured by a camera, which may be an image captured by a first camera or a second camera, and within the interface of the camera application the user can adjust camera parameters, including the focal length of the camera, filters, ISO values, EV values and other parameters.
In addition, the user can use the image preview interface as a view frame of the camera, and can place the object to be photographed in the view frame, that is, can observe the object to be photographed in the preview image displayed on the image preview interface. Then, the user presses a shooting button or inputs a shooting instruction in other modes, and the camera acquires a picture of the object to be shot.
S202: and cutting the image to be processed according to a preset proportion based on the size parameter of the image to be processed to obtain a plurality of sub-images.
The size parameter of the image to be processed may be the image size of the image to be processed; for example, if the resolution of the image to be processed is W × H, the size parameters are the width W and the height H. As an implementation, the image to be processed is cut according to a preset proportion based on its size parameter, where the preset proportion may be set according to actual requirements. In some embodiments, the number X of sub-images to be acquired may be predetermined, and the image to be processed is then cut according to the number X; assuming X is 2, the sizes of the two obtained images are (W/2 × H) each.
As an embodiment, all the sub-images together cover the whole area of the image to be processed. Specifically, each sub-image corresponds to a sub-image region in the image to be processed, and all the sub-image regions together constitute the entire region of the image to be processed. For example, as shown in fig. 4, four broken lines segment the image to be processed into 8 pictures. The sub-regions in the image to be processed corresponding to these 8 images together form the total image area of the image to be processed, so that the image data of any pixel unit of the image to be processed is located in at least one sub-image, and the plurality of sub-images as a whole can cover the image to be processed. In some embodiments, the sub-images may be equally divided regions of the image to be processed, i.e. the sub-images are all the same size. For example, if the size of the image to be processed is 1280 × 720 pixels, the number of candidate regions (i.e. the number of sub-images) may be constrained to a fixed value M, with each candidate region corresponding to a fixed sub-region of the image to be processed (for example, with M fixed at 2, the 1st candidate region corresponds to the leftmost 720 × 720 region of the image to be processed and the 2nd candidate region corresponds to the rightmost 720 × 720 region).
As another embodiment, the to-be-processed image may be further cut by using a default box generation rule in the target detection framework SSD to obtain a plurality of sub-images.
In addition, the above-mentioned proportion may give equal-proportion cutting or non-equal-proportion cutting; for example, dense sampling may be used to approximately traverse the original image so as to obtain the plurality of sub-images of the image to be processed.
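For illustration only (this sketch is not part of the original disclosure), the following Python code shows one possible way to generate dense-sampling candidate regions and the corresponding sub-images; the window scales, aspect ratios, stride and function names are hypothetical:

```python
import numpy as np

def generate_candidate_regions(width, height, scales=(0.5, 0.75),
                               aspect_ratios=(1.0, 4 / 3), stride_frac=0.25):
    """Densely sample (x, y, w, h) crop windows over a width x height image.

    The scales, aspect ratios and stride fraction are illustrative values only.
    """
    regions = []
    for s in scales:
        for ar in aspect_ratios:
            h = int(height * s)
            w = int(min(width, h * ar))
            step_x = max(1, int(w * stride_frac))
            step_y = max(1, int(h * stride_frac))
            for y in range(0, height - h + 1, step_y):
                for x in range(0, width - w + 1, step_x):
                    regions.append((x, y, w, h))
    return regions

def crop_sub_images(image, regions):
    """image: H x W x C array; returns the sub-image Ri for each candidate region."""
    return [image[y:y + h, x:x + w] for (x, y, w, h) in regions]
```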
In addition, a sub-image may be an image in an already formed image format, or may be sub-image data obtained by sampling the image data corresponding to the image to be processed. For example, the image data of the image to be processed is a matrix L, where each element of L is the pixel value of one pixel of the image to be processed; each sub-image may be denoted Ri, where i takes values from 0 to M and M is the number of sub-images. The Ri of each sub-image represents a sub-matrix of the matrix L formed by a subset of the pixels in L.
As an embodiment, a sub-image does not necessarily include the target subject, or a complete image of the target subject. The target subject may be an object in the image to be processed, such as a person, a building or an animal, and the image of the target subject may be distributed over different sub-images. That is, the image cropping manner provided by the embodiment of the present application does not take the target subject as its focus, so a cropped sub-image does not necessarily include the target subject, or the complete target subject, unlike an image obtained by cropping around the target subject, which necessarily includes the target subject.
S203: determining an aesthetic score for each of the sub-images by a pre-trained evaluation model.
The aesthetic score corresponds to a user's aesthetic perception of the image, which may include, in particular, the perception of composition, hue, light, and the like. As an embodiment, the evaluation model may be trained in advance, so that the trained model can determine the score, i.e., the aesthetic score, of each sub-image based on the perception of the user. Specifically, the evaluation model may be trained based on a sample image, where the sample image is a sample labeled by a user, that is, the user labels the image based on his own perception, so that the trained evaluation model can obtain the aesthetic score of the sub-image based on the perception of the user.
In one embodiment, the evaluation model may be a Neural Network (NN) model or a Convolutional Neural Network (CNN) model based on deep learning. Specifically, no limitation is made herein.
S204: determining a target sub-image from the plurality of sub-images based on the aesthetic score of each sub-image.
As an embodiment, the higher the aesthetic score of a sub-image, the more aesthetically pleasing the sub-image is to the user. For example, assume the user's aesthetic perception corresponds to an aesthetic coefficient: the higher the aesthetic coefficient of an object, the more the user appreciates that object. The aesthetic coefficient is related to the user's aesthetic view or sense of value for the object, and each person's aesthetic perception is different; therefore, the evaluation model may correspond to a specific user, i.e. the evaluation models corresponding to different users may be different. As an embodiment, the higher the aesthetic score, the higher the aesthetic coefficient, and the lower the aesthetic score, the lower the aesthetic coefficient. In some embodiments of the present application, determining the target sub-image from the plurality of sub-images based on the aesthetic score of each sub-image may mean taking the sub-image with the highest aesthetic score among the plurality of sub-images as the target sub-image. In other embodiments, it may mean sorting all sub-images by aesthetic score from high to low to obtain a sequence, and taking a first number of sub-images ranked at the top of the sequence as target sub-images.
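As an illustration of the two selection strategies above, a minimal sketch (hypothetical helper name) could look as follows:

```python
import numpy as np

def select_target_sub_images(sub_images, aesthetic_scores, top_k=1):
    """Return the sub-image(s) with the highest aesthetic scores.

    top_k=1 gives the single best sub-image; a larger top_k returns the first
    top_k sub-images of the score-sorted sequence.
    """
    order = np.argsort(aesthetic_scores)[::-1]  # indices sorted by score, high to low
    return [sub_images[i] for i in order[:top_k]]
```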
Referring to fig. 5, fig. 5 shows an image processing method provided in the embodiment of the present application, where an execution main body of the method may be the server or the user terminal, and specifically, the method includes: s501 to S507.
S501: obtaining a first sample image set, wherein each first sample image in the first sample image set corresponds to a first scoring value, and the first scoring value is used for representing the aesthetic feeling of a user.
In one embodiment, the first sample image set may be a sample image labeled by a user for each first sample image in advance, specifically, the user sets a first score value for each first sample image in advance, and then a plurality of first sample images labeled with the first score values are used as the first sample image set. The aesthetic feeling of different users is different, so that the first sample image sets corresponding to different users are different, and therefore the first sample image sets correspond to the identity information of the users, and the identity information of the users can be the terminal identification of the user terminal used by the users, and can also be biometric information such as fingerprints or facial features of the users. The terminal identifier of the user terminal may be a processor identifier of the user terminal, a MAC address, or other unique identifier that can be used to characterize the identity of the user terminal.
As an embodiment, since the purpose of the evaluation model is to assign an aesthetic score to sub-images, the first sample image set may also consist of a plurality of sub-images obtained after cutting at least one image. Because the number of sub-images obtained by cutting one image may be very large, having the user label every sub-image would increase the user's workload. Therefore, when determining the first sample image set corresponding to a target user, it may first be determined whether a reference user of the target user exists that already corresponds to a first sample image set, and if so, the first sample image set of the reference user is taken as the first sample image set corresponding to the target user.
Specifically, the target user and the reference user satisfy a preset condition. In some embodiments, the preset condition may be that the intimacy degree between the target user and the reference user is greater than a specified value, and in other embodiments, the preset condition may be that the user relationship between the target user and the reference user is a specified relationship, where the specified relationship may be a couple relationship, a lover relationship, or a close friend relationship, and the close friend relationship also indicates that the intimacy degree is greater than the specified value.
The server records the friend relationship of the user, and the friend relationship of the user comprises the user identification of each user belonging to the friend relationship with the user. The intimacy degree between the user and the friend can reflect the interaction frequency between the user and the friend and the intimacy of the friend relationship, which is equivalent to the classification of the social relationship of the user.
As one implementation mode, the degree of closeness between the user and each friend can be determined through social information of the user. The social information of the user comprises interaction information of the user and each friend, the interaction information comprises forwarding times, collection times, comment times and grouping information, specifically, the forwarding times, the collection times and the comment times can be times of forwarding, collecting and commenting contents of the friend issued by the user, and the grouping information can be keywords of a plurality of groups established by the user and user identifications in the groups.
As an implementation manner, parameters may be set for the forwarding times, the collection times, the comment times, and the grouping information, that is, the parameters respectively include a first parameter, a second parameter, a third parameter, and a fourth parameter, where the first parameter corresponds to the forwarding times, the second parameter corresponds to the collection times, the third parameter corresponds to the comment times, and the fourth parameter corresponds to the grouping information. For ease of calculation, the first parameter, the second parameter, the third parameter, and the fourth parameter may all be normalized to a [0,1] range of values.
Specifically, the number of times the user has forwarded content of each friend is obtained, that is, the forwarding count corresponding to each friend is determined; the forwarding counts of all friends are then added to obtain a total forwarding count, and the forwarding count of each friend is divided by the total forwarding count, the resulting value being used as the first parameter of that friend. The second parameter and the third parameter can be obtained similarly.
The fourth parameter may be obtained as follows: determine the group in which the user has placed each friend, determine the keyword of each group, determine the category corresponding to the keyword of the group, and determine a first value corresponding to the keyword of the group according to preset scores for different categories, where different categories have different scores. For some categories, for example close friends, the set score is higher; friends who have not been grouped belong to a default group, whose score is lower. The first value can therefore represent the group in which the friend is placed and whether that group corresponds to friends with a higher intimacy with the user. The first value is normalized and used as the fourth parameter.
Then, the degree of closeness between the user and each friend is obtained according to the first parameter, the second parameter, the third parameter, and the fourth parameter, and as an implementation manner, the first parameter, the second parameter, the third parameter, and the fourth parameter may be summed, and the summed result is used as the degree of closeness.
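A minimal sketch of this intimacy computation, assuming the interaction records are kept as simple dictionaries keyed by friend identifier (the data layout and the normalization of the group value are assumptions):

```python
def intimacy_scores(forwards, favorites, comments, group_values):
    """Each argument maps friend_id -> raw count (or, for group_values, the
    pre-assigned first value of the friend's group). Returns friend_id -> intimacy."""
    def normalize(counts):
        total = sum(counts.values())
        return {k: (v / total if total else 0.0) for k, v in counts.items()}

    p1, p2, p3 = normalize(forwards), normalize(favorites), normalize(comments)
    max_g = max(group_values.values()) or 1
    p4 = {k: v / max_g for k, v in group_values.items()}  # scale the group value into [0, 1]

    # Degree of closeness: sum of the four normalized parameters.
    return {k: p1.get(k, 0) + p2.get(k, 0) + p3.get(k, 0) + p4.get(k, 0)
            for k in group_values}
```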
A reference user who satisfies the preset conditions with respect to the target user is then searched for, and the first sample image set of the reference user is taken as the first sample image set of the target user, which reduces the labor cost of generating samples.
As another embodiment, the aesthetic model may be trained in advance, and the first sample image is scored based on the trained aesthetic model, so that the number of sample images in the first sample image set can be expanded rapidly, and the labor cost consumed in generating the sample can be reduced. Specifically, the embodiment of acquiring the first sample image set may be that the original image is cropped to obtain a plurality of first sample images; and determining the score value of each first sample image according to a pre-trained aesthetic model to obtain the first sample image set, wherein the aesthetic model is trained on the basis of a second sample image set, each second sample image in the second sample image set corresponds to a second score value, and the second score value is a result of the user labeling the images based on aesthetic perception. In the embodiment of clipping the original image to obtain a plurality of first sample images, the size parameter based on the image to be processed and the embodiment of cutting the image to be processed according to the preset ratio to obtain a plurality of sub-images may be referred to, and details are not repeated here. The aesthetic model may be the same as the network structure in the evaluation model that gives the image an aesthetic score, e.g., both the aesthetic model and the evaluation model may be CNN networks.
As an embodiment, referring to fig. 6, the training process of the aesthetic model and the evaluation model is described below based on fig. 6. Specifically, as shown in fig. 6, the aesthetic CNN denotes the aesthetic model, the cropping CNN is the evaluation model, the labeled aesthetic data set is the second sample image set, the unlabeled cropping data set consists of the plurality of first sample images obtained by cropping the original images, and the labeled cropping data set consists of the plurality of first sample images with their corresponding first score values, that is, the first sample image set.
As shown in fig. 6, a second sample image set is obtained. Specifically, a large number of second sample images are collected, and the user then labels the second sample images, that is, assigns each one a second score value; specifically, the user can score the second sample images based on his own subjective aesthetic perception, so that a second score value is configured for each second sample image. If the data set cannot be built in-house because of time or cost limitations, the public data set AVA may be used. As an embodiment, the manner of determining the second sample images may also refer to determining the reference user corresponding to the target user, and then the second sample image set of the reference user is taken as the second sample image set of the target user.
A preprocessing operation is then performed on the second sample images configured with second score values, to obtain the second sample image set. Specifically, first, a score normalization operation is performed, i.e. all second score values are linearly scaled to lie between 0 and 10, so that the lowest score is 0 and the highest score is 10. Secondly, the second sample images are normalized in scale, and all second sample images are scaled to the first specified size by bilinear interpolation. The first specified size may be set based on actual use requirements; for example, it may be 224 × 224 pixels. Finally, a specified-format storage operation is performed; specifically, the processed second sample images and the corresponding second scores are converted into and stored in a specified format, so that they are convenient to read. The specified format may be a binary format.
Then, a second predicted value for each of the second sample images is determined by the aesthetic model to be trained. Specifically, the second sample images are input into the aesthetic model to be trained, the aesthetic model to be trained outputs a second predicted value corresponding to each second sample image, and the aesthetic model to be trained is trained based on the loss between the second predicted value and the second score value of each second sample image, to obtain the trained aesthetic model. Specifically, the loss between the second predicted value and the second score value may be a regression loss, taken as the deviation between the second predicted value and the second score value; a larger deviation indicates that the output of the aesthetic model is less similar to the aesthetic perception of the user. The network is iteratively optimized based on stochastic gradient descent, and after training converges the network parameters are frozen and used as the trained aesthetic model, so that the trained aesthetic model can be applied to the training of the evaluation model. Because manually building a cropping data set is too tedious, it is only necessary to collect enough unlabeled data and then label it automatically through the aesthetic model.
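A condensed sketch of this aesthetic-model training procedure; PyTorch is used only as an example framework, and the backbone network, batch handling, learning rate and use of MSE as the regression loss are assumptions, while the [0, 10] score scaling, 224 × 224 bilinear resizing and stochastic gradient descent follow the description above:

```python
import torch
import torch.nn.functional as F

def preprocess(images, scores):
    # Scale the second sample images to the first specified size (assumed 224x224)
    # with bilinear interpolation, and linearly rescale the scores to [0, 10]
    # (shown per batch for brevity; in practice the scaling is fixed over the data set).
    images = F.interpolate(images, size=(224, 224), mode="bilinear", align_corners=False)
    s_min, s_max = scores.min(), scores.max()
    scores = (scores - s_min) / (s_max - s_min + 1e-8) * 10.0
    return images, scores

def train_aesthetic_model(model, loader, epochs=10, lr=1e-3):
    # Regression loss between predicted and labeled scores, optimized by SGD.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, scores in loader:
            images, scores = preprocess(images, scores)
            pred = model(images).squeeze(-1)      # second predicted value per image
            loss = F.mse_loss(pred, scores)       # regression loss (MSE assumed)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model  # frozen parameters serve as the trained aesthetic model
```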
Specifically, an original image is cut to obtain a plurality of first sample images; determining a first score value of each first sample image according to a pre-trained aesthetic model to obtain the first sample image set, namely, a plurality of first sample images corresponding to the first score values are collectively called the first sample image set, wherein the aesthetic model is trained on the basis of a second sample image set, each second sample image in the second sample image set corresponds to a second score value, and the second score values are the labeling results of the user on the images based on aesthetic perception.
In this embodiment, there are a plurality of original images, and the original images are unlabeled, that is, the original images are not provided with score values corresponding to the aesthetic perception of the user. As shown in fig. 7, several candidate regions may be determined on an original image, each candidate region corresponding to one first sample image. In one embodiment, before the plurality of candidate regions corresponding to an original image are determined, the original image is preprocessed; specifically, a size normalization operation is performed on the original image, and all pictures are scaled to a second specified size by bilinear interpolation, where the second specified size may be set based on actual usage requirements. For example, the second specified size may be 320 × 320 pixels.
S502: and determining a first predicted value of each first sample image through an evaluation model to be trained.
S503: training the evaluation model to be trained based on the loss between the first predicted value and the first score value of each first sample image to obtain a trained evaluation model.
The score value of each first sample image is determined according to the pre-trained aesthetic model to obtain the first score value of each first sample image, and the first sample images with their corresponding first score values form the first sample image set. Specifically, all candidate regions, i.e. the first sample images, are numbered 1, 2, 3, …, N, where N is a positive integer. The trained aesthetic model predicts the score s1, s2, s3, …, sN of each first sample image, and these scores are then concatenated into an aesthetic vector S = (s1, s2, s3, …, sN), which serves as the annotation information corresponding to the original image, i.e. as the first score values of the first sample images.
Then, a first predicted value of each first sample image is determined through the evaluation model to be trained, and the evaluation model to be trained is trained based on the loss between the first predicted value and the first score value of each first sample image, to obtain the trained evaluation model. Specifically, the cross-entropy loss between the network prediction vector (i.e. the first predicted values of the candidate regions) and the aesthetic vector S of the original image is taken as the optimization target, the network is iteratively optimized based on stochastic gradient descent, and after training converges the network parameters are frozen and used as the evaluation model.
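A sketch of this evaluation-model training step; treating the aesthetic vector S as a soft target distribution via softmax is an assumption about how the cross-entropy between the prediction vector and S is formed:

```python
import torch
import torch.nn.functional as F

def cropping_loss(pred_scores, aesthetic_vector):
    """Cross-entropy between the network prediction vector for the N candidate
    regions and the aesthetic vector S of the original image."""
    target = F.softmax(aesthetic_vector, dim=-1)   # assumed: S turned into a distribution
    log_probs = F.log_softmax(pred_scores, dim=-1)
    return -(target * log_probs).sum(dim=-1).mean()

def train_evaluation_model(model, loader, epochs=10, lr=1e-3):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)   # stochastic gradient descent
    for _ in range(epochs):
        for images, aesthetic_vectors in loader:              # S produced by the aesthetic model
            pred = model(images)                               # one score per candidate region
            loss = cropping_loss(pred, aesthetic_vectors)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```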
As an embodiment, the trained evaluation model may also perform the above-mentioned cutting operation on the image, specifically, input the image to be processed into the evaluation model, and the evaluation model can crop the image to be processed into a plurality of sub-images and determine the aesthetic score of each sub-image.
S504: and acquiring an image to be processed.
As an embodiment, after the to-be-processed image is acquired, size processing is performed on the to-be-processed image, and the to-be-processed image after size processing is used as the to-be-processed image corresponding to the target sub-image required to be determined this time. Specifically, the to-be-processed image is scaled to a third designated size by bilinear interpolation based on the size parameter of the to-be-processed image, in the embodiment of the present application, the third designated size may be set according to actual use, for example, the third designated size may be 320 × 320 pixels.
S505: and cutting the image to be processed according to a preset proportion based on the size parameter of the image to be processed to obtain a plurality of sub-images.
S506: determining an aesthetic score for each of the sub-images by a pre-trained evaluation model.
The image to be processed is input into the evaluation model, the evaluation model determines a plurality of candidate regions, the image data corresponding to each candidate region is taken as a sub-image, and the aesthetic score of each sub-image is then obtained, giving a score vector P = (P1, P2, P3, …, PN), where P1 is the aesthetic score of the sub-image numbered 1 and P2, P3, …, PN have the same meaning for the other sub-images. Each sub-image corresponds to a position coordinate, which can be set according to a preset candidate-region coordinate generation manner. Specifically, assume the coordinates are (x, y, w, h), as shown in fig. 8. As an embodiment, x and y in the position coordinates are the coordinates of a vertex of the candidate region corresponding to the sub-image, for example the pixel coordinates of the top-left vertex, w is the width of the candidate region, and h is its height.
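For illustration, a sketch of this inference step; the OpenCV/PyTorch calls are examples only, the 320 × 320 input size follows the text, and the candidate-region list is assumed to come from a generator such as the one sketched earlier:

```python
import cv2
import torch

def score_candidates(model, image_bgr, regions):
    """Resize the image to the third specified size (320x320, bilinear), run the
    evaluation model, and return the score vector P plus the (x, y, w, h) of each
    candidate region on the 320x320 input."""
    resized = cv2.resize(image_bgr, (320, 320), interpolation=cv2.INTER_LINEAR)
    tensor = torch.from_numpy(resized).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    with torch.no_grad():
        p = model(tensor).squeeze(0).numpy()   # P = (P1, P2, ..., PN)
    return p, regions
```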
S507: determining a target sub-image from the plurality of sub-images based on the aesthetic score of each sub-image.
In one embodiment, the sub-image with the highest aesthetic score among the plurality of sub-images is used as the target sub-image. The index corresponding to the maximum value is recorded as k, and the coordinates (xk, yk, wk, hk) of the k-th candidate region are output according to the preset candidate-region generation manner. The candidate region corresponding to the sub-image with the highest aesthetic score is recorded as the optimal sub-region, and the image corresponding to the optimal sub-region is taken as the target sub-image.
Then, determining the coordinates of the target sub-image in the image to be processed; and cutting the to-be-processed image based on the coordinates of the target sub-image to obtain a cut image, wherein the cut image is used as a designated image, and the designated image is an image which is finally extracted from the to-be-processed image and meets the aesthetic perception of a user.
Specifically, the designated image is determined based on the coordinates of the sub-image with the highest aesthetic score: the designated coordinates (X, Y, W, H) of the optimal sub-region in the image to be processed are calculated according to the following formula (1), and the designated image can be obtained by cropping the image to be processed according to these coordinates and outputting the result.
X = xk × Width / 320,  W = wk × Width / 320,  Y = yk × Height / 320,  H = hk × Height / 320    (1)
In the formula (1), Width is the Width of the image after the size processing is performed on the image to be processed, and Height is the Height of the image after the size processing is performed on the image to be processed. 320 in the formula of X and W indicates that the width in the third specified size is 320, and if the width in the third specified size is other values, 320 in the formula of X and W is changed accordingly, 320 in the formula of Y and H indicates that the height in the third specified size is 320, and if the height in the third specified size is other values, 320 in the formula of Y and H is changed accordingly.
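A small sketch of selecting the optimal sub-region and mapping its coordinates back to the original image according to formula (1), assuming the third specified size is 320 × 320:

```python
import numpy as np

def crop_best_region(original_image, p, regions, size=320):
    """original_image: H x W x C array at its native resolution; p: score vector P;
    regions: (x, y, w, h) of each candidate on the size x size network input."""
    height, width = original_image.shape[:2]
    k = int(np.argmax(p))                      # index of the highest aesthetic score
    x_k, y_k, w_k, h_k = regions[k]
    # Formula (1): rescale from the 320x320 network input to the original size.
    X = int(x_k * width / size)
    W = int(w_k * width / size)
    Y = int(y_k * height / size)
    H = int(h_k * height / size)
    return original_image[Y:Y + H, X:X + W]    # the designated image
```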
As an embodiment, when the image to be processed is a preview image corresponding to a camera application of the user terminal: when the user takes a picture with the user terminal, the camera application may be opened on the user terminal, and while the camera application displays the preview interface, the camera is controlled to also capture images of other scenes around the scene the user wants to shoot, so as to obtain imagery of a larger area around the shot scene. All these images are then combined into one image, for example using a current panoramic-image synthesis method, the synthesized image is taken as the image to be processed, and the above method is executed to obtain a designated image, which is then saved, for example stored in an album.
In one embodiment, after the designated image is determined, optimization processing may be performed on it so that the designated image achieves a high-definition visual display effect. Specifically, display enhancement processing is performed on the designated image, where the display enhancement processing specifically includes optimizing image parameters in order to optimize the image quality of the image. Image quality covers parameters that determine the viewing effect, such as definition, sharpness, lens distortion, color, resolution, color gamut and purity. Combinations of different parameters enable different display enhancement effects, for example a barrel-distortion effect centered on the position of a portrait, or a horror-atmosphere effect created by changing the hue of the current picture to gray.
In the implementation of the present application, the image parameter optimization includes at least one of exposure enhancement, denoising, edge sharpening, contrast increase or saturation increase.
Exposure enhancement is used to increase the brightness of an image. The brightness of areas where the luminance values are low may be raised by means of the histogram of the image, or the brightness of the image may be increased by nonlinear superposition. Specifically, if I denotes the dark image to be processed and T denotes the brighter image after processing, the exposure may be enhanced by T(x) = I(x) + I(x) × (1 − I(x)), where T and I are both images with values in [0, 1]. The algorithm can be iterated multiple times if one pass is not effective enough.
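A sketch of this nonlinear brightness enhancement, assuming the image values lie in [0, 1]; the iteration count is an arbitrary example:

```python
import numpy as np

def enhance_exposure(image, iterations=1):
    """Nonlinear superposition T(x) = I(x) + I(x) * (1 - I(x)) on a [0,1]-valued image.
    Can be iterated if a single pass is not strong enough."""
    result = np.clip(image.astype(np.float32), 0.0, 1.0)
    for _ in range(iterations):
        result = result + result * (1.0 - result)
        result = np.clip(result, 0.0, 1.0)
    return result
```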
The image data is denoised to remove noise of the image, and particularly, the image is degraded due to interference and influence of various noises in the generation and transmission processes, which adversely affects the processing of subsequent images and the image visual effect. The noise may be of various types, such as electrical noise, mechanical noise, channel noise, and other noise. Therefore, in order to suppress noise, improve image quality, and facilitate higher-level processing, it is necessary to perform denoising preprocessing on an image. From the probability distribution of noise, there are gaussian noise, rayleigh noise, gamma noise, exponential noise and uniform noise.
Specifically, the image can be denoised by a gaussian filter, wherein the gaussian filter is a linear filter, and can effectively suppress noise and smooth the image. The principle of action is similar to that of an averaging filter, and the average value of pixels in a filter window is taken as output. The coefficients of the window template are different from those of the average filter, and the template coefficients of the average filter are all the same and are 1; while the coefficients of the template of the gaussian filter decrease with increasing distance from the center of the template. Therefore, the gaussian filter blurs the image to a lesser extent than the mean filter.
For example, a 5 × 5 Gaussian filter window is generated, with the center of the template taken as the origin of coordinates for sampling. The coordinates of each position in the template are substituted into the Gaussian function, and the resulting values are the template coefficients. The Gaussian filter window is then convolved with the image to denoise it.
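A possible sketch of such a Gaussian denoising step is shown below; the 5 × 5 window follows the example above, while the sigma value and the use of SciPy's convolve2d are assumptions made for illustration.

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel(size=5, sigma=1.0):
    """Sample a 2-D Gaussian with the template center as the coordinate origin."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    kernel = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()   # normalize so overall brightness is preserved

def gaussian_denoise(image, size=5, sigma=1.0):
    """Convolve a single-channel image with the Gaussian window to suppress noise."""
    return convolve2d(image, gaussian_kernel(size, sigma), mode="same", boundary="symm")
```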
Edge sharpening is used to make a blurred image clearer. There are generally two methods for image sharpening: the differential method and the high-pass filtering method.
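As an illustrative example of the high-pass filtering method, the sketch below extracts high-frequency detail with a Laplacian kernel and adds it back to the image; the kernel and the strength parameter are assumed choices, not the specific sharpening used in the embodiments.

```python
import numpy as np
from scipy.signal import convolve2d

def sharpen(image, strength=1.0):
    """Sharpen a blurred single-channel image (values in [0, 1]) by adding back Laplacian detail."""
    laplacian = np.array([[0, -1, 0],
                          [-1, 4, -1],
                          [0, -1, 0]], dtype=np.float32)
    detail = convolve2d(image, laplacian, mode="same", boundary="symm")
    return np.clip(image + strength * detail, 0.0, 1.0)
```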
Specifically, contrast stretching is a method of image enhancement and also belongs to the grayscale transformation operations. By stretching the gray values to the whole 0-255 range through a grayscale transformation, the contrast is greatly enhanced. The following formula may be used to map the gray value of a pixel to a larger gray-level space:
I(x, y) = [(I(x, y) − Imin) / (Imax − Imin)] × (MAX − MIN) + MIN;
where Imin, Imax are the minimum and maximum grayscale values of the original image, and MIN and MAX are the minimum and maximum grayscale values of the grayscale space to be stretched.
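For reference, the stretching formula above could be implemented as in the following sketch, with the target range defaulting to [0, 255] as in the example; the handling of a flat image is an added safeguard, not part of the original description.

```python
import numpy as np

def stretch_contrast(image, new_min=0, new_max=255):
    """Linearly map the gray levels of `image` onto [new_min, new_max]."""
    i_min, i_max = int(image.min()), int(image.max())
    if i_max == i_min:                      # flat image: nothing to stretch
        return np.full_like(image, new_min)
    stretched = (image.astype(np.float32) - i_min) / (i_max - i_min)
    return (stretched * (new_max - new_min) + new_min).astype(np.uint8)
```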
As an embodiment, a region to be optimized may be determined from the specified image, and the display enhancement processing described above is performed on that region. The region to be optimized may be determined based on the content of the specified image. Specifically, a target object within the specified image is determined and its type is identified; the target object may be, for example, a man, a woman, the sky, a mountain, a river or a signboard. The region of the target object is taken as the region to be optimized, and that region is then optimized according to the type of the target object; for example, different types of target objects use different display enhancement strategies.
As another embodiment, the region to be optimized may be a region selected by the user within the specified image. Specifically, after the user requests display enhancement processing, for example by inputting a display enhancement instruction, the specified image is displayed on the screen of the user terminal and the user selects the region to be optimized on it. In particular, the user may press on the specified image, and either the pressed region or the region of the target object at the pressed position is taken as the region to be optimized.
Therefore, in the embodiments of the present application, the evaluation criterion of traditional intelligent cropping methods is changed from objective photographic principles to human subjective aesthetics, so that the output result better matches subjective aesthetic perception. In addition, a single-stage cropping process is designed: the original image is approximately traversed through dense sampling and no subject detection algorithm is relied upon, which makes the method more general.
Referring to fig. 9, which shows a block diagram of an image processing apparatus according to an embodiment of the present application, the apparatus 900 may include: the device comprises an acquisition unit, a determination unit, an evaluation unit and a processing unit.
The acquisition unit is used for acquiring the image to be processed.
The determining unit is used for cutting the image to be processed according to a preset proportion, based on the size parameters of the image to be processed, to obtain a plurality of sub-images.
As an embodiment, all the sub-images cover the whole area of the image to be processed. The size of each of the sub-images is the same.
An evaluation unit for determining the aesthetic score of each of the sub-images by a pre-trained evaluation model.
A processing unit for determining a target sub-image from a plurality of said sub-images based on the aesthetic score of each of said sub-images.
Further, the processing unit is further configured to take the sub-image with the highest aesthetic score among the plurality of sub-images as the target sub-image.
Further, the device also comprises an extraction unit, wherein the extraction unit is used for determining the coordinates of the target sub-image in the image to be processed; and cutting the image to be processed based on the coordinates of the target sub-image to obtain a cut image.
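To show how these units could fit together, the following Python sketch densely samples candidate crops at a preset aspect ratio, scores each with a scoring function standing in for the pre-trained evaluation model, and returns the highest-scoring sub-image together with its coordinates; the score_fn, scale, stride and ratio values are illustrative assumptions, not parameters disclosed in the embodiments.

```python
import numpy as np

def crop_best_sub_image(image, score_fn, ratio=(4, 3), scale=0.6, stride=32):
    """Densely sample same-size crops, score each, and keep the best one."""
    h, w = image.shape[:2]
    crop_w = int(w * scale)                          # candidate size derived from the preset ratio
    crop_h = min(int(crop_w * ratio[1] / ratio[0]), h)

    best_score, best_box = float("-inf"), None
    for top in range(0, h - crop_h + 1, stride):
        for left in range(0, w - crop_w + 1, stride):
            candidate = image[top:top + crop_h, left:left + crop_w]
            score = score_fn(candidate)              # aesthetic score of the sub-image
            if score > best_score:
                best_score, best_box = score, (left, top, crop_w, crop_h)

    left, top, cw, ch = best_box
    return image[top:top + ch, left:left + cw], best_box, best_score
```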
Further, the device further comprises a training unit, wherein the training unit is used for acquiring a first sample image set, each first sample image in the first sample image set corresponds to a first scoring value, and the first scoring value is used for representing the aesthetic feeling of the user; determining a first predicted value of each first sample image through an evaluation model to be trained; training the evaluation model to be trained based on the loss between the first predicted value and the first score value of each first sample image to obtain a trained evaluation model.
Specifically, the training unit is further configured to crop the original image to obtain a plurality of first sample images; and determining a first score value of each first sample image according to a pre-trained aesthetic model to obtain the first sample image set, wherein the aesthetic model is trained on the basis of a second sample image set, each second sample image in the second sample image set corresponds to a second score value, and the second score value is a result of the user labeling the images based on aesthetic perception.
Specifically, the training unit is further configured to obtain a second sample image set; determining a second predicted value of each second sample image through an aesthetic model to be trained; training the aesthetic model to be trained based on the loss between the second predicted value and the second score value of each of the second sample images to obtain a trained aesthetic model.
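The two-stage training described above could look roughly like the sketch below (PyTorch is an assumption, as are the MSE loss, the model definitions and the data loaders): the aesthetic model is first fitted to the human-annotated second sample set, and the evaluation model is then fitted to cropped first sample images scored by that aesthetic model.

```python
import torch
import torch.nn as nn

def train_regressor(model, loader, epochs=10, lr=1e-4):
    """Generic loop: regress predicted scores onto target score values."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()                 # loss between prediction and score value
    for _ in range(epochs):
        for images, scores in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images).squeeze(-1), scores)
            loss.backward()
            optimizer.step()
    return model

# Stage 1 (assumed): fit the aesthetic model on the second sample set (human-labelled scores).
#   aesthetic_model = train_regressor(aesthetic_model, second_sample_loader)
# Stage 2 (assumed): score cropped first sample images with the aesthetic model, then fit
# the evaluation model on those (image, first score value) pairs.
#   evaluation_model = train_regressor(evaluation_model, first_sample_loader)
```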
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and modules may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, the coupling between the modules may be electrical, mechanical or other type of coupling.
In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
Referring to fig. 10, a block diagram of an electronic device according to an embodiment of the present application is shown. The electronic device 100 may be a smart phone, a tablet computer, an electronic book, or other electronic devices capable of running an application. The electronic device 100 in the present application may be the server or the user terminal. In particular, the electronic device may comprise one or more of the following components: a processor 110, a memory 120, and one or more applications, wherein the one or more applications may be stored in the memory 120 and configured to be executed by the one or more processors 110, the one or more programs configured to perform a method as described in the aforementioned method embodiments.
Processor 110 may include one or more processing cores. The processor 110 connects various parts within the overall electronic device 100 using various interfaces and lines, and performs various functions of the electronic device 100 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 120 and by calling data stored in the memory 120. Optionally, the processor 110 may be implemented in hardware using at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 110 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs and the like; the GPU is responsible for rendering and drawing display content; the modem handles wireless communication. It is understood that the modem may also not be integrated into the processor 110 but be implemented by a separate communication chip.
The memory 120 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory 120 may be used to store instructions, programs, code sets, or instruction sets. The memory 120 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the method embodiments described above, and the like. The data storage area may also store data created by the electronic device 100 in use, such as a phonebook, audio and video data, chat log data, and the like.
Referring to fig. 11, a block diagram of a computer-readable storage medium according to an embodiment of the present application is shown. The computer-readable medium 1100 has stored therein program code that can be called by a processor to perform the method described in the above-described method embodiments.
The computer-readable storage medium 1100 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. Alternatively, the computer-readable storage medium 1100 includes a non-volatile computer-readable storage medium. The computer readable storage medium 1100 has storage space for program code 1110 for performing any of the method steps of the method described above. The program code can be read from or written to one or more computer program products. The program code 1110 may be compressed, for example, in a suitable form.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not necessarily depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. An image processing method, comprising:
acquiring an image to be processed;
cutting the image to be processed according to a preset proportion based on the size parameter of the image to be processed to obtain a plurality of sub-images;
determining an aesthetic score for each of the sub-images through a pre-trained evaluation model;
determining a target sub-image from the plurality of sub-images based on the aesthetic score of each sub-image.
2. Method according to claim 1, characterized in that all said sub-images cover the whole area of the image to be processed.
3. The method of claim 1, further comprising, after determining a target sub-image from the plurality of sub-images based on the aesthetic score of each sub-image:
determining the coordinates of a target sub-image in the image to be processed;
and cutting the image to be processed based on the coordinates of the target sub-image to obtain a cut image.
4. The method of claim 1, wherein prior to determining the aesthetic score for each of the sub-images via a pre-trained evaluation model, further comprising:
acquiring a first sample image set, wherein each first sample image in the first sample image set corresponds to a first score value, and the first score value is used for representing the aesthetic feeling of a user;
determining a first predicted value of each first sample image through an evaluation model to be trained;
training the evaluation model to be trained based on the loss between the first predicted value and the first score value of each first sample image to obtain a trained evaluation model.
5. The method of claim 4, wherein said acquiring a first sample image set comprises:
clipping an original image to obtain a plurality of first sample images;
and determining a first score value of each first sample image according to a pre-trained aesthetic model to obtain the first sample image set, wherein the aesthetic model is trained on the basis of a second sample image set, each second sample image in the second sample image set corresponds to a second score value, and the second score value is a result of the user labeling the images based on aesthetic perception.
6. The method of claim 5, wherein prior to determining the score value for each of the first sample images according to a pre-trained aesthetic model, further comprising:
acquiring a second sample image set;
determining a second predicted value of each second sample image through an aesthetic model to be trained;
training the aesthetic model to be trained based on the loss between the second predicted value and the second score value of each of the second sample images to obtain a trained aesthetic model.
7. The method of any of claims 1-6, wherein determining a target sub-image from the plurality of sub-images based on the aesthetic score of each sub-image comprises:
and taking the sub-image with the highest aesthetic score in the plurality of sub-images as a target sub-image.
8. An image processing apparatus characterized by comprising:
the acquisition unit is used for acquiring an image to be processed;
the determining unit is used for cutting the image to be processed according to a preset proportion to obtain a plurality of sub-images based on the size parameters of the image to be processed;
an evaluation unit for determining an aesthetic score of each of the sub-images by a pre-trained evaluation model;
a processing unit for determining a target sub-image from a plurality of said sub-images based on the aesthetic score of each of said sub-images.
9. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the method of any of claims 1-7.
10. A computer-readable medium having stored program code executable by a processor, the program code causing the processor to perform the method of any one of claims 1-7 when executed by the processor.
CN202011613934.4A 2020-12-30 2020-12-30 Image processing method, device, electronic equipment and computer readable medium Active CN112839167B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011613934.4A CN112839167B (en) 2020-12-30 2020-12-30 Image processing method, device, electronic equipment and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011613934.4A CN112839167B (en) 2020-12-30 2020-12-30 Image processing method, device, electronic equipment and computer readable medium

Publications (2)

Publication Number Publication Date
CN112839167A true CN112839167A (en) 2021-05-25
CN112839167B CN112839167B (en) 2023-06-30

Family

ID=75923917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011613934.4A Active CN112839167B (en) 2020-12-30 2020-12-30 Image processing method, device, electronic equipment and computer readable medium

Country Status (1)

Country Link
CN (1) CN112839167B (en)


Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105528757A (en) * 2015-12-08 2016-04-27 华南理工大学 Content-based image aesthetic quality improvement method
US20170294010A1 (en) * 2016-04-12 2017-10-12 Adobe Systems Incorporated Utilizing deep learning for rating aesthetics of digital images
CN106650737A (en) * 2016-11-21 2017-05-10 中国科学院自动化研究所 Image automatic cutting method
CN107146198A (en) * 2017-04-19 2017-09-08 中国电子科技集团公司电子科学研究院 A kind of intelligent method of cutting out of photo and device
US20190026609A1 (en) * 2017-07-24 2019-01-24 Adobe Systems Incorporated Personalized Digital Image Aesthetics in a Digital Medium Environment
CN107610123A (en) * 2017-10-11 2018-01-19 中共中央办公厅电子科技学院 A kind of image aesthetic quality evaluation method based on depth convolutional neural networks
CN111095293A (en) * 2017-12-15 2020-05-01 华为技术有限公司 Image aesthetic processing method and electronic equipment
CN110223292A (en) * 2019-06-20 2019-09-10 厦门美图之家科技有限公司 Image evaluation method, device and computer readable storage medium
CN110610479A (en) * 2019-07-31 2019-12-24 华为技术有限公司 Object scoring method and device
CN110796663A (en) * 2019-09-17 2020-02-14 北京迈格威科技有限公司 Picture clipping method, device, equipment and storage medium
CN110633377A (en) * 2019-09-23 2019-12-31 三星电子(中国)研发中心 Picture cleaning method and device
CN110807139A (en) * 2019-10-23 2020-02-18 腾讯科技(深圳)有限公司 Picture identification method and device, computer readable storage medium and computer equipment
CN110796650A (en) * 2019-10-29 2020-02-14 杭州阜博科技有限公司 Image quality evaluation method and device, electronic equipment and storage medium
CN111199541A (en) * 2019-12-27 2020-05-26 Oppo广东移动通信有限公司 Image quality evaluation method, image quality evaluation device, electronic device, and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Rongjie: "Research on Aesthetic Image Generation Based on Generative Adversarial Networks" *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113379749A (en) * 2021-06-10 2021-09-10 北京房江湖科技有限公司 Image processing method, readable storage medium, and computer program product
WO2023093683A1 (en) * 2021-11-24 2023-06-01 北京字节跳动网络技术有限公司 Image cropping method and apparatus, model training method and apparatus, electronic device, and medium
CN114170187A (en) * 2021-12-08 2022-03-11 Oppo广东移动通信有限公司 Image aesthetic scoring method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112839167B (en) 2023-06-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant