CN116778212A - Image processing method and device

Info

Publication number: CN116778212A
Application number: CN202210225189.9A
Authority: CN (China)
Prior art keywords: picture, aesthetic, network, attribute, information
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 余黄奇, 金鑫, 肖超恩, 赵鸿儒, 娄豪, 黄横
Assignees: Shaoding Artificial Intelligence Technology Co ltd; Huawei Technologies Co Ltd
Application filed by Shaoding Artificial Intelligence Technology Co ltd and Huawei Technologies Co Ltd
Priority to CN202210225189.9A

Landscapes

  • Image Analysis (AREA)

Abstract

The application provides an image processing method and device. The method acquires attribute information of a first picture, where the attribute information characterizes brightness features and/or hue features. When the first picture does not conform to a set picture size, the first picture is preprocessed to obtain a second picture that conforms to the set picture size, the preprocessing comprising at least one of the following operations: filling and scaling. The second picture and the attribute information of the first picture are then input into an aesthetic evaluation model to obtain the value of an aesthetic evaluation parameter of the first picture; the aesthetic evaluation model characterizes the correspondence between the second picture together with the attribute information of the first picture and the value of the aesthetic evaluation parameter of the first picture. The method preserves the original composition of the picture, and when evaluating the picture the aesthetic score is determined not only from the aesthetic feature information but also with reference to the attribute information of the first picture, so the resulting aesthetic score is more reliable.

Description

Image processing method and device
Technical Field
The embodiments of the application relate to the technical field of artificial intelligence, and in particular to an image processing method and device.
Background
With the continuing development of social networks, more and more people wish to record and share the beautiful moments of their lives through photos. However, it is difficult for users without professional training to shoot excellent works, and to help users do so, competition in mobile phone photography is shifting from "shooting clearly" to "shooting beautifully".
Research on aesthetic quality assessment mainly focuses on three stages: aesthetic classification (beautiful/ugly), aesthetic scoring, and aesthetic distribution. In the related art, on the AVA dataset (whose scores, as fig. 1 shows, are mainly concentrated around 5 and 6 points), the width and height of a picture are scaled to 224x224 pixels, the picture is input into a base network to extract high-dimensional features, a fully connected layer follows, the model is trained with the Earth Mover's Distance (EMD) as the loss function, and the aesthetic score is obtained by weighting the aesthetic distribution. This approach forcibly scales the picture to 224x224, destroying its original aspect ratio; for the same content, pictures with different aspect ratios give people different aesthetic impressions, but the approach cannot distinguish them because width and height are adjusted to the same size. Moreover, the method only outputs an aesthetic score and cannot effectively point out directions of improvement for photographers and editors.
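As an illustration of this related art (not code from the patent), the following minimal sketch shows a NIMA-style objective: the network predicts a distribution over 10 score buckets, the EMD between predicted and annotated distributions is the training loss, and the aesthetic score is the distribution-weighted mean. All names and shapes here are assumptions for illustration.

```python
import torch

def emd_loss(p_pred: torch.Tensor, p_true: torch.Tensor, r: int = 2) -> torch.Tensor:
    """Earth Mover's Distance between score distributions of shape (B, 10)."""
    cdf_pred = torch.cumsum(p_pred, dim=1)   # cumulative predicted distribution
    cdf_true = torch.cumsum(p_true, dim=1)   # cumulative annotated distribution
    return ((cdf_pred - cdf_true).abs().pow(r).mean(dim=1)).pow(1.0 / r).mean()

def aesthetic_score(p_pred: torch.Tensor) -> torch.Tensor:
    """Aesthetic score as the weighted mean of the distribution: sum_k k * p(k)."""
    buckets = torch.arange(1, 11, dtype=p_pred.dtype, device=p_pred.device)
    return (p_pred * buckets).sum(dim=1)
```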
Disclosure of Invention
The application provides an image processing method and device, which are used for improving the reliability of aesthetic scoring of images.
In a first aspect, the present application provides an image processing method. The method may be executed by an electronic device, or may be executed by a server that directly feeds the aesthetic score back to a client; the present application is not particularly limited herein. The electronic device may be, for example, a mobile phone or a vehicle-mounted device, and the server may be a physical server, a cloud server, or the like; the present application is not limited herein. The method performs the following:
acquiring attribute information of a first picture, where the attribute information characterizes brightness features and/or hue features; when the first picture does not conform to a set picture size, preprocessing the first picture to obtain a second picture that conforms to the set picture size, where the preprocessing comprises at least one of the following operations: filling and scaling; and inputting the second picture and the attribute information of the first picture into an aesthetic evaluation model to obtain the value of an aesthetic evaluation parameter of the first picture, where the aesthetic evaluation model characterizes the correspondence between the second picture together with the attribute information of the first picture and the value of the aesthetic evaluation parameter of the first picture.
In the application, the picture to be evaluated (i.e., the first picture) may not meet the input requirement of the feature extraction network; in practical application, the picture to be evaluated is made to conform to the set picture size by filling black borders, by proportional scaling, or by both. Because the picture to be evaluated is adjusted through filling and proportional scaling, its composition information is not destroyed, whereas existing stretching and cropping destroy the composition information of the picture.
In addition, the aesthetic score is determined not only from the aesthetic feature information extracted by the aesthetic evaluation model but also with reference to the attribute information of the picture to be evaluated; the information considered in this manner is more comprehensive, and the output result is more reliable.
In an alternative, the aesthetic evaluation parameters include at least one of: aesthetic attribute parameters, aesthetic comprehensive parameters, and aesthetic classification parameters; wherein the aesthetic attribute parameters include: composition, color, and illumination.
In the present application, the aesthetic score includes multiple dimensions, which facilitate providing more reliable shooting guidance to the user.
In an alternative, the attribute information includes one or more of the following:
luminance standard deviation, luminance average, brightness standard deviation, brightness average, number of dominant hues, and dominant hue contrast.
In the application, the attribute information comprises a plurality of dimensions, so that the attribute information can be fused with the aesthetic feature information better, and the aesthetic score is output.
In an alternative, the aesthetic evaluation model comprises: a feature extraction module and a feature fusion module; the feature extraction module is used for extracting aesthetic feature information of the second picture; the feature fusion module is used for fusing the aesthetic feature information of the second picture with the attribute information of the first picture to obtain the value of the aesthetic evaluation parameter of the first picture.
According to the application, the aesthetic evaluation model not only can extract the aesthetic characteristic information of the picture, but also can fuse the attribute information of the picture, so that the value of the output aesthetic evaluation parameter is more reliable.
In an alternative manner, the feature extraction module includes: a backbone network, a classification sub-network, an attribute regression sub-network, and a total score regression sub-network; the aesthetic feature information of the second picture includes: aesthetic classification feature information, aesthetic attribute feature information, and aesthetic comprehensive feature information. The backbone network is used for extracting local feature information of the second picture and inputting the local feature information into the classification sub-network, the attribute regression sub-network, and the total score regression sub-network respectively; the classification sub-network is used for extracting aesthetic classification feature information of the second picture according to the local feature information; the attribute regression sub-network is used for extracting aesthetic attribute feature information of the second picture according to the local feature information; and the total score regression sub-network is used for extracting aesthetic comprehensive feature information of the second picture according to the local feature information.
In the application, the feature extraction network consists of a backbone network and a plurality of sub-networks, which makes it convenient to extract feature information of different dimensions from the preprocessed picture and thus obtain a more reliable aesthetic score.
In an alternative manner, the picture size is set to 800 pixels by 800 pixels.
In the application, setting the picture size to 800 pixels by 800 pixels facilitates the data processing of the feature extraction network.
In a second aspect, the present application provides an image processing apparatus comprising:
the acquisition unit is used for acquiring attribute information of the first picture, where the attribute information characterizes brightness features and/or hue features; the picture adjustment unit is used for preprocessing the first picture, when the first picture does not conform to the set picture size, to obtain a second picture that conforms to the set picture size, where the preprocessing comprises at least one of the following operations: filling and scaling; and the aesthetic evaluation parameter acquisition unit is used for inputting the second picture and the attribute information of the first picture into the aesthetic evaluation model to obtain the value of the aesthetic evaluation parameter of the first picture, where the aesthetic evaluation model characterizes the correspondence between the second picture together with the attribute information of the first picture and the value of the aesthetic evaluation parameter of the first picture.
In an alternative, the aesthetic evaluation parameters include at least one of: aesthetic attribute parameters, aesthetic comprehensive parameters, and aesthetic classification parameters; wherein the aesthetic attribute parameters include: composition, color, and illumination.
In an alternative way, the attribute information includes at least one of:
luminance standard deviation, luminance average, brightness standard deviation, brightness average, number of dominant hues, and dominant hue contrast.
In an alternative, the aesthetic evaluation model comprises: a feature extraction module and a feature fusion module;
the feature extraction module is used for extracting aesthetic feature information of the second picture; the feature fusion module is used for fusing the aesthetic feature information of the second picture with the attribute information of the first picture to obtain the value of the aesthetic evaluation parameter of the first picture.
In an alternative manner, the feature extraction module includes: a backbone network, a classification sub-network, an attribute regression sub-network, and a total score regression sub-network; the aesthetic characteristic information of the second picture includes: aesthetic classification feature information, aesthetic attribute feature information, aesthetic comprehensive feature information;
the backbone network is used for extracting local feature information of the second picture and inputting the local feature information into the classification sub-network, the attribute regression sub-network, and the total score regression sub-network respectively; the classification sub-network is used for extracting aesthetic classification feature information of the second picture according to the local feature information; the attribute regression sub-network is used for extracting aesthetic attribute feature information of the second picture according to the local feature information; and the total score regression sub-network is used for extracting aesthetic comprehensive feature information of the second picture according to the local feature information.
In an alternative manner, the picture size is set to 800 pixels by 800 pixels.
In a third aspect, the present application provides an image processing apparatus comprising at least one processor and a memory; the memory is for storing a computer program or instructions which, when executed by the apparatus, cause the apparatus to perform the method of the first aspect or the embodiments of the first aspect as described above.
In a fourth aspect, the application also provides a computer readable storage medium having stored therein computer readable instructions which when run on a computer cause the computer to perform a method as in the first aspect or any of the possible designs of the first aspect.
In a fifth aspect, the application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect or embodiments of the first aspect described above.
For the technical effects achieved by the second to fifth aspects, reference is made to the technical effects achieved by the corresponding possible designs of the first aspect, and the description is not repeated here.
Drawings
FIG. 1 shows a schematic diagram of an image processing scenario;
fig. 2 is a schematic diagram of an image processing system according to an embodiment of the present application;
fig. 3 is a schematic flow chart of an image processing method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of image adjustment provided by an embodiment of the present application;
fig. 5 shows a schematic view of an image processing scenario provided by an embodiment of the present application;
fig. 6 is a schematic diagram illustrating a picture scoring distribution provided by an embodiment of the present application;
FIG. 7 is a schematic diagram showing a total distribution of pictures according to an embodiment of the present application;
FIG. 8 shows a schematic view of a scene of acquiring a preprocessed picture;
FIG. 9 shows a schematic diagram of a high efficiency channel attention module;
FIG. 10 is a schematic diagram of an application scenario in which the image processing method of the present application is implemented;
fig. 11 is a schematic diagram showing the structure of an image processing apparatus according to an embodiment of the present application;
fig. 12 is a schematic diagram showing the structure of an image processing apparatus according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present application more apparent, the present application is further described in detail below with reference to the accompanying drawings. The specific operations in the method embodiments may also be applied to the apparatus or system embodiments, so the implementations of the apparatus and the method may refer to each other, and repeated descriptions are not given. In the description of the present application, unless otherwise indicated, "a plurality" means two or more.
In the present application, "and/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone, where A and B may be singular or plural. In addition, unless stated to the contrary, ordinal words such as "first" and "second" in the embodiments of the present application are used to distinguish multiple objects and are not used to limit the order, timing, priority, or importance of the multiple objects.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
Fig. 2 is a schematic diagram of an image processing system according to the present application. The image processing system includes a client device and a server device, which may be connected through wired or wireless communication; this is not specifically limited herein. The client device may be a mobile phone, a vehicle-mounted device, or the like, and the server device may be a physical server, a cloud server, or the like. In practical application, when the computing capability of the client device is strong, the client device may perform image processing after collecting images to obtain their aesthetic scores and display the scores on its screen for the user to view. When the computing capability of the client device is weak, the collected image may be transmitted to the server device, which determines the aesthetic score after data processing and feeds it back to the client device for the user to view and reference.
Referring to fig. 3, the present application provides an image processing method that can be applied to the image processing system of fig. 2 and can be executed either by a client device with relatively strong computing capability or by a server device. A client device with strong computing capability is described here as an example. The method is implemented as follows:
In step 301, the client device obtains attribute information of the first picture, where the attribute information is used to characterize brightness features and/or hue features.
Taking the client device as a mobile phone as an example, the first picture can be an image acquired by a camera of the mobile phone, an image stored in a mobile phone gallery, or an image downloaded by the mobile phone, and the application is not particularly limited herein.
The brightness value indicates the lightness information of the picture, and tone refers to the relative lightness and darkness of the image. The attribute information characterizes brightness features and/or hue features, which makes it easier to fuse with the aesthetic feature information and output the value of the aesthetic evaluation parameter.
Optionally, the attribute information may include at least one of:
luminance standard deviation, luminance average, brightness standard deviation, brightness average, number of dominant hues, and dominant hue contrast.
In the application, the attribute information can comprise a plurality of dimensions, so that the attribute information can be better fused with the aesthetic feature information to output aesthetic scores.
Step 302, when the first picture does not conform to the set picture size, the client device preprocesses the first picture to obtain a second picture conforming to the set picture size; wherein the preprocessing comprises at least one of the following operations: filling and scaling.
If the width of the first picture equals a first preset value and the height equals a second preset value, no adjustment is needed. If the width equals the first preset value and the height is smaller than the second preset value, referring to (a) in fig. 4, the height of the picture can be filled to the second preset value with black borders. If the width is smaller than the first preset value and the height is larger than the second preset value, referring to (b) in fig. 4, the picture is first reduced, with its aspect ratio preserved, until its height equals the second preset value, and the width of the scaled picture is then filled to the first preset value with black borders. If the width is smaller than the first preset value and the height is smaller than the second preset value, then, referring to (c) in fig. 4, the width can be filled to the first preset value and the height filled to the second preset value with black borders; or, referring to (d) in fig. 4, with the aspect ratio preserved, the picture can be enlarged until its height reaches the second preset value and its width is then brought to the first preset value. The present application describes these manners only by way of example and does not particularly limit how the preprocessed picture is determined.
The method adjusts the first picture to be evaluated through filling and proportional scaling and does not destroy its composition information, whereas existing stretching and cropping destroy the composition information of the picture. Therefore, compared with the prior art, the aesthetic evaluation of the first picture yields a more reliable output result.
Optionally, the set picture size is 800 pixels by 800 pixels; that is, the first preset value and the second preset value are both 800 pixels, which facilitates the data processing of the subsequent feature extraction network.
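The following is a minimal sketch of this preprocessing, assuming the 800 x 800 target size, proportional scaling so that the longer side fits, and black-border filling of the remainder; the function name and the centered placement are illustrative assumptions rather than details from the patent.

```python
from PIL import Image

def preprocess(picture: Image.Image, target: int = 800) -> Image.Image:
    """Scale proportionally (preserving composition) and fill with black borders."""
    w, h = picture.size
    scale = min(target / w, target / h)          # keep the original aspect ratio
    if scale != 1.0:
        picture = picture.resize((max(1, round(w * scale)),
                                  max(1, round(h * scale))), Image.BILINEAR)
    canvas = Image.new("RGB", (target, target), (0, 0, 0))   # black padding
    canvas.paste(picture, ((target - picture.width) // 2,
                           (target - picture.height) // 2))
    return canvas
```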
Step 303, the client device inputs the second picture and the attribute information of the first picture into the aesthetic evaluation model to obtain the value of the aesthetic evaluation parameter of the first picture; the aesthetic evaluation model characterizes the correspondence between the second picture together with the attribute information of the first picture and the value of the aesthetic evaluation parameter of the first picture.
The aesthetic evaluation model of the present application may be built on EfficientNet-B0, but may also be built with other algorithms; the present application is not particularly limited herein.
When the aesthetic score is obtained, it is determined not only from the aesthetic feature information extracted by the aesthetic evaluation model but also with reference to the attribute information of the first picture; the information considered in this manner is more comprehensive, and the output result is more reliable.
Optionally, the aesthetic evaluation parameters include at least one of: aesthetic attribute parameters, aesthetic comprehensive parameters, and aesthetic classification parameters, where the aesthetic attribute parameters include composition, color, and illumination. Of course, in practical applications the aesthetic attribute parameters may include other aesthetic attributes; the application is not particularly limited herein. The aesthetic scoring results in the present application include multiple dimensions, which facilitates providing more reliable shooting guidance to the user.
Optionally, the aesthetic evaluation model comprises: a feature extraction module and a feature fusion module. The feature extraction module is used for extracting aesthetic feature information of the second picture; the feature fusion module is used for fusing the aesthetic feature information of the second picture with the attribute information of the first picture to obtain the value of the aesthetic evaluation parameter of the first picture.
The feature extraction module above can be understood as extracting feature information of the second picture through convolution kernels, downsampling, upsampling, and the like. The feature fusion module can be understood as normalizing the aesthetic feature information extracted by the feature extraction module and the attribute information of the first picture to the same scale and then superimposing them or computing a weighted sum.
Alternatively, the feature extraction module in the aesthetic evaluation model built on EfficientNet-B0 may include: a backbone network, a classification sub-network, an attribute regression sub-network, and a total score regression sub-network. The aesthetic feature information of the second picture includes aesthetic classification feature information, aesthetic attribute feature information, and aesthetic comprehensive feature information. The backbone network is used for extracting local feature information of the second picture (i.e., semantic feature information of the second picture) and inputting it into the classification sub-network, the attribute regression sub-network, and the total score regression sub-network respectively; the classification sub-network extracts the aesthetic classification feature information (i.e., the aesthetic comprehensive classification) of the second picture from the local feature information; the attribute regression sub-network extracts the aesthetic attribute feature information (i.e., the scores of the aesthetic attributes); and the total score regression sub-network extracts the aesthetic comprehensive feature information (i.e., the aesthetic comprehensive score).
The feature extraction module consists of a backbone network and several sub-networks, which makes it convenient to extract feature information of different dimensions from the second picture and thereby obtain more reliable aesthetic evaluation parameter values. In the aesthetic evaluation model shown in fig. 5, the input is an image of arbitrary size, which is filled to obtain a preprocessed 3-channel image of 800 x 800 pixels. The preprocessed image is input into the backbone network of the feature extraction module for feature extraction: an aesthetic adaptive block first produces an intermediate feature map of shape (48, 112, 112), and further feature extraction produces an output of shape (1280, 7, 7). This output is input into the classification sub-network, the attribute regression sub-network, and the total score regression sub-network respectively. The feature fusion module (which in practical application may be the last layer of each sub-network) fuses in the attribute information of the first picture, and the sub-networks then output the aesthetic attribute scores, the aesthetic comprehensive score, and the aesthetic comprehensive classification, where the aesthetic attribute scores and the aesthetic comprehensive classification may be used to adjust the aesthetic comprehensive score so as to obtain a more reliable result. Fig. 5 illustrates only one attribute regression sub-network, but in practical application the number of attribute sub-networks equals the number of aesthetic attribute parameters; for example, with 3 aesthetic attribute parameters (composition, color, and illumination) there are 3 attribute sub-networks. Each sub-network extracts feature information mainly through an efficient channel attention module, in which the features are first processed by a global average pooling layer and the pooled result is then fed into a fully connected layer for further processing.
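The following hedged sketch illustrates the multi-task head structure and late fusion just described. Only the (1280, 7, 7) backbone output, the ten-way classification, the three attribute heads, and the six attribute-information values come from the text; every other layer size and name is an assumption.

```python
import torch
import torch.nn as nn

class AestheticHeads(nn.Module):
    def __init__(self, feat_ch: int = 1280, n_attr_info: int = 6):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)       # (B, 1280, 7, 7) -> (B, 1280, 1, 1)
        self.classify = nn.Linear(feat_ch, 10)    # aesthetic comprehensive classification
        # one regression head per aesthetic attribute: composition, color, illumination
        self.attr_heads = nn.ModuleList([nn.Linear(feat_ch, 1) for _ in range(3)])
        # total score head late-fuses deep features with the first picture's attributes
        self.total = nn.Linear(feat_ch + n_attr_info, 1)

    def forward(self, feat: torch.Tensor, attr_info: torch.Tensor):
        x = self.pool(feat).flatten(1)            # (B, 1280) pooled deep features
        cls_logits = self.classify(x)             # aesthetic comprehensive classification
        attr_scores = torch.cat([h(x) for h in self.attr_heads], dim=1)  # (B, 3)
        fused = torch.cat([x, attr_info], dim=1)  # late fusion with attribute information
        total_score = self.total(fused)           # aesthetic comprehensive score
        return cls_logits, attr_scores, total_score
```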
To better determine the aesthetic evaluation model, it may be trained in the following manner until it meets a preset requirement (e.g., the model converges, or training reaches a preset number of iterations):
s1, acquiring a picture sample set.
In one embodiment, in step S1 described above, a picture sample set is obtained and an aesthetic dataset is constructed from it. The constructed aesthetic dataset comprises two parts: an attribute dataset and a regression-attribute mixed dataset. Each picture in the attribute dataset has not only an aesthetic total score but also scores for 3 aesthetic attributes. Specifically:
step S1.1: attribute dataset (11166 sheets) for attribute regression training, in the present application, the dataset is composed of EVA (1539 sheets, disclosure), AADB (3574 sheets, disclosure), PCCD (29 sheets, disclosure), PADB (524 sheets, self-building) and data set (5500 sheets, self-building). The 3 aesthetic property parameters are: illumination, color, patterning. The total score and the segmented distribution of each attribute are shown in fig. 6-1 to 6-4. Wherein the abscissa is the normalized fraction, and the ordinate is the number of pictures. Wherein, FIG. 6-1 is the scoring of the total score distribution, the scoring of the external attribute of the picture may be comprehensively considered manually, FIG. 6-2 is the scoring of the illumination distribution, the scoring of the illumination condition of the picture may be considered manually, FIG. 6-3 is the scoring of the color distribution, the scoring of the color condition of the picture may be considered manually, and FIG. 6-4 is the scoring of the composition distribution, the scoring of the composition condition of the picture may be considered manually.
In this step, the picture labels in the attribute dataset comprise not only an aesthetic total score for the entire picture but also scores for the three aesthetic attribute parameters of the entire picture.
Step S1.2: the regression-attribute mixed dataset (16924 pictures) is used for the total score and classification tasks; it is a mixture of a regression dataset (5758 pictures) and the attribute dataset (11166 pictures). The distribution of the total scores is shown in fig. 7, where the abscissa is the normalized score and the ordinate is the number of pictures.
In this step, the only labels needed for the pictures in the constructed regression-attribute mixed dataset are the aesthetic total scores of the entire pictures. This dataset is mainly used for the total score and classification tasks of the feature extraction module.
S2. In the image processing stage, filling is added (in actual application the fill value 0 is used as the black border) so as to maintain the composition aspect ratio, and pooling by region is used to generate a high-channel-count 48x112x112 feature map as the input of the backbone network. A backbone network is built with EfficientNet-B0 to extract features of the input image, and a classification sub-network is built behind it. The backbone network and the classification sub-network perform the overall classification task and extract overall image features for each regression task. Three attribute regression sub-networks and one total score regression sub-network are then built behind the EfficientNet-B0 backbone. The constructed multi-task network structure can output both the aesthetic total score of an image and the scores of its 3 aesthetic attributes. Specifically:
Step 2.1: to maintain aspect-ratio scaling, the picture is filled with black borders to 800x800, passed through a convolution layer with a 3x3 kernel, and finally pooled by region to generate a high-channel-count 48x112x112 feature map, as shown in fig. 8.
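A minimal sketch of such an input block follows, assuming the pooling by region is realized with adaptive average pooling; only the 3x3 kernel, the 48 channels, and the 112x112 output size are taken from the text.

```python
import torch
import torch.nn as nn

# 3x3 convolution to 48 channels, then pooling by region down to 112x112
adaptive_block = nn.Sequential(
    nn.Conv2d(3, 48, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d((112, 112)),
)

x = torch.randn(1, 3, 800, 800)       # picture padded to 800x800 as in step 2.1
print(adaptive_block(x).shape)        # torch.Size([1, 48, 112, 112])
```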
Step 2.2: the backbone network EfficientNet-B0 is built and is mainly used to extract features of the input image.
Step 2.3: there are 5 branch networks: a ten-class classification sub-network, which mainly works with the backbone network to perform the overall classification task of the aesthetic evaluation model and is also used to provide soft-loss guidance for the total score regression sub-network; three attribute sub-networks (an illumination attribute sub-network, a color attribute sub-network, and a composition attribute sub-network), which perform regression training on the three aesthetic attributes and output the scores of the three aesthetic attributes of the image in the test stage; and finally a total score regression sub-network, which performs regression training on the total score and outputs the total score in the test stage.
Step 2.4: each branch network adds channel attention to its structure, and the deep features of the attribute sub-networks are spliced into the last layer of the total score regression network so that attribute-regression high-dimensional deep features guide the regression of the total score. Fig. 9 shows an efficient channel attention module, which obtains the self-attention of the channels mainly through a matrix dot product over the channel dimension after data processing by a global average pooling layer.
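The following sketch shows an efficient channel attention module of this kind. It follows the published ECA-Net design (global average pooling, a lightweight 1D convolution across channels, and a sigmoid gate), which may differ in detail from the module in fig. 9.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    def __init__(self, kernel_size: int = 3):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)   # global average pooling per channel
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.gate = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.gap(x)                                 # (B, C, 1, 1)
        w = self.conv(w.squeeze(-1).transpose(1, 2))    # interaction across channels
        w = self.gate(w.transpose(1, 2).unsqueeze(-1))  # (B, C, 1, 1) channel weights
        return x * w                                    # re-weight the feature map
```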
Step 2.5: the last layer of the total score regression network is given a fully connected layer with the same number of nodes as the classification network (10 nodes). The soft loss is obtained by computing the relative entropy (KL divergence) between the output of the classification network and the output of the last layer of the total score regression network, so that the classification result of the classification network guides the feature distribution of the total score regression network. The final loss value is the regression MSE loss plus 0.1 times the soft loss.
$$L_{soft}=\sum_{i} P(i)\log\frac{P(i)}{Q(i)},\qquad L=\frac{1}{N}\sum_{i=1}^{N}\left(y_i-\hat{y}_i\right)^2+0.1\,L_{soft}\qquad\text{(Equation 1)}$$

where P(i) indicates the predicted probability of the i-th class; Q(i) indicates the actual probability of the i-th class; y_i indicates the actual total score; ŷ_i indicates the score predicted by the model; and N indicates the number of pictures.
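A minimal sketch of this objective as reconstructed in Equation 1 above, assuming the soft loss is the KL divergence between the classification head's distribution and a softmax over the regression head's 10 nodes:

```python
import torch
import torch.nn.functional as F

def total_loss(reg_logits, cls_logits, y_pred, y_true):
    """reg_logits, cls_logits: (B, 10); y_pred, y_true: (B,) total scores."""
    q = F.softmax(cls_logits, dim=1)                   # classification distribution
    log_p = F.log_softmax(reg_logits, dim=1)           # regression head's 10 nodes
    soft = F.kl_div(log_p, q, reduction="batchmean")   # relative entropy (KL)
    mse = F.mse_loss(y_pred, y_true)                   # regression MSE
    return mse + 0.1 * soft                            # Equation 1
```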
Step 2.6: the above multi-task network structure can score both the total score and the three aesthetic attributes. The specific aesthetic evaluation model is shown in fig. 5; since the attribute branches are similar in structure, the redundant attribute branches are omitted and only one attribute branch is shown.
S3. Before training the three attribute regression sub-networks, the attribute values of the input image (the illumination intensity features and color features described in steps 3.1 and 3.2 below) are calculated, and the calculated evaluation attribute feature values are first stored in a database. When the three attribute regression sub-networks are trained, the pre-stored attribute values are fused with the neurons of the penultimate layer for subsequent training. By designing these external attribute features and late-fusing them with the attribute branch networks, the feature extraction capability is improved. Specifically:
Step 3.1: illumination intensity features (average luminance f1, luminance standard deviation f2, average brightness f3, and brightness standard deviation f4). These features are obtained mainly by calculating the averages and standard deviations of the L channel in HSL (hue (H), saturation (S), luminance (L)) and the V channel in HSV (hue (H), saturation (S), brightness (V)); luminance and brightness range over 0-255. With x denoting a pixel, I the total number of picture pixels, and std the standard deviation, the features are calculated as

$$f_1=\frac{1}{I}\sum_{x}L(x),\qquad f_2=\mathrm{std}(L(x)),\qquad f_3=\frac{1}{I}\sum_{x}V(x),\qquad f_4=\mathrm{std}(V(x))\qquad\text{(Equation 2)}$$
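A sketch of the Equation 2 computation, assuming OpenCV's HLS and HSV conversions with 8-bit (0-255) channels; the function name is illustrative.

```python
import cv2
import numpy as np

def illumination_features(img_bgr: np.ndarray):
    """Return (f1, f2, f3, f4) from the HSL L channel and HSV V channel."""
    L = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HLS)[:, :, 1].astype(np.float64)
    V = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)[:, :, 2].astype(np.float64)
    f1, f2 = L.mean(), L.std()   # average luminance, luminance standard deviation
    f3, f4 = V.mean(), V.std()   # average brightness, brightness standard deviation
    return f1, f2, f3, f4
```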
Step 3.2: the color features include a color channel feature (color channel weight l1) and color dominance features (number of RGB dominant colors l2, RGB dominant color dominance degree l3, HSV dominant color l4, HSV dominant color dominance degree l5, number of dominant hues l6, and dominant hue contrast l7). The color channel feature divides images into color three-channel images, approximate grayscale images, and single-channel grayscale images by calculating how close the RGB color channels are. The image is converted into an RGB three-channel image; if the three channels are completely identical, it is a single-channel grayscale image and l1 = 0; if they are not completely identical, the average difference between each RGB channel and the per-pixel channel mean is calculated, and if this average difference is smaller than 10 the image is considered an approximate grayscale image with l1 = 0.5; otherwise it is considered a color three-channel image with l1 = 1. The color dominance features are calculated mainly from the color histogram in the RGB channels and the hue histogram in the HSV channels. For a given image, each RGB channel is quantized to 8 values, creating a 512-dimensional histogram HRGB = {h0, h1, ..., h511}, where hi is the number of pixels in the i-th bin. The number of RGB dominant colors l2 is the number of bins whose pixel count exceeds the threshold c1 · I, where c1 = 0.01 is the threshold parameter. The RGB dominant color dominance degree l3 represents the degree to which the dominant colors dominate the image. Replacing the RGB channels with HSV channels in the same way yields the HSV dominant color l4 and its dominance degree l5. To compute the hue features, pixels whose saturation or value is less than 0.2 are eliminated, i.e., all nearly white or black pixels are removed, and the hues of the remaining pixels are quantized into a hue histogram with 20 uniform bins, each bin occupying an 18° sector of the hue circle. This yields HHue = {h1, h2, ..., h20}, where hi is the set of pixels in the i-th bin. From this histogram the features l6 and l7 are extracted: the number of dominant hues l6 is the number of bins whose pixel count exceeds c2 · max_k h_k, where c2 = 0.01, and the dominant hue contrast is

$$l_7=\max_{i,j}\left|h_i-h_j\right|\qquad\text{(Equation 3)}$$

where I indicates the total number of picture pixels; h_k indicates the number of pixels in the k-th bin of the histogram; max h_i indicates the largest bin; and i, j index different bins.
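A sketch of the hue features under the reconstruction above; the c2 threshold rule for l6 is an assumption inferred from the stated variable definitions, and the pairwise contrast of Equation 3 reduces to the difference between the largest and smallest bins.

```python
import cv2
import numpy as np

def dominant_hue_features(img_bgr: np.ndarray, c2: float = 0.01):
    """Return (l6, l7): number of dominant hues and dominant hue contrast."""
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV).reshape(-1, 3).astype(np.float64)
    h, s, v = hsv[:, 0], hsv[:, 1] / 255.0, hsv[:, 2] / 255.0
    h = h[(s >= 0.2) & (v >= 0.2)]        # drop nearly white or black pixels
    hist, _ = np.histogram(h, bins=20, range=(0, 180))  # OpenCV hue is 0..179
    if hist.sum() == 0:                   # every pixel was near-white or near-black
        return 0, 0
    l6 = int((hist > c2 * hist.max()).sum())   # number of dominant hues
    l7 = int(hist.max() - hist.min())          # max_{i,j} |h_i - h_j|  (Equation 3)
    return l6, l7
```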
Step 3.3: the gray portion of the penultimate layer of the attribute sub-network in fig. 5 represents the evaluation attribute features. During training, they are late-fused with the attribute branch network to improve the feature extraction capability.
S4. A loss function is determined according to the scores of the aesthetic attributes, the aesthetic comprehensive score, and the aesthetic comprehensive classification. The loss function can be calculated with reference to Equation 1 above and is not described again here.
S5. The backbone network, the attribute regression sub-networks, and the total score regression sub-network are adjusted according to the loss function. If the value of the loss function no longer changes, training of the aesthetic evaluation model is stopped.
In the application, adjusting the feature extraction network based on the loss function makes it convenient for the network to converge rapidly, so that the trained feature extraction network can be obtained quickly.
After the image processing method provided by the application is adopted, the user can be guided in framing shots based on the aesthetic score: as shown in (1) in fig. 10, the center of the frame can be aligned with the prompted position during shooting so as to capture a better picture. As shown in (2) in fig. 10, the photos of a gallery originally arranged in order 1-11 can be optimized with the scheme of the application: the photos are reordered by their aesthetic comprehensive scores into 2, 3, 5, 8, 1, 6, 4, and 7, the three photos of poor quality (9-11) are deleted, and the user can be prompted to share the photos in instant messaging software. As shown in (3) in fig. 10, the scheme of the application can also guide beautification processing such as composition cropping and color enhancement; the picture can be cropped along the broken line shown in the figure to obtain a better-composed picture.
It will be appreciated that in order to achieve the above-described functionality, each device may comprise corresponding hardware structures and/or software modules that perform each function. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The embodiment of the application can divide the functional units of the device according to the method example, for example, each functional unit can be divided corresponding to each function, and two or more functions can be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
As shown in fig. 11, an image processing apparatus provided by the present application may include an acquisition unit 1101, a picture adjustment unit 1102, and an aesthetic evaluation parameter acquisition unit 1103.
The acquisition unit 1101 is configured to acquire attribute information of the first picture, where the attribute information characterizes brightness features and/or hue features; the picture adjustment unit 1102 is configured to preprocess the first picture, when it does not conform to the set picture size, to obtain a second picture that conforms to the set picture size, where the preprocessing comprises at least one of the following operations: filling and scaling; and the aesthetic evaluation parameter acquisition unit 1103 is configured to input the second picture and the attribute information of the first picture into the aesthetic evaluation model to obtain the value of the aesthetic evaluation parameter of the first picture, where the aesthetic evaluation model characterizes the correspondence between the second picture together with the attribute information of the first picture and the value of the aesthetic evaluation parameter of the first picture.
In an alternative, the aesthetic evaluation parameters include at least one of: aesthetic attribute parameters, aesthetic comprehensive parameters, and aesthetic classification parameters; wherein the aesthetic attribute parameters include: composition, color, and illumination.
In an alternative way, the attribute information includes at least one of:
luminance standard deviation, luminance average, brightness standard deviation, brightness average, number of dominant hues, and dominant hue contrast.
In an alternative, the aesthetic evaluation model comprises: a feature extraction module and a feature fusion module;
the feature extraction module is used for extracting aesthetic feature information of the second picture; the feature fusion module is used for fusing the aesthetic feature information of the second picture with the attribute information of the first picture to obtain the value of the aesthetic evaluation parameter of the first picture.
In an alternative manner, the feature extraction module includes: a backbone network, a classification sub-network, an attribute regression sub-network, and a total score regression sub-network; the aesthetic characteristic information of the second picture includes: aesthetic classification feature information, aesthetic attribute feature information, aesthetic comprehensive feature information;
the backbone network is used for extracting local feature information of the second picture and inputting the local feature information into the classification sub-network, the attribute regression sub-network, and the total score regression sub-network respectively; the classification sub-network is used for extracting aesthetic classification feature information of the second picture according to the local feature information; the attribute regression sub-network is used for extracting aesthetic attribute feature information of the second picture according to the local feature information; and the total score regression sub-network is used for extracting aesthetic comprehensive feature information of the second picture according to the local feature information.
In an alternative manner, the picture size is set to 800 pixels by 800 pixels.
Based on the same concept, as shown in fig. 12, an image processing apparatus 1200 is provided for the present application. The image processing apparatus 1200 may be a chip or a chip system, for example. Alternatively, the chip system in the embodiment of the present application may be formed by a chip, and may also include a chip and other discrete devices.
The image processing apparatus 1200 may include at least one processor 1210, and the image processing apparatus 1200 may also include at least one memory 1220 for storing computer programs, program instructions and/or data. Memory 1220 is coupled to processor 1210. The coupling in the embodiments of the present application is an indirect coupling or communication connection between devices, units, or modules, which may be in electrical, mechanical, or other forms for information interaction between the devices, units, or modules. Processor 1210 may operate in conjunction with memory 1220. Processor 1210 may execute computer programs stored in memory 1220. Optionally, at least one of the at least one memory 1220 may be included in the processor 1210.
The image processing apparatus 1200 may further include a transceiver 1230, and the image processing apparatus 1200 may perform information interaction with other devices through the transceiver 1230. The transceiver 1230 may be a circuit, bus, transceiver, or any other device that may be used to interact with information.
In one possible implementation manner, the image processing apparatus 1200 may be applied to the foregoing network device, and the specific image processing apparatus 1200 may be the foregoing network device, or may be an apparatus capable of supporting the foregoing network device to implement any of the foregoing embodiments. Memory 1220 holds the necessary computer programs, program instructions and/or data to implement the functions of the network device in any of the embodiments described above. The processor 1210 may execute a computer program stored in the memory 1220 to perform the method of any of the above embodiments.
The specific connection medium between the transceiver 1230, the processor 1210, and the memory 1220 is not limited in the embodiment of the application. In fig. 12, the memory 1220, the processor 1210, and the transceiver 1230 are connected by a bus, shown as a bold line; the connections between other components are merely illustrative and not limiting. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one bold line is shown in fig. 12, but this does not mean there is only one bus or one type of bus.
In an embodiment of the present application, the processor may be a general purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logic blocks disclosed in the embodiments of the present application. The general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in the processor for execution.
In the embodiment of the present application, the memory may be a nonvolatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or may be a volatile memory such as a random access memory (RAM). The memory may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to such. The memory in the embodiments of the present application may also be circuitry or any other device capable of implementing a storage function for storing computer programs, program instructions, and/or data.
Based on the above embodiments, the embodiments of the present application further provide a readable storage medium storing instructions that, when executed, cause the method performed by the electronic device in any of the above embodiments to be performed. The readable storage medium may include: various media capable of storing program codes, such as a U disk, a mobile hard disk, a read-only memory, a random access memory, a magnetic disk or an optical disk.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Claims (15)

1. An image processing method, comprising:
acquiring attribute information of a first picture, wherein the attribute information is used for representing brightness characteristics and/or tone characteristics;
when the first picture does not accord with the set picture size, preprocessing the first picture to obtain a second picture accord with the set picture size; wherein the preprocessing comprises at least one of the following operations: filling and scaling;
inputting the second picture and the attribute information of the first picture into an aesthetic evaluation model to obtain the value of an aesthetic evaluation parameter of the first picture; wherein the aesthetic evaluation model is used for characterizing the correspondence between the second picture together with the attribute information of the first picture and the value of the aesthetic evaluation parameter of the first picture.
2. The method of claim 1, wherein the aesthetic evaluation parameters include at least one of: aesthetic attribute parameters, aesthetic comprehensive parameters, and aesthetic classification parameters; wherein the aesthetic attribute parameters include: composition, color, and illumination.
3. The method according to claim 1 or 2, wherein the attribute information comprises at least one of:
luminance standard deviation, luminance average, brightness standard deviation, brightness average, number of dominant hues, and dominant hue contrast.
4. The method of claim 2, wherein the aesthetic evaluation model comprises: the feature extraction module and the feature fusion module;
the feature extraction module is used for extracting aesthetic feature information of the second picture; the feature fusion module is used for fusing the aesthetic feature information of the second picture with the attribute information of the first picture to obtain the value of the aesthetic evaluation parameter of the first picture.
5. The method of claim 4, wherein the feature extraction module comprises: a backbone network, a classification sub-network, an attribute regression sub-network, and a total score regression sub-network; and the aesthetic feature information of the second picture includes: aesthetic classification feature information, aesthetic attribute feature information, and aesthetic comprehensive feature information;
the backbone network is used for extracting local feature information of the second picture and inputting the local feature information into the classification sub-network, the attribute regression sub-network, and the total score regression sub-network respectively;
the classification sub-network is used for extracting the aesthetic classification feature information of the second picture according to the local feature information; the attribute regression sub-network is used for extracting the aesthetic attribute feature information of the second picture according to the local feature information; the total score regression sub-network is used for extracting the aesthetic comprehensive feature information of the second picture according to the local feature information.
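
The module layout of claims 4 and 5 can be sketched as follows in PyTorch. ResNet-18 is used only as a stand-in backbone; the head widths and the concatenation-based fusion are assumptions, since the claims fix only the roles of the modules.

    # Sketch of the claim-4/5 module layout; ResNet-18 is a stand-in backbone,
    # and the head widths and concatenation-based fusion are assumptions.
    import torch
    import torch.nn as nn
    import torchvision.models as models

    class AestheticModel(nn.Module):
        def __init__(self, n_picture_attrs=6, n_aesthetic_attrs=3):
            super().__init__()
            resnet = models.resnet18(weights=None)
            # Backbone network: everything up to (and including) global pooling.
            self.backbone = nn.Sequential(*list(resnet.children())[:-1])
            feat = resnet.fc.in_features
            self.cls_head = nn.Linear(feat, 2)                   # classification sub-network
            self.attr_head = nn.Linear(feat, n_aesthetic_attrs)  # attribute regression sub-network
            self.score_head = nn.Linear(feat, 1)                 # total score regression sub-network
            # Feature fusion module: fuse the three kinds of aesthetic feature
            # information with the attribute information of the first picture.
            fused_in = 2 + n_aesthetic_attrs + 1 + n_picture_attrs
            self.fusion = nn.Sequential(
                nn.Linear(fused_in, 32), nn.ReLU(), nn.Linear(32, 1))

        def forward(self, second_picture, first_picture_attrs):
            # Local feature information of the second picture, fed to all heads.
            local = self.backbone(second_picture).flatten(1)
            fused = torch.cat([self.cls_head(local),
                               self.attr_head(local),
                               self.score_head(local),
                               first_picture_attrs], dim=1)
            return self.fusion(fused)  # value of the aesthetic evaluation parameter
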
6. The method according to any one of claims 1-5, wherein the set picture size is 800 pixels by 800 pixels.
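
A minimal sketch of the preprocessing of claims 1 and 6, in which filling to a square before scaling avoids distorting the original composition; the black fill value and the interpolation mode are assumptions, not claimed.

    # Filling then scaling to the 800x800 set size of claim 6; padding to a
    # square first preserves the original composition.
    import cv2

    def preprocess(img, set_size=800):
        h, w = img.shape[:2]
        side = max(h, w)
        # Fill: pad the shorter dimension so the picture becomes square.
        top = (side - h) // 2
        left = (side - w) // 2
        padded = cv2.copyMakeBorder(img, top, side - h - top,
                                    left, side - w - left,
                                    cv2.BORDER_CONSTANT, value=(0, 0, 0))
        # Scale: resize the square picture to the set picture size.
        return cv2.resize(padded, (set_size, set_size),
                          interpolation=cv2.INTER_AREA)
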
7. An image processing apparatus, comprising:
an acquisition unit, used for acquiring attribute information of a first picture, wherein the attribute information is used for representing brightness characteristics and/or hue characteristics;
a picture adjustment unit, used for preprocessing the first picture, when the first picture does not conform to a set picture size, to obtain a second picture that conforms to the set picture size, wherein the preprocessing comprises at least one of the following operations: filling and scaling;
an aesthetic evaluation parameter acquisition unit, used for inputting the second picture and the attribute information of the first picture into an aesthetic evaluation model to obtain a value of an aesthetic evaluation parameter of the first picture, wherein the aesthetic evaluation model is used for representing the correspondence between (i) the second picture and the attribute information of the first picture and (ii) the value of the aesthetic evaluation parameter of the first picture.
8. The apparatus of claim 7, wherein the aesthetic evaluation parameters include at least one of: aesthetic attribute parameters, aesthetic comprehensive score parameters, and aesthetic classification parameters; wherein the aesthetic attribute parameters include: composition, color, and lighting.
9. The apparatus according to claim 7 or 8, wherein the attribute information comprises at least one of:
a luminance standard deviation, a luminance average value, a lightness standard deviation, a lightness average value, a number of dominant hues, and a dominant hue contrast.
10. The apparatus of claim 8, wherein the aesthetic evaluation model comprises a feature extraction module and a feature fusion module;
the feature extraction module is used for extracting aesthetic feature information of the second picture; the feature fusion module is used for fusing the aesthetic feature information of the second picture with the attribute information of the first picture to obtain the value of the aesthetic evaluation parameter of the first picture.
11. The apparatus of claim 10, wherein the feature extraction module comprises: a backbone network, a classification sub-network, an attribute regression sub-network, and a total score regression sub-network; and the aesthetic feature information of the second picture includes: aesthetic classification feature information, aesthetic attribute feature information, and aesthetic comprehensive feature information;
the backbone network is used for extracting local feature information of the second picture and inputting the local feature information into the classification sub-network, the attribute regression sub-network, and the total score regression sub-network respectively;
the classification sub-network is used for extracting the aesthetic classification feature information of the second picture according to the local feature information; the attribute regression sub-network is used for extracting the aesthetic attribute feature information of the second picture according to the local feature information; the total score regression sub-network is used for extracting the aesthetic comprehensive feature information of the second picture according to the local feature information.
12. The apparatus according to any one of claims 7-11, wherein the set picture size is 800 pixels by 800 pixels.
13. An image processing apparatus, comprising: at least one processor and memory;
the memory is used for storing a computer program or instructions;
the at least one processor is configured to execute the computer program or instructions to cause the method of any one of claims 1-6 to be performed.
14. A computer readable storage medium storing instructions which, when executed by a computer, cause the method of any one of claims 1-6 to be performed.
15. A computer program product comprising a computer program or instructions which, when run on a computer, causes the method of any one of claims 1-6 to be performed.
CN202210225189.9A 2022-03-09 2022-03-09 Image processing method and device Pending CN116778212A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210225189.9A CN116778212A (en) 2022-03-09 2022-03-09 Image processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210225189.9A CN116778212A (en) 2022-03-09 2022-03-09 Image processing method and device

Publications (1)

Publication Number Publication Date
CN116778212A true CN116778212A (en) 2023-09-19

Family

ID=87990101

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210225189.9A Pending CN116778212A (en) 2022-03-09 2022-03-09 Image processing method and device

Country Status (1)

Country Link
CN (1) CN116778212A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117689656A (en) * 2024-01-31 2024-03-12 北京电子科技学院 PC end picture management method and system
CN117689656B (en) * 2024-01-31 2024-07-09 北京电子科技学院 PC end picture management method and system


Similar Documents

Publication Publication Date Title
CN110163235B (en) Training of image enhancement model, image enhancement method, device and storage medium
CN110163810B (en) Image processing method, device and terminal
US20080089583A1 (en) Digital image processing method having an exposure correction based on recognition of areas corresponding to the skin of the photographed subject
Luo et al. A physical model-based approach to detecting sky in photographic images
US20210166015A1 (en) Certificate image extraction method and terminal device
US9305208B2 (en) System and method for recognizing offensive images
CN111739027B (en) Image processing method, device, equipment and readable storage medium
CN110188829B (en) Neural network training method, target recognition method and related products
US20050271245A1 (en) Specified object detection apparatus
JP2007047965A (en) Method and device for detecting object of digital image, and program
CN111127476A (en) Image processing method, device, equipment and storage medium
Park et al. An optimal low dynamic range image generation method using a neural network
Shutova et al. NTIRE 2023 challenge on night photography rendering
CN113012188A (en) Image fusion method and device, computer equipment and storage medium
CN117095019A (en) Image segmentation method and related device
CN112257729A (en) Image recognition method, device, equipment and storage medium
CN115115552B (en) Image correction model training method, image correction device and computer equipment
JP6949795B2 (en) Image processing equipment, image processing system, image processing method, and program
CN116778212A (en) Image processing method and device
US20220188991A1 (en) Method and electronic device for managing artifacts of image
CN116977190A (en) Image processing method, apparatus, device, storage medium, and program product
CN112749614B (en) Multimedia content identification method and device, electronic equipment and storage medium
CN114565506B (en) Image color migration method, device, equipment and storage medium
KR102371145B1 (en) Apparatus for Image Synthesis and Driving Method Thereof
CN107992853A (en) Eye detection method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication