CN111062930A - Image selection method and device, storage medium and computer equipment

Info

Publication number
CN111062930A
Authority
CN
China
Prior art keywords
image
information
aesthetic quality
probability distribution
sample image
Prior art date
Legal status
Pending
Application number
CN201911323354.9A
Other languages
Chinese (zh)
Inventor
刘刚
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201911323354.9A
Publication of CN111062930A
Legal status: Pending

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
                • G06T7/00 Image analysis
                    • G06T7/0002 Inspection of images, e.g. flaw detection
                • G06T2207/00 Indexing scheme for image analysis or image enhancement
                    • G06T2207/10 Image acquisition modality
                        • G06T2207/10004 Still image; Photographic image
                        • G06T2207/10016 Video; Image sequence
                    • G06T2207/20 Special algorithmic details
                        • G06T2207/20081 Training; Learning
                        • G06T2207/20084 Artificial neural networks [ANN]
                    • G06T2207/30 Subject of image; Context of image processing
                        • G06T2207/30168 Image quality inspection
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
                    • G06F16/50 Information retrieval of still image data
                        • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
                            • G06F16/583 Retrieval using metadata automatically derived from the content

Abstract

The application relates to an image selection method and apparatus, a storage medium and computer equipment. The method comprises: acquiring an image candidate set of business content; determining the category to which the business content belongs, and acquiring a corresponding aesthetic quality model according to that category; obtaining an aesthetic quality score of each candidate image in the image candidate set according to the aesthetic quality model; and selecting a target image according to the aesthetic quality scores of the candidate images in the image candidate set. The method and the device improve the attractiveness of the selected image.

Description

Image selection method and device, storage medium and computer equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for selecting an image, a storage medium, and a computer device.
Background
With the rapid popularization of intelligent devices such as cameras, video cameras and smart phones, the amount of visual content such as images and videos grows day by day, and finding content of high aesthetic quality within this mass of data has become a difficult problem. In recent years, with the rapid development of computer vision and pattern recognition technologies, it is hoped that a computer can simulate human perception and understanding of beauty, automatically evaluate the "beauty" of an image, and select images of high aesthetic quality based on the evaluation results.
However, the "aesthetic" of an image is an abstract concept. The traditional image selection mode generally adopts manual image feature design, trains a classifier according to extracted image features, divides the image into two types of high quality or low quality, and selects the image from the high quality image. The image selection mode mainly evaluates the distortion degree of the image (such as distortion caused by poor imaging conditions, distortion caused by lossy compression, distortion caused by channel attenuation in the image transmission process, and the like), and evaluates the image in the aspect of aesthetic quality (such as composition, color, depth of field, and the like) less, so that the selected image is not beautiful enough.
Disclosure of Invention
In view of the above, it is necessary to provide an image selection method, an image selection apparatus, a storage medium and a computer device that address the problem that images selected in the traditional way are not attractive enough.
A method for selecting an image, the method comprising:
acquiring an image candidate set of service content;
determining the category of the business content, and acquiring a corresponding aesthetic quality model according to the category of the business content;
obtaining an aesthetic quality score of each candidate image in the image candidate set according to the aesthetic quality model;
and selecting a target image according to the aesthetic quality scores of all candidate images in the image candidate set.
An apparatus for selecting an image, the apparatus comprising:
the acquisition module is used for acquiring an image candidate set of the service content;
the determining module is used for determining the category to which the business content belongs and acquiring a corresponding aesthetic quality model according to the category to which the business content belongs;
the obtaining module is further configured to obtain an aesthetic quality score of each candidate image in the image candidate set according to the aesthetic quality model;
and the selecting module is used for selecting the target image according to the aesthetic quality scores of all the candidate images in the image candidate set.
A storage medium having stored thereon computer-executable instructions which, when executed by a processor, cause the processor to perform the steps of a method of selecting an image.
A computer device comprising a memory and a processor, the memory having stored therein computer readable instructions which, when executed by the processor, cause the processor to perform the steps of a method of selecting an image.
The image selecting method, the image selecting device, the storage medium and the computer equipment acquire the image candidate set of the business content, determine the category of the business content, acquire the corresponding aesthetic quality model according to the category of the business content, acquire the aesthetic quality score of each candidate image in the image candidate set according to the aesthetic quality model, and select the target image according to the aesthetic quality score of each candidate image in the image candidate set. According to the image selection method, the aesthetic quality of the image is evaluated through the aesthetic quality model, and the evaluation of the image in the aspect of the aesthetic quality is enhanced, so that the aesthetic degree of the selected image is improved; moreover, different aesthetic quality evaluation standards are provided for different types of images, fine-grained differentiation of the aesthetic quality of the images is realized, and the accuracy of image selection is improved.
Drawings
Fig. 1 is an internal structure diagram of a terminal for implementing a method of selecting an image in one embodiment;
FIG. 2 is a flow chart illustrating a method for selecting an image according to an embodiment;
FIG. 3 is a block diagram of a portion of the structure of a selected model of an image in one embodiment;
FIG. 4 is a block diagram of a flow chart for training a selection model of an image in one embodiment;
FIG. 5 is a block diagram of a selected model of an image in one embodiment;
FIG. 6 is a block diagram showing a partial structure of a selected model of an image in another embodiment;
FIG. 7 is a flowchart illustrating a method for selecting an image according to another embodiment;
FIG. 8 is a diagram illustrating the effect of a method for selecting an image according to an embodiment;
FIG. 9 is a block diagram of an image selection system in one embodiment;
FIG. 10 is a block diagram showing an example of a structure of an image selecting apparatus;
FIG. 11 is a block diagram showing an arrangement of an image selecting apparatus according to another embodiment;
FIG. 12 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Fig. 1 is a schematic diagram of the internal structure of a terminal in one embodiment. As shown in fig. 1, the terminal includes a processor, a non-volatile storage medium, an internal memory, a network interface, a display screen and an input device connected through a system bus. The non-volatile storage medium of the terminal stores an operating system and computer readable instructions which, when executed by the processor, cause the processor to implement an image selection method. The processor provides computing and control capabilities and supports the operation of the whole terminal. The internal memory may also store computer readable instructions which, when executed by the processor, cause the processor to perform the image selection method. The network interface is used for network communication with a server or other terminals. The display screen of the terminal may be a liquid crystal display or an electronic ink display. The input device may be a touch layer covering the display screen, a key, a track ball or a touch pad arranged on the terminal housing, or an external keyboard, touch pad or mouse.
Those skilled in the art will appreciate that the configuration shown in fig. 1 is a block diagram of only a portion of the configuration relevant to the present application, and does not constitute a limitation on the terminal to which the present application is applied, and that a particular terminal may include more or less components than those shown in the drawings, or may combine certain components, or have a different arrangement of components.
In one embodiment, as shown in FIG. 2, a method of selecting an image is provided. Referring to fig. 2, this embodiment mainly exemplifies that the method is applied to the terminal in fig. 1, and the method for selecting the image specifically includes the following steps:
s202, acquiring an image candidate set of the service content.
The service content may be video content, image-text content, or other content involving images.
The type of the service content may be PGC (Professional Generated Content), i.e., content produced in the manner of traditional television programs and adjusted to the propagation characteristics of the internet; UGC (User Generated Content), i.e., content originally created by a user and displayed or provided to other users through an internet platform; PUGC (Professional User Generated Content), which combines UGC and PGC; or MCN (Multi-Channel Network) content, which joins PGC content together and, with the support of capital, ensures continuous content output.
The image candidate set is a set of candidate images for selecting a target image, and the target image may be a cover page, a drawing, and the like of business content.
Taking the target image as a cover as an example: if the service content is video content, the candidate images may be images obtained by extracting frames from the video, and the set of extracted frames is taken as the image candidate set; if the service content is image-text content, the candidate images may be the images in the image-text content, and the set of those images is taken as the image candidate set. Taking the target image as an illustration in image-text content as an example, the candidate images may be local images, and the image candidate set is generated from the local images corresponding to a received image selection instruction.
S204, determining the category of the business content, and acquiring a corresponding aesthetic quality model according to the category of the business content.
The category refers to the field to which the business content relates. Categories may be subdivided into multiple levels, such as a primary category, a secondary category and a tertiary category, with the classification refined at each successive level. For example, the primary category may be science, games, live broadcast, military, finance, sports, entertainment, real estate, fashion, education and the like; if the primary category is news, the secondary categories may be domestic news and international news; if the secondary category is domestic news, the tertiary categories may be political news, economic news, legal news, military news, science news, sports news, social news and the like. The category in this embodiment may be one of the primary, secondary or tertiary categories described above.
The category to which the service content belongs may be determined based on meta-information of the service content. The meta-information may include attribute information and marking information. The attribute information characterizes attributes of the business content itself, such as: size, format, title, author, publisher, publishing time, whether the business content is original, shooting tags (shooting device, place, time, etc.), the bit rate (of a video), a link to the cover art (in the case where the user uploads a cover), the category to which the business content belongs (identified automatically by the system or received as input), and the like. The marking information represents labels produced by manual review of the business content, such as the category to which the service content belongs (manually labeled). The category of the service content can be determined preferentially from the marking information; if the service content has no marking information, the category can be determined from the attribute information.
Wherein the aesthetic quality model is used to evaluate the aesthetic quality of each candidate image in the image candidate set. The aesthetic quality model may score each candidate image, characterizing the aesthetic quality assessment of the candidate image by a score.
Because the average level of aesthetic quality differs across categories, images of different categories may have different aesthetic quality evaluation criteria. Generally speaking, users apply a lower aesthetic standard to the live-broadcast category than to the news category, because the average aesthetic quality of news images is higher than that of live-broadcast images; the same image might be scored 9 by users in the live-broadcast category but only 5 in the news category. Therefore, different aesthetic quality evaluation criteria may be set for images of different categories, realizing fine-grained differentiation of image aesthetic quality.
Different aesthetic quality models may be set for different categories of images. Specifically, when an aesthetic quality model is trained, the sample images used for training carry an annotation category and an annotation score. Training the aesthetic quality model with sample images of each category makes the scores predicted by the model continuously approach the annotation scores of the sample images, yielding aesthetic quality model parameters corresponding to the different categories, i.e., the aesthetic quality models corresponding to different categories. The different categories are stored in association with their corresponding aesthetic quality models.
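For illustration only (this sketch is not part of the patent text), the association between categories and their trained aesthetic quality models can be kept in a simple registry; the category names and the AestheticModel placeholder below are assumptions:

    # Minimal sketch: per-category aesthetic quality models stored in
    # association with their categories. AestheticModel is a placeholder
    # standing in for a trained model.
    class AestheticModel:
        def __init__(self, category: str):
            self.category = category

    MODEL_REGISTRY = {c: AestheticModel(c) for c in ("news", "live", "sports", "games")}

    def get_aesthetic_model(category: str) -> AestheticModel:
        """Look up the aesthetic quality model for the category a content item belongs to."""
        if category not in MODEL_REGISTRY:
            raise KeyError(f"no aesthetic quality model registered for {category!r}")
        return MODEL_REGISTRY[category]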
The aesthetic quality model may be a convolutional neural network model. Because an image is stored in the computer as the value of every pixel, the full connectivity of a traditional neural network could cause an explosive growth in parameters that the computer could not update and compute. The weight sharing, local perception and downsampling of a convolutional neural network effectively solve the problem of excessive parameters, reducing the parameter scale of the model to a level the computer can accept, while the use of multiple convolutional layers makes the image features extracted by the model richer.
The feature map produced by the last convolutional layer is processed by convolution kernels of several sizes, namely 1 × 1, 3 × 3 and 5 × 5 kernels, as well as by a max pooling layer. Kernels of different sizes extract different aesthetic features over receptive fields of different sizes, and combining them ultimately fuses aesthetic features at different scales. Although this enriches the aesthetic features extracted from the image, the presence of the 5 × 5 kernel increases the number of network parameters and hence the computational complexity of the network. To avoid reducing the richness of the extracted aesthetic features, an Inception network module, a VGGNet network module and the like can be added to the convolutional neural network model. As shown in fig. 3, taking the Inception network module as an example, a 1 × 1 convolution operation can be added before the 3 × 3 and 5 × 5 convolution operations to reduce the parameter scale of the convolutional neural network model.
Suppose the aesthetic features extracted by the previous network layer have size 192 × 28 × 28, i.e., 192 feature maps of size 28 × 28. If they are processed directly by 32 convolution kernels of size 5 × 5 with stride 1, the resulting feature maps have size 32 × 24 × 24, i.e., 32 feature maps of size 24 × 24, and this convolution requires 192 × 32 × 5 × 5 + 32 = 153,632 parameters in total. If a 1 × 1 convolution is added before the 5 × 5 convolution, then after 32 kernels of size 1 × 1 the feature maps become 32 × 28 × 28, and this step requires 192 × 32 × 1 × 1 + 32 = 6,176 parameters; the subsequent 32 kernels of size 5 × 5 again yield feature maps of size 32 × 24 × 24 and require 32 × 32 × 5 × 5 + 32 = 25,632 parameters. The latter scheme therefore requires 6,176 + 25,632 = 31,808 parameters, a large reduction compared with the former. In this way the aesthetic features extracted by the convolutional neural network model are not reduced, while its parameter scale shrinks.
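The arithmetic above can be reproduced with a short computation; the following Python sketch merely re-derives the two parameter counts, assuming the usual weights-plus-biases counting for a convolution:

    def conv_params(c_in: int, c_out: int, k: int) -> int:
        """Parameters of a k x k convolution: c_in * c_out * k * k weights plus c_out biases."""
        return c_in * c_out * k * k + c_out

    # Direct 5 x 5 convolution: 192 input feature maps -> 32 output feature maps.
    direct = conv_params(192, 32, 5)                                # 153632

    # 1 x 1 bottleneck to 32 channels, followed by the 5 x 5 convolution.
    bottleneck = conv_params(192, 32, 1) + conv_params(32, 32, 5)   # 6176 + 25632 = 31808

    print(direct, bottleneck)  # 153632 31808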
S206, obtaining the aesthetic quality scores of the candidate images in the image candidate set according to the aesthetic quality model.
Wherein the aesthetic quality score is used to characterize the aesthetic quality evaluation of the candidate image by the aesthetic quality model. A plurality of preset scores (for example, 1 to 10) can be set in advance; the candidate image is input into the aesthetic quality model to obtain the probability value of each preset score for the candidate image, and the preset score corresponding to the maximum probability value is selected as the aesthetic quality score of the candidate image.
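A minimal sketch of this selection step, assuming the model output is a probability vector over the ten preset scores:

    import numpy as np

    PRESET_SCORES = np.arange(1, 11)  # preset scores 1 to 10

    def aesthetic_score(prob_dist: np.ndarray) -> int:
        """Return the preset score whose predicted probability value is largest."""
        return int(PRESET_SCORES[np.argmax(prob_dist)])

    # Example model output concentrated around a score of 8.
    probs = np.array([0, 0, 0, 0, 0.05, 0.05, 0.2, 0.5, 0.15, 0.05])
    print(aesthetic_score(probs))  # 8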
S208, selecting a target image according to the aesthetic quality scores of the candidate images in the image candidate set.
The candidate images can be ranked according to the aesthetic quality scores of the candidate images, and the target image is selected according to the ranking result. For example, a preset number of candidate images are selected as the target images according to the aesthetic quality scores from high to low. The preset number can be set according to the actual application scene.
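As an illustrative sketch (the preset number 2 below is only an example), the ranking and selection can look like this:

    def select_targets(candidates, scores, preset_number=1):
        """Rank candidate images by aesthetic quality score, highest first, and keep the top ones."""
        ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
        return [image for image, _ in ranked[:preset_number]]

    print(select_targets(["a.jpg", "b.jpg", "c.jpg"], [6, 9, 7], preset_number=2))
    # ['b.jpg', 'c.jpg']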
The image selection method provided in this embodiment obtains an image candidate set of service content, determines a category to which the service content belongs, obtains a corresponding aesthetic quality model according to the category to which the service content belongs, obtains an aesthetic quality score of each candidate image in the image candidate set according to the aesthetic quality model, and selects a target image according to the aesthetic quality score of each candidate image in the image candidate set. According to the image selection method, the aesthetic quality of the image is evaluated through the aesthetic quality model, and the evaluation of the image in the aspect of the aesthetic quality is enhanced, so that the aesthetic degree of the selected image is improved; moreover, different aesthetic quality evaluation standards are provided for different types of images, fine-grained differentiation of the aesthetic quality of the images is realized, and the accuracy of image selection is improved.
In one embodiment, the training of the aesthetic quality model includes: acquiring a sample image and annotation information of the sample image, wherein the annotation information of the sample image comprises an annotation category of the sample image and an annotation score of the sample image; and training the pre-training model according to the sample image and the labeling information of the sample image to obtain the aesthetic quality model.
The aesthetic quality model is trained based on a pre-training model. The pre-training model is a convolutional neural network model trained on an image data set, which may be the ImageNet data set, the AVA (Aesthetic Visual Analysis) data set, the PN (Photo.Net) data set, the CUHK-PQ (CUHK Photo Quality) data set and the like. The pre-training model may be an Inception-v1, Inception-v2, Inception-v3 or Inception-v4 model trained on the ImageNet data set, and the like.
The source of the sample images may be sample business content, and the sample business content may come from various business platforms, such as QQ Kandian, WeChat Top Stories, Toutiao, Yidian Zixun and the like. If the sample service content is video content, frames can be extracted from the video and the extracted images labeled to obtain sample images; if the sample service content is image-text content, the images in the image-text content are acquired and labeled to obtain sample images.
The annotation information refers to content for annotating the sample image. The annotation information can include an annotation category and an annotation score. In one embodiment, the input annotation information is received and used as the annotation information of the sample image, that is, the annotation information can be generated by the annotating personnel.
The annotation category refers to the field to which the sample image relates and can be determined with reference to the field of the sample business content. Annotation categories can be subdivided into multiple levels, such as a primary annotation category, a secondary annotation category and a tertiary annotation category, refined at each successive level. The primary annotation category may be science, games, live broadcast, military, finance, sports, entertainment, real estate, fashion, education and the like; if the primary annotation category is news, the secondary annotation categories may be domestic news and international news; if the secondary annotation category is domestic news, the tertiary annotation categories may be political news, economic news, legal news, military news, science news, sports news, social news and the like. The annotation category in this embodiment may be one of the primary, secondary or tertiary annotation categories described above.
A plurality of preset scores (e.g., 1 to 10) may be set in advance, and scores are assigned to the sample images based on consideration of index parameters of the sample images. An index parameter is a factor that influences the annotation score of a sample image. Optionally, the index parameters include at least one of aesthetic degree information, key object information, user attention information and relevancy information.
The aesthetic degree information is used to characterize the aesthetic quality of the sample image and may include color features and composition features. Color features may include brightness, contrast, saturation, color temperature, hue, color composition, color harmony and the like; composition features may include the rule of thirds, symmetric composition, frame composition, central composition, leading-line composition, diagonal composition, triangular composition, balanced composition and the like.
The key object information is used for representing the proportion of key areas in the sample image, and the key areas can be human face areas, motion areas, salient areas and the like. The salient region refers to a region related to the subject matter of the sample business content. The key object information may include the number and aggregate size of face regions, the number and aggregate size of motion regions, and the number and aggregate size of salient regions.
The user attention information is used to characterize factors in the sample image, such as moving objects, that attract the user's attention. The degree of attention can be represented by the motion features of the frames; the user attention information is extracted from optical flow results and may include motion statistics features, boundary jitter features, camera jitter features, motion entropy features and key object motion features. The motion statistics feature describes the motion intensity of a moving object; the boundary jitter feature describes the motion of a moving object relative to the camera; the camera jitter feature describes absolute camera motion along the motion direction; the motion entropy feature describes changes in the motion direction of a moving object; and the key object motion feature describes the motion characteristics of a key object.
The relevancy information is used for representing the relevancy of the sample image and the subject of the sample business content.
As shown in fig. 4, the training process of the aesthetic quality model is as follows: and dividing the sample image into a training set and a testing set, training parameters of the aesthetic quality model by using the training set, and testing the evaluation effect of the aesthetic quality model by using the testing set.
According to the image selection method provided by the embodiment, the sample image and the labeling information of the sample image are utilized to train the aesthetic quality model, the evaluation capability of the aesthetic quality model on the aesthetic quality of the image is improved, and the aesthetic quality model can be used for carrying out fine-grained differentiation on the aesthetic quality of the image.
In one embodiment, the determining of the sample image includes: acquiring at least two annotation scores of each original image; and selecting an original image with the difference value between the at least two labeling scores within a preset range as the sample image.
The original images can be determined from the sample business content: if the sample business content is video content, frames can be extracted from the video and the extracted images labeled to obtain original images; if the sample business content is image-text content, the images in the image-text content are acquired and labeled to obtain original images. Each original image can be labeled by at least two annotators. A sample image is then selected from the original images: if the difference between the annotation scores of the at least two annotators lies within a preset range, the original image is taken as a sample image. The preset range may be set according to the practical application; for example, it may be [1, 2].
When different annotators annotate the same sample image, their understanding of the annotation category and the annotation score may differ. If the deviation between annotators' scores is too large, it hinders the learning of the aesthetic quality model, so the annotation scores that different annotators give the same sample image are controlled within a certain range.
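Interpreting "within a preset range" as the absolute difference between two annotators' scores falling inside the interval, a sketch of the filtering step follows; the triple format of the inputs is an assumption for illustration:

    def select_sample_images(originals, preset_range=(1, 2)):
        """Keep original images whose two annotation scores differ by an amount inside preset_range.

        originals: iterable of (image, score_a, score_b) triples.
        """
        low, high = preset_range
        return [image for image, a, b in originals if low <= abs(a - b) <= high]

    samples = select_sample_images([("x.jpg", 9, 10), ("y.jpg", 3, 9)])
    print(samples)  # ['x.jpg'] -- y.jpg is dropped because its scores differ by 6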
By obtaining at least two labeling scores of each sample image, the distribution information of the labeling scores of the sample images can be obtained, the probability value of each labeling score relative to each preset score is further obtained, and the probability value can be used for training the aesthetic quality model.
According to the image selection method provided by the embodiment, the distribution information of the labeling scores of the sample images is obtained, and then the probability value of each labeling score relative to each preset score is obtained, and the probability value is used for training the aesthetic quality model, so that the training accuracy of the aesthetic quality model is improved.
In an embodiment, training the pre-training model according to the sample image and the annotation information of the sample image to obtain the aesthetic quality model includes: inputting the sample image and the annotation information of the sample image into the pre-training model to obtain prediction score probability distribution information of the sample image; and updating parameters of the pre-training model according to the difference between the prediction score probability distribution information and reference score probability distribution information to obtain the aesthetic quality model, wherein the reference score probability distribution information is generated according to the annotation scores of the sample image.
The prediction score probability distribution information refers to a probability value of a sample image, predicted by the aesthetic quality model, relative to each preset score for the sample image; the reference score probability distribution information is a probability value of a sample image relative to each preset score, which is obtained by calculating at least two labeling scores for the sample image.
Each sample image has at least two annotation scores, and the reference score probability distribution information of the sample image can be determined from them. For example, if the preset scores are 1 to 10 and the annotation scores are 9 and 10, the reference score probability distribution information is: the probability values of scores 1-8 are 0, and the probability values of 9 and 10 are each 50%.
And for the sample image, the aesthetic quality model outputs prediction score probability distribution information, the prediction score probability distribution information can be an N-dimensional vector, each dimension represents the probability value of the image relative to each preset score, and N refers to the number of the preset scores.
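The construction of the reference score probability distribution information can be sketched directly from the definition; N = 10 preset scores is the example used above:

    import numpy as np

    N_SCORES = 10  # preset scores 1 to 10

    def reference_distribution(annotation_scores) -> np.ndarray:
        """Turn the annotation scores of one sample image into an N-dimensional probability vector."""
        dist = np.zeros(N_SCORES)
        for score in annotation_scores:
            dist[score - 1] += 1.0
        return dist / dist.sum()

    print(reference_distribution([9, 10]))
    # probability 0 for scores 1-8, 0.5 for scores 9 and 10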
Calculating loss information according to a difference between the prediction score probability distribution information and the reference score probability distribution information, and updating parameters of the aesthetic quality model according to the loss information.
According to the image selection method provided by the embodiment, the aesthetic quality model is trained according to the difference between the prediction score probability distribution information and the reference score probability distribution information, so that the training accuracy of the aesthetic quality model is improved.
In one embodiment, the prediction score probability distribution information includes first prediction score probability distribution information, second prediction score probability distribution information, and third prediction score probability distribution information, the aesthetic quality model includes a first loss function, a second loss function, and a third loss function, the first loss function, the second loss function, and the third loss function differ in level in the aesthetic quality model, the first loss function is used to calculate a difference between the first prediction score probability distribution information and the reference score probability distribution information, the second loss function is used to calculate a difference between the second prediction score probability distribution information and the reference score probability distribution information, the third loss function is used to calculate a difference between the third prediction score probability distribution information and the reference score probability distribution information;
updating the parameters of the pre-training model according to the difference between the prediction score probability distribution information and the reference score probability distribution information to obtain the aesthetic quality model, wherein the method comprises the following steps: calculating according to the first loss function to obtain first loss information, calculating according to the second loss function to obtain second loss information, and calculating according to the third loss function to obtain third loss information; and updating parameters of the pre-training model according to the first loss information, the second loss information and the third loss information to obtain the aesthetic quality model.
The aesthetic quality model can set activation functions and loss functions at different levels; the specific levels and their number can be chosen according to the practical application. Each activation function may use a Sigmoid function, and each loss function may use a softmax function.
The first prediction score probability distribution information, the second prediction score probability distribution information and the third prediction score probability distribution information are obtained by calculating activation functions (a first activation function, a second activation function and a third activation function) of different levels in the aesthetic quality model; the first loss function, the second loss function and the third loss function are loss functions of different levels in the aesthetic quality model, and the first loss function is used for calculating the difference between the first prediction score probability distribution information and the reference score probability distribution information to obtain first loss information; the second loss function is used for calculating the difference between the second prediction score probability distribution information and the reference score probability distribution information to obtain second loss information; the third loss function is used for calculating the difference between the third prediction score probability distribution information and the reference score probability distribution information to obtain third loss information.
Specifically, corresponding weights may be set for the first loss information, the second loss information, and the third loss information, total loss information is obtained according to a weighted sum of the first loss information, the second loss information, and the third loss information, and parameters of the pre-training model are updated by using the total loss information. The parameters of the path where the first loss function is located, the path where the second loss function is located, and the path where the third loss function is located may also be updated respectively by using the first loss information, the second loss information, and the third loss information.
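A minimal sketch of the weighted-sum strategy, assuming the three pieces of loss information have already been computed; the weight values are illustrative assumptions, since the patent only states that corresponding weights are set:

    def total_loss(loss1: float, loss2: float, loss3: float,
                   weights=(0.3, 0.3, 1.0)) -> float:
        """Weighted sum of the first, second and third loss information."""
        w1, w2, w3 = weights
        return w1 * loss1 + w2 * loss2 + w3 * loss3

    print(total_loss(0.8, 0.6, 0.5))  # approximately 0.92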
As shown in fig. 5, the three loss functions softmax0, softmax1 and softmax2 sit at different levels in the aesthetic quality model and yield the first loss information, the second loss information and the third loss information, respectively. Taking the softmax0 branch in fig. 5 as an example, as shown in fig. 6, the training process of the aesthetic quality model is as follows. First, the first three layers perform convolution operations on the sample image, processing it with 7 × 7, 3 × 3 and 1 × 1 convolution kernels respectively, with an LRN layer added in between, mainly to accelerate the convergence of the network. From the fourth layer to the ninth layer, Inception modules are used. In the fourth layer the sample image is processed by different 1 × 1 convolution kernels; in the fifth layer it is processed further by 3 × 3 and 5 × 5 convolution kernels, each preceded by a 1 × 1 convolution, and the processing results of the fifth layer are merged as the first part of the features of the sample image obtained by the network. The sixth and seventh layers repeat the structure of the fourth and fifth layers to obtain the second part of the features of the sample image; the eighth and ninth layers repeat that structure once more, and a 1 × 1 convolution added as the tenth layer after the ninth layer simultaneously yields the third part of the features of the sample image. In the eleventh layer, a fully connected layer fuses all features of the sample image into one feature vector. The twelfth layer is also a fully connected layer, and the probability values of the sample image belonging to each preset score are obtained through the first activation function. Finally, the first loss function is used to calculate the first loss information.
The three loss functions may adopt different loss calculation methods, such as Cross Entropy, JS divergence (Jensen-Shannon divergence), KL divergence (Kullback-Leibler divergence), EMD (Earth Mover's Distance), Euclidean distance, and the like. In one embodiment, the first loss function may measure the difference between the predicted score probability distribution information and the reference score probability distribution information using the JS divergence, the second loss function may measure it using the EMD, and the third loss function may measure it using the Euclidean distance.
Taking the JS divergence as an example:

JS(Pr, Pg) = (1/2) KL(Pr || Pm) + (1/2) KL(Pg || Pm)

where Pm = (Pr + Pg) / 2 and the KL divergence is KL(P || Q) = Σx P(x) log(P(x) / Q(x)).

Here Pr is the predicted score probability distribution information and Pg is the reference score probability distribution information.
Taking the EMD as an example:

W(Pr, Pg) = inf γ∈Π(Pr, Pg) E(x, y)~γ [||x − y||]

where Pr is the predicted score probability distribution information, Pg is the reference score probability distribution information, and Π(Pr, Pg) is the set of all possible joint distributions whose marginals are Pr and Pg. For each possible joint distribution γ, a pair (x, y) ~ γ is sampled to obtain samples x and y, the distance ||x − y|| of the pair is computed, and the expected value E(x, y)~γ [||x − y||] of this distance under the joint distribution γ is calculated. The infimum of this expectation over all possible joint distributions is the EMD distance.
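For discrete score distributions over the ordered preset scores 1 to 10, both measures can be computed directly; the sketch below assumes unit spacing between adjacent preset scores, under which the EMD reduces to the L1 distance between cumulative distributions:

    import numpy as np

    def kl(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
        """Kullback-Leibler divergence KL(P || Q); eps avoids log(0)."""
        p, q = p + eps, q + eps
        return float(np.sum(p * np.log(p / q)))

    def js(p: np.ndarray, q: np.ndarray) -> float:
        """Jensen-Shannon divergence with the mixture Pm = (P + Q) / 2."""
        m = (p + q) / 2
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)

    def emd_1d(p: np.ndarray, q: np.ndarray) -> float:
        """EMD between 1-D distributions on equally spaced bins: L1 distance of the CDFs."""
        return float(np.sum(np.abs(np.cumsum(p) - np.cumsum(q))))

    predicted = np.array([0, 0, 0, 0, 0, 0, 0.1, 0.3, 0.4, 0.2])
    reference = np.array([0, 0, 0, 0, 0, 0, 0.0, 0.0, 0.5, 0.5])
    print(js(predicted, reference), emd_1d(predicted, reference))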
According to the image selection method provided by the embodiment, loss functions are arranged at different levels of the aesthetic quality model, parameters of the aesthetic quality model are updated according to loss information obtained by calculation of the loss functions, and the training accuracy of the aesthetic quality model is improved.
In one embodiment, the updating the parameters of the pre-training model according to the first loss information, the second loss information, and the third loss information to obtain the aesthetic quality model includes: and respectively updating parameters of the path of the first loss function, the path of the second loss function and the path of the third loss function according to the first loss information, the second loss information and the third loss information to obtain the aesthetic quality model.
Specifically, the parameters on the path where the first loss function is located are updated using the first loss information, the parameters on the path where the second loss function is located are updated using the second loss information, and the parameters on the path where the third loss function is located are updated using the third loss information.
According to the image selection method provided by the embodiment, the loss functions are arranged at different levels of the aesthetic quality model, the parameters on the path where the loss functions are located are respectively updated according to the loss information obtained by calculation of the loss functions, and the training accuracy of the aesthetic quality model is improved.
In one embodiment, the method further comprises: and regularly acquiring a newly added sample image and the labeling information of the newly added sample image, and training the pre-training model according to the newly added sample image and the labeling information of the newly added sample image to obtain the aesthetic quality model.
In the application process of the aesthetic quality model, the sample image library used for storing the sample images is continuously updated, and the newly added sample images refer to the sample images newly added into the sample image library.
Based on the service content fed back by the user, when the aesthetic degree of the service content is manually checked to be low, the image in the service content is obtained, the image is marked, and the marked image is added into the sample image library. The newly added sample image and the labeling information of the newly added sample image can be obtained at regular time, and the aesthetic quality model is optimized and updated.
The image selection method provided by the embodiment optimizes the aesthetic quality model at regular time, and ensures the prediction accuracy of the aesthetic quality model.
In one embodiment, the obtaining of the candidate set of images of the business content includes: if the service content is a video, extracting a video frame from the service content according to a frame extraction rule to obtain the image candidate set, wherein the frame extraction rule comprises: at least one of a motion rule, a user attention rule, a sparse coding rule, a sparse reconstruction rule, a keypoint rule, and a portrait rule.
In this embodiment, video scene conversion frames may be extracted. A video scene conversion frame can be identified by the degree of difference between the current frame and the previous frame, which may be quantified by at least one characteristic such as brightness, color or edges. Taking brightness as an example, if the difference between the brightness of the current frame and that of the previous frame is greater than a brightness threshold, the current frame is a video scene conversion frame; if the difference is less than or equal to the brightness threshold, the current frame is not a video scene conversion frame.
After extracting the video scene conversion frame, determining a candidate image by combining a frame extraction rule. Wherein, the frame extraction rule may include: at least one of a motion rule, a user attention rule, a sparse coding rule, a sparse reconstruction rule, a keypoint rule, and a portrait rule. The motion rule is that: selecting a frame with a motion amplitude larger than a preset amplitude; the user attention rule refers to: selecting frames with characteristics attracting attention of a user, such as frames with motion characteristics; the sparse coding rule is: selecting a frame with sparse coding amplitude larger than a preset amplitude; the sparse reconstruction rule is: selecting frames with the sparse reconstruction number smaller than the preset number; the key point rule is: selecting frames with the number of key points larger than the preset number, wherein the key points refer to points related to the theme of the service content; the portrait rule refers to: a frame with a portrait is selected.
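Taking brightness as the difference measure, a sketch of the scene-conversion test follows; the threshold value is an assumption, and frames are assumed to be grayscale numpy arrays:

    import numpy as np

    def scene_change_frames(frames, brightness_threshold: float = 30.0):
        """Indices of frames whose mean brightness differs from the previous frame by more than the threshold."""
        selected, prev = [], None
        for i, frame in enumerate(frames):
            brightness = float(np.mean(frame))
            if prev is not None and abs(brightness - prev) > brightness_threshold:
                selected.append(i)
            prev = brightness
        return selected

    # Two scenes: dark frames followed by bright frames -> a scene change at index 2.
    frames = [np.full((4, 4), v) for v in (10.0, 12.0, 90.0, 92.0)]
    print(scene_change_frames(frames))  # [2]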
The image selection method provided by the embodiment improves the aesthetic quality of the candidate image.
In one embodiment, said selecting a target image according to an aesthetic quality score of each candidate image in said image candidate set comprises: ranking each candidate image in the image candidate set according to the aesthetic quality score of the candidate image; and selecting a preset number of candidate images as the target images according to the sorting result.
The candidate images can be ranked according to the aesthetic quality scores of the candidate images, and the target image is selected according to the ranking result. For example, a preset number of candidate images are selected as the target images according to the aesthetic quality scores from high to low. The preset number can be set according to the actual application scene.
The image selection method provided by the embodiment selects the target image according to the aesthetic quality score, and improves the aesthetic degree of the selected image.
As shown in fig. 7, in a specific embodiment, the image selecting method includes the following steps:
s702, acquiring an image candidate set of service content;
s704, determining the category of the business content, and acquiring a corresponding aesthetic quality model according to the category of the business content;
s706, obtaining the aesthetic quality scores of all candidate images in the image candidate set according to the aesthetic quality model;
s708, selecting a cover according to the aesthetic quality scores of the candidate images in the image candidate set.
In one embodiment, the image selection method can be used to select the cover of business content. In the internet industry, business content is often pushed in the form of information streams, for example on QQ Kandian, WeChat Top Stories, Toutiao and Yidian Zixun. When browsing an information stream, the user first notices the title, cover and author of the business content, and as the user's first impression of the business content, the cover directly influences its click conversion rate.
As shown in fig. 8, the left side of fig. 8 is the cover selected by the conventional method, and the right side of fig. 8 is the cover selected by the present embodiment.
The image selection method provided by the embodiment can select the cover with high aesthetic quality and more focused theme, thereby improving the click conversion rate of the business content.
In one embodiment, as shown in fig. 9, the image selecting system includes: the system comprises a content production module, an uplink and downlink content interface module, a content information storage module, a content warehousing module, a video frame extraction and image-text analysis module, an aesthetic quality model, an aesthetic quality scoring module, a sample image module, a content distribution outlet module, a content consumption module, a feedback module and an auditing module. The main functions of the various modules are as follows:
wherein the content production module is configured to: and receiving the service content uploaded by the producer of the service content. The service Content may be PGC (Professional Generated Content), UGC (User Generated Content), PUGC (Professional User Generated Content), MCN (Multi-Channel Network), and the like.
An uplink and downlink content interface module for: the method comprises the steps of obtaining service content sent by a content production module and meta-information of the service content, wherein the meta-information can comprise attribute information and mark information, the attribute information is used for representing the attribute of the service content, and the mark information is used for representing the mark of the service content checked by a manual method. And sending the meta information of the service content to a content information storage module.
A content information storage module to: meta-information of the service content is stored.
A content warehousing module for: and the scheduling of the whole image selecting system is responsible. The content storage module acquires service content through the uplink and downlink content interface module and acquires meta information of the service content through the content information storage module; calling a video frame extraction and image-text analysis module to process the service content to obtain an image candidate set of the service content; calling an aesthetic quality model corresponding to the category to score each candidate image according to the category to which the business content belongs, so as to determine a target image of the business content; and sending the service content, the meta information of the service content and the target image to the content distribution module so that the content distribution outlet module can call the content in the content distribution module and output the content to the content consumption module.
The video frame extraction and image-text analysis module is used for: and performing frame extraction on the video to obtain an image candidate set of the video, or extracting the image in the image-text content to obtain the candidate set of the image-text content.
An aesthetic quality model to: evaluating the aesthetic quality of the candidate image; and obtaining images carrying the label information in the sample image module at regular time for optimization training.
An aesthetic quality scoring module to: the aesthetic quality model is serviced, building a service that can be invoked over the link.
A sample image module to: and storing the image carrying the marking information acquired from the content information storage module and the image carrying the marking information sent by the auditing module, wherein the image stored in the sample image module is used for training an aesthetic quality model.
A content distribution module to: and acquiring and storing the service content, the meta information of the service content and the target image from the content storage module.
A content distribution egress module to: the information stream is provided to the content consumption module according to the content stored in the content distribution module.
A content consumption module to: the information flow is displayed, and the content consumption module is provided with a feedback module.
A feedback module to: and receiving input feedback information, and acquiring the service content corresponding to the feedback information, wherein the feedback information is used for representing that the attractiveness of the service content is low.
An audit module to: and receiving the service content sent by the feedback module, rechecking the service content, acquiring the image in the service content after confirming that the aesthetic degree of the service content is low, marking the image, and adding the marked image into the sample image module to serve as the basis of the iterative optimization of the subsequent aesthetic quality model.
Fig. 2 and 7 are schematic flow charts of the image selection method in various embodiments. It should be understood that although the steps in the flowcharts of fig. 2 and 7 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to this order and may be performed in other orders. Moreover, at least some of the steps in fig. 2 and 7 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and their order of execution is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
As shown in fig. 10, in one embodiment, there is provided an image selecting apparatus 1000, including: an acquisition module 1002, a determination module 1004, and a selection module 1006.
An obtaining module 1002, configured to obtain an image candidate set of service content;
a determining module 1004, configured to determine a category to which the service content belongs, and obtain a corresponding aesthetic quality model according to the category to which the service content belongs;
the obtaining module 1002 is further configured to obtain an aesthetic quality score of each candidate image in the image candidate set according to the aesthetic quality model;
a selecting module 1006, configured to select a target image according to an aesthetic quality score of each candidate image in the image candidate set.
The image selecting apparatus 1000 obtains an image candidate set of the service content, determines the category to which the service content belongs, obtains a corresponding aesthetic quality model according to that category, obtains an aesthetic quality score of each candidate image in the image candidate set according to the aesthetic quality model, and selects a target image according to those scores. In this way, the aesthetic quality of images is evaluated by a dedicated aesthetic quality model, strengthening the evaluation of images in terms of aesthetic quality and thereby improving the aesthetic degree of the selected image; moreover, different aesthetic quality evaluation standards are applied to different categories of images, realizing fine-grained differentiation of image aesthetic quality and improving the accuracy of image selection.
In an embodiment, as shown in fig. 11, the image selecting apparatus 1000 further includes a training module 1008, and the obtaining module 1002 is further configured to: acquire a sample image and annotation information of the sample image, wherein the annotation information of the sample image comprises an annotation category of the sample image and an annotation score of the sample image. The training module 1008 is configured to: train the pre-training model according to the sample image and the annotation information of the sample image to obtain the aesthetic quality model, wherein the annotation score of the sample image is determined according to index parameters, and the index parameters comprise at least one of aesthetic degree information, key object information, user attention information, and relevancy information; the aesthetic degree information comprises color features and composition features; the key object information comprises the number and aggregation size of face regions, the number and aggregation size of motion regions, and the number and aggregation size of saliency regions; the user attention information comprises a motion statistics feature, a boundary jitter feature, a camera jitter feature, a motion entropy feature, and a key target motion feature; and the relevancy information comprises the relevancy of the sample image to the subject of sample service content.
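The four index parameters lend themselves to a structured annotation record. The following dataclass is purely illustrative; the field names and types are assumptions, not defined by the disclosure:

```python
from dataclasses import dataclass, field

@dataclass
class SampleAnnotation:
    """One labeled sample; the grouping mirrors the four index parameters
    above, but all field names are illustrative, not from the disclosure."""
    category: str                 # annotation category of the sample image
    score: float                  # annotation score derived from the index parameters
    aesthetics: dict = field(default_factory=dict)      # color / composition features
    key_objects: dict = field(default_factory=dict)     # face / motion / saliency region counts and sizes
    user_attention: dict = field(default_factory=dict)  # motion statistics, jitter, motion entropy
    relevancy: float = 0.0        # relevancy of the image to the subject of the sample service content
```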
In an embodiment, the obtaining module 1002 is further configured to: acquire at least two annotation scores of each original image, and select, as the sample image, an original image for which the difference between the at least two annotation scores is within a preset range.
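A minimal sketch of this consistency filter, assuming a hypothetical `max_gap` threshold standing in for the preset range:

```python
def select_consistent_samples(originals, max_gap=1.0):
    """Keep an original image only when its annotation scores agree to within
    `max_gap` (the 'preset range'); the threshold value is an assumption.
    `originals` is an iterable of (image, [score, score, ...]) pairs."""
    samples = []
    for image, scores in originals:
        if max(scores) - min(scores) <= max_gap:
            samples.append((image, sum(scores) / len(scores)))
    return samples

# Example: only "a.jpg" survives, since its two scores differ by 0.5.
print(select_consistent_samples([("a.jpg", [7.5, 8.0]), ("b.jpg", [3.0, 9.0])]))
```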
In one embodiment, the training module 1008 is further configured to: input the sample image and the annotation information of the sample image into the pre-training model to obtain prediction score probability distribution information of the sample image; and update parameters of the pre-training model according to the difference between the prediction score probability distribution information and reference score probability distribution information to obtain the aesthetic quality model, wherein the reference score probability distribution information is generated according to the annotation score of the sample image.
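A sketch of one such update step in PyTorch, assuming the model outputs a softmax probability distribution over score bins; KL divergence is used here as one plausible difference measure (claim 9 below also names earth mover's distance and Euclidean distance):

```python
import torch
import torch.nn.functional as F

def training_step(model, images, ref_dist, optimizer):
    """One parameter update of the pre-training model: the predicted score
    probability distribution is pulled toward the reference distribution
    generated from the annotation scores."""
    pred_dist = model(images)                     # shape: (batch, n_score_bins)
    loss = F.kl_div(pred_dist.log(), ref_dist, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```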
In one embodiment, the prediction score probability distribution information includes first, second, and third prediction score probability distribution information, and the aesthetic quality model includes a first loss function, a second loss function, and a third loss function located at different levels of the aesthetic quality model. The first loss function is used to calculate the difference between the first prediction score probability distribution information and the reference score probability distribution information, the second loss function is used to calculate the difference between the second prediction score probability distribution information and the reference score probability distribution information, and the third loss function is used to calculate the difference between the third prediction score probability distribution information and the reference score probability distribution information. The training module 1008 is further configured to: calculate first loss information according to the first loss function, second loss information according to the second loss function, and third loss information according to the third loss function; and update parameters of the pre-training model according to the first loss information, the second loss information, and the third loss information to obtain the aesthetic quality model.
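One common way to place loss functions at different levels is a backbone with score heads branching off at several depths, in the spirit of GoogLeNet-style auxiliary outputs. The sketch below is an illustration under stated assumptions: the layer sizes, the 10-bin score distribution, using KL divergence for all three heads, and summing the losses are all hypothetical choices, not the patented architecture (claim 9 pairs the heads with a divergence, an earth mover's distance, and a Euclidean distance instead):

```python
import torch.nn as nn
import torch.nn.functional as F

class ThreeHeadAestheticNet(nn.Module):
    """Backbone with score-distribution heads at three different depths;
    layer sizes and the 10-bin distribution are assumptions."""
    def __init__(self, bins=10):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.stage3 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head1 = nn.Linear(16, bins)   # first (shallow) prediction head
        self.head2 = nn.Linear(32, bins)   # second (middle) prediction head
        self.head3 = nn.Linear(64, bins)   # third (deep) prediction head

    def forward(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        def dist(f, head):
            return F.softmax(head(self.pool(f).flatten(1)), dim=1)
        return dist(f1, self.head1), dist(f2, self.head2), dist(f3, self.head3)

def total_loss(preds, ref_dist):
    # One loss per head against the same reference distribution; KL is used
    # for all three for brevity, and summation is an assumed combination rule.
    def kl(pred):
        return F.kl_div(pred.log(), ref_dist, reduction="batchmean")
    return kl(preds[0]) + kl(preds[1]) + kl(preds[2])
```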
In one embodiment, the training module 1008 is further configured to: update, according to the first loss information, the second loss information, and the third loss information respectively, the parameters on the path of the first loss function, the path of the second loss function, and the path of the third loss function, to obtain the aesthetic quality model.
In an embodiment, the obtaining module 1002 is further configured to: periodically acquire a newly added sample image and annotation information of the newly added sample image; the training module 1008 is further configured to: train the pre-training model according to the newly added sample image and its annotation information to obtain the aesthetic quality model.
In an embodiment, the obtaining module 1002 is further configured to: if the service content is a video, extract video frames from the service content according to a frame extraction rule to obtain the image candidate set, wherein the frame extraction rule comprises at least one of a motion rule, a user attention rule, a sparse coding rule, a sparse reconstruction rule, a keypoint rule, and a portrait rule.
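As one illustration, the motion rule could be approximated by frame differencing; the scoring heuristic below is an assumption, not the disclosed rule:

```python
import cv2
import numpy as np

def motion_rule_frames(video_path, top_k=5):
    """One plausible reading of the 'motion rule': score each frame by mean
    absolute difference from the previous frame and keep the top_k
    highest-motion frames as candidates."""
    cap = cv2.VideoCapture(video_path)
    prev, scored = None, []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is not None:
            scored.append((float(np.mean(cv2.absdiff(gray, prev))), frame))
        prev = gray
    cap.release()
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [frame for _, frame in scored[:top_k]]
```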
In an embodiment, the selecting module 1006 is further configured to: rank the candidate images in the image candidate set according to their aesthetic quality scores, and select a preset number of candidate images as the target images according to the ranking result.
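A minimal sketch of this ranking-and-selection step; the function and parameter names are illustrative:

```python
def select_targets(candidates, scores, preset_number=1):
    """Rank candidate images by aesthetic quality score (descending) and
    keep the top preset_number as the target images."""
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in ranked[:preset_number]]

# Example: with scores [0.4, 0.9, 0.7], the second candidate is selected.
print(select_targets(["img_a", "img_b", "img_c"], [0.4, 0.9, 0.7]))
```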
FIG. 12 is a diagram illustrating an internal structure of a computer device in one embodiment. The computer device may specifically be the terminal in fig. 1. As shown in fig. 12, the computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program which, when executed by the processor, causes the processor to implement the method of selecting an image. The internal memory may also store a computer program, which when executed by the processor, causes the processor to perform a method for selecting an image.
Those skilled in the art will appreciate that the architecture shown in fig. 12 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, the image selecting apparatus provided in the present application may be implemented in the form of a computer program, and the computer program may be run on a computer device as shown in fig. 12. The memory of the computer device may store the various program modules constituting the image selecting apparatus, such as the obtaining module 1002, the determining module 1004, and the selecting module 1006 shown in fig. 10. The computer program constituted by these program modules causes the processor to execute the steps of the image selecting method of the embodiments of the present application described in this specification.
In an embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the above-mentioned image selection method. Here, the steps of the image selecting method may be steps in the image selecting methods of the above embodiments.
In an embodiment, a storage medium is provided, in which a computer program is stored, which, when being executed by a processor, causes the processor to carry out the steps of the above-mentioned image selection method. Here, the steps of the image selecting method may be steps in the image selecting methods of the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium; when executed, the program can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is specific and detailed, but they should not therefore be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (15)

1. A method for selecting an image, the method comprising:
acquiring an image candidate set of service content;
determining the category of the business content, and acquiring a corresponding aesthetic quality model according to the category of the business content;
obtaining an aesthetic quality score of each candidate image in the image candidate set according to the aesthetic quality model;
and selecting a target image according to the aesthetic quality scores of all candidate images in the image candidate set.
2. The method of claim 1, wherein the aesthetic quality model is obtained by training a pre-training model, and the pre-training model is a convolutional neural network model trained on an image data set.
3. The method of claim 2, wherein the aesthetic quality model is trained by:
acquiring a sample image and annotation information of the sample image, wherein the annotation information of the sample image comprises an annotation category of the sample image and an annotation score of the sample image;
and training the pre-training model according to the sample image and the annotation information of the sample image to obtain the aesthetic quality model.
4. The method of claim 3, wherein the annotation score of the sample image is determined according to index parameters, the index parameters comprising at least one of aesthetic degree information, key object information, user attention information, and relevancy information.
5. The method of claim 4, wherein the aesthetic degree information comprises color features and composition features; the key object information comprises the number and aggregation size of face regions, the number and aggregation size of motion regions, and the number and aggregation size of saliency regions; the user attention information comprises a motion statistics feature, a boundary jitter feature, a camera jitter feature, a motion entropy feature, and a key target motion feature; and the relevancy information comprises the relevancy of the sample image to the subject of sample service content.
6. The method of claim 3, wherein the sample image is determined in a manner comprising:
acquiring at least two annotation scores of each original image;
and selecting, as the sample image, an original image for which the difference between the at least two annotation scores is within a preset range.
7. The method of claim 3, wherein the training the pre-training model according to the sample image and the annotation information of the sample image to obtain the aesthetic quality model comprises:
inputting the sample image and the labeling information of the sample image into the pre-training model to obtain the prediction score probability distribution information of the sample image;
updating parameters of the pre-training model according to the difference between the prediction score probability distribution information and reference score probability distribution information to obtain the aesthetic quality model, wherein the reference score probability distribution information is generated according to the annotation score of the sample image.
8. The method of claim 7, wherein the prediction score probability distribution information comprises first prediction score probability distribution information, second prediction score probability distribution information, and third prediction score probability distribution information; the aesthetic quality model includes a first loss function, a second loss function, and a third loss function located at different levels in the aesthetic quality model; the first loss function is used to calculate the difference between the first prediction score probability distribution information and the reference score probability distribution information, the second loss function is used to calculate the difference between the second prediction score probability distribution information and the reference score probability distribution information, and the third loss function is used to calculate the difference between the third prediction score probability distribution information and the reference score probability distribution information;
and wherein the updating parameters of the pre-training model according to the difference between the prediction score probability distribution information and the reference score probability distribution information to obtain the aesthetic quality model comprises:
calculating according to the first loss function to obtain first loss information, calculating according to the second loss function to obtain second loss information, and calculating according to the third loss function to obtain third loss information;
and updating parameters of the pre-training model according to the first loss information, the second loss information and the third loss information to obtain the aesthetic quality model.
9. The method of claim 8,
the first loss function measures the difference between the first prediction score probability distribution information and the reference score probability distribution information using a divergence measure; the second loss function measures the difference between the second prediction score probability distribution information and the reference score probability distribution information using an earth mover's distance; and the third loss function measures the difference between the third prediction score probability distribution information and the reference score probability distribution information using a Euclidean distance.
10. The method of claim 8, wherein updating parameters of the pre-trained model based on the first loss information, the second loss information, and the third loss information to obtain the aesthetic quality model comprises:
and respectively updating parameters of the path of the first loss function, the path of the second loss function and the path of the third loss function according to the first loss information, the second loss information and the third loss information to obtain the aesthetic quality model.
11. The method of claim 3, further comprising:
periodically acquiring a newly added sample image and annotation information of the newly added sample image;
and training the aesthetic quality model according to the newly added sample image and the labeling information of the newly added sample image.
12. The method of claim 1,
the acquiring of the candidate set of images of the service content includes: if the service content is a video, extracting a video frame from the service content according to a frame extraction rule to obtain the image candidate set, wherein the frame extraction rule comprises: at least one of a motion rule, a user attention rule, a sparse coding rule, a sparse reconstruction rule, a keypoint rule, and a portrait rule; or
the selecting a target image according to the aesthetic quality scores of the candidate images in the image candidate set comprises: ranking the candidate images in the image candidate set according to their aesthetic quality scores; and selecting a preset number of candidate images as the target images according to the ranking result.
13. An apparatus for selecting an image, the apparatus comprising:
the acquisition module is used for acquiring an image candidate set of the service content;
the determining module is used for determining the category to which the business content belongs and acquiring a corresponding aesthetic quality model according to the category to which the business content belongs;
the obtaining module is further configured to obtain an aesthetic quality score of each candidate image in the image candidate set according to the aesthetic quality model;
and the selecting module is used for selecting the target image according to the aesthetic quality scores of all the candidate images in the image candidate set.
14. A computer device comprising a memory and a processor, the memory having stored therein a computer program that, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1 to 12.
15. A storage medium having stored thereon computer-executable instructions that, when executed by a processor, cause the processor to perform the steps of the method of any one of claims 1 to 12.
CN201911323354.9A 2019-12-20 2019-12-20 Image selection method and device, storage medium and computer equipment Pending CN111062930A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911323354.9A CN111062930A (en) 2019-12-20 2019-12-20 Image selection method and device, storage medium and computer equipment


Publications (1)

Publication Number Publication Date
CN111062930A true CN111062930A (en) 2020-04-24

Family

ID=70302623

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911323354.9A Pending CN111062930A (en) 2019-12-20 2019-12-20 Image selection method and device, storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN111062930A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111833323A (en) * 2020-07-08 2020-10-27 哈尔滨市科佳通用机电股份有限公司 Image quality judgment method for task-divided rail wagon based on sparse representation and SVM (support vector machine)
CN112561334A (en) * 2020-12-16 2021-03-26 咪咕文化科技有限公司 Grading method and device for reading object, electronic equipment and storage medium
CN112950579A (en) * 2021-02-26 2021-06-11 北京金山云网络技术有限公司 Image quality evaluation method and device and electronic equipment
CN113179421A (en) * 2021-04-01 2021-07-27 影石创新科技股份有限公司 Video cover selection method and device, computer equipment and storage medium
CN113179421B (en) * 2021-04-01 2023-03-10 影石创新科技股份有限公司 Video cover selection method and device, computer equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40022142; Country of ref document: HK)
SE01 Entry into force of request for substantive examination