CN112860941A - Cover recommendation method, device, equipment and medium

Cover recommendation method, device, equipment and medium

Info

Publication number: CN112860941A
Authority: CN (China)
Prior art keywords: picture, score, video, feature, determining
Legal status: Pending
Application number: CN202110156055.1A
Other languages: Chinese (zh)
Inventors: 陈德健, 蔡佳然
Current Assignee: Bigo Technology Singapore Pte Ltd
Original Assignee: Bigo Technology Singapore Pte Ltd
Application filed by Bigo Technology Singapore Pte Ltd
Priority to CN202110156055.1A
Publication of CN112860941A

Classifications

    • G06F16/735: Information retrieval of video data; querying; filtering based on additional data, e.g. user or group profiles
    • G06F16/7328: Information retrieval of video data; querying; query formulation; query by example, e.g. a complete video frame or video sequence
    • G06F16/739: Information retrieval of video data; querying; presentation of query results in the form of a video summary, e.g. a video sequence, a composite still image or synthesized frames
    • G06F16/7847: Information retrieval of video data; retrieval characterised by metadata automatically derived from the content, using low-level visual features of the video content

Abstract

The invention discloses a cover recommendation method, device, equipment and medium. At least two feature extraction models are trained in advance, and the first feature each model extracts from a picture is different. After the pictures corresponding to the video frames in a video to be recommended are acquired, for each picture, first scores corresponding to its different first features are obtained based on the pre-trained feature extraction models, a comprehensive score of the picture is determined from these first scores, and whether the picture is the cover of the video to be recommended is determined according to whether the comprehensive score meets a preset recommendation condition. A picture whose content is targeted and representative is thus comprehensively selected as the cover according to the different first features of the picture, improving the recommended video frame effect and the user experience.

Description

Cover recommendation method, device, equipment and medium
Technical Field
The invention relates to the technical field of the internet, and in particular to a cover recommendation method, device, equipment and medium.
Background
With the development of internet technology, mobile multimedia traffic has grown rapidly in recent years, and new entertainment forms such as live streaming and short video have attracted wide attention and participation. Short videos, with their strong interactivity, low production threshold and fast propagation, have drawn many users into producing and browsing them, and how to stand out among the many short videos has become a research focus for short video producers. A short video cover briefly outlines the content of the video to the user, quickly draws the user's attention, and plays an important role in improving the video's click-through rate and browsing volume.
Existing cover recommendation technology mainly selects a certain video frame of the short video, randomly or at a fixed position, as its cover. Although this method is simple and efficient, the selected cover has no pertinence to the content of the short video; the recommended cover effect is poor, which greatly hinders the promotion and propagation of the video.
Disclosure of Invention
The embodiments of the invention provide a cover recommendation method, device, equipment and medium, which solve the problems that an existing recommended short video cover is not targeted to the content of the short video and the recommended video frame effect is poor.
The embodiment of the invention provides a cover recommendation method, which comprises the following steps:
acquiring a picture corresponding to a video frame in a video to be recommended;
respectively acquiring first scores corresponding to different first features of the picture based on at least two feature extraction models trained in advance; wherein the first feature extracted by each feature extraction model from any one picture is different;
determining a comprehensive score of the picture according to each first score of the picture;
and if the comprehensive score is determined to meet the preset recommendation condition, determining the picture as the cover of the video to be recommended.
The embodiment of the invention provides a cover recommendation device, which comprises:
the acquisition unit is used for acquiring a picture corresponding to a video frame in a video to be recommended;
the first processing unit is used for acquiring first scores corresponding to different first features of the picture based on at least two pre-trained feature extraction models respectively; wherein the first feature extracted by each feature extraction model from any one picture is different;
the determining unit is used for determining a comprehensive score of the picture according to each first score of the picture;
and the second processing unit is used for determining the picture as the cover of the video to be recommended if the comprehensive score is determined to meet the preset recommendation condition.
An embodiment of the present invention provides an electronic device, where the electronic device at least includes a processor and a memory, and the processor is configured to implement the steps of any one of the cover recommendation methods described above when executing a computer program stored in the memory.
The embodiment of the invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any one of the above cover recommendation methods. In the cover recommendation process, at least two feature extraction models are trained in advance, and each feature extraction model extracts a different first feature from pictures. After the pictures corresponding to the video frames in the video to be recommended are obtained, for each picture, first scores corresponding to its different first features are obtained based on the pre-trained feature extraction models, a comprehensive score of the picture is determined from the first scores, and whether the picture is the cover of the video to be recommended is determined according to whether the comprehensive score meets a preset recommendation condition. A picture with targeted and representative content is thus comprehensively selected as the cover according to the different first features of the picture, improving the recommended video frame effect and the user experience.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that those skilled in the art can obtain other drawings based on them without creative effort.
FIG. 1 is a diagram illustrating a cover recommendation process according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating a cover recommendation process according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a cover recommendation device according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail below with reference to the attached drawings. It should be understood that the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
In order to improve accuracy and effect of a cover page of a recommended video, embodiments of the present invention provide a cover page recommendation method, apparatus, device, and medium.
Example 1:
fig. 1 is a schematic diagram of a cover recommendation process provided in an embodiment of the present invention, where the process includes:
s101: and acquiring a picture corresponding to a video frame in the video to be recommended.
The cover recommendation method provided by the embodiment of the invention is applied to an electronic device, which may be a smart device such as a mobile terminal, or a server.
In an actual application scene, after receiving a processing request for performing cover page recommendation on a certain video, an electronic device performing cover page recommendation determines the video as a video to be recommended, and performs corresponding processing by using the cover page recommendation method provided by the embodiment of the invention based on the video to be recommended.
The electronic device performing cover recommendation may receive a processing request for cover recommendation of a certain video in at least one of the following ways:
in the first case, when a user needs to process a certain video, a service processing request may be input to the intelligent device, and after receiving the service processing request for the video, the intelligent device may send a processing request for performing cover page recommendation for the video to the electronic device performing cover page recommendation.
And in the second situation, when the intelligent device detects that a certain video is recorded, a processing request for performing cover page recommendation on the video can be actively generated and sent to the electronic device for performing cover page recommendation.
And thirdly, when the user needs to determine the cover of a certain video, a cover processing request can be input into the intelligent device, and after the intelligent device receives the cover processing request of the video, the intelligent device can send a processing request for cover recommendation of the video to the electronic device for cover recommendation.
The electronic device for making cover page recommendations may be the same as or different from the smart device.
In a specific implementation process, after receiving a processing request for performing cover page recommendation on a certain video, an electronic device performing cover page recommendation determines the video as a video to be recommended, and obtains a picture corresponding to a video frame in the video to be recommended.
The pictures corresponding to the video frames in the video to be recommended can be obtained by the electronic device performing cover page recommendation according to the video to be recommended, and can also be sent by other electronic devices.
In the embodiment of the invention, part of the video frames may be extracted from the video to be recommended according to a preset frame extraction strategy and converted into corresponding pictures, or all the video frames in the video to be recommended may be converted into corresponding pictures in a full-frame extraction mode, as in the sketch below.
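A minimal sketch of the frame extraction step, assuming OpenCV as the image library and a fixed sampling interval as the preset frame extraction strategy; neither assumption is mandated by the patent:

    import cv2

    def extract_frames(video_path: str, step: int = 30):
        """Convert every step-th video frame into a picture (step=1 is full-frame extraction)."""
        pictures = []
        cap = cv2.VideoCapture(video_path)
        index = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if index % step == 0:  # assumed strategy: fixed sampling interval
                pictures.append(frame)
            index += 1
        cap.release()
        return pictures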
S102: respectively acquiring first scores corresponding to different first features of the picture based on at least two feature extraction models trained in advance; wherein the first feature extracted by each feature extraction model from any one picture is different.
In an actual application scenario, to implement cover recommendation for a video to be recommended, after a picture corresponding to a video frame in the video is obtained, a trained cover recommendation model can be used to obtain a score corresponding to a certain type of first feature of the picture, such as brightness or color saturation, and the cover of the video to be recommended is then determined according to whether that score meets a preset recommendation condition. This approach can select the picture with the most prominent single first feature as the cover, but because the cover is recommended from only one first feature and the different first features of the picture are not combined, a picture whose other first features are very poor is easily determined as the cover, and factors such as the picture's other first features and composition aesthetics are ignored, reducing the accuracy and effect of the determined cover.
For example, if the cover is determined only according to the definition of the picture, it is likely that a certain illegal picture with higher definition is determined as the cover, so that the cover of the video to be recommended is likely to cause adverse effects.
In order to solve the above problem, in the embodiment of the present invention, at least two feature extraction models are trained in advance, and each feature extraction model extracts a different first feature for any one picture. After the picture corresponding to the video frame in the video to be recommended is acquired based on the above embodiment, the scores (for convenience of description, recorded as first scores) corresponding to different first features of the picture are acquired through at least two feature extraction models which are trained in advance respectively. And subsequently, corresponding processing is carried out based on the first scores corresponding to different first characteristics of the picture.
S103: and determining a comprehensive score of the picture according to each first score of the picture.
In order to facilitate subsequent determination of whether the picture is a cover of a video to be recommended, after first scores corresponding to different first characteristics of the picture are obtained, corresponding processing can be performed on each first score corresponding to the picture, and a comprehensive score is determined. Whether the picture is a cover page of the video to be recommended or not can be determined based on the comprehensive score.
In one possible implementation, a weight value corresponding to each first feature may be preset. After the first scores corresponding to the different first features of the picture are obtained, the weighted first score for each first feature is determined according to that feature's weight value and corresponding first score, and the comprehensive score of the picture is determined as the sum of the weighted first scores of all the first features, as in the sketch below.
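A minimal sketch of the weighted combination, assuming the first scores and preset weight values are given as dictionaries keyed by feature name; the names and values are illustrative only, not from the patent:

    def comprehensive_score(first_scores: dict, weights: dict) -> float:
        """Sum of each first score weighted by the preset weight of its first feature."""
        return sum(weights[feature] * score for feature, score in first_scores.items())

    # Usage with hypothetical feature names and weight values:
    total = comprehensive_score(
        {"face": 0.8, "quality": 0.6, "aesthetic": 0.7},
        {"face": 0.4, "quality": 0.3, "aesthetic": 0.3},
    )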
S104: and if the comprehensive score is determined to meet the preset recommendation condition, determining the picture as the cover of the video to be recommended.
In order to accurately determine the cover of the video to be recommended, recommendation conditions are preset in the embodiment of the invention. The preset recommendation condition may be that the comprehensive score of the picture is the maximum among the comprehensive scores of the pictures corresponding to the video to be recommended, or that it is greater than a preset threshold. After the comprehensive score is obtained based on the above embodiments, whether it meets the preset recommendation condition is determined. Specifically, if the comprehensive score meets the preset recommendation condition, the picture is determined as the cover of the video to be recommended; if it does not, the picture is not suitable as the cover, and the next picture corresponding to the video to be recommended is obtained.
In the cover recommendation process, at least two feature extraction models are trained in advance, and the first feature each feature extraction model can extract from a picture is different. After the pictures corresponding to the video frames in the video to be recommended are obtained, for each picture, first scores corresponding to its different first features are obtained based on the pre-trained feature extraction models, the comprehensive score of the picture is determined from these first scores, and whether the picture is the cover of the video to be recommended is determined according to whether the comprehensive score meets the preset recommendation condition. A picture with targeted and representative content is thus comprehensively selected as the cover according to the different first features of the picture, improving the recommended video frame effect and the user experience.
Example 2:
in order to accurately determine the cover page of the video to be recommended, on the basis of the above embodiment, in an embodiment of the present invention, the determining that the comprehensive score meets a preset recommendation condition includes:
if the comprehensive score is determined to be the maximum value of the comprehensive scores of all the pictures corresponding to the video to be recommended, determining that the comprehensive score meets a preset recommendation condition; and/or
And if the comprehensive score is determined to be larger than a preset threshold value, determining that the comprehensive score meets a preset recommendation condition.
In the embodiment of the present invention, the preset recommendation condition may be that the comprehensive score is a maximum value of the comprehensive scores of each picture corresponding to the video to be recommended, or that the comprehensive score is greater than a preset threshold. Therefore, the determination that the comprehensive score meets the preset recommendation condition mainly comprises the following ways:
in a first mode, in order to ensure that a picture with the most targeted and representative content can be selected from each picture corresponding to the video to be recommended comprehensively as a cover page, a preset recommendation condition can be determined as the maximum value of the comprehensive scores of each picture corresponding to the video to be recommended. After the comprehensive score of each picture corresponding to the video to be recommended is obtained based on the above embodiment, the maximum value of each comprehensive score is obtained, and if the maximum value meets the preset recommendation condition, the picture corresponding to the maximum value is determined as the cover of the video to be recommended.
In a second mode, in order to ensure the quality of the cover page of the determined video to be recommended, a threshold value is preset in the embodiment of the invention, and the preset recommendation condition is determined as that the comprehensive score is larger than the preset threshold value. After the comprehensive score of the picture corresponding to the video to be recommended is obtained based on the above embodiment, the comprehensive score is compared with a preset threshold value. If the comprehensive score is determined to be larger than a preset threshold value, the picture can be used as a cover of the video to be recommended, the comprehensive score is determined to meet a preset recommendation condition, and the picture is determined to be the cover of the video to be recommended; and if the comprehensive score is determined to be not greater than the preset threshold value, which indicates that the picture can not be used as a cover of the video to be recommended, obtaining a next picture corresponding to the video to be recommended.
In the embodiment of the invention, whether the comprehensive score of each picture corresponding to the video to be recommended is greater than the preset threshold may first be determined, and then any picture whose comprehensive score is greater than the preset threshold may be determined as the cover of the video to be recommended. Alternatively, the pictures may be checked randomly or sequentially, and as soon as a picture's comprehensive score is greater than the preset threshold, that picture is determined as the cover of the video to be recommended.
Different threshold values can be set for different scenarios: to make the determined cover of the video to be recommended as accurate as possible, the threshold can be set larger; to avoid the situation in which no cover of the video to be recommended can be determined, the threshold can be set smaller. In a specific implementation, it can be set flexibly according to actual requirements and is not specifically limited here.
In a third mode, because it may happen that no picture corresponding to the video to be recommended has a comprehensive score greater than the preset threshold, the recommendation conditions of the first and second modes may be combined in order to still determine a cover: a picture is determined as the cover of the video to be recommended when it meets either of the preset recommendation conditions. In this way, when it is determined that the comprehensive score of every picture is not greater than the preset threshold, the picture with the maximum comprehensive score can be determined as the cover of the video to be recommended, as in the sketch below.
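A minimal sketch of the combined condition (the third mode), assuming candidates is a list of (picture, comprehensive_score) pairs and the threshold is a preset value chosen per scenario:

    def pick_cover(candidates, threshold: float = 0.6):
        """Return a picture whose comprehensive score exceeds the threshold;
        otherwise fall back to the picture with the maximum comprehensive score."""
        for picture, score in candidates:  # sequential check; could also be random
            if score > threshold:
                return picture
        return max(candidates, key=lambda c: c[1])[0]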
Example 3:
in order to ensure that a video cover meets a network culture requirement, on the basis of the foregoing embodiments, in an embodiment of the present invention, after obtaining a picture corresponding to a video frame in a video to be recommended, before obtaining first scores corresponding to different first features of the picture based on at least two feature extraction models that are trained in advance, the method further includes:
determining whether the picture is illegal through an illegal detection model;
and if the picture is determined not to be in violation, performing the subsequent steps of obtaining first scores corresponding to different first features of the picture based on at least two pre-trained feature extraction models respectively.
With the continuous development of internet technology, the emergence of short video software and the mass spread of short videos have greatly enriched people's cultural life. As a quick presentation of the short video content, the cover can briefly present the content to the user, attract the user's attention, and thereby improve the video click-through rate. However, malicious users can also attract the attention of other users by using illegal pictures in a video as its cover, which greatly harms the network culture. Therefore, to avoid the adverse effects of illegal pictures, in the embodiment of the present invention a violation detection model is trained in advance, so that whether a picture is illegal can be determined through the model. After the picture corresponding to a video frame in the video to be recommended is acquired based on the above embodiment, whether illegal content exists in the picture can be determined through the violation detection model, so as to determine whether the picture is an illegal picture.
In a possible implementation, the violation detection model may output a probability value representing the likelihood that the picture is illegal; the higher the probability value, the more likely the picture violates the rules. To determine this accurately, a violation threshold is preset, and whether the picture is illegal is subsequently determined from the preset violation threshold and the obtained probability value.
For convenience in determining whether the picture is illegal, the probability value lies within a preset value range, such as (0, 1).
In another possible implementation, the violation detection model may also directly output an identification value of whether the picture violates the rule, and whether the picture violates the rule may be determined directly according to the identification value output by the violation detection model. For example, the output flag value is "1" indicating that the picture is illegal, and the output flag value is "0" indicating that the picture is not illegal.
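A minimal sketch of the two decision modes described above; the model interface and the threshold value 0.5 are assumptions, not specified by the patent:

    VIOLATION_THRESHOLD = 0.5  # assumed value; the patent leaves the threshold unspecified

    def is_violating_by_probability(model, picture) -> bool:
        """Mode one: compare the model's probability output with the preset violation threshold."""
        return model.predict_probability(picture) >= VIOLATION_THRESHOLD  # hypothetical interface

    def is_violating_by_flag(model, picture) -> bool:
        """Mode two: the model directly outputs an identification value, 1 = illegal, 0 = not illegal."""
        return model.predict_flag(picture) == 1  # hypothetical interface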
When the picture is determined not to be in violation, first scores corresponding to its different first features are acquired based on the at least two pre-trained feature extraction models, the comprehensive score of the picture is determined based on each first score, and whether the picture is the cover of the video to be recommended is determined according to whether the comprehensive score meets the preset recommendation condition. When the picture is determined to contain illegal content, the picture does not meet the network culture requirement; it is directly determined not to be the cover of the video to be recommended, and the next picture corresponding to the video to be recommended is obtained.
In a possible implementation manner, in order to timely remind a user of processing a video frame corresponding to an illegal picture in a video to be recommended, after the picture is determined to be illegal, prompt information of illegal content of the video to be recommended can be generated, and intelligent equipment is controlled to output the prompt information. The user can determine that the illegal content exists in the video to be recommended in time according to the prompt information output by the intelligent device, so that the illegal content in the video to be recommended can be deleted or shielded in time.
In another possible implementation, in order to accurately remind the user of the illegal content of the video to be recommended, after the picture is determined to be illegal, a video frame (for convenience of explanation, recorded as an illegal video frame) corresponding to the picture in the video to be recommended is determined, prompt information of the illegal content of the video to be recommended and the position of the illegal video frame in the video to be recommended is generated, and the intelligent device is controlled to output the prompt information. According to the prompt information output by the intelligent equipment, the user can timely determine that the illegal content exists in the video to be recommended and the position of the illegal content in the video to be recommended, so that the illegal video frames in the video to be recommended can be timely and accurately deleted or shielded.
The prompt information output by the smart device may be in audio form, such as the voice broadcast "the current video contains illegal content", or may be displayed as text on a display interface, for example the message "the current video contains illegal content in seconds 3 to 5" or a pop-up prompt. Of course, the two output modes may also be combined, for example broadcasting the audio prompt while displaying the text prompt on the display interface. This can be set flexibly according to actual requirements and is not limited here.
In order to obtain the violation detection model, in the embodiment of the present invention a sample set (for convenience of description, denoted as the first sample set) for training the violation detection model needs to be collected in advance. The first sample set includes sample videos (denoted as first sample videos); sample pictures (denoted as first sample pictures) corresponding to the video frames contained in the first sample videos are obtained, each first sample picture is labeled, and the first label of each first sample picture is determined. The first label of any first sample picture identifies whether that picture is illegal; it can be represented by numbers, letters, character strings or other forms, as long as illegal and non-illegal first sample pictures are distinguishable. A first sample picture whose first label identifies it as non-illegal may be determined as a positive sample, and one whose first label identifies it as illegal as a negative sample. The original violation detection model is subsequently trained based on the collected positive samples with their first labels and the collected negative samples with their first labels.
The first label of the first sample picture can be determined in a manual labeling mode, or can be determined by a general violation detection model which is trained and completed based on a large number of pictures in advance. The violation detection model is obtained based on a convolutional neural network.
The electronic device performing the violation detection model training may be the same as or different from the electronic device performing the cover recommendation.
In a specific implementation process, any first sample picture is obtained, and an identification result of whether the first sample picture is illegal is determined through the original violation detection model; a loss value for the first sample picture is determined according to the identification result and its first label. Based on this loss value, the original violation detection model is trained so as to adjust the parameter values of its parameters.
The first sample video in the first sample set corresponds to a plurality of first sample pictures, and the steps are executed for each first sample picture until the convergence condition is reached, and then the violation detection model training is determined to be completed.
The preset convergence condition may be that the sum of the loss values of the first sample pictures is smaller than a preset convergence threshold, or that the number of iterations of training the original violation detection model reaches a set maximum number of iterations, and so on. It can be set flexibly in a specific implementation and is not specifically limited here.
In a possible implementation manner, when the violation detection model is trained, the first sample picture may be divided into a training sample and a test sample, the original violation detection model is trained based on the training sample, and then the reliability of the trained violation detection model is verified based on the test sample.
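A minimal training-loop sketch under stated assumptions: PyTorch, a binary classifier as the original violation detection model, and a data loader yielding (pictures, labels) batches; none of these specifics are mandated by the patent:

    import torch
    import torch.nn as nn

    def train_violation_model(model: nn.Module, loader, epochs: int = 10, lr: float = 1e-4):
        """Adjust the model's parameter values from per-batch loss values
        until the iteration budget (one form of convergence condition) is reached."""
        criterion = nn.BCEWithLogitsLoss()   # binary target: illegal vs. not illegal
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(epochs):              # max-iterations form of the convergence condition
            for pictures, labels in loader:  # labels: 1 = illegal, 0 = not illegal
                logits = model(pictures).squeeze(1)
                loss = criterion(logits, labels.float())
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        return model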
By the method, only pictures which are not illegal in the video to be recommended can be determined as the cover of the video to be recommended, so that the situation that low-quality illegal pictures are determined as the cover of the video to be recommended is avoided, the probability that the illegal pictures are recommended as the cover is reduced, subsequently, a picture with pertinence and representativeness in content is selected from the pictures which are not illegal on the basis of at least two characteristic extraction models as the cover, a user is attracted to click and browse the video through the high-quality cover, and the purpose of improving the click rate and the video transmission degree is achieved.
Example 4:
in order to further ensure the quality of the cover page of the determined video to be recommended, on the basis of the above embodiments, in an embodiment of the present invention, the determining a comprehensive score of the picture according to each first score of the picture includes:
acquiring a second score corresponding to a second feature of the picture; wherein the second feature comprises at least one of brightness, sharpness, color consistency, definition, and presence or absence of motion blur of the picture;
and determining the comprehensive score of the picture according to the second score of the picture and a preset first weight value corresponding to the second score, and according to each first score and a preset second weight value corresponding to each first feature.
In an actual application scene, features of a video cover such as brightness, sharpness, color consistency, definition, and whether motion blur exists (for convenience of description, referred to as second features) also influence its attraction to users, the click-through rate of the video, and the degree of propagation of the video. Generally, the better the second features of the cover, the stronger the video's attraction to users, the higher its click-through rate, and the wider its spread. Therefore, to further ensure the quality of the determined cover, in the embodiment of the invention whether the picture is the cover of the video to be recommended may be determined based on both the second features and the first features of the picture.
In one possible embodiment, if there is only one second feature, the feature score corresponding to that second feature is determined directly by conventional image processing techniques, and the second score corresponding to the second feature of the picture is determined from that feature score.
The specific process of obtaining the feature score belongs to the prior art; for example, the feature score corresponding to the definition of the picture is calculated using a Laplacian transform algorithm, and the feature score corresponding to motion blur is determined from the minimum gradient direction.
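A minimal sketch of one conventional definition score, the variance of the Laplacian, using OpenCV; the patent names only the Laplacian transform, so taking the variance of its response as the feature score is an assumption:

    import cv2

    def definition_score(picture) -> float:
        """Variance of the Laplacian response: larger values indicate a sharper picture."""
        gray = cv2.cvtColor(picture, cv2.COLOR_BGR2GRAY)
        return float(cv2.Laplacian(gray, cv2.CV_64F).var())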
In another possible implementation manner, if the second feature includes at least two, the obtaining a second score corresponding to the second feature of the picture includes:
for each second feature, acquiring a feature score corresponding to the second feature of the picture;
and determining a second score corresponding to a second feature of the picture according to each feature score of the picture.
When the second features include at least two kinds, a feature score corresponding to each second feature of the picture can be obtained through a conventional image processing algorithm. And then, according to the feature score corresponding to each second feature, performing corresponding processing to determine a second score corresponding to the second feature of the picture.
In a possible implementation manner, after the feature score corresponding to each second feature is obtained, a harmonic mean of each feature score may be obtained, and the harmonic mean may be determined as a second score corresponding to the second feature of the picture.
To facilitate the subsequent determination of the comprehensive score of the picture, after the feature score corresponding to each second feature is obtained through conventional image processing techniques, each feature score is mapped into the same value range according to a preset transformation threshold corresponding to that second feature, and the feature score is updated to its transformed value. For example, each feature score is linearly mapped into the value range (0, 1) according to its preset transformation threshold; the larger the transformed feature score, the better the corresponding second feature of the picture. A sketch of the mapping and the harmonic mean follows.
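A minimal sketch of the mapping and harmonic-mean steps, assuming each raw feature score comes with a preset (low, high) transformation range; the ranges and feature names are assumptions:

    from statistics import harmonic_mean

    def normalize(score: float, low: float, high: float) -> float:
        """Linearly map a raw feature score into (0, 1), clamped at the range ends."""
        return min(max((score - low) / (high - low), 1e-6), 1.0)

    def second_score(raw_scores: dict, ranges: dict) -> float:
        """Harmonic mean of the normalized per-feature scores."""
        normalized = [normalize(raw_scores[f], *ranges[f]) for f in raw_scores]
        return harmonic_mean(normalized)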
In order to comprehensively consider the second features and the first features of the picture, a weight value corresponding to each second feature (for convenience of description, denoted as a first weight value) and a weight value corresponding to each first feature (denoted as a second weight value) are configured in advance. After the second score corresponding to the second feature of the picture and the first scores corresponding to its first features are obtained, for each second feature, the weighted second score is determined from the first weight value and the second score of that feature; for each first feature, the weighted first score is determined from the second weight value and the first score of that feature. The comprehensive score of the picture is then determined from each weighted second score and each weighted first score corresponding to the picture.
Example 5:
In an actual application scene, a picture corresponding to a video frame containing a human face is more suitable as the cover of the video; a picture that is not overexposed, too dark, blurred or noisy, i.e. a picture with better picture quality, is more suitable as the cover; and a picture that is aesthetically pleasing is more suitable as the cover. Therefore, in order to combine different first features of the picture and accurately determine the cover of the video to be recommended, in the embodiment of the invention the pre-trained feature extraction models comprise at least two of a face recognition model, a picture quality evaluation model and an aesthetic scoring model, so that the face-related first feature of the picture can be recognized through the face recognition model, the picture quality first feature can be evaluated through the picture quality evaluation model, and the aesthetic first feature of the picture can be evaluated through the aesthetic scoring model.
Based on this, in the embodiment of the present invention, the obtaining of the first scores corresponding to different first features of the picture based on at least two feature extraction models that are trained in advance respectively mainly includes the following steps:
In case one, if the feature extraction models include a face recognition model, the obtaining of first scores corresponding to different first features of the picture based on at least two pre-trained feature extraction models includes:
acquiring first position information of each target detection frame containing a human face in the picture and second position information of key points of a mouth on the human face in each target detection frame through the human face recognition model;
determining a face position score corresponding to the picture based on the first position information of each target detection frame, determining a face proportion score corresponding to the picture based on the size of each target detection frame and the size of the picture, and determining a facial expression score corresponding to the picture based on the second position information of key points of a mouth on the face in each target detection frame;
and determining the face attribute score corresponding to the face attribute feature of the picture according to the face position score, the face proportion score and the facial expression score.
In practical applications, a picture containing a face is more suitable as a video cover than one without; a picture containing a smiling face is more suitable than one with a non-smiling face; a picture with the face in the center is more suitable than one with the face in a corner; and a picture in which the face occupies a moderate proportion is more suitable than one in which the proportion is too large or too small. Based on this, in the embodiment of the present invention a face recognition model is trained in advance, so that features such as whether the picture contains a face, the position of the face, the proportion of the face in the picture, and the facial expression can be recognized through the model. This makes it convenient to determine the first score corresponding to this first feature of the picture, that is, the face attribute score corresponding to the face attribute feature, and thus to determine whether the picture is suitable as a cover according to the face attribute feature.
In a specific implementation process, after a picture corresponding to a video frame in a video to be recommended is acquired based on the above embodiment, through a face recognition model trained in advance, position information (for convenience of description, denoted as first position information) of a target detection frame containing a face in the picture and position information (for convenience of description, denoted as second position information) of a key point of a mouth on the face in each target detection frame can be recognized. And subsequently, according to the acquired first position information and the acquired second position information, performing corresponding processing to determine whether the picture contains the characteristics of the face, the position of the face, the ratio of the face in the picture, the expression of the face and the like.
If the first position information of target detection frames containing faces can be acquired from the picture through the face recognition model, the picture contains faces, and the position of each face in the picture can be determined from the first position information of each target detection frame. The region where each target detection frame is located in the picture is subsequently determined based on its first position information, corresponding processing is performed according to these regions, and the face position score corresponding to the picture is determined.
In one possible embodiment, the correspondence between each region contained in the picture and the score is pre-configured. After the area where each target detection frame is located in the picture is determined based on the above embodiment, the target score corresponding to the area where each target detection frame is located in the picture is determined according to the preset corresponding relationship between each area and the score. And determining the face position score corresponding to the picture according to each target score.
In a possible implementation manner, each target score can be directly added to determine a face position score corresponding to the picture; or determining a target weight value corresponding to the region of each target detection frame in the picture according to a preset corresponding relation between each region and the weight value, and determining a face position score corresponding to the picture according to the target weight value corresponding to the region of each target detection frame in the picture and the corresponding target score; and determining the maximum value of each target score, and determining the face position score corresponding to the picture according to the maximum value.
Based on the first position information of each target detection frame, the size of each target detection frame can be determined, namely the size of each face in the picture is determined, and according to the determined size of each target detection frame and the size of the picture, the proportion of each target detection frame in the picture can be determined, namely the proportion of each face in the picture is determined. And performing corresponding processing according to the proportion of each target detection frame in the picture, namely determining the face proportion value corresponding to the picture.
In one possible implementation, the corresponding relationship between each proportion range and the proportion value contained in the picture is configured in advance. After the ratio of each target detection frame in the picture is determined based on the above embodiment, the target ratio ranges in which the ratio corresponding to each target detection frame is respectively located are determined. And determining the target proportion value corresponding to each target proportion range according to the preset corresponding relation between each proportion range and the proportion value. And determining the face proportion score corresponding to the picture according to each target proportion score.
In a possible implementation manner, each target proportion score can be directly added, and a face proportion score corresponding to the picture is determined; or determining a target weight value corresponding to each target proportion range according to a preset corresponding relation between each proportion range and the weight value, and determining a face proportion score corresponding to the picture according to the target weight value corresponding to each target proportion range and the corresponding target proportion score; and determining the maximum value of each target proportion score, and determining the face proportion score corresponding to the picture according to the maximum value.
Corresponding processing is performed based on the second position information of the key points of the mouth on the face in each target detection frame, so as to determine a target expression score for each face. The facial expression score corresponding to the picture is then determined from the target expression score of each face.
In one possible implementation, the second position information of the key points of the mouth on the face in any one of the target detection frames may be matched with the position information of the mouth at each expression saved in advance, so as to determine the target expression of the face in the target detection frame. And determining a target expression score corresponding to the target expression according to a preset corresponding relation between the expression and the expression score.
In a possible implementation manner, each target expression score can be directly added to determine a facial expression score corresponding to the picture; or a target weight value corresponding to each target expression can be determined according to a preset corresponding relation between each expression and the weight value, and a facial expression score corresponding to the picture is determined according to the target weight value corresponding to each target expression and the corresponding target expression score; and determining the maximum value of each target expression score, and determining the facial expression score corresponding to the picture according to the maximum value.
After the face position score, the face proportion score and the face expression score are determined based on the embodiment, corresponding processing is carried out, and the face attribute score corresponding to the face attribute feature in the picture is determined.
In a possible implementation manner, after the face position score, the face proportion score and the facial expression score are obtained, a harmonic mean of the face position score, the face proportion score and the facial expression score may be obtained, and the harmonic mean is determined as a facial attribute score corresponding to a facial attribute feature in a picture.
In order to conveniently determine the comprehensive score of the picture later, after the face position score, the face proportion score and the facial expression score are obtained, they are mapped into the same value range, and each score is updated to its transformed value; for example, each is linearly mapped into the value range (0, 1). The larger the transformed score, the better the corresponding face attribute feature of the picture. A sketch of the combination follows.
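A minimal sketch of combining the three face-related scores into the face attribute score, assuming each score has already been mapped into (0, 1) and the harmonic mean is used, as described above:

    from statistics import harmonic_mean

    def face_attribute_score(position: float, proportion: float, expression: float) -> float:
        """Harmonic mean of the face position, face proportion and facial expression scores."""
        return harmonic_mean([position, proportion, expression])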
In case two, if the feature extraction models include a picture quality evaluation model, the obtaining of first scores corresponding to different first features of the picture based on at least two pre-trained feature extraction models includes:
acquiring a probability vector of picture quality features of the picture through the picture quality evaluation model; wherein the picture quality features include at least one of overexposure, excessive darkness, excessive blurring, and excessive noise;
and determining a picture quality attribute score corresponding to the comprehensive quality feature in the picture according to each probability value contained in the probability vector.
In practical application scenarios, picture quality factors such as whether the video cover is overexposed, too dark, too blurred or too noisy also influence its attraction to users, the click-through rate of the video, and the video's dissemination. Generally, the better the picture quality of the cover, the stronger the video's attraction to users, the higher its click-through rate, and the wider its spread. Therefore, to further ensure the quality of the determined cover, in the embodiment of the invention the probability vector of the picture quality features of the picture is acquired through the pre-trained picture quality evaluation model. The probability vector contains a probability value corresponding to each picture quality feature, where the picture quality features include at least one of overexposure, excessive darkness, excessive blurring and excessive noise. Corresponding processing is subsequently performed on the probability values contained in the probability vector to determine the picture quality attribute score corresponding to the comprehensive quality feature of the picture.
In a possible implementation manner, in order to facilitate subsequent calculation, each probability value included in the probability vector of the picture quality feature in the picture obtained by using the picture quality assessment model trained in advance is within a preset value range, such as (0, 1).
In a possible implementation manner, after the probability vector is obtained, a harmonic mean of each probability value included in the probability vector may be obtained, and the harmonic mean may be determined as a picture quality attribute score corresponding to the comprehensive quality feature in the picture. The larger a certain probability value contained in the probability vector is, the better the picture quality characteristics corresponding to the probability value in the picture are.
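A minimal sketch of the picture quality attribute score, taking the harmonic mean of the probability values; this follows the note above that each value lies in (0, 1) and that larger values indicate better quality characteristics:

    from statistics import harmonic_mean

    def picture_quality_score(probability_vector) -> float:
        """Harmonic mean of the probability values in the quality probability vector."""
        return harmonic_mean(list(probability_vector))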
In case three, if the feature extraction models include an aesthetic scoring model, the obtaining of first scores corresponding to different first features of the picture based on at least two pre-trained feature extraction models includes:
and determining the aesthetic attribute score corresponding to the aesthetic features in the picture through the aesthetic scoring model.
For example, a picture with reasonable composition is more suitable as a video cover. Therefore, in the embodiment of the present invention, an aesthetic scoring model is trained in advance. After the picture corresponding to a video frame contained in the video to be recommended is acquired based on the above embodiment, the picture is input into the aesthetic scoring model, and the aesthetic attribute score corresponding to the aesthetic features of the picture is obtained through the model.
When the face recognition model, the picture quality evaluation model and the aesthetic scoring model are included, after the face attribute score, the picture quality attribute score and the aesthetic attribute score are obtained based on the above embodiment, the comprehensive score is determined based on the face attribute score and a second weight value pre-configured for the face attribute score, the picture quality attribute score and a second weight value corresponding to the picture quality attribute score, the aesthetic attribute score and a second weight value corresponding to the aesthetic attribute score. And then judging whether the comprehensive score is larger than a preset recommendation condition or not, thereby determining whether the picture is a cover page of a recommended video or not.
In a possible implementation manner, when a feature extraction model including a face recognition model, a picture quality evaluation model, and an aesthetic score model is included, after a second score, a face attribute score, a picture quality attribute score, and an aesthetic attribute score corresponding to a second feature of a picture are obtained based on the above embodiment, a comprehensive score is determined based on the second score and a first weight value corresponding to the second score, the face attribute score and a second weight value configured in advance, the picture quality attribute score and a second weight value corresponding to the picture quality attribute score, and the aesthetic attribute score and a second weight value corresponding to the aesthetic attribute score. And then judging whether the comprehensive score is larger than a preset recommendation condition or not, thereby determining whether the picture is a cover page of a recommended video or not.
In one possible embodiment, the comprehensive score is determined based on the second score and the corresponding first weight value, the face attribute score and the preconfigured second weight value, the comprehensive quality score and the corresponding second weight value, and the aesthetic attribute score and the corresponding second weight value, and may be determined by the following formula:
re_score = a * tf_score + b * face_score + c * iqa_score + d * aes_score;
where tf_score represents the second score, a represents the first weight value corresponding to the second score, face_score represents the face attribute score, b represents the second weight value corresponding to the face attribute score, iqa_score represents the picture quality attribute score, c represents the second weight value corresponding to the picture quality attribute score, aes_score represents the aesthetic attribute score, and d represents the second weight value corresponding to the aesthetic attribute score.
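For illustration, a direct transcription of this formula into Python follows; the weight values used are placeholder assumptions, since the patent does not fix a, b, c, and d.

```python
# Illustrative computation of the comprehensive score re_score. The weight
# values a, b, c and d below are placeholders, not values given by the patent.
def comprehensive_score(tf_score, face_score, iqa_score, aes_score,
                        a=0.25, b=0.25, c=0.25, d=0.25):
    return a * tf_score + b * face_score + c * iqa_score + d * aes_score

re_score = comprehensive_score(tf_score=0.70, face_score=0.90,
                               iqa_score=0.83, aes_score=0.60)  # 0.7575
```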
In order to obtain the face recognition model, in the embodiment of the present invention, a sample set used for training the face recognition model (for convenience of description, referred to as a second sample set) needs to be collected in advance. The second sample set may be the same as or different from the first sample set. The second sample set includes sample videos (for convenience of description, denoted as second sample videos). Sample pictures corresponding to the video frames included in the second sample videos (for convenience of description, denoted as second sample pictures) are obtained, and each second sample picture is labeled to determine its second label. The second label of any second sample picture identifies the face position information of the detection frame of the face contained in that picture and the lip position information of the key points of the mouth on the face. A second sample picture containing a face may be determined as a positive sample, and a second sample picture not containing a face may be determined as a negative sample. The original face recognition model is subsequently trained based on the acquired positive samples and their second labels, and the acquired negative samples and their second labels.
The second label of a second sample picture may be determined by manual labeling. The original face recognition model is a convolutional neural network model.
It should be noted that the electronic device performing face recognition model training may be the same as or different from the electronic device performing cover page recommendation.
In a specific implementation process, any second sample picture is obtained, and through the original face recognition model, the position information of the target detection frame containing a face (for convenience of description, denoted as third position information) and the position information of the key points of the mouth on the face in the target detection frame (for convenience of description, denoted as fourth position information) are determined in the second sample picture. The loss value of the second sample picture is then determined according to the third position information and the corresponding face position information, and the fourth position information and the corresponding lip position information, of the second sample picture. Based on the loss value of the second sample picture, the original face recognition model is trained to adjust the parameter values of the parameters in the original face recognition model.
Each second sample video in the second sample set corresponds to a plurality of second sample pictures, and the above steps are performed for each second sample picture until the convergence condition is reached, at which point the face recognition model is determined to be trained.
Meeting the preset convergence condition may be, for example, determining that the sum of the loss values of the second sample pictures is smaller than a preset convergence threshold, or that the number of iterations for training the original face recognition model reaches a set maximum number of iterations. This can be flexibly set in specific implementations and is not particularly limited herein.
In a possible implementation manner, when the face recognition model is trained, the second sample pictures may be divided into training samples and test samples; the original face recognition model is trained based on the training samples, and the reliability of the trained face recognition model is then verified based on the test samples.
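For concreteness, a minimal sketch of one such training update follows, assuming a PyTorch implementation and simplifying to a single face per picture; FaceNet, its layer sizes, and the choice of SmoothL1 loss are illustrative assumptions, not details given by the patent.

```python
import torch
import torch.nn as nn

# FaceNet stands in for the original convolutional face recognition model: it
# predicts the face detection frame (third position information) and the mouth
# key points (fourth position information), compared against the second label.
class FaceNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.box_head = nn.Linear(32, 4)   # face frame: x1, y1, x2, y2
        self.lip_head = nn.Linear(32, 4)   # two mouth key points: x, y each

    def forward(self, x):
        feat = self.backbone(x)
        return self.box_head(feat), self.lip_head(feat)

model = FaceNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.SmoothL1Loss()

def train_step(picture, face_pos, lip_pos):
    """One parameter update on a single second sample picture."""
    pred_box, pred_lips = model(picture)
    loss = criterion(pred_box, face_pos) + criterion(pred_lips, lip_pos)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()  # summed over pictures to check the convergence threshold
```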
In order to obtain the picture quality evaluation model, in the embodiment of the present invention, a sample set used for training the picture quality evaluation model (for convenience of description, referred to as a third sample set) needs to be collected in advance; the third sample set may be the same as or different from the first sample set. The third sample set includes sample videos (for convenience of description, denoted as third sample videos). Sample pictures corresponding to the video frames included in the third sample videos (for convenience of description, denoted as third sample pictures) are obtained, and each third sample picture is labeled to determine its third label. The third label of any third sample picture identifies the picture quality of that picture, i.e., whether features of overexposure, excessive darkness, excessive blurring, or excessive noise exist in it. The original picture quality evaluation model is subsequently trained based on the acquired third sample pictures and their corresponding third labels.
The third label of a third sample picture may be determined by manual labeling. The picture quality evaluation model is obtained based on a convolutional neural network.
The electronic device performing the picture quality evaluation model training may be the same as or different from the electronic device performing the cover page recommendation.
In a specific implementation process, any third sample picture is obtained, and the probability vector of the picture quality characteristics in the third sample picture is determined through the original picture quality evaluation model. The loss value of the third sample picture is then determined according to the probability vector and the third label of the third sample picture. Based on the loss value of the third sample picture, the original picture quality evaluation model is trained to adjust the parameter values of the parameters in the original picture quality evaluation model.
The output probability vector includes probability values corresponding to overexposure, excessive darkness, excessive blurring, and excessive noise in the third sample picture, and the probability values all fall within the same value range, such as (0, 1). The larger a probability value is, the better the corresponding picture quality characteristic of the third sample picture is.
Each third sample video in the third sample set corresponds to a plurality of third sample pictures, and the above steps are performed for each third sample picture until the convergence condition is reached, at which point the picture quality evaluation model is determined to be trained.
Meeting the preset convergence condition may be, for example, determining that the sum of the loss values of the third sample pictures is smaller than a preset convergence threshold, or that the number of iterations for training the original picture quality evaluation model reaches a set maximum number of iterations. This can be flexibly set in specific implementations and is not particularly limited herein.
In a possible implementation manner, when the picture quality evaluation model is trained, the third sample pictures may be divided into training samples and test samples; the original picture quality evaluation model is trained based on the training samples, and the reliability of the trained picture quality evaluation model is then verified based on the test samples.
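A compact sketch of one training update for this model follows, under the same PyTorch assumptions as above; the sigmoid output keeps the four probability values in (0, 1), and quality_net, its layer sizes, and the binary cross-entropy loss are illustrative assumptions.

```python
import torch
import torch.nn as nn

# quality_net stands in for the original convolutional picture quality
# evaluation model; its sigmoid head emits one probability value per
# characteristic (exposure, darkness, blur, noise), each in (0, 1).
quality_net = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 4), nn.Sigmoid())
optimizer = torch.optim.Adam(quality_net.parameters(), lr=1e-4)
criterion = nn.BCELoss()

def quality_train_step(picture, third_label):
    """third_label: four binary values, 1 meaning that characteristic is good."""
    probs = quality_net(picture)          # the probability vector
    loss = criterion(probs, third_label)  # loss against the third label
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```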
In order to obtain the aesthetic scoring model, in the embodiment of the present invention, a sample set used for training the aesthetic scoring model (for convenience of description, referred to as a fourth sample set) needs to be collected in advance; the fourth sample set may be the same as or different from the first sample set. The fourth sample set includes sample videos (for convenience of description, denoted as fourth sample videos). Sample pictures corresponding to the video frames included in the fourth sample videos (for convenience of description, denoted as fourth sample pictures) are obtained, and each fourth sample picture is labeled to determine its fourth label. The fourth label of any fourth sample picture identifies the aesthetic quality of that picture. A high-quality fourth sample picture may be determined as a positive sample, and a low-quality fourth sample picture as a negative sample. The original aesthetic scoring model is subsequently trained based on the acquired positive samples and their fourth labels, and the acquired negative samples and their fourth labels.
The fourth label of a fourth sample picture may be determined by manual labeling according to rules such as photographic composition rules and the content information of the picture. The aesthetic scoring model is obtained based on a convolutional neural network.
The electronic device performing the aesthetic scoring model training may be the same as or different from the electronic device performing the cover page recommendation.
In a specific implementation process, any fourth sample picture is obtained, and the aesthetic attribute score of the aesthetic features in the fourth sample picture is determined through the original aesthetic scoring model. The loss value of the fourth sample picture is then determined according to the aesthetic attribute score and the fourth label of the fourth sample picture. Based on the loss value of the fourth sample picture, the original aesthetic scoring model is trained to adjust the parameter values of the parameters in the original aesthetic scoring model.
The output aesthetic attribute score is within a preset value range, such as (0, 1). The larger the aesthetic attribute score, the better the aesthetic features in the fourth sample picture.
The above steps are performed for each fourth sample picture until the convergence condition is reached, at which point the aesthetic scoring model is determined to be trained.
Meeting the preset convergence condition may be, for example, determining that the sum of the loss values of the fourth sample pictures is smaller than a preset convergence threshold, or that the number of iterations for training the original aesthetic scoring model reaches a set maximum number of iterations. This can be flexibly set in specific implementations and is not particularly limited herein.
In a possible implementation manner, when the aesthetic scoring model is trained, the fourth sample pictures may be divided into training samples and test samples; the original aesthetic scoring model is trained based on the training samples, and the reliability of the trained aesthetic scoring model is then verified based on the test samples.
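A similar sketch for one aesthetic scoring model update follows, under the same PyTorch assumptions; encoding the fourth label as 1.0 for a positive (high-quality) sample and 0.0 for a negative one is itself an assumption, since the patent only names the sample types.

```python
import torch
import torch.nn as nn

# aesthetic_net stands in for the original convolutional aesthetic scoring
# model; its sigmoid head emits a single aesthetic attribute score in (0, 1).
aesthetic_net = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 1), nn.Sigmoid())
optimizer = torch.optim.Adam(aesthetic_net.parameters(), lr=1e-4)
criterion = nn.BCELoss()

def aesthetic_train_step(picture, fourth_label):
    """fourth_label: 1.0 for a positive sample, 0.0 for a negative sample."""
    aes_score = aesthetic_net(picture)        # the aesthetic attribute score
    loss = criterion(aes_score, fourth_label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```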
Example 6:
The cover recommendation method provided by the embodiment of the present invention is described below through a specific implementation. Fig. 2 is a schematic diagram of a cover recommendation process provided by the embodiment of the present invention. The process includes the following steps:
s201: training a violation detection model, a face recognition model, a picture quality evaluation model and an aesthetic scoring model.
The electronic device performing the model training may be the same as or different from the electronic device performing the cover recommendation. The specific training method has been described in the above embodiments, and repeated details are not described.
Because model training is generally performed offline, after the training is completed in the manner of the above embodiments, the trained models are stored in the electronic device that performs cover recommendation, so that this device can subsequently perform cover recommendation based on the trained models.
S202: and acquiring a video frame in the video to be recommended.
Part of the video frames in the video to be recommended may be extracted, or all video frames in the video to be recommended may be extracted.
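As an illustration of this step, the following minimal sketch samples every step-th frame with OpenCV; the function name and the sampling interval are assumptions, not details given by the patent.

```python
import cv2

def extract_frames(video_path, step=30):
    """Return every `step`-th frame of the video as a list of pictures."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:          # end of the video stream
            break
        if index % step == 0:
            frames.append(frame)
        index += 1
    cap.release()
    return frames
```

Setting step=1 corresponds to extracting all video frames; a larger step extracts only part of them.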
S203: and acquiring a picture corresponding to a certain video frame.
S204: and determining the probability value of whether the picture is illegal through the violation detection model.
S205: and judging whether the probability value is greater than a preset violation threshold, if so, executing S211, and otherwise, executing S206.
S206: and determining a second score tf _ score corresponding to the second feature in the picture by using the conventional image processing technology.
The specific method for determining tf_score has been described in the above embodiments and is not repeated here.
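The patent does not fix which conventional operators are used; as one hedged example, the sketch below scores brightness by the mean grey level and sharpness by the variance of the Laplacian, normalises each to [0, 1], and averages them. The normalisation constants and the equal weighting are illustrative assumptions.

```python
import cv2

def tf_score(picture_bgr):
    """Illustrative second score from brightness and sharpness only."""
    grey = cv2.cvtColor(picture_bgr, cv2.COLOR_BGR2GRAY)
    brightness = grey.mean() / 255.0                       # mean grey level
    laplacian_var = cv2.Laplacian(grey, cv2.CV_64F).var()  # focus measure
    sharpness = min(laplacian_var / 1000.0, 1.0)           # assumed scale cap
    return (brightness + sharpness) / 2.0
```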
S207: and determining a face attribute score face _ score corresponding to the face attribute features in the picture based on the face recognition model.
The specific method for determining the face _ score is described in the above embodiments, and repeated details are not described herein.
S208: and determining a picture quality attribute score iqa _ score corresponding to the comprehensive quality feature in the picture based on the picture quality evaluation model.
The specific method for determining iqa _ score is described in the above embodiments, and repeated descriptions are omitted.
S209: and determining an aesthetic attribute score aes _ score corresponding to the aesthetic features in the picture based on the aesthetic scoring model.
The specific method for determining aes _ score is described in the above embodiments, and repeated details are not described.
S210: from tf _ score, face _ score, iqa _ score, and aes _ score, a composite score re _ score for the picture is determined.
The specific method for determining re _ score has been described in the above embodiments, and repeated details are not repeated.
S211: and judging whether the picture is the last picture corresponding to the video to be recommended, if so, executing S212, otherwise, executing S203.
S212: and determining the picture corresponding to the maximum value of the comprehensive score as the cover of the video to be recommended from the comprehensive score of each picture corresponding to the video to be recommended.
Example 7:
An embodiment of the present invention provides a cover recommendation device. Fig. 3 is a schematic structural diagram of the cover recommendation device provided in the embodiment of the present invention, where the device includes:
the acquiring unit 31 is configured to acquire a picture corresponding to a video frame in a video to be recommended;
the first processing unit 32 is configured to obtain first scores corresponding to different first features of the picture based on at least two pre-trained feature extraction models; wherein, for any one picture, the first features extracted by the respective feature extraction models are different;
a determining unit 33, configured to determine, according to each first score of the picture, a comprehensive score of the picture;
and the second processing unit 34 is configured to determine the picture as a cover of the video to be recommended if it is determined that the comprehensive score meets a preset recommendation condition.
In a possible implementation manner, the second processing unit 34 is specifically configured to determine that the comprehensive score meets a preset recommendation condition if it is determined that the comprehensive score is a maximum value of the comprehensive scores of the pictures respectively corresponding to each video frame in the video to be recommended; and/or if the comprehensive score is determined to be larger than a preset threshold value, determining that the comprehensive score meets a preset recommendation condition.
In a possible implementation manner, the first processing unit 32 is further configured to, after the acquiring unit acquires the picture corresponding to the video frame in the video to be recommended and before the first scores corresponding to different first features of the picture are obtained based on the at least two pre-trained feature extraction models, determine, through a violation detection model, whether the picture is in violation; and if it is determined that the picture is not in violation, perform the subsequent step of obtaining the first scores corresponding to different first features of the picture based on the at least two pre-trained feature extraction models.
In a possible implementation manner, the determining unit 33 is specifically configured to obtain a second score corresponding to a second feature of the picture, wherein the second feature includes at least one of brightness, clarity, color consistency, sharpness, and presence or absence of motion blur of the picture; and to determine the comprehensive score of the picture according to the second score corresponding to the second feature of the picture and the preconfigured first weight value corresponding to the second feature, together with each first score and the preconfigured second weight value corresponding to each first feature.
In a possible implementation manner, the determining unit 33 is specifically configured to, if the second features include at least two types, obtain, for each type of the second features, a feature score corresponding to the second feature of the picture; and determining a second score corresponding to a second feature of the picture according to each feature score of the picture.
In a possible implementation manner, the first processing unit 32 is specifically configured to, if the feature extraction model includes a face recognition model, obtain, through the face recognition model, first position information of each target detection frame containing a face in the picture and second position information of the key points of the mouth on the face in each target detection frame; determine a face position score corresponding to the picture based on the first position information of each target detection frame, determine a face proportion score corresponding to the picture based on the size of each target detection frame and the size of the picture, and determine a facial expression score corresponding to the picture based on the second position information of the key points of the mouth on the face in each target detection frame; and determine the face attribute score corresponding to the face attribute features in the picture according to the face position score, the face proportion score, and the facial expression score.
In a possible implementation manner, the first processing unit 32 is specifically configured to, if the feature extraction model includes a picture quality evaluation model, obtain, through the picture quality evaluation model, a probability vector of the picture quality characteristics in the picture, wherein the picture quality characteristics include at least one of overexposure, excessive darkness, excessive blurring, and excessive noise; and determine the picture quality attribute score corresponding to the comprehensive quality feature in the picture according to each probability value contained in the probability vector.
In a possible implementation manner, the first processing unit 32 is specifically configured to, if the feature extraction model includes an aesthetic scoring model, determine, through the aesthetic scoring model, an aesthetic attribute score corresponding to an aesthetic feature in the picture.
In the cover recommendation process, at least two feature extraction models are trained in advance, and the first features that each feature extraction model can extract from a picture are different. After the pictures corresponding to the video frames in the video to be recommended are acquired, for each picture, the first scores corresponding to its different first features are obtained based on the at least two pre-trained feature extraction models, and the comprehensive score of the picture is determined according to each of its first scores. Whether the picture is the cover of the video to be recommended is then determined according to whether the comprehensive score meets the preset recommendation condition. In this way, a picture with targeted and representative content is comprehensively selected as the cover according to the different first features of the pictures, which improves the display effect of the recommended video cover and the user experience.
Example 8:
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. On the basis of the foregoing embodiments, an embodiment of the present invention further provides an electronic device, as shown in fig. 4, including: a processor 41, a communication interface 42, a memory 43, and a communication bus 44, wherein the processor 41, the communication interface 42, and the memory 43 communicate with one another through the communication bus 44;
the memory 43 has stored therein a computer program which, when executed by the processor 41, causes the processor 41 to perform the steps of:
acquiring a picture corresponding to a video frame in a video to be recommended;
respectively acquiring first scores corresponding to different first features of the picture based on at least two pre-trained feature extraction models; wherein, for any one picture, the first features extracted by the respective feature extraction models are different;
determining a comprehensive score of the picture according to each first score of the picture;
and if the comprehensive score is determined to meet the preset recommendation condition, determining the picture as the cover of the video to be recommended.
In a possible implementation manner, the processor 41 is specifically configured to determine that the comprehensive score meets a preset recommendation condition if it is determined that the comprehensive score is a maximum value of the comprehensive scores of the pictures respectively corresponding to each video frame in the video to be recommended; and/or if the comprehensive score is determined to be larger than a preset threshold value, determining that the comprehensive score meets a preset recommendation condition.
In a possible implementation manner, the processor 41 is further configured to, after the picture corresponding to the video frame in the video to be recommended is acquired and before the first scores corresponding to different first features of the picture are obtained based on the at least two pre-trained feature extraction models, determine, through a violation detection model, whether the picture is in violation; and if it is determined that the picture is not in violation, perform the subsequent step of obtaining the first scores corresponding to different first features of the picture based on the at least two pre-trained feature extraction models.
In a possible implementation manner, the processor 41 is specifically configured to obtain a second score corresponding to a second feature of the picture, wherein the second feature includes at least one of brightness, clarity, color consistency, sharpness, and presence or absence of motion blur of the picture; and to determine the comprehensive score of the picture according to the second score corresponding to the second feature of the picture and the preconfigured first weight value corresponding to the second feature, together with each first score and the preconfigured second weight value corresponding to each first feature.
In a possible implementation manner, the processor 41 is specifically configured to, if the second features include at least two types, obtain, for each of the second features, a feature score corresponding to the second feature of the picture; and determining a second score corresponding to a second feature of the picture according to each feature score of the picture.
In a possible implementation manner, the processor 41 is specifically configured to, if the feature extraction model includes a face recognition model, obtain, through the face recognition model, first position information of each target detection frame containing a face in the picture and second position information of the key points of the mouth on the face in each target detection frame; determine a face position score corresponding to the picture based on the first position information of each target detection frame, determine a face proportion score corresponding to the picture based on the size of each target detection frame and the size of the picture, and determine a facial expression score corresponding to the picture based on the second position information of the key points of the mouth on the face in each target detection frame; and determine the face attribute score corresponding to the face attribute features in the picture according to the face position score, the face proportion score, and the facial expression score.
In a possible implementation manner, the processor 41 is specifically configured to, if the feature extraction model includes a picture quality evaluation model, obtain, through the picture quality evaluation model, a probability vector of the picture quality characteristics in the picture, wherein the picture quality characteristics include at least one of overexposure, excessive darkness, excessive blurring, and excessive noise; and determine the picture quality attribute score corresponding to the comprehensive quality feature in the picture according to each probability value contained in the probability vector.
In a possible implementation manner, the processor 41 is specifically configured to determine, by using an aesthetic scoring model, an aesthetic attribute score corresponding to an aesthetic feature in the picture if the feature extraction model includes the aesthetic scoring model.
Because the principle by which the electronic device solves the problem is similar to that of the cover recommendation method, the implementation of the electronic device can refer to the implementation of the method, and repeated details are not described again.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface 42 is used for communication between the above-described electronic apparatus and other apparatuses.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Alternatively, the memory may be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a central processing unit, a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, discrete gate or transistor logic, or discrete hardware components.
In the cover recommendation process, at least two feature extraction models are trained in advance, and the first features that each feature extraction model can extract from a picture are different. After the pictures corresponding to the video frames in the video to be recommended are acquired, for each picture, the first scores corresponding to its different first features are obtained based on the at least two pre-trained feature extraction models, and the comprehensive score of the picture is determined according to each of its first scores. Whether the picture is the cover of the video to be recommended is then determined according to whether the comprehensive score meets the preset recommendation condition. In this way, a picture with targeted and representative content is comprehensively selected as the cover according to the different first features of the pictures, which improves the display effect of the recommended video cover and the user experience.
Example 9:
on the basis of the foregoing embodiments, the present invention further provides a computer-readable storage medium, in which a computer program executable by a processor is stored, and when the program runs on the processor, the processor is caused to execute the following steps:
acquiring a picture corresponding to a video frame in a video to be recommended;
respectively acquiring first scores corresponding to different first features of the picture based on at least two pre-trained feature extraction models; wherein, for any one picture, the first features extracted by the respective feature extraction models are different;
determining a comprehensive score of the picture according to each first score of the picture;
and if the comprehensive score is determined to meet the preset recommendation condition, determining the picture as the cover of the video to be recommended.
In a possible implementation manner, the determining that the composite score meets a preset recommendation condition includes:
if the comprehensive score is determined to be the maximum value of the comprehensive scores of the pictures respectively corresponding to each video frame in the video to be recommended, determining that the comprehensive score meets a preset recommendation condition; and/or
And if the comprehensive score is determined to be larger than a preset threshold value, determining that the comprehensive score meets a preset recommendation condition.
In a possible implementation manner, after obtaining a picture corresponding to a video frame in a video to be recommended, before obtaining first scores corresponding to different first features of the picture based on at least two feature extraction models trained in advance, the method further includes:
determining, through a violation detection model, whether the picture is in violation;
and if it is determined that the picture is not in violation, performing the subsequent step of obtaining first scores corresponding to different first features of the picture based on the at least two pre-trained feature extraction models.
In a possible implementation manner, the determining a composite score of the picture according to each first score of the picture includes:
acquiring a second score corresponding to a second feature of the picture; wherein the second feature includes at least one of brightness, clarity, color consistency, sharpness, and presence or absence of motion blur of the picture;
and determining the comprehensive score of the picture according to the second score corresponding to the second feature of the picture and the preconfigured first weight value corresponding to the second feature, together with each first score and the preconfigured second weight value corresponding to each first feature.
In a possible implementation manner, if the second feature includes at least two, the obtaining a second score corresponding to the second feature of the picture includes:
for each second feature, acquiring a feature score corresponding to the second feature of the picture;
and determining a second score corresponding to a second feature of the picture according to each feature score of the picture.
In one possible implementation, the feature extraction model includes at least two of a face recognition model, a picture quality assessment model, and an aesthetic scoring model.
In a possible implementation manner, if the feature extraction model includes a face recognition model, the obtaining first scores corresponding to different first features of the picture based on at least two feature extraction models trained in advance respectively includes:
acquiring first position information of each target detection frame containing a human face in the picture and second position information of key points of a mouth on the human face in each target detection frame through the human face recognition model;
determining a face position score corresponding to the picture based on the first position information of each target detection frame, determining a face proportion score corresponding to the picture based on the size of each target detection frame and the size of the picture, and determining a facial expression score corresponding to the picture based on the second position information of key points of a mouth on the face in each target detection frame;
and determining the face attribute score corresponding to the face attribute features in the picture according to the face position score, the face proportion score, and the facial expression score.
In a possible implementation manner, if the feature extraction model includes a picture quality evaluation model, the obtaining first scores corresponding to different first features of the picture based on at least two feature extraction models trained in advance respectively includes:
acquiring a probability vector of picture quality characteristics in the picture through the picture quality evaluation model; wherein the picture quality characteristics include at least one of overexposure, excessive darkness, excessive blurring, and excessive noise;
and determining a picture quality attribute score corresponding to the comprehensive quality feature in the picture according to each probability value contained in the probability vector.
In a possible implementation manner, if the feature extraction model includes an aesthetic scoring model, the obtaining first scores corresponding to different first features of the picture based on at least two pre-trained feature extraction models respectively includes:
and determining the aesthetic attribute score corresponding to the aesthetic features in the picture through the aesthetic scoring model.
In the cover recommendation process, at least two feature extraction models are trained in advance, and the first features that each feature extraction model can extract from a picture are different. After the pictures corresponding to the video frames in the video to be recommended are acquired, for each picture, the first scores corresponding to its different first features are obtained based on the at least two pre-trained feature extraction models, and the comprehensive score of the picture is determined according to each of its first scores. Whether the picture is the cover of the video to be recommended is then determined according to whether the comprehensive score meets the preset recommendation condition. In this way, a picture with targeted and representative content is comprehensively selected as the cover according to the different first features of the pictures, which improves the display effect of the recommended video cover and the user experience.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (12)

1. A cover recommendation method, the method comprising:
acquiring a picture corresponding to a video frame in a video to be recommended;
respectively acquiring first scores corresponding to different first features of the picture based on at least two pre-trained feature extraction models; wherein, for any one picture, the first features extracted by the respective feature extraction models are different;
determining a comprehensive score of the picture according to each first score of the picture;
and if the comprehensive score is determined to meet the preset recommendation condition, determining the picture as the cover of the video to be recommended.
2. The method of claim 1, wherein the determining that the composite score satisfies a predetermined recommendation comprises:
if the comprehensive score is determined to be the maximum value of the comprehensive scores of the pictures respectively corresponding to each video frame in the video to be recommended, determining that the comprehensive score meets a preset recommendation condition; and/or
And if the comprehensive score is determined to be larger than a preset threshold value, determining that the comprehensive score meets a preset recommendation condition.
3. The method according to claim 1, wherein after the obtaining of the picture corresponding to the video frame in the video to be recommended, before the obtaining of the first scores corresponding to different first features of the picture based on at least two feature extraction models trained in advance, the method further comprises:
determining, through a violation detection model, whether the picture is in violation;
and if it is determined that the picture is not in violation, performing the subsequent step of obtaining first scores corresponding to different first features of the picture based on the at least two pre-trained feature extraction models.
4. The method according to claim 1 or 3, wherein the determining a composite score for the picture from each of the first scores for the picture comprises:
acquiring a second score corresponding to a second feature of the picture; wherein the second feature includes at least one of brightness, clarity, color consistency, sharpness, and presence or absence of motion blur of the picture;
and determining the comprehensive score of the picture according to the second score corresponding to the second feature of the picture and the preconfigured first weight value corresponding to the second feature, together with each first score and the preconfigured second weight value corresponding to each first feature.
5. The method according to claim 4, wherein if the second feature includes at least two, the obtaining a second score corresponding to the second feature of the picture comprises:
for each second feature, acquiring a feature score corresponding to the second feature of the picture;
and determining a second score corresponding to a second feature of the picture according to each feature score of the picture.
6. The method of claim 1, wherein the feature extraction model comprises at least two of a face recognition model, a picture quality assessment model, and an aesthetic scoring model.
7. The method according to claim 6, wherein if the feature extraction model includes a face recognition model, the obtaining the first scores corresponding to different first features of the picture based on at least two pre-trained feature extraction models respectively comprises:
acquiring first position information of each target detection frame containing a human face in the picture and second position information of key points of a mouth on the human face in each target detection frame through the human face recognition model;
determining a face position score corresponding to the picture based on the first position information of each target detection frame, determining a face proportion score corresponding to the picture based on the size of each target detection frame and the size of the picture, and determining a facial expression score corresponding to the picture based on the second position information of key points of a mouth on the face in each target detection frame;
and determining the face attribute score corresponding to the face attribute features in the picture according to the face position score, the face proportion score, and the facial expression score.
8. The method according to claim 6, wherein if the feature extraction model includes a picture quality evaluation model, the obtaining first scores corresponding to different first features of the picture based on at least two pre-trained feature extraction models respectively comprises:
acquiring a probability vector of picture quality characteristics in the picture through the picture quality evaluation model; wherein the picture quality characteristics include at least one of overexposure, excessive darkness, excessive blurring, and excessive noise;
and determining a picture quality attribute score corresponding to the comprehensive quality feature in the picture according to each probability value contained in the probability vector.
9. The method according to claim 6, wherein if the feature extraction model includes an aesthetic scoring model, the obtaining the first scores corresponding to different first features of the picture based on at least two pre-trained feature extraction models respectively comprises:
and determining the aesthetic attribute score corresponding to the aesthetic features in the picture through the aesthetic scoring model.
10. A cover recommendation device, the device comprising:
the acquisition unit is used for acquiring a picture corresponding to a video frame in a video to be recommended;
the first processing unit is configured to obtain first scores corresponding to different first features of the picture based on at least two pre-trained feature extraction models; wherein, for any one picture, the first features extracted by the respective feature extraction models are different;
the determining unit is used for determining a comprehensive score of the picture according to each first score of the picture;
and the second processing unit is used for determining the picture as the cover of the video to be recommended if the comprehensive score is determined to meet the preset recommendation condition.
11. An electronic device, characterized in that it comprises at least a processor and a memory, said processor being adapted to implement the steps of the cover recommendation method according to any one of claims 1-9 when executing a computer program stored in the memory.
12. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, carries out the steps of the cover recommendation method according to any one of claims 1-9.
CN202110156055.1A 2021-02-04 2021-02-04 Cover recommendation method, device, equipment and medium Pending CN112860941A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110156055.1A CN112860941A (en) 2021-02-04 2021-02-04 Cover recommendation method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN112860941A true CN112860941A (en) 2021-05-28

Family

ID=75987877

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110156055.1A Pending CN112860941A (en) 2021-02-04 2021-02-04 Cover recommendation method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN112860941A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108733676A (en) * 2017-04-14 2018-11-02 合信息技术(北京)有限公司 The extracting method and device of video thumbnails
US20190377955A1 (en) * 2018-06-08 2019-12-12 Adobe Inc. Generating digital video summaries utilizing aesthetics, relevancy, and generative neural networks
CN109145138A (en) * 2018-09-10 2019-01-04 北京点网聚科技有限公司 A kind of cover choosing method, device, electronic equipment and storage medium
CN111382619A (en) * 2018-12-28 2020-07-07 广州市百果园信息技术有限公司 Picture recommendation model generation method, picture recommendation method, device, equipment and medium
CN110263213A (en) * 2019-05-22 2019-09-20 腾讯科技(深圳)有限公司 Video pushing method, device, computer equipment and storage medium
CN110418161A (en) * 2019-08-02 2019-11-05 广州虎牙科技有限公司 Video reviewing method and device, electronic equipment and readable storage medium storing program for executing
CN110457523A (en) * 2019-08-12 2019-11-15 腾讯科技(深圳)有限公司 The choosing method of cover picture, the training method of model, device and medium
CN111090778A (en) * 2019-12-26 2020-05-01 北京百度网讯科技有限公司 Picture generation method, device, equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MINGJU CHEN et al.: "Quality-guided key frames selection from video stream based on object detection", Journal of Visual Communication and Image Representation, vol. 45, pages 1-7 *
ZHANG Mengqin et al.: "Video thumbnail recommendation based on deep visual semantic embedding", Journal of Beijing University of Aeronautics and Astronautics, vol. 45, no. 12, pages 2479-2485 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113673466A (en) * 2021-08-27 2021-11-19 深圳市爱深盈通信息技术有限公司 Method for extracting photo stickers based on face key points, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US20200111241A1 (en) Method and apparatus for processing video image and computer readable medium
US11475666B2 (en) Method of obtaining mask frame data, computing device, and readable storage medium
US11871086B2 (en) Method of displaying comment information, computing device, and readable storage medium
US20190228231A1 (en) Video segmentation using predictive models trained to provide aesthetic scores
CN111861572B (en) Advertisement putting method and device, electronic equipment and computer readable storage medium
CN109302628B (en) Live broadcast-based face processing method, device, equipment and storage medium
CN111754267B (en) Data processing method and system based on block chain
CN111222450B (en) Model training and live broadcast processing method, device, equipment and storage medium
KR20190075177A (en) Context-based augmented ad
CN109218629A (en) Video generation method, storage medium and device
US10721519B2 (en) Automatic generation of network pages from extracted media content
CN109286848B (en) Terminal video information interaction method and device and storage medium
CN112422844A (en) Method, device and equipment for adding special effect in video and readable storage medium
CN109961403B (en) Photo adjusting method and device, storage medium and electronic equipment
CN116308530A (en) Advertisement implantation method, advertisement implantation device, advertisement implantation equipment and readable storage medium
CN113923504B (en) Video preview moving picture generation method and device
CN113709560B (en) Video editing method, device, equipment and storage medium
CN113038185B (en) Bullet screen processing method and device
CN112860941A (en) Cover recommendation method, device, equipment and medium
CN113573044B (en) Video data processing method and device, computer equipment and readable storage medium
WO2023045635A1 (en) Multimedia file subtitle processing method and apparatus, electronic device, computer-readable storage medium, and computer program product
CN111353330A (en) Image processing method, image processing device, electronic equipment and storage medium
EP3905135A1 (en) Edge learning display device and method
CN113408452A (en) Expression redirection training method and device, electronic equipment and readable storage medium
CN108600864B (en) Movie preview generation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination