CN112256892A - Video recommendation method and device, electronic equipment and storage medium - Google Patents

Video recommendation method and device, electronic equipment and storage medium

Info

Publication number
CN112256892A
Authority
CN
China
Prior art keywords
target
video
attention
preset
classification model
Prior art date
Legal status
Granted
Application number
CN202011159050.6A
Other languages
Chinese (zh)
Other versions
CN112256892B (en)
Inventor
刘畅 (Liu Chang)
李宣平 (Li Xuanping)
张超 (Zhang Chao)
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202011159050.6A
Publication of CN112256892A
Application granted
Publication of CN112256892B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/45Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43Querying
    • G06F16/435Filtering based on additional data, e.g. user or group profiles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to a video recommendation method and device, an electronic device, and a storage medium, wherein the video recommendation method comprises the following steps: acquiring a target image corresponding to a target video; inputting the target image into a preset target classification model to obtain a target probability that the attention degree of the target video lies within a preset interval; obtaining the vector generated when the target image is input into the target classification model, so as to obtain a target vector; and taking the target probability and the target vector as video recommendation features corresponding to the target video and inputting them into a preset video recommendation model to recommend the target video. Because the target image corresponding to a target video often affects, to a great extent, the attention the video receives, starting from the target image and using the target probability and the generated target vector obtained from the preset target classification model as factors affecting the attention degree of the video improves the accuracy of the video exposure potential prediction.

Description

Video recommendation method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of video processing technologies, and in particular, to a video recommendation method and apparatus, an electronic device, and a storage medium.
Background
With the rapid progress of modern information transmission technology and the popularization of video receiving equipment such as smartphones, short videos have gradually become one of the main carriers through which people receive information in daily life, and short video platforms have sprung up in large numbers. On a short video platform, accurately determining the factors that influence video exposure, predicting whether a video can become a high-exposure video, and recommending high-exposure videos facilitate both the promotion of videos and the healthy development of the platform. Traditional video recommendation methods generally take the exposure, click rate, like rate, attention rate, or play completion rate of a short video over a period of time as the factors influencing video exposure, and on that basis predict video exposure potential and recommend videos.
However, in a cold-start scenario, data on the exposure, click rate, like rate, attention rate, or play completion rate of a video is scarce. With the traditional video recommendation method, the exposure potential of the video therefore cannot be accurately predicted from these signals, and the video to be recommended cannot be accurately recommended.
Disclosure of Invention
The disclosure provides a video recommendation method, a video recommendation device, an electronic device and a storage medium, which are used for at least solving the problem that a video to be recommended cannot be accurately recommended in the related art. The technical scheme of the disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided a video recommendation method, including:
acquiring a target image corresponding to a target video;
inputting the target image into a preset target classification model to obtain a target probability that the attention degree of the target video lies within a preset interval; the attention degree is a comprehensive parameter value for measuring the degree of attention paid to the target video, and the target classification model is a model, obtained by training on historical videos, for predicting the attention degree distribution;
obtaining a vector generated by inputting the target image into the target classification model to obtain a target vector;
and taking the target probability and the target vector as video recommendation features corresponding to the target video, and inputting a preset video recommendation model to recommend the target video.
In an exemplary embodiment, the obtaining a vector generated by inputting the target image into the target classification model to obtain a target vector includes:
and inputting the target image into the target classification model, and determining a feature vector output by a preset full connection layer as the target vector.
In an exemplary embodiment, the obtaining manner of the target classification model includes:
acquiring a historical image corresponding to a historical video and the attention of the historical video;
classifying the attention degrees of the historical videos according to the preset interval to obtain attention degree classification results;
and training a preset initial classification model by taking the historical image as input and the attention classification result as supervision information to obtain the target classification model.
In an exemplary embodiment, the attention includes a primary attention and a secondary attention; the main attention degree comprises an attention amount corresponding to the target video, and the auxiliary attention degree comprises at least one of a click rate, a like rate, an attention rate or a play completion rate corresponding to the target video; the preset intervals comprise a first preset interval and a second preset interval;
classifying the attention degrees of the historical videos according to the preset interval to obtain attention degree classification results, wherein the attention degree classification results comprise:
classifying the main attention according to the first preset interval to obtain a first classification result;
classifying the auxiliary attention according to the second preset interval to obtain a second classification result;
and determining the first classification result and the second classification result as the attention classification result.
In an exemplary embodiment, the obtaining of the first preset interval and the second preset interval includes:
dividing the first preset interval according to a preset numerical sequence to obtain at least one main preset interval;
and dividing the second preset interval according to a logarithmic function form to obtain at least one auxiliary preset interval.
In an exemplary embodiment, the classifying the primary attention according to the first preset interval to obtain a first classification result includes:
clustering the main attention degrees in each main preset interval to obtain at least one target classification result;
and determining the main attention degree of a preset proportion in each target classification result as the first classification result.
In an exemplary embodiment, the training a preset initial classification model by using the historical image as an input and the attention classification result as supervision information to obtain the target classification model includes:
training the initial classification model by taking the historical image as input and the first classification result and the second classification result as supervision information to obtain an intermediate classification model;
training the intermediate classification model by taking a first preset value as the weight of the loss function corresponding to the main attention degree and taking a second preset value as the weight of the loss function corresponding to the auxiliary attention degree to obtain the target classification model; wherein the first preset value is greater than or equal to the second preset value.
In an exemplary embodiment, the inputting the target probability and the target vector as video recommendation features corresponding to the target video into a preset video recommendation model to recommend the target video includes:
taking the target probability and the target vector as video recommendation features corresponding to the target video, and inputting a preset video recommendation model to obtain the recommendation probability of the attention of the target video in the preset interval;
and when the recommendation probability is larger than a preset threshold value, recommending the target video.
According to a second aspect of the embodiments of the present disclosure, there is provided an apparatus for determining a video recommendation feature, including:
a target image acquisition unit configured to perform acquisition of a target image corresponding to a target video;
a target probability determination unit configured to input the target image into a preset target classification model to obtain a target probability that the attention of the target video is in a preset interval; the attention degree is a comprehensive parameter value for measuring the attention degree of the target video, and the target classification model is a model for predicting the attention degree distribution obtained according to historical video training;
a target vector determining unit configured to perform obtaining a vector generated by inputting the target image into the target classification model, and obtain a target vector;
and the video recommending unit is configured to input a preset video recommending model by taking the target probability and the target vector as video recommending characteristics corresponding to the target video so as to recommend the target video.
In an exemplary embodiment, the target vector determination unit is further configured to perform:
and inputting the target image into the target classification model, and determining a feature vector output by a preset full connection layer as the target vector.
In an exemplary embodiment, the apparatus for determining video recommendation features further includes an object classification model obtaining unit configured to perform:
acquiring a historical image corresponding to a historical video and the attention of the historical video;
classifying the attention degrees of the historical videos according to the preset interval to obtain attention degree classification results;
and training a preset initial classification model by taking the historical image as input and the attention classification result as supervision information to obtain the target classification model.
In an exemplary embodiment, the target classification model obtaining unit is further configured to perform:
classifying the main attention according to the first preset interval to obtain a first classification result;
classifying the auxiliary attention according to the second preset interval to obtain a second classification result;
and determining the first classification result and the second classification result as the attention classification result.
In an exemplary embodiment, the target classification model obtaining unit is further configured to perform:
dividing the first preset interval according to a preset numerical sequence to obtain at least one main preset interval;
and dividing the second preset interval according to a logarithmic function form to obtain at least one auxiliary preset interval.
In an exemplary embodiment, the target classification model obtaining unit is further configured to perform:
clustering the main attention degrees in each main preset interval to obtain at least one target classification result;
and determining the main attention degree of a preset proportion in each target classification result as the first classification result.
In an exemplary embodiment, the target classification model obtaining unit is further configured to perform:
training the initial classification model by taking the historical image as input and the first classification result and the second classification result as supervision information to obtain an intermediate classification model;
training the intermediate classification model by taking a first preset value as the weight of the loss function corresponding to the main attention degree and taking a second preset value as the weight of the loss function corresponding to the auxiliary attention degree to obtain the target classification model; wherein the first preset value is greater than or equal to the second preset value.
In an exemplary embodiment, the video recommendation unit is configured to perform:
taking the target probability and the target vector as video recommendation features corresponding to the target video, and inputting a preset video recommendation model to obtain the recommendation probability of the attention of the target video in the preset interval;
and when the recommendation probability is larger than a preset threshold value, recommending the target video.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the video recommendation method of the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium, wherein instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the video recommendation method of the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program, the computer program being stored in a readable storage medium; at least one processor of a device reads the computer program from the readable storage medium and executes it, so that the device performs the video recommendation method described in any one of the above first aspects.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
acquiring a target image corresponding to a target video; inputting the target image into a preset target classification model to obtain a target probability that the attention degree of the target video lies within a preset interval, the target classification model being a model, obtained by training on historical videos, for predicting the attention degree distribution; obtaining the vector generated when the target image is input into the target classification model, so as to obtain a target vector; and taking the target probability and the target vector as video recommendation features corresponding to the target video and inputting them into a preset video recommendation model to recommend the target video. In this implementation, the target image corresponding to a target video often affects, to a great extent, the attention the video receives. Therefore, starting from the target image, the target probability and the generated target vector obtained by inputting the target image into the preset target classification model are input into the preset video recommendation model as factors affecting the attention degree of the video, so as to recommend the target video, and the accuracy of video recommendation can be improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
Fig. 1 is a flow diagram illustrating a video recommendation method according to an example embodiment.
FIG. 2 is a flow diagram illustrating one possible implementation of obtaining a target classification model in accordance with an illustrative embodiment.
Fig. 3 is a flowchart illustrating one possible implementation of step S220 according to an example embodiment.
Fig. 4 is a flowchart illustrating an implementable manner of step S221 according to an exemplary embodiment.
Fig. 5 is a flowchart illustrating one possible implementation of step S230 according to an exemplary embodiment.
Fig. 6 is a diagram illustrating a change in AUC values of a video recommendation model according to an example embodiment.
Fig. 7 is a diagram illustrating a change in ROC values of a video recommendation model according to an example embodiment.
Fig. 8 is a block diagram illustrating a video recommendation device according to an example embodiment.
FIG. 9 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Fig. 1 is a flow chart illustrating a video recommendation method according to an exemplary embodiment, including the steps of:
and step S100, acquiring a target image corresponding to the target video.
Step S200, inputting the target image into a preset target classification model to obtain a target probability that the attention degree of the target video lies within a preset interval; the attention degree is a comprehensive parameter value for measuring the degree of attention paid to the target video, and the target classification model is a model, obtained by training on historical videos, for predicting the attention degree distribution.
And step S300, obtaining a vector generated by inputting a target image into a target classification model to obtain a target vector.
And S400, taking the target probability and the target vector as video recommendation characteristics corresponding to the target video, and inputting a preset video recommendation model to recommend the target video.
The target video is a video for which the factors influencing the video attention degree need to be detected and determined, so that video recommendation can be carried out according to those factors. The target image is an image that reflects the content of the target video to a certain extent; it can be the cover image of the video or an image frame obtained by video key frame detection. The preset interval refers to a distribution interval of the attention degree. For example, when the attention degree is the exposure amount, the preset intervals may be (0, 1000) and [1000, ∞); it should be noted that (0, 1000) and [1000, ∞) are merely exemplary illustrations and are not intended to specifically limit the preset intervals. Optionally, the preset intervals may also be (0, 1000), [1000, 10000), and [10000, ∞), among others. When the attention degree is the click rate, like rate, attention rate, or play completion rate, the preset intervals may be {x | log(x × 1000) < 1}, {x | 1 ≤ log(x × 1000) < 2}, {x | 2 ≤ log(x × 1000) < 3}, {x | 3 ≤ log(x × 1000) < 4}, and {x | 4 ≤ log(x × 1000) < 5}, where x represents the click rate, like rate, attention rate, or play completion rate. The preset video recommendation model is obtained by training on the video recommendation features (target probability and target vector) that influence the attention degree of the video, and can recommend the target video. Exemplarily, a preset initial video recommendation model is trained with the video recommendation features as input and whether the attention degree of the target video lies within the preset interval as supervision information, so as to obtain the video recommendation model.
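For illustration only, the interval assignment described above can be sketched in a few lines of Python; the function names and the use of the base-10 logarithm are assumptions, since the disclosure does not fix the base of the logarithm.

```python
import math

def exposure_class(exposure: float, threshold: float = 1000.0) -> int:
    # Main attention degree (exposure amount): 0 for (0, 1000), 1 for [1000, inf)
    return 1 if exposure >= threshold else 0

def rate_class(x: float, num_classes: int = 5) -> int:
    # Auxiliary rate x (click/like/attention/play completion rate), following
    # the intervals {x | k <= log(x * 1000) < k + 1} for k = 0..4
    k = int(math.floor(math.log10(x * 1000)))
    return min(max(k, 0), num_classes - 1)  # clamp into the five classes
```

For example, rate_class(0.05) yields class 1, since log(0.05 × 1000) ≈ 1.7.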
Specifically, the target image is input into a target classification model capable of predicting the attention distribution of the video, so that the target probability of the attention of the target video in a preset interval is obtained, and a target vector corresponding to a specific layer of the target classification model is obtained. And determining the target probability and the target vector as video recommendation characteristics corresponding to the target video. The video recommendation feature is that factors influencing the attention degree of the video are determined according to the target image, and the influence of the target image in the target video on the attention degree of the target video can be reflected to a certain degree. And taking the video recommendation characteristics as input data of video recommendation, inputting a preset video recommendation model, and recommending the target video according to the output of the video recommendation model.
Illustratively, when the attention degree is the exposure amount and the preset intervals are (0, 1000) and [1000, ∞), the target image is input into a target classification model with a Resnet50 neural network + MLP structure (one 512-dimensional fully connected layer and one 32-dimensional fully connected layer), and the 32-dimensional vector output by the fully connected layer of the target classification model, together with the probability value that the exposure of the target video lies in [1000, ∞), is determined as the video recommendation feature. The 32-dimensional vector and the probability value of the attention degree lying in [1000, ∞) are then input into a preset video recommendation model, and the target video is recommended according to the output of the video recommendation model.
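A minimal sketch of that structure, assuming PyTorch/torchvision: the disclosure fixes only the Resnet50 backbone and the 512- and 32-dimensional fully connected layers, so the layer names, the absence of pretrained weights, and the sigmoid output here are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class TargetClassifier(nn.Module):
    """Resnet50 + MLP (512 -> 32) target classification model that returns
    both the exposure probability and the 32-dim target vector."""

    def __init__(self):
        super().__init__()
        backbone = resnet50(weights=None)
        backbone.fc = nn.Identity()          # expose the 2048-dim pooled features
        self.backbone = backbone
        self.fc1 = nn.Linear(2048, 512)      # 512-dim fully connected layer
        self.fc2 = nn.Linear(512, 32)        # 32-dim fully connected layer (target vector)
        self.head = nn.Linear(32, 1)         # binary exposure task

    def forward(self, image: torch.Tensor):
        hidden = torch.relu(self.fc1(self.backbone(image)))
        embedding = torch.relu(self.fc2(hidden))
        prob = torch.sigmoid(self.head(embedding))  # P(exposure in [1000, inf))
        return prob, embedding
```

The two outputs together form the video recommendation feature: the probability covers the main task, while the embedding preserves the image information learned by the classifier.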
In the method for determining the video recommendation features, a target image corresponding to a target video is acquired; the target image is input into a preset target classification model to obtain a target probability that the attention degree of the target video lies within a preset interval, the target classification model being a model, obtained by training on historical videos, for predicting the attention degree distribution; the vector generated when the target image is input into the target classification model is obtained as a target vector; and the target probability and the target vector are taken as the video recommendation features corresponding to the target video and input into a preset video recommendation model to recommend the target video. Because the target image corresponding to a target video often affects, to a great extent, the attention the video receives, starting from the target image and inputting the target probability and the generated target vector into the preset video recommendation model as factors affecting the attention degree of the video improves the accuracy of video recommendation.
In an exemplary embodiment, one possible implementation of step S300 includes:
and inputting the target image into a target classification model, and determining the feature vector output by a preset full connection layer as a target vector.
Specifically, the target classification model is a preset neural network model. When the target image is input into the target classification model to obtain the target probability, the feature vector of the fully connected layer preceding the output probability value of the target classification model can be obtained at the same time. This feature vector represents the features of the target image obtained as the data propagates layer by layer through the target classification model; it reflects, to a certain degree, the attention the video will receive and can therefore be used as one of the factors affecting the attention degree of the video.
In the above embodiment, the target image is input into the target classification model, and the feature vector output by the preset fully connected layer is determined as the target vector. The target vector and the target probability are then used as factors reflecting the attention degree of the video, so that the influence of the target image on the attention degree of the target video is incorporated as one of the influencing factors. This yields more accurate factors affecting the attention degree of the video, adds the target image as one of the influencing factors for predicting the attention degree of the target video, and at the same time can improve the accuracy of subsequent video recommendation.
In an exemplary embodiment, as shown in fig. 2, a schematic flow chart diagram of an implementable method for obtaining a target classification model according to an exemplary embodiment is shown, which includes the following steps:
step S210, acquiring a history image corresponding to the history video and the attention of the history video.
And step S220, classifying the attention degrees of the historical videos according to a preset interval to obtain an attention degree classification result.
And step S230, training a preset initial classification model by taking the historical image as input and the attention degree classification result as supervision information to obtain a target classification model.
Specifically, the attention degrees of the historical videos are classified according to the distribution of the preset intervals. For example, when the attention degree is the exposure amount and the preset intervals are (0, 1000) and [1000, ∞), historical images with an exposure in (0, 1000) are classified into one category, and historical images with an exposure in [1000, ∞) into another. Alternatively, when the preset intervals are (0, 1000), [1000, 10000), and [10000, ∞), historical images with an exposure in (0, 1000) are classified into one category, historical images with an exposure in [1000, 10000) into a second category, and historical images with an exposure in [10000, ∞) into a third category. In this way, the attention degree classification result corresponding to the preset intervals can be obtained.
Then, a preset initial classification model is trained by taking the historical images as input and the attention degree classification result as supervision information, so as to obtain the target classification model. For example, when the attention degree is the exposure amount and the preset intervals are (0, 1000) and [1000, ∞), "0" may be set as the label of historical images with an exposure in (0, 1000) and "1" as the label of historical images with an exposure in [1000, ∞). With the historical images as input and the corresponding "0" and "1" labels as supervision information, the initial classification model is trained, and a model whose loss function satisfies a certain condition is determined as the target classification model.
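A minimal training sketch under these assumptions follows; the data loader, optimizer, and stopping criterion are illustrative, since the disclosure only requires that the loss function satisfy a certain condition.

```python
import torch
import torch.nn as nn

def train_target_classifier(model, loader, epochs=10, loss_goal=0.1, lr=1e-4):
    # loader yields (history_image, label) pairs, with label 1 if the
    # exposure lies in [1000, inf) and 0 otherwise
    criterion = nn.BCELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        total, count = 0.0, 0
        for images, labels in loader:
            prob, _ = model(images)              # target vector unused here
            loss = criterion(prob.squeeze(1), labels.float())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item() * labels.numel()
            count += labels.numel()
        if total / count < loss_goal:  # "loss satisfying a certain condition"
            break
    return model
```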
In the embodiment, the historical image corresponding to the historical video and the attention degree of the historical video are acquired; classifying the attention degrees of the historical videos according to a preset interval to obtain attention degree classification results; and training a preset initial classification model by taking the historical image as input and the attention degree classification result as supervision information to obtain a target classification model. The historical images can be used for training to obtain a model capable of outputting attention degree distribution, and a basis is provided for determining video recommendation characteristics according to the current target images.
In an exemplary embodiment, as shown in fig. 3, is a schematic flow chart of an implementable manner of step S220 shown according to an exemplary embodiment, including the following steps:
step S221, classifying the main attention according to a first preset interval to obtain a first classification result.
And step S222, classifying the auxiliary attention according to a second preset interval to obtain a second classification result.
Step S223, determining the first classification result and the second classification result as attention classification results.
Wherein the attention degree comprises a main attention degree and an auxiliary attention degree; the main attention degree comprises the attention amount corresponding to the target video, and the auxiliary attention degree comprises at least one of the click rate, like rate, attention rate, or play completion rate corresponding to the target video; the preset interval comprises a first preset interval and a second preset interval.
Optionally, dividing the first preset interval according to a preset numerical sequence to obtain at least one main preset interval; and dividing the second preset interval according to a logarithmic function form to obtain at least one auxiliary preset interval.
The preset numerical sequence consists of the cut points by which the attention amount corresponding to the main attention degree is divided; for example, the preset numerical sequence may be 1000 and 10000.
Specifically, when the preset value in the preset numerical sequence is 1000, two main preset intervals, (0, 1000) and [1000, ∞), can be obtained; when the preset values are 1000 and 10000, three main preset intervals, (0, 1000), [1000, 10000), and [10000, ∞), can be obtained. The second preset interval is divided in a logarithmic function form, so that at least one auxiliary preset interval can be obtained. For example, when the second preset intervals are {x | log(x × 1000) < 1}, {x | 1 ≤ log(x × 1000) < 2}, {x | 2 ≤ log(x × 1000) < 3}, {x | 3 ≤ log(x × 1000) < 4}, and {x | 4 ≤ log(x × 1000) < 5}, the five corresponding auxiliary preset intervals can be obtained.
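For illustration, a small helper that builds the main preset intervals from the preset numerical sequence; representing an interval as a (lower, upper) pair is an assumption of this sketch.

```python
def main_intervals(cut_points):
    # e.g. [1000]        -> [(0.0, 1000), (1000, inf)]
    #      [1000, 10000] -> [(0.0, 1000), (1000, 10000), (10000, inf)]
    bounds = [0.0] + sorted(cut_points) + [float("inf")]
    return list(zip(bounds[:-1], bounds[1:]))
```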
Specifically, when the main attention degree is the attention amount, the main attention degree is classified according to the first preset interval to obtain the first classification result. For example, when the preset intervals are (0, 1000) and [1000, ∞), historical images with an attention amount in (0, 1000) are classified into one category, and historical images with an attention amount in [1000, ∞) into another. Alternatively, when the preset intervals are (0, 1000), [1000, 10000), and [10000, ∞), historical images with an attention amount in (0, 1000) are classified into one category, historical images with an attention amount in [1000, 10000) into a second category, and historical images with an attention amount in [10000, ∞) into a third category.
When the auxiliary attention degree is the click rate, like rate, attention rate, or play completion rate, the auxiliary attention degree is classified according to the second preset interval to obtain the second classification result. For example, the second preset intervals are {x | log(x × 1000) < 1}, {x | 1 ≤ log(x × 1000) < 2}, {x | 2 ≤ log(x × 1000) < 3}, {x | 3 ≤ log(x × 1000) < 4}, and {x | 4 ≤ log(x × 1000) < 5}, where x represents the click rate, like rate, attention rate, or play completion rate. Historical images with x in {x | log(x × 1000) < 1} are classified into one category, historical images with x in {x | 1 ≤ log(x × 1000) < 2} into another, and so on, with historical images with x in {x | 4 ≤ log(x × 1000) < 5} classified into the last category.
In the above embodiment, the main attention degree is classified according to the first preset interval to obtain the first classification result; the auxiliary attention degree is classified according to the second preset interval to obtain the second classification result; and the first classification result and the second classification result are determined as the attention degree classification result. Dividing the attention degree into a main attention degree and an auxiliary attention degree further refines the factors influencing the video attention amount, so that personalized influencing factors can be determined for different parameters based on their own characteristics. For example, the attention amount, as the main attention degree, is determined on its own as one of the factors affecting the video attention amount, and the auxiliary attention degree is determined on its own as another such factor. The main attention degree is divided by certain threshold values to obtain the first classification result, while the auxiliary attention degree is classified according to a logarithmic function, based on the exponential decay characteristics of the click rate, like rate, attention rate, and play completion rate, to obtain the second classification result. Classifying the main and auxiliary attention degrees according to their own characteristics makes the corresponding classification results reflect the characteristics of the target images more comprehensively, and provides a basis for accurately determining the video recommendation features that influence the video attention amount.
In an exemplary embodiment, as shown in fig. 4, it is a schematic flowchart of an implementable manner of step S221 shown according to an exemplary embodiment, and includes the following steps:
and step S2211, clustering the main attention degrees in each main preset interval to obtain at least one target classification result.
Step S2212, determining the primary attention degree of the preset proportion in each target classification result as a first classification result.
Specifically, when the main attention degree is the attention amount and the main preset intervals are (0, 1000) and [1000, ∞), the target videos and target images whose attention amount lies in [1000, ∞) generally account for only about 1/15 of the total. To keep the sample distribution as uniform as possible and improve the accuracy with which the subsequently obtained video recommendation features reflect the attention amount of the video, cluster sampling is adopted for data balancing: target images with an attention amount in (0, 1000) are clustered to obtain a fifth preset number of target classification results, while target images with an attention amount in [1000, ∞) form another category. A preset proportion (optionally 15%) of the target images in each cluster within the main preset interval (0, 1000), and a preset proportion (optionally 100% or 90%) of the target images within the main preset interval [1000, ∞), are determined as the first classification result for training the initial classification model.
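The cluster-sampling balancing step might look like the following sketch, which substitutes scikit-learn's KMeans for the clustering (the specific embodiment further below uses VAE clustering); the feature matrix, cluster count, and ratios are parameters of this illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def balance_negatives(neg_features: np.ndarray, n_clusters: int = 10,
                      keep_ratio: float = 0.15, seed: int = 0) -> np.ndarray:
    """Cluster low-attention (negative) samples and keep a preset proportion
    of each cluster; returns the indices of the retained negatives."""
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, random_state=seed).fit_predict(neg_features)
    kept = []
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)
        if idx.size == 0:
            continue
        take = max(1, int(idx.size * keep_ratio))
        kept.append(rng.choice(idx, size=take, replace=False))
    return np.concatenate(kept)
```

All or most of the positive samples (attention amount in [1000, ∞)) are then kept, matching the preset proportions described above.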
In the above embodiment, the main attention degrees within each main preset interval are clustered to obtain at least one target classification result, and a preset proportion of the main attention degrees in each target classification result is determined as the first classification result. According to the distribution of the main attention degree, images in the different main preset intervals are sampled at proportions inversely related to their frequency to form the first classification result. This ensures that the sample data are distributed uniformly, so that the data used to train the initial classification model reflect the correlation between images and attention amount more comprehensively, and the video recommendation features finally obtained from the target classification model more accurately reflect the factors influencing the video attention amount.
In an exemplary embodiment, as shown in fig. 5, is a schematic flow chart of an implementable manner of step S230 shown according to an exemplary embodiment, including the following steps:
and S231, training the initial classification model by taking the historical image as input and the first classification result and the second classification result as supervision information to obtain an intermediate classification model.
Step S232, training the intermediate classification model by taking a first preset value as the weight of the loss function corresponding to the main attention and a second preset value as the weight of the loss function corresponding to the auxiliary attention to obtain a target classification model; the first preset value is greater than or equal to the second preset value.
Specifically, during the training of the target classification model, the intermediate classification model is trained with a first preset value as the weight of the loss function corresponding to the main attention degree and a second preset value as the weight of the loss function corresponding to the auxiliary attention degree. The finally obtained target classification model therefore chiefly reflects the influence of the main attention degree on its output, with the auxiliary attention degree playing a supporting role, so that the influences of both on the model output are integrated and the factors affecting the video attention amount are output more accurately.
In this embodiment, the initial classification model is trained with the historical images as input and the first classification result and the second classification result as supervision information to obtain an intermediate classification model; the intermediate classification model is then trained with the first preset value as the weight of the loss function corresponding to the main attention degree and the second preset value as the weight of the loss function corresponding to the auxiliary attention degree to obtain the target classification model, where the first preset value is greater than or equal to the second preset value. The influences of the main attention degree and the auxiliary attention degree on the target classification model are thus considered comprehensively, so that the target classification model outputs video recommendation features that more accurately reflect the video attention amount.
In a specific embodiment, whether the video exposure (attention amount) exceeds 1000 is taken as the main target (main attention degree), and the four indexes of click rate, like rate, attention rate, and play completion rate are taken as auxiliary targets (auxiliary attention degrees). A short-video cover image is input, a multitask embedding model (target classification model) based on a deep convolutional neural network is learned, and the learned embedding features (target vectors) and prediction probability values (target probabilities) are used as the input of a cold-start XGB model (video recommendation model), finally yielding a model capable of accurately predicting the exposure of the target video. The method comprises the following steps:
step S1, multitask object construction: based on-line data distribution, the main target and the four auxiliary targets are exponentially attenuated, so that log logarithm is selected to divide target categories (for example, 0-1000 is one category, and 1000-10000 is one category), and specific target tasks are divided:
the exposure task category show is determined as shown in formula (1):
show(x) = 0, if x < 1000; show(x) = 1, if x ≥ 1000    (1)

wherein x is the exposure amount.
The click rate, like rate, attention rate, and play completion rate category ctr(x) is determined as shown in formula (2):

ctr(x) = k, if k ≤ log(x × 1000) < k + 1, k ∈ {0, 1, 2, 3, 4}    (2)

wherein x is one of the click rate, like rate, attention rate, and play completion rate.
Step S2, data balancing: because the distribution of the main learning target (exposure) is severely imbalanced in the online data, with exposures greater than 1000 accounting for only about 1/15, cluster sampling is adopted for data balancing in order to ensure the final precision while staying consistent with the online data distribution. Sample images with an exposure of less than 1000 are clustered into 10 classes by VAE clustering, and 15% of the data in each cluster are randomly sampled as the negative sample images for training, forming the final training data;
and step S2, constructing a model, wherein the model comprises 5 corresponding tasks: the task 1 is a main exposure two-classification task, the tasks 2, 3, 4 and 5 are auxiliary target click rate 5 classification tasks, praise rate 5 classification tasks, attention rate 5 classification tasks and broadcasting completion rate 5 classification multi-classification tasks, and the input is a video cover image. The model selects a Resnet50 neural network + MLP structure (one layer is a 512 full-connection layer, and the other layer is a 32 full-connection layer) as a training model, the final fitting target is a loss objective function value which takes task 1 as a main target and takes task 2, task 3, task 4 and task 5 as auxiliary targets in a preset interval, and the specific loss function is shown as formula (3):
loss = Σ_{i=1}^{5} w_i · loss_i    (3)

wherein loss_i is the classification loss of the i-th task and w_i is the corresponding task weight. Since task 1 is the primary task, the weight w_1 is set to 0.5, and the sum of the weights of the other four tasks is 0.5.
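The weighted objective of formula (3) can be sketched as follows; the equal split of the auxiliary weights and the use of cross-entropy per classification head are assumptions, since the disclosure fixes only w_1 = 0.5 and that the remaining four weights sum to 0.5.

```python
import torch.nn as nn

# Task 1 (exposure, primary) gets weight 0.5; the four auxiliary
# 5-class tasks share the remaining 0.5 (here split equally).
WEIGHTS = [0.5, 0.125, 0.125, 0.125, 0.125]
bce = nn.BCEWithLogitsLoss()   # task 1: binary exposure classification
ce = nn.CrossEntropyLoss()     # tasks 2-5: 5-class classification

def multitask_loss(outputs, targets):
    # outputs[0]: (B,) logits of task 1; targets[0]: (B,) float 0/1 labels
    # outputs[i]: (B, 5) logits of task i; targets[i]: (B,) class indices
    loss = WEIGHTS[0] * bce(outputs[0], targets[0])
    for i in range(1, 5):
        loss = loss + WEIGHTS[i] * ce(outputs[i], targets[i])
    return loss
```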
Step S4, online use: after training is completed with the multitask targets constructed in step S1, the balanced training data selected in step S2, and the model constructed in step S3, the 32-dimensional fully connected layer vector and the prediction probability value of the main target (exposure greater than 1000) are selected as the final prediction result (video recommendation feature), which is then added to the video recommendation model to improve the accuracy of its video exposure potential prediction. Fig. 6 is a schematic diagram illustrating a change in AUC values of a video recommendation model according to an exemplary embodiment, and fig. 7 is a schematic diagram illustrating a change in ROC values of a video recommendation model according to an exemplary embodiment. As can be seen from figs. 6 and 7, the AUC value of the video recommendation model trained with the video recommendation feature as input improves from 0.89 to 0.91, indicating that the video recommendation model can better predict video exposure potential.
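Online use could then be sketched as below, assuming the xgboost Python package and the TargetClassifier sketch from earlier; the feature layout and the recommendation threshold are illustrative.

```python
import numpy as np
import torch
import xgboost as xgb

@torch.no_grad()
def score_videos(classifier, booster, cover_images, threshold=0.5):
    """Build the 33-dim video recommendation feature (32-dim target vector +
    exposure probability) and score it with the cold-start XGB model."""
    prob, emb = classifier(cover_images)             # (B, 1) and (B, 32)
    feats = np.hstack([emb.cpu().numpy(), prob.cpu().numpy()])
    scores = booster.predict(xgb.DMatrix(feats))
    return scores > threshold   # recommend videos above the preset threshold
```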
In the above embodiment, starting from the target image, the target probability and the generated target vector obtained by inputting the target image into the preset target classification model are used as factors influencing the video exposure. Taking these more accurate influencing factors (the target probability and the target vector) as input and whether the exposure of the target video lies within the preset interval as supervision information, the video recommendation model is obtained through training, further improving the accuracy with which the video recommendation model recommends videos.
It should be understood that although the various steps in the flowcharts of figs. 1-5 are shown in an order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not performed in a strictly limited order and may be performed in other orders. Moreover, at least some of the steps in figs. 1-5 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Fig. 8 is a block diagram illustrating an apparatus for determining video recommendation features in accordance with an exemplary embodiment. Referring to fig. 8, the apparatus includes a target image acquisition unit 801, a target probability determination unit 802, a target vector determination unit 803, and a video recommendation unit 804:
a target image acquisition unit 801 configured to perform acquisition of a target image corresponding to a target video;
a target probability determination unit 802, configured to input a target image into a preset target classification model, and obtain a target probability that the attention of the target video is in a preset interval; the target classification model is a model for predicting attention distribution obtained according to historical video training;
a target vector determination unit 803 configured to perform acquiring a vector generated by inputting a target image into a target classification model, resulting in a target vector;
and the video recommending unit 804 is configured to perform the steps of taking the target probability and the target vector as video recommending features corresponding to the target video, and inputting a preset video recommending model to recommend the target video.
In an exemplary embodiment, the target vector determination unit 803 is further configured to perform: and inputting the target image into a target classification model, and determining the feature vector output by a preset full connection layer as a target vector.
In an exemplary embodiment, the apparatus for determining video recommendation features further includes an object classification model obtaining unit configured to perform: acquiring a historical image corresponding to a historical video and the attention degree of the historical video; classifying the attention degrees of the historical videos according to a preset interval to obtain attention degree classification results; and training a preset initial classification model by taking the historical image as input and the attention degree classification result as supervision information to obtain a target classification model.
In an exemplary embodiment, the target classification model obtaining unit is further configured to perform: classifying the main attention according to a first preset interval to obtain a first classification result; classifying the auxiliary attention according to a second preset interval to obtain a second classification result; and determining the first classification result and the second classification result as attention degree classification results.
In an exemplary embodiment, the target classification model obtaining unit is further configured to perform: dividing the first preset interval according to a preset numerical sequence to obtain at least one main preset interval; and dividing the second preset interval according to a logarithmic function form to obtain at least one auxiliary preset interval.
In an exemplary embodiment, the target classification model obtaining unit is further configured to perform: clustering the main attention degree in each main preset interval to obtain at least one target classification result; and determining the main attention degree of a preset proportion in each target classification result as a first classification result.
In an exemplary embodiment, the target classification model obtaining unit is further configured to perform: training an initial classification model by taking the historical image as input and the first classification result and the second classification result as supervision information to obtain an intermediate classification model; training the intermediate classification model by taking the first preset value as the weight of the loss function corresponding to the main attention degree and the second preset value as the weight of the loss function corresponding to the auxiliary attention degree to obtain a target classification model; the first preset value is greater than or equal to the second preset value.
In an exemplary embodiment, the video recommendation unit 804 is configured to perform: the target probability and the target vector are used as video recommendation characteristics corresponding to the target video, a preset video recommendation model is input, and recommendation probability of the attention of the target video in a preset interval is obtained; and when the recommendation probability is greater than a preset threshold value, recommending the target video.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
FIG. 9 is a block diagram illustrating an electronic device in accordance with an example embodiment. The electronic device can be used for determining the video recommendation characteristics and determining the video recommendation model. For example, the device 900 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, fitness device, personal digital assistant, and the like.
Referring to fig. 9, device 900 may include one or more of the following components: a processing component 902, a memory 904, a power component 906, a multimedia component 908, an audio component 910, an input/output (I/O) interface 912, a sensor component 914, and a communication component 916.
The processing component 902 generally controls the overall operation of the device 900, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. Processing component 902 may include one or more processors 920 to execute instructions to perform all or a portion of the steps of the methods described above. Further, processing component 902 can include one or more modules that facilitate interaction between processing component 902 and other components. For example, the processing component 902 can include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.
The memory 904 is configured to store various types of data to support operation at the device 900. Examples of such data include instructions for any application or method operating on device 900, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 904 may be implemented by any type or combination of volatile or non-volatile storage devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 906 provides power to the various components of the device 900. The power components 906 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 900.
The multimedia components 908 include a screen that provides an output interface between the device 900 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 908 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 900 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 910 is configured to output and/or input audio signals. For example, audio component 910 includes a Microphone (MIC) configured to receive external audio signals when device 900 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 904 or transmitted via the communication component 916. In some embodiments, audio component 910 also includes a speaker for outputting audio signals.
The I/O interface 912 provides an interface between the processing component 902 and peripheral interface modules, which may be keyboards, click wheels, buttons, and the like. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
The sensor component 914 includes one or more sensors for providing status assessments of various aspects of the device 900. For example, the sensor component 914 may detect an open/closed state of the device 900 and the relative positioning of components, such as the display and keypad of the device 900. The sensor component 914 may also detect a change in the position of the device 900 or a component of the device 900, the presence or absence of user contact with the device 900, the orientation or acceleration/deceleration of the device 900, and a change in the temperature of the device 900. The sensor component 914 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 914 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 914 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 916 is configured to facilitate wired or wireless communication between the device 900 and other devices. The device 900 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 916 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 916 further includes a near field communication (NFC) module to facilitate short-range communication.
In an exemplary embodiment, the device 900 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions, such as the memory 904 including instructions, which are executable by the processor 920 of the device 900 to perform the above-described methods. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow its general principles and include such departures from the present disclosure as come within known or customary practice in the art. It is intended that the specification and embodiments be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method for video recommendation, comprising:
acquiring a target image corresponding to a target video;
inputting the target image into a preset target classification model to obtain a target probability that the attention of the target video falls within a preset interval, wherein the attention is a composite parameter value measuring the degree of attention received by the target video, and the target classification model is a model for predicting the attention distribution, trained on historical videos;
obtaining, as a target vector, a vector generated when the target image is input into the target classification model;
and inputting the target probability and the target vector, as video recommendation features corresponding to the target video, into a preset video recommendation model to recommend the target video.
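By way of illustration only, the claimed pipeline can be sketched in PyTorch as follows; the backbone, layer sizes, number of preset intervals, and the final threshold are all invented placeholders rather than details taken from this disclosure:

```python
import torch
import torch.nn as nn

class TargetClassifier(nn.Module):
    """Maps a target image to (a) per-interval attention probabilities and
    (b) the intermediate feature vector used as the target vector."""
    def __init__(self, num_intervals=10, feat_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fc = nn.Linear(16, feat_dim)         # stand-in "preset fully connected layer"
        self.head = nn.Linear(feat_dim, num_intervals)

    def forward(self, image):
        feat = self.fc(self.backbone(image))      # target vector
        prob = self.head(feat).softmax(dim=-1)    # target probability per preset interval
        return prob, feat

classifier = TargetClassifier()
# Toy recommendation model consuming [target probability ; target vector].
recommender = nn.Sequential(nn.Linear(10 + 128, 1), nn.Sigmoid())

image = torch.randn(1, 3, 224, 224)               # target image, e.g. the video cover
prob, vec = classifier(image)
score = recommender(torch.cat([prob, vec], dim=-1))
recommend = score.item() > 0.5                    # hypothetical preset threshold
```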
2. The video recommendation method according to claim 1, wherein the obtaining, as a target vector, a vector generated when the target image is input into the target classification model comprises:
inputting the target image into the target classification model, and determining a feature vector output by a preset fully connected layer as the target vector.
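One hedged way to realize this step is a forward hook on a designated fully connected layer; the use of a torchvision ResNet-18 and its `fc` layer below is an assumption for illustration, since the disclosure does not fix a network architecture:

```python
import torch
import torchvision.models as models

model = models.resnet18(weights=None)      # stand-in for the target classification model
captured = {}

def save_fc_output(module, inputs, output):
    # Runs on every forward pass; stores the designated FC layer's output.
    captured["target_vector"] = output.detach()

model.fc.register_forward_hook(save_fc_output)
_ = model(torch.randn(1, 3, 224, 224))     # forward pass on the target image
target_vector = captured["target_vector"]  # shape depends on the chosen layer
```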
3. The video recommendation method according to claim 1, wherein the target classification model is obtained by:
acquiring a historical image corresponding to a historical video and the attention of the historical video;
classifying the attention of the historical video according to the preset interval to obtain an attention classification result;
and training a preset initial classification model by taking the historical image as input and the attention classification result as supervision information to obtain the target classification model.
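The training recipe of claim 3 amounts to binning a scalar attention value into preset intervals and using the bin index as a classification label. A minimal sketch, assuming NumPy/PyTorch and invented bin edges:

```python
import numpy as np
import torch
import torch.nn as nn

# Assumed preset intervals (bin edges) over the attention value.
bin_edges = np.array([100, 1_000, 10_000, 100_000])
attention = np.array([57, 3_400, 250_000])           # attention of three historical videos
labels = torch.as_tensor(np.digitize(attention, bin_edges))  # interval index per video

images = torch.randn(3, 3, 224, 224)                 # matching historical cover images
model = nn.Sequential(                               # stand-in initial classification model
    nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, len(bin_edges) + 1),
)
loss = nn.CrossEntropyLoss()(model(images), labels)  # attention bins as supervision
loss.backward()
```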
4. The video recommendation method according to claim 3, wherein the attention comprises a primary attention and an auxiliary attention, the primary attention comprising an attention amount corresponding to the target video, and the auxiliary attention comprising at least one of a click-through rate, a like rate, an attention rate, or a play completion rate corresponding to the target video; and the preset interval comprises a first preset interval and a second preset interval;
and wherein the classifying the attention of the historical video according to the preset interval to obtain the attention classification result comprises:
classifying the primary attention according to the first preset interval to obtain a first classification result;
classifying the auxiliary attention according to the second preset interval to obtain a second classification result;
and determining the first classification result and the second classification result as the attention classification result.
5. The video recommendation method according to claim 4, wherein the classifying the primary attention according to the first preset interval to obtain the first classification result comprises:
dividing the first preset interval according to a preset numerical sequence to obtain at least one primary preset interval;
and dividing the second preset interval in a logarithmic-function form to obtain at least one auxiliary preset interval.
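Concretely, the two division schemes of claims 4 and 5 can be illustrated as below; every edge value is an assumed example, not a value specified in the disclosure:

```python
import numpy as np

# Primary attention: intervals from a preset numerical sequence.
primary_edges = np.array([1_000, 5_000, 10_000, 50_000])
# Auxiliary attention (a rate in (0, 1]): logarithmically spaced intervals.
auxiliary_edges = np.logspace(-4, 0, num=5)    # 1e-4, 1e-3, 1e-2, 1e-1, 1.0

primary_class = np.digitize(7_200, primary_edges)     # first classification result (interval index)
auxiliary_class = np.digitize(0.03, auxiliary_edges)  # second classification result (interval index)
```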
6. The video recommendation method according to claim 4 or 5, wherein the training a preset initial classification model by taking the historical image as input and the attention classification result as supervision information to obtain the target classification model comprises:
training the initial classification model by taking the historical image as input and the first classification result and the second classification result as supervision information to obtain an intermediate classification model;
and training the intermediate classification model by taking a first preset value as the weight of the loss function corresponding to the primary attention and taking a second preset value as the weight of the loss function corresponding to the auxiliary attention to obtain the target classification model, wherein the first preset value is greater than or equal to the second preset value.
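The two-stage weighting of claim 6 corresponds to a weighted multi-task objective. A sketch assuming two classification heads and example weight values, with w1 >= w2 as the claim requires:

```python
import torch
import torch.nn.functional as F

def joint_loss(primary_logits, primary_labels, auxiliary_logits, auxiliary_labels,
               w1=1.0, w2=0.5):
    # Weighted sum of the two classification losses; w1 >= w2 per claim 6.
    return (w1 * F.cross_entropy(primary_logits, primary_labels)
            + w2 * F.cross_entropy(auxiliary_logits, auxiliary_labels))

# Example: 4 samples, 5 primary intervals and 5 auxiliary intervals.
primary_logits = torch.randn(4, 5, requires_grad=True)
auxiliary_logits = torch.randn(4, 5, requires_grad=True)
loss = joint_loss(primary_logits, torch.randint(0, 5, (4,)),
                  auxiliary_logits, torch.randint(0, 5, (4,)))
loss.backward()
```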
7. The video recommendation method according to claim 1, wherein the inputting the target probability and the target vector, as video recommendation features corresponding to the target video, into a preset video recommendation model to recommend the target video comprises:
inputting the target probability and the target vector, as the video recommendation features corresponding to the target video, into the preset video recommendation model to obtain a recommendation probability that the attention of the target video falls within the preset interval;
and recommending the target video when the recommendation probability is greater than a preset threshold.
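Claim 7's final step reduces to a threshold test; the value 0.5 below is a hypothetical stand-in for the preset threshold:

```python
PRESET_THRESHOLD = 0.5  # hypothetical preset threshold

def should_recommend(recommendation_probability: float) -> bool:
    # Recommend only when the predicted probability that the video's
    # attention falls in the preset interval exceeds the threshold.
    return recommendation_probability > PRESET_THRESHOLD

assert should_recommend(0.73) and not should_recommend(0.21)
```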
8. A video recommendation apparatus, comprising:
a target image acquisition unit configured to acquire a target image corresponding to a target video;
a target probability determination unit configured to input the target image into a preset target classification model to obtain a target probability that the attention of the target video falls within a preset interval, wherein the attention is a composite parameter value measuring the degree of attention received by the target video, and the target classification model is a model for predicting the attention distribution, trained on historical videos;
a target vector determination unit configured to obtain, as a target vector, a vector generated when the target image is input into the target classification model;
and a video recommendation unit configured to input the target probability and the target vector, as video recommendation features corresponding to the target video, into a preset video recommendation model to recommend the target video.
9. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the video recommendation method of any one of claims 1 to 7.
10. A storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the video recommendation method of any one of claims 1 to 7.
CN202011159050.6A 2020-10-26 2020-10-26 Video recommendation method and device, electronic equipment and storage medium Active CN112256892B (en)

Priority Applications (1)

Application Number    Priority Date    Filing Date    Title
CN202011159050.6A     2020-10-26       2020-10-26     Video recommendation method and device, electronic equipment and storage medium


Publications (2)

Publication Number    Publication Date
CN112256892A (en)     2021-01-22
CN112256892B (en)     2024-09-20

Family

ID=74261606

Family Applications (1)

Application Number    Status    Granted Publication    Title
CN202011159050.6A     Active    CN112256892B (en)      Video recommendation method and device, electronic equipment and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563500A (en) * 2017-09-14 2018-01-09 北京奇艺世纪科技有限公司 A kind of video recommendation method and system based on user's head portrait
CN108875022A (en) * 2018-06-20 2018-11-23 北京奇艺世纪科技有限公司 A kind of video recommendation method and device
CN109241441A (en) * 2018-09-30 2019-01-18 北京达佳互联信息技术有限公司 Content recommendation method, device, electronic equipment and storage medium
CN109547814A (en) * 2018-12-13 2019-03-29 北京达佳互联信息技术有限公司 Video recommendation method, device, server and storage medium
CN109800325A (en) * 2018-12-26 2019-05-24 北京达佳互联信息技术有限公司 Video recommendation method, device and computer readable storage medium
CN109862432A (en) * 2019-01-31 2019-06-07 厦门美图之家科技有限公司 Clicking rate prediction technique and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114996487A (en) * 2022-05-24 2022-09-02 北京达佳互联信息技术有限公司 Media resource recommendation method and device, electronic equipment and storage medium
CN114996487B (en) * 2022-05-24 2023-04-07 北京达佳互联信息技术有限公司 Media resource recommendation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112256892B (en) 2024-09-20

Similar Documents

Publication Title
CN109871896B (en) Data classification method and device, electronic equipment and storage medium
CN111160448B (en) Training method and device for image classification model
US11556761B2 (en) Method and device for compressing a neural network model for machine translation and storage medium
CN109670077B (en) Video recommendation method and device and computer-readable storage medium
CN111428032B (en) Content quality evaluation method and device, electronic equipment and storage medium
CN112426724A (en) Game user matching method and device, electronic equipment and storage medium
CN111210844B (en) Method, device and equipment for determining speech emotion recognition model and storage medium
CN110941727B (en) Resource recommendation method and device, electronic equipment and storage medium
CN111859020A (en) Recommendation method and device, electronic equipment and computer-readable storage medium
CN114722238B (en) Video recommendation method and device, electronic equipment, storage medium and program product
CN111898018A (en) Virtual resource sending method and device, electronic equipment and storage medium
CN112148923A (en) Search result sorting method, sorting model generation method, device and equipment
CN112884040B (en) Training sample data optimization method, system, storage medium and electronic equipment
CN112256892B (en) Video recommendation method and device, electronic equipment and storage medium
CN111274444B (en) Method and device for generating video cover determination model, and method and device for determining video cover
CN111428806B (en) Image tag determining method and device, electronic equipment and storage medium
CN112712385B (en) Advertisement recommendation method and device, electronic equipment and storage medium
CN117453933A (en) Multimedia data recommendation method and device, electronic equipment and storage medium
CN111859097B (en) Data processing method, device, electronic equipment and storage medium
CN110297970B (en) Information recommendation model training method and device
CN113656637B (en) Video recommendation method and device, electronic equipment and storage medium
CN113190725B (en) Object recommendation and model training method and device, equipment, medium and product
CN113742199A (en) Function testing method and device, electronic equipment and storage medium
CN112241486A (en) Multimedia information acquisition method and device
CN110929055A (en) Multimedia quality detection method and device, electronic equipment and storage medium

Legal Events

Code    Title
PB01    Publication
SE01    Entry into force of request for substantive examination
GR01    Patent grant