CN109547814B - Video recommendation method and device, server and storage medium

Info

Publication number: CN109547814B
Application number: CN201811527490.5A
Authority: CN
Inventors: 蔡锦龙
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2018-12-13
Filing date: 2018-12-13
Publication date: 2021-07-16
Anticipated expiration: 2038-12-13
Also published as: CN109547814A

Abstract

The disclosure relates to a video recommendation method, a video recommendation device, a server and a storage medium, belonging to the technical field of networks. The method comprises the following steps: determining video features of a plurality of first videos to be recommended according to target recommendation durations of the plurality of first videos, wherein the target recommendation duration of each first video is the duration from the recommendation starting time of the first video to the current recommendation time; determining the feedback probability of a user performing a feedback operation on the plurality of first videos according to the user features of the user and the video features of the plurality of first videos; and recommending the plurality of first videos to the user according to the feedback probability of the user on the plurality of first videos. Because the video features are determined according to the target recommendation duration, the resulting video features match that duration. This avoids the problem that a new video, whose target recommendation duration is not greater than the preset duration, receives too low a predicted click rate because it has too few features, and thereby improves the accuracy of the recommendation probability and of the recommendation.

Description

Video recommendation method and device, server and storage medium

Technical Field

The present disclosure relates to the field of network technologies, and in particular, to a video recommendation method, apparatus, server, and storage medium.

Background

With the development of network technology, more and more people transmit information and share their lives through videos. Browsing videos in a video application has become common user behavior, and the video application can also recommend videos that the user may be interested in. For example, a video application recommends game videos to users who like games.

In the related art, the video recommendation process is as follows. A server obtains the user features of a user and, according to those user features, preliminarily screens out from a plurality of candidate videos those whose video features match the user features. The server then inputs the video features of the screened videos and the user features of the user into a neural network model, which outputs the click rate of the user on each video; the neural network model is used for predicting the click rate of the user on each video based on the user features and the video features. The video features comprise features of multiple dimensions, such as the historical behavior records of a plurality of users during historical recommendation of the video and the attribute features of the video, for example the number of times the video was clicked within a week and the video type to which the video belongs. According to the click rates of the plurality of videos, the server preferentially recommends the videos with higher click rates to the user.

The above process predicts the click rate of a video according to the attribute features of the video and the historical behavior records of a plurality of users during historical recommendation of the video. However, a new video that has just been released in the video application has few or even no features such as historical behavior records. Compared with a video that has the same attribute features and a large number of historical behavior records, the new video receives a lower predicted click rate and is less likely to be recommended, which makes the video recommendation process inaccurate.

Disclosure of Invention

The present disclosure provides a video recommendation method, apparatus, server and storage medium, which can overcome the inaccuracy of video recommendation in the related art.

According to a first aspect of the embodiments of the present disclosure, there is provided a video recommendation method, including:

determining video characteristics of a plurality of first videos to be recommended according to target recommendation durations of the plurality of first videos, wherein the target recommendation duration of each first video is the duration from the recommendation starting time of the first video to the current recommendation time;

determining feedback probability of the user for performing feedback operation on the plurality of first videos according to the user characteristics of the user and the video characteristics of the plurality of first videos;

and recommending the plurality of first videos to the user according to the feedback probability of the user on the plurality of first videos.

Optionally, the determining, according to the target recommendation duration of the plurality of first videos to be recommended, the video features of the plurality of first videos includes:

when the target recommended duration of the plurality of first videos is not greater than a preset duration, determining image features, text features and/or audio features of the plurality of first videos;

determining the image features, text features, and/or audio features as video features of the plurality of first videos.

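Optionally, the determining, according to the target recommendation duration of the plurality of first videos to be recommended, the video features of the plurality of first videos includes:

when the target recommended duration of the plurality of first videos is not greater than a preset duration, determining predicted video features of the plurality of first videos, wherein the predicted video features are used for representing a prediction of the feedback operations performed on the plurality of first videos;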

determining the predicted video features as video features of the plurality of first videos.

Optionally, when the target recommended durations of the plurality of first videos are not greater than the preset duration, determining the predicted video features of the plurality of first videos includes:

classifying the second videos according to the image characteristics of the second videos to obtain multiple video types, wherein the target recommended time lengths of the second videos are not less than the preset time length;

determining the type of a target video to which each first video belongs according to the image characteristics of each first video;

and determining the predicted video characteristics of each first video according to a plurality of second videos included in the target video type.

Optionally, the determining, according to the user characteristics of the user and the video characteristics of the plurality of first videos, the feedback probability of the user performing the feedback operation on the plurality of first videos includes:

inputting the video characteristics of the plurality of first videos into a recommendation model, wherein the recommendation model is used for determining the feedback probability of the user to the videos according to the user characteristics of the user and the video characteristics of the videos;

when a recommendation request of the user is received, inputting the user characteristics of the user into the recommendation model, and outputting the feedback probability of the user to the plurality of first videos.

Optionally, the recommendation model includes a user neural network and a video neural network, and the inputting the video features of the plurality of first videos into the recommendation model includes:

inputting the video features of the plurality of first videos into the recommendation model, and, in the video neural network of the recommendation model, determining a video feature vector of each video according to first network parameters in the video neural network and the video features of the plurality of first videos;

correspondingly, when a recommendation request of the user is received, inputting the user characteristics of the user into the recommendation model, and outputting the feedback probability of the user to the plurality of first videos includes:

when a recommendation request of the user is received, inputting the user characteristics of the user into the recommendation model, and determining a user characteristic vector of the user in a user neural network of the recommendation model according to second network parameters and the user characteristics in the user neural network;

and determining the feedback probability of the user to the plurality of first videos according to the user feature vector and the video feature vector.
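The two-network arrangement described above can be sketched as follows. This is a minimal illustration rather than the disclosed model: each "neural network" is reduced to a single linear layer with ReLU, the feedback probability is taken as the sigmoid of the dot product of the two vectors, and all names and numeric parameters (`tower`, `VIDEO_WEIGHTS`, `USER_WEIGHTS`) are hypothetical. The point it shows is that the video feature vectors can be computed ahead of time, so only the user side runs when a recommendation request arrives.

```python
import math

def tower(features, weights):
    """One linear layer plus ReLU; a stand-in for a full neural network."""
    return [max(0.0, sum(w * x for w, x in zip(row, features))) for row in weights]

def feedback_probability(user_vector, video_vector):
    """Assumed scoring rule: sigmoid of the dot product of the two vectors."""
    score = sum(u * v for u, v in zip(user_vector, video_vector))
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical parameters standing in for the trained network parameters.
VIDEO_WEIGHTS = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]  # "first network parameters"
USER_WEIGHTS = [[0.4, 0.6], [-0.3, 0.9]]              # "second network parameters"

# Video side: computed before any request is received.
video_features = {"v1": [1.0, 0.0, 2.0], "v2": [0.0, 1.0, 1.0]}
video_vectors = {vid: tower(f, VIDEO_WEIGHTS) for vid, f in video_features.items()}

def on_recommendation_request(user_features):
    """User side: runs only when a recommendation request is received."""
    user_vector = tower(user_features, USER_WEIGHTS)
    return {vid: feedback_probability(user_vector, vec)
            for vid, vec in video_vectors.items()}

probabilities = on_recommendation_request([1.0, 0.5])
```

Precomputing `video_vectors` keeps the per-request cost to one pass through the user network plus one dot product per candidate video.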

Optionally, the recommendation model includes a click rate sub-model, a like rate sub-model, and/or an attention rate sub-model, and accordingly, the inputting, when a recommendation request of the user is received, the user characteristics of the user into the recommendation model and outputting the feedback probabilities of the user on the plurality of first videos includes:

when the recommendation model comprises a click rate sub-model, inputting the user characteristics of the user into the recommendation model, and outputting the click rates of the user on the plurality of first videos;

when the recommendation model comprises a like rate sub-model, inputting the user characteristics of the user into the recommendation model, and outputting the like rates of the user on the plurality of first videos;

and when the recommendation model comprises an attention rate sub-model, inputting the user characteristics of the user into the recommendation model, and outputting the attention rates of the user on the plurality of first videos.

Optionally, the recommending the plurality of first videos to the user according to the feedback probability of the user on the plurality of first videos includes:

determining the arrangement order of the plurality of first videos according to the click rate, the like rate and/or the attention rate of the user on the plurality of first videos;

and recommending the plurality of first videos to the user according to the arrangement order of the plurality of first videos.
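One way to turn the predicted rates into an arrangement order is sketched below. The disclosure does not state how the click rate, like rate and attention rate are combined, so the weighted sum and the weight values here are assumptions chosen only for illustration, as are the names `rank_videos`, `predictions` and `weights`.

```python
def rank_videos(predictions, weights):
    """Order video ids by a weighted sum of their predicted feedback rates."""
    def score(rates):
        return sum(weights.get(name, 0.0) * value for name, value in rates.items())
    return sorted(predictions, key=lambda vid: score(predictions[vid]), reverse=True)

# Hypothetical sub-model outputs for three first videos.
predictions = {
    "a": {"click": 0.30, "like": 0.05, "attention": 0.01},
    "b": {"click": 0.20, "like": 0.20, "attention": 0.05},
    "c": {"click": 0.10, "like": 0.02, "attention": 0.00},
}
# Assumed weights: rarer feedback operations count for more.
weights = {"click": 1.0, "like": 2.0, "attention": 3.0}

order = rank_videos(predictions, weights)
```

If only one sub-model is present, the same function applies with a single non-zero weight, which reduces to sorting by that rate alone.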

Optionally, the training process of the recommendation model includes:

obtaining a plurality of sample videos;

extracting the predicted video features of a first sample video in the plurality of sample videos as positive samples, and extracting the predicted video features of a second sample video in the plurality of sample videos as negative samples;

training a preset recommendation model according to the positive sample and the negative sample to obtain the recommendation model;

the first sample video is a video which is subjected to feedback operation by a user during historical recommendation, and the second sample video is a video which is not subjected to feedback operation by the user during historical recommendation.
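The training step can be illustrated with a deliberately small stand-in. The sketch below assumes the "preset recommendation model" is a logistic regression trained by stochastic gradient descent on log loss, which is not stated in the disclosure; the feature vectors, the learning rate and the function names (`train`, `predict`) are likewise hypothetical. Videos that received a feedback operation are labeled 1 and videos that did not are labeled 0.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(positives, negatives, epochs=200, learning_rate=0.5):
    """Fit a logistic regression: positives get label 1, negatives label 0."""
    samples = [(x, 1.0) for x in positives] + [(x, 0.0) for x in negatives]
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            predicted = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            error = predicted - label  # gradient of the log loss w.r.t. the logit
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

def predict(model, features):
    weights, bias = model
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

# Hypothetical predicted video features of sample videos.
positive_samples = [[1.0, 0.2], [0.9, 0.1]]  # feedback operation was performed
negative_samples = [[0.1, 0.9], [0.2, 1.0]]  # no feedback operation was performed

model = train(positive_samples, negative_samples)
```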

According to a second aspect of the embodiments of the present disclosure, there is provided a video recommendation apparatus including:

the video feature determination module is configured to determine video features of a plurality of first videos to be recommended according to target recommendation durations of the first videos, wherein the target recommendation duration of each first video is the duration from the recommendation starting time of the first video to the current recommendation time;

the feedback probability determination module is configured to determine feedback probabilities of the user performing feedback operations on the plurality of first videos according to user characteristics of the user and video characteristics of the plurality of first videos;

a recommending module configured to recommend the plurality of first videos to the user according to the feedback probability of the user to the plurality of first videos.

Optionally, the video feature determination module is further configured to determine image features, text features and/or audio features of the plurality of first videos when the target recommended durations of the plurality of first videos are not greater than a preset duration; determining the image features, text features, and/or audio features as video features of the plurality of first videos.

Optionally, the video feature determination module is further configured to determine, when the target recommended durations of the plurality of first videos are not greater than a preset duration, predicted video features of the plurality of first videos, where the predicted video features are used to represent predictions that the plurality of first videos are subjected to feedback operations; determining the predicted video features as video features of the plurality of first videos.

Optionally, the video feature determination module is further configured to perform classification processing on a plurality of second videos according to image features of the plurality of second videos to obtain a plurality of video types, where target recommended durations of the plurality of second videos are not less than a preset duration; determining the type of a target video to which each first video belongs according to the image characteristics of each first video; and determining the predicted video characteristics of each first video according to a plurality of second videos included in the target video type.

Optionally, the feedback probability determining module includes:

an input unit configured to input video characteristics of the plurality of first videos into a recommendation model for determining a feedback probability of a user on a video according to user characteristics of the user and video characteristics of the video;

the output unit is configured to input the user characteristics of the user into the recommendation model and output the feedback probability of the user on the plurality of first videos when the recommendation request of the user is received.

Optionally, the recommendation model comprises a user neural network and a video neural network,

the input unit is further configured to input the video features of the plurality of first videos into the recommendation model, and in a video neural network of the recommendation model, a video feature vector of each video is determined according to first network parameters in the video neural network and the video features of the plurality of first videos;

the output unit is further configured to input the user characteristics of the user into the recommendation model when a recommendation request of the user is received, and in a user neural network of the recommendation model, a user characteristic vector of the user is determined according to a second network parameter in the user neural network and the user characteristics; and determining the feedback probability of the user to the plurality of first videos according to the user feature vector and the video feature vector.

Optionally, the recommendation model comprises a click-through rate sub-model, a like-rate sub-model and/or an attention rate sub-model, and, correspondingly,

the feedback probability determination module is further configured to: when the recommendation model comprises a click rate sub-model, input the user characteristics of the user into the recommendation model and output the click rates of the user on the plurality of first videos; when the recommendation model comprises a like rate sub-model, input the user characteristics of the user into the recommendation model and output the like rates of the user on the plurality of first videos; and when the recommendation model comprises an attention rate sub-model, input the user characteristics of the user into the recommendation model and output the attention rates of the user on the plurality of first videos.

Optionally, the recommending module is further configured to determine an arrangement order of the plurality of first videos according to the click rate, the like rate and/or the attention rate of the user on the plurality of first videos, and to recommend the plurality of first videos to the user according to the arrangement order of the plurality of first videos.

Optionally, the apparatus further comprises:

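a model training module configured to obtain a plurality of sample videos; extract the predicted video features of a first sample video in the plurality of sample videos as positive samples, and extract the predicted video features of a second sample video in the plurality of sample videos as negative samples; and train a preset recommendation model according to the positive samples and the negative samples to obtain the recommendation model; wherein the first sample video is a video on which a user performed a feedback operation during historical recommendation, and the second sample video is a video on which the user did not perform a feedback operation during historical recommendation.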

According to a third aspect of the embodiments of the present disclosure, there is provided a video recommendation server, including:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to:

determining video characteristics of a plurality of first videos to be recommended according to a recommendation stage in which the first videos are recommended, wherein the recommendation stage is used for indicating a period in which a current recommendation time is within a recommended period of the first videos;

According to a fourth aspect of embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium having instructions therein, which when executed by a processor of a server, enable the server to perform a video recommendation method, the method comprising:

According to a fifth aspect of embodiments of the present disclosure, there is provided an application program comprising one or more instructions which, when executed by a processor of a server, enable the server to perform a video recommendation method, the method comprising:

The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:

In the embodiment of the disclosure, the server may determine video features of a plurality of first videos to be recommended according to target recommendation durations of the plurality of first videos, and determine feedback probabilities of the user performing feedback operations on the plurality of first videos according to user features of the user and the video features of the plurality of first videos, so that the plurality of first videos can be recommended to the user according to those feedback probabilities. Because the video features are determined according to the target recommendation duration, the resulting video features match that duration. This avoids the problem that a new video, whose target recommendation duration is not greater than the preset duration, receives too low a predicted click rate because it has too few features, and thereby improves the accuracy of the recommendation probability and of the recommendation.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.

FIG. 1 is a flow diagram illustrating a video recommendation method in accordance with an exemplary embodiment;

FIG. 2 is a flow diagram illustrating a method of video recommendation in accordance with an exemplary embodiment;

FIG. 3 is a diagram illustrating a recommendation model structure according to an exemplary embodiment;

FIG. 4 is a block diagram illustrating a video recommendation device in accordance with an exemplary embodiment;

FIG. 5 is a block diagram illustrating a server for video recommendation, according to an example embodiment.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

Fig. 1 is a flowchart illustrating a video recommendation method according to an exemplary embodiment. As shown in Fig. 1, the method is applied to a server and includes the following steps.

101. Determining video characteristics of a plurality of first videos to be recommended according to target recommendation time lengths of the plurality of first videos, wherein the target recommendation time length of each first video is the time length from the current recommendation time to the recommendation starting time of the first video;

102. determining the feedback probability of the user for performing feedback operation on the plurality of first videos according to the user characteristics of the user and the video characteristics of the plurality of first videos;

103. and recommending the plurality of first videos to the user according to the feedback probability of the user on the plurality of first videos.
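Steps 101 to 103 can be read as a small pipeline. The sketch below is an illustration under stated assumptions, not the disclosed implementation: `recommend`, `toy_features` and `toy_probability` are hypothetical names, the 24-hour threshold is one of the example preset durations, the use of historical features for older videos is assumed, and the probability function is a clipped dot product used only to make the example runnable.

```python
from datetime import datetime, timedelta

def recommend(videos, user_features, now, feature_fn, probability_fn, top_k=2):
    """Run steps 101-103 and return the ids of the videos to recommend."""
    scored = []
    for video in videos:
        # Step 101: target recommendation duration = current time - recommendation start.
        duration = now - video["recommendation_start"]
        features = feature_fn(video, duration)
        # Step 102: feedback probability from user features and video features.
        probability = probability_fn(user_features, features)
        scored.append((video["id"], probability))
    # Step 103: videos with a higher feedback probability are recommended first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [video_id for video_id, _ in scored[:top_k]]

def toy_features(video, duration):
    """Assumed rule: content features for new videos, historical features otherwise."""
    return video["content"] if duration <= timedelta(hours=24) else video["history"]

def toy_probability(user_features, features):
    """Placeholder for the recommendation model: dot product clipped to [0, 1]."""
    dot = sum(u * f for u, f in zip(user_features, features))
    return max(0.0, min(1.0, dot))

now = datetime(2018, 12, 13, 12, 0)
videos = [
    {"id": "new", "recommendation_start": now - timedelta(hours=2),
     "content": [0.9, 0.1], "history": [0.0, 0.0]},
    {"id": "old", "recommendation_start": now - timedelta(days=5),
     "content": [0.1, 0.1], "history": [0.5, 0.2]},
    {"id": "stale", "recommendation_start": now - timedelta(days=9),
     "content": [0.1, 0.1], "history": [0.1, 0.1]},
]
ranked = recommend(videos, [1.0, 0.0], now, toy_features, toy_probability)
```

In this toy data the new video has no historical feedback at all, yet it can still rank first because its features are chosen to match its short target recommendation duration.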

Optionally, the determining the video features of the plurality of first videos to be recommended according to the target recommendation duration of the plurality of first videos to be recommended includes:

when the target recommended duration of the plurality of first videos is not greater than the preset duration, determining image features, text features and/or audio features of the plurality of first videos;

the image features, text features, and/or audio features are determined as the video features of the plurality of first videos.

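Optionally, the determining the video features of the plurality of first videos to be recommended according to the target recommendation duration of the plurality of first videos to be recommended includes:

when the target recommended duration of the plurality of first videos is not greater than the preset duration, determining predicted video features of the plurality of first videos, wherein the predicted video features are used for representing a prediction of the feedback operations performed on the plurality of first videos;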

the predicted video features are determined as video features of the plurality of first videos.

Optionally, when the target recommended duration of the first videos is not greater than the preset duration, determining the predicted video features of the first videos includes:

classifying the second videos according to the image characteristics of the second videos to obtain multiple video types, wherein the target recommended time length of the second videos is not less than the preset time length;

Optionally, the determining, according to the user characteristic of the user and the video characteristics of the plurality of first videos, the feedback probability of the user performing the feedback operation on the plurality of first videos includes:

inputting the video features of the plurality of first videos into the recommendation model, and determining a video feature vector of each video in a video neural network of the recommendation model according to the first network parameters in the video neural network and the video features of the plurality of first videos;

when a recommendation request of the user is received, inputting the user characteristics of the user into the recommendation model, and determining the user characteristic vector of the user in the user neural network of the recommendation model according to the second network parameters and the user characteristics in the user neural network;

Optionally, the recommendation model includes a click rate sub-model, a like rate sub-model and/or an attention rate sub-model, and accordingly, the inputting, when the recommendation request of the user is received, the user characteristics of the user into the recommendation model and outputting the feedback probabilities of the user on the plurality of first videos includes:

when the recommendation model comprises a click rate submodel, inputting the user characteristics of the user into the recommendation model, and outputting the click rate of the user to a plurality of first videos;

when the recommendation model comprises the like rate submodel, inputting the user characteristics of the user into the recommendation model, and outputting the like rates of the user to a plurality of first videos;

determining the arrangement order of the first videos according to the click rate, the like rate and/or the attention rate of the user on the first videos;

Optionally, the training process of the recommendation model includes:

obtaining a plurality of sample videos;

Fig. 2 is a flowchart illustrating a video recommendation method according to an exemplary embodiment. As shown in Fig. 2, the method is applied to a server and includes the following steps.

201. The server determines the video characteristics of the plurality of first videos to be recommended according to the target recommendation duration of the plurality of first videos.

The target recommendation duration of each first video refers to the duration from the recommendation starting time of the first video to the current recommendation time. In the embodiment of the disclosure, the server is a server of a video application, the video application includes a plurality of first videos, and the server can recommend the plurality of first videos to a user in real time. For videos with different target recommendation durations, the server can use different video features when recommending the first videos. In this step, for a first video whose target recommendation duration is not greater than a preset duration, the server obtains video features matched with the target recommendation duration. The server can obtain video content features of the first videos, such as image features, text features and/or audio features of the first videos. In addition, the server may predict the video features of a first video in combination with the video features of a plurality of second videos. Accordingly, this step can be implemented in the following two ways.

In a first mode, when the target recommended duration of the plurality of first videos is not greater than the preset duration, the server determines image features, text features and/or audio features of the plurality of first videos, and determines the image features, text features, and/or audio features as the video features of the plurality of first videos.

For each first video, the server may extract the image features, audio features, and text features of the first video from the images and audio included in the first video and from the display page where the first video is located.

For the image features, the server may directly extract a plurality of image features of the first video from the multiple frames of images included in the first video. Further, the server may determine the image features of the plurality of first videos by means of a plurality of third videos, and the process may be as follows: the server classifies the third videos according to their image features to obtain multiple video types, each video type comprising third videos; for each first video, the server determines the video type to which the first video belongs according to the image features of the first video, and then determines the image features of the first video according to the image features of the third videos included in that video type. To classify the third videos, the server can extract image features of the cover images of the plurality of third videos and cluster the third videos with a clustering algorithm according to those cover-image features to obtain a plurality of clustering centers, wherein each clustering center corresponds to one video type, so that the plurality of third videos are divided into a plurality of video types according to their image features. The clustering algorithm may be set as needed and is not specifically limited in the embodiment of the present disclosure; for example, the clustering algorithm may be the K-Means algorithm.
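The clustering and type assignment described above can be sketched with a small, dependency-free K-Means. This is an illustrative sketch only: the two-dimensional "cover image features" are made-up numbers, the centers are initialized from the first `k` points purely so that the example is reproducible, and `k_means` and `nearest` are hypothetical helper names.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the clustering center (video type) closest to the point."""
    return min(range(len(centers)), key=lambda i: squared_distance(point, centers[i]))

def k_means(points, k, iterations=20):
    """Plain K-Means; returns one clustering center per video type."""
    centers = [list(p) for p in points[:k]]  # deterministic init (an assumption)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old center if a cluster becomes empty
                centers[i] = [sum(col) / len(group) for col in zip(*group)]
    return centers

# Hypothetical cover-image features of four third videos.
cover_features = [[0.0, 0.1], [0.9, 1.0], [0.1, 0.0], [1.0, 0.9]]
centers = k_means(cover_features, k=2)

# A first video is assigned to the video type whose center is closest.
video_type = nearest([0.2, 0.1], centers)
```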

For the text features, when the publisher of a video makes the video, the publisher may also input text information describing the video, such as the video title and a video topic associated with the video. The server may extract text features of the first video from the display page of the first video, for example from text in the display page that introduces the first video. In addition, the cover image of a video generally includes a text introduction of the video, for example the publisher of the video and a content introduction, so the server may further extract text features of the first video from the video cover of the first video. The server may also convert the voice signal in the audio included in the first video into text information and extract text features from that text information. Therefore, the process of the server acquiring the text features of the first video may be: the server extracts text features of the first video from the display page, the video cover and/or the audio of the first video.

The audio features may include a music feature and a voice feature of the first video. The music feature represents the background music used by the first video, and the voice feature represents the audio in the first video other than the background music, for example a dialogue between characters in a conversation scene in the first video. Therefore, the process of the server acquiring the audio features of the first video may be: the server extracts the audio of the first video, extracts the voice feature of the first video from the audio, determines the background music used by the first video, and acquires the music feature of the first video according to the background music. The server may extract the music feature of the first video from the video information of the first video.

In a second mode, when the target recommended duration of the plurality of first videos is not greater than the preset duration, the server determines the predicted video features of the plurality of first videos, wherein the predicted video features are used for representing a prediction of the feedback operations performed on the plurality of first videos; the server determines the predicted video features as the video features of the plurality of first videos.

For each first video, the server may determine, according to the first video, a plurality of second videos matching the first video, and determine the predicted video features of the first video according to the video features of the plurality of second videos. The target recommendation durations of the plurality of second videos are not less than the preset duration. The preset duration may be set as needed and is not specifically limited in the embodiment of the present disclosure. For example, the preset duration may be 24 hours, 5 hours, 2 days, etc.
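The comparison against the preset duration can be made concrete with a short sketch. It follows the wording of the text literally: a video counts as new when its target recommendation duration is not greater than the preset duration, and qualifies as a second video when its duration is not less than the preset duration, so a video exactly at the threshold satisfies both conditions. The function name `split_by_preset_duration` and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta

def split_by_preset_duration(videos, now, preset):
    """Return (ids of new videos, ids of videos usable as second videos)."""
    new_videos, mature_videos = [], []
    for video_id, recommendation_start in videos.items():
        # Target recommendation duration: time elapsed since recommendation started.
        duration = now - recommendation_start
        if duration <= preset:   # "not greater than the preset duration"
            new_videos.append(video_id)
        if duration >= preset:   # "not less than the preset duration"
            mature_videos.append(video_id)
    return new_videos, mature_videos

now = datetime(2018, 12, 13, 12, 0)
videos = {
    "a": now - timedelta(hours=3),
    "b": now - timedelta(hours=24),
    "c": now - timedelta(days=2),
}
# 24 hours is one of the example values given for the preset duration.
new_videos, mature_videos = split_by_preset_duration(videos, now, timedelta(hours=24))
```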

If the server determines the plurality of second videos matching the first video according to the image features of the first video, this step may be as follows. The server classifies the second videos according to the image features of the second videos to obtain a plurality of video types, each of which comprises a plurality of second videos. For each first video, the server determines the target video type to which the first video belongs according to the image features of the first video, and then determines the predicted video features of the first video according to the video features of the second videos included in the target video type.

The predicted video features represent a prediction of the feedback operations that the plurality of first videos to be recommended will receive. The server predicts the feedback operations on the first videos according to the image features, text features, audio features and the like of the first videos, and determines the predicted video features of the first videos according to the prediction result.

The server may extract video features of a plurality of second videos included in the target video type, determine video features corresponding to the target video type from the video features of the plurality of second videos, and use the video features corresponding to the target video type as the predicted video features of the first video. The server can analyze and count the video characteristics of the plurality of second videos included in each video type to obtain the video characteristics corresponding to each video type. The video characteristics of the second video at least comprise the characteristics that the second video is subjected to feedback operation in historical recommendation. For example, the video characteristics may include a click through rate, a like rate, a focus rate, etc. of the second video. The video features corresponding to each video type can be represented by a multi-dimensional feature vector.

The server can extract image features of cover images of the plurality of second videos, and cluster the plurality of second videos through a clustering algorithm according to the image features of the plurality of second videos to obtain a plurality of clustering centers, wherein each clustering center corresponds to one video type, so that the plurality of second videos are divided into a plurality of video types according to the image features. In one possible implementation, the image features may be represented by an image feature vector. The server stores an image classification model in advance, and the image classification model is used for determining an image feature vector of an image and classifying the image based on the image feature vector. The server can obtain the image feature vectors of the plurality of second videos through the image classification model. In this step, after extracting the cover images of the plurality of second videos, the server inputs the image features of the cover images of the plurality of second videos into the image classification model, converts the image features of the cover images into image feature vectors through the image classification model, and extracts the image feature vectors of the middle layer of the image classification model, thereby obtaining the image feature vectors of the plurality of second videos; and the server clusters the plurality of second videos through a clustering algorithm based on the image characteristic vectors of the plurality of second videos to obtain a plurality of clustering centers, wherein each clustering center corresponds to one video type. Wherein, the clustering algorithm can be a K-Means clustering algorithm.

In the embodiment of the disclosure, the server may determine, based on the video features of the plurality of second videos included in each video type, a video feature vector corresponding to that video type, and establish a correspondence between the plurality of video types and the video feature vectors in advance. When the server determines the video type to which the first video belongs, the video feature vector corresponding to the first video is obtained directly from the correspondence between video types and video feature vectors. Different clustering centers correspond to different cluster IDs, so the correspondence between the plurality of video types and the video feature vectors is the correspondence between the plurality of cluster IDs and the video feature vectors, where each video feature vector may be a multi-dimensional vector.
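The clustering and lookup described above can be sketched in plain Python. This is an illustration under stated assumptions: the two-dimensional image vectors and three-dimensional feedback vectors are toy data, the K-Means seeding from the first k points is a simplification, and taking the mean feedback vector per cluster is one possible way to "analyze and count" the features of a video type. In the text, the image vectors would come from a middle layer of the image classification model.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centers):
    """Cluster ID (index) of the center closest to point."""
    return min(range(len(centers)), key=lambda k: sq_dist(point, centers[k]))

def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(points, k, iterations=10):
    """Plain K-Means; seeding with the first k points is a simplification."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers

def cluster_feature_table(image_vecs, feedback_vecs, centers):
    """Map each cluster ID to the mean feedback vector of its second videos."""
    groups = {}
    for img, fb in zip(image_vecs, feedback_vecs):
        groups.setdefault(nearest(img, centers), []).append(fb)
    return {cid: mean(fbs) for cid, fbs in groups.items()}

def predicted_features(first_video_image_vec, centers, table):
    """Look up the predicted video features of a new first video."""
    return table[nearest(first_video_image_vec, centers)]

# Toy second videos: image vectors and (click rate, like rate, attention rate).
second_image_vecs = [[0.0, 0.1], [5.0, 5.1], [0.1, 0.0], [5.1, 5.0]]
second_feedback = [[0.10, 0.02, 0.01], [0.30, 0.10, 0.05],
                   [0.12, 0.04, 0.01], [0.34, 0.12, 0.07]]
centers = kmeans(second_image_vecs, k=2)
table = cluster_feature_table(second_image_vecs, second_feedback, centers)
prediction = predicted_features([4.9, 5.2], centers, table)
print(prediction)
```

Because the table is built ahead of time, assigning a new first video to a video type costs only one nearest-center search and one dictionary lookup.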

202. When a recommendation request of a user is received, the server determines the user characteristics of the user.

In the embodiment of the disclosure, the user is a user waiting for video recommendation. The recommendation request is used for requesting recommendation of first videos and carries the user identifier of the user. When the server receives the recommendation request of the user, the server can obtain user characteristics of the user in multiple dimensions according to the recommendation request.

In the embodiment of the present disclosure, the user characteristics may include a user identifier of the user, and static characteristics and dynamic characteristics of the user. The static characteristics are used for representing attribute characteristics of the user, and the dynamic characteristics are used for representing the operation behaviors of the user on each history recommended video when the server recommends the video to the user. The static characteristics of the user may include, but are not limited to: the gender, age, hobby of the user, the region where the user is located, the device identifier of the device used by the user, and the like. The dynamic characteristics of the user can include the click rate, the like rate, the attention rate, the comment rate or the forwarding rate of the user when the server historically recommends videos. In the embodiment of the disclosure, the server may store and update the user characteristics of the plurality of users in the video application in real time. In this step, the server may obtain the static feature and the dynamic feature of the user according to the user identifier of the user.

203. The server determines the feedback probability of the user for performing feedback operation on the plurality of first videos according to the user characteristics of the user and the video characteristics of the plurality of first videos.

In the embodiment of the disclosure, the server stores the plurality of first videos in advance and trains in advance to obtain a recommendation model, the recommendation model is used for determining the feedback probability of the user to the videos based on the user characteristics of the user and the video characteristics of the videos, and the server may determine the video characteristics of the plurality of first videos in advance according to the plurality of first videos and input the video characteristics of the plurality of first videos into the recommendation model. When a recommendation request of a user is received, the server inputs the user characteristics of the user into the recommendation model and outputs the feedback probability of the user to the plurality of first videos.

In the embodiment of the present disclosure, the recommendation model includes a click rate sub-model, a like rate sub-model and/or an attention rate sub-model, and accordingly, the feedback probability of the user to the plurality of first videos may be the click rate, the like rate and/or the attention rate. When the recommendation model comprises a click rate sub-model, inputting the user characteristics of the user into the recommendation model, and outputting the click rate of the user to a plurality of first videos; when the recommendation model comprises the like rate submodel, inputting the user characteristics of the user into the recommendation model, and outputting the like rates of the user to a plurality of first videos; and when the recommendation model comprises the attention rate submodel, inputting the user characteristics of the user into the recommendation model, and outputting the attention rates of the user to the plurality of first videos.

In the embodiment of the disclosure, the recommendation model comprises a user neural network and a video neural network. The video neural network comprises a plurality of network layers, each layer corresponds to a first network parameter, and the video neural network is used for determining the video feature vector corresponding to a video feature based on the first network parameters. Likewise, the user neural network comprises a plurality of network layers, each layer corresponds to a second network parameter, and the user neural network is used for determining the user feature vector corresponding to a user feature based on the second network parameters. The server inputs the video features of the plurality of first videos into the recommendation model, and in the video neural network of the recommendation model, the server determines the video feature vector of each first video according to the first network parameters in the video neural network and the video features of the plurality of first videos. When the server receives a recommendation request of the user, the server inputs the user characteristics of the user into the recommendation model, and in the user neural network of the recommendation model, the server determines the user feature vector of the user according to the second network parameters in the user neural network and the user characteristics. Then, in the recommendation model, the server determines the feedback probability of the user to the plurality of first videos according to the user feature vector and the video feature vectors.

The recommendation model can be a neural network model, the neural network model can include a user neural network and a video neural network, the user neural network is used for converting user features of a user into user feature vectors, and the video neural network is used for converting video features of a video into video feature vectors. The server can periodically obtain video features of a plurality of first videos, input the video features of the plurality of first videos into the recommendation model, determine video feature vectors of the first videos through a video neural network in the recommendation model, and store the video feature vectors. When a recommendation request of a user is received, the server inputs the user characteristics of the user into the recommendation model, and determines the user characteristic vector of the user through a user neural network in the recommendation model. Wherein the process of the server determining the feedback probability of the user to the plurality of first videos based on the recommendation model may be: in the recommendation model, the server determines a vector distance between the user feature vector and each video feature vector according to the user feature vector of the user and the video feature vector of each first video; the server determines the feedback probability of the user to each first video through the feedback probability expression.

The server can determine the vector distance between the user feature vector and the video feature vector through the following formula one:

Formula one: $A \cdot B = \sum_{i=1}^{d} A_i B_i$

where $A$ is the user feature vector of the user, $B$ is the video feature vector of a first video, and $A, B \in \mathbb{R}^d$, that is, $A$ and $B$ may each be a vector with $d$ columns. The index $i$ denotes the $i$-th column of the user feature vector or the video feature vector.

Then, the server determines the feedback probability of the user for each first video according to the vector distance by the following feedback probability expressions:

Feedback probability expression: $\sigma(a) = \dfrac{1}{1 + e^{-a}}$

where $a = A \cdot B$, $A$ is the user feature vector of the user, $B$ is the video feature vector of the first video, and $A, B \in \mathbb{R}^d$, that is, $A$ and $B$ may each be a vector with $d$ columns.
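As a worked illustration of the inner product and the feedback probability expression, the following minimal sketch computes the probability for one user vector and one video vector. The function names are chosen for this example only, and the two-branch sigmoid is an implementation detail added here to avoid overflow for large negative inputs.

```python
import math

def inner_product(a, b):
    """Sum over i of A_i * B_i for two vectors of the same length d."""
    return sum(x * y for x, y in zip(a, b))

def sigmoid(a):
    """sigma(a) = 1 / (1 + e^-a), written to stay stable for large |a|."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def feedback_probability(user_vec, video_vec):
    """Feedback probability of the user for one first video."""
    return sigmoid(inner_product(user_vec, video_vec))

print(feedback_probability([0.5, -0.2, 0.1], [0.4, 0.3, 0.9]))
```

A larger inner product between the two vectors gives a probability closer to 1, and an inner product of zero gives exactly 0.5.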

The recommendation model may include a click rate submodel, a like rate submodel and/or an attention rate submodel, and the server may further determine the click rate, the like rate and/or the attention rate of the user on the plurality of first videos respectively. Taking the click rate as an example, the server may determine the user feature vector and the video feature vector corresponding to the click rate through the click rate sub-model, and determine the click rate of the user and each first video through the first formula and the feedback probability expression according to the user feature vector and the video feature vector corresponding to the click rate.

As shown in fig. 3, the recommendation model may be a neural network model divided into a user neural network and a video neural network. Each of the two networks comprises a plurality of network layers, and each layer corresponds to a network parameter. On the user side, the user features are input into the neural network model, transformed layer by layer up to the second top layer A, and then output to the top layers A1, A2 and A3, which respectively represent the top layer vector of the user-side click rate sub-model, the top layer vector of the user-side like rate sub-model and the top layer vector of the user-side attention rate sub-model. The video side is also a multi-layer fully-connected neural network: the video features are input into the network, transformed layer by layer up to the second top layer B, and then output to the top layers B1, B2 and B3, which respectively represent the top layer vector of the video-side click rate sub-model, the top layer vector of the video-side like rate sub-model and the top layer vector of the video-side attention rate sub-model. Then, the server calculates the inner product distance of vectors A1 and B1, the inner product distance of vectors A2 and B2, and the inner product distance of vectors A3 and B3, respectively, by the above formula one. The server determines the click rate, the like rate and the attention rate of the user on the first video according to the feedback probability expression, based on the inner product distance of vectors A1 and B1, the inner product distance of vectors A2 and B2 and the inner product distance of vectors A3 and B3, respectively.
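The structure of fig. 3 can be sketched as two small fully connected towers, each ending in three top layers. Everything concrete here is an assumption made for illustration: the layer sizes, the random weights, the ReLU activation, the absence of bias terms, and the names `Tower` and `feedback_probabilities`.

```python
import math
import random

def init_layer(rng, n_in, n_out):
    """Random weight matrix with n_out rows of n_in weights each."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]

def apply_layer(weights, x, relu=True):
    """Fully connected layer without bias; ReLU is optional."""
    out = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return [max(0.0, o) for o in out] if relu else out

class Tower:
    """One side of the model: shared layers, then one top layer per sub-model."""
    HEADS = ("click", "like", "attention")

    def __init__(self, rng, n_in, hidden=8, top=4):
        self.shared = [init_layer(rng, n_in, hidden), init_layer(rng, hidden, hidden)]
        self.heads = {h: init_layer(rng, hidden, top) for h in self.HEADS}

    def forward(self, features):
        x = features
        for layer in self.shared:  # ends at the second top layer (A or B)
            x = apply_layer(layer, x)
        # Top layers A1/A2/A3 on the user side, B1/B2/B3 on the video side.
        return {h: apply_layer(w, x, relu=False) for h, w in self.heads.items()}

def feedback_probabilities(user_tops, video_tops):
    """Inner product per sub-model, passed through the sigmoid."""
    probs = {}
    for head in Tower.HEADS:
        a = sum(u * v for u, v in zip(user_tops[head], video_tops[head]))
        probs[head] = 1.0 / (1.0 + math.exp(-a))
    return probs

rng = random.Random(0)
user_tower = Tower(rng, n_in=5)
video_tower = Tower(rng, n_in=6)
video_tops = video_tower.forward([0.2, 0.1, 0.0, 0.7, 0.3, 0.5])
user_tops = user_tower.forward([1.0, 0.0, 0.3, 0.8, 0.1])
probs = feedback_probabilities(user_tops, video_tops)
print(probs)
```

The two towers only meet at the inner products, so the user side and the video side can have different input sizes as long as their top layers have the same width.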

The server can train the recommendation model in advance based on sample videos. The training process of the recommendation model may be: the server acquires a plurality of sample videos; the server extracts the predicted video features of a first sample video in the plurality of sample videos as positive samples and extracts the predicted video features of a second sample video in the plurality of sample videos as negative samples; and the server trains a preset recommendation model according to the positive samples and the negative samples to obtain the recommendation model. The first sample video is a video on which the user performed a feedback operation during historical recommendation, and the second sample video is a video on which the user did not perform a feedback operation during historical recommendation. For the click rate sub-model in the recommendation model, the positive samples are the video features of videos that the user clicked when videos were recommended to the user, and the negative samples are the video features of videos that the user did not click. For the like rate sub-model, the positive samples are the video features of videos that the user liked when videos were recommended to the user, and the negative samples are the video features of videos that the user did not like. For the attention rate sub-model, the positive samples are the video features of videos that the user followed when videos were recommended to the user, and the negative samples are the video features of videos that the user did not follow. The server trains based on the positive samples and the negative samples of the three sub-models respectively to obtain the recommendation model.
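The split into positive and negative samples per sub-model can be sketched as below. The impression-log layout (a list of dictionaries with `video_features` and `actions` keys) and the action names are hypothetical and chosen only for this example.

```python
def build_samples(impressions, action):
    """Return (video_features, label) pairs for one sub-model.

    label is 1 (positive sample) if the user performed `action` on the
    recommended video, otherwise 0 (negative sample).
    """
    return [(imp["video_features"], 1 if action in imp["actions"] else 0)
            for imp in impressions]

# Hypothetical historical recommendation log for one user.
impressions = [
    {"video_features": [0.3, 0.1], "actions": {"click", "like"}},
    {"video_features": [0.7, 0.2], "actions": {"click"}},
    {"video_features": [0.1, 0.9], "actions": set()},
]

click_samples = build_samples(impressions, "click")
like_samples = build_samples(impressions, "like")
attention_samples = build_samples(impressions, "attention")
print(click_samples, like_samples, attention_samples)
```

The same impression can therefore be a positive sample for one sub-model and a negative sample for another.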

The server may also update the recommendation model in real time, mainly by updating the network parameters in the recommendation model. The process may be: the server determines the minimum value of the loss function of the recommendation model according to the sample user feature vector and the sample video feature vector, and updates the network parameters of each layer of the neural network in the recommendation model layer by layer according to the gradient of the loss function. The server may minimize the loss function by using stochastic gradient descent: it solves the gradient of the loss function and then updates the network parameters layer by layer. For the click rate sub-model, the server calculates the loss function according to the video features of the sample videos corresponding to the click rate sub-model, calculates the gradient of the loss function, and updates the parameters of the network layers corresponding to the click rate sub-model. As shown in fig. 3, according to the multi-layer neural network included in the neural network model, the server may update the parameters of the top layers A1 and B1 corresponding to the click rate sub-model, then update the parameters of the second top layers A and B, and then update the network parameters of the user side and the network parameters of the video side. For the like rate sub-model, the server calculates the loss function according to the video features of the sample videos corresponding to the like rate sub-model, calculates the gradient, and updates the parameters of the network layers corresponding to the like rate sub-model: the server may update the parameters of the top layers A2 and B2 corresponding to the like rate sub-model, then update the parameters of the second top layers A and B, and then update the network parameters of the user side and the network parameters of the video side. For the attention rate sub-model, the server calculates the loss function according to the video features of the sample videos corresponding to the attention rate sub-model, calculates the gradient, and updates the parameters of the network layers corresponding to the attention rate sub-model: the server may update the parameters of the top layers A3 and B3 corresponding to the attention rate sub-model, then update the parameters of the second top layers A and B, and then update the network parameters of the user side and the network parameters of the video side.

The loss function is: $L(A_t, B_t) = -y_t \log p_t - (1 - y_t) \log(1 - p_t)$

where $A_t, B_t \in \mathbb{R}^n$ may be a sample user feature vector and a sample video feature vector respectively, each of which may be a vector with $n$ columns. In the neural network model, $A_t$ may be the top-level vector of the user side and $B_t$ may be the top-level vector of the video side. The estimated probability is $p_t = \sigma(A_t \cdot B_t)$, where $\sigma$ denotes the feedback probability expression $\sigma(a) = \dfrac{1}{1 + e^{-a}}$ with $a = A_t \cdot B_t$. $y_t$ is the label of the sample: its value may be 1 when the user performed the feedback operation on the video, and 0 when the user did not.
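The loss and one gradient step can be sketched as follows. As a simplification assumed for this example, the step is applied directly to the two top-level vectors rather than to the network parameters of each layer; in a full model the same gradient, $(p_t - y_t) B_t$ with respect to $A_t$ and $(p_t - y_t) A_t$ with respect to $B_t$, would be propagated back through the layers. The learning rate value is also an assumption.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def loss(a_t, b_t, y_t):
    """Cross-entropy loss L = -y log p - (1 - y) log(1 - p), p = sigma(A_t . B_t)."""
    p = sigmoid(dot(a_t, b_t))
    return -y_t * math.log(p) - (1 - y_t) * math.log(1 - p)

def sgd_step(a_t, b_t, y_t, lr=0.1):
    """One stochastic gradient descent step on the two top-level vectors."""
    g = sigmoid(dot(a_t, b_t)) - y_t          # dL/d(A_t . B_t) = p - y
    new_a = [a - lr * g * b for a, b in zip(a_t, b_t)]
    new_b = [b - lr * g * a for a, b in zip(a_t, b_t)]
    return new_a, new_b

a_t, b_t, y_t = [0.5, -0.2], [0.1, 0.4], 1    # a positive sample
before = loss(a_t, b_t, y_t)
a_t, b_t = sgd_step(a_t, b_t, y_t)
after = loss(a_t, b_t, y_t)
print(before, after)
```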

In the embodiment of the disclosure, the server may determine the feedback probability through the recommendation model. The recommendation model comprises a user neural network and a video neural network, so the user feature vector can be determined through the user neural network and the video feature vector through the video neural network. The two determination processes can be carried out independently without affecting each other, so the video feature vectors can be determined in advance. When a recommendation request of a user is received, only the process of determining the user feature vector needs to be executed, and the feedback probability is determined directly from the previously determined video feature vectors and the user feature vector. While the user waits for the recommendation, the process of determining the video feature vectors does not need to be executed. Compared with the multi-layer complex neural network models in the prior art, this can greatly improve the efficiency with which the recommendation model determines the feedback probability.
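The separation between precomputed video vectors and request-time user vectors can be sketched as below. The class name `Recommender`, its method names, and the identity functions standing in for the two neural networks are hypothetical and serve only to show when each side is evaluated.

```python
import math

class Recommender:
    """Caches video feature vectors so a request only runs the user tower."""

    def __init__(self, user_tower, video_tower):
        self.user_tower = user_tower      # callable: user features -> vector
        self.video_tower = video_tower    # callable: video features -> vector
        self.video_vectors = {}

    def refresh_videos(self, video_features):
        """Run periodically, outside any user request."""
        self.video_vectors = {vid: self.video_tower(f)
                              for vid, f in video_features.items()}

    def handle_request(self, user_features):
        """Compute the user vector once, then score every cached video."""
        user_vec = self.user_tower(user_features)
        scores = {}
        for vid, video_vec in self.video_vectors.items():
            a = sum(u * v for u, v in zip(user_vec, video_vec))
            scores[vid] = 1.0 / (1.0 + math.exp(-a))
        return scores

calls = {"user": 0, "video": 0}

def toy_user_tower(features):
    calls["user"] += 1
    return list(features)                 # stand-in for the user neural network

def toy_video_tower(features):
    calls["video"] += 1
    return list(features)                 # stand-in for the video neural network

rec = Recommender(toy_user_tower, toy_video_tower)
rec.refresh_videos({"v1": [0.2, 0.4], "v2": [-0.3, 0.1]})
scores_first = rec.handle_request([1.0, 0.5])
scores_second = rec.handle_request([0.0, 0.0])
print(scores_first, calls)
```

After the refresh, each request runs the user-side computation once and the video-side computation not at all, which is the source of the efficiency gain described above.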

In addition, the recommendation model further comprises a click rate submodel, a like rate submodel and an attention rate submodel, so that recommendation can be performed based on the click rate, the like rate and the attention rate of the user to the video, the feedback condition between the user and the video is comprehensively considered from multiple angles, and the accuracy and the practicability of video recommendation are improved.

204. The server recommends the plurality of first videos to the user according to the feedback probability of the user to the plurality of first videos.

In this step, according to the feedback probability of the user to the plurality of first videos, the server recommends to the user the first videos whose feedback probability is not less than a preset threshold. The server may further determine an arrangement order of the plurality of first videos according to the click rate, the like rate and/or the attention rate of the user on the plurality of first videos, and recommend the plurality of first videos to the user according to that arrangement order.

For each first video, the server may multiply the like rate, the click rate and the attention rate of the first video by their corresponding weights and sum the three products to obtain a recommendation value of the first video. The server arranges the plurality of first videos in descending order of recommendation value and recommends them to the user in that order, so that first videos with higher recommendation values are sent preferentially.
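The weighted ranking can be sketched as follows. The weight values and the dictionary layout are assumptions made for the example; the text does not specify how the weights are chosen.

```python
def recommendation_value(rates, weights):
    """Weighted sum of the click rate, like rate and attention rate."""
    return sum(weights[name] * rates[name] for name in weights)

def rank_videos(candidates, weights):
    """Video IDs sorted by recommendation value, highest first."""
    return sorted(candidates,
                  key=lambda vid: recommendation_value(candidates[vid], weights),
                  reverse=True)

# Assumed weights, for illustration only.
weights = {"click": 0.5, "like": 0.3, "attention": 0.2}
candidates = {
    "v1": {"click": 0.20, "like": 0.10, "attention": 0.05},
    "v2": {"click": 0.40, "like": 0.05, "attention": 0.02},
    "v3": {"click": 0.10, "like": 0.30, "attention": 0.20},
}
order = rank_videos(candidates, weights)
print(order)
```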

FIG. 4 is a block diagram illustrating a video recommendation device, according to an example embodiment. The apparatus is applied to a server, and referring to fig. 4, the apparatus includes: a video feature determination module 401, a feedback probability determination module 402 and a recommendation module 403.

The video feature determination module 401 is configured to determine video features of a plurality of first videos to be recommended according to target recommendation durations of the plurality of first videos, where the target recommendation duration of each first video is a duration of a current recommendation time from a recommendation start time of the first video;

a feedback probability determination module 402 configured to determine a feedback probability of a user performing a feedback operation on a plurality of first videos according to user characteristics of the user and video characteristics of the plurality of first videos;

and a recommending module 403 configured to recommend the plurality of first videos to the user according to the feedback probability of the user on the plurality of first videos.

Optionally, the video feature determination module 401 is further configured to determine image features, text features and/or audio features of the plurality of first videos when the target recommended durations of the plurality of first videos are not greater than a preset duration; the image features, text features, and/or audio features are determined as the video features of the plurality of first videos.

Optionally, the video feature determination module 401 is further configured to determine, when the target recommended durations of the plurality of first videos are not greater than the preset duration, predicted video features of the plurality of first videos, where the predicted video features are used to represent predictions that the plurality of first videos are subjected to feedback operations; the predicted video features are determined as video features of the plurality of first videos.

Optionally, the video feature determination module 401 is further configured to perform classification processing on a plurality of second videos according to image features of the plurality of second videos to obtain a plurality of video types, where target recommended durations of the plurality of second videos are not less than the preset duration; determining the type of a target video to which each first video belongs according to the image characteristics of each first video; and determining the predicted video characteristics of each first video according to a plurality of second videos included in the target video type.

Optionally, the feedback probability determining module 402 includes:

an input unit configured to input video characteristics of the plurality of first videos into a recommendation model for determining a feedback probability of a user to the video according to user characteristics of the user and video characteristics of the video;

and the output unit is configured to input the user characteristics of the user into the recommendation model and output the feedback probability of the user on the plurality of first videos when the recommendation request of the user is received.

Optionally, the recommendation model includes a user neural network and a video neural network,

the output unit is further configured to input the user characteristics of the user into the recommendation model when a recommendation request of the user is received, and determine a user characteristic vector of the user in a user neural network of the recommendation model according to the second network parameters and the user characteristics in the user neural network; and determining the feedback probability of the user to the plurality of first videos according to the user feature vector and the video feature vector.

Optionally, the recommendation model includes a click-through rate sub-model, a like-rate sub-model and/or an attention rate sub-model, and, accordingly,

the feedback probability determination module 402 is further configured to, when the recommendation model includes a click rate sub-model, input the user characteristics of the user into the recommendation model, and output click rates of the user on a plurality of first videos; when the recommendation model comprises the like rate submodel, inputting the user characteristics of the user into the recommendation model, and outputting the like rates of the user to a plurality of first videos; and when the recommendation model comprises the attention rate submodel, inputting the user characteristics of the user into the recommendation model, and outputting the attention rates of the user to the plurality of first videos.

Optionally, the recommending module 403 is further configured to determine an arrangement order of the plurality of first videos according to the click rate, the like rate and/or the attention rate of the user on the plurality of first videos, and to recommend the plurality of first videos to the user according to the arrangement order of the plurality of first videos.

Optionally, the apparatus further comprises:

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

FIG. 5 is a block diagram illustrating a server for video recommendation, according to an example embodiment. The server 500 may vary considerably depending on its configuration or performance, and may include one or more processors (CPUs) 501 and one or more memories 502, where the memory 502 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 501 to implement the video recommendation method provided by the above-mentioned method embodiments. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for performing input and output, and the server may also include other components for implementing the functions of the device, which are not described herein again.

In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions is also provided, such as a memory comprising instructions, where the instructions are executable by a processor of a server to perform the video recommendation method described above. For example, the non-transitory computer-readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

In an exemplary embodiment, an application is also provided that includes one or more instructions executable by a processor of a server to perform the video recommendation method described above.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A method for video recommendation, comprising:

determining video characteristics matched with the target recommendation durations of the plurality of first videos to be recommended according to the target recommendation durations of the plurality of first videos, wherein the target recommendation duration of each first video is the duration of the current recommendation time from the recommendation starting time of the first video;

inputting the video characteristics of the plurality of first videos into a video neural network of a recommendation model to obtain a top vector of a video side click rate sub-model, a top vector of a video side praise rate sub-model and a top vector of a video side attention rate sub-model;

when a recommendation request of a user is received, inputting user characteristics of the user into a user neural network of the recommendation model to obtain a top vector of a user side click rate sub-model, a top vector of a user side praise rate sub-model and a top vector of a user side attention rate sub-model;

determining a first inner product distance between the top vector of the video side click rate sub-model and the top vector of the user side click rate sub-model, a second inner product distance between the top vector of the video side praise rate sub-model and the top vector of the user side praise rate sub-model, and a third inner product distance between the top vector of the video side attention rate sub-model and the top vector of the user side attention rate sub-model;

determining the feedback probability of the user to the plurality of first videos according to the first inner product distance, the second inner product distance and the third inner product distance;
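As an illustrative, non-limiting sketch of the inner product step recited above, the following Python fragment computes one inner product per feedback type from the video side and user side top-layer vectors and maps each to a probability. The function names, the dictionary keys and the use of a sigmoid to convert an inner product into a probability are assumptions made for illustration only; the claims do not specify how the inner product distances are converted into the feedback probability.

```python
import math

FEEDBACK_TYPES = ("click", "like", "attention")  # hypothetical keys for the three sub-models


def inner_product(u, v):
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    """Map a real-valued score into (0, 1). Assumed mapping, not stated in the claims."""
    return 1.0 / (1.0 + math.exp(-x))


def feedback_probabilities(video_tops, user_tops):
    """Return one probability per feedback type.

    video_tops and user_tops map each feedback type to the top-layer
    vector produced by the video side and user side sub-model.
    """
    return {
        kind: sigmoid(inner_product(video_tops[kind], user_tops[kind]))
        for kind in FEEDBACK_TYPES
    }
```

Because the video side vectors do not depend on the user, they may be computed ahead of time and only the user side vectors need to be computed when a recommendation request arrives, which is consistent with the ordering of the steps in the claim.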

2. The video recommendation method according to claim 1, wherein the determining, according to the target recommendation durations of the first videos to be recommended, the video features that match the target recommendation durations of the first videos comprises:

3. The video recommendation method according to claim 1, wherein the determining, according to the target recommendation durations of the first videos to be recommended, the video features that match the target recommendation durations of the first videos comprises:

4. The video recommendation method according to claim 3, wherein said determining the predicted video characteristics of the first videos when the target recommendation durations of the first videos are not greater than the preset duration comprises:

5. The video recommendation method according to claim 1, wherein said inputting the video features of the plurality of first videos into a video neural network of a recommendation model to obtain a top-layer vector of a video side click rate sub-model, a top-layer vector of a video side like rate sub-model, and a top-layer vector of a video side attention rate sub-model comprises:

inputting the video characteristics of the plurality of first videos into the video neural network of the recommendation model, and determining the top-layer vector of the video side click rate sub-model, the top-layer vector of the video side like rate sub-model and the top-layer vector of the video side attention rate sub-model according to first network parameters in the video neural network and the video characteristics of the plurality of first videos;

correspondingly, said inputting, when a recommendation request of a user is received, the user characteristics of the user into the user neural network of the recommendation model to obtain a top-layer vector of the user side click rate sub-model, a top-layer vector of the user side like rate sub-model and a top-layer vector of the user side attention rate sub-model comprises:

when the recommendation request of the user is received, inputting the user characteristics of the user into the user neural network of the recommendation model, and determining the top-layer vector of the user side click rate sub-model, the top-layer vector of the user side like rate sub-model and the top-layer vector of the user side attention rate sub-model according to second network parameters in the user neural network and the user characteristics.

6. The video recommendation method according to claim 1, wherein said recommending the first videos to the user according to the feedback probabilities of the user on the first videos comprises:

determining an arrangement order of the plurality of first videos according to the click rate, the like rate and the attention rate of the user on the plurality of first videos;
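By way of a non-limiting example, the ordering step recited above might be sketched in Python as follows. The weighted sum used to combine the three rates into a single score, the default weights and the function name `rank_videos` are assumptions for illustration; the claim only states that the order is determined according to the click rate, the like rate and the attention rate.

```python
def rank_videos(rates, weights=(1.0, 1.0, 1.0)):
    """Order video ids from highest to lowest combined score.

    rates maps a video id to a (click_rate, like_rate, attention_rate) tuple.
    weights holds the hypothetical per-rate weights of the combined score.
    """
    def score(item):
        _, video_rates = item
        return sum(w * r for w, r in zip(weights, video_rates))

    # sorted() is stable, so videos with equal scores keep their input order.
    ranked = sorted(rates.items(), key=score, reverse=True)
    return [video_id for video_id, _ in ranked]
```

Setting a weight to zero drops the corresponding rate from the ordering, which gives one possible way to rank on only a subset of the three rates.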

7. The video recommendation method according to claim 1, wherein the training process of the recommendation model comprises:

obtaining the plurality of sample videos;

8. A video recommendation apparatus, comprising:

the video feature determination module is configured to determine video features matched with the target recommendation durations of the plurality of first videos to be recommended according to the target recommendation durations of the plurality of first videos, wherein the target recommendation duration of each first video is the duration from the recommendation starting time of the first video to the current recommendation time;

the feedback probability determination module is configured to: input the video characteristics of the plurality of first videos into a video neural network of a recommendation model to obtain a top-layer vector of a video side click rate sub-model, a top-layer vector of a video side like rate sub-model and a top-layer vector of a video side attention rate sub-model; when a recommendation request of a user is received, input user characteristics of the user into a user neural network of the recommendation model to obtain a top-layer vector of a user side click rate sub-model, a top-layer vector of a user side like rate sub-model and a top-layer vector of a user side attention rate sub-model; determine a first inner product distance between the top-layer vector of the video side click rate sub-model and the top-layer vector of the user side click rate sub-model, a second inner product distance between the top-layer vector of the video side like rate sub-model and the top-layer vector of the user side like rate sub-model, and a third inner product distance between the top-layer vector of the video side attention rate sub-model and the top-layer vector of the user side attention rate sub-model; and determine the feedback probability of the user to the plurality of first videos according to the first inner product distance, the second inner product distance and the third inner product distance;

9. The video recommendation device of claim 8,

the video feature determination module is further configured to determine image features, text features and/or audio features of the plurality of first videos when the target recommendation durations of the plurality of first videos are not greater than a preset duration; and determine the image features, text features and/or audio features as the video features of the plurality of first videos.
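As a non-limiting illustration of selecting features according to the target recommendation duration, the following Python sketch returns content-based features (image, text and/or audio) for a video whose target recommendation duration is not greater than the preset duration. The function name, the parameter names and the fallback to feedback-based features for older videos are assumptions for illustration; the claim above recites only the branch for durations not greater than the preset duration.

```python
from datetime import datetime, timedelta


def select_video_features(start_time, now, preset, content_features, feedback_features):
    """Pick the feature set matched to the video's target recommendation duration.

    start_time: recommendation starting time of the video.
    now: current recommendation time.
    preset: preset duration (a timedelta).
    content_features: image, text and/or audio features of the video.
    feedback_features: hypothetical features built from accumulated feedback.
    """
    target_duration = now - start_time
    if target_duration <= preset:
        # New video: too little feedback has accumulated, use content features.
        return content_features
    # Assumed branch for older videos; not recited in the claim above.
    return feedback_features
```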

10. The video recommendation device of claim 8,

the video feature determination module is further configured to determine predicted video features of the plurality of first videos when the target recommendation durations of the plurality of first videos are not greater than a preset duration, wherein the predicted video features represent a prediction of the feedback operations to be performed on the plurality of first videos; and determine the predicted video features as the video features of the plurality of first videos.

11. The video recommendation device of claim 10,

the video feature determination module is further configured to classify a plurality of second videos according to image features of the plurality of second videos to obtain a plurality of video types, wherein the target recommendation durations of the plurality of second videos are not less than the preset duration; determine a target video type to which each first video belongs according to the image features of each first video; and determine the predicted video features of each first video according to the plurality of second videos included in the target video type.
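As a non-limiting sketch of how predicted video features might be derived from the second videos, the following Python fragment assumes the second videos have already been grouped into video types and that each type is non-empty. Assigning a first video to the type with the nearest image-feature centroid, and averaging the feedback features of the second videos in that type, are assumptions for illustration; the claim does not specify the classification algorithm or the aggregation rule, and all names below are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def predict_features(first_image_feature, typed_second_videos):
    """Return (target_type, predicted_features) for one first video.

    typed_second_videos maps a video type to a non-empty list of
    (image_feature, feedback_feature) pairs for the second videos of that type.
    """
    # One image-feature centroid per video type.
    centroids = {
        video_type: mean_vector([image for image, _ in videos])
        for video_type, videos in typed_second_videos.items()
    }
    # Target video type: the type whose centroid is closest to the first video.
    target_type = min(
        centroids,
        key=lambda t: squared_distance(first_image_feature, centroids[t]),
    )
    # Predicted features: average feedback features of that type's second videos.
    predicted = mean_vector([fb for _, fb in typed_second_videos[target_type]])
    return target_type, predicted
```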

12. The video recommendation device of claim 8, wherein the feedback probability determination module is further configured to input the video features of the plurality of first videos into the video neural network of the recommendation model, and determine the top-layer vector of the video side click rate sub-model, the top-layer vector of the video side like rate sub-model and the top-layer vector of the video side attention rate sub-model according to first network parameters in the video neural network and the video features of the plurality of first videos;

the feedback probability determination module is further configured to input the user characteristics of the user into the user neural network of the recommendation model when the recommendation request of the user is received, and determine the top-layer vector of the user side click rate sub-model, the top-layer vector of the user side like rate sub-model and the top-layer vector of the user side attention rate sub-model according to second network parameters in the user neural network and the user characteristics.

13. The video recommendation device of claim 8,

the recommendation module is further configured to determine an arrangement order of the plurality of first videos according to the click rate, the like rate and/or the attention rate of the user on the plurality of first videos; and recommend the plurality of first videos to the user according to the arrangement order of the plurality of first videos.

14. The video recommendation device of claim 8, wherein said device further comprises:

15. A video recommendation server, comprising:

one or more processors;

one or more memories for storing one or more processor-executable instructions;

wherein the one or more processors are configured to:

16. A non-transitory computer-readable storage medium having instructions therein, which when executed by a processor of a server, enable the server to perform a video recommendation method, the method comprising:

determining video characteristics matched with the target recommendation durations of the plurality of first videos to be recommended according to the target recommendation durations of the plurality of first videos, wherein the target recommendation duration of each first video is the duration from the recommendation starting time of the first video to the current recommendation time;