CN110781321A - Multimedia content recommendation method and device

Info

Publication number: CN110781321A
Application number: CN201910804665.0A
Authority: CN
Inventors: 刘鹏; 张伸正; 吴敬桐
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2019-08-28
Filing date: 2019-08-28
Publication date: 2020-02-11
Anticipated expiration: 2039-08-28
Also published as: CN110781321B

Abstract

The application provides a multimedia content recommendation method and device, and relates to the technical field of machine learning. The method comprises the following steps: obtaining a real-time interest vector of a user according to a first multimedia content set operated by the user in a set time period before the current time and the operation behavior for each first multimedia content, wherein each component of the real-time interest vector expresses the degree of the user's preference for multimedia content in the set time period; obtaining a user feature vector of the user; obtaining a multimedia content feature vector of each multimedia content to be recommended in a second multimedia content set consisting of the multimedia contents to be recommended; determining, through a trained multimedia content recommendation model, the matching degree between the user feature vector and the multimedia content feature vector of each multimedia content to be recommended; and recommending the multimedia contents in the second multimedia content set whose matching degree meets a preset condition.

Description

Multimedia content recommendation method and device

Technical Field

The present application relates to the field of machine learning technologies, and in particular, to a multimedia content recommendation method and apparatus.

Background

In order to meet the requirements of the user, the short video playing application automatically recommends some short videos for the user.

The current way to recommend short videos relies on short video tags: short videos whose tags are highly similar to the tags of the large number of short videos the user has viewed are recommended to the user. However, short video tags are usually labeled manually, so the recommended short videos determined from the tags often match the user's needs poorly, and the accuracy of the short videos recommended for the user is low. The same problem exists in the recommendation of other multimedia content.

Disclosure of Invention

The embodiment of the application provides a multimedia content recommendation method and device, which are used for improving the accuracy of recommended multimedia content.

In a first aspect, a multimedia content recommendation method is provided, including:

obtaining a real-time interest vector of a user according to a first multimedia content set operated by the user in a set time period before the current time and an operation behavior aiming at each first multimedia content, wherein each component in the real-time interest vector is used for expressing the preference degree of the user on the multimedia content in the set time period;

obtaining a user feature vector of a user, wherein the user feature vector comprises the real-time interest vector and a user attribute vector;

obtaining a multimedia content feature vector of each multimedia content to be recommended in a second multimedia content set consisting of the multimedia contents to be recommended;

determining the matching degree of the user characteristic vector and the multimedia content characteristic vector of each multimedia content to be recommended through a trained multimedia content recommendation model; the multimedia content recommendation model is obtained by training according to a multimedia content training sample set;

and recommending the multimedia contents with the matching degree meeting the preset condition in the second multimedia content set.
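The final step above can be sketched as follows. The text does not fix the preset condition at this point, so this minimal sketch assumes a score threshold combined with a top-k cut; the names `select_recommendations`, `threshold` and `top_k`, and all scores, are hypothetical.

```python
from typing import Dict, List


def select_recommendations(
    match_scores: Dict[str, float],
    threshold: float = 0.5,  # assumed preset condition: minimum matching degree
    top_k: int = 3,          # assumed preset condition: at most k items
) -> List[str]:
    """Return the IDs of qualifying candidate content, best match first."""
    qualified = [(cid, s) for cid, s in match_scores.items() if s >= threshold]
    qualified.sort(key=lambda pair: pair[1], reverse=True)
    return [cid for cid, _ in qualified[:top_k]]


# Made-up matching degrees for five candidate contents.
scores = {"v1": 0.91, "v2": 0.32, "v3": 0.77, "v4": 0.58, "v5": 0.64}
recommended = select_recommendations(scores)
print(recommended)
```

Other preset conditions, such as a threshold alone, would fit the same interface.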

In a second aspect, a multimedia content recommendation apparatus is provided, including:

the obtaining module is used for obtaining real-time interest vectors of users according to a first multimedia content set operated by the users in a set time period before the current time and the operation behavior aiming at each first multimedia content, wherein each component in the real-time interest vectors is used for expressing the preference degree of the users to the multimedia content in the set time period; obtaining user feature vectors of users, wherein the user feature vectors comprise the real-time interest vectors and the user attribute vectors, and obtaining multimedia content feature vectors of all multimedia contents to be recommended in a second multimedia content set consisting of the multimedia contents to be recommended;

the determining module is used for determining the matching degree of the user characteristic vector and the multimedia content characteristic vector of each multimedia content to be recommended through the trained multimedia content recommendation model; the multimedia content recommendation model is obtained by training according to a multimedia content training sample set;

and the recommending module is used for recommending the multimedia contents with the matching degree meeting the preset condition in the second multimedia content set.

In a possible implementation, the obtaining module is specifically configured to:

performing embedding learning on the first multimedia content set to obtain an embedding vector of each multimedia content in the first multimedia content set;

in the order in which the user operated the multimedia contents in the first multimedia content set, performing, for each multimedia content, a weighting based on the similarity between the embedding vector of the multimedia content and the multimedia content interest vector of the preceding multimedia content and on the playing completion degree of the multimedia content, to obtain the multimedia content interest vector of the multimedia content, until the multimedia content interest vector of the last multimedia content in the first multimedia content set is obtained;

and weighting the multimedia content interest vector of each multimedia content in the first multimedia content set to obtain the real-time interest vector of the user.

In a possible implementation, the closer each multimedia content in the first set of multimedia content is to the current time, the greater the weight of the multimedia content interest vector of that multimedia content.
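A minimal sketch of the three steps above, under stated assumptions: the text does not give the exact weighting formulas, so the sketch assumes cosine similarity, a linear mix of similarity and playing completion degree (`alpha`) used to scale each embedding, a similarity of 1 for the first content, and exponentially decaying recency weights (`decay`). All names and numbers are hypothetical, and the embeddings are taken as given rather than learned.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def content_interest_vectors(
    embeddings: List[Vector],   # in the order the user operated the content
    completions: List[float],   # playing completion degree per content
    alpha: float = 0.5,         # assumed mixing coefficient
) -> List[Vector]:
    interests: List[Vector] = []
    for i, (emb, done) in enumerate(zip(embeddings, completions)):
        # Assumption: the first content has no predecessor, so similarity is 1.
        sim = 1.0 if i == 0 else cosine(emb, interests[-1])
        gate = alpha * sim + (1.0 - alpha) * done
        interests.append([gate * x for x in emb])
    return interests


def real_time_interest(interests: List[Vector], decay: float = 0.8) -> Vector:
    """Weighted sum; content closer to the current time gets a larger weight."""
    n = len(interests)
    raw = [decay ** (n - 1 - i) for i in range(n)]  # most recent item -> 1.0
    total = sum(raw)
    weights = [w / total for w in raw]
    dim = len(interests[0])
    return [sum(w * vec[d] for w, vec in zip(weights, interests)) for d in range(dim)]


embeddings = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
completions = [1.0, 0.5, 0.9]
per_content = content_interest_vectors(embeddings, completions)
user_vector = real_time_interest(per_content)
print(user_vector)
```

Whatever the number of operated contents, the result has the dimension of a single embedding, so the model input size stays fixed.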

In a possible implementation manner, the first multimedia content set operated within the set time period before the current time includes each multimedia content operated by the user between the time of the user's current login to the multimedia playing application and the current time.

In a possible implementation, the apparatus further includes a training module, and the training module is specifically configured to:

acquiring a multimedia content training sample set; each multimedia content training sample in the multimedia content training sample set comprises a sample user feature vector, the multimedia content feature vector of each multimedia content in an exposure multimedia content set, the value of the click label of each multimedia content in the exposure multimedia content set, and the preference weight of the user for each exposed multimedia content in the exposure multimedia content set, wherein the sample user feature vector comprises the real-time interest vector of the user for the exposure multimedia content set;

and training a multimedia content recommendation model according to the multimedia content training sample set until a preset loss function is converged, and obtaining the trained multimedia content recommendation model.

In a possible implementation manner, the preset loss function is obtained by weighting a cross-entropy loss function and a regularization term, and a weight corresponding to the cross-entropy loss function is a preference weight of the user for each exposed multimedia content in the set of exposed multimedia contents.

In a possible implementation manner, the preference weight of the user for each multimedia content is obtained by weighting values of multiple types of interaction tags for the multimedia content.
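The loss described above can be illustrated as follows. This is a sketch, not the patent's formula: it assumes binary cross-entropy on the click label, per-sample scaling by the preference weight, averaging over samples, and an L2 regularization term; `weighted_loss`, `reg_strength` and all values are hypothetical.

```python
import math
from typing import List


def weighted_loss(
    predictions: List[float],        # predicted click probabilities in (0, 1)
    click_labels: List[int],         # 1 = clicked, 0 = exposed but not clicked
    preference_weights: List[float], # one weight per exposed content
    params: List[float],             # model parameters for the regularization term
    reg_strength: float = 0.01,      # assumed L2 coefficient
) -> float:
    if not predictions:
        raise ValueError("at least one sample is required")
    eps = 1e-12  # guards against log(0)
    ce = 0.0
    for p, y, w in zip(predictions, click_labels, preference_weights):
        p = min(max(p, eps), 1.0 - eps)
        ce += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    ce /= len(predictions)
    l2 = reg_strength * sum(theta * theta for theta in params)
    return ce + l2


# Good predictions should give a lower loss than bad ones.
loss_good = weighted_loss([0.9, 0.2], [1, 0], [2.0, 1.0], [0.5, -0.5])
loss_bad = weighted_loss([0.1, 0.8], [1, 0], [2.0, 1.0], [0.5, -0.5])
print(loss_good, loss_bad)
```

With this scaling, a wrong prediction on content the user strongly preferred costs more than one on content the user barely interacted with.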

In a third aspect, a computer device is provided, comprising:

at least one processor, and

a memory communicatively coupled to the at least one processor;

wherein the memory stores instructions executable by the at least one processor, and the at least one processor implements the method according to any one of the first aspect and possible embodiments by executing the instructions stored by the memory.

In a fourth aspect, a computer-readable storage medium is provided, which stores computer instructions that, when executed on a computer, cause the computer to perform the method according to any of the first aspect and possible embodiments.

In the embodiment of the application, the real-time interest vector of the user is obtained by learning from the multimedia content operated in the set time period, the feature vector of each multimedia content is determined for the multimedia content to be recommended, and the matching degree between the user and each multimedia content is determined based on the user feature vector and the multimedia content feature vector. Compared with the prior art, in which multimedia content is recommended only according to multimedia content tags, the embodiment of the application takes both the real-time interest vector of the user and the feature vector of the multimedia content into account. Because the real-time interest vector changes as the user operates multimedia content, the screened multimedia content better follows changes in the user's preference, which can improve the accuracy of the recommended multimedia content. The user does not need to search massive multimedia content for content the user likes, so the user's experience of viewing multimedia content can be improved.

Drawings

Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;

Fig. 2 is a schematic structural diagram of each device in the scenario shown in Fig. 1 according to an embodiment of the present application;

Fig. 3 is a schematic diagram illustrating a process of recommending a short video for a user by a server according to an embodiment of the present application;

Fig. 4 is a flowchart of a method for recommending short videos according to an embodiment of the present application;

Fig. 5 is a schematic diagram illustrating a user performing a short video pull operation according to an embodiment of the present application;

Fig. 6 is a flowchart of a method for obtaining a real-time interest vector of a user according to an embodiment of the present application;

Fig. 7 is a schematic diagram illustrating a weight distribution of real-time interest vectors of respective short videos according to an embodiment of the present application;

Fig. 8 is a schematic diagram of a process for training a short video recommendation model according to an embodiment of the present application;

Fig. 9 is an exemplary diagram for determining a matching degree between a user and a short video according to an embodiment of the present application;

Fig. 10 is a schematic structural diagram of a multimedia content recommendation apparatus according to an embodiment of the present application;

Fig. 11 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

In order to better understand the technical solutions provided by the embodiments of the present application, the following detailed description is made with reference to the drawings and specific embodiments.

In order to facilitate those skilled in the art to better understand the technical solutions in the present application, the terms related to the present application are explained below.

Multimedia content: it refers to content including short video, music, etc. that can be played on a multimedia playing application or a web page.

Multimedia playing application: an application capable of providing multimedia playing, such as some news APPs, video playing APPs, live APPs, music playing APPs, and the like. The user can log in to the multimedia playing application and perform operations such as clicking, sharing and commenting on the multimedia content.

Short video playing application: one kind of multimedia playing application, namely an application capable of providing short video playing, such as some news APPs, video playing APPs, or live APPs. The user can log in to the short video playing application, open a short video in it, and perform operations such as clicking, sharing and commenting.

Each multimedia content operated between the user's current login to the multimedia playing application and the current time: this can be understood as the multimedia content the user clicks during the current continuous access session. For example, if the multimedia playing application is a news APP, it can be understood as the multimedia content operated by the user from the time the user opens the news APP up to the user's current operation in the news APP.

Playing completion degree: indicates how completely the user has played the multimedia content, and may be represented by the user's playing duration divided by the total duration of the multimedia content. If the user plays the multimedia content multiple times in succession, the total duration can be understood as the duration of a single playing of the multimedia content multiplied by the number of plays, and the user's playing duration is the accumulated duration over the consecutive plays. Taking a short video as an example, the playing completion degree can be represented by the duration for which the user played the short video divided by the total duration of the short video.
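The definition above translates directly into a small calculation. The function name `play_completion` and the sample durations are hypothetical.

```python
def play_completion(played_seconds: float, content_seconds: float, play_count: int = 1) -> float:
    """Accumulated play time divided by (single duration * number of plays)."""
    if content_seconds <= 0 or play_count < 1:
        raise ValueError("content_seconds must be positive and play_count at least 1")
    return played_seconds / (content_seconds * play_count)


# A 30-second short video watched for 12 seconds.
single = play_completion(12.0, 30.0)
# The same video played three times in succession, 75 seconds watched in total.
repeat = play_completion(75.0, 30.0, play_count=3)
print(single, repeat)
```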

Exposure multimedia content set: the set of all multimedia contents recommended to the user within a preset time period. It comprises both the multimedia contents the user operated after they were recommended and the multimedia contents the user did not operate after they were recommended. The preset time period can be set according to actual requirements. Taking short videos as an example, the exposure short video set can be understood as all short videos recommended to the user within the preset time period.

Operation behavior: an operation performed by a user on multimedia content, for example clicking, sharing or forwarding a short video.

Interaction tag: a tag that indicates the specific type of operation behavior the user performed on the multimedia content; it is generally generated after the user performs the operation behavior. There are several kinds of interaction tags, including a click tag indicating whether the user clicked the multimedia content, a duration tag indicating how long the user played the multimedia content, a sharing tag indicating whether the user shared the multimedia content, and a comment tag indicating whether the user commented on the multimedia content.

Preference weight: obtained by weighting the values of the interaction tags of the user for each multimedia content. The value of an interaction tag can be understood as the result of quantifying the user's operation behavior on the multimedia content, and reflects to some extent the degree of interaction between the user and the multimedia content.
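As an illustration of the preference weight, the sketch below combines tag values linearly. The coefficients in `TAG_COEFFICIENTS` are invented for the example, and the duration tag is assumed to be normalized to the range 0 to 1; the text only states that the tag values are weighted.

```python
from typing import Dict

# Hypothetical coefficients per interaction tag type.
TAG_COEFFICIENTS: Dict[str, float] = {
    "click": 1.0,
    "duration": 2.0,  # assumed to be a normalized play duration in [0, 1]
    "share": 3.0,
    "comment": 2.0,
}


def preference_weight(tag_values: Dict[str, float]) -> float:
    """Weighted sum of interaction tag values; missing tags count as 0."""
    return sum(coef * tag_values.get(tag, 0.0) for tag, coef in TAG_COEFFICIENTS.items())


engaged = preference_weight({"click": 1, "duration": 0.9, "share": 1, "comment": 0})
passive = preference_weight({"click": 1, "duration": 0.1})
print(engaged, passive)
```

A user who watched most of a video and shared it receives a larger weight than one who clicked and left almost immediately.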

Click Through Rate (CTR): an index for evaluating the accuracy of multimedia content recommendation, defined as the number of times the user clicks recommended multimedia content divided by the number of recommended multimedia contents. A higher CTR indicates more accurate recommendation.
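For concreteness, the CTR definition can be computed as below; the function name and the counts are made up.

```python
def click_through_rate(clicks: int, recommended: int) -> float:
    """Number of clicks on recommended content divided by the number recommended."""
    if recommended <= 0:
        raise ValueError("recommended must be positive")
    return clicks / recommended


ctr = click_through_rate(clicks=30, recommended=200)
print(ctr)
```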

Artificial Intelligence (AI): a theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best result. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. The basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, and the like.

Machine Learning (ML): a multi-disciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how a computer can simulate or implement human learning behavior so as to acquire new knowledge or skills and reorganize its existing knowledge structure to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

Multimedia content recommendation model: a model obtained by training on a multimedia content training sample set. The multimedia content training sample set is derived from the user's previous viewing behavior. For each type of multimedia content, a multimedia content recommendation model can be obtained by training on multimedia content training samples of the corresponding category.

Multimedia content training sample set: a set comprising a plurality of multimedia content training samples. Each multimedia content training sample comprises a sample user feature vector, the multimedia content feature vector of each multimedia content in an exposure multimedia content set, the value of the click label of each multimedia content in the exposure multimedia content set, and the preference weight of the user for each exposed multimedia content in the exposure multimedia content set, wherein the sample user feature vector comprises the real-time interest vector of the user for the exposure multimedia content set.

Deep Structured Semantic Model (DSSM): one kind of multimedia content recommendation model, comprising an input layer, a representation layer and an output layer. It determines the matching degree between the user and the multimedia content according to the user feature vector and the multimedia content feature vector.

Multilayer Perceptron (MLP): also known as an Artificial Neural Network (ANN). In addition to the input and output layers, it may have multiple hidden layers in between; the simplest MLP contains only one hidden layer. The MLP may serve as the representation layer of the DSSM.
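To make the relationship between the DSSM and the MLP concrete, here is a minimal two-tower sketch. It is an illustration under assumptions, not the patent's model: the layer sizes, the tanh activation, the omission of bias terms, and cosine similarity as the output layer are all choices made for this example, and the weights are random (untrained), so the score itself carries no meaning.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Matrix:
    """Random weight matrix with n_out rows and n_in columns."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def mlp(x: Vector, layers: List[Matrix]) -> Vector:
    """Representation layer: a stack of fully connected layers with tanh."""
    for weights in layers:
        x = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]
    return x


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def matching_degree(user_vec: Vector, content_vec: Vector,
                    user_tower: List[Matrix], content_tower: List[Matrix]) -> float:
    """Output layer (assumed): cosine similarity of the two tower outputs."""
    return cosine(mlp(user_vec, user_tower), mlp(content_vec, content_tower))


rng = random.Random(0)
user_tower = [init_layer(5, 4, rng), init_layer(4, 3, rng)]
content_tower = [init_layer(3, 4, rng), init_layer(4, 3, rng)]

# Input layer: made-up user features (interest and attribute components)
# and made-up content features.
user_vec = [0.43, 0.44, 0.3, 1.0, 0.0]
content_vec = [0.2, 0.7, 0.1]
score = matching_degree(user_vec, content_vec, user_tower, content_tower)
print(score)
```

Both towers map their inputs to the same output dimension so that the two representations can be compared directly.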

Item-based Collaborative Filtering (ICF): one strategy of the multimedia content recommendation model. When recommending multimedia content, it analyzes the features of the multimedia content and recommends to the user multimedia content similar to what the user previously viewed.

User-based Collaborative Filtering (UCF): one strategy of the multimedia content recommendation model. When recommending for a user, it determines other users whose interests are similar to the user's, and recommends to the user the multimedia content those similar users have watched.

User feature vector: generally refers to a feature vector composed of feature components related to the user, such as user attribute features and the real-time interest vector of the user. The user attribute features represent information that may influence video recommendation for the user and that is more stable than the real-time interest vector, such as the user's age, gender and user portrait. The user portrait consists of features derived from the user's previous behavior habits, such as the categories of multimedia content the user has liked over a long period, including short video categories or music categories.

Multimedia content feature vector: a feature vector learned from the multimedia content itself, for example a feature vector obtained by learning from the multimedia content category, the multimedia content click rate, the multimedia content exposure rate, and the like.

Embedding learning: used to convert input data into a vector of fixed size. For example, training an embedding learning algorithm on the short videos in the set of short videos clicked by the user yields an embedding vector for each short video.

Video interest vector: a vector obtained from the embedding vectors of the first short video set by a specified calculation method. For example, the video interest vector of a short video may be obtained by weighting based on the similarity between the short video and the short video interest vector of the previous short video, and on the playing completion degree of the short video. The first short video set is a specific example of the first multimedia content set.

Real-time interest vector of the user: indicates the degree of the user's preference for multimedia content within a set time period. For example, the real-time interest vector of the user for short videos may be obtained by weighting the user's interest vectors for the individual short videos within the set time period.

Short video feature vector: some features used to represent the short video itself, such as short video category labels for the short video, exposure of the short video, number of plays and click volume, etc.

Mini-batch Gradient Descent (MBGD): a method in which a small batch of samples is used to update the parameters in each training iteration, which improves the speed and accuracy of model training.
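The update rule of mini-batch gradient descent can be sketched on a toy problem. The sketch below trains a two-parameter logistic regression rather than the recommendation model itself; the data, learning rate, batch size and epoch count are hypothetical.

```python
import math
import random
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (features, label)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_mbgd(samples: List[Sample], dim: int, batch_size: int = 2,
               lr: float = 0.5, epochs: int = 200, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    weights = [0.0] * dim
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)  # new batch composition every epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grad = [0.0] * dim
            for x, y in batch:
                err = sigmoid(sum(w * v for w, v in zip(weights, x))) - y
                for d in range(dim):
                    grad[d] += err * x[d]
            # One parameter update per mini-batch, using the batch-average gradient.
            weights = [w - lr * g / len(batch) for w, g in zip(weights, grad)]
    return weights


samples = [([1.0, 0.0], 1), ([0.9, 0.1], 1), ([0.1, 0.9], 0), ([0.0, 1.0], 0)]
weights = train_mbgd(samples, dim=2)
print(weights)
```

Setting `batch_size` to 1 gives stochastic gradient descent, and setting it to the full sample count gives batch gradient descent; the mini-batch variant sits between the two.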

The following describes the design concept of the embodiment of the present application by taking a short video as an example, and the design concept of the embodiment of the present application is also applicable to other multimedia contents.

Currently, short videos recommended for a user are determined based on short video tags of short videos that a large number of users have viewed and a trained short video recommendation model.

The inventor of the application finds that in the existing mode, the short video tag is generally manually labeled in advance, and the matching degree of the short video tag and the content of the short video is possibly not high, so that the matching degree of the recommended short video determined for the user according to the labeled short video tag and the content actually wanted to be watched by the user is possibly not high, that is, the accuracy of recommending the short video is low.

In view of this, the inventor of the present application designs a short video recommendation method, which inputs the user feature vector and each short video feature vector into a trained short video recommendation model to obtain the matching degree between the user and each short video, and then recommends the short videos whose matching degree satisfies a preset condition. Compared with the prior-art approach of recommending videos to users directly from short video tags, the embodiment of the application combines the real-time interest vector of the user with information about the short videos. Because the real-time interest vector represents the degree of the user's preference for short videos, short videos that better match the user's preference can be screened out, which improves the accuracy of short video recommendation.

The inventor further considers that, in some cases, a user clicks a short video only because the user was attracted by, for example, its title, and is not actually interested in the short video itself. Therefore, when determining the interest vector of the user for each short video, the inventor considers not only the content of the short video itself but also the user's operation behavior data for the short video, the degree of correlation between the content of the short video and the interest vector of the previous short video, and so on. Combining this correlation with the operation behavior data for the short video yields a real-time interest vector that better reflects the user's degree of preference.

The inventor further considers that the user's interest changes continuously while the user plays short videos, so that a corresponding video interest vector exists after the user plays each short video. The inventor finds that inputting the short video interest vector of every short video into the short video recommendation model easily leads to the curse of dimensionality. Therefore, the inventor considers that the short video interest vector of each short video can be determined first, and these video interest vectors can then be weighted to obtain the real-time interest vector of the user. In this way, a real-time interest vector representing the user's preference is obtained while the curse of dimensionality caused by excessive model input is avoided.

The inventor of the present application further finds that interest preferences of a user for short videos may change with time, and when short video recommendation is performed, if current preferences of the user are considered, short videos most needed by the user at present can be recommended better.

The inventor further considers that the user's interest preference for short videos may differ from stage to stage. If the interest vectors of all short videos the user has ever clicked were used to calculate the real-time interest vector of the user, then, on the one hand, many short videos would take part in the calculation and the amount of calculation would be large; on the other hand, the more short videos take part in the calculation, the more the user's current preference is diluted, so that the resulting real-time interest vector cannot represent the user's current preference. Therefore, the inventor further considers that the short videos operated between the user's current login to the short video playing application and the current time can be used as the first short video set to obtain the real-time interest vector of the user, which reduces the amount of calculation and yields a real-time interest vector that represents the user's current preference.

The inventor further considers that, when training the short video recommendation model, the training sample set should include not only the short video feature vectors, the sample user feature vector and the click labels, but also the degree of the user's preference for each exposed short video, which can further improve the accuracy of short video recommendation.

After introducing the design idea of the embodiment of the present application, an application scenario of the short video recommendation method in the embodiment of the present application is described below.

Referring to fig. 1, the application scenario includes a terminal device 110, a client 120 installed in the terminal device 110, a server 130, and a database 140. Terminal equipment 110 is, for example, a cellular phone, a personal computer, or the like. The client 120 may be understood as a software module installed in the terminal device 110 or embedded in a third-party application, or may also be a web page version client accessed through a web page, and the client 120 in this embodiment of the present application generally refers to the client 120 where a user can view multimedia content, such as a news APP, a live APP, and the like. The server 130 may be a physical server or a virtual server. The server 130 may be a single server or a cluster of servers. Database 140 may be implemented by one or more storage devices, such as disks, etc. Database 140 may exist independently or may be part of server 130. Fig. 1 illustrates one terminal device 110, and the number of terminal devices 110 is not limited in practice.

For ease of understanding, the present application defines two concepts, a first short video set and a second short video set. The first short video set refers to the set of short video information of all short videos operated by a user. The operations include clicking, commenting, sharing, liking and the like. The short video information represents the characteristics of a short video and includes, for example, a unique identifier of the short video, the total duration of the short video, whether the user has commented on the short video, whether the user has shared the short video, whether the user has clicked the short video, whether the user has liked the short video, a short video tag, a short video category and the like. The second short video set consists of short videos to be recommended, which may generally be understood as the short video information of short videos the user has not viewed, but may also include popular short videos that some users have operated.

Specifically, the database 140 stores a large number of short videos. The server 130 may obtain from the database 140 a second short video set that the user has not watched, determine from the second short video set the short videos to recommend to the user, and send them to the client 120 for the user to watch. Of course, the user may also view the short videos recommended by the server 130 through a web browser.

The process by which the server 130 recommends short videos for the user mainly comprises three parts: screening out candidate short videos, screening out selected short videos from the candidate short videos, and scattering and reordering the selected short videos to obtain the short videos recommended for the user. The part in which the candidate short videos are screened out is briefly described below with reference to fig. 2.

While a user continuously clicks short videos on the client 120, a log of the user is generated. The log includes information such as the identifiers of the short videos clicked by the user and the duration for which each short video was played. The server 130 can obtain, according to the log, the first short video set viewed by the user and determine the real-time interest vector of the user. The server 130 then inputs the user feature vector (including the real-time interest vector of the user, the user portrait and the like) and the short video feature vectors into a short video recommendation model, and the short video recommendation model screens out some candidate short videos from the large number of short videos. The short video recommendation model may be implemented based on ICF and UCF strategies, etc.

For example, the server 130 screens out 1000 to 2000 short videos associated with the user from the database 140.

After the candidate short videos are screened out, a short video fine recommendation model can be trained based on the log of the user. The real-time interest vector of the user and other features are input into the short video fine recommendation model to obtain the short videos selected for the user, and the CTR corresponding to the server 130 can be calculated.

For example, 1000-.

After the short videos selected for the user are obtained, the selected short videos are scattered and reordered, and then displayed on the client 120.

Fig. 2 is one example of how the server 130 determines recommended short videos. In one possible embodiment, the server 130 may skip the steps of screening the selected short videos out of the candidate short videos and of scattering and reordering the selected short videos.

In fig. 2, the multimedia content is illustrated as a short video, but the same applies to other types of multimedia content. When the multimedia content is of another type, the processing procedure of the server 130 is the same, except that the database 140 stores content of that type. For example, if the multimedia content is music, a large amount of music is stored in the database 140.

After the functions of each device in the application scenario in the embodiment of the present application are described, the structure of each device is described below.

The server 130 includes a processor 321, a memory 322, and an interface 323. The memory 322 stores program instructions, and the processor 321, when executing the program instructions in the memory 322, implements the functions of the server 130 discussed above, obtains a large number of short videos from the database 140, obtains short videos recommended for the user after processing, and can send the short videos to the client 120 in the terminal device 110 through the interface 323.

Terminal device 110 includes a processor 311, a memory 312, an interface 313, and a display panel 314. The memory 312 stores program instructions, and when the processor 311 executes the program instructions, the functions of the terminal device 110 discussed above are realized, and the terminal device 110 communicates with the server 130 through the interface 313. After receiving the recommended short video, the terminal device 110 displays the corresponding short video through the display panel 314.

After the description of the devices according to the embodiments of the present application, the following describes a short video recommendation method according to the embodiments of the present application. The short video recommendation process in the embodiment of the present application may be applied to the process of screening candidate short videos by the server 130 discussed above.

Referring to fig. 4, the short video recommendation method includes the following specific processes:

S410, obtaining a real-time interest vector of the user according to the first multimedia content set operated by the user in a set time period before the current time and the operation behavior aiming at each first multimedia content.

Specifically, in the process that the user watches the short video through the client 120, after a certain short video is played, or when the user may not be interested in the short video on the current short video display interface, the user may perform an operation of pulling the short video to request for updating the short video displayed on the current interface. The operation of pulling the short video is, for example, the user sliding the short video display interface, or the user pulling the slide-down frame, etc. For example, referring to fig. 5, fig. 5 is a schematic view of a short video display interface, where a user can slide the short video display interface, that is, the user performs an operation of sliding the short video.

After receiving the user's operation of pulling the short video, the client 120 may generate a short video pull request in combination with the identification information of the user, and send the short video pull request to the server 130. After receiving the short video pull request, the server 130 obtains the log of the user according to the identification information of the user in the short video pull request, and obtains the first short video set previously operated by the user according to the log. Similarly, the server 130 may obtain, according to the log of the user, the operation behavior of the user in the set time period, for example, the short videos clicked by the user in the set time period, the short videos shared by the user in the set time period, and the like.

S420, obtaining a user characteristic vector of the user, wherein the user characteristic vector comprises a real-time interest vector and a user attribute vector.

Specifically, after the server 130 acquires the first short video set, the server 130 may perform embedding learning on the first short video set to obtain an embedded vector of each short video, and obtain a real-time interest vector of the user based on the embedded vector of each short video and an operation behavior of the user in a set time period. The server 130 may perform embedded learning and the like on the user basic information to obtain a user attribute vector, which may be relatively fixed, so that the user attribute vector may be learned in advance and directly obtained during use, so that the server 130 obtains a user feature vector.

Some continuous features such as user age and sparse features such as short video tags, short video categories, etc. may be involved in the user feature vector. Continuous features can be directly input or subjected to embedded learning; for sparse features, the sparse features can be used as input after embedded learning.
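As an illustration of the two feature paths described above, the following is a minimal sketch in which a continuous feature (age) is used directly after scaling and a sparse feature (short video tags) is mapped through an embedding table. The names `TAG_VOCAB`, `tag_table` and `encode_features`, the scaling by 100, the embedding dimension and the mean pooling are all assumptions made for illustration and do not come from the embodiment; the random table only stands in for embeddings that would be learned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tag vocabulary and embedding table (stand-in for learned embeddings).
TAG_VOCAB = {"sports": 0, "music": 1, "food": 2}
EMBED_DIM = 4
tag_table = rng.normal(size=(len(TAG_VOCAB), EMBED_DIM))


def encode_features(age, tags):
    """Build one input vector from a continuous and a sparse feature."""
    # Continuous feature: used directly (here only rescaled, an assumption).
    age_feat = np.array([age / 100.0])
    # Sparse feature: look up embeddings of known tags and mean-pool them.
    ids = [TAG_VOCAB[t] for t in tags if t in TAG_VOCAB]
    tag_feat = tag_table[ids].mean(axis=0) if ids else np.zeros(EMBED_DIM)
    return np.concatenate([age_feat, tag_feat])


x = encode_features(30, ["sports", "music"])
```

Mean pooling is only one way to combine several tag embeddings; the embodiment does not specify how multiple sparse values are merged.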

And S430, obtaining short video feature vectors of each short video to be recommended in a second short video set consisting of the short videos to be recommended.

Specifically, the server 130 may obtain the second short video set from the database 140 and perform embedding learning on the second short video set to obtain the short video feature vector of each short video in the second short video set. As an example, the short video feature vectors may be updated periodically rather than computed in real time. For example, the updates are calculated every hour.

It should be noted that the execution sequence of S430 and S420 may be arbitrary, and is not specifically limited herein.

S440, determining the matching degree of the user characteristic vector and each short video characteristic vector in the second short video set through the trained short video recommendation model.

Specifically, after obtaining the user feature vectors and the short video feature vectors, the server 130 inputs the user feature vectors and the short video feature vectors into the trained short video recommendation model, and the short video feature vectors may be input into the short video recommendation model in a matrix form. And determining the matching degree of the user and each short video in the second short video set through the short video recommendation model so as to obtain a plurality of matching degrees.

And S450, recommending short videos, the matching degree of which meets the preset conditions, in the second short video set.

Specifically, after the server 130 obtains the multiple matching degrees, the server 130 may determine, from the multiple matching degrees, the matching degrees that satisfy a preset condition, and determine the short videos corresponding to those matching degrees as the recommended videos. The preset condition is, for example, that the matching degree is greater than or equal to a preset matching degree.
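The selection in S450 can be sketched as follows, assuming the preset condition is a threshold on the matching degree. The function name `select_recommendations`, the dictionary input, the default threshold of 0.5 and the optional `top_k` cap are hypothetical and are not specified in the embodiment.

```python
def select_recommendations(match_scores, threshold=0.5, top_k=None):
    """Return video ids whose matching degree is >= threshold, best first.

    match_scores: dict mapping a short video id to its matching degree.
    top_k: optional cap on the number of returned ids (an assumption).
    """
    kept = [(vid, s) for vid, s in match_scores.items() if s >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    if top_k is not None:
        kept = kept[:top_k]
    return [vid for vid, _ in kept]


recommended = select_recommendations({"a": 0.9, "b": 0.4, "c": 0.6})
```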

In the embodiment of the application, when the recommended short video is determined, the real-time interest vector and the short video feature vector of the user are considered, and the real-time interest vector of the user can represent the preference of the user on the short video clicked before, so that the matching degree of the recommended short video determined based on the real-time interest vector and the user is higher, and the accuracy of short video recommendation is further improved. By the short video recommendation method related to the embodiment of the application, the CTR corresponding to the server 130 is improved by about 3%.

After the general idea of the embodiments of the present application is introduced, the steps in the embodiments of the present application will be described in detail below.

In S420, one way to determine the user feature vector is:

referring to fig. 6, the method includes S610, obtaining an embedded vector of each short video in the first short video set by performing embedding learning on the first short video set.

The embedding learning can refer to the content discussed above and is not described in detail here. It should be noted that, because the embedded vector of each short video in the first short video set is obtained by taking the user's operation behavior data on each video into consideration, the learned embedded vector represents, to some extent, the user's preference degree for each video.

S620, according to the sequence of the short videos in the first short video set operated by the user, weighting the similarity between the embedded vector of each short video in the first short video set and the short video interest vector of the previous short video of the short video and the playing completion degree of the short video to obtain the short video interest vector of the short video until the short video interest vector of the last short video in the first short video set is obtained.

Specifically, according to the sequence of each short video operated by a user, the similarity between the embedded vector of each short video and the short video interest vector of the previous short video of the short video and the playing completion degree of the short video are determined and weighted, so that the video interest vector of each video in the multiple videos is obtained. The video interest vector may represent a user's preference for each short video.

As an example, the similarity $S_{n+1}$ between the embedded vector of each short video and the short video interest vector of the previous short video is determined, for example, as:

$$S_{n+1} = \cos(V_n, I_{n+1}) \tag{1}$$

where $I_{n+1}$ represents the embedded vector of the (n+1)th short video clicked by the user, and $V_n$ represents the real-time short video interest vector corresponding to the user's nth click-to-play.

After the similarity between the embedded vector of each short video and the interest vector of the short video of the previous short video of the short video is obtained, the real-time interest vector can be obtained according to the similarity and the playing completion degree of the user for the short video. The specific calculation formula is, for example:

$$W_{n+1} = r_{n+1} \cdot (1-\alpha) \cdot (1-\beta \cdot S_{n+1}) \tag{2}$$

$$V_{n+1} = V_n \cdot (1-W_{n+1}) + W_{n+1} \cdot I_{n+1} \tag{3}$$

where $\alpha$ is a hyperparameter used to adjust the weight of $I_{n+1}$; $\beta$ is a hyperparameter used to adjust the weight of the similarity; $I_n$ represents the embedded vector of the nth short video clicked by the user; $r_{n+1}$ represents the playing completion degree of the (n+1)th short video clicked by the user; and $W_{n+1}$ represents the weight of the embedded vector of the (n+1)th short video. Appropriate values of $\alpha$ and $\beta$ can be selected by searching.
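The recurrence in formulas (1) to (3) can be sketched as below. This is only an illustrative reading of the formulas: the initialisation of the first interest vector to the first embedded vector, the default values of `alpha` and `beta`, and the function names are assumptions that the embodiment does not state.

```python
import numpy as np


def cosine(a, b, eps=1e-12):
    """Cosine similarity; returns 0.0 if either vector is (near) zero."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > eps else 0.0


def update_interest(v_prev, i_next, r_next, alpha=0.5, beta=0.5):
    """One step of formulas (1)-(3): returns (V_{n+1}, W_{n+1})."""
    s = cosine(v_prev, i_next)                      # formula (1)
    w = r_next * (1.0 - alpha) * (1.0 - beta * s)   # formula (2)
    v_next = v_prev * (1.0 - w) + w * i_next        # formula (3)
    return v_next, w


def interest_trajectory(embeddings, completions, alpha=0.5, beta=0.5):
    """Short video interest vectors V_1..V_N in the order of operation."""
    # Assumption: V_1 is initialised to the first embedded vector I_1.
    v = np.asarray(embeddings[0], dtype=float)
    trajectory = [v]
    for emb, r in zip(embeddings[1:], completions[1:]):
        v, _ = update_interest(v, np.asarray(emb, dtype=float), r, alpha, beta)
        trajectory.append(v)
    return trajectory


v1, w1 = update_interest(np.array([1.0, 0.0]), np.array([0.0, 1.0]), r_next=1.0)
```

Under this reading, a short video that is dissimilar to the previous interest vector and is played to completion moves the interest vector the most, while a video with zero playing completion leaves it unchanged.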

S630, weighting the short video interest vector of each short video in the first short video set to obtain the real-time interest vector of the user.

Specifically, after each short video is played by the user, the short video has a corresponding short video interest vector, and the short video interest vectors of each short video in the first short video set are weighted to obtain the real-time interest vector of the user.

In the embodiment of the application, when the real-time interest vector of the user is determined, the relevance between the short video and the previous real-time interest vector and the playing completion degree of the short video are considered, the short video interest vector of the user to the short video can be described more accurately, the user interest can be described more accurately by the real-time interest vector of the user obtained through weighting, and the real-time interest vector of the user can be adjusted when the user interest changes rapidly.

In one possible embodiment, in S630, when weighting the short video interest vectors of the short videos in the first short video set, the closer each short video is to the current time, the greater the weight of the short video interest vector of the short video is.

Specifically, the closer a short video clicked by the user is to the current time, the better it represents the current preference of the user. Therefore, in the process of determining the real-time interest vector of the user, a greater weight is set for the short video interest vector of a short video that is closer to the current time, so that the determined real-time interest vector of the user better reflects the current interest of the user, which further improves the accuracy of the short videos subsequently determined for recommendation.

For example, referring to fig. 7, the user clicks short video 1, short video 2, short video 3, and short video 4 in sequence from the time the user logs in to the short video playing application this time to the current time. The short video interest vectors of the four short videos are V1, V2, V3 and V4 in sequence, and their weights are w1, w2, w3 and w4 in sequence. When the real-time interest vector of the user is calculated, because short video 4 is closest to the current time, the weight w4 of the short video interest vector corresponding to short video 4 can be set to the largest value.
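The recency weighting can be sketched as follows. The embodiment only requires that a more recent short video receives a larger weight; the exponential decay, the `decay` value and the normalisation of the weights to sum to 1 used here are assumptions chosen for illustration.

```python
import numpy as np


def realtime_interest(interest_vectors, decay=0.8):
    """Weighted sum of short video interest vectors, oldest first.

    Returns (real_time_interest_vector, weights). The newest vector gets
    the largest weight; weights follow decay**age and sum to 1 (assumed).
    """
    vs = np.asarray(interest_vectors, dtype=float)   # shape (n, d)
    n = vs.shape[0]
    ages = np.arange(n - 1, -1, -1)                  # newest item has age 0
    weights = decay ** ages
    weights = weights / weights.sum()
    return weights @ vs, weights


# Four interest vectors V1..V4 as in the example of fig. 7 (toy values).
V = [[1.0, 0.0], [0.8, 0.2], [0.3, 0.7], [0.0, 1.0]]
user_vec, w = realtime_interest(V)
```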

In a possible embodiment, the first short video set consists of the short videos operated by the user from the time the user logs in to the short video playing application this time to the current time.

Specifically, using the short videos operated between the user's current login to the short video playing application and the current time as the first short video set not only reduces the calculation amount of the server 130, but also makes the resulting real-time interest vector of the user better match the current interest of the user.

In one possible embodiment, referring to fig. 8, the short video recommendation model used in S440 may be obtained by training as follows:

and S810, acquiring a short video training sample set.

Specifically, the short video training sample set may refer to the content discussed above, and will not be described herein again. The server 130 may obtain the exposed short videos corresponding to the user according to the short videos recommended for the user. After the exposed short videos are obtained, embedding learning is performed on them to obtain the real-time interest vector of the user for the exposed short video set. Whether the user clicked each short video is determined according to the log of the user, which gives the value of the user's click label for each short video in the exposed short video set. For example, if the user clicked the short video, the value of the user's click label for the short video is 1; if the user did not click the short video, the value of the click label is 0. The values of the user's interaction labels for each short video in the exposed short video set are likewise determined according to the log of the user, and the values of the interaction labels are weighted to determine the preference weight of the user for the exposed short video.

For example, the format of the short video training sample corresponding to each exposed short video is: label, weight, feature1, feature2, … feature N, label representing a click tag of the user for the exposed short video, weight representing a preference weight of the user for the exposed short video, feature1, feature2, … feature N representing other relevant features between the user and the exposed short video.
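A possible in-memory and textual representation of this sample format is sketched below. The comma-separated encoding and the names `TrainingSample`, `to_line` and `parse_line` are assumptions for illustration; the embodiment only specifies the order of the fields (label, weight, then features).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrainingSample:
    label: int             # click label: 1 if the exposed short video was clicked, else 0
    weight: float          # preference weight of the user for the exposed short video
    features: List[float]  # feature1 ... featureN

    def to_line(self) -> str:
        """Serialise as 'label,weight,feature1,...,featureN' (assumed encoding)."""
        feats = ",".join(f"{x:g}" for x in self.features)
        return f"{self.label},{self.weight:g},{feats}"


def parse_line(line: str) -> TrainingSample:
    """Inverse of TrainingSample.to_line."""
    parts = line.strip().split(",")
    return TrainingSample(int(parts[0]), float(parts[1]),
                          [float(p) for p in parts[2:]])


sample = TrainingSample(label=1, weight=2.5, features=[0.1, 0.2])
line = sample.to_line()
```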

And S820, adjusting parameters of the short video recommendation model.

Specifically, the server 130 inputs the short video training sample set into a short video recommendation model, such as a DSSN model. The short video recommendation model outputs a training result, and a value of the preset loss function is determined based on the training result. The parameters of the short video recommendation model are then adjusted according to the value of the preset loss function.

And S830, until the preset loss function is converged, obtaining the trained short video recommendation model.

Specifically, when the preset loss function is converged, it is determined that the model training is completed, and parameters corresponding to the convergence of the preset loss function are determined as parameters of the short video recommendation model, so that the trained short video recommendation model is obtained.

In a possible embodiment, the preset loss function is obtained by weighting a cross entropy loss function and a regularization term, and a weight corresponding to the cross entropy loss function is a preference weight of a user for each of the exposed short videos.

Specifically, a formula of the preset loss function is as follows:

where $y$ represents the sample label, $y'$ represents the label predicted by the short video recommendation model, $W_{label}$ represents the user's preference weight for each exposed short video, the remaining term is the L2 regularization term, and $\theta$ represents the parameters of the short video recommendation model. The sample label includes, for example, a click label indicating whether the user clicked the short video.

In the embodiment of the application, the preference weight of the exposed short video is added to the loss function, so that the loss function can be corrected according to $W_{label}$, which helps avoid overfitting of the short video recommendation model.
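One form of loss that is consistent with the description above (a cross entropy term weighted by the preference weight, plus an L2 regularization term) can be sketched as follows. The exact formula of the embodiment is not reproduced here; the regularization coefficient `lam`, the averaging over samples and the clipping constant `eps` are assumptions.

```python
import numpy as np


def weighted_loss(y, y_pred, w_label, theta, lam=1e-4, eps=1e-7):
    """Preference-weighted binary cross entropy plus L2 regularization.

    y: sample labels (0 or 1); y_pred: predicted labels in (0, 1);
    w_label: preference weight per sample; theta: flat model parameters.
    """
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    w = np.asarray(w_label, dtype=float)
    ce = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))   # per-sample cross entropy
    l2 = lam * np.sum(np.square(theta))                   # L2 regularization term
    return float(np.mean(w * ce) + l2)


loss = weighted_loss([1], [0.5], [2.0], theta=np.zeros(3))
```

With this form, a sample for which the user showed a stronger preference (larger weight) contributes more to the loss when it is mispredicted.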

In one possible embodiment, the server 130 may update the parameters of the short video recommendation model according to a preset time period.

Specifically, when the server 130 uses the short video recommendation model, if the short video recommendation model is updated in real time, the processing amount of the server 130 is large, so in order to relatively reduce the processing amount of the server 130, the server 130 may update the parameters of the short video recommendation model at intervals of a preset time period. The preset time period may be preset by the server 130, such as a day or an hour.

In one possible embodiment, when minimizing the preset loss function, a mini-batch gradient descent (MBGD) method may be used to adjust the parameters of the short video recommendation model.
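Mini-batch gradient descent can be illustrated with the sketch below. To keep the example short, a single-layer logistic model is used as a stand-in for the short video recommendation model, and the loss is assumed to be a preference-weighted cross entropy with an L2 term; the learning rate, batch size, number of epochs and the function name `minibatch_gd` are likewise assumptions.

```python
import numpy as np


def minibatch_gd(X, y, w_label, lr=0.1, lam=1e-4, batch_size=32, epochs=20, seed=0):
    """Fit a logistic model (stand-in model) with mini-batch gradient descent."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        order = rng.permutation(len(X))              # reshuffle samples every epoch
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            p = 1.0 / (1.0 + np.exp(-X[idx] @ theta))   # predicted click probability
            # Gradient of the weighted cross entropy, averaged over the batch,
            # plus the gradient of lam * ||theta||^2.
            grad = X[idx].T @ (w_label[idx] * (p - y[idx])) / len(idx)
            grad += 2.0 * lam * theta
            theta -= lr * grad
    return theta
```

Each update uses only one batch of samples, so the cost per step stays bounded even when the training sample set is large.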

In a possible embodiment, the preference weight of the user for each short video may be obtained by weighting values of multiple types of interaction tags.

Specifically, when a user watches a short video, there are various ways of interacting with it, such as liking, commenting, sharing, and the like. An interaction indicates that the user is more interested in the short video, so weighting the values of the various types of interaction tags expresses the user's preference weight for each short video more accurately.

For example, one formula for determining a preference weight is:

$$W_{label} = m \cdot \log(t_{play}) + n \cdot (b_{like} + b_{share} + b_{comment}) \tag{5}$$

where $t_{play}$ represents the user's playing duration, $b_{like}$ represents the value of the like tag, $b_{share}$ represents the value of the share tag, $b_{comment}$ represents the value of the comment tag, and $m$ and $n$ represent hyperparameters. The values of $m$ and $n$ can be obtained by searching.

A short video may be played merely because the user clicked it, and the user does not necessarily watch it while it plays, whereas liking, commenting and sharing require the user to actually perform an operation. Therefore, in one possible embodiment, the value of m is smaller than n, so that the like tag, the comment tag and the share tag carry more weight, the obtained preference weight better matches the real preference of the user, and the accuracy of model training is further improved.

As an embodiment, the value of the like label, the value of the share label, and the value of the comment label all include 1 and 0, where 1 indicates that the user has an operation corresponding to the type of interactive label, and 0 indicates that the user does not have an operation corresponding to the type of interactive label.

As an embodiment, the value of each type of interactive tag may be determined according to the number of times the user performs the operation.

Specifically, if the user likes the video k times, the value of the like tag corresponding to the video is k. If the user comments on the video k times, the value of the comment tag corresponding to the video is k. If the user shares the video k times, the value of the share tag corresponding to the video is k. In the embodiment of the application, the value of the interaction tag measures the specific number of operations of the user, which quantifies the user's preference data for the video.
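Formula (5) can be evaluated with a sketch such as the following, which accepts either 0/1 tag values or operation counts. The natural logarithm, the clamping of the playing duration to at least 1 (so that the logarithm is defined and non-negative), the unit of `t_play`, and the default values of `m` and `n` are assumptions; the embodiment only indicates that m may be smaller than n.

```python
import math


def preference_weight(t_play, b_like, b_share, b_comment, m=0.2, n=1.0):
    """Preference weight W_label following formula (5).

    t_play: playing duration (unit assumed, e.g. seconds).
    b_like, b_share, b_comment: tag values, either 0/1 or operation counts.
    """
    # Assumption: clamp the duration so log() is defined and non-negative.
    play_term = m * math.log(max(t_play, 1.0))
    interact_term = n * (b_like + b_share + b_comment)
    return play_term + interact_term


w_binary = preference_weight(math.e, 1, 0, 1, m=1.0, n=2.0)
w_counts = preference_weight(math.e, 3, 0, 2, m=1.0, n=2.0)
```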

After the short video recommendation model is obtained, processing may be performed according to it. The process of determining the matching degree between the user and each short video in S440 is illustrated below.

Referring to fig. 9, the short video recommendation model in fig. 9 is a DSSN short video recommendation model based on MLP, and the server 130 inputs the user feature vector and the short video feature vector into the short video recommendation model after performing the embedding learning.

The user feature vector comprises the real-time interest vector of the user, the user portrait, the short video embedding vector mean and the like, where the short video embedding vector mean is the mean of the embedded vectors of all short videos in the first short video set. The short video feature vector comprises the short video category, the short video tag, the short video click rate and the like.

After relu function processing in MLP, higher-order user characteristics and short video characteristics are obtained, and cosine similarity of the user characteristics and the short video characteristics is determined. And inputting the obtained cosine similarity into an activation function to obtain the matching degree of the user and the short video.

As an example, the MLP may employ a three-layer structure, with the layers including 512, 256, and 128 neurons respectively and each layer using relu as its activation function. After the user feature vector and the short video feature vector pass through the MLP computation, a 64-dimensional user vector and a 64-dimensional short video vector are obtained respectively.
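The two-tower computation described above can be sketched as below. Several points are assumptions made to obtain a runnable example: a final linear projection from 128 to 64 dimensions is added so that the outputs are 64-dimensional, the activation applied to the cosine similarity is taken to be a sigmoid, and the weights are randomly initialised rather than trained. The names `init_tower`, `tower_forward` and `matching_degree` are hypothetical.

```python
import numpy as np


def init_tower(in_dim, hidden=(512, 256, 128), out_dim=64, seed=0):
    """Randomly initialised MLP tower: relu hidden layers, linear output."""
    rng = np.random.default_rng(seed)
    dims = (in_dim, *hidden, out_dim)
    return [(rng.normal(scale=np.sqrt(2.0 / a), size=(a, b)), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]


def tower_forward(x, layers):
    """Forward pass; x may be one vector (d,) or a matrix (n, d)."""
    h = np.asarray(x, dtype=float)
    for k, (W, b) in enumerate(layers):
        h = h @ W + b
        if k < len(layers) - 1:
            h = np.maximum(h, 0.0)      # relu on the hidden layers
    return h


def matching_degree(user_vec, video_mat, eps=1e-12):
    """Sigmoid of the cosine similarity between one user and n videos."""
    u = user_vec / (np.linalg.norm(user_vec) + eps)
    v = video_mat / (np.linalg.norm(video_mat, axis=1, keepdims=True) + eps)
    cos = v @ u                          # shape (n,)
    return 1.0 / (1.0 + np.exp(-cos))    # sigmoid assumed as the activation


user_layers = init_tower(in_dim=10, seed=1)
video_layers = init_tower(in_dim=6, seed=2)
u = tower_forward(np.ones(10), user_layers)          # 64-dimensional user vector
V = tower_forward(np.ones((3, 6)), video_layers)     # three 64-dimensional video vectors
scores = matching_degree(u, V)
```

Because the short video vectors do not depend on the user, they can be computed in advance in matrix form and reused across requests, which matches the periodic update of short video feature vectors mentioned earlier.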

After obtaining the recommended video, the server 130 may send the video directly to the client 120, or may send the video to the client 120 after re-screening, etc.

Although the embodiment is described in detail by taking recommendation of a short video as an example, according to the disclosure, a person skilled in the art can use the design idea provided by the embodiment of the present application in recommendation of other multimedia contents, and for any multimedia content, a real-time interest vector of a user can be obtained according to a first multimedia content set operated by the user in a set time period before the current time, and each component in the real-time interest vector is used for representing the preference degree of the user on each multimedia content in the first multimedia content set, so as to obtain a user feature vector of the user, where the user feature vector includes the real-time interest vector and a user attribute vector; obtaining the multimedia content characteristic vector of each multimedia content to be recommended in a second multimedia content set consisting of the multimedia contents to be recommended; then, determining the matching degree of the user characteristic vector and the multimedia content characteristic vector of each multimedia content to be recommended through the trained multimedia content recommendation model; the multimedia content recommendation model is obtained by training according to a multimedia content training sample set; and recommending the multimedia contents with the matching degree meeting the preset condition in the second multimedia content set.

Based on the same inventive concept, an embodiment of the present application provides a multimedia content recommendation apparatus, please refer to fig. 10, where the apparatus 1000 includes:

an obtaining module 1001, configured to obtain a real-time interest vector of a user according to a first multimedia content set operated by the user within a set time period before a current time and an operation behavior for each first multimedia content, where each component in the real-time interest vector is used to represent a preference degree of the user for multimedia content within the set time period, and obtain a user feature vector of the user, where the user feature vector includes the real-time interest vector and a user attribute vector, and obtain a multimedia content feature vector of each to-be-recommended multimedia content in a second multimedia content set composed of to-be-recommended multimedia contents;

a determining module 1002, configured to determine, through a trained multimedia content recommendation model, a matching degree between a user feature vector and a multimedia content feature vector of each to-be-recommended multimedia content; the multimedia content recommendation model is obtained by training according to a multimedia content training sample set;

the recommending module 1003 is configured to recommend the multimedia content of which the matching degree meets a preset condition in the second multimedia content set.

In a possible embodiment, the obtaining module 1001 is specifically configured to:

embedding learning is carried out on the first multimedia content set to obtain an embedded vector of each multimedia content in the first multimedia content set;

according to the sequence in which the user operated each multimedia content in the first multimedia content set, weighting the similarity between the embedded vector of each multimedia content in the first multimedia content set and the multimedia content interest vector of the previous multimedia content, together with the playing completion degree of the multimedia content, to obtain the multimedia content interest vector of the multimedia content, until the multimedia content interest vector of the last multimedia content in the first multimedia content set is obtained;

and weighting the multimedia content interest vectors of the multimedia contents in the first multimedia content set to obtain the real-time interest vector of the user.

In one possible embodiment, the closer each multimedia content in the first set of multimedia content is to the current time, the greater the weight of the multimedia content interest vector for that multimedia content.

In a possible embodiment, the first multimedia content set operated within the set time period before the current time includes each multimedia content operated from the time the user logs in to the multimedia playing application this time to the current time.

In a possible embodiment, the apparatus further comprises a training module 1004, the training module 1004 is specifically configured to:

acquiring a multimedia content training sample set; each multimedia content training sample in the multimedia content training sample set comprises a sample user characteristic vector and each multimedia content characteristic vector in the exposure multimedia content set, wherein the sample user characteristic vector comprises a real-time interest vector of a user aiming at the exposure multimedia content set, a value of a click label of each multimedia content characteristic in the exposure multimedia content set, and a preference weight of the user aiming at each exposure multimedia content in the exposure multimedia content set;

and training the multimedia content recommendation model according to the multimedia content training sample set until the preset loss function is converged, and obtaining the trained multimedia content recommendation model.

In a possible embodiment, the preset loss function is obtained by weighting a cross entropy loss function and a regularization term, and a weight corresponding to the cross entropy loss function is a preference weight of a user for each exposure multimedia content in the exposure multimedia content set.

In one possible implementation, the preference weight of the user for each multimedia content is obtained by weighting values of multiple types of interaction tags for the multimedia content.

As one example, the training module 1004 is an optional module.

Based on the same inventive concept, an embodiment of the present application provides a computer device 1100, please refer to fig. 11, where the computer device 1100 includes a processor 1101 and a memory 1102.

A memory 1102 for storing computer programs for execution by the processor 1101. The memory 1102 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to use of the computer device, and the like.

The processor 1101 may be a Central Processing Unit (CPU), a digital processing unit, or the like. The specific connection medium between the memory 1102 and the processor 1101 is not limited in the embodiment of the present application. In the embodiment of the present application, the memory 1102 and the processor 1101 are connected by a bus 1103 in fig. 11, the bus 1103 is shown by a thick line in fig. 11, and the connection manner between other components is merely illustrative and not limiting. The bus 1103 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 11, but this does not mean that there is only one bus or one type of bus.

The memory 1102 may be a volatile memory, such as a random-access memory (RAM); the memory 1102 may also be a non-volatile memory, such as, but not limited to, a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or the memory 1102 may be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 1102 may also be a combination of the memories described above.

The processor 1101 is configured to execute, when calling the computer program stored in the memory 1102, the methods performed by the devices in the embodiments shown in fig. 4 to 9.

Based on the same inventive concept, embodiments of the present application provide a computer-readable storage medium storing computer instructions that, when executed on a computer, cause the computer to perform a short video recommendation method as discussed above with reference to fig. 4-9.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. A method for recommending multimedia contents, comprising:

2. The method of claim 1, wherein obtaining the real-time interest vector of the user based on a first set of multimedia content operated by the user within a set time period before a current time comprises:

3. The method of claim 2, wherein the closer the time at which a multimedia content in the first set of multimedia content was operated is to the current time, the greater the weight of the multimedia content interest vector for that multimedia content.

4. The method of claim 2, wherein the first set of multimedia content operated within the set time period before the current time includes each multimedia content operated by the user from the time the user most recently logged in to the multimedia playing application up to the current time.

5. The method of claim 1, wherein the trained multimedia content recommendation model is trained by:

6. The method of claim 5, wherein the preset loss function is obtained by weighting a cross-entropy loss function and a regularization term, the weight corresponding to the cross-entropy loss function being the user's preference weight for each exposed multimedia content in the set of exposed multimedia content.

7. The method of claim 5, wherein the preference weight of the user for each multimedia content is obtained by weighting values of a plurality of types of interactive labels for the multimedia content.

8. The method of any one of claims 1 to 7, wherein the multimedia content comprises a short video.

9. A multimedia content recommendation apparatus, comprising:

an obtaining module, configured to: obtain a real-time interest vector of a user according to a first multimedia content set operated by the user in a set time period before the current time and an operation behavior for each first multimedia content, wherein each component of the real-time interest vector expresses the user's degree of preference for the multimedia content in the set time period; obtain a user feature vector of the user, wherein the user feature vector comprises the real-time interest vector and a user attribute vector; and obtain a multimedia content feature vector of each multimedia content to be recommended in a second multimedia content set consisting of the multimedia contents to be recommended;

10. A computer-readable storage medium having stored thereon computer instructions which, when executed on a computer, cause the computer to perform the method of any one of claims 1-8.