CN111680217B - Content recommendation method, device, equipment and storage medium - Google Patents

Content recommendation method, device, equipment and storage medium

Info

Publication number
CN111680217B
CN111680217B (application CN202010460183.0A)
Authority
CN
China
Prior art keywords
content
information
user
vector
sequence
Prior art date
Legal status
Active
Application number
CN202010460183.0A
Other languages
Chinese (zh)
Other versions
CN111680217A (en)
Inventor
张晗
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010460183.0A
Publication of CN111680217A
Application granted
Publication of CN111680217B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G06F 16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F 16/9562 Bookmark management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a content recommendation method, device, equipment and storage medium, and relates to the technical field of artificial intelligence machine learning. The method comprises the following steps: acquiring behavior history information and user basic information of a user account, where the behavior history information indicates the sequence of content historically viewed by the user together with the labels and classifications of that content, and the user basic information indicates the user's personalized features; generating a user vector according to the behavior history information and the user basic information; calculating the similarity between the user vector and the content vectors of candidate contents; and selecting, from the candidate contents based on the similarity, the recommended content provided to the user account. The method and the device can improve the accuracy of the recommended content provided to the user.

Description

Content recommendation method, device, equipment and storage medium
Technical Field
The embodiments of the application relate to the technical field of machine learning within artificial intelligence, and in particular to a content recommendation method, device, equipment and storage medium.
Background
A good content recommendation scheme can provide a user with content of interest, thereby improving the click-through rate.
In the related art, a content recommendation method based on word vectors of content is provided. ID (Identity) information of the content historically viewed by the user is acquired, word vectors corresponding to the ID information are calculated, and similar content is then computed offline according to the word vectors. Online recommendation is then performed based on the similar content obtained offline.
However, the recommended content that the above content recommendation method provides to the user has low accuracy.
Disclosure of Invention
The embodiments of the application provide a content recommendation method, device, equipment and storage medium, which can improve the accuracy of the recommended content provided to a user. The technical scheme is as follows:
in one aspect, an embodiment of the present application provides a content recommendation method, where the method includes:
acquiring behavior history information and user basic information of a user account, where the behavior history information indicates a sequence of content historically viewed by the user together with the labels and classifications of that content, and the user basic information indicates the user's personalized features;
generating a user vector according to the behavior history information and the user basic information;
calculating the similarity between the user vector and the content vectors of candidate contents;
and selecting, from the candidate contents based on the similarity, recommended content to be provided to the user account.
In another aspect, an embodiment of the present application provides a method for training a content recommendation model, where the method includes:
acquiring a user behavior log, which is a log generated based on the user's historical content viewing behavior;
generating a training sample based on the user behavior log, where the training sample includes behavior history information, user basic information, positive examples and negative examples; the behavior history information indicates the sequence of content historically viewed by the user together with the labels and classifications of that content, and the user basic information indicates the user's personalized features;
training a content recommendation model by using the training sample; the input data of the content recommendation model includes the behavior history information and the user basic information.
In another aspect, an embodiment of the present application provides a content recommendation apparatus, where the apparatus includes:
the information acquisition module is used for acquiring behavior history information and user basic information of a user account, where the behavior history information indicates the sequence of content historically viewed by the user together with the labels and classifications of that content, and the user basic information indicates the user's personalized features;
the vector generation module is used for generating a user vector according to the behavior history information and the user basic information;
the similarity calculation module is used for calculating the similarity between the user vector and the content vector of the candidate content;
and the content recommending module is used for selecting recommended content provided for the user account from the candidate content based on the similarity.
In another aspect, an embodiment of the present application provides a device for training a content recommendation model, where the device includes:
the log acquisition module is used for acquiring a user behavior log, which is generated based on the user's historical content viewing behavior;
the sample generation module is used for generating a training sample based on the user behavior log, where the training sample includes behavior history information, user basic information, positive examples and negative examples; the behavior history information indicates the sequence of content historically viewed by the user together with the labels and classifications of that content, and the user basic information indicates the user's personalized features;
the model training module is used for training a content recommendation model by adopting the training samples; wherein the input data of the content recommendation model includes the behavior history information and the user basic information.
In still another aspect, an embodiment of the present application provides a computer device, where the computer device includes a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or a set of instructions, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by the processor to implement the content recommendation method or implement a training method for the content recommendation model.
In yet another aspect, an embodiment of the present application provides a computer-readable storage medium, where at least one instruction, at least one program, a code set, or a set of instructions is stored in the storage medium, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by a processor to implement the above content recommendation method or implement the above training method for a content recommendation model.
In a further aspect, the present application provides a computer program product, which when run on a computer device, causes the computer device to execute the content recommendation method or implement the content recommendation model training method.
The technical scheme provided by the embodiment of the application can bring the following beneficial effects:
generating a user vector by encoding the behavior history information and user basic information of a user account, and selecting the recommended content provided to the user account based on the similarity between the user vector and content vectors; because the behavior history information includes not only the sequence of content historically viewed by the user but also the labels and classifications of that content, the content sequence, the labels and classifications of the content, and the user basic information are used together to generate a user vector representing the user portrait; this user vector reflects the user's interest in content more accurately and completely, and the accuracy of content recommendation is thereby improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art based on these drawings without creative effort.
FIG. 1 is a schematic illustration of an environment for implementing an embodiment provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of an application scenario interface provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of the overall architecture of the solution provided by one embodiment of the present application;
FIG. 4 is a flowchart of a training method of a content recommendation model according to an embodiment of the present application;
FIG. 5 is a diagram of a content recommendation model provided by one embodiment of the present application;
FIG. 6 is a flow chart of a method for training a content recommendation model according to another embodiment of the present application;
FIG. 7 is a schematic diagram illustrating a comparison of a set of experimental data;
FIG. 8 is a flowchart of a content recommendation method provided by an embodiment of the present application;
FIG. 9 is a diagram of a content recommendation process provided by one embodiment of the present application;
fig. 10 is a block diagram of a content recommendation device provided in an embodiment of the present application;
FIG. 11 is a block diagram of a training apparatus for a content recommendation model according to an embodiment of the present application;
fig. 12 is a block diagram of a computer device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Refer to fig. 1, which illustrates a schematic diagram of an environment for implementing an embodiment of the present application. The embodiment implementation environment can be implemented as a content recommendation system. The embodiment implementation environment may include: a terminal 10 and a server 20.
The terminal 10 may be an electronic device such as a mobile phone, a tablet Computer, a multimedia playing device, a wearable device, a PC (Personal Computer), and the like. A client running an application that can provide recommended content to a user for viewing by the user may be installed in the terminal 10. In the embodiment of the present application, the type of the application is not limited, and may be a social application, an instant messaging application, a video application, a news information application, a music application, a shopping application, or the like.
The server 20 may be a server, a server cluster composed of a plurality of servers, or a cloud computing service center. Server 20 may be a backend server for the application described above to provide backend services for the application.
The terminal 10 and the server 20 can communicate with each other through a network.
In the content recommendation method and the training method of the content recommendation model provided in the embodiment of the present application, the execution subject of each step may be the server 20, or may be the terminal 10 (such as a client of an application program running in the terminal 10), or may be executed by the interactive cooperation between the terminal 10 and the server 20. For convenience of explanation, in the following method embodiments, only the execution subject of each step is described as a computer device, but the present invention is not limited thereto.
AI (Artificial Intelligence) is a theory, method, technique and application system that simulates, extends and expands human Intelligence, senses the environment, acquires knowledge and uses knowledge to obtain the best results using a digital computer or a machine controlled by a digital computer. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
ML (Machine Learning) is a multi-domain cross discipline, and relates to a plurality of disciplines such as probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and the like. The method specially studies how a computer simulates or realizes the learning behavior of human beings so as to acquire new knowledge or skills and reorganize the existing knowledge structure to continuously improve the performance of the computer. Machine learning is the core of artificial intelligence, is the fundamental approach to make computers have intelligence, and is applied in various fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching learning.
With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.
The scheme provided by the embodiments of the application relates to artificial-intelligence machine learning technology and provides a content recommendation method: a user vector is generated by encoding the behavior history information and user basic information of a user account, and the recommended content provided to the user account is selected based on the similarity between the user vector and content vectors. Because the behavior history information includes not only the sequence of content historically viewed by the user but also the labels and classifications of that content, the content sequence, the labels and classifications of the content, and the user basic information are used together to generate a user vector representing the user portrait; this user vector reflects the user's interest in content more accurately and completely, improving the accuracy of content recommendation.
In the embodiment of the present application, "content" refers to electronic information provided to a user, and can be viewed by the user. Optionally, the content includes, but is not limited to, at least one of: videos, articles, pictures, textual content, music, merchandise, applications, and the like.
In one exemplary scenario, as shown in fig. 2, a client of an application provides various items of recommended content 21 to a user in a user interface 20, and a user clicking on a certain item of recommended content 21 may enter a corresponding content detail interface 22 for viewing.
As shown in fig. 3, the technical solution provided by the embodiment of the present application mainly includes an offline part and an online part.
The off-line part mainly comprises three parts of user behavior collection, user portrait calculation and content recommendation model training. The user behavior collection is mainly used for collecting user behavior logs, and the user behavior logs are logs generated based on content viewing behaviors of user history; the user portrait calculation is mainly used for calculating the interests of the user in different dimensions such as labels and classifications according to user behaviors to obtain the user portrait; the content recommendation model training is mainly used for feature extraction and model training according to user behaviors and user figures.
The online part mainly comprises three parts: candidate recall, ranking and scoring, and diversity display. Candidate recall is mainly used to recall content according to the user behaviors and the user portrait; ranking and scoring is mainly used to extract features and calculate scores with the model trained offline; diversity display is mainly used to present the recommended content by combining a diversity model on the basis of the ranking scores.
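As an illustration of this online flow, the following Python sketch shows how candidate recall by vector similarity and a simple top-k selection could be combined; the function and variable names are assumptions for illustration, not the application's actual implementation.

    import numpy as np

    def recall_candidates(user_vector, content_vectors, content_ids, top_k=200):
        """Recall the top_k contents whose vectors are most similar to the user vector."""
        # Cosine similarity between the user vector and every candidate content vector.
        u = user_vector / np.linalg.norm(user_vector)
        c = content_vectors / np.linalg.norm(content_vectors, axis=1, keepdims=True)
        scores = c @ u
        order = np.argsort(-scores)[:top_k]
        return [(content_ids[i], float(scores[i])) for i in order]

    # Usage with 64-dimensional vectors, as in the embodiment described below.
    rng = np.random.default_rng(0)
    user_vec = rng.normal(size=64)
    pool_vecs = rng.normal(size=(10000, 64))
    pool_ids = [f"content_{i}" for i in range(10000)]
    recalled = recall_candidates(user_vec, pool_vecs, pool_ids, top_k=10)

The recalled candidates would then be ranked, scored and diversified as described above before being displayed.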
The technical solution of the present application will be described below by means of several embodiments.
Referring to fig. 4, a flowchart of a training method of a content recommendation model provided in an embodiment of the present application is shown, where the method may include the following steps (401 to 403):
Step 401, a user behavior log is obtained, where the user behavior log is a log generated based on the user's historical content viewing behavior.
For example, different logs may be generated for different user behaviors, and a user behavior log may be generated by combining these logs. For example, user behaviors may include viewing (e.g., clicking or playing), display, and the like. Viewing behavior may correspond to a viewing log (also called a click log) that records the content historically viewed by the user. Display behavior may correspond to a display log (also called an exposure log) that records the content historically displayed to the user. It should be noted that content displayed to the user is not necessarily viewed by the user; for example, if 10 contents are displayed to the user in one refresh request, the user may select the contents of interest to view while not viewing the other contents that are of no interest.
Optionally, log consolidation is performed at a certain period (e.g., daily). In the process of providing online service to users, user behaviors can be acquired in real time and then recorded and stored according to a certain rule (for example, in descending order of time) to form logs. In addition, when a user behavior is stored, the tag and classification of the corresponding content may be stored with it. In this way, the offline part can generate training data directly from the behavior history persisted online, avoiding the inaccuracy that would be caused by the offline part stitching the user's historical behaviors together from behavior logs.
Optionally, one viewing log contains the information related to one viewing behavior of the user, such as the identifier, tag and classification of the viewed content. One display log contains the information related to one refresh request, such as the identifier, tag and classification of each content displayed to the user by that refresh request. In addition, the display log and the viewing log corresponding to the same refresh request may contain the same serial number, which uniquely identifies the refresh request. In this way, the viewed and displayed content information related to one refresh request can be gathered together through the serial number.
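A minimal sketch of what such log records might look like and how the serial number ties a viewing log to its display log; all field names here are assumptions for illustration.

    # Hypothetical log records; "serial_no" uniquely identifies one refresh request.
    view_log = {"serial_no": "r-001", "user_id": "u-42", "content_id": "c-9",
                "tags": ["team_A"], "categories": ["sports", "football", "match"]}

    display_log = {"serial_no": "r-001", "user_id": "u-42", "gender": "f", "age": 28,
                   "displayed": [{"content_id": f"c-{i}", "tags": [], "categories": []}
                                 for i in range(10)]}

    def group_by_serial(view_logs, display_logs):
        """Gather viewed and displayed content information of the same refresh request."""
        grouped = {}
        for d in display_logs:
            grouped.setdefault(d["serial_no"], {"display": d, "views": []})
        for v in view_logs:
            if v["serial_no"] in grouped:
                grouped[v["serial_no"]]["views"].append(v)
        return grouped

    grouped = group_by_serial([view_log], [display_log])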
Step 402, generating a training sample based on the user behavior log, where the training sample includes behavior history information, user basic information, positive examples and negative examples; the behavior history information indicates the sequence of content historically viewed by the user together with the labels and classifications of that content, and the user basic information indicates the user's personalized features.
The sequence of content historically viewed by the user includes the identifiers of the respective contents the user has viewed. A maximum may be set for the number of contents included in the content sequence, such as 100. A content tag refers to keywords associated with the content, such as a company name, a person's name or a team name; tags may differ for content in different fields, which is not limited in the embodiments of the present application. The classification of content refers to its category, such as information, entertainment or sports; similarly, classifications may differ for different fields, which is not limited in the embodiments of the present application. It follows that the behavior history information may include content sequence information, content tag information and content classification information, where the content sequence information indicates the sequence of content historically viewed by the user, the content tag information indicates the tags of that content, and the content classification information indicates the classifications of that content.
The user basic information may include the user age, the user gender, the user area, the user occupation, and the like, which can reflect the user personalized features. In the embodiment of the present application, the basic information of the user mainly includes the age and the sex of the user, and in other possible embodiments, the basic information of the user may further include other information for reflecting the personalized features formed by the user for a long time, which is not limited in the embodiment of the present application.
Positive examples (or positive samples) refer to content viewed by the user, and negative examples (or negative samples) refer to content not viewed by the user. One training sample may include one or more positive examples and one or more negative examples. Considering that the content recommendation model trained in the embodiments of the present application is used to screen recommended content of interest to the user out of a large amount of content, the number of negative examples may be greater than the number of positive examples; for example, one training sample may include 1 positive example and 500 negative examples.
Optionally, the negative examples include at least one of: displayed-but-unviewed content, and content randomly selected from the content resource pool. Displayed-but-unviewed content is content that was shown to the user during the online stage but that the user did not click to view. The content randomly selected from the content resource pool is used to supplement the number of negative examples.
Optionally, the user behavior log is subjected to data processing by a distributed computing platform (such as a Hadoop computing platform) to obtain data meeting the format requirement of the training sample. Data processing may include two phases, map and reduce.
In the map stage, the user behavior logs, such as the viewing logs and the display logs, are taken as input. When a viewing (click) log is parsed, its serial number, user account and the identifier (ID) of the viewed content can be obtained, and the identifier of the viewed content and related information are output with the serial number as the key (primary key). When a display log is parsed, its serial number, user account, the identifier, tag and classification of each displayed content, and user basic information such as the user's gender and age can be obtained, and the identifiers of the displayed contents, the user account and the user basic information are output with the serial number as the key. In other words, the map stage outputs the corresponding information recorded in the user behavior logs keyed by the serial number.
In the reduce stage, the behavior history information (covering behaviors such as viewing and display) corresponding to one refresh request can be aggregated together. Optionally, the identifier of the viewed content corresponding to a refresh request together with its tag and classification, the identifier set of the displayed content, the user account, and user basic information such as user gender and user age are first obtained, and the behavior history information corresponding to each refresh request is arranged in descending order of time. If the behavior history information corresponding to a refresh request does not include any viewed content, it is discarded; if it does include viewed content, an initial training sample is generated. The initial training sample includes behavior history information, user basic information, positive examples and negative examples. If the behavior history information includes displayed-but-unviewed content, the negative examples of the training sample are initialized to include that content; otherwise, the negative examples of the training sample are initialized to be empty.
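A simplified sketch of the reduce-stage aggregation described above; the field names are assumptions for illustration, and an actual Hadoop job would differ in detail.

    def build_initial_sample(group):
        """Turn one refresh request's grouped logs into an initial training sample, or None."""
        viewed = [v["content_id"] for v in group["views"]]
        if not viewed:                      # no viewed content: discard this refresh request
            return None
        displayed = [c["content_id"] for c in group["display"]["displayed"]]
        unviewed = [cid for cid in displayed if cid not in viewed]
        return {
            "user_id": group["display"]["user_id"],
            "user_basic": {"gender": group["display"]["gender"], "age": group["display"]["age"]},
            "positives": viewed,            # content the user actually viewed
            "negatives": unviewed,          # displayed-but-unviewed content (may be empty)
        }

    # Example grouped record for one refresh request (hypothetical field names):
    group = {"display": {"user_id": "u-42", "gender": "f", "age": 28,
                         "displayed": [{"content_id": f"c-{i}"} for i in range(10)]},
             "views": [{"content_id": "c-3"}]}
    sample = build_initial_sample(group)    # positives = ["c-3"], negatives = the other displayed contents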
In the initial training samples generated through the above process, the negative examples include only displayed-but-unviewed content. This way of selecting negative examples suits a click-through-rate model, because a click-through-rate model solves the problem of selecting the most relevant content from a set of content the user is already interested in. The content recommendation model in the embodiments of the present application, however, is a recall model, which selects content the user may be interested in from the entire content resource pool. The content resource pool contains a large amount of completely irrelevant content, so if the negative examples include only displayed-but-unviewed content, this selection is not suitable for a recall model and would affect the final performance of the model.
Based on the above consideration, in the negative examples of each training sample, in addition to the displayed-but-unviewed content, the application may randomly select several contents from the content resource pool as additional negative examples. Thus, the negative examples in the finally determined training sample may include both displayed-but-unviewed content and content randomly selected from the content resource pool.
In addition, the number of negative examples can be determined experimentally. For example, 4 groups of experiments are set, with 100, 300, 500 and 1000 negative examples respectively. From the recall rate of the finally trained content recommendation model it can be concluded that 100 and 300 show a clear drop compared with 500, while 1000 brings only a slight improvement over 500; therefore, considering model performance and training time together, a training sample including 500 negative examples is the more appropriate choice.
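The following sketch illustrates how the negative examples of a sample might be padded with random contents from the resource pool up to a target count such as 500; the function and field names are assumptions for illustration.

    import random

    def pad_negatives(sample, content_pool, num_negatives=500, seed=0):
        """Supplement displayed-but-unviewed negatives with random picks from the content pool."""
        rng = random.Random(seed)
        negatives = list(sample["negatives"])
        exclude = set(negatives) | set(sample["positives"])
        candidates = [cid for cid in content_pool if cid not in exclude]
        need = max(0, num_negatives - len(negatives))
        negatives.extend(rng.sample(candidates, min(need, len(candidates))))
        sample["negatives"] = negatives[:num_negatives]
        return sample

    pool = [f"c-{i}" for i in range(100000)]
    padded = pad_negatives({"positives": ["c-3"], "negatives": ["c-1", "c-2"]}, pool, num_negatives=500)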
In addition, the training corpus may be selected from a recent period (e.g., the last 3 days), and the test corpus may be a set number of samples (e.g., 10,000 samples) from the recent period. Training samples are generated from the training corpus and test samples from the test corpus. The content recommendation model is trained with the training samples, and the trained model can be evaluated for recall rate with the test samples.
Step 403, training the content recommendation model by using the training samples; wherein, the input data of the content recommendation model comprises behavior history information and user basic information.
Optionally, the input data of the content recommendation model includes the word vectors corresponding to the behavior history information and the word vectors corresponding to the user basic information. These word vectors may be obtained by table lookup or by some other word-vector generation algorithm, which is not limited in the embodiments of the present application.
Optionally, after a training sample is obtained, the content sequence information, content tag information, content classification information, user basic information, positive examples and negative examples contained in the training sample are first parsed. Corresponding word vectors (embeddings) are then obtained for the content sequence information, the content tag information, the content classification information and the user basic information. Illustratively, the word vector of a content identifier (ID) is 64-dimensional, the word vector of a content tag is 48-dimensional, and the word vector of a content classification is 48-dimensional (the classification may include 3 levels, such as a first-level, second-level and third-level classification, each with a 16-dimensional word vector, the 3 levels being concatenated into 48 dimensions); the user basic information includes the user gender and the user age, whose word vectors are each 8-dimensional.
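A sketch of the embedding (word-vector) lookups with the dimensions given above; the table sizes and names are placeholders, not values from the application.

    import numpy as np

    rng = np.random.default_rng(0)
    emb = {
        "content_id": rng.normal(size=(10_000, 64)),  # 64-dim per content identifier
        "tag":        rng.normal(size=(5_000, 48)),   # 48-dim per tag
        "category":   rng.normal(size=(1_000, 16)),   # 16-dim per class level; 3 levels concatenated -> 48
        "gender":     rng.normal(size=(4, 8)),        # 8-dim
        "age":        rng.normal(size=(120, 8)),      # 8-dim (age bucketed, as an assumption)
    }

    def category_vector(level1, level2, level3):
        """Concatenate the three 16-dim class-level vectors into one 48-dim classification vector."""
        return np.concatenate([emb["category"][level1],
                               emb["category"][level2],
                               emb["category"][level3]])

    vec = category_vector(3, 17, 256)   # 48-dimensional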
In an exemplary embodiment, as shown in fig. 5, the content recommendation model 50 includes an encoder network 51, a convergence network 52, and a classifier 53. The encoder network 51 is configured to encode input data (including behavior history information and user basic information) of a model to generate a corresponding encoding vector. The encoder network 51 may encode the content sequence information, the content tag information, the content classification information, and the user basic information, respectively, to generate corresponding encoding vectors. The fusion network 52 is configured to perform fusion processing on each of the encoded vectors generated by the encoder network 51 to generate a user vector. The classifier 53 is configured to output a corresponding classification result based on the user vector.
In an exemplary embodiment, as shown in fig. 6, step 403 may include several sub-steps (4031-4036) as follows:
step 4031, generate the corresponding position coding information according to the content sequence information; the content sequence information is used for indicating the content sequence viewed by the user in history, and the position coding information is used for indicating the positions of the contents contained in the content sequence and the relative distance between the contents.
In the embodiment of the application, in order to increase the distinguishability of contents at different positions, additional positional encoding information is added for the content sequence historically viewed by the user. The dimension of the word vector corresponding to the positional encoding information can be the same as the dimension of the word vector corresponding to the content sequence information, and it characterizes the positions of the contents in the historically viewed content sequence and the relative distances between them.
Illustratively, a computation method based on a trigonometric function in a Transformer structure is adopted to compute a word vector corresponding to the position coding information:
PE(pos, 2i) = sin(pos / 10000^(2i / d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
where pos represents the position of the content in the content sequence, i indexes the values of the word vector corresponding to the content sequence information, and d_model represents the dimensionality of that word vector. It can be seen that the sine is used at even indices and the cosine at odd indices.
In the embodiment of the application, the position coding information corresponding to the content sequence information is generated to capture the time sequence information of each content in the content sequence viewed by the user history, so that more useful input data are provided for the model, and the model performance is improved.
In addition, assuming that the content sequence indicated by the content sequence information includes N contents (N is a positive integer), and the word vector corresponding to the identifier of each content is 64-dimensional, the word vector corresponding to the content sequence information includes N64-dimensional vectors, and a 64-dimensional bias (bias) may be added to the 64-dimensional word vector corresponding to the identifier of each content, and all the contents may share the same group of bias.
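A small numpy sketch of the trigonometric positional encoding defined above; it produces one 64-dimensional vector per position so that it can be added to the 64-dimensional content identifier word vectors (the function name is illustrative).

    import numpy as np

    def positional_encoding(seq_len, d_model=64):
        """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))."""
        pe = np.zeros((seq_len, d_model))
        pos = np.arange(seq_len)[:, None]
        idx = np.arange(0, d_model, 2)[None, :]          # the even dimension indices 2i
        angle = pos / np.power(10000.0, idx / d_model)
        pe[:, 0::2] = np.sin(angle)                      # sine at even indices
        pe[:, 1::2] = np.cos(angle)                      # cosine at odd indices
        return pe

    # e.g., for a history of N = 100 contents:
    pe = positional_encoding(100, 64)   # shape (100, 64), added to the content identifier vectors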
4032, respectively encoding the content sequence information, the content label information, the content classification information and the user basic information through an encoder network to generate corresponding encoding vectors; and the content sequence information, the content label information and the content classification information are fused with the position coding information and then are coded.
In the embodiment of the application, by encoding the content sequence information, the content tag information, the content classification information and the user basic information respectively, different encoding structures can be adopted for processing different information, for example, the information with higher importance can adopt an encoding structure with a relatively complex structure, and the information with lower importance can adopt an encoding structure with a relatively simple structure, so that the flexibility and the emphasis degree during encoding are improved. Of course, in some other embodiments, the content sequence information, the content tag information, the content classification information, and the user basic information may also be encoded in a merged manner, which is not limited in this embodiment of the present application.
Alternatively, as shown in fig. 5, the encoder network 51 includes a content sequence processing unit 511, a content tag processing unit 512, and a content classification processing unit 513. The content sequence processing unit 511 is configured to encode content sequence information, the content tag processing unit 512 is configured to encode content tag information, and the content classification processing unit 513 is configured to encode content classification information.
Alternatively, as shown in fig. 5, the content sequence processing unit 511 includes a multi-head attention layer, a first post-processing layer, a first feedforward neural network layer, and a second post-processing layer. The coding vector corresponding to the content sequence information can be obtained by the following steps:
1. processing the fusion vector of the content sequence information and the position coding information through a multi-head attention layer;
2. performing residual connection and layer normalization on the output vector of the multi-head attention layer through the first post-processing layer;
3. processing the output vector of the first post-processing layer through the first feed-forward neural network layer;
4. performing residual connection and layer normalization on the output vector of the first feed-forward neural network layer through the second post-processing layer to obtain the encoding vector corresponding to the content sequence information.
The multi-head self-attention layer is mainly composed of two parts: self-attention and multi-head. The application considers that the identifiers of the contents viewed by the user are related to each other, and this relation is captured through an attention mechanism. Specifically, in the self-attention mechanism, the word vector of each content identifier is first converted into 192 dimensions through a linear transformation with a matrix W; that is, for each content in the content sequence, a 64-dimensional query, key and value are obtained by linearly transforming the word vector of the content identifier, where the query is normalized according to the vector dimension (for example, divided by the square root of the dimension). When generating the output for the first position, the query of the first position (query_1) is combined with the keys of all positions (key_1, key_2, ..., key_n); naturally, the weight computed between the query of a position and its own key is the largest. Softmax is then used for normalization, giving the attention weights of the first position with respect to every position; these attention weights are used to compute a weighted sum of the values of all positions to obtain the final output. The other positions are handled in the same way. This is the self-attention mechanism, and the matrix operation is described as:
Attention(Q, K, V) = softmax(Q K^T / √d) V
where Q, K and V represent the query, key and value respectively, and d represents the dimensionality of the word vector.
The above calculation describes a single-head self-attention mechanism. The present application uses the multi-head attention mechanism of the Transformer: instead of initializing only one set of query, key and value matrices, multiple sets are initialized. Specifically, the application may adopt a 4-head mechanism: the word vector matrix corresponding to the identifiers in the input content sequence is split into 4 parts, and self-attention is computed on each part separately. That is, each 64-dimensional vector is split into four 16-dimensional vectors, the four parts are computed separately to obtain their respective 16-dimensional outputs, and finally the four parts are concatenated in order to obtain a 64-dimensional vector:
MultiHead(Q, K, V) = Concat(head_1, head_2, head_3, head_4);
head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V);
where head_i represents the result of the self-attention calculation of the i-th part, and i is a positive integer.
The choice of the number of heads for the multi-head self-attention mechanism can be determined experimentally. For example, in experiments with 1, 2, 3 and 4 heads respectively, 4 heads obtain a 0.5% improvement in offline recall rate compared with 1 head, so 4 heads are adopted in the present application as the final configuration. The multi-head self-attention mechanism enables each position to capture information from the whole sequence; any two positions have a direct path, which solves the long-distance dependence problem in samples with a long viewing history; features of different dimensions can be learned; and it is easy to compute in parallel, training faster than an RNN (Recurrent Neural Network).
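An illustrative numpy sketch of the 4-head self-attention computation described above, splitting the projected 64-dimensional query/key/value into four 16-dimensional heads; the randomly initialized W merely stands in for the learned projection matrix.

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def multi_head_self_attention(x, W, num_heads=4):
        """x: (N, 64) content-identifier vectors (plus positional encoding); W: (64, 192) projection to Q, K, V."""
        n, d = x.shape
        q, k, v = np.split(x @ W, 3, axis=-1)            # each (N, 64)
        head_dim = d // num_heads                        # 16
        outputs = []
        for h in range(num_heads):
            s = slice(h * head_dim, (h + 1) * head_dim)
            scores = softmax(q[:, s] @ k[:, s].T / np.sqrt(head_dim))  # attention weights per position
            outputs.append(scores @ v[:, s])             # weighted sum of values, (N, 16)
        return np.concatenate(outputs, axis=-1)          # concatenated back to (N, 64)

    rng = np.random.default_rng(0)
    x = rng.normal(size=(100, 64))
    W = rng.normal(size=(64, 192)) * 0.05
    out = multi_head_self_attention(x, W)                # (100, 64)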
The first post-processing layer performs residual connection and Layer Normalization (LN) on the output vectors of the multi-head attention layer. The residual connection can theoretically provide good gradient feedback, and using LN can accelerate the convergence of the model. LN is a way of normalizing the data that computes the mean and variance within each sample:
LN(x_i) = α (x_i - μ_L) / √(σ_L² + ε) + β
where α and β are learnable parameters of LN, corresponding to the scale factor and the bias respectively; x_i represents the i-th element in the feature map, with i a positive integer; μ_L denotes the mean, σ_L² denotes the variance, and ε is a small preset value.
The first feed-forward neural network layer may include several DNN (Deep Neural Network) layers. For example, the first feed-forward neural network layer may include 2 DNN layers, the first with a dimension of 256 and the second with a dimension of 64, consistent with the dimension of the content identifier word vectors. Specifically, the structure of the first feed-forward neural network layer may be y = max(0, x W_1 + b_1) W_2 + b_2, which includes two linear transformations and a ReLU (Rectified Linear Unit) activation, i.e., a DNN-ReLU-DNN structure, where W_1, W_2, b_1 and b_2 are the parameters of the linear transformations.
The second post-processing layer performs residual connection and layer normalization on the output vector of the first feed-forward neural network layer to obtain the encoding vector corresponding to the content sequence information.
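A numpy sketch of the feed-forward sub-layer and the residual-plus-layer-normalization post-processing described above, using the 64 -> 256 -> 64 dimensions of the content sequence branch (function names are illustrative).

    import numpy as np

    def layer_norm(x, alpha, beta, eps=1e-6):
        """LN(x) = alpha * (x - mean) / sqrt(var + eps) + beta, computed per sample."""
        mu = x.mean(axis=-1, keepdims=True)
        var = x.var(axis=-1, keepdims=True)
        return alpha * (x - mu) / np.sqrt(var + eps) + beta

    def feed_forward(x, W1, b1, W2, b2):
        """y = max(0, x W1 + b1) W2 + b2 : DNN -> ReLU -> DNN."""
        return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

    def post_process(sublayer_out, sublayer_in, alpha, beta):
        """Residual connection followed by layer normalization."""
        return layer_norm(sublayer_in + sublayer_out, alpha, beta)

    rng = np.random.default_rng(0)
    x = rng.normal(size=(100, 64))
    W1, b1 = rng.normal(size=(64, 256)) * 0.05, np.zeros(256)
    W2, b2 = rng.normal(size=(256, 64)) * 0.05, np.zeros(64)
    alpha, beta = np.ones(64), np.zeros(64)
    y = post_process(feed_forward(x, W1, b1, W2, b2), x, alpha, beta)   # (100, 64)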
In the embodiment of the application, the fusion vector of the content sequence information and the positional encoding information passes through several (for example, 6) stacked multi-head attention layers and feed-forward neural network layers to obtain the encoding vector corresponding to the content sequence information; stacking the multi-head attention and feed-forward layers increases the precision with which the user's behavior history is characterized.
If only the content sequence information is used as the feature, a long-tail problem arises: the word vectors of many low-frequency content identifiers cannot be learned adequately, which hurts the effect. To address this, content tag information and content classification information are added in addition to the content sequence information. To reduce the time consumed by training and prediction, a feed-forward neural network layer is used when processing the content tag information and the content classification information, which greatly reduces the amount of computation compared with a multi-head attention layer.
Optionally, as shown in fig. 5, the content tag processing unit 512 includes a second feedforward neural network layer and a third post-processing layer. The encoding vector corresponding to the content tag information can be obtained through the following steps:
1. processing the fusion vector of the content label information and the position coding information through a second feedforward neural network layer;
2. and performing residual connection and layer normalization processing on the output vector of the second feedforward neural network layer through a third post-processing layer to obtain a coding vector corresponding to the content tag information.
Still assume that the content sequence includes N contents, each of which may correspond to one or more tags. The word vector of each tag is obtained by table lookup; when a content corresponds to multiple tags, the word vectors of its tags are averaged so that the dimension is fixed at 48. This yields N 48-dimensional word vectors; a 48-dimensional bias is added to the 48-dimensional word vector of each content tag, and all tags may share the same set of biases. Secondly, in order to increase the distinguishability of content tags at different positions, additional positional encoding information can also be added to the content tags; its calculation may adopt the trigonometric-function-based method of the Transformer structure described above, which is not repeated here. Then, the tag word vectors of the N contents are averaged (i.e., reduced in dimension) to obtain a 48-dimensional historical tag word vector. This historical tag word vector then passes through several (such as 3) feed-forward neural network layers, with residual connection and layer normalization at the input and output of each sub-layer. In addition, the second feed-forward neural network layer may include several DNN layers. For example, it may include 2 DNN layers, the first with a dimension of 192 and the second with a dimension of 48, consistent with the dimension of the content tag word vectors. Specifically, the structure of the second feed-forward neural network layer may be y = max(0, x W_1 + b_1) W_2 + b_2, which includes two linear transformations and a ReLU activation, i.e., a DNN-ReLU-DNN structure, where W_1, W_2, b_1 and b_2 are the parameters of the linear transformations.
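A compact sketch of this tag branch: the tag vectors of each content are averaged to 48 dimensions, position information is added, the N per-content vectors are averaged, and the result passes through the feed-forward layers. The names and the trivial stand-in for the feed-forward stack are assumptions for illustration.

    import numpy as np

    def encode_tags(per_content_tag_vectors, pos_enc, ffn):
        """per_content_tag_vectors: list of (num_tags_i, 48) arrays, one per content in the sequence."""
        # Average a content's tag vectors into one 48-dim vector, then add positional encoding.
        per_content = np.stack([m.mean(axis=0) for m in per_content_tag_vectors])  # (N, 48)
        per_content = per_content + pos_enc[: len(per_content)]                    # (N, 48)
        history_tag_vec = per_content.mean(axis=0)                                 # (48,)
        return ffn(history_tag_vec)                                                # e.g., 48 -> 192 -> 48

    # Usage with dummy data and an identity stand-in for the 3-layer feed-forward stack:
    rng = np.random.default_rng(0)
    tags = [rng.normal(size=(rng.integers(1, 4), 48)) for _ in range(100)]
    pe = rng.normal(size=(100, 48))                  # positional encoding, see the earlier sketch
    tag_encoding = encode_tags(tags, pe, ffn=lambda v: v)   # (48,)

The content classification branch described next works the same way, with three concatenated 16-dimensional class-level vectors in place of the averaged tag vectors.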
Alternatively, as shown in fig. 5, the content classification processing unit 513 includes a third feedforward neural network layer and a fourth post-processing layer. The encoding vector corresponding to the content classification information can be obtained by the following steps:
1. processing the fusion vector of the content classification information and the position coding information through a third feedforward neural network layer;
2. and performing residual connection and layer normalization processing on the output vector of the third feedforward neural network layer through a fourth post-processing layer to obtain a coding vector corresponding to the content classification information.
Still assume that the content sequence includes N contents, each of which may correspond to one or more levels of classification, and the word vector of each level is obtained by table lookup. For example, each content corresponds to a first-level, a second-level and a third-level classification; a table lookup obtains the word vector of each level, each 16-dimensional, and the word vectors of the 3 levels are concatenated into a 48-dimensional word vector. Since there are N contents, N 48-dimensional word vectors are obtained; a 48-dimensional bias is added to the 48-dimensional word vector of each content classification, and all classifications may share the same set of biases. Secondly, in order to increase the distinguishability of content classifications at different positions, additional positional encoding information can also be added to the content classifications; its calculation may adopt the trigonometric-function-based method of the Transformer structure described above, which is not repeated here. Then, the classification word vectors of the N contents are averaged (i.e., reduced in dimension) to obtain a 48-dimensional historical classification word vector. This historical classification word vector passes through several (such as 3) feed-forward neural network layers, with residual connection and layer normalization at the input and output of each sub-layer. In addition, the third feed-forward neural network layer may include several DNN layers. For example, it may include 2 DNN layers, the first with a dimension of 192 and the second with a dimension of 48, consistent with the dimension of the content classification word vectors. Specifically, the structure of the third feed-forward neural network layer may be y = max(0, x W_1 + b_1) W_2 + b_2, which includes two linear transformations and a ReLU activation, i.e., a DNN-ReLU-DNN structure, where W_1, W_2, b_1 and b_2 are the parameters of the linear transformations.
4033, fusing the coding vectors through a fusion network to generate user vectors.
The fusion network may adopt a DNN structure, and is configured to fuse coding vectors corresponding to the content sequence information, the content tag information, the content classification information, and the user basic information, respectively, together to generate a user vector. The user vector is capable of characterizing a user representation, i.e., a quantized representation of content of interest to the user.
The convergence processing procedure of the convergence network can be represented by the following formula:
y = x_id W_id + x_tag W_tag + x_channel W_channel + x_age W_age + x_sex W_sex + b;
where x_id, x_tag, x_channel, x_age and x_sex respectively represent the encoding vector corresponding to the content sequence information, the encoding vector corresponding to the content tag information, the encoding vector corresponding to the content classification information, the encoding vector corresponding to the user age and the encoding vector corresponding to the user gender; W_id, W_tag, W_channel, W_age and W_sex respectively represent the weight matrices applied to these encoding vectors; and b represents the bias. Illustratively, x_id has a dimension of 64, x_tag and x_channel each have a dimension of 48, and x_age and x_sex each have a dimension of 8; W_id is a 64 x 64 matrix, W_tag a 48 x 64 matrix, W_channel a 48 x 64 matrix, W_age an 8 x 64 matrix and W_sex an 8 x 64 matrix, so a 64-dimensional vector is finally obtained. Optionally, in order to accelerate the convergence of the model, a layer normalization step may be added after the fusion network.
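An illustrative numpy sketch of this fusion step with the stated dimensions; the random weights below merely stand in for the learned parameters.

    import numpy as np

    def fuse(x_id, x_tag, x_channel, x_age, x_sex, params):
        """y = x_id W_id + x_tag W_tag + x_channel W_channel + x_age W_age + x_sex W_sex + b."""
        y = (x_id @ params["W_id"] + x_tag @ params["W_tag"] + x_channel @ params["W_channel"]
             + x_age @ params["W_age"] + x_sex @ params["W_sex"] + params["b"])
        return y                                           # 64-dimensional user vector

    rng = np.random.default_rng(0)
    params = {"W_id": rng.normal(size=(64, 64)), "W_tag": rng.normal(size=(48, 64)),
              "W_channel": rng.normal(size=(48, 64)), "W_age": rng.normal(size=(8, 64)),
              "W_sex": rng.normal(size=(8, 64)), "b": np.zeros(64)}
    user_vector = fuse(rng.normal(size=64), rng.normal(size=48), rng.normal(size=48),
                       rng.normal(size=8), rng.normal(size=8), params)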
4034, outputting a corresponding classification result based on the user vector through the classifier.
The number of classes of the classifier may be determined according to the number of positive and negative examples contained in the training sample, for example, the number of classes may be the sum of the number of positive examples and the number of negative examples. For example, if the number of positive examples included in the training sample is 1 and the number of negative examples is 500, the number of classes of the classifier is 501. The classifier may be a SoftMax classifier or other types of classifiers, which is not limited in this embodiment.
Step 4035, based on the classification result and the positive and negative examples, a loss function value corresponding to the content recommendation model is calculated.
Step 4036, parameters of the content recommendation model are adjusted based on the loss function values.
And then, adjusting and optimizing parameters of the content recommendation model by calculating a loss function value corresponding to the content recommendation model, and stopping training when the content recommendation model meets a training stopping condition to obtain a trained content recommendation model.
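The classification loss can be illustrated with a simplified sketch in which, as an assumption for illustration, each class logit is the dot product between the user vector and a content vector (one positive example and 500 negative examples); the actual TensorFlow implementation may differ.

    import numpy as np

    def recommendation_loss(user_vector, positive_vec, negative_vecs):
        """Softmax cross-entropy where class 0 is the positive content and the rest are negatives."""
        logits = np.concatenate([[user_vector @ positive_vec], negative_vecs @ user_vector])  # (1 + 500,)
        logits -= logits.max()                                   # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum())
        return -log_probs[0]                                     # negative log-likelihood of the positive

    rng = np.random.default_rng(0)
    loss = recommendation_loss(rng.normal(size=64), rng.normal(size=64), rng.normal(size=(500, 64)))

The model parameters are then adjusted to reduce this loss, for example with the Adam optimizer mentioned below.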
In addition, referring to fig. 7, for the structure of the content recommendation model the present application successively tested a model structure using only a plain neural network (denoted model structure 1) and a model structure introducing the multi-head attention mechanism together with the tag and classification information of the content and the user basic information (denoted model structure 2). The curve of the offline recall rate of model structure 1 is denoted by 71 in fig. 7, and that of model structure 2 by 72. The test results show that model structure 1 converges more slowly than model structure 2 and its recall rate is significantly lower: at convergence, model structure 1 reaches a recall rate of about 74.5% while model structure 2 reaches about 77.3%, i.e., model structure 2 is about 2.8% higher, a notable improvement.
In addition, the network structure of the content recommendation model can be implemented with the open-source framework TensorFlow, and the Adam method can be adopted to optimize the model parameters. Meanwhile, to increase the robustness of the model, a certain amount of dropout can be applied to the encoding vectors corresponding to the behavior history information and the user basic information during training. The model may also be retrained at a certain period (e.g., daily) and the online model updated accordingly.
In summary, in the technical solution provided by the embodiments of the present application, a user vector is generated by encoding the behavior history information and the user basic information of a user account, and a content recommendation model is trained based on the user vector. Because the behavior history information includes the content sequence viewed by the user as well as the labels and classifications of that content, the user vector representing the user portrait is generated from the viewed content sequence, the labels and classifications of the content, and the user basic information together, so it reflects the user's interest in content more accurately and completely, which improves the accuracy of content recommendation.
In addition, in the embodiments of the present application, an attention mechanism is introduced into the process of generating the user vector, so that the association relationships among the contents viewed by the user can be captured, and the model can learn these relationships to achieve better performance.
In addition, in the embodiments of the present application, the position coding information corresponding to the content sequence information is generated to capture the time-sequence information of each content in the viewed content sequence, which provides more useful input data for the model and improves its performance.
The above embodiments describe the offline part; the online part recommends content according to the user vector. One optional method is to use a GPU (Graphics Processing Unit) server to recall similar content offline according to the user's behavior history information in the latest statistical period (for example, the previous day). However, this method has an obvious problem: the update period of the behavior history information is long (e.g., once a day), so the behavior history information is static within a period, while the user's real-time behavior keeps changing, which causes an inconsistency between offline and online.
To address this problem, the present application moves the similar-content computation logic online. Specifically, the overall processing flow of the online part of the present application is shown in fig. 7 and 8. The online part is described below by way of embodiments.
Referring to fig. 8, a flowchart of a content recommendation method provided in an embodiment of the present application is shown, where the method may include the following steps (801 to 804):
step 801, acquiring behavior history information of a user account and user basic information, wherein the behavior history information is used for indicating a content sequence viewed by a user history and a label and a classification of the content, and the user basic information is used for indicating a user personalized feature.
The online part can acquire the corresponding behavior history information based on the user's behavior history in a recent period of time. For example, the behavior history information may include several (e.g., 100) contents that the user has viewed most recently. For explanations of the behavior history information and the user basic information, reference is made to the description above, which is not repeated here.
Step 802, a user vector is generated according to the behavior history information and the user basic information.
The user vector characterizes the user portrait, i.e., it is a quantized representation of the content the user is interested in.
In an exemplary embodiment, in order to make the contents at different positions more distinguishable, additional position encoding (positional encoding) information is added to the content sequence viewed by the user history when the user vector is generated. The dimension of the word vector corresponding to the position encoding information may be the same as that of the word vector corresponding to the content sequence information, and the position encoding depicts the positions of the contents contained in the viewed content sequence and the relative distances between them. Optionally, the corresponding position coding information is generated according to the content sequence information included in the behavior history information, and the user vector is then generated according to the behavior history information, the user basic information and the position coding information. For the generation of the position coding information, reference is made to the description above, which is not repeated here.
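The text does not fix a particular position-encoding formula; the sketch below assumes the common sinusoidal encoding, which yields one position vector per item in the viewed-content sequence with the same dimensionality as the content word vectors.

```python
import numpy as np

def positional_encoding(seq_len, dim):
    """Sinusoidal position codes: one dim-dimensional vector per position.

    Assumed formulation; the requirement is only that the codes capture the
    position of each content in the sequence and their relative distances."""
    pos = np.arange(seq_len)[:, None]                    # (seq_len, 1)
    i = np.arange(dim)[None, :]                          # (1, dim)
    angles = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    codes = np.zeros((seq_len, dim))
    codes[:, 0::2] = np.sin(angles[:, 0::2])             # even dimensions: sine
    codes[:, 1::2] = np.cos(angles[:, 1::2])             # odd dimensions: cosine
    return codes

# e.g. 100 recently viewed contents, 64-d word vectors (hypothetical sizes)
position_codes = positional_encoding(seq_len=100, dim=64)
```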
In an exemplary embodiment, the user vector is generated by a content recommendation model. As shown in fig. 9, the content recommendation model 50 includes an encoder network 51 and a fusion network 52. Optionally, the content sequence information, the content tag information, the content classification information and the user basic information are respectively encoded through the encoder network to generate corresponding encoding vectors, where the content sequence information, the content tag information and the content classification information are each fused with the position encoding information before being encoded; the encoding vectors are then fused through the fusion network to generate the user vector.
Alternatively, as shown in fig. 9, the encoder network 51 includes a content sequence processing unit 511, a content tag processing unit 512, and a content classification processing unit 513. The content sequence processing unit 511 is configured to encode content sequence information, the content tag processing unit 512 is configured to encode content tag information, and the content classification processing unit 513 is configured to encode content classification information.
Optionally, as shown in fig. 9, the content sequence processing unit 511 includes a multi-head attention layer, a first post-processing layer, a first feed-forward neural network layer and a second post-processing layer. The coding vector corresponding to the content sequence information can be obtained through the following steps (a code sketch follows these steps):
1. processing the fusion vector of the content sequence information and the position coding information through a multi-head attention layer;
2. performing residual error connection and layer normalization processing on output vectors of the multi-head attention layer through a first post-processing layer;
3. processing the output vector of the first post-processing layer through a first feedforward neural network layer;
4. and performing residual connection and layer normalization processing on the output vector of the first feedforward neural network layer through a second post-processing layer to obtain a coding vector corresponding to the content sequence information.
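A hedged Keras sketch of these four steps; the number of attention heads and the feed-forward width are assumptions, since the text does not specify them.

```python
import tensorflow as tf

class ContentSequenceUnit(tf.keras.layers.Layer):
    """Multi-head attention plus two residual/layer-normalization stages."""

    def __init__(self, dim=64, num_heads=4, ffn_dim=256):
        super().__init__()
        self.attention = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=dim // num_heads)
        self.post1 = tf.keras.layers.LayerNormalization()   # first post-processing layer
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(ffn_dim, activation="relu"),
            tf.keras.layers.Dense(dim),
        ])                                                   # first feed-forward layer
        self.post2 = tf.keras.layers.LayerNormalization()   # second post-processing layer

    def call(self, fused_sequence):
        # 1. multi-head attention over the (content sequence + position code) fusion
        attended = self.attention(fused_sequence, fused_sequence)
        # 2. residual connection + layer normalization
        x = self.post1(fused_sequence + attended)
        # 3. feed-forward processing
        transformed = self.ffn(x)
        # 4. residual connection + layer normalization -> sequence coding vector
        return self.post2(x + transformed)
```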
Optionally, as shown in fig. 9, the content tag processing unit 512 includes a second feedforward neural network layer and a third post-processing layer. The encoding vector corresponding to the content tag information can be obtained by the following steps:
1. processing the fusion vector of the content label information and the position coding information through a second feedforward neural network layer;
2. and performing residual connection and layer normalization processing on the output vector of the second feedforward neural network layer through a third post-processing layer to obtain a coding vector corresponding to the content tag information.
Optionally, as shown in fig. 9, the content classification processing unit 513 includes a third feed-forward neural network layer and a fourth post-processing layer. The encoding vector corresponding to the content classification information can be obtained through the following steps (a shared sketch for the tag and classification units follows these steps):
1. processing the fusion vector of the content classification information and the position coding information through a third feedforward neural network layer;
2. and performing residual connection and layer normalization processing on the output vector of the third feedforward neural network layer through a fourth post-processing layer to obtain a coding vector corresponding to the content classification information.
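The tag unit and the classification unit share the same simpler shape: a feed-forward layer followed by a residual connection and layer normalization. A single hedged sketch, with assumed sizes and assuming input and output dimensions match so the residual connection applies, can therefore serve for either.

```python
import tensorflow as tf

class FeedForwardUnit(tf.keras.layers.Layer):
    """Feed-forward layer + residual connection + layer normalization,
    usable for both the content tag and the content classification information."""

    def __init__(self, dim=64, hidden_dim=256):
        super().__init__()
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(hidden_dim, activation="relu"),
            tf.keras.layers.Dense(dim),
        ])
        self.post = tf.keras.layers.LayerNormalization()

    def call(self, fused_vector):
        # process the (tag/classification + position code) fusion vector
        transformed = self.ffn(fused_vector)
        # residual connection + layer normalization -> coding vector
        return self.post(fused_vector + transformed)
```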
For the processing procedure of each part of the content recommendation model, refer to the description in the training embodiment above, and are not described herein again.
Optionally, all operators of the attention neural network are implemented online in C++ based on the Eigen library, mainly including the DNN layer, the Multi-Head Self-Attention layer, the Layer Norm layer and the Embedding mapping layer. Eigen supports all matrix operations on matrices of both fixed and arbitrary size and efficiently supports linear algebra, matrix and vector operations, so implementing the computation operators on the Eigen library can greatly reduce the time consumed by these operations.
In step 803, the similarity between the user vector and the content vector of the candidate content is calculated.
The content vector of a candidate content may be obtained by first generating corresponding word vectors based on information such as the tag and classification of the candidate content, and then generating the content vector from these word vectors. The similarity between the user vector and the content vector of the candidate content may be calculated with a cosine similarity algorithm or with another similarity algorithm, which is not limited in the embodiments of the present application.
In addition, considering that a content resource pool generally contains a large number of candidate contents, it is impractical to have all of them participate in the similarity calculation with the user vector. To reduce the number of contents involved in the similarity calculation, one possible method is to cluster the contents, first find the clusters similar to the user vector, and then compute similar contents within those clusters. This approach achieves good results, but may miss part of the similar contents because of limited clustering accuracy.
An exemplary embodiment of the present application therefore performs the similarity calculation as follows: the view-volume index of each content in the content resource pool is obtained, and candidate contents whose view-volume index meets a condition are selected from the content resource pool. The view-volume index represents how much a content has been viewed, such as the play count of a video or the click/read count of an article. For example, a target number of contents with the largest view-volume index are selected from the content resource pool as candidate contents, and similarity is then calculated between the content vectors of the selected candidate contents and the user vector. In this way, the amount of similarity computation is reduced while contents with high attention are, as far as possible, not omitted.
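As an illustrative sketch (the target number and the similarity measure are examples only), candidate filtering by view-volume index followed by cosine similarity against the user vector could look like this:

```python
import numpy as np

def select_candidates(contents, view_counts, target_num=10000):
    """Keep the target_num contents with the largest view-volume index."""
    top = np.argsort(np.asarray(view_counts))[::-1][:target_num]
    return [contents[i] for i in top]

def cosine_similarities(user_vector, content_vectors):
    """Cosine similarity between the user vector and each candidate content vector."""
    u = user_vector / (np.linalg.norm(user_vector) + 1e-12)
    c = content_vectors / (np.linalg.norm(content_vectors, axis=1, keepdims=True) + 1e-12)
    return c @ u                      # one similarity score per candidate
```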
Step 804, recommended content provided to the user account is selected from the candidate contents based on the similarity.
After the similarity between the user vector and the content vectors of the candidate contents is calculated, several target candidate contents whose similarity is greater than a threshold and/or whose similarity is the highest are selected; these target candidate contents are then scored and ranked, and a set number of the highest-scoring ones are selected as the recommended content provided to the user.
In an exemplary embodiment, target candidate contents whose similarity meets a condition are selected from the candidate contents, and the recommended content provided to the user account is selected from the target candidate contents according to their labels and classifications, where the labels and classifications of the recommended content conform to a set proportion rule. By flexibly setting the proportion rule, the contents of each label or classification can be recommended to the user evenly, or certain labels or classifications can be recommended with priority and in concentration, which makes the selection of recommended content more flexible and controllable.
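One way the set proportion rule could be realized, purely as an assumption-laden sketch, is to allocate recommendation slots per label according to the rule and fill each quota with that label's highest-similarity target candidates:

```python
from collections import defaultdict

def select_by_proportion(target_candidates, ratio_rule, total_slots=20):
    """target_candidates: list of (content, label, similarity) tuples.
    ratio_rule: mapping label -> fraction of recommendation slots."""
    grouped = defaultdict(list)
    for content, label, similarity in target_candidates:
        grouped[label].append((similarity, content))
    recommended = []
    for label, fraction in ratio_rule.items():
        quota = int(round(total_slots * fraction))        # slots reserved for this label
        ranked = sorted(grouped.get(label, []), key=lambda t: t[0], reverse=True)
        recommended.extend(content for _, content in ranked[:quota])
    return recommended
```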
In summary, in the technical solution provided by the embodiments of the present application, a user vector is generated by encoding the behavior history information and the user basic information of a user account, and the recommended content provided to the user account is selected based on the similarity between the user vector and the content vectors. Because the behavior history information includes the content sequence viewed by the user as well as the labels and classifications of that content, the user vector representing the user portrait is generated from the viewed content sequence, the labels and classifications of the content, and the user basic information together, so it reflects the user's interest in content more accurately and completely, which improves the accuracy of content recommendation.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Referring to fig. 10, a block diagram of a content recommendation device according to an embodiment of the present application is shown. The device has the function of implementing the content recommendation method example, and the function can be implemented by hardware or by hardware executing corresponding software. The device can be a computer device and can also be arranged in the computer device. The apparatus 1000 may include: an information acquisition module 1010, a vector generation module 1020, a similarity calculation module 1030, and a content recommendation module 1040.
The information obtaining module 1010 is configured to obtain behavior history information of the user account and user basic information, where the behavior history information is used to indicate a content sequence viewed by the user history and a tag and a category of the content, and the user basic information is used to indicate a user personalized feature.
A vector generating module 1020, configured to generate a user vector according to the behavior history information and the user basic information.
A similarity calculation module 1030, configured to calculate a similarity between the user vector and a content vector of the candidate content.
And a content recommending module 1040, configured to select recommended content provided to the user account from the candidate content based on the similarity.
In an exemplary embodiment, the vector generation module 1020 is configured to:
generating corresponding position coding information according to content sequence information contained in the behavior history information; wherein the content sequence information is used for indicating a content sequence viewed by the user in history, and the position coding information is used for indicating the position of the content contained in the content sequence and the relative distance between the contents;
and generating the user vector according to the behavior history information, the user basic information and the position coding information.
In an exemplary embodiment, the user vector is generated by a content recommendation model comprising an encoder network and a convergence network; the behavior history information comprises content sequence information, content label information and content classification information; the vector generation module 1020 is configured to:
respectively encoding the content sequence information, the content label information, the content classification information and the user basic information through the encoder network to generate corresponding encoding vectors; the content sequence information, the content label information and the content classification information are fused with the position coding information and then are coded;
and performing fusion processing on the coding vector through the fusion network to generate the user vector.
In an exemplary embodiment, the encoder network comprises a content sequence processing unit for encoding the content sequence information; the content sequence processing unit comprises a multi-head attention layer, a first post-processing layer, a first feedforward neural network layer and a second post-processing layer; the vector generation module 1020 is configured to:
processing the fusion vector of the content sequence information and the position coding information through the multi-head attention layer;
performing residual error connection and layer normalization processing on the output vectors of the multi-head attention layer through the first post-processing layer;
processing an output vector of the first post-processing layer by the first feedforward neural network layer;
and performing residual connection and layer normalization processing on the output vector of the first feedforward neural network layer through the second post-processing layer to obtain a coding vector corresponding to the content sequence information.
In an exemplary embodiment, the encoder network comprises a content tag processing unit for encoding the content tag information; wherein the content tag processing unit comprises a second feed-forward neural network layer and a third post-processing layer; the vector generation module 1020 is configured to:
processing the fusion vector of the content tag information and the position coding information through the second feedforward neural network layer;
and performing residual connection and layer normalization processing on the output vector of the second feedforward neural network layer through the third post-processing layer to obtain a coding vector corresponding to the content tag information.
In an exemplary embodiment, the encoder network comprises a content classification processing unit for encoding the content classification information; the content classification processing unit comprises a third feed-forward neural network layer and a fourth post-processing layer; the vector generation module 1020 is configured to:
processing, by the third feed-forward neural network layer, the fusion vector of the content classification information and the position coding information;
and performing residual connection and layer normalization processing on the output vector of the third feedforward neural network layer through the fourth post-processing layer to obtain a coding vector corresponding to the content classification information.
In an exemplary embodiment, the apparatus 1000 further comprises: a content screening module to:
obtaining the view quantity index of each content in the content resource pool;
and selecting the candidate content with the viewing volume index meeting the condition from the content resource pool.
In an exemplary embodiment, the content recommendation module 1040 is configured to:
selecting target candidate contents with similarity meeting conditions from the candidate contents;
selecting recommended content provided for the user account from the target candidate content according to the label and the classification of the target candidate content;
and the labels and the classifications of the recommended contents conform to a set proportion rule.
In summary, in the technical solution provided by the embodiments of the present application, a user vector is generated by encoding the behavior history information and the user basic information of a user account, and the recommended content provided to the user account is selected based on the similarity between the user vector and the content vectors. Because the behavior history information includes the content sequence viewed by the user as well as the labels and classifications of that content, the user vector representing the user portrait is generated from the viewed content sequence, the labels and classifications of the content, and the user basic information together, so it reflects the user's interest in content more accurately and completely, which improves the accuracy of content recommendation.
Referring to fig. 11, a block diagram of a training apparatus for a content recommendation model according to an embodiment of the present application is shown. The device has the function of realizing the training method example of the content recommendation model, and the function can be realized by hardware or by hardware executing corresponding software. The device can be a computer device, and can also be arranged in the computer device. The apparatus 1100 may include: a log acquisition module 1110, a sample generation module 1120, and a model training module 1130.
The log obtaining module 1110 is configured to obtain a user behavior log, where the user behavior log is a log generated based on content viewing behavior of a user history.
A sample generating module 1120, configured to generate a training sample based on the user behavior log, where the training sample includes behavior history information, user basic information, positive examples, and negative examples; the behavior history information is used for indicating the content sequence viewed by the user history and the label and the classification of the content, and the user basic information is used for indicating the personalized features of the user.
A model training module 1130, configured to train a content recommendation model using the training samples; wherein the input data of the content recommendation model includes the behavior history information and the user basic information.
In an exemplary embodiment, the content recommendation model includes an encoder network, a convergence network, and a classifier; the behavior history information comprises content sequence information, content label information and content classification information;
the model training module 1130, configured to:
generating corresponding position coding information according to the content sequence information; wherein the content sequence information is used for indicating a content sequence viewed by the user in history, and the position coding information is used for indicating the position of the content contained in the content sequence and the relative distance between the contents;
respectively encoding the content sequence information, the content label information, the content classification information and the user basic information through the encoder network to generate corresponding encoding vectors; the content sequence information, the content label information and the content classification information are fused with the position coding information and then are coded;
fusing the coding vectors through the fusion network to generate the user vectors;
outputting, by the classifier, a corresponding classification result based on the user vector;
calculating a loss function value corresponding to the content recommendation model based on the classification result and the positive examples and the negative examples;
adjusting parameters of the content recommendation model based on the loss function value.
In an exemplary embodiment, the negative examples include at least one of: and displaying the unviewed content and the content randomly selected from the content resource pool.
In summary, in the technical solution provided by the embodiments of the present application, a user vector is generated by encoding the behavior history information and the user basic information of a user account, and a content recommendation model is trained based on the user vector. Because the behavior history information includes the content sequence viewed by the user as well as the labels and classifications of that content, the user vector representing the user portrait is generated from the viewed content sequence, the labels and classifications of the content, and the user basic information together, so it reflects the user's interest in content more accurately and completely, which improves the accuracy of content recommendation.
It should be noted that, when the apparatus provided in the foregoing embodiment implements the functions thereof, the division of each functional module is merely used as an example, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the apparatus and method embodiments provided in the above embodiments belong to the same concept, and specific implementation processes thereof are described in detail in the method embodiments, which are not described herein again.
Referring to fig. 12, a schematic structural diagram of a computer device according to an embodiment of the present application is shown. Optionally, the computer device may be a terminal such as a mobile phone, a tablet computer, a multimedia playing device, a wearable device, a PC, or may be a server. Specifically, the method comprises the following steps:
the computer device 1200 includes a processor 1201, such as a CPU (Central Processing Unit) and/or a GPU (Graphics Processing Unit), a system Memory 1204 including a RAM (Random Access Memory) 1202 and a ROM (Read Only Memory) 1203, and a system bus 1205 connecting the system Memory 1204 and the Central Processing Unit 1201. The computer device 1200 also includes a basic I/O (Input/Output) system 1206 that facilitates transfer of information between various devices within the computer, and a mass storage device 1207 for storing an operating system 1213, application programs 1214, and other program modules 1215.
The basic input/output system 1206 includes a display 1208 for displaying information and an input device 1209, such as a mouse, keyboard, etc., for a user to input information. Wherein the display 1208 and input device 1209 are connected to the central processing unit 1201 through an input-output controller 1210 coupled to the system bus 1205. The basic input/output system 1206 may also include an input/output controller 1210 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, input-output controller 1210 also provides output to a display screen, a printer, or other type of output device.
The mass storage device 1207 is connected to the central processing unit 1201 through a mass storage controller (not shown) connected to the system bus 1205. The mass storage device 1207 and its associated computer-readable media provide non-volatile storage for the computer device 1200. That is, the mass storage device 1207 may include a computer-readable medium (not shown) such as a hard disk or CD-ROM (Compact disk Read-Only Memory) drive.
Without loss of generality, the computer readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM (Erasable Programmable Read Only Memory), flash Memory or other solid state Memory technology, CD-ROM or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that the computer storage media is not limited to the foregoing. The system memory 1204 and mass storage device 1207 described above may be collectively referred to as memory.
According to various embodiments of the present application, the computer device 1200 may also be connected, through a network such as the Internet, to a remote computer on the network and run there. That is, the computer device 1200 may connect to the network 1212 through the network interface unit 1211 coupled to the system bus 1205, or may connect to other types of networks or remote computer systems (not shown) using the network interface unit 1211.
The memory also includes at least one instruction, at least one program, set of codes, or set of instructions stored in the memory and configured to be executed by the one or more processors to implement the above-described content recommendation method, or to implement a training method of the above-described content recommendation model.
In an exemplary embodiment, a computer readable storage medium is also provided, having stored therein at least one instruction, at least one program, code set, or set of instructions which, when executed by a processor of a computer device, implements the above-described content recommendation method.
In an exemplary embodiment, a computer readable storage medium is also provided, in which at least one instruction, at least one program, a set of codes, or a set of instructions is stored, which when executed by a processor of a computer device, implements the method of training of the content recommendation model described above.
Optionally, the computer-readable storage medium may include: ROM, RAM, SSD (Solid State Drives), optical disks, etc. The Random Access Memory may include a ReRAM (resistive Random Access Memory) and a DRAM (Dynamic Random Access Memory).
In an exemplary embodiment, a computer program product is also provided, which, when executed by a processor of a computer device, is configured to implement the above-mentioned content recommendation method.
In an exemplary embodiment, a computer program product is also provided, which, when being executed by a processor of a computer device, is configured to implement the above-mentioned training method of the content recommendation model.
It should be noted that the information (including but not limited to the subject equipment information, subject personal information, etc.), data (including but not limited to data for analysis, stored data, displayed data, etc.) and signals referred to in this application are authorized by the subject or fully authorized by each party, and the collection, use and processing of the relevant data are in compliance with relevant laws and regulations and standards in relevant countries and regions. For example, the behavior history information, the user basic information, and the like referred to in the present application are acquired with sufficient authorization.
It should be understood that reference to "a plurality" herein means two or more. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. In addition, the step numbers described herein only show an exemplary possible execution sequence among the steps, and in some other embodiments, the steps may also be executed out of the numbering sequence, for example, two steps with different numbers are executed simultaneously, or two steps with different numbers are executed in a reverse order to the order shown in the drawings, which is not limited by the embodiments of the present application.
The above description is only exemplary of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (14)

1. A method for recommending content, the method comprising:
acquiring behavior history information and user basic information of a user account; the behavior history information comprises content sequence information, content label information and content classification information, the behavior history information is used for indicating content sequences viewed by a user history and labels and classifications of the content, the content sequence information is used for indicating the content sequences viewed by the user history, and the user basic information is used for indicating user personalized features;
generating corresponding position coding information according to the content sequence information; the position coding information is used for capturing the time sequence information of each content in the content sequence viewed by the user history, and the position coding information is used for indicating the position of the content contained in the content sequence and the relative distance between the contents;
respectively encoding the content sequence information, the content label information, the content classification information and the user basic information to generate corresponding encoding vectors; the content sequence information, the content label information and the content classification information are fused with the position coding information and then are coded;
performing fusion processing on the coding vectors to generate user vectors, wherein the user vectors are used for reflecting the interest tendency of users to the content;
calculating similarity between the user vector and a content vector of candidate content;
and selecting recommended content provided for the user account from the candidate content based on the similarity.
2. The method of claim 1, wherein the user vector is generated by a content recommendation model comprising an encoder network and a convergence network;
the encoder network is configured to encode the content sequence information, the content tag information, the content classification information, and the user basic information, respectively, and generate the encoding vector;
the fusion network is used for carrying out fusion processing on the coding vector to generate the user vector.
3. The method of claim 2, wherein the encoder network comprises a content sequence processing unit configured to encode the content sequence information; the content sequence processing unit comprises a multi-head attention layer, a first post-processing layer, a first feedforward neural network layer and a second post-processing layer; the method further comprises the following steps:
processing the fusion vector of the content sequence information and the position coding information through the multi-head attention layer;
performing residual error connection and layer normalization processing on the output vectors of the multi-head attention layer through the first post-processing layer;
processing, by the first feedforward neural network layer, an output vector of the first post-processing layer;
and performing residual connection and layer normalization processing on the output vector of the first feedforward neural network layer through the second post-processing layer to obtain a coding vector corresponding to the content sequence information.
4. The method of claim 2, wherein the encoder network comprises a content tag processing unit configured to encode the content tag information; wherein the content tag processing unit comprises a second feedforward neural network layer and a third post-processing layer; the method further comprises the following steps:
processing the fusion vector of the content tag information and the position coding information through the second feedforward neural network layer;
and performing residual connection and layer normalization processing on the output vector of the second feedforward neural network layer through the third post-processing layer to obtain a coding vector corresponding to the content tag information.
5. The method of claim 2, wherein the encoder network comprises a content classification processing unit configured to encode the content classification information; the content classification processing unit comprises a third feed-forward neural network layer and a fourth post-processing layer; the method further comprises the following steps:
processing the fused vector of the content classification information and the position coding information through the third feedforward neural network layer;
and performing residual connection and layer normalization processing on the output vector of the third feedforward neural network layer through the fourth post-processing layer to obtain a coding vector corresponding to the content classification information.
6. The method according to any one of claims 1 to 5, wherein before calculating the similarity between the user vector and the content vector of the candidate content, further comprising:
obtaining the view quantity index of each content in the content resource pool;
and selecting the candidate content with the viewing volume index meeting the condition from the content resource pool.
7. The method according to any one of claims 1 to 5, wherein the selecting recommended content provided to the user account from the candidate content based on the similarity comprises:
selecting target candidate contents with similarity meeting conditions from the candidate contents;
selecting recommended content provided to the user account from the target candidate content according to the label and the classification of the target candidate content;
and the labels and the classifications of the recommended contents conform to a set proportion rule.
8. A method for training a content recommendation model, the method comprising:
acquiring a user behavior log which is a log generated based on the content viewing behavior of the user history;
generating a training sample based on the user behavior log, wherein the training sample comprises behavior history information, user basic information, positive examples and negative examples; the behavior history information comprises content sequence information, content label information and content classification information, the behavior history information is used for indicating content sequences viewed by a user history and labels and classifications of the content, the content sequence information is used for indicating the content sequences viewed by the user history, and the user basic information is used for indicating user personalized features;
generating corresponding position coding information according to the content sequence information; the position coding information is used for capturing the time sequence information of each content in the content sequence viewed by the user history, and the position coding information is used for indicating the position of the content contained in the content sequence and the relative distance between the contents;
respectively encoding the content sequence information, the content label information, the content classification information and the user basic information to generate corresponding encoding vectors; the content sequence information, the content label information and the content classification information are encoded after being fused with the position encoding information;
performing fusion processing on the coding vectors to generate the user vectors, wherein the user vectors are used for reflecting the interest tendency of the user to the content;
training a content recommendation model based on the user vector and the positive and negative examples; wherein the input data of the content recommendation model includes the behavior history information and the user basic information.
9. The method of claim 8, wherein the content recommendation model comprises an encoder network, a convergence network, and a classifier; the encoder network is configured to encode the content sequence information, the content tag information, the content classification information, and the user basic information, respectively, to generate the encoding vector; the fusion network is used for performing fusion processing on the coding vector to generate the user vector;
the training of the content recommendation model based on the user vector and the positive and negative examples comprises:
outputting, by the classifier, a corresponding classification result based on the user vector;
calculating a loss function value corresponding to the content recommendation model based on the classification result and the positive and negative examples;
adjusting parameters of the content recommendation model based on the loss function value.
10. The method according to claim 8 or 9, wherein the negative examples comprise at least one of: and displaying the unviewed content and the content randomly selected from the content resource pool.
11. A content recommendation apparatus, characterized in that the apparatus comprises:
the information acquisition module is used for acquiring behavior history information and user basic information of the user account; the behavior history information comprises content sequence information, content label information and content classification information, the behavior history information is used for indicating the content sequences viewed by the user history and the labels and the classifications of the content, the content sequence information is used for indicating the content sequences viewed by the user history, and the user basic information is used for indicating the user personalized features;
the vector generation module is used for generating corresponding position coding information according to the content sequence information; the position coding information is used for capturing the time sequence information of each content in the content sequence viewed by the user history, and the position coding information is used for indicating the position of the content contained in the content sequence and the relative distance between the contents; respectively encoding the content sequence information, the content label information, the content classification information and the user basic information to generate corresponding encoding vectors; the content sequence information, the content label information and the content classification information are fused with the position coding information and then are coded; performing fusion processing on the coding vectors to generate user vectors, wherein the user vectors are used for reflecting the interest tendency of users to contents;
the similarity calculation module is used for calculating the similarity between the user vector and the content vector of the candidate content;
and the content recommending module is used for selecting recommended content provided for the user account from the candidate content based on the similarity.
12. An apparatus for training a content recommendation model, the apparatus comprising:
the log acquisition module is used for acquiring a user behavior log which is generated based on the content viewing behavior of the user history;
the sample generating module is used for generating a training sample based on the user behavior log, wherein the training sample comprises behavior history information, user basic information, positive examples and negative examples; the behavior history information comprises content sequence information, content label information and content classification information, the behavior history information is used for indicating content sequences viewed by a user history and labels and classifications of the content, the content sequence information is used for indicating the content sequences viewed by the user history, and the user basic information is used for indicating user personalized features;
the model training module is used for generating corresponding position coding information according to the content sequence information; the position coding information is used for capturing the time sequence information of each content in the content sequence viewed by the user history, and the position coding information is used for indicating the position of the content contained in the content sequence and the relative distance between the contents; respectively encoding the content sequence information, the content label information, the content classification information and the user basic information to generate corresponding encoding vectors; the content sequence information, the content label information and the content classification information are encoded after being fused with the position encoding information; performing fusion processing on the coding vectors to generate the user vectors, wherein the user vectors are used for reflecting the interest tendency of the user to the content; training a content recommendation model based on the user vector and the positive and negative examples; wherein the input data of the content recommendation model includes the behavior history information and the user basic information.
13. A computer device, characterized in that the computer device comprises a processor and a memory, in which a computer program is stored, which computer program is loaded and executed by the processor to implement the method according to any of claims 1 to 7, or to implement the method according to any of claims 8 to 10.
14. A computer-readable storage medium, in which a computer program is stored, which is loaded and executed by a processor to implement the method of any one of claims 1 to 7, or to implement the method of any one of claims 8 to 10.
CN202010460183.0A 2020-05-27 2020-05-27 Content recommendation method, device, equipment and storage medium Active CN111680217B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010460183.0A CN111680217B (en) 2020-05-27 2020-05-27 Content recommendation method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010460183.0A CN111680217B (en) 2020-05-27 2020-05-27 Content recommendation method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111680217A CN111680217A (en) 2020-09-18
CN111680217B true CN111680217B (en) 2022-10-14

Family

ID=72452828

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010460183.0A Active CN111680217B (en) 2020-05-27 2020-05-27 Content recommendation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111680217B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112328849B (en) * 2020-11-02 2024-05-07 腾讯科技(深圳)有限公司 User portrait construction method, user portrait-based dialogue method and device
CN112487141A (en) * 2020-11-26 2021-03-12 北京三快在线科技有限公司 Method, device and equipment for generating recommended file and storage medium
CN112528147B (en) * 2020-12-10 2024-04-30 北京百度网讯科技有限公司 Content recommendation method and device, training method, computing device and storage medium
CN112530598B (en) * 2020-12-11 2023-07-25 万达信息股份有限公司 Health risk self-measuring table recommendation method based on health data
CN112597392B (en) * 2020-12-25 2022-09-30 厦门大学 Recommendation system based on dynamic attention and hierarchical reinforcement learning
CN112667923A (en) * 2021-01-15 2021-04-16 北京金和网络股份有限公司 Intelligent recommendation method and device based on big data
CN113139834A (en) * 2021-04-29 2021-07-20 北京沃东天骏信息技术有限公司 Information processing method, device, electronic equipment and storage medium
CN113378045B (en) * 2021-06-08 2024-02-09 深圳Tcl新技术有限公司 Digital content distribution method, device and storage medium
CN113672820B (en) * 2021-08-06 2022-09-16 北京三快在线科技有限公司 Training method of feature extraction network, information recommendation method, device and equipment
CN116521971A (en) * 2022-01-19 2023-08-01 腾讯科技(深圳)有限公司 Content recommendation method, apparatus, device, storage medium, and computer program product
CN114491283B (en) * 2022-04-02 2022-07-22 浙江口碑网络技术有限公司 Object recommendation method and device and electronic equipment
CN115099323A (en) * 2022-06-17 2022-09-23 抖音视界(北京)有限公司 Content group determination method, device, medium and electronic equipment
CN115631660A (en) * 2022-12-07 2023-01-20 南通翔昇人工智能科技有限公司 Unmanned aerial vehicle security protection supervisory systems based on cloud calculates
CN116151242B (en) * 2023-04-19 2023-07-18 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Intelligent problem recommendation method, system and storage medium for programming learning scene

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109460514A (en) * 2018-11-02 2019-03-12 北京京东尚科信息技术有限公司 Method and apparatus for pushed information
CN110046304A (en) * 2019-04-18 2019-07-23 腾讯科技(深圳)有限公司 A kind of user's recommended method and device
CN110825957A (en) * 2019-09-17 2020-02-21 中国平安人寿保险股份有限公司 Deep learning-based information recommendation method, device, equipment and storage medium
CN111046907A (en) * 2019-11-02 2020-04-21 国网天津市电力公司 Semi-supervised convolutional network embedding method based on multi-head attention mechanism

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109460514A (en) * 2018-11-02 2019-03-12 北京京东尚科信息技术有限公司 Method and apparatus for pushed information
CN110046304A (en) * 2019-04-18 2019-07-23 腾讯科技(深圳)有限公司 A kind of user's recommended method and device
CN110825957A (en) * 2019-09-17 2020-02-21 中国平安人寿保险股份有限公司 Deep learning-based information recommendation method, device, equipment and storage medium
CN111046907A (en) * 2019-11-02 2020-04-21 国网天津市电力公司 Semi-supervised convolutional network embedding method based on multi-head attention mechanism

Also Published As

Publication number Publication date
CN111680217A (en) 2020-09-18

Similar Documents

Publication Publication Date Title
CN111680217B (en) Content recommendation method, device, equipment and storage medium
CN111368210B (en) Information recommendation method and device based on artificial intelligence and electronic equipment
CN111966914B (en) Content recommendation method and device based on artificial intelligence and computer equipment
CN111274330B (en) Target object determination method and device, computer equipment and storage medium
US20220171760A1 (en) Data processing method and apparatus, computer-readable storage medium, and electronic device
CN111831924A (en) Content recommendation method, device, equipment and readable storage medium
CN114298122A (en) Data classification method, device, equipment, storage medium and computer program product
CN113761253A (en) Video tag determination method, device, equipment and storage medium
Lang et al. Movie recommendation system for educational purposes based on field-aware factorization machine
CN113704393A (en) Keyword extraction method, device, equipment and medium
CN114519397A (en) Entity link model training method, device and equipment based on comparative learning
CN116205700A (en) Recommendation method and device for target product, computer equipment and storage medium
CN112132075B (en) Method and medium for processing image-text content
CN114330514A (en) Data reconstruction method and system based on depth features and gradient information
Shi et al. Product feature extraction from Chinese online reviews: Application to product improvement
CN116932862A (en) Cold start object recommendation method, cold start object recommendation device, computer equipment and storage medium
CN113807920A (en) Artificial intelligence based product recommendation method, device, equipment and storage medium
CN113869068A (en) Scene service recommendation method, device, equipment and storage medium
CN113705071A (en) Equipment identification method, device, equipment and storage medium
CN112231572A (en) User feature extraction method, device, equipment and storage medium
CN111552827A (en) Labeling method and device, and behavior willingness prediction model training method and device
CN111444338A (en) Text processing device, storage medium and equipment
Lin et al. Heterogeneous Student Knowledge Distillation From BERT Using a Lightweight Ensemble Framework
Zhang et al. TAFM: A Recommendation Algorithm Based on Text‐Attention Factorization Mechanism
CN115309975B (en) Product recommendation method and system based on interaction characteristics

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40027489

Country of ref document: HK

GR01 Patent grant
GR01 Patent grant