CN111008336A

CN111008336A - Content recommendation method, device and equipment and readable storage medium

Info

Publication number: CN111008336A
Application number: CN201911337184.XA
Authority: CN
Inventors: 余志伟
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2019-12-23
Filing date: 2019-12-23
Publication date: 2020-04-14

Abstract

The embodiment of the application discloses a content recommendation method, a device, equipment and a readable storage medium. The embodiment of the application can receive a content browsing request sent by a terminal; acquire, according to the content browsing request, user characteristic data and content browsing behavior characteristic data of a user corresponding to the terminal; generate word vectors of the user characteristic data and word vectors corresponding to the content browsing behavior characteristic data; fuse the word vectors of the user characteristic data and the word vectors corresponding to the content browsing behavior characteristic data to obtain a target characteristic vector; determine target content to be recommended from candidate content according to the target characteristic vector and the word vectors of the candidate content; and transmit the target content to the terminal. Because the scheme fuses the user's content browsing behavior characteristic data with the user's characteristic data, the accuracy of content recommendation is greatly improved.

Description

Content recommendation method, device and equipment and readable storage medium

Technical Field

The invention relates to the technical field of communication, and in particular to a content recommendation method, device, equipment and readable storage medium.

Background

With the rapid development of networks, people's daily lives depend more and more on them, and habits such as reading books, watching videos, and reading news and articles online have become common. Taking electronic books as an example, the explosive growth of information content (such as news, articles, and videos) on the internet makes it increasingly difficult for users to pick a favorite book from a large number of electronic books. Actively recommending books of interest to the user is therefore a feasible and efficient solution.

At present, recommendation algorithms for browsing content are mainly item-based collaborative filtering algorithms. These algorithms construct similarities between books from users' historical browsing behaviors, such as the books they have read. They then recommend books that may interest a user according to the similarities between books and that user's reading history.
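For contrast with the scheme that follows, here is a minimal sketch of the kind of item-based collaborative filtering described above. It assumes binary read/not-read histories and cosine similarity between books; the background does not fix a similarity measure, and the function names are illustrative only.

```python
from collections import defaultdict
from math import sqrt

def item_similarities(histories):
    """Cosine similarity between books, based on which users read them.

    histories: dict mapping user id -> set of book ids (binary read/not-read).
    Returns a dict (book_a, book_b) -> similarity, only for co-read pairs.
    """
    readers = defaultdict(set)  # book -> users who read it
    for user, books in histories.items():
        for book in books:
            readers[book].add(user)
    sims = {}
    books = sorted(readers)
    for i, a in enumerate(books):
        for b in books[i + 1:]:
            common = len(readers[a] & readers[b])
            if common:
                s = common / sqrt(len(readers[a]) * len(readers[b]))
                sims[(a, b)] = sims[(b, a)] = s
    return sims

def recommend(user, histories, k=3):
    """Score unread books by their summed similarity to the user's read books."""
    sims = item_similarities(histories)
    read = histories[user]
    scores = defaultdict(float)
    for (a, b), s in sims.items():
        if a in read and b not in read:
            scores[b] += s
    return sorted(scores, key=lambda b: (-scores[b], b))[:k]

histories = {
    "u1": {"b1", "b2"},
    "u2": {"b1", "b2", "b3"},
    "u3": {"b2", "b3"},
}
print(recommend("u1", histories))  # ['b3']
```

Only the reading history enters the score. This is the limitation the following paragraph points out.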

In the course of research and practice on the prior art, the inventor of the present invention found a limitation in existing algorithms for recommending browsing content. They consider only the user's historical browsing behavior and recommend content from that behavior by rigidly applying established rules, which limits the accuracy of content recommendation.

Disclosure of Invention

The embodiment of the application provides a content recommendation method, a content recommendation device and a readable storage medium, which can improve the accuracy of content recommendation.

The embodiment of the application provides a content recommendation method, which comprises the following steps:

receiving a content browsing request sent by a terminal;

according to the content browsing request, acquiring user characteristic data and content browsing behavior characteristic data of a user corresponding to the terminal;

generating word vectors of the user characteristic data and word vectors corresponding to the content browsing behavior characteristic data;

fusing the word vector of the user characteristic data and the word vector corresponding to the content browsing behavior characteristic data to obtain a target characteristic vector;

determining target content to be recommended from the candidate content according to the target characteristic vector and the word vector of the candidate content;

and sending the target content to the terminal.
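The six steps above can be strung together as in the following Python sketch. It is not the patented implementation. The dictionaries `USER_FEATURES`, `BROWSE_HISTORY` and `VECTORS` are hypothetical in-memory stand-ins for the server's data stores. Fusion is a plain unweighted mean, and ranking assumes cosine similarity.

```python
from math import sqrt

# Hypothetical stand-ins for the server's data stores.
USER_FEATURES = {"user42": "male_27"}          # user -> gender_age word
BROWSE_HISTORY = {"user42": ["S1", "S3"]}      # user -> browsed content numbers
VECTORS = {                                    # word -> trained word vector
    "male_27": [1.0, 0.0],
    "S1": [0.9, 0.1], "S2": [0.8, 0.3], "S3": [0.7, 0.2], "S4": [0.0, 1.0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def handle_browse_request(user_id, candidates, k=1):
    # Step 2: acquire user characteristic data and browsing behavior data.
    words = [USER_FEATURES[user_id]] + BROWSE_HISTORY[user_id]
    # Step 3: look up the word vectors of those words.
    vecs = [VECTORS[w] for w in words]
    # Step 4: fuse them (here an unweighted element-wise mean, an assumption).
    dim = len(vecs[0])
    target = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    # Step 5: rank candidates by similarity to the target feature vector.
    ranked = sorted(candidates, key=lambda c: cosine(target, VECTORS[c]), reverse=True)
    # Step 6: the top-k contents would be sent back to the terminal.
    return ranked[:k]

print(handle_browse_request("user42", ["S2", "S4"]))  # ['S2']
```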

Correspondingly, an embodiment of the present application further provides a content recommendation device, including:

a receiving unit, configured to receive a content browsing request sent by a terminal;

the acquisition unit is used for acquiring user characteristic data and content browsing behavior characteristic data of a user corresponding to the terminal according to the content browsing request;

the generating unit is used for generating word vectors of the user characteristic data and word vectors corresponding to the content browsing behavior characteristic data;

the fusion unit is used for fusing the word vector of the user characteristic data and the word vector corresponding to the content browsing behavior characteristic data to obtain a target characteristic vector;

the determining unit is used for determining target content to be recommended from the candidate content according to the target characteristic vector and the word vector of the candidate content;

and the sending unit is used for sending the target content to the terminal.

In some embodiments, the fusion unit comprises:

the acquiring subunit is used for acquiring the heat weight of each content in the preset content library;

and the fusion subunit is used for performing weighted fusion on the word vector of the user characteristic data and the word vector corresponding to the content browsing behavior characteristic data based on the heat weight of each content to obtain a target characteristic vector.

In some embodiments, the fusion subunit is configured to:

setting the weight of a word vector corresponding to the content browsing behavior characteristic data based on the heat weight of each content;

and carrying out weighted fusion on the word vectors of the user characteristic data and the word vectors corresponding to the content browsing behavior characteristic data according to the preset weight of the word vectors of the user characteristic data and the weight of the word vectors corresponding to the content browsing behavior characteristic data to obtain a target characteristic vector.
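A minimal sketch of the weighted fusion performed by the fusion subunit follows. It assumes the result is a weighted average of the user-feature vector (with its preset weight) and the browsed-content vectors (each with a weight derived from heat weights). Normalising by the total weight is an assumption of this sketch, and `weighted_fusion` is a hypothetical name.

```python
def weighted_fusion(user_vec, user_weight, content_vecs, content_weights):
    """Weighted average of the user-feature vector and browsed-content vectors.

    user_weight: preset scalar weight of the user characteristic word vector.
    content_weights: one weight per browsed content (e.g. from heat weights).
    Dividing by the total weight is an assumption, not taken from the source.
    """
    if len(content_vecs) != len(content_weights):
        raise ValueError("one weight per content vector is required")
    total = user_weight + sum(content_weights)
    fused = [user_weight * x for x in user_vec]
    for vec, w in zip(content_vecs, content_weights):
        for i, x in enumerate(vec):
            fused[i] += w * x
    return [x / total for x in fused]

# A popular content (weight 0.2) contributes less than a niche one (weight 1.0).
print(weighted_fusion([1.0, 0.0], 1.0, [[0.0, 1.0], [1.0, 1.0]], [0.2, 1.0]))
```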

In some embodiments, the acquiring subunit is configured to:

calculating, for each content, the ratio of the number of users who have browsed the content to the total number of users;

and calculating the heat weight of each content in the preset content library based on that ratio.
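The ratio step above is simple arithmetic. The sketch below computes it and leaves the ratio-to-weight mapping as a caller-supplied function, because this passage does not state the mapping. The `1 - r` lambda in the usage line is a hypothetical placeholder, not the patent's formula.

```python
def browse_ratios(browsers_per_content, total_users):
    """Ratio of users who browsed each content to the total number of users."""
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    return {cid: count / total_users for cid, count in browsers_per_content.items()}

def heat_weights(browsers_per_content, total_users, ratio_to_weight):
    """Map each content's browse ratio to a heat weight via a supplied function."""
    ratios = browse_ratios(browsers_per_content, total_users)
    return {cid: ratio_to_weight(r) for cid, r in ratios.items()}

counts = {"S1": 900, "S2": 50}  # number of users who browsed each content
# Hypothetical placeholder mapping, NOT the patent's formula: hotter -> smaller weight.
weights = heat_weights(counts, total_users=1000, ratio_to_weight=lambda r: 1.0 - r)
print(weights)
```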

In some embodiments, the content recommendation device further comprises: a training unit, the training unit comprising:

the characteristic obtaining subunit is used for obtaining sample user characteristic data of a sample user and sample content browsing behavior characteristic data;

and the training subunit is used for training a preset word vector model according to the sample user characteristic data and the sample content browsing behavior characteristic data to obtain a trained word vector model, a word vector of the sample user characteristic data and a word vector corresponding to the sample content browsing behavior characteristic data.

In some embodiments, the sample content browsing behavior feature data includes a historical browsing content sequence browsed by the sample user, and the training subunit is configured to:

fusing the sample user characteristic data and the historical browsing content sequence to obtain a sequence to be trained;

and training a preset word vector model based on the sequence to be trained to obtain the trained word vector model, the word vectors of the sample user characteristic data and a word vector sequence corresponding to the historical browsing content sequence.

In some embodiments, the determining unit is configured to:

according to the target characteristic vector and the word vectors of the candidate contents, obtaining the similarity between the target characteristic vector and the word vectors of the candidate contents;

and determining the target content to be recommended from the candidate content according to the similarity.
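Cosine similarity is one common choice for the similarity in the first step. The source does not name a measure, so the following sketch assumes cosine. It also assumes, hypothetically, that contents the user has already browsed are excluded before the top k candidates are taken.

```python
import heapq
from math import sqrt

def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def top_k_contents(target, candidate_vecs, k, exclude=()):
    """Return the k candidate ids most similar to the target feature vector."""
    scored = (
        (cosine(target, vec), cid)
        for cid, vec in candidate_vecs.items()
        if cid not in exclude  # e.g. contents already browsed (an assumption)
    )
    return [cid for _, cid in heapq.nlargest(k, scored)]

candidates = {"S2": [0.8, 0.3], "S4": [0.0, 1.0], "S5": [1.0, 0.0]}
print(top_k_contents([1.0, 0.1], candidates, k=2, exclude={"S5"}))  # ['S2', 'S4']
```

`heapq.nlargest` avoids sorting the whole candidate set when k is small relative to the library.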

Accordingly, the present application further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps in any content recommendation method provided in the embodiments of the present application.

In addition, the embodiment of the present application also provides a computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps in any one of the content recommendation methods provided by the embodiment of the present application.

The embodiment of the application can receive a content browsing request sent by a terminal; acquire, according to the content browsing request, user characteristic data and content browsing behavior characteristic data of a user corresponding to the terminal; generate word vectors of the user characteristic data and word vectors corresponding to the content browsing behavior characteristic data; fuse the word vector of the user characteristic data and the word vector corresponding to the content browsing behavior characteristic data to obtain a target characteristic vector; determine target content to be recommended from the candidate content according to the target characteristic vector and the word vectors of the candidate content; and send the target content to the terminal. Because the scheme fuses the user's content browsing behavior characteristic data with the user's characteristic data before recommending content, the accuracy of content recommendation is greatly improved.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and a person skilled in the art may derive other drawings from them without creative effort.

Fig. 1a is a scene schematic diagram of a content recommendation method provided in an embodiment of the present application;

Fig. 1b is a schematic flowchart of a content recommendation method provided in an embodiment of the present application;

Fig. 1c is a schematic structural diagram of a CBOW model provided in an embodiment of the present application;

Fig. 1d is another schematic structural diagram of a CBOW model provided in an embodiment of the present application;

Fig. 2a is another schematic flowchart of a content recommendation method provided in an embodiment of the present application;

Fig. 2b is a schematic diagram of a content recommendation application scenario provided in an embodiment of the present application;

Fig. 2c is a schematic structural diagram of a blockchain provided in an embodiment of the present application;

Fig. 2d is another schematic structural diagram of a blockchain provided in an embodiment of the present application;

Fig. 3a is a schematic structural diagram of a content recommendation device provided in an embodiment of the present application;

Fig. 3b is another schematic structural diagram of a content recommendation device provided in an embodiment of the present application;

Fig. 4 is a schematic structural diagram of a computer device provided in an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the application provides a content recommendation method, a content recommendation device, computer equipment and a computer readable storage medium. The content recommendation device may be integrated in a computer device, and the computer device may be a server or a terminal.

The content recommendation scheme provided by the embodiment of the application relates to Natural Language Processing (NLP) in the field of Artificial Intelligence (AI). The word vectors of the user characteristic data and the word vectors corresponding to the content browsing behavior characteristic data can be generated using artificial intelligence natural language processing techniques.

Artificial intelligence is the theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence. It perceives the environment, acquires knowledge, and uses that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making. Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, and the like.

Natural language processing is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science and mathematics. Research in this field involves natural language, that is, the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robot question answering, knowledge graphs, and the like.

For example, referring to Fig. 1a, a content recommendation system provided by the embodiment of the present invention includes a terminal 10, a server 11, and the like. The terminal 10 and the server 11 are connected via a network, such as a wired or wireless network, and the content recommendation device may be integrated in the terminal, for example in the form of a client.

The terminal 10 may send the user's content browsing request to the server. The terminal may also detect the user's characteristic data and content browsing behavior characteristic data and report them to the server. Upon receiving the content browsing request, the server can then determine, from the candidate content, target content that is highly similar to the user's historical browsing content and return it to the terminal for display. The user characteristic data may include the user's gender, age, and the like. The content browsing behavior characteristic data may include content the user has browsed, content on which the user has performed browsing operations, and the like.

The server 11 may receive a content browsing request sent by the terminal; acquire, according to the content browsing request, user characteristic data and content browsing behavior characteristic data of the user corresponding to the terminal; generate word vectors of the user characteristic data and word vectors corresponding to the content browsing behavior characteristic data; fuse the word vectors of the user characteristic data and the word vectors corresponding to the content browsing behavior characteristic data to obtain a target characteristic vector; determine target content to be recommended from the candidate content according to the target characteristic vector and the word vectors of the candidate content; and send the target content to the terminal.

Each of these is described in detail below. Note that the order in which the following embodiments are described does not imply any order of preference.

This embodiment is described from the perspective of a content recommendation apparatus, which may be integrated in a computer device. For example, the content recommendation apparatus may be a physical apparatus provided in the computer device. Alternatively, it may be integrated in the computer device, such as a server, in the form of a client.

As shown in fig. 1b, the specific flow of the content recommendation method may be as follows:

101. Receive a content browsing request sent by the terminal.

Content refers to content available for the user to browse, and it may be of various types, such as articles, news, electronic books and videos. A content browsing request is a request, triggered by a user at a terminal, to browse content (such as articles, electronic books or news). One example is a reading request triggered when the user uses an electronic book reading client. The user may trigger the content browsing request through his or her terminal, for example by clicking or sliding a content browsing trigger key in the terminal application interface, or by searching through a content search box. When the user's terminal detects the content browsing request, it sends the request to the server, and the server receives it.

102. Acquire user characteristic data and content browsing behavior characteristic data of the user corresponding to the terminal according to the content browsing request.

User characteristic data is data representing the individual characteristics of a user, such as the user's gender and age. Content browsing behavior characteristic data is the user's browsing behavior characteristic data over a historical time period, for example data recorded up to the current time and within a time threshold before it.

The content browsing behavior characteristic data may include two kinds of content identification information. The first identifies content that the user browsed in the historical time period for at least a preset browsing duration. The second identifies content on which the user performed browsing operations in the historical time period. Taking electronic books as an example, the browsing behavior characteristic data may include the book identification information of electronic books the user has read for longer than a preset time, and of electronic books the user has clicked or slid through.

Content identification information identifies each content in the preset content library so that different contents can be distinguished. For example, different books may be distinguished by the book name of each electronic book, or by a number assigned to each electronic book, such as an ID (identity) number. Because different contents may share the same name, in the embodiment of the present application the contents in the preset content library may be numbered, and each content's number is used to distinguish it from the others.

The time threshold and the preset browsing duration may be set according to the requirements of the practical application. For example, the time threshold may be the last half year or the last year, and the preset browsing duration may be 20 minutes, half an hour, or an hour. These values are not limited herein.
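As an illustration of step 102, the sketch below selects the two kinds of browsing behavior described above from raw browsing records. The record layout `(content_number, browse_time, duration_minutes, operation)` and the operation names `"click"`/`"slide"` are hypothetical. The half-year window and 20-minute threshold are the example values from the text.

```python
from datetime import datetime, timedelta

# Hypothetical record layout: (content_number, browse_time, duration_minutes, operation)
def behavior_features(records, now, window=timedelta(days=182), min_minutes=20):
    """Return (long_reads, operated) content numbers within the time window."""
    start = now - window
    long_reads, operated = [], []
    for content, when, minutes, op in records:
        if not (start <= when <= now):
            continue  # outside the historical time period
        if minutes >= min_minutes and content not in long_reads:
            long_reads.append(content)  # browsed for at least the preset duration
        if op in ("click", "slide") and content not in operated:
            operated.append(content)    # a browsing operation was performed
    return long_reads, operated

now = datetime(2019, 12, 23)
records = [
    ("S1", datetime(2019, 12, 1), 45, "open"),
    ("S2", datetime(2019, 11, 5), 5, "click"),
    ("S3", datetime(2018, 1, 1), 90, "open"),  # too old, ignored
]
print(behavior_features(records, now))  # (['S1'], ['S2'])
```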

In the embodiment of the application, the user characteristic data and the content browsing behavior characteristic data of a plurality of users can be acquired.

The user characteristic data and the content browsing behavior characteristic data of the user can be reported by the terminal, for example, the terminal can detect the user characteristic data and the content browsing behavior characteristic data of the user and report the user characteristic data and the content browsing behavior characteristic data to the server, so that the server can collect the user characteristic data and the content browsing behavior characteristic data of each user.

103. Generate word vectors of the user characteristic data and word vectors corresponding to the content browsing behavior characteristic data.

The user characteristic data may specifically include the user's age and gender. The content browsing behavior characteristic data may include a content number sequence of historical content the user has browsed, and a content number sequence of content on which the user has historically performed browsing operations. Each content number in a content number sequence is treated as a word. The user characteristic data of each user is combined into a single word in the format gender_age; for example, male_27 is one word and female_26 is another.
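The word construction above can be sketched as follows. The `S` prefix for content numbers mirrors the examples used later in the text but is only an illustration. `build_vocab` is a hypothetical helper that assigns each word an index, which any word vector model needs.

```python
def user_feature_word(gender, age):
    """Join gender and age into one vocabulary word, e.g. ('male', 27) -> 'male_27'."""
    return f"{gender}_{age}"

def content_words(content_numbers):
    """Turn content numbers into words; the 'S' prefix is only an illustration."""
    return [f"S{n}" for n in content_numbers]

def build_vocab(sequences):
    """Assign each distinct word an index in order of first appearance."""
    vocab = {}
    for seq in sequences:
        for word in seq:
            vocab.setdefault(word, len(vocab))
    return vocab

seq = content_words([1, 3, 5]) + [user_feature_word("male", 27)]
print(seq, build_vocab([seq]))
```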

In some embodiments, to generate the word vector of the user characteristic data and the word vector corresponding to the content browsing behavior characteristic data, a preset word vector model may be trained with the sample user characteristic data and sample content browsing behavior characteristic data of sample users. This yields a trained word vector model, word vectors of the sample user characteristic data, and word vectors corresponding to the sample content browsing behavior characteristic data. Specifically, the content recommendation method further includes:

acquiring sample user characteristic data and sample content browsing behavior characteristic data of a sample user;

and training a preset word vector model according to the sample user characteristic data and the sample content browsing behavior characteristic data to obtain a trained word vector model, a word vector of the sample user characteristic data and a word vector corresponding to the sample content browsing behavior characteristic data.

The process of obtaining the sample user characteristic data and the sample content browsing behavior characteristic data of the sample user may specifically refer to the description in step 102.

The sample content browsing behavior characteristic data includes the historical browsing content sequence of the sample user. In an embodiment, the sequence is built so that it reflects the similarity between the contents in it. To do this, the content identifiers (e.g., content numbers) of the contents the sample user has browsed may be sorted according to a preset rule. For example, the content numbers of the contents browsed in the historical time period may be sorted by browsing duration. Contents with closer browsing durations then sit closer together in the sequence, reflecting that the sample user has similar browsing preferences for them.
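A minimal sketch of the sorting rule above follows. The text says only that content numbers are sorted by browsing duration. Descending order, and breaking ties by content number, are assumptions made here so that the output is deterministic.

```python
def history_sequence(durations):
    """Order a sample user's browsed content numbers by browsing duration.

    durations: dict content_number -> minutes browsed by one sample user.
    Descending order and the tie-break on content number are assumptions.
    """
    return [c for c, _ in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0]))]

print(history_sequence({"S3": 30, "S1": 120, "S5": 35, "S10": 10}))
# ['S1', 'S5', 'S3', 'S10']
```

Adjacent positions hold contents with similar durations, which is what a context window later exploits.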

In one embodiment, users with different characteristics prefer different content, and this is taken into account before the preset word vector model is trained with the sample user characteristic data and the sample content browsing behavior characteristic data. That is, people of different genders and ages have different browsing preferences. For example, a 27-year-old male may lean toward wuxia (martial arts) or science fiction novels, while a female of the same age may lean toward romance. To make recommendations more accurate for users with different characteristics, the sample user characteristic data and the sample content browsing behavior characteristic data may be fused, and the preset word vector model trained on the fused data. Specifically, the step of "training a preset word vector model according to the sample user characteristic data and the sample content browsing behavior characteristic data to obtain a trained word vector model, a word vector of the sample user characteristic data, and a word vector corresponding to the sample content browsing behavior characteristic data" may include:

fusing sample user characteristic data and a historical browsing content sequence to obtain a sequence to be trained;

and training a preset word vector model based on a sequence to be trained to obtain the trained word vector model, word vectors of sample user characteristic data and a word vector sequence corresponding to the historical browsing content sequence.

For example, the word representing the user characteristic data (gender_age) may be inserted at a random position in the historical browsing content sequence (e.g., the content number sequence) to obtain a word sequence to be trained. Suppose the preset content library contains n contents, S1 to Sn. The word sequences to be trained may then be (S1, S3, S5, male_27, S10) and (S4, female_26, S7, S12, S20). Each sample user has one word sequence to be trained, and together these form the set of sequences to be trained for the sample users.
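The random insertion described above can be sketched with Python's standard `random` module. The function names are hypothetical. A seeded `random.Random` is used so that a given training set is reproducible.

```python
import random

def insert_user_word(history, user_word, rng=random):
    """Insert the gender_age word at a random position in the history sequence."""
    pos = rng.randint(0, len(history))  # inclusive bounds: either end is possible
    return history[:pos] + [user_word] + history[pos:]

def build_training_sequences(samples, seed=0):
    """samples: list of (gender_age word, historical content sequence) pairs."""
    rng = random.Random(seed)
    return [insert_user_word(hist, word, rng) for word, hist in samples]

samples = [
    ("male_27", ["S1", "S3", "S5", "S10"]),
    ("female_26", ["S4", "S7", "S12", "S20"]),
]
for seq in build_training_sequences(samples):
    print(seq)
```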

After the sequence to be trained is obtained, in the embodiment of the application, a preset word vector model may be trained based on the word sequence to be trained, so as to obtain the trained word vector model and a word vector of each word in the word sequence to be trained.

When a word is input to a word vector model, the model outputs a vector representation of that word; word vectors are an important technique in natural language processing. The word vector model may be a neural-network-based model, such as word2vec.

Word2vec maps words into a low-dimensional vector space, where the similarity between two words is obtained by computing the distance between their vectors. Word2vec has several model types, including the CBOW (Continuous Bag of Words) model and the Skip-gram model.

The CBOW model predicts the current word from its context words, whereas the Skip-gram model predicts the context words from the current word. For example, the input of the CBOW model is the context of wi, namely w(i-c), …, w(i-2), w(i-1), w(i+1), w(i+2), …, w(i+c), and the output is wi. The context window size c can be set according to the requirements of the actual application. Take the sentence "I drive my car to the store": if the words {"I", "drive", "my", "to", "the", "store"} are the input, "car" is the output.
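The CBOW input/output pairing above can be generated mechanically. The sketch below assumes that the context is simply truncated at the ends of a sequence, which is one common convention. With c = 3 it reproduces the "car" example from the text.

```python
def cbow_pairs(sequence, c=2):
    """Yield (context_words, target_word) pairs for CBOW with window size c.

    Near either end of the sequence the context is truncated (an assumption).
    """
    for i, target in enumerate(sequence):
        context = sequence[max(0, i - c):i] + sequence[i + 1:i + 1 + c]
        if context:
            yield context, target

sentence = "I drive my car to the store".split()
for context, target in cbow_pairs(sentence, c=3):
    if target == "car":
        print(context, "->", target)
# ['I', 'drive', 'my', 'to', 'the', 'store'] -> car
```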

For example, referring to fig. 1c, the CBOW model may include: an Input layer, a hidden layer, and an Output layer; each layer includes a plurality of neurons.

In Fig. 1c, the input vectors {x1, …, xC} are the one-hot encodings of the C context words of a target word, and the output y is the model's prediction for that target word. The ith row of the weight matrix W between the input layer and the hidden layer is the weight vector (that is, the word vector) of the ith word in the vocabulary.

The weight matrix W (of dimension V × N, where V is the vocabulary size) and the output weight matrix W' (of dimension N × V) are the parameters to be learned, because together they contain the weight information of all words in the vocabulary. The hidden layer has N nodes, and the input to each hidden node hi is a weighted sum of the input-layer values. Each input vector in {x1, …, xC} is one-hot encoded, so only its single nonzero element contributes to the hidden layer. Multiplying it by W therefore simply selects the corresponding row of W. In the standard CBOW formulation, the hidden layer is the average of the C selected rows.
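The row-selection property above, and a single CBOW forward pass, can be checked numerically. This NumPy sketch uses a toy vocabulary size V and hidden size N with random weights. `W_out` plays the role of W' in the text, and the averaging hidden layer is the standard CBOW choice.

```python
import numpy as np

V, N = 6, 3                        # vocabulary size, hidden-layer size
rng = np.random.default_rng(0)
W = rng.normal(size=(V, N))        # input -> hidden weights; row i is word i's vector
W_out = rng.normal(size=(N, V))    # hidden -> output weights (W' in the text)

def one_hot(i, size=V):
    x = np.zeros(size)
    x[i] = 1.0
    return x

context_ids = [0, 1, 3, 4]
# Multiplying a one-hot vector by W just selects one row of W ...
assert np.allclose(one_hot(3) @ W, W[3])
# ... so the hidden layer is the average of the selected rows.
h = np.mean([one_hot(i) @ W for i in context_ids], axis=0)
assert np.allclose(h, W[context_ids].mean(axis=0))

scores = h @ W_out                 # one score per vocabulary word
probs = np.exp(scores - scores.max())
probs /= probs.sum()               # softmax over the vocabulary
print(probs.argmax(), probs.sum())
```

In practice this is why implementations index rows of W directly instead of building one-hot vectors.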

In natural language processing, the input of word2vec is a text, or a series of sequences made up of words, in which the only relationship among words is their context. word2vec mines the meaning of words from word co-occurrence: words that appear in similar contexts have similar semantics. How, then, is word2vec trained on the sequences to be trained?

Taking the CBOW model as an example, in the embodiment of the present application the sequences to be trained may be sampled to obtain the context word sequence of a target word. The context word sequence is fed to the input layer of the CBOW model, the hidden layer predicts the target word from it, and the output layer outputs the prediction for the target word. A gradient descent algorithm is then used to reduce the error between the predicted value (the prediction for the target word) and the true value (the actual target word). The model is trained repeatedly, adjusting model parameters such as the weights and thereby updating the word vectors. Training ends with a trained word vector model and a word vector for every word in the sequences to be trained. The context word sequence of each word consists of the c words before it and the c words after it. For example, with a sliding window size of 2, the context word sequence of the word wt in a sequence to be trained is {w(t-2), w(t-1), w(t+1), w(t+2)}.
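The training loop described above can be sketched in NumPy as plain CBOW with a full softmax and stochastic gradient descent. This is a teaching sketch, not the patent's implementation or word2vec's optimized one, which use tricks such as negative sampling or hierarchical softmax. The hyperparameters `dim`, `lr` and `epochs` are arbitrary illustrative values.

```python
import numpy as np

def train_cbow(sequences, dim=8, window=2, lr=0.05, epochs=200, seed=0):
    """Train a plain CBOW model (full softmax, SGD) and return (vocab, vectors)."""
    vocab = {}
    for seq in sequences:
        for word in seq:
            vocab.setdefault(word, len(vocab))
    V = len(vocab)
    rng = np.random.default_rng(seed)
    W_in = rng.normal(scale=0.1, size=(V, dim))   # row i: vector of word i
    W_out = rng.normal(scale=0.1, size=(dim, V))  # hidden -> output weights
    for _ in range(epochs):
        for seq in sequences:
            ids = [vocab[w] for w in seq]
            for t, target in enumerate(ids):
                ctx = ids[max(0, t - window):t] + ids[t + 1:t + 1 + window]
                if not ctx:
                    continue
                h = W_in[ctx].mean(axis=0)        # hidden layer: mean of context rows
                scores = h @ W_out
                p = np.exp(scores - scores.max())
                p /= p.sum()                      # softmax prediction over vocabulary
                err = p.copy()
                err[target] -= 1.0                # gradient of cross-entropy w.r.t. scores
                grad_h = W_out @ err              # back-propagate before updating W_out
                W_out -= lr * np.outer(h, err)
                # Each context row gets 1/len(ctx) of the hidden gradient; .at handles repeats.
                np.subtract.at(W_in, ctx, lr * grad_h / len(ctx))
    return vocab, W_in

seqs = [["S1", "S3", "male_27", "S5"], ["S1", "S3", "S5", "female_26"]] * 5
vocab, vectors = train_cbow(seqs)
print(vectors[vocab["male_27"]])
```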

In an embodiment, the sample user characteristic data and the historical browsing content sequence may be fused into the sequence to be trained in various ways, for example by splicing (concatenation) or summation. Specifically, the sample user characteristic data of a sample user may be embedded into that user's historical browsing content sequence by splicing or summing. As shown in Fig. 1d, two inputs are combined to predict the word vector corresponding to the word wt, either by concatenation or by element-wise summation. The first is the gender_age word representing each sample user's characteristic data. The second is the set of context words w(t-2), w(t-1), w(t+1), w(t+2) from that sample user's historical browsing content sequence.
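The two combination modes of Fig. 1d can be sketched as below. The text does not spell out the exact layout, so two assumptions are made here. "Sum" adds the user vector and every context vector element-wise. "Concat" places the user vector and the context vectors end to end, which means the next layer must accept the longer input.

```python
import numpy as np

def cbow_hidden(user_vec, context_vecs, mode="sum"):
    """Combine the gender_age vector with the context word vectors (cf. Fig. 1d).

    mode="sum":    element-wise sum of the user vector and every context vector;
                   all vectors must share one dimension.
    mode="concat": user vector followed by each context vector, end to end
                   (this layout is an assumption of the sketch).
    """
    vecs = [np.asarray(user_vec, dtype=float)]
    vecs += [np.asarray(v, dtype=float) for v in context_vecs]
    if mode == "sum":
        return np.sum(vecs, axis=0)
    if mode == "concat":
        return np.concatenate(vecs)
    raise ValueError(f"unknown mode: {mode!r}")

user = [1.0, 0.0]                                          # gender_age vector
ctx = [[0.5, 0.5], [0.0, 1.0], [1.0, 1.0], [0.5, 0.0]]     # w(t-2) .. w(t+2)
print(cbow_hidden(user, ctx, "sum"))
print(cbow_hidden(user, ctx, "concat").shape)
```

Summation keeps the hidden size fixed regardless of window size. Concatenation preserves position information at the cost of a larger output-layer input.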

It should be noted that after the preset word vector model has been trained on the sequences to be trained of a plurality of sample users, the trained word vector model and a word vector for each word in those sequences are obtained. These include word vectors for the sample users' characteristic data (gender_age) and word vectors for the content number of each content in the sample users' historical browsing content sequences. These content numbers are the numbers of contents in the preset content library, so the vectors corresponding to the content numbers in the preset content library can be obtained. The trained word vector model and the word vectors of the words in the sequences to be trained may be stored in a preset storage space.

Accordingly, the step of generating the word vector of the user characteristic data and the word vector corresponding to the content browsing behavior characteristic data may either retrieve these word vectors directly from the preset storage space, or generate them through the trained word vector model.

104. Fuse the word vector of the user characteristic data and the word vector corresponding to the content browsing behavior characteristic data to obtain a target characteristic vector. For example, this may specifically include:

acquiring the heat weight of each content in a preset content library;

and performing weighted fusion on the word vectors of the user characteristic data and the word vectors corresponding to the content browsing behavior characteristic data based on the heat weight of each content to obtain a target characteristic vector.

In an embodiment, to account for the influence of popular (hot) content, a heat weight may be set for each content in the preset content library so as to reduce the influence of popular content on the user's target feature vector; that is, popular content should not contribute too much when characterizing the user's browsing behavior. Specifically, the step "obtaining the heat weight of each content in the preset content library" may include: calculating, for each content, the ratio of the number of users who have browsed it to the total number of users; and calculating the heat weight of each content in the preset content library based on that ratio.

For example, a heat weight of each content in the preset content library is calculated, and the formula may be as follows:

where n is the total number of contents in the preset content library, and Bid_i denotes the heat of content i. The total number of contents in the preset content library and the total number of users may be preset by operation and maintenance personnel.
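Because the formula itself is not reproduced above, the following sketch should not be read as the patent's heat-weight formula. It computes the browsed-user ratio exactly as described (users who browsed each content divided by the total number of users) and then, as an explicit assumption, turns that ratio into a weight with an IDF-style logarithm so that widely browsed content receives a smaller weight. The names `browse_ratios`, `heat_weights`, `eps` and `floor` are hypothetical.

```python
import math
from typing import Dict


def browse_ratios(browsed_users: Dict[str, int], total_users: int) -> Dict[str, float]:
    """Ratio of users who have browsed each content to the total number of users."""
    return {cid: count / total_users for cid, count in browsed_users.items()}


def heat_weights(ratios: Dict[str, float], eps: float = 1e-6,
                 floor: float = 1e-3) -> Dict[str, float]:
    """ASSUMED IDF-style weighting, not the patent's own formula.

    weight = log(1 / ratio): the more users have browsed a content, the
    smaller its weight. `eps` guards against a ratio of 0, and `floor`
    keeps a content browsed by every user (ratio 1, log = 0) slightly positive.
    """
    return {cid: max(math.log(1.0 / max(r, eps)), floor)
            for cid, r in ratios.items()}


ratios = browse_ratios({"S2": 900, "S5": 90, "S10": 9}, total_users=1000)
weights = heat_weights(ratios)
# The most-browsed content ("S2") receives the smallest weight.
```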

In an embodiment, to recommend different contents to different users more accurately, the user characteristic data of the user and the user's content browsing behavior characteristic data may be fused to construct the target characteristic vector of the user. Specifically, the step "performing weighted fusion on the word vectors of the user characteristic data and the word vectors corresponding to the content browsing behavior characteristic data based on the heat weight of each content to obtain the target characteristic vector" may include:

The content browsing behavior data comprises a content number sequence of the content browsed by the user and a content number sequence of the content clicked by the user.

For example, the weights corresponding to the word vectors of the content numbers of each content browsed by the user and clicked by the user can be set according to the calculated heat weight of each content. And then carrying out weighted fusion, such as weighted summation, on the word vectors of all browsed contents and the word vectors of all clicked contents according to the weight of each word vector and the preset weight of the word vectors of the user feature data to obtain the target feature vector of the user. For example, the formula is as follows:

where Vertor_user is the target feature vector of the user, k is the total number of contents browsed and clicked by the user, Vertor_i is the word vector of the i-th of these contents, and Vertor_profile is the word vector of the user feature data of the user.
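A minimal sketch of the weighted summation, assuming a heat weight is already available for each content number. The exact equation is not reproduced here, so the sketch takes the simplest reading of the text: each of the k content vectors is scaled by its heat weight, the profile vector by a preset weight, and the results are summed without normalization. `fuse_weighted_sum` and its parameters are hypothetical names.

```python
from typing import Dict, List

Vector = List[float]


def fuse_weighted_sum(profile_vec: Vector, profile_weight: float,
                      content_vecs: Dict[str, Vector],
                      heat_w: Dict[str, float],
                      history: List[str]) -> Vector:
    """Hypothetical reading of the weighted fusion (no normalization):

    user = profile_weight * Vertor_profile
           + sum over the k contents in `history` of heat_w[content] * Vertor_i
    """
    user = [profile_weight * x for x in profile_vec]
    for cid in history:  # the k browsed and clicked content numbers
        w = heat_w[cid]
        for j, x in enumerate(content_vecs[cid]):
            user[j] += w * x
    return user


vectors = {"S5": [0.0, 1.0], "S10": [1.0, 1.0]}
user_vec = fuse_weighted_sum([1.0, 0.0], 0.5, vectors,
                             {"S5": 2.0, "S10": 1.0}, ["S5", "S10"])
# user_vec == [1.5, 3.0]
```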

In an embodiment, besides weighted summation, the word vectors of the user characteristic data and the word vectors corresponding to the content browsing behavior characteristic data may also be fused by concatenation (splicing), that is, by joining them end to end to form the target characteristic vector of the user.

105. Determine the target content to be recommended from the candidate content according to the target characteristic vector and the word vector of the candidate content.

After the above steps, the target characteristic vector of the user and the word vector of the content number of each candidate content in the preset content library are available; the target content to be recommended to the user can then be determined from the candidate contents according to the target characteristic vector of the user and the word vector of each candidate content.

For example, in some embodiments, to improve the accuracy of content recommendation, the distance (i.e., similarity) between the target feature vector and the word vector of each candidate content in the content library may be calculated to implement content recommendation, and in particular, the step "determining the target content to be recommended from the candidate content according to the target feature vector and the word vector of the candidate content" may include:

In an embodiment, to compute efficiently the similarity (distance) between the target feature vector and the word vector of each of the many candidate contents in the preset content library, an approximate nearest neighbor search algorithm such as Locality-Sensitive Hashing (LSH) may be used. The candidate contents in the preset content library are then ranked by the similarity between the target feature vector and each candidate's word vector, and the target content to be recommended is selected from the ranked candidates according to a preset rule, for example by taking the top several ranked contents.
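Random-hyperplane LSH (SimHash) is one common LSH family for cosine similarity; the text does not say which LSH family is used, so this choice is an assumption. In the sketch, each vector is hashed to a bucket by the signs of its dot products with random hyperplanes, the query's bucket supplies the candidates, and those candidates are ranked by exact cosine similarity to select the top N. The class `HyperplaneLSH` and its parameters are hypothetical; a production system would typically use several hash tables to improve recall.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class HyperplaneLSH:
    """Random-hyperplane LSH: vectors whose dot products with every random
    hyperplane have the same signs share a bucket."""

    def __init__(self, dim: int, n_planes: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(n_planes)]
        self.buckets: Dict[Tuple[int, ...], List[str]] = {}
        self.vectors: Dict[str, Vector] = {}

    def _key(self, v: Vector) -> Tuple[int, ...]:
        # One bit per hyperplane: 1 if v lies on its non-negative side.
        return tuple(int(sum(p * x for p, x in zip(plane, v)) >= 0)
                     for plane in self.planes)

    def add(self, cid: str, v: Vector) -> None:
        self.vectors[cid] = v
        self.buckets.setdefault(self._key(v), []).append(cid)

    def top_n(self, query: Vector, n: int) -> List[str]:
        # Candidates come from the query's bucket only (may return fewer
        # than n); fall back to a full scan if that bucket is empty.
        cands = self.buckets.get(self._key(query)) or list(self.vectors)
        ranked = sorted(cands, key=lambda c: cosine(query, self.vectors[c]),
                        reverse=True)
        return ranked[:n]


lsh = HyperplaneLSH(dim=3, n_planes=4, seed=42)
for cid, vec in {"S1": [1.0, 0.0, 0.0], "S2": [0.0, 1.0, 0.0],
                 "S3": [2.0, 0.1, 0.0]}.items():
    lsh.add(cid, vec)
# The query equals S1's vector, so it hashes to S1's bucket and ranks S1 first.
recommended = lsh.top_n([1.0, 0.0, 0.0], n=1)
```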

106. Transmit the target content to the terminal.

For example, the target content is transmitted to the terminal so that the terminal displays the target content.

The scheme provided by the embodiment of the application can be applied to scenarios that require content recommendation, such as e-book recommendation in a reading client, news recommendation in a news client, and recommendation of information streams such as official accounts, articles and videos.

As can be seen from the above, the embodiment of the present application may receive a content browsing request sent by a terminal; acquire, according to the content browsing request, user characteristic data and content browsing behavior characteristic data of the user corresponding to the terminal; generate word vectors of the user characteristic data and word vectors corresponding to the content browsing behavior characteristic data; fuse the word vectors of the user characteristic data and the word vectors corresponding to the content browsing behavior characteristic data to obtain a target characteristic vector; determine target content to be recommended from the candidate content according to the target characteristic vector and the word vectors of the candidate content; and transmit the target content to the terminal. In this scheme, the user characteristic data of a user (such as gender and age) is fused into the user's content browsing behavior characteristic data: the user characteristic data and content browsing behavior characteristic data (such as historical browsing content sequences) of sample users are fused into data to be trained, a word vector model is trained on that data to generate word vectors of the user characteristic data and word vectors corresponding to the content browsing behavior characteristic data (including a word vector for each content), and content is finally recommended to the user using the constructed target characteristic vector of the user and the word vector of each content in the content library. Because the user's content browsing behavior characteristic data is fused with the user's own characteristic data, the accuracy of the content recommended to the user is greatly improved.

The method described in the above embodiments is further illustrated in detail by way of example.

In this embodiment, the description takes as an example the case in which the content recommendation apparatus is integrated in a server.

As shown in fig. 2a, a content recommendation method may specifically include the following processes:

201. The server receives a content browsing request sent by the terminal.

The content browsing request is a request, triggered by a user at a terminal, to browse content (such as articles, electronic books, or news). For example, it may be a reading request triggered while the user is using an e-book reading client; the user may trigger it through a reading trigger key or a reading search box in the application interface of the reading client, for instance by clicking or sliding the reading trigger key or the search box. When the terminal of the user detects the user's content browsing request, it sends the request to the server, and the server receives the content browsing request sent by the terminal.

For example, referring to fig. 2b, taking the reading client as an example, the user may trigger a reading request for an electronic book by performing a triggering operation such as clicking, sliding, etc. on the search box, or trigger a reading request for contents such as an electronic book, an article, etc. by using a reading triggering key such as a "book city" button or a "story" button.

202. The server acquires the user characteristic data and the content browsing behavior characteristic data of the user corresponding to the terminal according to the content browsing request.

For example, according to a received content browsing request such as an e-book reading request, the server may obtain the user characteristic data of the terminal's user, such as the user's gender and age, and the user's content browsing behavior characteristic data, such as the number sequence of e-books read within a historical period and the number sequence of e-books the user triggered (e.g., by clicking or sliding).

203. The server acquires sample user characteristic data and sample content browsing behavior characteristic data of the sample user.

The process of the server obtaining the sample user characteristic data and the sample content browsing behavior characteristic data of the sample user may specifically refer to the description in step 102.

The sample content browsing behavior characteristic data includes the historical browsing content sequence browsed by the sample user. In an embodiment, to reflect the similarity between the contents in the sample user's historical browsing content sequence, the content identifiers (e.g., content numbers) of the contents the sample user has browsed may be sorted according to a preset rule (e.g., the browsing duration of each content) to obtain the historical browsing content sequence; the rationale is that two contents with closer browsing durations are more similar with respect to the sample user's browsing preference.
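A sketch of the ordering step, assuming the preset rule is descending browsing duration with ties broken by content number; the text names browsing duration only as an example and does not fix the sort direction. `history_sequence` is a hypothetical name.

```python
from typing import Dict, List


def history_sequence(durations: Dict[str, float]) -> List[str]:
    """Order content numbers by browsing duration, longest first (assumed).

    Ties are broken by content number so the output is deterministic.
    """
    ordered = sorted(durations.items(), key=lambda kv: (-kv[1], kv[0]))
    return [cid for cid, _ in ordered]


# Browsing durations in seconds (hypothetical example data).
seq = history_sequence({"S5": 30.0, "S2": 120.0, "S10": 30.0, "S3": 5.0})
```

Contents with similar durations end up adjacent, so they fall inside each other's context window during word vector training.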

204. The server trains a preset word vector model according to the sample user characteristic data and the sample content browsing behavior characteristic data to obtain a trained word vector model, a word vector of the sample user characteristic data and a word vector corresponding to the sample content browsing behavior characteristic data.

In an embodiment, before the server trains the preset word vector model with the sample user characteristic data and the sample content browsing behavior characteristic data, the two may be fused and the preset word vector model trained on the resulting data to be trained, so that the content recommended for users with different user characteristics is more accurate. Specifically, the step "training a preset word vector model according to the sample user characteristic data and the sample content browsing behavior characteristic data to obtain a trained word vector model, a word vector of the sample user characteristic data, and a word vector corresponding to the sample content browsing behavior characteristic data" may include:

For example, the word representation of the user feature data (gender_age) can be inserted at a random position in the historical browsing content sequence (e.g., the content number sequence) to obtain a word sequence to be trained. For instance, if the preset content library contains the contents S1 to Sn, the word sequences to be trained may be (S2, S3, S5, male_40, S10) and (S4, S5, female_20, S7, S12, S20, S33). Each sample user has one word sequence to be trained, so a set of sequences to be trained for the sample users is obtained.
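Inserting the user-feature word at a random position can be sketched as below. The helper `embed_profile_token` is hypothetical, and a seeded `random.Random` is used only so the output is reproducible; the example histories mirror the sequences given above with the user-feature word removed.

```python
import random
from typing import List, Optional


def embed_profile_token(history: List[str], gender: str, age: int,
                        rng: Optional[random.Random] = None) -> List[str]:
    """Insert one `gender_age` word at a random position in the history.

    The relative order of the original content numbers is preserved.
    """
    rng = rng or random.Random()
    token = f"{gender}_{age}"
    pos = rng.randint(0, len(history))  # inclusive: may land first or last
    return history[:pos] + [token] + history[pos:]


rng = random.Random(7)
corpus = [
    embed_profile_token(["S2", "S3", "S5", "S10"], "male", 40, rng),
    embed_profile_token(["S4", "S5", "S7", "S12", "S20", "S33"], "female", 20, rng),
]
```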

The Skip-gram model is a word vector model that predicts the surrounding context words from the current word, whereas the CBOW model predicts the current word from its context. For example, the input of the CBOW model is the context of wi, namely wi-c, …, wi-2, wi-1, wi+1, wi+2, …, wi+c, and the output is wi, where the context window size c can be set according to the requirements of the actual application.
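For contrast with CBOW, Skip-gram training pairs for a window size c can be generated as follows. Each pair is (center word, context word), and the model learns to predict the second element from the first. The function name `skipgram_pairs` is hypothetical.

```python
from typing import List, Tuple


def skipgram_pairs(sequence: List[str], c: int = 2) -> List[Tuple[str, str]]:
    """(center word, context word) pairs within a window of c on each side."""
    pairs = []
    for i, center in enumerate(sequence):
        for j in range(max(0, i - c), min(len(sequence), i + c + 1)):
            if j != i:  # a word is not its own context
                pairs.append((center, sequence[j]))
    return pairs


pairs = skipgram_pairs(["S2", "S3", "male_40"], c=1)
```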

In an embodiment, the sample user feature data and the historical browsing content sequence may be fused into the sequence to be trained in various ways, for example by concatenation (splicing) or by summation; that is, the sample user feature data of a sample user is embedded into that user's historical browsing content sequence by concatenation or by summation, and the summation mode may be used for a better training effect. As shown in fig. 1d, the word gender_age representing the sample user feature data of the sample user and the context words w(t-2), w(t-1), w(t+1), w(t+2) from the sample user's historical browsing content sequence are combined by concatenation or by element-wise (bitwise) summation to predict the word vector of the target word wt.

205. The server generates word vectors of the user characteristic data and word vectors corresponding to the content browsing behavior characteristic data.

To generate the word vectors of the user characteristic data and the word vectors corresponding to the content browsing behavior characteristic data, the server may obtain them directly from the preset storage space, or generate them through the trained word vector model.

206. The server fuses the word vectors of the user characteristic data and the word vectors corresponding to the content browsing behavior characteristic data to obtain the target characteristic vector. For example, this may specifically include:

the server acquires the heat weight of each content in a preset content library; and performing weighted fusion on the word vectors of the user characteristic data and the word vectors corresponding to the content browsing behavior characteristic data based on the heat weight of each content to obtain a target characteristic vector.

where Vertor_user is the target feature vector of the user, k is the total number of contents browsed and clicked by the user, Vertor_i is the word vector of the i-th of these contents, and Vertor_profile is the word vector of the user feature data of the user.

207. The server determines the target content to be recommended from the candidate content according to the target characteristic vector and the word vector of the candidate content.

For example, in some embodiments, to improve the accuracy of content recommendation, the server may calculate the distance (i.e., similarity) between the target feature vector and the word vector of each candidate content in the content library to implement content recommendation. Specifically, an approximate nearest neighbor search algorithm such as Locality-Sensitive Hashing (LSH) may be used to calculate the similarity; the candidate contents in the preset content library are ranked by the similarity between the target feature vector and each candidate's word vector, and the top several ranked contents are selected from the ranked candidates as the target content to be recommended according to a preset rule.

208. The server transmits the target content to the terminal.

For example, the server transmits the target content to the terminal so that the terminal displays it. Referring to fig. 2b, the contents such as electronic books and articles displayed in content recommendation columns such as "Recommended for you" and "Guess you like" under the "Discover" module, and "Article recommendations" under the "Story" module, are the recommendation results for the user; the displayed recommendations can be switched with the "Change batch" button. Besides electronic books, different types of recommended content can be displayed by triggering (e.g., clicking or sliding) selection controls such as "Novels", "Listen to books", "Cartoons" and "Official accounts".

In an embodiment, the target content to be recommended determined in step 207 may also be stored in a blockchain. The blockchain system may be a distributed system formed by clients and a plurality of nodes (computing devices of any form in the access network, such as servers and user terminals) connected through network communication. Referring to fig. 2c, which is an optional structural diagram of the distributed system 100 applied to a blockchain system, the system is formed by a plurality of such nodes and clients, and a peer-to-peer (P2P) network is formed between the nodes; the P2P protocol is an application-layer protocol running on top of the Transmission Control Protocol (TCP). In the distributed system, any machine, such as a server or a terminal, can join and become a node; a node comprises a hardware layer, a middleware layer, an operating system layer and an application layer.

Referring to the functions of each node in the blockchain system shown in fig. 2c, the functions involved include:

1) Routing: a basic function of every node, used to support communication between nodes.

Besides the routing function, the node may also have the following functions:

2) Application: deployed in the blockchain to implement specific services according to actual service requirements. It records data related to the implemented functions to form record data, carries a digital signature in the record data to indicate the source of the task data, and sends the record data to other nodes in the blockchain system, so that the other nodes add the record data to a temporary block after successfully verifying its source and integrity.

For example, the services implemented by the application include:

2.1) Wallet: provides electronic money transaction functions, including initiating a transaction (i.e., sending the transaction record of the current transaction to other nodes in the blockchain system, which, after successful verification, store the record data of the transaction in a temporary block of the blockchain in response to confirming that the transaction is valid); the wallet also supports querying the electronic money remaining at an electronic money address.

2.2) Shared ledger: provides operations such as storing, querying and modifying account data. Record data of operations on the account data is sent to other nodes in the blockchain system; after verifying its validity, the other nodes store the record data in a temporary block in response to acknowledging that the account data is valid, and may send a confirmation to the node that initiated the operation.

2.3) Smart contract: a computerized agreement that can enforce the terms of a contract, implemented as code deployed on the shared ledger and executed when certain conditions are met, used to complete automated transactions according to actual business requirements, for example querying the logistics status of goods purchased by a buyer and transferring the buyer's electronic money to the merchant's address after the buyer signs for the goods. Smart contracts are not limited to executing trading contracts; they may also execute contracts that process received information.

3) Blockchain: comprises a series of blocks connected to one another in the chronological order in which they were generated. Once added to the blockchain, a block cannot be removed, and the blocks record the record data submitted by nodes in the blockchain system.

Referring to fig. 2d, which is an optional schematic diagram of a block structure according to an embodiment of the present invention, each block includes the hash value of the transaction records stored in it (the hash value of the block) and the hash value of the previous block, and the blocks are linked by these hash values to form a blockchain. A block may also include information such as a timestamp of when it was generated. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic means, each containing information used to verify the validity of its data (anti-counterfeiting) and to generate the next block.
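The hash linking described above can be illustrated with a minimal Python sketch using SHA-256 from the standard library. It is a toy model, not a blockchain implementation: there is no consensus, signature, or P2P layer, the `Block` fields and helper names are hypothetical, and the example records (such as a stored recommendation result) are made up. It shows why altering an earlier block breaks the chain: the next block's stored previous-block hash no longer matches.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    records: List[str]
    prev_hash: str
    timestamp: float = field(default_factory=time.time)

    def records_hash(self) -> str:
        """Hash of the records stored in this block."""
        return hashlib.sha256(json.dumps(self.records).encode()).hexdigest()

    def block_hash(self) -> str:
        """Hash over the records hash, previous-block hash and timestamp."""
        header = json.dumps({"records": self.records_hash(),
                             "prev": self.prev_hash,
                             "ts": self.timestamp}, sort_keys=True)
        return hashlib.sha256(header.encode()).hexdigest()


def append_block(chain: List[Block], records: List[str]) -> None:
    prev = chain[-1].block_hash() if chain else "0" * 64  # genesis placeholder
    chain.append(Block(records, prev))


def verify(chain: List[Block]) -> bool:
    """True if every block stores the current hash of its predecessor."""
    return all(chain[i].prev_hash == chain[i - 1].block_hash()
               for i in range(1, len(chain)))


chain: List[Block] = []
append_block(chain, ["user42: S5,S10"])
append_block(chain, ["user7: S3"])
```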

In order to better implement the method, the embodiment of the present application further provides a content recommendation apparatus, which may be integrated in a computer device, such as a server or the like.

For example, as shown in fig. 3a, the content recommendation apparatus may include a receiving unit 301, an obtaining unit 302, a generating unit 303, a fusing unit 304, a determining unit 305, and a transmitting unit 306, as follows:

a receiving unit 301, configured to receive a content browsing request sent by a terminal;

an obtaining unit 302, configured to obtain, according to the content browsing request, user characteristic data and content browsing behavior characteristic data of a user corresponding to the terminal;

a generating unit 303, configured to generate a word vector of the user feature data and a word vector corresponding to the content browsing behavior feature data;

a fusion unit 304, configured to fuse the word vector of the user feature data and the word vector corresponding to the content browsing behavior feature data to obtain a target feature vector;

a determining unit 305, configured to determine, according to the target feature vector and a word vector of candidate content, target content to be recommended from the candidate content;

a sending unit 306, configured to send the target content to the terminal.

In some embodiments, the content recommendation device further comprises: a training unit 307, the training unit comprising:

the characteristic obtaining subunit 3071 is configured to obtain sample user characteristic data of the sample user and sample content browsing behavior characteristic data;

the training subunit 3072 is configured to train a preset word vector model according to the sample user feature data and the sample content browsing behavior feature data, so as to obtain a trained word vector model, a word vector of the sample user feature data, and a word vector corresponding to the sample content browsing behavior feature data.

In some embodiments, the sample content browsing behavior feature data includes a historical browsing content sequence browsed by the sample user, and the training sub-unit 3072 may be specifically configured to:

In some embodiments, referring to fig. 3b, the fusion unit 304 comprises:

an obtaining subunit 3041, configured to obtain a heat weight of each content in the preset content library;

a fusion subunit 3042, configured to perform weighted fusion on the word vector of the user feature data and the word vector corresponding to the content browsing behavior feature data based on the heat weight of each content, to obtain a target feature vector.

In some embodiments, the fusion subunit 3042 may be specifically configured to:

In some embodiments, the obtaining subunit 3041 may be specifically configured to:

In some embodiments, the determining unit 305 is configured to:

In a specific implementation, the above units may be implemented as independent entities, or may be combined arbitrarily to be implemented as the same or several entities, and the specific implementation of the above units may refer to the foregoing method embodiments, which are not described herein again.

As can be seen from the above, in the content recommendation apparatus according to the embodiment of the present application, the receiving unit 301 may receive a content browsing request sent by a terminal; the obtaining unit 302 obtains the user characteristic data and the content browsing behavior characteristic data of the user corresponding to the terminal according to the content browsing request; the generating unit 303 generates a word vector of the user characteristic data and a word vector corresponding to the content browsing behavior characteristic data; the fusion unit 304 fuses the word vectors of the user feature data and the word vectors corresponding to the content browsing behavior feature data to obtain a target feature vector; the determining unit 305 determines target content to be recommended from the candidate content according to the target feature vector and the word vector of the candidate content; and the sending unit 306 transmits the target content to the terminal. Because the user's content browsing behavior characteristic data is fused with the user's own characteristic data, the accuracy of the content recommended to the user is greatly improved.

The embodiment of the present application further provides a computer device, as shown in fig. 4, which shows a schematic structural diagram of the computer device according to the embodiment of the present application, specifically:

the computer device may include components such as a processor 401 of one or more processing cores, memory 402 of one or more computer-readable storage media, a power supply 403, and an input unit 404. Those skilled in the art will appreciate that the computer device configuration illustrated in FIG. 4 does not constitute a limitation of computer devices, and may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. Wherein:

the processor 401 is a control center of the computer device, connects various parts of the entire computer device using various interfaces and lines, and performs various functions of the computer device and processes data by running or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby monitoring the computer device as a whole. Optionally, processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 401.

The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data created according to the use of the computer device, and the like. Further, the memory 402 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.

The computer device further comprises a power supply 403 for supplying power to the various components, and preferably, the power supply 403 is logically connected to the processor 401 via a power management system, so that functions of managing charging, discharging, and power consumption are implemented via the power management system. The power supply 403 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.

The computer device may also include an input unit 404, the input unit 404 being operable to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.

Although not shown, the computer device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 401 in the computer device loads the executable file corresponding to the process of one or more application programs into the memory 402 according to the following instructions, and the processor 401 runs the application programs stored in the memory 402, thereby implementing various functions as follows:

receiving a content browsing request sent by a terminal; according to the content browsing request, acquiring user characteristic data and content browsing behavior characteristic data of a user corresponding to the terminal; generating word vectors of the user characteristic data and word vectors corresponding to the content browsing behavior characteristic data; fusing the word vector of the user characteristic data and the word vector corresponding to the content browsing behavior characteristic data to obtain a target characteristic vector; determining target content to be recommended from the candidate content according to the target characteristic vector and the word vector of the candidate content; and sending the target content to the terminal.

For the specific implementation of the above operations, refer to the previous embodiments; details are not repeated here.

As can be seen from the above, the computer device in this embodiment of the present application may receive a content browsing request sent by a terminal; acquire, according to the content browsing request, user characteristic data and content browsing behavior characteristic data of a user corresponding to the terminal; generate word vectors of the user characteristic data and word vectors corresponding to the content browsing behavior characteristic data; fuse these word vectors to obtain a target characteristic vector; determine target content to be recommended from candidate content according to the target characteristic vector and the word vectors of the candidate content; and send the target content to the terminal. In this scheme, the user characteristic data of a user (such as gender and age) is fused into the user's content browsing behavior characteristic data. Specifically, the user characteristic data and the content browsing behavior characteristic data (such as a historical browsing content sequence) of sample users are combined into training data, a word vector model is trained on this data to generate word vectors of the user characteristic data and word vectors corresponding to the content browsing behavior characteristic data (including a word vector for each content), and content is finally recommended to the user by matching the user's constructed target characteristic vector against the word vector of each content in a content library. Because the user characteristic data is fused with the content browsing behavior characteristic data, the accuracy of the content recommended to the user is greatly improved.
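The construction of training data summarized above can be illustrated as follows. This sketch assumes a hypothetical layout in which each sample user's attributes become tokens such as `gender=f` and are placed before that user's historical browsing content sequence, so that a skip-gram word vector model sees user features and contents in shared context windows. Only the construction of training sentences and (center, context) pairs is shown; these would then be fed to a word vector trainer such as a skip-gram word2vec implementation, which is not reproduced here. The names `build_sentence` and `skipgram_pairs` are illustrative.

```python
from typing import Dict, List, Tuple


def build_sentence(profile: Dict[str, str], history: List[str]) -> List[str]:
    """Turn one sample user into a token sequence for word-vector training.

    Assumption: each profile attribute becomes a "key=value" token, and the
    feature tokens are placed in front of the historical browsing sequence.
    """
    feature_tokens = [f"{key}={value}" for key, value in sorted(profile.items())]
    return feature_tokens + list(history)


def skipgram_pairs(sentence: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """All (center, context) pairs within `window` positions of each token."""
    pairs = []
    for i, center in enumerate(sentence):
        lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                pairs.append((center, sentence[j]))
    return pairs
```

For example, `build_sentence({"gender": "f", "age": "25-30"}, ["book_a", "book_b"])` yields `["age=25-30", "gender=f", "book_a", "book_b"]`. One consequence of this assumed layout is that, with a small window, contents far from the start of a long history never share a context window with the feature tokens, so a larger window or repeated feature tokens may be needed.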

It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.

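To this end, an embodiment of the present application provides a computer-readable storage medium storing a computer program that can be loaded by a processor to execute the steps in any content recommendation method provided in the embodiments of the present application. For example, the computer program may perform the following steps: receiving a content browsing request sent by a terminal; acquiring, according to the content browsing request, user characteristic data and content browsing behavior characteristic data of a user corresponding to the terminal; generating word vectors of the user characteristic data and word vectors corresponding to the content browsing behavior characteristic data; fusing these word vectors to obtain a target characteristic vector; determining target content to be recommended from candidate content according to the target characteristic vector and the word vectors of the candidate content; and sending the target content to the terminal.

For the specific implementation of the above operations, refer to the foregoing embodiments; details are not repeated here.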

The computer-readable storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.

Because the instructions stored in the computer-readable storage medium can execute the steps in any content recommendation method provided in the embodiments of the present application, they can achieve the beneficial effects of any such method. These effects are detailed in the foregoing embodiments and are not repeated here.

The content recommendation method, device, computer device, and computer-readable storage medium provided in the embodiments of the present application are described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is only intended to help understand the method and core idea of the present invention. Meanwhile, those skilled in the art may make changes to the specific implementations and the application scope according to the idea of the present invention. In conclusion, the content of this specification should not be construed as a limitation on the present invention.

Claims

1. A content recommendation method, comprising:

receiving a content browsing request sent by a terminal;

acquiring, according to the content browsing request, user characteristic data and content browsing behavior characteristic data of a user corresponding to the terminal;

generating a word vector of the user characteristic data and a word vector corresponding to the content browsing behavior characteristic data;

fusing the word vector of the user characteristic data and the word vector corresponding to the content browsing behavior characteristic data to obtain a target characteristic vector;

determining target content to be recommended from candidate content according to the target characteristic vector and a word vector of the candidate content;

and sending the target content to the terminal.

2. The content recommendation method according to claim 1, wherein the fusing the word vector of the user feature data and the word vector corresponding to the content browsing behavior feature data to obtain a target feature vector comprises:

acquiring the heat weight of each content in a preset content library;

and performing weighted fusion on the word vector of the user characteristic data and the word vector corresponding to the content browsing behavior characteristic data based on the heat weight of each content to obtain a target characteristic vector.

3. The content recommendation method according to claim 2, wherein the performing weighted fusion on the word vector of the user feature data and the word vector corresponding to the content browsing behavior feature data based on the heat weight of each content to obtain a target feature vector comprises:

4. The content recommendation method according to claim 2, wherein the obtaining of the heat weight of each content in the preset content library comprises:

5. The content recommendation method according to claim 1, further comprising:

training a preset word vector model according to the sample user characteristic data and the sample content browsing behavior characteristic data to obtain a trained word vector model, a word vector of the sample user characteristic data and a word vector corresponding to the sample content browsing behavior characteristic data.

6. The content recommendation method according to claim 5, wherein the sample content browsing behavior feature data includes a historical browsing content sequence browsed by the sample user, and the training a preset word vector model according to the sample user feature data and the sample content browsing behavior feature data to obtain a trained word vector model, a word vector of the sample user feature data, and a word vector corresponding to the sample content browsing behavior feature data comprises:

7. The content recommendation method according to claim 1, wherein the determining target content to be recommended from candidate content according to the target feature vector and word vectors of candidate content comprises:

8. A content recommendation apparatus characterized by comprising:

and the sending unit is used for sending the target content to the terminal.

9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method according to any of claims 1-7 are implemented when the program is executed by the processor.

10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, carries out the steps of the method according to any one of claims 1-7.
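The weighted fusion of claims 2 and 3 can be sketched in Python as follows. The weighting formula of claim 3 and the way claim 4 obtains the heat weights are not reproduced in this text, so this is a hypothetical illustration: each browsed content's word vector is weighted by its heat weight (defaulting to 1.0 when missing), each user-feature word vector receives an assumed fixed `feature_weight`, and the weighted sum is normalized by the total weight. The names `weighted_fuse` and `feature_weight` are illustrative, not terms from the claims.

```python
from typing import Dict, List

Vector = List[float]


def weighted_fuse(
    user_feature_vecs: List[Vector],
    content_vecs: Dict[str, Vector],
    heat_weight: Dict[str, float],
    feature_weight: float = 1.0,
) -> Vector:
    """Weighted average of user-feature and browsed-content word vectors.

    Assumptions: every user-feature vector gets the same hypothetical
    `feature_weight`; each browsed content is weighted by its heat weight,
    falling back to 1.0 if the content has no recorded weight.
    """
    weighted = [(feature_weight, v) for v in user_feature_vecs]
    weighted += [(heat_weight.get(cid, 1.0), v) for cid, v in content_vecs.items()]
    total = sum(w for w, _ in weighted)
    if total <= 0:
        raise ValueError("total weight must be positive")
    dim = len(weighted[0][1])
    # Normalise by the total weight so the result stays on the vectors' scale.
    return [sum(w * v[i] for w, v in weighted) / total for i in range(dim)]
```

For example, one user-feature vector `[1.0, 0.0]` and one browsed content vector `[0.0, 1.0]` with a heat weight of 3.0 fuse to `[0.25, 0.75]`. Whether popular contents should be weighted up or down is a design choice this sketch leaves to the caller-supplied weights.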