CN113961823B - News recommendation method, system, storage medium and equipment - Google Patents
- Publication number
- CN113961823B CN202111545769.8A
- Authority
- CN
- China
- Prior art keywords
- news
- target user
- texts
- clicked
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Abstract
The invention provides a news recommendation method, system, storage medium and device. The method comprises: screening with a hotspot recall model to generate a first candidate list; screening with a content recall model to generate a second candidate list; screening with a collaborative filtering recall model to generate a third candidate list; combining the first, second and third candidate lists into a news candidate set; performing semantic vectorization and entity vectorization, based on a knowledge graph, on the titles of the news texts historically clicked by a target user to generate a semantic feature preference matrix and an entity feature preference matrix; calculating the click probability of each news text in the news candidate set with a ranking model, and sorting all news texts in the news candidate set in descending order of click probability to generate a news recommendation set; and recommending the news texts in the news recommendation set to the target user in that order, thereby providing a personalized news recommendation service for the target user.
Description
Technical Field
The present invention relates to the field of news recommendation, and in particular, to a method, a system, a storage medium, and a device for news recommendation.
Background
In the mobile internet era, reading has become fragmented and spread across many scenarios, and personalized news recommendation has gradually become the mainstream of the mobile information industry. Faced with an explosion of information, people are increasingly accustomed to shallow reading: readers have no explicit reading goal and tend to receive pushed information passively. The volume of news is huge, while users only want to see topics and content that interest them, so a news recommendation mechanism is needed to select the news a user is interested in from this mass of news and recommend it to the user.
Current news recommendation mechanisms still have the following problem: news language is highly condensed and contains a large number of knowledge entities, whereas traditional semantic models or topic models can only judge the relevance between words from their co-occurrence or clustering structure. Latent knowledge-level relations between words are therefore hard to discover, so the recommended news cannot meet users' knowledge-level needs and the accuracy of news recommendation cannot be guaranteed.
Disclosure of Invention
The invention aims to provide a news recommendation method, system, storage medium and device, so as to solve the problems that latent knowledge-level relations between words are difficult to discover, that the recommended news therefore fails to meet the user's knowledge-level needs, and that the accuracy of news recommendation cannot be guaranteed.
The invention provides a news recommendation method, which comprises the following steps:
screening out a preset number of hot news texts by using a hotspot recall model and generating a first candidate list;
screening out a preset number of news texts with high similarity to the news texts historically clicked by the target user by using a content recall model and generating a second candidate list;
screening out a preset number of news texts historically clicked by the user with the maximum similarity to the target user by using a collaborative filtering recall model and generating a third candidate list;
combining the first candidate list, the second candidate list and the third candidate list and generating a news candidate set;
performing semantic vectorization and entity vectorization, based on a knowledge graph, on the titles of the news texts historically clicked by the target user, and generating a semantic feature preference matrix and an entity feature preference matrix;
calculating the click probability of each news text in the news candidate set with a ranking model, based on the semantic feature preference matrix and the entity feature preference matrix, and sorting all news texts in the news candidate set in descending order of click probability to generate a news recommendation set;
and sequentially recommending the news texts in the news recommendation set to the target user.
The news recommendation method provided by the invention has the following beneficial effects:
The invention uses a hotspot recall model to screen out a preset number of hot news texts and generate a first candidate list, uses a content recall model to screen out a preset number of news texts highly similar to the news texts historically clicked by a target user and generate a second candidate list, and uses a collaborative filtering recall model to screen out a preset number of news texts historically clicked by the user with the maximum similarity to the target user and generate a third candidate list. The three candidate lists are combined into a news candidate set. The news candidate set is thus drawn from a massive body of news by three recall models, along three dimensions (hot news, similar news, and the news of similar users), which enriches the recalled news and better meets the target user's recommendation needs.
Based on a knowledge graph, semantic vectorization and entity vectorization are performed on the titles of the news texts historically clicked by the target user to generate a semantic feature preference matrix and an entity feature preference matrix. In other words, the titles are analyzed at the knowledge level, and the target user's click preferences are derived from that analysis. A ranking model then calculates the click probability of each news text in the news candidate set from the two matrices, all news texts in the news candidate set are sorted in descending order of click probability to generate a news recommendation set, and the news texts in the news recommendation set are recommended to the target user in that order. Re-ranking the news candidate set according to the target user's click preferences yields a personalized news recommendation service that meets the target user's clicking and browsing needs to the greatest extent and improves recommendation accuracy.
In addition, the news recommendation method provided by the invention can also have the following additional technical characteristics:
Further, the step of performing semantic and entity vectorization, based on the knowledge graph, on the titles of the news texts historically clicked by the target user, and generating a semantic feature preference matrix and an entity feature preference matrix, comprises:
acquiring, in chronological order, N news texts historically clicked by the target user, and aligning the title of each of these news texts to a preset length;
segmenting each aligned title into words, and forming a title word sequence from all the words of each title;
performing feature vectorization on each word in the title word sequence according to its position in the sequence by using a first word vector model, generating a semantic feature vector for the title of each news text historically clicked by the target user, and combining these semantic feature vectors to obtain the semantic feature preference matrix;
and performing feature vectorization on each entity in the title word sequence according to its position in the sequence by using a second word vector model, generating an entity feature vector for the title of each news text historically clicked by the target user, and combining these entity feature vectors to obtain the entity feature preference matrix.
Further, the step of calculating the click probability of each news text in the news candidate set by using a ranking model based on the semantic feature preference matrix and the entity feature preference matrix comprises:
inputting the semantic feature preference matrix, the entity feature preference matrix and the news candidate set into the ranking model;
concatenating and fusing the semantic feature preference matrix and the entity feature preference matrix by using the ranking model to generate a user preference matrix;
and calculating the click probability of each news text in the news candidate set according to the user preference matrix.
Further, the step of screening out, by using the collaborative filtering recall model, a preset number of news texts historically clicked by the user with the maximum similarity to the target user and generating a third candidate list comprises:
constructing an interest matrix between the target user and news texts and an interest matrix between each other user and news texts, referred to as the target user interest matrix and the other user interest matrices respectively;
computing the cosine similarity between the target user interest matrix and each other user interest matrix to obtain the similarity between the target user and each other user, and screening out the user with the maximum similarity;
and acquiring, in order of timeliness, a preset number of news texts historically clicked by the user with the maximum similarity, and generating the third candidate list.
Further, the step of constructing an interest matrix between the target user and news texts and an interest matrix between each other user and news texts comprises:
acquiring, in chronological order, behavior data for i news texts historically clicked by the target user, wherein the behavior data comprises the number of clicks, likes and favorites;
generating the target user interest matrix from the behavior data of the i news texts historically clicked by the target user;
acquiring behavior data for i news texts historically clicked by any other user;
generating the other user interest matrix from the behavior data of the i news texts historically clicked by that other user;
The step of computing the cosine similarity between the target user interest matrix and the other user interest matrices to obtain the similarity between the target user and the other users comprises:
computing the cosine similarity between the target user interest matrix and each other user interest matrix to obtain the similarity between the target user and each other user, wherein the calculation formula, in the standard cosine similarity form, is as follows:
$$\mathrm{sim}(A, B) = \frac{\sum_{k=1}^{i} A_k B_k}{\sqrt{\sum_{k=1}^{i} A_k^{2}}\,\sqrt{\sum_{k=1}^{i} B_k^{2}}}$$
wherein A_k and B_k are the behavior data of the target user and of the other user, respectively, for the k-th news text;
Further, the step of screening out a preset number of news texts with high similarity to the news texts historically clicked by the target user by using the content recall model and generating a second candidate list comprises:
acquiring the news texts historically clicked by the target user and all news texts in a news library;
segmenting each news text historically clicked by the target user by part of speech to obtain all of its words and generate a text word sequence;
calculating the TFIDF value of every word of the news text historically clicked by the target user by using a TFIDF algorithm, and substituting these values into the text word sequence to obtain the TFIDF value vector of that news text;
sequentially calculating, by using a cosine similarity algorithm, the similarity between the TFIDF value vector of the news text historically clicked by the target user and the TFIDF value vector of each news text in the news library, to obtain the similarity between the news text historically clicked by the target user and each news text in the news library;
sorting all news texts in the news library in descending order of similarity, and screening out a plurality of news texts highly similar to the news text historically clicked by the target user;
and, for each news text historically clicked by the target user within a preset time, extracting in order an equal number of its highly similar news texts, and combining them to generate the second candidate list.
Further, the step of screening out a preset number of hot news texts by using a hotspot recall model and generating a first candidate list comprises:
acquiring index data of all news texts in a news library, wherein the index data comprises browsing amount and release time;
calculating the popularity value of each news text from its index data by using a news ranking algorithm;
and sorting all news texts in the news library in descending order of popularity value, and screening out a preset number of top-ranked news texts to generate the first candidate list.
The invention also provides a news recommendation system, which comprises:
a first screening module, configured to screen out a preset number of hot news texts by using a hotspot recall model and generate a first candidate list;
a second screening module, configured to screen out a preset number of news texts with high similarity to the news texts historically clicked by the target user by using a content recall model and generate a second candidate list;
a third screening module, configured to screen out a preset number of news texts historically clicked by the user with the maximum similarity to the target user by using a collaborative filtering recall model and generate a third candidate list;
a first generation module, configured to combine the first candidate list, the second candidate list and the third candidate list and generate a news candidate set;
a second generation module, configured to perform semantic and entity vectorization, based on a knowledge graph, on the titles of the news texts historically clicked by the target user, and generate a semantic feature preference matrix and an entity feature preference matrix;
a sorting module, configured to calculate the click probability of each news text in the news candidate set by using a ranking model, based on the semantic feature preference matrix and the entity feature preference matrix, and to sort all news texts in the news candidate set in descending order of click probability to generate a news recommendation set;
a recommendation module, configured to sequentially recommend the news texts in the news recommendation set to the target user.
The invention also proposes a computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements the above-mentioned news recommendation method.
The invention also provides a news recommendation device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the news recommendation method.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flowchart of a news recommendation method according to a first embodiment of the present invention;
FIG. 2 is a system block diagram of a news recommendation system according to a second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a news recommendation device according to a third embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. Several embodiments of the invention are presented in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Example 1
As shown in FIG. 1, an embodiment of the invention provides a news recommending method, which includes steps S101-S107.
S101, screening out a preset number of hot news texts by using a hotspot recall model and generating a first candidate list.
The hotspot recall model may be the Hacker News ranking algorithm, and the step of screening out a preset number of hot news texts by using the hotspot recall model and generating a first candidate list comprises:
acquiring index data of all news texts in a news library, wherein the index data comprises browsing amount and release time;
calculating the popularity value of each news text from its index data by using a news ranking algorithm, wherein the calculation formula, in the standard Hacker News form, is as follows:
$$r = \frac{p - 1}{(t + 2)^{G}}$$
wherein r is the popularity value of the news text, p is the total browsing amount of the news text, t is the time elapsed since the news text was released, and G is a preset value that may be 1.8 or 1.5;
and according to the popularity value of each news text, performing descending order arrangement on all news texts in the news library, and screening out a preset number of news texts with popularity values arranged in front to generate the first candidate list.
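The hotspot recall step can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: it assumes the widely published Hacker News form (p - 1) / (t + 2)^G for the popularity value, measures t in hours, and uses hypothetical names (`NewsItem`, `hot_score`, `hot_recall`) that do not appear in the source.

```python
from dataclasses import dataclass


@dataclass
class NewsItem:
    news_id: str
    views: int        # p: total browsing amount
    age_hours: float  # t: time since release (hours is an assumption)


def hot_score(views: int, age_hours: float, gravity: float = 1.8) -> float:
    """Hacker News style popularity value: (p - 1) / (t + 2) ** G."""
    return (views - 1) / (age_hours + 2) ** gravity


def hot_recall(library: list[NewsItem], k: int, gravity: float = 1.8) -> list[str]:
    """Return the ids of the k news items with the highest popularity value."""
    ranked = sorted(
        library,
        key=lambda n: hot_score(n.views, n.age_hours, gravity),
        reverse=True,
    )
    return [n.news_id for n in ranked[:k]]


library = [
    NewsItem("a", views=101, age_hours=2.0),
    NewsItem("b", views=501, age_hours=48.0),
    NewsItem("c", views=11, age_hours=0.0),
]
first_candidate_list = hot_recall(library, k=2)
```

A larger G makes older items decay faster, which is why the heavily viewed but two-day-old item "b" falls behind the fresher items here.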
S102, screening out a preset number of news texts with high similarity to the news texts historically clicked by the target user by using the content recall model, and generating a second candidate list.
The step of screening out a preset number of news texts with high similarity to the news texts historically clicked by the target user and generating a second candidate list by using the content recall model comprises the following steps:
acquiring the news texts historically clicked by the target user and all news texts in a news library;
segmenting each news text historically clicked by the target user by part of speech to obtain all of its words and generate a text word sequence;
calculating the TFIDF value of every word of the news text historically clicked by the target user by using a TFIDF algorithm, and substituting these values into the text word sequence to obtain the TFIDF value vector of that news text, wherein the calculation formula is as follows:
TFIDF = TF × IDF, where TF is the term frequency and IDF is the inverse document frequency;
sequentially calculating, by using a cosine similarity algorithm, the similarity between the TFIDF value vector of the news text historically clicked by the target user and the TFIDF value vector of each news text in the news library, to obtain the similarity between the news text historically clicked by the target user and each news text in the news library;
sorting all news texts in the news library in descending order of similarity, and screening out a plurality of news texts highly similar to the news text historically clicked by the target user;
and, for each news text historically clicked by the target user within a preset time, extracting in order an equal number of its highly similar news texts, and combining them to generate the second candidate list.
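The content recall for a single clicked news text can be sketched as follows. This is an illustrative sketch under stated assumptions: texts are already segmented into word lists, TF is the relative word count, IDF is log(N / document frequency) without smoothing, and all function names are hypothetical. The source does not specify which TF or IDF variant is used.

```python
import math
from collections import Counter


def tfidf_vectors(docs: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Map each document id to a sparse {word: TF * IDF} vector."""
    n_docs = len(docs)
    doc_freq = Counter(word for words in docs.values() for word in set(words))
    vectors = {}
    for doc_id, words in docs.items():
        counts = Counter(words)
        vectors[doc_id] = {
            w: (c / len(words)) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        }
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def content_recall(clicked_id: str, docs: dict[str, list[str]], k: int) -> list[str]:
    """Return the k library texts most similar to the clicked text."""
    vectors = tfidf_vectors(docs)
    scores = {
        doc_id: cosine(vectors[clicked_id], vec)
        for doc_id, vec in vectors.items()
        if doc_id != clicked_id
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


docs = {
    "clicked": ["central", "bank", "raises", "rates"],
    "n1": ["bank", "rates", "rise", "again"],
    "n2": ["football", "cup", "final", "tonight"],
    "n3": ["central", "park", "concert"],
}
similar = content_recall("clicked", docs, k=2)
```

To build the second candidate list as described, the same call would be repeated for each recently clicked text and an equal number of results taken from each.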
S103, screening out a preset number of news texts historically clicked by the user with the maximum similarity to the target user by using a collaborative filtering recall model, and generating a third candidate list.
The step of screening out, by using the collaborative filtering recall model, a preset number of news texts historically clicked by the user with the maximum similarity to the target user and generating a third candidate list comprises:
constructing an interest matrix between the target user and news texts and an interest matrix between each other user and news texts, referred to as the target user interest matrix and the other user interest matrices respectively;
the other users may be users in the same cluster as the target user; in this embodiment, users may first be screened by a similarity clustering algorithm to obtain the other users, which further narrows the range over which similar users are searched.
computing the cosine similarity between the target user interest matrix and each other user interest matrix to obtain the similarity between the target user and each other user, and screening out the user with the maximum similarity;
and acquiring, in order of timeliness, a preset number of news texts historically clicked by the user with the maximum similarity, and generating the third candidate list.
Further, the step of constructing an interest matrix between the target user and news texts and an interest matrix between each other user and news texts comprises:
acquiring, in chronological order, behavior data for i news texts historically clicked by the target user, wherein the behavior data comprises the number of clicks, likes and favorites;
for example, if the target user clicks a news text 3 times and also likes and favorites it, and the base weight of a click is set to 1, that of a like to 2 and that of a favorite to 3, then the target user's behavior data for that news text is 3 × 1 + 2 + 3 = 8;
generating the target user interest matrix from the behavior data of the i news texts historically clicked by the target user;
acquiring behavior data for i news texts historically clicked by any other user;
generating the other user interest matrix from the behavior data of the i news texts historically clicked by that other user;
The step of computing the cosine similarity between the target user interest matrix and the other user interest matrices to obtain the similarity between the target user and the other users comprises:
computing the cosine similarity between the target user interest matrix and each other user interest matrix to obtain the similarity between the target user and each other user, wherein the calculation formula, in the standard cosine similarity form, is as follows:
$$\mathrm{sim}(A, B) = \frac{\sum_{k=1}^{i} A_k B_k}{\sqrt{\sum_{k=1}^{i} A_k^{2}}\,\sqrt{\sum_{k=1}^{i} B_k^{2}}}$$
wherein A_k and B_k are the behavior data of the target user and of the other user, respectively, for the k-th news text.
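The interest scoring and user-similarity steps can be sketched as follows. The weights 1, 2 and 3 come from the worked example in the text; everything else is an assumption for illustration: each user's interest matrix is flattened to one score per news text over a shared list of i texts, and the function names are hypothetical.

```python
import math

# Base weights from the worked example: click = 1, like = 2, favorite = 3.
CLICK_WEIGHT, LIKE_WEIGHT, FAVORITE_WEIGHT = 1, 2, 3


def interest_score(clicks: int, liked: bool, favorited: bool) -> int:
    """Behavior data of one user for one news text."""
    return clicks * CLICK_WEIGHT + liked * LIKE_WEIGHT + favorited * FAVORITE_WEIGHT


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Standard cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar_user(target: list[float], others: dict[str, list[float]]) -> str:
    """Return the id of the other user whose interests are closest to the target's."""
    return max(others, key=lambda user: cosine_similarity(target, others[user]))


# One score per news text, over the same three news texts for every user.
target_interest = [interest_score(3, True, True), 0, interest_score(1, False, False)]
other_interests = {"u1": [4, 0, 1], "u2": [0, 5, 0]}
best_match = most_similar_user(target_interest, other_interests)
```

The third candidate list would then be filled with the most recent news texts clicked by `best_match`, up to the preset number.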
S104, combining the first candidate list, the second candidate list and the third candidate list to generate a news candidate set.
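Step S104 can be sketched as a simple merge of the three recall lists. The source only says the lists are combined; removing duplicate news ids while keeping first-seen order is an assumption made here so that a news text recalled by several models is ranked only once, and `merge_candidates` is a hypothetical name.

```python
def merge_candidates(*candidate_lists: list[str]) -> list[str]:
    """Concatenate recall lists, dropping repeated news ids (first occurrence wins)."""
    seen: set[str] = set()
    merged: list[str] = []
    for candidates in candidate_lists:
        for news_id in candidates:
            if news_id not in seen:
                seen.add(news_id)
                merged.append(news_id)
    return merged


news_candidate_set = merge_candidates(["a", "b"], ["b", "c"], ["a", "d"])
```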
S105, performing semantic and entity vectorization, based on the knowledge graph, on the titles of the news texts historically clicked by the target user, and generating a semantic feature preference matrix and an entity feature preference matrix.
The step of performing semantic and entity vectorization, based on the knowledge graph, on the titles of the news texts historically clicked by the target user, and generating a semantic feature preference matrix and an entity feature preference matrix, comprises:
acquiring, in chronological order, N news texts historically clicked by the target user, and aligning the title of each of these news texts to a preset length;
segmenting each aligned title into words, and forming a title word sequence from all the words of each title;
performing feature vectorization on each word in the title word sequence according to its position in the sequence by using a first word vector model, generating a semantic feature vector for the title of each news text historically clicked by the target user, and combining these semantic feature vectors to obtain the semantic feature preference matrix, wherein the first word vector model may be a word2vec model;
and performing feature vectorization on each entity in the title word sequence according to its position in the sequence by using a second word vector model, generating an entity feature vector for the title of each news text historically clicked by the target user, and combining these entity feature vectors to obtain the entity feature preference matrix, wherein the second word vector model may be a TransE model.
The above steps are performed on the basis of a knowledge graph, which is represented as a set of triples of the form (head entity, relation, tail entity), where the head entity and the tail entity are entities appearing in the news. In the entity feature vector, the position of an entity in the title word sequence is represented as 1 and the position of a non-entity as 0. In the semantic feature vector, the position of each word (including head entities, relations and tail entities) in the title word sequence is represented as 1. Analyzing the relevance between words with a knowledge-graph-based method makes it possible to discover latent knowledge-level relations between words, which in turn meets the user's knowledge-level news needs and safeguards the accuracy of news recommendation.
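The title alignment and the 0/1 position encoding described above can be sketched as follows. This only illustrates the position indicators; the word2vec and TransE embeddings themselves are not shown. The padding token, the right-padding policy and the function names are assumptions, and the entity set stands in for a lookup against the knowledge graph.

```python
PAD = "<pad>"  # hypothetical padding token used to align titles


def align_title(words: list[str], length: int) -> list[str]:
    """Truncate or right-pad a segmented title to a preset length."""
    return (words + [PAD] * length)[:length]


def word_mask(words: list[str]) -> list[int]:
    """1 at every real word position, 0 at padding positions."""
    return [0 if w == PAD else 1 for w in words]


def entity_mask(words: list[str], entities: set[str]) -> list[int]:
    """1 where the word is a knowledge-graph entity, 0 elsewhere."""
    return [1 if w in entities else 0 for w in words]


title = ["Tesla", "opens", "Berlin", "factory"]
known_entities = {"Tesla", "Berlin"}  # assumed to come from the knowledge graph

aligned = align_title(title, length=6)
semantic_positions = word_mask(aligned)
entity_positions = entity_mask(aligned, known_entities)
```

Because both masks are built over the same aligned sequence, each entity stays aligned with the word position it came from, which is what a word-entity aligned model relies on.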
S106, calculating the click probability of each news text in the news candidate set by using a ranking model, based on the semantic feature preference matrix and the entity feature preference matrix, and sorting all news texts in the news candidate set in descending order of click probability to generate a news recommendation set.
The step of calculating the click probability of each news text in the news candidate set by using a ranking model based on the semantic feature preference matrix and the entity feature preference matrix comprises:
inputting the semantic feature preference matrix, the entity feature preference matrix and the news candidate set into the ranking model;
concatenating and fusing the semantic feature preference matrix and the entity feature preference matrix by using the ranking model to generate a user preference matrix;
and calculating the click probability of each news text in the news candidate set according to the user preference matrix.
Specifically, the ranking model may be a DKN model. DKN is a knowledge-graph-based recommendation model comprising two parts, a KCNN network and an attention network. KCNN is a multi-channel, knowledge-aware convolutional neural network with word-entity alignment that fuses the semantic-level and knowledge-level representations of news, namely the semantic feature preference matrix and the entity feature preference matrix; it treats words and entities as multiple channels and explicitly preserves the alignment between them during convolution. The attention network assigns different weights to each word of the titles of the news texts historically clicked by the user, so as to reduce the influence of irrelevant topics in the user's browsing history, and the user's click probability for each news text in the news candidate set is calculated from these weights, so that the model better associates news of the same category. The news candidate set is ranked with the trained DKN model, and the N highest-scoring news texts are taken for recommendation.
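The attention-then-rank idea can be illustrated with a deliberately simplified sketch. It is not DKN: there is no KCNN and no trained network, the attention score is a plain dot product followed by a softmax, and the click probability is a sigmoid of a dot product. The embedding vectors are hypothetical inputs, and all names are assumptions.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def click_probability(candidate: list[float], clicked: list[list[float]]) -> float:
    """Weight clicked-title vectors by relevance to the candidate, then score it."""
    weights = softmax([dot(candidate, h) for h in clicked])
    user = [
        sum(w * h[d] for w, h in zip(weights, clicked))
        for d in range(len(candidate))
    ]
    return 1.0 / (1.0 + math.exp(-dot(user, candidate)))


def rank(candidates: dict[str, list[float]], clicked: list[list[float]], top_n: int) -> list[str]:
    """Sort candidates by click probability, descending, and keep the top N."""
    probs = {nid: click_probability(vec, clicked) for nid, vec in candidates.items()}
    return sorted(probs, key=probs.get, reverse=True)[:top_n]


clicked_titles = [[1.0, 0.0], [0.9, 0.1]]  # hypothetical title embeddings
candidates = {"finance": [1.0, 0.0], "sports": [0.0, 1.0], "mixed": [0.5, 0.5]}
news_recommendation_set = rank(candidates, clicked_titles, top_n=2)
```

Because the weights are recomputed per candidate, clicked titles unrelated to that candidate contribute less to the user vector, which is the effect the attention network is described as providing.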
S107, sequentially recommending the news texts in the news recommendation set to the target user.
In summary, the news recommendation method provided by the invention has the following beneficial effects. The invention uses a hotspot recall model to screen out a preset number of hot news texts and generate a first candidate list, uses a content recall model to screen out a preset number of news texts highly similar to the news texts historically clicked by a target user and generate a second candidate list, and uses a collaborative filtering recall model to screen out a preset number of news texts historically clicked by the user with the maximum similarity to the target user and generate a third candidate list. The three candidate lists are combined into a news candidate set. The news candidate set is thus drawn from a massive body of news by three recall models, along three dimensions (hot news, similar news, and the news of similar users), which enriches the recalled news and better meets the target user's recommendation needs.
Based on a knowledge graph, semantic vectorization and entity vectorization are performed on the titles of the news texts historically clicked by the target user to generate a semantic feature preference matrix and an entity feature preference matrix. In other words, the titles are analyzed at the knowledge level, and the target user's click preferences are derived from that analysis. A ranking model then calculates the click probability of each news text in the news candidate set from the two matrices, all news texts in the news candidate set are sorted in descending order of click probability to generate a news recommendation set, and the news texts in the news recommendation set are recommended to the target user in that order. Re-ranking the news candidate set according to the target user's click preferences yields a personalized news recommendation service that meets the target user's clicking and browsing needs to the greatest extent and improves recommendation accuracy.
Example 2
Referring to fig. 2, the present embodiment provides a news recommendation system, including:
A first screening module: configured to screen out a preset number of hot news texts by using a hot recall model and to generate a first candidate list.
Wherein the first screening module is further configured to:
acquiring index data of all news texts in a news library, wherein the index data comprises browsing amount and release time;
calculating a popularity value for each news text from its index data by using a news ranking algorithm, wherein the calculation formula is as follows:
wherein r is the popularity value of the news text, p is the total browsing amount of the news text, t is the time elapsed since the news text was released, and G is a preset value, for example 1.8 or 1.5;
and according to the popularity value of each news text, performing descending order arrangement on all news texts in the news library, and screening out a preset number of news texts with popularity values arranged in front to generate the first candidate list.
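The popularity formula is not reproduced in this text. The symbols r, p, t and G, together with the suggested value G = 1.8, resemble the widely known Hacker News ranking formula r = (p - 1) / (t + 2)^G, so the following minimal sketch assumes that form. The formula, the use of hours as the unit of t, and all names (`NewsItem`, `popularity`, `hot_recall`) are illustrative assumptions, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class NewsItem:
    news_id: str
    views: int        # p: total browsing amount
    age_hours: float  # t: time since release (hours are an assumed unit)


def popularity(item: NewsItem, gravity: float = 1.8) -> float:
    """Assumed Hacker News-style score: r = (p - 1) / (t + 2) ** G."""
    return (item.views - 1) / (item.age_hours + 2) ** gravity


def hot_recall(items, k, gravity=1.8):
    """Sort by popularity in descending order and keep the top k ids."""
    ranked = sorted(items, key=lambda it: popularity(it, gravity), reverse=True)
    return [it.news_id for it in ranked[:k]]


library = [
    NewsItem("a", views=1000, age_hours=48),
    NewsItem("b", views=200, age_hours=1),
    NewsItem("c", views=50, age_hours=10),
]
first_candidate_list = hot_recall(library, k=2)
```

Under this assumed formula, a larger G makes older news decay faster, which is why a fresh item with fewer views can outrank an older item with many views.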
A second screening module: configured to screen out a preset number of news texts with high similarity to the news texts historically clicked by the target user by using the content recall model and to generate a second candidate list.
Wherein the second screening module is further configured to:
acquiring a news text historically clicked by the target user and all news texts in a news library;
segmenting the news text historically clicked by the target user according to the part of speech to obtain all words of the news text historically clicked by the target user and generate a text word sequence;
calculating TFIDF values of all words of the news text historically clicked by the target user by using a TFIDF algorithm, substituting the TFIDF values of all the words into the text word sequence to obtain a TFIDF value vector of the news text historically clicked by the target user, wherein the calculation formula is as follows:
TFIDF = TF × IDF, where TF is the term frequency and IDF is the inverse document frequency;
sequentially calculating the similarity between the TFIDF value vector of the news text historically clicked by the target user and the TFIDF value vector of each news text in the news library by using a cosine similarity algorithm to obtain the similarity between the news text historically clicked by the target user and each news text in the news library;
sorting all news texts in the news library in a descending order according to the similarity, and screening out a plurality of news texts with high similarity with the news texts historically clicked by the target user;
and respectively and sequentially extracting the news texts with high similarity with each news text historically clicked by the target user within preset time in equal quantity, and combining to generate the second candidate list.
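The content recall steps above can be sketched as follows. This is a minimal illustration that assumes the texts are already segmented into word lists, uses the plain definitions TF = count / length and IDF = log(N / document frequency), and takes an equal number of similar texts per clicked text. The function names and toy data are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: dict of id -> token list. Returns id -> {word: TF * IDF}."""
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # document frequency of each word
    vectors = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        vectors[doc_id] = {
            w: (c / len(tokens)) * math.log(n / df[w]) for w, c in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def content_recall(clicked_ids, docs, per_click):
    """For each clicked text, take the per_click most similar unclicked texts."""
    vectors = tfidf_vectors(docs)
    result = []
    for cid in clicked_ids:
        scored = sorted(
            ((cosine(vectors[cid], vectors[d]), d)
             for d in docs if d not in clicked_ids),
            reverse=True,
        )
        for _, d in scored[:per_click]:
            if d not in result:
                result.append(d)
    return result


docs = {
    "c1": ["football", "match", "goal"],
    "n1": ["football", "goal", "league"],
    "n2": ["stock", "market", "price"],
    "n3": ["election", "vote", "market"],
}
second_candidate_list = content_recall(["c1"], docs, per_click=1)
```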
A third screening module: configured to screen out, by using the collaborative filtering recall model, a preset number of news texts historically clicked by the user with the maximum similarity to the target user, and to generate a third candidate list.
Wherein the third screening module is further configured to:
constructing an interest matrix between a target user and a news text and an interest matrix between other users and the news text, and respectively setting the interest matrices as a target user interest matrix and other user interest matrices;
computing the cosine similarity between the target user interest matrix and each of the other user interest matrices by using a cosine similarity algorithm to obtain the similarity between the target user and the other users, and screening out the user with the maximum similarity;
and acquiring, according to timeliness, a preset number of news texts historically clicked by the user with the maximum similarity, and generating a third candidate list.
The third screening module is further configured to:
acquiring behavior data of i news texts historically clicked by the target user according to a time sequence, wherein the behavior data comprises click times, praise and collection;
generating a target user interest matrix according to the behavior data of the i news texts historically clicked by the target user;
Acquiring behavior data of i news texts which are historically clicked by any other user;
generating other user interest matrixes according to behavior data of i news texts clicked by other users in history;
The step of performing cosine computation on the target user interest matrix and the other user interest matrices by using a cosine similarity algorithm to obtain the similarity between the target user and the other users comprises the following steps:
cosine calculating the interest matrix of the target user and the interest matrixes of the other users by using a cosine similarity calculation method to obtain the similarity between the target user and the other users, wherein the calculation formula is as follows:
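The similarity formula is not reproduced in this text. A minimal sketch, assuming the standard cosine similarity (dot product divided by the product of the norms) applied to interest matrices of equal shape that are flattened into vectors. The row layout (one row per news text, with columns for click count, praise and collection) and the function names are illustrative assumptions.

```python
import math


def matrix_cosine(a, b):
    """Cosine similarity of two equally shaped matrices, treated as flat vectors."""
    flat_a = [x for row in a for x in row]
    flat_b = [x for row in b for x in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("interest matrices must have the same shape")
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm_a = math.sqrt(sum(x * x for x in flat_a))
    norm_b = math.sqrt(sum(y * y for y in flat_b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar_user(target, others):
    """Return the id of the other user whose matrix is closest to the target's."""
    return max(others, key=lambda uid: matrix_cosine(target, others[uid]))


# Rows: news texts; columns: clicks, praise, collection (assumed layout).
target_matrix = [[3, 1, 0], [1, 0, 1]]
other_matrices = {
    "u1": [[6, 2, 0], [2, 0, 2]],
    "u2": [[0, 0, 1], [0, 1, 0]],
}
nearest = most_similar_user(target_matrix, other_matrices)
```

Because cosine similarity ignores magnitude, a user with the same behavior pattern at twice the volume is treated as identical in preference to the target user.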
A first generation module: configured to combine the first candidate list, the second candidate list, and the third candidate list and to generate a news candidate set.
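As a small illustration of the combining step, the sketch below merges the three lists into one candidate set. Removing duplicates while keeping first-seen order is an assumption; the text only states that the lists are combined.

```python
def merge_candidates(*candidate_lists):
    """Concatenate candidate lists, dropping repeated news ids (assumed behavior)."""
    seen = set()
    merged = []
    for candidates in candidate_lists:
        for news_id in candidates:
            if news_id not in seen:
                seen.add(news_id)
                merged.append(news_id)
    return merged


news_candidate_set = merge_candidates(["a", "b"], ["b", "c"], ["a", "d"])
```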
A second generation module: configured to perform semantic and entity vectorization processing, based on the knowledge graph, on the titles of the news texts historically clicked by the target user, and to generate a semantic feature preference matrix and an entity feature preference matrix.
Wherein the second generating module is further configured to:
acquiring N pieces of news texts historically clicked by the target user according to a time sequence, and aligning the titles of each piece of news text historically clicked by the target user according to a preset length;
segmenting the aligned title of each news text historically clicked by the target user to obtain all the words of each title, and forming a title word sequence from all the words of each title;
performing feature vectorization on each word in the title word sequence according to the position of each word in the title word sequence by using a first word vector model, generating semantic feature vectors corresponding to the title of each news text historically clicked by the target user, and combining these semantic feature vectors to obtain a semantic feature preference matrix, wherein the first word vector model may be a word2vec model;
and performing feature vectorization on each entity in the title word sequence according to the position of each entity in the title word sequence by using a second word vector model, generating entity feature vectors corresponding to the title of each news text historically clicked by the target user, and combining these entity feature vectors to obtain an entity feature preference matrix, wherein the second word vector model may be a TransE model.
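The alignment and vectorization steps above can be sketched as follows. Small lookup tables stand in for trained word2vec and TransE embeddings, titles are aligned by truncation or padding, and positions that have no word or entity vector receive a zero vector. The padding token, the zero-vector choice and all names are illustrative assumptions.

```python
PAD = "<pad>"  # hypothetical padding token used to align titles


def align_title(tokens, length):
    """Truncate or pad a segmented title to the preset length."""
    return (tokens + [PAD] * length)[:length]


def embed_title(tokens, table, dim):
    """Look up one vector per position; unknown tokens get a zero vector."""
    zero = [0.0] * dim
    return [list(table.get(tok, zero)) for tok in tokens]


def preference_matrices(titles, word_vecs, entity_vecs, length, dim):
    """Build the semantic and entity matrices, one aligned row per clicked title."""
    semantic, entity = [], []
    for tokens in titles:
        aligned = align_title(tokens, length)
        semantic.append(embed_title(aligned, word_vecs, dim))
        entity.append(embed_title(aligned, entity_vecs, dim))
    return semantic, entity


# Toy tables standing in for trained word2vec and TransE embeddings.
word_vecs = {"apple": [1.0, 0.0], "launches": [0.0, 1.0],
             "phone": [1.0, 1.0], "rain": [0.5, 0.5]}
entity_vecs = {"apple": [0.2, 0.8]}
titles = [["apple", "launches", "phone"], ["rain"]]
semantic_m, entity_m = preference_matrices(titles, word_vecs, entity_vecs,
                                           length=2, dim=2)
```

Keeping both matrices indexed by the same word positions is what preserves the word-entity alignment that the ranking model relies on.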
A ranking module: configured to calculate the click probability of each news text in the news candidate set based on the semantic feature preference matrix and the entity feature preference matrix by using a ranking model, and to sort all news texts in the news candidate set in descending order of click probability to generate a news recommendation set.
Wherein the ranking module is further configured to:
inputting the semantic feature preference matrix, the entity feature preference matrix and the news candidate set into the ranking model;
splicing and fusing the semantic feature preference matrix and the entity feature preference matrix by using the ranking model to generate a user preference matrix;
and calculating the click probability of each news text in the news candidate set according to the user preference matrix.
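The ranking steps above can be illustrated with a heavily simplified stand-in for the ranking model. This sketch assumes each title has already been reduced to one semantic vector and one entity vector, splices them by concatenation, weights the clicked titles with a softmax attention over dot products, and maps the final score through a sigmoid. A trained DKN model would use the KCNN and learned attention and scoring networks instead; every name here is hypothetical.

```python
import math


def fuse(semantic_vec, entity_vec):
    """Splice the semantic and entity vectors of one title."""
    return semantic_vec + entity_vec


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def click_probability(history, candidate):
    """Attention-weighted user vector scored against one candidate vector."""
    weights = softmax([dot(h, candidate) for h in history])
    user = [sum(w * h[i] for w, h in zip(weights, history))
            for i in range(len(candidate))]
    return 1.0 / (1.0 + math.exp(-dot(user, candidate)))


def rank(history, candidates, top_n):
    """Sort candidates by click probability in descending order."""
    scored = [(click_probability(history, vec), nid)
              for nid, vec in candidates.items()]
    scored.sort(reverse=True)
    return [nid for _, nid in scored[:top_n]]


history = [fuse([1.0], [0.0]), fuse([0.9], [0.1])]
candidates = {
    "sports": fuse([1.0], [0.0]),
    "finance": fuse([0.0], [1.0]),
    "other": fuse([-1.0], [0.0]),
}
news_recommendation_set = rank(history, candidates, top_n=2)
```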
A recommendation module: configured to recommend the news texts in the news recommendation set to the target user in sequence.
Example 3
Referring to fig. 3, a third embodiment of the present invention provides a news recommendation device, which includes a memory 20, a processor 10, and a computer program 30 stored in the memory and executable on the processor, wherein the processor 10 implements the news recommendation method described above when executing the computer program 30.
The news recommendation device may specifically be a computer, a server, an upper computer, or the like. In some embodiments, the processor 10 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip, and is configured to run program code stored in the memory 20 or to process data, for example to execute the news recommendation program.
The memory 20 includes at least one type of readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a magnetic memory, a magnetic disk, or an optical disk. In some embodiments, the memory 20 may be an internal storage unit of the news recommendation device, for example a hard disk of the news recommendation device. In other embodiments, the memory 20 may be an external storage device of the news recommendation device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the news recommendation device. Further, the memory 20 may include both an internal storage unit and an external storage device of the news recommendation device. The memory 20 may be used not only to store application software installed in the news recommendation device and various kinds of data, but also to temporarily store data that has been output or is to be output.
It should be noted that the configuration shown in fig. 3 does not limit the news recommendation device; in other embodiments, the news recommendation device may include fewer or more components than those shown, may combine some components, or may use a different arrangement of components.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the news recommendation method as described above.
The above embodiments express only several implementations of the present invention. Their description is specific and detailed, but it should not be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these all fall within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (9)
1. A news recommendation method, the method comprising:
screening out a preset number of hot news texts by using a hot recall model and generating a first candidate list;
screening out a preset number of news texts with high similarity to the news texts historically clicked by the target user by using a content recall model and generating a second candidate list;
screening out historical click news texts of a preset number of users with the maximum similarity to the target user by using a collaborative filtering recall model and generating a third candidate list;
combining the first candidate list, the second candidate list and the third candidate list and generating a news candidate set;
performing semantic and entity vectorization processing on the titles of the news texts historically clicked by the target users respectively based on a knowledge graph, and generating a semantic feature preference matrix and an entity feature preference matrix;
calculating the click probability of each news text in the news candidate set based on the semantic feature preference matrix and the entity feature preference matrix by using a ranking model, and sorting all news texts in the news candidate set in descending order of click probability to generate a news recommendation set;
sequentially recommending the news texts in the news recommendation set to the target user;
the steps of respectively performing semantic and entity vectorization processing on the titles of the news texts historically clicked by the target users based on the knowledge graph and generating a semantic feature preference matrix and an entity feature preference matrix comprise:
acquiring N pieces of news texts historically clicked by the target user according to a time sequence, and aligning the titles of each piece of news text historically clicked by the target user according to a preset length;
segmenting the aligned title of each news text historically clicked by the target user to obtain all the words of each title, and forming a title word sequence from all the words of each title;
performing feature vectorization on each word in the title word sequence according to the position of each word in the title word sequence by using a first word vector model, generating semantic feature vectors corresponding to the title of each news text historically clicked by the target user, and combining the semantic feature vectors corresponding to the title of each news text historically clicked by the target user to obtain a semantic feature preference matrix;
and performing feature vectorization on each entity in the title word sequence according to the position of each entity in the title word sequence by using a second word vector model, generating entity feature vectors corresponding to the title of each news text historically clicked by the target user, and combining these entity feature vectors to obtain an entity feature preference matrix.
2. The method of claim 1, wherein the step of calculating the click probability of each news text in the news candidate set based on the semantic feature preference matrix and the entity feature preference matrix and using a ranking model comprises:
inputting the semantic feature preference matrix, the entity feature preference matrix and the news candidate set into the ranking model;
splicing and fusing the semantic feature preference matrix and the entity feature preference matrix by using the ranking model to generate a user preference matrix;
and calculating the click probability of each news text in the news candidate set according to the user preference matrix.
3. The news recommendation method according to claim 1, wherein the step of filtering out historical click news texts of a preset number of users with the highest similarity to the target user by using a collaborative filtering recall model and generating a third candidate list comprises:
constructing an interest matrix between a target user and a news text and an interest matrix between other users and the news text, and respectively setting the interest matrices as a target user interest matrix and other user interest matrices;
computing the cosine similarity between the target user interest matrix and each of the other user interest matrices by using a cosine similarity algorithm to obtain the similarity between the target user and the other users, and screening out the user with the maximum similarity;
and acquiring, according to timeliness, a preset number of news texts historically clicked by the user with the maximum similarity, and generating a third candidate list.
4. The news recommendation method of claim 3, wherein the step of constructing interest matrices between the target user and the news text and between the other users and the news text comprises:
acquiring behavior data of i news texts historically clicked by the target user according to a time sequence, wherein the behavior data comprises click times, praise and collection;
generating a target user interest matrix according to the behavior data of the i news texts which are historically clicked by the target user;
Acquiring behavior data of i news texts which are historically clicked by any other user;
generating other user interest matrixes according to behavior data of i news texts clicked by other users in history;
The step of performing cosine computation on the target user interest matrix and the other user interest matrices by using a cosine similarity algorithm to obtain the similarity between the target user and the other users comprises the following steps:
cosine calculating the interest matrix of the target user and the interest matrixes of the other users by using a cosine similarity calculation method to obtain the similarity between the target user and the other users, wherein the calculation formula is as follows:
5. The news recommendation method of claim 1, wherein the step of screening out a preset number of news texts with high similarity to the news texts historically clicked by the target user by using the content recall model and generating the second candidate list comprises:
acquiring a news text historically clicked by the target user and all news texts in a news library;
segmenting the news text historically clicked by the target user according to the part of speech to obtain all words of the news text historically clicked by the target user and generate a text word sequence;
calculating TFIDF values of all words of the historical clicked news text of the target user by using a TFIDF algorithm, and substituting the TFIDF values of all the words into the text word sequence to obtain a TFIDF value vector of the historical clicked news text of the target user;
sequentially calculating the similarity between the TFIDF value vector of the news text historically clicked by the target user and the TFIDF value vector of each news text in the news library by using a cosine similarity algorithm to obtain the similarity between the news text historically clicked by the target user and each news text in the news library;
sorting all news texts in the news library in a descending order according to the similarity, and screening out a plurality of news texts with high similarity with the news texts historically clicked by the target user;
and respectively and sequentially extracting the news texts with high similarity with each news text historically clicked by the target user within preset time in equal quantity, and combining to generate the second candidate list.
6. The news recommendation method of claim 1, wherein the step of filtering out a preset number of hot news texts using a hot recall model and generating a first candidate list comprises:
acquiring index data of all news texts in a news library, wherein the index data comprises browsing amount and release time;
calculating the popularity of each news text by using a news ranking algorithm according to the index data of each news text to obtain the popularity of each news text;
and according to the popularity value of each news text, performing descending order arrangement on all news texts in the news library, and screening out a preset number of news texts with popularity values arranged in front to generate the first candidate list.
7. A news recommendation system, comprising:
a first screening module: configured to screen out a preset number of hot news texts by using a hot recall model and to generate a first candidate list;
a second screening module: configured to screen out a preset number of news texts with high similarity to the news texts historically clicked by the target user by using a content recall model and to generate a second candidate list;
a third screening module: configured to screen out, by using a collaborative filtering recall model, a preset number of news texts historically clicked by the user with the maximum similarity to the target user, and to generate a third candidate list;
a first generation module: configured to combine the first candidate list, the second candidate list and the third candidate list and to generate a news candidate set;
a second generation module: configured to perform semantic and entity vectorization processing, based on a knowledge graph, on the titles of the news texts historically clicked by the target user, and to generate a semantic feature preference matrix and an entity feature preference matrix;
a ranking module: configured to calculate the click probability of each news text in the news candidate set based on the semantic feature preference matrix and the entity feature preference matrix by using a ranking model, and to sort all news texts in the news candidate set in descending order of click probability to generate a news recommendation set;
a recommendation module: configured to recommend the news texts in the news recommendation set to the target user in sequence;
wherein the second generating module is further configured to:
acquiring N pieces of news texts historically clicked by the target user according to a time sequence, and aligning the titles of each piece of news text historically clicked by the target user according to a preset length;
segmenting the aligned title of each news text historically clicked by the target user to obtain all the words of each title, and forming a title word sequence from all the words of each title;
performing feature vectorization on each word in the title word sequence according to the position of each word in the title word sequence by using a first word vector model, generating semantic feature vectors corresponding to the title of each news text historically clicked by the target user, and combining the semantic feature vectors corresponding to the title of each news text historically clicked by the target user to obtain a semantic feature preference matrix;
and performing feature vectorization on each entity in the title word sequence according to the position of each entity in the title word sequence by using a second word vector model, generating entity feature vectors corresponding to the title of each news text historically clicked by the target user, and combining these entity feature vectors to obtain an entity feature preference matrix.
8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the news recommendation method as claimed in any one of claims 1-6.
9. A news recommender comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the news recommendation method as claimed in any one of claims 1 to 6 when executing the program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111545769.8A CN113961823B (en) | 2021-12-17 | 2021-12-17 | News recommendation method, system, storage medium and equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113961823A (en) | 2022-01-21 |
CN113961823B (en) | 2022-03-25 |
Family
ID=79473343
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111545769.8A Active CN113961823B (en) | 2021-12-17 | 2021-12-17 | News recommendation method, system, storage medium and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113961823B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114969486B (en) * | 2022-08-02 | 2022-11-04 | 平安科技(深圳)有限公司 | Corpus recommendation method, apparatus, device and storage medium |
CN116911304B (en) * | 2023-09-12 | 2024-02-20 | 深圳须弥云图空间科技有限公司 | Text recommendation method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109165350A (en) * | 2018-08-23 | 2019-01-08 | 成都品果科技有限公司 | A kind of information recommendation method and system based on deep knowledge perception |
CN111831924A (en) * | 2020-07-16 | 2020-10-27 | 腾讯科技(北京)有限公司 | Content recommendation method, device, equipment and readable storage medium |
CN112328879A (en) * | 2020-11-05 | 2021-02-05 | 中国平安人寿保险股份有限公司 | News recommendation method and device, terminal equipment and storage medium |
CN112417313A (en) * | 2020-11-24 | 2021-02-26 | 云南大学 | Model hybrid recommendation method based on knowledge graph convolutional network |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106951435B (en) * | 2017-02-08 | 2020-05-22 | 广州神马移动信息科技有限公司 | News recommendation method and equipment and programmable equipment |
FR3087921A1 (en) * | 2018-10-31 | 2020-05-01 | Amadeus S.A.S. | RECOMMENDED SYSTEMS AND METHODS USING AUTOMATIC CASCADE LEARNING MODELS |
US20200175084A1 (en) * | 2018-11-30 | 2020-06-04 | Microsoft Technology Licensing, Llc | Incorporating contextual information in large-scale personalized follow recommendations |
Non-Patent Citations (2)
Title |
---|
"Fine-Grained Deep Knowledge-Aware Network for News Recommendation with Self-Attention";Jie Gao等;《2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI)》;20181206;第81-88页 * |
"基于系统动力学的资讯个性化推荐研究";王子岩等;《河北科技大学学报》;20210303;第171-179页 * |
Also Published As
Publication number | Publication date |
---|---|
CN113961823A (en) | 2022-01-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111444428B (en) | Information recommendation method and device based on artificial intelligence, electronic equipment and storage medium | |
JP5281405B2 (en) | Selecting high-quality reviews for display | |
US8751511B2 (en) | Ranking of search results based on microblog data | |
Shi et al. | Learning-to-rank for real-time high-precision hashtag recommendation for streaming news | |
CN113961823B (en) | News recommendation method, system, storage medium and equipment | |
Gkotsis et al. | It's all in the content: state of the art best answer prediction based on discretisation of shallow linguistic features | |
TW201033823A (en) | Systems and methods for analyzing electronic text | |
Hensinger et al. | Modelling and predicting news popularity | |
CN111177538A (en) | Unsupervised weight calculation-based user interest tag construction method | |
CN109325146A (en) | A kind of video recommendation method, device, storage medium and server | |
CN110413738A (en) | A kind of information processing method, device, server and storage medium | |
CN111506831A (en) | Collaborative filtering recommendation module and method, electronic device and storage medium | |
Lv et al. | FeRe: Exploiting influence of multi-dimensional features resided in news domain for recommendation | |
Kacem et al. | Time-sensitive user profile for optimizing search personlization | |
Yao et al. | Version-aware rating prediction for mobile app recommendation | |
Hai et al. | Coarse-to-fine review selection via supervised joint aspect and sentiment model | |
Pavlov et al. | Collaborative filtering with maximum entropy | |
Choi et al. | Finding informative comments for video viewing | |
Wei et al. | Online education recommendation model based on user behavior data analysis | |
Roy et al. | A tag2vec approach for questions tag suggestion on community question answering sites | |
CN115510326A (en) | Internet forum user interest recommendation algorithm based on text features and emotional tendency | |
CN111310016B (en) | Label mining method, device, server and storage medium | |
Piao et al. | Recommender system architecture based on mahout and a main memory database | |
CN114547435A (en) | Content quality identification method, device, equipment and readable storage medium | |
Santosa et al. | S3PaR: Section-Based Sequential Scientific Paper Recommendation for Paper Writing Assistance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||