CN113961823B - News recommendation method, system, storage medium and equipment - Google Patents

Info

Publication number
CN113961823B
Authority
CN
China
Prior art keywords
news
target user
texts
clicked
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111545769.8A
Other languages
Chinese (zh)
Other versions
CN113961823A (en
Inventor
陶清华
张恒星
刘丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Zoneyet Technology Co ltd
Jiangxi Zhongye Intelligent Technology Co ltd
Original Assignee
Zhengzhou Zoneyet Technology Co ltd
Jiangxi Zhongye Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Zoneyet Technology Co ltd, Jiangxi Zhongye Intelligent Technology Co ltd filed Critical Zhengzhou Zoneyet Technology Co ltd
Priority to CN202111545769.8A priority Critical patent/CN113961823B/en
Publication of CN113961823A publication Critical patent/CN113961823A/en
Application granted granted Critical
Publication of CN113961823B publication Critical patent/CN113961823B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Abstract

The invention provides a news recommendation method, system, storage medium and equipment. The method comprises: screening news texts with a hot recall model to generate a first candidate list; screening with a content recall model to generate a second candidate list; screening with a collaborative filtering recall model to generate a third candidate list; combining the first, second and third candidate lists into a news candidate set; performing semantic vectorization and entity vectorization, based on a knowledge graph, on the titles of the news texts historically clicked by the target user to generate a semantic feature preference matrix and an entity feature preference matrix; calculating the click probability of each news text in the news candidate set with a ranking model, and sorting all news texts in the news candidate set in descending order of click probability to generate a news recommendation set; and recommending the news texts in the news recommendation set to the target user in that order, thereby providing a personalized news recommendation service for the target user.

Description

News recommendation method, system, storage medium and equipment
Technical Field
The present invention relates to the field of news recommendation, and in particular, to a method, a system, a storage medium, and a device for news recommendation.
Background
In the mobile internet era, reading habits have become fragmented and spread across many scenarios, and personalized news recommendation is gradually becoming the mainstream of the mobile information industry. Faced with an explosion of information, people are increasingly accustomed to shallow reading: readers have no explicit reading goal and tend to receive pushed information passively. The volume of news is enormous, yet users only want to see topics and content that interest them, so a news recommendation mechanism is needed to select the news a user is interested in from this mass of news and recommend it to the user.
Current news recommendation mechanisms still have the following problem: news language is highly condensed and contains a large number of knowledge entities, whereas a traditional semantic model or topic model can only judge the relevance between words from their co-occurrence or clustering structure. Latent knowledge-level relations between words are therefore difficult to discover, so the recommended news fails to meet the user's knowledge-level needs and the accuracy of news recommendation cannot be guaranteed.
Disclosure of Invention
The invention aims to provide a news recommendation method, system, storage medium and equipment, in order to solve the problems that latent knowledge-level relations between words are difficult to discover, that the recommended news therefore fails to meet the user's knowledge-level needs, and that the accuracy of news recommendation cannot be guaranteed.
The invention provides a news recommendation method, which comprises the following steps:
screening out a preset number of hot news texts by using a hot recall model and generating a first candidate list;
screening out a preset number of news texts with high similarity to the news texts historically clicked by the target user by using a content recall model and generating a second candidate list;
screening out, by using a collaborative filtering recall model, a preset number of news texts historically clicked by the user with the greatest similarity to the target user, and generating a third candidate list;
combining the first candidate list, the second candidate list and the third candidate list and generating a news candidate set;
performing semantic and entity vectorization processing, based on a knowledge graph, on the titles of the news texts historically clicked by the target user, and generating a semantic feature preference matrix and an entity feature preference matrix;
calculating the click probability of each news text in the news candidate set by using a ranking model based on the semantic feature preference matrix and the entity feature preference matrix, and sorting all news texts in the news candidate set in descending order of click probability to generate a news recommendation set;
and sequentially recommending the news texts in the news recommendation set to the target user.
The news recommendation method provided by the invention has the following beneficial effects:
The invention uses a hot recall model to screen out a preset number of hot news texts and generate a first candidate list, uses a content recall model to screen out a preset number of news texts highly similar to the news texts historically clicked by the target user and generate a second candidate list, and uses a collaborative filtering recall model to screen out a preset number of news texts historically clicked by the user with the greatest similarity to the target user and generate a third candidate list. The three candidate lists are combined into a news candidate set. Because the candidate set is recalled from the mass of news along three dimensions (hot news, similar news, and the news of similar users), the recalled news is richer and better meets the target user's news recommendation needs;
based on a knowledge graph, semantic and entity vectorization are performed on the titles of the news texts historically clicked by the target user to generate a semantic feature preference matrix and an entity feature preference matrix. This knowledge-level analysis of the clicked titles reflects the target user's click preferences well. Based on the two matrices, a ranking model calculates the click probability of each news text in the news candidate set, all news texts in the candidate set are sorted in descending order of click probability to generate a news recommendation set, and the news texts in the recommendation set are recommended to the target user in that order. Re-ranking the candidate set according to the target user's click preferences provides a personalized news recommendation service, meets the target user's clicking and browsing needs as far as possible, and improves recommendation accuracy.
In addition, the news recommendation method provided by the invention can also have the following additional technical characteristics:
Further, the step of performing semantic and entity vectorization processing, based on the knowledge graph, on the titles of the news texts historically clicked by the target user and generating a semantic feature preference matrix and an entity feature preference matrix includes:
acquiring, in chronological order, N news texts historically clicked by the target user, and aligning the title of each of these news texts to a preset length;
segmenting the aligned title of each news text historically clicked by the target user to obtain all the words of that title, and forming a title word sequence from those words;
performing feature vectorization on each word in the title word sequence according to its position in the sequence by using a first word vector model, generating a semantic feature vector for the title of each news text historically clicked by the target user, and combining these semantic feature vectors to obtain the semantic feature preference matrix;
and performing feature vectorization on each entity in the title word sequence according to its position in the sequence by using a second word vector model, generating an entity feature vector for the title of each news text historically clicked by the target user, and combining these entity feature vectors to obtain the entity feature preference matrix.
Further, the step of calculating the click probability of each news text in the news candidate set by using a ranking model based on the semantic feature preference matrix and the entity feature preference matrix comprises:
inputting the semantic feature preference matrix, the entity feature preference matrix and the news candidate set into the ranking model;
splicing and fusing the semantic feature preference matrix and the entity feature preference matrix by using the ranking model to generate a user preference matrix;
and calculating the click probability of each news text in the news candidate set according to the user preference matrix.
Further, the step of screening out, by using the collaborative filtering recall model, a preset number of news texts historically clicked by the user with the greatest similarity to the target user and generating a third candidate list comprises:
constructing an interest matrix between the target user and the news texts and interest matrices between other users and the news texts, referred to as the target user interest matrix and the other user interest matrices respectively;
calculating the cosine similarity between the target user interest matrix and each of the other user interest matrices to obtain the similarity between the target user and the other users, and screening out the user with the greatest similarity;
and acquiring, ordered by timeliness, a preset number of news texts historically clicked by the user with the greatest similarity, and generating the third candidate list.
Further, the step of constructing an interest matrix between the target user and the news texts and interest matrices between the other users and the news texts includes:
acquiring, in chronological order, behavior data of i news texts historically clicked by the target user, wherein the behavior data comprises the number of clicks, likes and favorites;
generating a target user interest matrix according to the behavior data of the i news texts historically clicked by the target user:
[formula image: target user interest matrix]
acquiring behavior data of i news texts historically clicked by any other user;
generating an other user interest matrix according to the behavior data of the i news texts historically clicked by that other user:
[formula image: other user interest matrix]
The step of calculating the cosine similarity between the target user interest matrix and the other user interest matrices to obtain the similarity between the target user and the other users comprises the following step:
applying a cosine similarity algorithm to the target user interest matrix and each other user interest matrix to obtain the similarity between the target user and the other users, wherein the calculation formula is as follows:
[formula image: cosine similarity between the target user interest matrix and an other user interest matrix]
further, the step of screening out a preset number of news texts with high similarity to the news texts historically clicked by the target user by using the content recall model and generating a second candidate list comprises:
acquiring a news text historically clicked by the target user and all news texts in a news library;
segmenting the news text historically clicked by the target user according to the part of speech to obtain all words of the news text historically clicked by the target user and generate a text word sequence;
calculating TFIDF values of all words of the historical clicked news text of the target user by using a TFIDF algorithm, and substituting the TFIDF values of all the words into the text word sequence to obtain a TFIDF value vector of the historical clicked news text of the target user;
sequentially calculating the similarity between the TFIDF value vector of the news text historically clicked by the target user and the TFIDF value vector of each news text in the news library by using a cosine similarity algorithm to obtain the similarity between the news text historically clicked by the target user and each news text in the news library;
sorting all news texts in the news library in a descending order according to the similarity, and screening out a plurality of news texts with high similarity with the news texts historically clicked by the target user;
and extracting, for each news text historically clicked by the target user within a preset time, an equal number of the news texts most similar to it, and combining the extracted news texts to generate the second candidate list.
Further, the step of screening out a preset number of hot news texts by using a hot recall model and generating a first candidate list comprises:
acquiring index data of all news texts in a news library, wherein the index data comprises browsing amount and release time;
calculating the popularity value of each news text from its index data by using a news ranking algorithm;
and sorting all news texts in the news library in descending order of popularity value, and screening out a preset number of news texts with the highest popularity values to generate the first candidate list.
The invention also provides a news recommendation system, which comprises:
a first screening module, configured to screen out a preset number of hot news texts by using a hot recall model and generate a first candidate list;
a second screening module, configured to screen out, by using a content recall model, a preset number of news texts highly similar to the news texts historically clicked by the target user and generate a second candidate list;
a third screening module, configured to screen out, by using a collaborative filtering recall model, a preset number of news texts historically clicked by the user with the greatest similarity to the target user and generate a third candidate list;
a first generation module, configured to combine the first candidate list, the second candidate list and the third candidate list and generate a news candidate set;
a second generation module, configured to perform semantic and entity vectorization processing, based on a knowledge graph, on the titles of the news texts historically clicked by the target user and generate a semantic feature preference matrix and an entity feature preference matrix;
a ranking module, configured to calculate the click probability of each news text in the news candidate set by using a ranking model based on the semantic feature preference matrix and the entity feature preference matrix, and to sort all news texts in the news candidate set in descending order of click probability to generate a news recommendation set;
a recommendation module, configured to recommend the news texts in the news recommendation set to the target user in order.
The invention also proposes a computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements the above-mentioned news recommendation method.
The invention also provides a news recommendation device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the news recommendation method.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flowchart of a news recommendation method according to a first embodiment of the present invention;
FIG. 2 is a system block diagram of a news recommendation system according to a second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a news recommendation device according to a third embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. Several embodiments of the invention are presented in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Example 1
As shown in FIG. 1, an embodiment of the invention provides a news recommending method, which includes steps S101-S107.
S101, screening out a preset number of hot news texts by using a hot recall model and generating a first candidate list.
The hot recall model may use the Hacker News ranking algorithm. The step of screening out a preset number of hot news texts by using the hot recall model and generating a first candidate list comprises the following steps:
acquiring index data of all news texts in a news library, wherein the index data comprises browsing amount and release time;
calculating the popularity of each news text by using a news ranking algorithm according to the index data of each news text to obtain the popularity of each news text, wherein the calculation formula is as follows:
$$r = \frac{p - 1}{(t + 2)^{G}}$$
wherein r is the popularity value of the news text, p is the total browsing amount of the news text, t is the time elapsed since the news text was released, and G is a preset value, which may be 1.8 or 1.5;
and sorting all news texts in the news library in descending order of popularity value, and screening out a preset number of news texts with the highest popularity values to generate the first candidate list.
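The hot recall step can be sketched in a few lines of Python. This is only an illustration: it assumes the commonly cited Hacker News score form `(p - 1) / (t + 2) ** G`, measures `t` in hours, and uses hypothetical names (`NewsItem`, `popularity`, `hot_recall`) that do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class NewsItem:
    news_id: str
    views: int        # p: total browsing amount
    age_hours: float  # t: time since release (the unit is an assumption)


def popularity(item: NewsItem, gravity: float = 1.8) -> float:
    """Hacker News style score, assumed form: (p - 1) / (t + 2) ** G."""
    return (item.views - 1) / (item.age_hours + 2) ** gravity


def hot_recall(library: list[NewsItem], k: int, gravity: float = 1.8) -> list[str]:
    """Return the ids of the k news texts with the highest popularity value."""
    ranked = sorted(library, key=lambda n: popularity(n, gravity), reverse=True)
    return [n.news_id for n in ranked[:k]]


# Toy news library: an old but widely read item and two fresher ones.
library = [
    NewsItem("a", views=500, age_hours=48.0),
    NewsItem("b", views=120, age_hours=2.0),
    NewsItem("c", views=30, age_hours=1.0),
]
first_candidate_list = hot_recall(library, k=2)
print(first_candidate_list)
```

With these toy numbers the older item "a" is outranked by the fresher items even though it has the most views, which is the time-decay behavior the G exponent is meant to provide.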
S102, screening out a preset number of news texts with high similarity to the news texts historically clicked by the target user by using the content recall model, and generating a second candidate list.
The step of screening out a preset number of news texts with high similarity to the news texts historically clicked by the target user and generating a second candidate list by using the content recall model comprises the following steps:
acquiring a news text historically clicked by the target user and all news texts in a news library;
segmenting the news text historically clicked by the target user according to the part of speech to obtain all words of the news text historically clicked by the target user and generate a text word sequence;
calculating TFIDF values of all words of the news text historically clicked by the target user by using a TFIDF algorithm, substituting the TFIDF values of all the words into the text word sequence to obtain a TFIDF value vector of the news text historically clicked by the target user, wherein the calculation formula is as follows:
TFIDF = TF × IDF, where TF is the term frequency and IDF is the inverse document frequency;
sequentially calculating the similarity between the TFIDF value vector of the news text historically clicked by the target user and the TFIDF value vector of each news text in the news library by using a cosine similarity algorithm to obtain the similarity between the news text historically clicked by the target user and each news text in the news library;
sorting all news texts in the news library in a descending order according to the similarity, and screening out a plurality of news texts with high similarity with the news texts historically clicked by the target user;
and extracting, for each news text historically clicked by the target user within a preset time, an equal number of the news texts most similar to it, and combining the extracted news texts to generate the second candidate list.
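The content recall steps above can be sketched as follows. This is a minimal pure-Python illustration under several assumptions: texts are already segmented into word lists, TF is the relative frequency within a text, IDF is `log(N / df)` over the clicked texts plus the library, and the function names are hypothetical. Duplicates are dropped when the per-click results are combined, which the source does not state explicitly.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Sparse TF-IDF vector (word -> TF * IDF) for each segmented text."""
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(word, 0.0) for word, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def content_recall(clicked: list[list[str]], library: list[list[str]],
                   per_item: int) -> list[int]:
    """For each clicked text take the per_item most similar library indices."""
    vectors = tfidf_vectors(clicked + library)
    clicked_vecs, library_vecs = vectors[:len(clicked)], vectors[len(clicked):]
    recalled: list[int] = []
    for c_vec in clicked_vecs:
        ranked = sorted(range(len(library)),
                        key=lambda i: cosine(c_vec, library_vecs[i]),
                        reverse=True)
        for i in ranked[:per_item]:
            if i not in recalled:  # de-duplication is an assumption
                recalled.append(i)
    return recalled


clicked = [["central", "bank", "raises", "interest", "rates"]]
library = [
    ["bank", "cuts", "interest", "rates", "again"],
    ["team", "wins", "football", "final"],
    ["interest", "in", "football", "grows"],
]
second_candidate_list = content_recall(clicked, library, per_item=2)
print(second_candidate_list)
```

Taking the same `per_item` count for every clicked text mirrors the "equal number" extraction described in the last step.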
S103, screening out, by using a collaborative filtering recall model, a preset number of news texts historically clicked by the user with the greatest similarity to the target user, and generating a third candidate list.
The step of screening out, by using the collaborative filtering recall model, a preset number of news texts historically clicked by the user with the greatest similarity to the target user and generating a third candidate list comprises the following steps:
constructing an interest matrix between the target user and the news texts and interest matrices between other users and the news texts, referred to as the target user interest matrix and the other user interest matrices respectively;
the other users may be users in the same cluster as the target user: in this embodiment, the users may first be screened with a similarity clustering algorithm to obtain the other users, which further narrows the range searched for similar users.
Calculating the cosine similarity between the target user interest matrix and each of the other user interest matrices to obtain the similarity between the target user and the other users, and screening out the user with the greatest similarity;
and acquiring, ordered by timeliness, a preset number of news texts historically clicked by the user with the greatest similarity, and generating the third candidate list.
Further, the step of constructing an interest matrix between the target user and the news text and an interest matrix between the other users and the news text includes:
acquiring, in chronological order, behavior data of i news texts historically clicked by the target user, wherein the behavior data comprises the number of clicks, likes and favorites;
for example, suppose the target user clicks a news text 3 times and also likes and favorites it, where the base value of a click is set to 1, that of a like to 2, and that of a favorite to 3; the behavior data of the target user for that news text is then 3 × 1 + 2 + 3 = 8;
generating a target user interest matrix according to the behavior data of the i news texts historically clicked by the target user:
[formula image: target user interest matrix]
acquiring behavior data of i news texts historically clicked by any other user;
generating an other user interest matrix according to the behavior data of the i news texts historically clicked by that other user:
[formula image: other user interest matrix]
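The behavior scoring and the construction of an interest matrix can be sketched as below. The base values 1, 2 and 3 come from the example above; treating the interest matrix as a single row of scores aligned to a shared list of news ids, and scoring untouched news as 0, are assumptions made for illustration, as are the names `Behavior`, `behavior_score` and `interest_vector`.

```python
from dataclasses import dataclass

# Base values taken from the worked example: click 1, like 2, favorite 3.
CLICK_WEIGHT, LIKE_WEIGHT, FAVORITE_WEIGHT = 1, 2, 3


@dataclass
class Behavior:
    clicks: int
    liked: bool
    favorited: bool


def behavior_score(b: Behavior) -> int:
    """Weighted sum of the behaviors recorded for one news text."""
    return (b.clicks * CLICK_WEIGHT
            + b.liked * LIKE_WEIGHT
            + b.favorited * FAVORITE_WEIGHT)


def interest_vector(history: dict[str, Behavior], news_ids: list[str]) -> list[int]:
    """One score per news id; 0 where the user never interacted (assumption)."""
    return [behavior_score(history[n]) if n in history else 0 for n in news_ids]


history = {
    "n1": Behavior(clicks=3, liked=True, favorited=True),    # 3*1 + 2 + 3 = 8
    "n3": Behavior(clicks=1, liked=False, favorited=False),  # 1
}
target_vector = interest_vector(history, ["n1", "n2", "n3"])
print(target_vector)
```

Aligning every user's scores to the same ordered list of news ids keeps the vectors comparable position by position, which the cosine similarity step requires.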
The step of calculating the cosine similarity between the target user interest matrix and the other user interest matrices to obtain the similarity between the target user and the other users comprises the following step:
applying a cosine similarity algorithm to the target user interest matrix and each other user interest matrix to obtain the similarity between the target user and the other users, wherein the calculation formula is as follows:
[formula image: cosine similarity between the target user interest matrix and an other user interest matrix]
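The similarity step and the generation of the third candidate list can be sketched as follows, using the standard cosine similarity definition sim(A, B) = (A · B) / (‖A‖ ‖B‖). The symbols A and B, the flat-list representation of the interest matrices, the `click_log` structure and all function names are assumptions for illustration, and "timeliness" is read here as "most recently clicked first".

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Standard cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar_user(target: list[float], others: dict[str, list[float]]) -> str:
    """Id of the other user whose interest vector is closest to the target's."""
    return max(others, key=lambda uid: cosine_similarity(target, others[uid]))


def collaborative_recall(target: list[float],
                         others: dict[str, list[float]],
                         click_log: dict[str, list[tuple[int, str]]],
                         k: int) -> list[str]:
    """Take the k most recent clicks of the most similar user.

    click_log maps a user id to (timestamp, news_id) pairs (hypothetical layout).
    """
    uid = most_similar_user(target, others)
    recent = sorted(click_log[uid], key=lambda entry: entry[0], reverse=True)
    return [news_id for _, news_id in recent[:k]]


target = [8, 0, 1]
others = {"u1": [0, 5, 0], "u2": [6, 0, 2]}
click_log = {
    "u1": [(1, "n2")],
    "u2": [(10, "n1"), (30, "n7"), (20, "n3")],
}
third_candidate_list = collaborative_recall(target, others, click_log, k=2)
print(third_candidate_list)
```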
S104, combining the first candidate list, the second candidate list and the third candidate list to generate a news candidate set.
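One possible way to combine the three lists is an order-preserving union. The source only says the lists are combined into a set; removing duplicates while keeping first-seen order is an assumption, and `merge_candidates` is a hypothetical name.

```python
def merge_candidates(*candidate_lists: list[str]) -> list[str]:
    """Union of the candidate lists, keeping the first occurrence of each id."""
    seen: set[str] = set()
    merged: list[str] = []
    for candidates in candidate_lists:
        for news_id in candidates:
            if news_id not in seen:
                seen.add(news_id)
                merged.append(news_id)
    return merged


news_candidate_set = merge_candidates(["b", "c"], ["c", "d"], ["n7", "b"])
print(news_candidate_set)
```

Keeping a list rather than a Python `set` makes the result deterministic, which simplifies testing; the final order is decided later by the ranking model in any case.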
S105, performing semantic and entity vectorization processing, based on the knowledge graph, on the titles of the news texts historically clicked by the target user, and generating a semantic feature preference matrix and an entity feature preference matrix.
The step of performing semantic and entity vectorization processing, based on the knowledge graph, on the titles of the news texts historically clicked by the target user and generating a semantic feature preference matrix and an entity feature preference matrix comprises:
acquiring, in chronological order, N news texts historically clicked by the target user, and aligning the title of each of these news texts to a preset length;
segmenting the aligned title of each news text historically clicked by the target user to obtain all the words of that title, and forming a title word sequence from those words;
performing feature vectorization on each word in the title word sequence according to its position in the sequence by using a first word vector model, generating a semantic feature vector for the title of each news text historically clicked by the target user, and combining these semantic feature vectors to obtain the semantic feature preference matrix, wherein the first word vector model may be a word2vec model;
and performing feature vectorization on each entity in the title word sequence according to its position in the sequence by using a second word vector model, generating an entity feature vector for the title of each news text historically clicked by the target user, and combining these entity feature vectors to obtain the entity feature preference matrix, wherein the second word vector model may be a TransE model.
The above steps rely on a knowledge graph, which is represented as a set of triples of the form (head entity, relation, tail entity), where the head entity and the tail entity are entities appearing in the news. In the entity feature vector, each position of the title word sequence that holds an entity is represented as 1 and each position that holds a non-entity is represented as 0. In the semantic feature vector, the position of every word in the title word sequence (including head entities, relations and tail entities) is represented as 1. Analyzing the relevance between words with this knowledge-graph-based method makes it possible to discover latent knowledge-level relations between words, which in turn meets the user's knowledge-level news needs and helps ensure the accuracy of news recommendation.
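The title alignment and the 1/0 position encoding described above can be illustrated with a short sketch. It assumes titles are already segmented into words, pads after segmentation for simplicity, and uses a hypothetical `<pad>` token; marking padded positions as 0 in the semantic vector is also an assumption. The word2vec and TransE embeddings themselves are outside the scope of this sketch.

```python
PAD = "<pad>"  # hypothetical padding token, not named in the source


def align_title(words: list[str], length: int) -> list[str]:
    """Truncate or right-pad a segmented title to exactly `length` tokens."""
    return (words + [PAD] * length)[:length]


def entity_indicator(words: list[str], entities: set[str]) -> list[int]:
    """1 at positions holding a knowledge-graph entity, 0 elsewhere."""
    return [1 if w in entities else 0 for w in words]


def word_indicator(words: list[str]) -> list[int]:
    """1 at positions holding a real word, 0 at padding (padding rule assumed)."""
    return [0 if w == PAD else 1 for w in words]


title = align_title(["Tesla", "opens", "factory", "in", "Shanghai"], length=8)
entity_vector = entity_indicator(title, {"Tesla", "Shanghai"})
semantic_vector = word_indicator(title)
print(entity_vector)
print(semantic_vector)
```

Because both vectors are built over the same aligned title word sequence, position j in the entity vector refers to the same token as position j in the semantic vector, which is the word-entity alignment the ranking model later relies on.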
S106, calculating the click probability of each news text in the news candidate set by using a ranking model based on the semantic feature preference matrix and the entity feature preference matrix, and sorting all the news texts in the news candidate set in descending order of click probability to generate a news recommendation set.
The step of calculating the click probability of each news text in the news candidate set by using a ranking model based on the semantic feature preference matrix and the entity feature preference matrix comprises the following steps:
inputting the semantic feature preference matrix, the entity feature preference matrix and the news candidate set into the ranking model;
splicing and fusing the semantic feature preference matrix and the entity feature preference matrix by using the ranking model to generate a user preference matrix;
and calculating the click probability of each news text in the news candidate set according to the user preference matrix.
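One simple reading of "splicing and fusing" is a row-wise concatenation of the two matrices, sketched below. This is an assumption: the source does not fix the fusion operator, and a multi-channel model may instead stack the two matrices as separate channels. The name `fuse_preferences` is hypothetical.

```python
def fuse_preferences(semantic_matrix: list[list[float]],
                     entity_matrix: list[list[float]]) -> list[list[float]]:
    """Concatenate, title by title, the semantic and entity feature vectors."""
    if len(semantic_matrix) != len(entity_matrix):
        raise ValueError("both matrices must have one row per clicked title")
    return [s_row + e_row for s_row, e_row in zip(semantic_matrix, entity_matrix)]


semantic_matrix = [[0.1, 0.2], [0.3, 0.4]]  # one row per clicked title
entity_matrix = [[1.0], [0.0]]
user_preference_matrix = fuse_preferences(semantic_matrix, entity_matrix)
print(user_preference_matrix)
```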
Specifically, the ranking model may be a DKN model, a knowledge-graph-based recommendation model that comprises two parts: a KCNN network and an Attention network. KCNN is a multi-channel, word-entity-aligned, knowledge-aware convolutional neural network that fuses the semantic-level and knowledge-level representations of news, namely the semantic feature preference matrix and the entity feature preference matrix; it treats words and entities as multiple channels and explicitly preserves the alignment between words and entities during convolution. The Attention network assigns a different weight to each word of the titles of the news texts historically clicked by the user, so as to reduce the influence of unrelated topics in the user's browsing history, and the user's click probability for each news text in the news candidate set is calculated from these weights, which lets the model better associate news of the same category. The news candidate set is ranked with the trained DKN model, and the N highest-scoring news texts are selected for recommendation.
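The attention-and-rank idea can be illustrated with a heavily simplified stand-in. This is not the DKN model: it replaces the KCNN encoder with ready-made title vectors, applies attention per clicked title rather than per word, and replaces the learned attention and prediction networks with a dot product, a softmax and a sigmoid. All names and toy vectors are hypothetical.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u: list[float], v: list[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def click_probability(history: list[list[float]], candidate: list[float]) -> float:
    """Attention-weighted user vector scored against the candidate.

    Dot-product attention and a sigmoid stand in for DKN's learned networks.
    """
    weights = softmax([dot(h, candidate) for h in history])
    user = [sum(w * h[j] for w, h in zip(weights, history))
            for j in range(len(candidate))]
    return 1.0 / (1.0 + math.exp(-dot(user, candidate)))


def rank(history: list[list[float]], candidates: dict[str, list[float]],
         top_n: int) -> list[str]:
    """Sort candidates by click probability, descending, and keep the top N."""
    scored = sorted(candidates.items(),
                    key=lambda kv: click_probability(history, kv[1]),
                    reverse=True)
    return [news_id for news_id, _ in scored[:top_n]]


history = [[1.0, 0.0], [0.8, 0.2]]  # vectors of two clicked titles
candidates = {"finance": [1.0, 0.0], "sports": [0.0, 1.0], "mixed": [0.5, 0.5]}
news_recommendation_set = rank(history, candidates, top_n=2)
print(news_recommendation_set)
```

The attention weights depend on the candidate, so clicked titles that resemble the candidate contribute more to the user vector; this is the mechanism that damps the influence of unrelated topics in the browsing history.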
S107, sequentially recommending the news texts in the news recommendation set to the target user.
In summary, the news recommendation method provided by the invention has the following beneficial effects. A hot recall model screens out a preset number of hot news texts to generate a first candidate list; a content recall model screens out a preset number of news texts highly similar to the news texts historically clicked by the target user to generate a second candidate list; and a collaborative filtering recall model screens out the news texts historically clicked by a preset number of users most similar to the target user to generate a third candidate list. The three candidate lists are combined into a news candidate set. Because the candidate set is drawn from the massive news library by three recall models, covering hot news, similar news, and the news of similar users, the recalled news is richer and better meets the recommendation needs of the target user.
Based on a knowledge graph, semantic vectorization and entity vectorization are performed on the titles of the news texts historically clicked by the target user to generate a semantic feature preference matrix and an entity feature preference matrix. This knowledge-level analysis of the clicked titles reflects the click preferences of the target user well. A ranking model then calculates, from the two preference matrices, the click probability of each news text in the news candidate set; all news texts in the candidate set are sorted in descending order of click probability to generate a news recommendation set, and the news texts in that set are recommended to the target user in sequence. Re-ranking the candidate set according to the click preferences of the target user provides a personalized news recommendation service, meets the clicking and browsing needs of the target user to the greatest extent, and improves recommendation accuracy.
Example 2
Referring to fig. 2, the present embodiment provides a news recommendation system, including:
a first screening module: configured to screen out a preset number of hot news texts by using a hot recall model and to generate a first candidate list.
Wherein the first screening module is further configured to:
acquiring index data of all news texts in a news library, wherein the index data comprises browsing amount and release time;
calculating the popularity value of each news text from its index data by using a news ranking algorithm, wherein the calculation formula is as follows:
[Formula image: popularity value r as a function of p, t and G]
wherein r is the popularity value of the news text, p is the total browsing amount of the news text, t is the time elapsed since the news text was released, and G is a set value, which may be 1.8 or 1.5;
and according to the popularity value of each news text, performing descending order arrangement on all news texts in the news library, and screening out a preset number of news texts with popularity values arranged in front to generate the first candidate list.
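By way of illustration only, the hot recall performed by the first screening module may be sketched as follows. Since the formula itself is not reproduced in this text, the sketch assumes the widely used gravity-style news ranking formula r = (p - 1) / (t + 2)^G with t measured in hours; this form, the field names `views` and `hours`, and the function names are assumptions rather than the formula of the invention.

```python
def popularity(views, hours_since_release, gravity=1.8):
    # Assumed gravity-style score: views raise the score,
    # elapsed time lowers it at a rate set by the gravity G.
    return (views - 1) / (hours_since_release + 2) ** gravity

def hot_recall(news_library, preset_number, gravity=1.8):
    # Sort the whole library by popularity, highest first,
    # and keep the ids of the top preset_number items.
    ranked = sorted(
        news_library,
        key=lambda n: popularity(n["views"], n["hours"], gravity),
        reverse=True,
    )
    return [n["id"] for n in ranked[:preset_number]]
```

Under this assumed formula a larger G makes older news decay faster, which is one way to read the choice between 1.8 and 1.5.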
A second screening module: configured to screen out, by using the content recall model, a preset number of news texts highly similar to the news texts historically clicked by the target user, and to generate a second candidate list.
Wherein the second screening module is further configured to:
acquiring a news text historically clicked by the target user and all news texts in a news library;
segmenting the news text historically clicked by the target user according to the part of speech to obtain all words of the news text historically clicked by the target user and generate a text word sequence;
calculating TFIDF values of all words of the news text historically clicked by the target user by using a TFIDF algorithm, substituting the TFIDF values of all the words into the text word sequence to obtain a TFIDF value vector of the news text historically clicked by the target user, wherein the calculation formula is as follows:
TFIDF = TF × IDF, where TF is the term frequency and IDF is the inverse document frequency;
sequentially calculating the similarity between the TFIDF value vector of the news text historically clicked by the target user and the TFIDF value vector of each news text in the news library by using a cosine similarity algorithm to obtain the similarity between the news text historically clicked by the target user and each news text in the news library;
sorting all news texts in the news library in a descending order according to the similarity, and screening out a plurality of news texts with high similarity with the news texts historically clicked by the target user;
and, for each news text historically clicked by the target user within a preset time, extracting in turn an equal number of the news texts highly similar to it, and combining the extracted news texts to generate the second candidate list.
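As a non-limiting illustration, the content recall for a single clicked news text may be sketched as follows. The sketch assumes the texts are already segmented into word lists, uses a smoothed IDF (log((1 + N) / (1 + df)) + 1) so that words present in every text keep a non-zero weight, and stores each TFIDF vector as a sparse dictionary. The smoothing choice and the function names are assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(tokenized_docs):
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in tokenized_docs:
        vector = {}
        for word, count in Counter(tokens).items():
            tf = count / len(tokens)
            # Assumed smoothed IDF variant.
            idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0
            vector[word] = tf * idf
        vectors.append(vector)
    return vectors

def cosine(u, v):
    shared = set(u) & set(v)
    num = sum(u[w] * v[w] for w in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return num / (norm_u * norm_v)

def content_recall(clicked_tokens, library, top_k):
    # library maps a news id to its word list.
    ids = list(library)
    vectors = tfidf_vectors([clicked_tokens] + [library[i] for i in ids])
    clicked_vec, lib_vecs = vectors[0], vectors[1:]
    ranked = sorted(
        zip(ids, lib_vecs),
        key=lambda pair: cosine(clicked_vec, pair[1]),
        reverse=True,
    )
    return [news_id for news_id, _ in ranked[:top_k]]
```

To build the second candidate list, `content_recall` would be called once per recently clicked news text with the same `top_k`, and the results combined.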
A third screening module: configured to screen out, by using the collaborative filtering recall model, the news texts historically clicked by a preset number of users with the maximum similarity to the target user, and to generate a third candidate list.
Wherein the third screening module is further configured to:
constructing an interest matrix between a target user and a news text and an interest matrix between other users and the news text, and respectively setting the interest matrices as a target user interest matrix and other user interest matrices;
calculating the cosine similarity between the target user interest matrix and each of the other user interest matrices by using a cosine similarity algorithm to obtain the similarity between the target user and the other users, and screening out the users with the maximum similarity;
and acquiring, according to timeliness, the news texts historically clicked by the preset number of users with the maximum similarity, and generating the third candidate list.
The third screening module is further configured to:
acquiring behavior data of i news texts historically clicked by the target user according to a time sequence, wherein the behavior data comprises click times, praise and collection;
generating a target user interest matrix according to the behavior data of the i news texts historically clicked by the target user:
[Formula image: target user interest matrix]
acquiring behavior data of i news texts historically clicked by any other user;
generating another user interest matrix according to the behavior data of the i news texts historically clicked by that other user:
[Formula image: other user interest matrix]
wherein the similarity between the target user and the other users is obtained by calculating the cosine similarity between the target user interest matrix and the other user interest matrices, the calculation formula being as follows:
[Formula image: cosine similarity formula]
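By way of illustration only, the user similarity used by the third screening module may be sketched as follows. The sketch assumes each interest matrix has i rows, one per news text, and three columns (click count, praise, collection), that the rows of the two matrices are aligned, and that the cosine similarity is taken over the flattened matrices. These layout choices and the function names are assumptions, since the matrices and the formula are not reproduced in this text.

```python
import math

def flatten(matrix):
    return [value for row in matrix for value in row]

def cosine_similarity(matrix_a, matrix_b):
    u, v = flatten(matrix_a), flatten(matrix_b)
    if len(u) != len(v):
        raise ValueError("interest matrices must have the same shape")
    num = sum(x * y for x, y in zip(u, v))
    den = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return num / den if den else 0.0

def most_similar_users(target_matrix, other_matrices, preset_number):
    # other_matrices maps a user id to that user's interest matrix.
    ranked = sorted(
        other_matrices,
        key=lambda uid: cosine_similarity(target_matrix, other_matrices[uid]),
        reverse=True,
    )
    return ranked[:preset_number]
```

The news texts historically clicked by the returned users, filtered by timeliness, would then form the third candidate list.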
a first generation module: for combining the first candidate list, the second candidate list, and the third candidate list and generating a news candidate set.
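As a minimal sketch of the combining step, the three candidate lists may be merged as follows. Removing duplicates while keeping first-seen order is an assumption; the text only states that the lists are combined. The function name `merge_candidates` is hypothetical.

```python
def merge_candidates(*candidate_lists):
    # Keep the first occurrence of each news id, in list order.
    seen = set()
    merged = []
    for candidates in candidate_lists:
        for news_id in candidates:
            if news_id not in seen:
                seen.add(news_id)
                merged.append(news_id)
    return merged
```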
A second generation module: configured to perform, based on the knowledge graph, semantic and entity vectorization on the titles of the news texts historically clicked by the target user, and to generate a semantic feature preference matrix and an entity feature preference matrix.
Wherein the second generating module is further configured to:
acquiring N pieces of news texts historically clicked by the target user according to a time sequence, and aligning the titles of each piece of news text historically clicked by the target user according to a preset length;
segmenting the aligned title of each news text historically clicked by the target user to obtain all the words of the title of each news text historically clicked by the target user, and forming a title word sequence from all the words of the title of each news text historically clicked by the target user;
performing feature vectorization on each word in the title word sequence according to the position of each word in the title word sequence by using a first word vector model, generating semantic feature vectors corresponding to the titles of all the news texts historically clicked by the target user, and combining the semantic feature vectors corresponding to the titles of all the news texts historically clicked by the target user to obtain a semantic feature preference matrix, wherein the first word vector model can be a word2vec model;
and performing feature vectorization on each entity in the title word sequence according to the position of each entity in the title word sequence by using a second word vector model, generating entity feature vectors corresponding to the titles of each news text historically clicked by the target user, and combining the entity feature vectors corresponding to the titles of each news text historically clicked by the target user to obtain an entity feature preference matrix, wherein the second word vector model can be a TransE model.
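As a non-limiting illustration, the construction of the two preference matrices may be sketched as follows. The sketch assumes titles are already segmented, applies the length alignment at the word level by padding or truncating, and uses two plain lookup tables as stand-ins for trained word2vec and TransE embeddings. Words without an embedding, padding positions, and non-entity positions receive a zero vector. All names are hypothetical.

```python
PAD = "<pad>"

def align(tokens, length):
    # Pad short titles and truncate long ones to the preset length.
    return (list(tokens) + [PAD] * length)[:length]

def embed(tokens, table, dim):
    # Position-by-position lookup; unknown tokens map to a zero vector.
    zero = [0.0] * dim
    return [list(table.get(token, zero)) for token in tokens]

def preference_matrices(titles, word_vecs, entity_vecs, length, dim):
    semantic, entity = [], []
    for tokens in titles:
        aligned = align(tokens, length)
        # Both lookups use the same aligned sequence, so word and
        # entity vectors stay aligned position by position.
        semantic.append(embed(aligned, word_vecs, dim))
        entity.append(embed(aligned, entity_vecs, dim))
    return semantic, entity
```

Each returned structure has one entry per clicked title, each entry holding `length` vectors of size `dim`, which keeps the word channel and the entity channel the same shape.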
A sorting module: configured to calculate the click probability of each news text in the news candidate set based on the semantic feature preference matrix and the entity feature preference matrix by using a ranking model, and to sort all news texts in the news candidate set in descending order of click probability to generate a news recommendation set.
Wherein the ranking module is further configured to:
inputting the semantic feature preference matrix, the entity feature preference matrix and the news candidate set into the ranking model;
concatenating and fusing the semantic feature preference matrix and the entity feature preference matrix by using the ranking model to generate a user preference matrix;
and calculating the click probability of each news text in the news candidate set according to the user preference matrix.
A recommendation module: configured to recommend the news texts in the news recommendation set to the target user in sequence.
Example 3
Referring to fig. 3, a third embodiment of the present invention provides a news recommendation device, which includes a memory 20, a processor 10, and a computer program 30 stored in the memory and executable on the processor, wherein the processor 10 implements the news recommendation method described above when executing the computer program 30.
The news recommendation device may specifically be a computer, a server, an upper computer, or the like. In some embodiments, the processor 10 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip, and is configured to run program code stored in the memory 20 or to process data, for example, to execute the news recommendation program.
The memory 20 includes at least one type of readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a magnetic memory, a magnetic disk, or an optical disk. In some embodiments, the memory 20 may be an internal storage unit of the news recommendation device, for example a hard disk of the news recommendation device. In other embodiments, the memory 20 may be an external storage device of the news recommendation device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the news recommendation device. Further, the memory 20 may include both an internal storage unit and an external storage device of the news recommendation device. The memory 20 may be used not only to store application software installed in the news recommendation device and various kinds of data, but also to temporarily store data that has been output or is to be output.
It should be noted that the configuration shown in fig. 3 does not limit the news recommendation device; in other embodiments the news recommendation device may include fewer or more components than those shown, may combine some components, or may use a different arrangement of components.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the news recommendation method as described above.
The above embodiments express only several implementations of the present invention. Although they are described specifically and in detail, they should not be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (9)

1. A news recommendation method, the method comprising:
screening out a preset number of hot news texts by using a hot recall model and generating a first candidate list;
screening out a preset number of news texts with high similarity to the news texts historically clicked by the target user by using a content recall model and generating a second candidate list;
screening out historical click news texts of a preset number of users with the maximum similarity to the target user by using a collaborative filtering recall model and generating a third candidate list;
combining the first candidate list, the second candidate list and the third candidate list and generating a news candidate set;
performing semantic and entity vectorization processing on the titles of the news texts historically clicked by the target users respectively based on a knowledge graph, and generating a semantic feature preference matrix and an entity feature preference matrix;
calculating the click probability of each news text in the news candidate set based on the semantic feature preference matrix and the entity feature preference matrix by using a ranking model, and sorting all news texts in the news candidate set in descending order of click probability to generate a news recommendation set;
sequentially recommending the news texts in the news recommendation set to the target user;
the steps of respectively performing semantic and entity vectorization processing on the titles of the news texts historically clicked by the target users based on the knowledge graph and generating a semantic feature preference matrix and an entity feature preference matrix comprise:
acquiring N pieces of news texts historically clicked by the target user according to a time sequence, and aligning the titles of each piece of news text historically clicked by the target user according to a preset length;
segmenting the aligned title of each news text historically clicked by the target user to obtain all the words of the title of each news text historically clicked by the target user, and forming a title word sequence from all the words of the title of each news text historically clicked by the target user;
performing feature vectorization on each word in the title word sequence according to the position of each word in the title word sequence by using a first word vector model, generating semantic feature vectors corresponding to the title of each news text historically clicked by the target user, and combining the semantic feature vectors corresponding to the title of each news text historically clicked by the target user to obtain a semantic feature preference matrix;
and performing feature vectorization on each entity in the title word sequence according to the position of each entity in the title word sequence by using a second word vector model, generating entity feature vectors corresponding to the titles of all the news texts historically clicked by the target user, and combining the entity feature vectors corresponding to the titles of all the news texts historically clicked by the target user to obtain an entity feature preference matrix.
2. The method of claim 1, wherein the step of calculating the click probability of each news text in the news candidate set based on the semantic feature preference matrix and the entity feature preference matrix and using a ranking model comprises:
inputting the semantic feature preference matrix, the entity feature preference matrix and the news candidate set into the ranking model;
concatenating and fusing the semantic feature preference matrix and the entity feature preference matrix by using the ranking model to generate a user preference matrix;
and calculating the click probability of each news text in the news candidate set according to the user preference matrix.
3. The news recommendation method according to claim 1, wherein the step of filtering out historical click news texts of a preset number of users with the highest similarity to the target user by using a collaborative filtering recall model and generating a third candidate list comprises:
constructing an interest matrix between a target user and a news text and an interest matrix between other users and the news text, and respectively setting the interest matrices as a target user interest matrix and other user interest matrices;
cosine computing the interest matrix of the target user and the interest matrixes of the other users by using a cosine similarity algorithm to obtain the similarity between the target user and the other users, and screening out the user with the maximum similarity;
and acquiring the historical clicked news texts with the preset number of users with the maximum similarity according to timeliness and generating a third candidate list.
4. The news recommendation method of claim 3, wherein the step of constructing interest matrices between the target user and the news text and between the other users and the news text comprises:
acquiring behavior data of i news texts historically clicked by the target user according to a time sequence, wherein the behavior data comprises click times, praise and collection;
generating a target user interest matrix according to the behavior data of the i news texts which are historically clicked by the target user
Figure 472053DEST_PATH_IMAGE001
Acquiring behavior data of i news texts which are historically clicked by any other user;
generating other user interest matrixes according to behavior data of i news texts clicked by other users in history
Figure 517370DEST_PATH_IMAGE002
The step of performing cosine computation on the target user interest matrix and the other user interest matrices by using a cosine similarity algorithm to obtain the similarity between the target user and the other users comprises the following steps:
cosine calculating the interest matrix of the target user and the interest matrixes of the other users by using a cosine similarity calculation method to obtain the similarity between the target user and the other users, wherein the calculation formula is as follows:
Figure 240475DEST_PATH_IMAGE003
5. The news recommendation method of claim 1, wherein the step of filtering out a preset number of news texts with high similarity to the news texts historically clicked by the target user by using the content recall model and generating the second candidate list comprises:
acquiring a news text historically clicked by the target user and all news texts in a news library;
segmenting the news text historically clicked by the target user according to the part of speech to obtain all words of the news text historically clicked by the target user and generate a text word sequence;
calculating TFIDF values of all words of the historical clicked news text of the target user by using a TFIDF algorithm, and substituting the TFIDF values of all the words into the text word sequence to obtain a TFIDF value vector of the historical clicked news text of the target user;
sequentially calculating the similarity between the TFIDF value vector of the news text historically clicked by the target user and the TFIDF value vector of each news text in the news library by using a cosine similarity algorithm to obtain the similarity between the news text historically clicked by the target user and each news text in the news library;
sorting all news texts in the news library in a descending order according to the similarity, and screening out a plurality of news texts with high similarity with the news texts historically clicked by the target user;
and respectively and sequentially extracting the news texts with high similarity with each news text historically clicked by the target user within preset time in equal quantity, and combining to generate the second candidate list.
6. The news recommendation method of claim 1, wherein the step of filtering out a preset number of hot news texts using a hot recall model and generating a first candidate list comprises:
acquiring index data of all news texts in a news library, wherein the index data comprises browsing amount and release time;
calculating the popularity of each news text by using a news ranking algorithm according to the index data of each news text to obtain the popularity of each news text;
and according to the popularity value of each news text, performing descending order arrangement on all news texts in the news library, and screening out a preset number of news texts with popularity values arranged in front to generate the first candidate list.
7. A news recommendation system, comprising:
a first screening module: configured to screen out a preset number of hot news texts by using a hot recall model and to generate a first candidate list;
a second screening module: configured to screen out, by using a content recall model, a preset number of news texts with high similarity to the news texts historically clicked by the target user, and to generate a second candidate list;
a third screening module: configured to screen out, by using a collaborative filtering recall model, historical click news texts of a preset number of users with the maximum similarity to the target user, and to generate a third candidate list;
a first generation module: configured to combine the first candidate list, the second candidate list and the third candidate list and to generate a news candidate set;
a second generation module: configured to perform, based on the knowledge graph, semantic and entity vectorization processing on the titles of the news texts historically clicked by the target user, and to generate a semantic feature preference matrix and an entity feature preference matrix;
a sorting module: configured to calculate the click probability of each news text in the news candidate set based on the semantic feature preference matrix and the entity feature preference matrix by using a ranking model, and to sort all the news texts in the news candidate set in descending order of click probability to generate a news recommendation set;
a recommendation module: configured to recommend the news texts in the news recommendation set to the target user in sequence;
wherein the second generating module is further configured to:
acquiring N pieces of news texts historically clicked by the target user according to a time sequence, and aligning the titles of each piece of news text historically clicked by the target user according to a preset length;
segmenting the aligned title of each news text historically clicked by the target user to obtain all the words of the title of each news text historically clicked by the target user, and forming a title word sequence from all the words of the title of each news text historically clicked by the target user;
performing feature vectorization on each word in the title word sequence according to the position of each word in the title word sequence by using a first word vector model, generating semantic feature vectors corresponding to the title of each news text historically clicked by the target user, and combining the semantic feature vectors corresponding to the title of each news text historically clicked by the target user to obtain a semantic feature preference matrix;
and performing feature vectorization on each entity in the title word sequence according to the position of each entity in the title word sequence by using a second word vector model, generating entity feature vectors corresponding to the titles of all the news texts historically clicked by the target user, and combining the entity feature vectors corresponding to the titles of all the news texts historically clicked by the target user to obtain an entity feature preference matrix.
8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the news recommendation method as claimed in any one of claims 1-6.
9. A news recommender comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the news recommendation method as claimed in any one of claims 1 to 6 when executing the program.
CN202111545769.8A 2021-12-17 2021-12-17 News recommendation method, system, storage medium and equipment Active CN113961823B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111545769.8A CN113961823B (en) 2021-12-17 2021-12-17 News recommendation method, system, storage medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111545769.8A CN113961823B (en) 2021-12-17 2021-12-17 News recommendation method, system, storage medium and equipment

Publications (2)

Publication Number Publication Date
CN113961823A CN113961823A (en) 2022-01-21
CN113961823B true CN113961823B (en) 2022-03-25

Family

ID=79473343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111545769.8A Active CN113961823B (en) 2021-12-17 2021-12-17 News recommendation method, system, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN113961823B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114969486B (en) * 2022-08-02 2022-11-04 平安科技(深圳)有限公司 Corpus recommendation method, apparatus, device and storage medium
CN116911304B (en) * 2023-09-12 2024-02-20 深圳须弥云图空间科技有限公司 Text recommendation method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165350A (en) * 2018-08-23 2019-01-08 成都品果科技有限公司 A kind of information recommendation method and system based on deep knowledge perception
CN111831924A (en) * 2020-07-16 2020-10-27 腾讯科技(北京)有限公司 Content recommendation method, device, equipment and readable storage medium
CN112328879A (en) * 2020-11-05 2021-02-05 中国平安人寿保险股份有限公司 News recommendation method and device, terminal equipment and storage medium
CN112417313A (en) * 2020-11-24 2021-02-26 云南大学 Model hybrid recommendation method based on knowledge graph convolutional network

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106951435B (en) * 2017-02-08 2020-05-22 广州神马移动信息科技有限公司 News recommendation method and equipment and programmable equipment
FR3087921A1 (en) * 2018-10-31 2020-05-01 Amadeus S.A.S. RECOMMENDED SYSTEMS AND METHODS USING AUTOMATIC CASCADE LEARNING MODELS
US20200175084A1 (en) * 2018-11-30 2020-06-04 Microsoft Technology Licensing, Llc Incorporating contextual information in large-scale personalized follow recommendations

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Fine-Grained Deep Knowledge-Aware Network for News Recommendation with Self-Attention";Jie Gao等;《2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI)》;20181206;第81-88页 *
"基于系统动力学的资讯个性化推荐研究";王子岩等;《河北科技大学学报》;20210303;第171-179页 *

Also Published As

Publication number Publication date
CN113961823A (en) 2022-01-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant