WO2020224128A1 - News recommendation method and apparatus based on short-term interest of user, and electronic device and medium - Google Patents

News recommendation method and apparatus based on short-term interest of user, and electronic device and medium

Info

Publication number
WO2020224128A1
Authority
WO
WIPO (PCT)
Prior art keywords
news
user
term
short
matrix
Prior art date
Application number
PCT/CN2019/103700
Other languages
French (fr)
Chinese (zh)
Inventor
王健宗
贾雪丽
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020224128A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Definitions

  • This application relates to the field of data analysis technology, and more specifically, to a news recommendation method and device, electronic equipment, and media based on users' short-term interests.
  • the outline of a user based on the content is called a user portrait.
  • the key issue of content-based news recommendation is how to construct user portraits based on the user's reading history.
  • most content-based recommendation systems consider the user's reading history as a whole.
  • The long-term interest of a user may be relatively stable, but in the short term the content the user pays attention to changes. For example, a sports enthusiast's focus may shift as different events are held. Therefore, determining the user's preference from the long-term reading history alone cannot accurately recommend news to the user, nor can it effectively stimulate the user's interest in reading.
  • the purpose of this application is to provide a news recommendation method and device, electronic equipment and medium based on the user's short-term interest that combine the long-term and short-term preferences of the user to recommend news to the user.
  • A news recommendation device based on a user's short-term interest, including: a collection module, which collects user behavior data on news, the behavior data including a news matrix; a word vector matrix module, which obtains the corresponding word vector matrix according to the news matrix; a clustering module, which clusters the word vector matrix to obtain the grouping result of each news item and groups each news item into the corresponding news group according to the grouping result; a user portrait obtaining module, which obtains a long-term portrait and a short-term portrait of each user from each user's long-term behavior data and short-term behavior data for each news item, where the long-term portrait and the short-term portrait represent the user's preference for the word vectors corresponding to the words contained in the news; a first similarity obtaining module, which analyzes the similarity between the long-term portrait of each user and the different news groups to obtain multiple first similarities; a preferred news group obtaining module, which sorts the multiple first similarities in descending order and obtains the first set number of news groups corresponding to each user based on the sorting result; a second similarity obtaining module, which analyzes the second similarity between the latest short-term portrait of each user and each news item in the first set number of news groups; a bipartite graph construction module, which constructs a user-news bipartite graph according to the second similarity; and a recommendation module, which selects recommended news on the bipartite graph using an absorbing random walk method, thereby obtaining the recommended news of each user.
  • A news recommendation method based on users' short-term interests, including: step S1, collecting user behavior data on news, the behavior data including a news matrix; step S2, obtaining the corresponding word vector matrix according to the news matrix; step S3, clustering the word vector matrix to obtain the grouping result of each news item, and grouping each news item into the corresponding news group according to the grouping result; step S4, obtaining a long-term portrait and a short-term portrait of each user from each user's long-term behavior data and short-term behavior data for each news item, where the long-term portrait and the short-term portrait represent the user's preference for the word vectors corresponding to the words contained in the news; step S5, analyzing the similarity between the long-term portrait of each user and the different news groups to obtain multiple first similarities; step S6, sorting the multiple first similarities in descending order, and obtaining the first set number of news groups corresponding to each user based on the sorting result; step S7, analyzing the second similarity between the latest short-term portrait of each user and each news item in the first set number of news groups; step S8, constructing a user-news bipartite graph according to the second similarity; step S9, selecting recommended news on the bipartite graph using an absorbing random walk method, thereby obtaining the recommended news of each user.
  • The present application also provides an electronic device including a memory and a processor. The memory includes a news recommendation program based on the user's short-term interest, and when the news recommendation program is executed by the processor, the above-mentioned news recommendation method based on the user's short-term interest is implemented.
  • The present application also provides a computer non-volatile readable storage medium. The computer non-volatile readable storage medium includes a news recommendation program based on the user's short-term interest, and when the news recommendation program is executed by a processor, the steps of the above-mentioned news recommendation method based on the user's short-term interest are implemented.
  • The news recommendation method and device, electronic equipment, and medium based on the user's short-term interest described in this application establish a user-item bipartite graph based on long-term and short-term user portraits, and seamlessly integrate the long-term and short-term portraits to represent the user's reading preferences.
  • Using an absorbing random walk algorithm to select news across different topics not only provides news articles relevant to the user's interests, but also expands the user's preferences by introducing articles on different topics.
  • FIG. 1 is a schematic diagram of an application environment of a preferred embodiment of a news recommendation method based on a user's short-term interest in this application;
  • Fig. 2 is a schematic diagram of a news recommendation device based on the short-term interests of users in this application;
  • Fig. 3 is a flowchart of a preferred embodiment of a news recommendation method based on a user's short-term interest in this application.
  • This application provides a news recommendation method based on a user's short-term interest, which is applied to an electronic device 1.
  • Referring to FIG. 1, a schematic diagram of an application environment of a preferred embodiment of the news recommendation method based on a user's short-term interest in this application is shown.
  • the electronic device 1 may be a terminal client with computing functions such as a server, a mobile phone, a tablet computer, a portable computer, a desktop computer, and the like.
  • the memory 11 includes at least one type of readable storage medium.
  • the at least one type of readable storage medium may be a non-volatile storage medium such as flash memory, hard disk, multimedia card, card-type memory, and the like.
  • the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1.
  • the readable storage medium may also be an external memory of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 1.
  • the readable storage medium of the memory 11 is generally used to store a news recommendation program 10 based on the user's short-term interests installed in the electronic device 1 and the like.
  • the memory 11 can also be used to temporarily store data that has been output or will be output.
  • the processor 12 may be a central processing unit (CPU), a microprocessor, or another data processing chip, which is used to run the program code or process the data stored in the memory 11, for example, to execute the news recommendation program 10 based on the user's short-term interest.
  • the network interface 13 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is usually used to establish a communication connection between the electronic device 1 and other electronic clients.
  • the communication bus 14 is used to realize the connection and communication between these components.
  • FIG. 1 only shows the electronic device 1 with the components 11-14, but it should be understood that it is not required to implement all the illustrated components, and more or fewer components may be implemented instead.
  • the electronic device 1 may also include a user interface, which may include an input unit such as a keyboard, a voice input device such as a microphone or another client with a voice recognition function, and a voice output device such as a speaker or earphones.
  • the user interface may also include a standard wired interface and a wireless interface.
  • the electronic device 1 may also include a display, and the display may also be called a display screen or a display unit.
  • it may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, or an organic light-emitting diode (OLED) touch device.
  • the display is used for displaying information processed in the electronic device 1 and for displaying a visualized user interface.
  • the electronic device 1 further includes a touch sensor.
  • the area provided by the touch sensor for the user to perform a touch operation is called a touch area.
  • the touch sensor described here may be a resistive touch sensor, a capacitive touch sensor, or the like.
  • the touch sensor includes not only a contact type touch sensor, but also a proximity type touch sensor and the like.
  • the touch sensor may be a single sensor, or may be, for example, a plurality of sensors arranged in an array.
  • the electronic device 1 may also include logic gate circuits, sensors, audio circuits, etc., which will not be repeated here.
  • the memory 11, as a computer storage medium, may include an operating system and the news recommendation program 10 based on the user's short-term interest; when the processor 12 executes the news recommendation program 10 stored in the memory 11, the following steps are implemented:
  • Step S1 collecting user behavior data on news, the behavior data including a news matrix
  • Step S2 Obtain a corresponding word vector matrix according to the news matrix
  • Step S3 clustering the word vector matrix to obtain a grouping result of each news, and grouping each news into a corresponding news group according to the grouping result;
  • Step S4 Obtain a long-term portrait and a short-term portrait of each user through the long-term behavior data and short-term behavior data of each user for each news. The long-term portrait and the short-term portrait are used to represent the user's preference for the word vectors corresponding to the words contained in the news;
  • Step S5 Analyze the similarity between the long-term portrait of each user and different newsgroups to obtain multiple first similarities
  • Step S6 sort the plurality of first similarities in descending order, and obtain a first set number of newsgroups corresponding to each user based on the sorting result;
  • Step S7 analyzing the second similarity between the latest short-term portrait of each user and each news in the first set number of newsgroups;
  • Step S8 construct a user news bipartite graph according to the second similarity
  • Step S9 Use the absorbing random walk method to select recommended news on the user-news bipartite graph, so as to obtain the recommended news of each user.
  • the news recommendation program 10 based on the user's short-term interests can also be divided into one or more modules, and the one or more modules are stored in the memory 11 and executed by the processor 12 to complete this application.
  • the module referred to in this application refers to a series of computer program instruction segments that can complete specific functions.
  • the above electronic device obtains the long-term portrait of the user while also modeling the short-term reading preference of the user, and according to the short-term reading preference, recommends articles that can arouse the user's reading interest to expand the user's reading volume.
  • FIG. 2 is a schematic diagram of a news recommendation device based on a user's short-term interest in this application. As shown in FIG. 2, the news recommendation device includes:
  • the collection module 110 collects user behavior data on news.
  • the behavior data includes a news matrix.
  • preferably, the behavior data further includes a user matrix and a behavior matrix.
  • the behavior matrix holds the behavior data of each user in the user matrix for each news item in the news matrix.
  • the word vector matrix module 120 obtains a corresponding word vector matrix according to the news matrix
  • the clustering module 130 clusters the word vector matrix to obtain a grouping result of each news, and groups each news into a corresponding news group according to the grouping result;
  • the user portrait obtaining module 140 obtains a long-term portrait and a short-term portrait of each user through the long-term behavior data and short-term behavior data of each user for each news. The long-term portrait and the short-term portrait are used to represent the user's preference for the word vectors corresponding to the words contained in the news;
  • the first similarity obtaining module 150 analyzes the similarity between the long-term portrait of each user and different news groups to obtain multiple first similarities
  • the preferred newsgroup obtaining module 160 sorts the plurality of first similarities in descending order, and obtains a first set number of newsgroups corresponding to each user based on the sorting result;
  • the second similarity obtaining module 170 analyzes the second similarity between the latest short-term portrait of each user and each news in the first set number of news groups;
  • the bipartite graph construction module 180 constructs a user-news bipartite graph according to the second similarity
  • the recommendation module 190 selects the recommended news by using an absorbing random walk method on the bipartite graph, so as to obtain the recommended news of each user.
  • the aforementioned clustering module 130 includes:
  • a hierarchical clustering unit, which performs hierarchical clustering on the word vector matrix from the word vector matrix module to obtain a hierarchical clustering dendrogram, where each leaf node of the dendrogram corresponds to one news item;
  • a Dunn index obtaining unit, which obtains the Dunn index corresponding to each clustering result of the hierarchical clustering unit;
  • a cutting unit, which cuts the hierarchical clustering dendrogram at the layer corresponding to the maximum Dunn index obtained by the Dunn index obtaining unit to obtain the best hierarchical clustering dendrogram;
  • a news grouping unit, which assigns the news items whose leaf nodes share the same parent node in the best hierarchical clustering dendrogram produced by the cutting unit to the same news group, thereby obtaining the news group of each news item.
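The four units above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the application's implementation: each news item is assumed to be summarised by a single numeric vector, distances are Euclidean, merging uses single linkage, and the names `dist`, `dunn_index`, and `group_news` are hypothetical. The sketch assumes at least three news items.

```python
import math
from itertools import combinations


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dunn_index(clusters, vectors):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    min_between = min(
        dist(vectors[i], vectors[j])
        for c1, c2 in combinations(clusters, 2)
        for i in c1
        for j in c2
    )
    max_diameter = max(
        (dist(vectors[i], vectors[j]) for c in clusters for i, j in combinations(c, 2)),
        default=0.0,
    )
    return min_between / max_diameter if max_diameter > 0 else float("inf")


def group_news(vectors):
    """Agglomerative clustering cut at the level with the largest Dunn index.

    Returns a list of news groups, each a sorted list of news indices.
    """
    clusters = [[i] for i in range(len(vectors))]  # one leaf per news item
    best_score, best_clusters = -1.0, [list(c) for c in clusters]
    while len(clusters) > 2:
        # Merge the two closest clusters (single linkage, an assumption).
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: min(
                dist(vectors[i], vectors[j])
                for i in clusters[p[0]]
                for j in clusters[p[1]]
            ),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        # Score this level of the dendrogram and remember the best cut.
        score = dunn_index(clusters, vectors)
        if score > best_score:
            best_score, best_clusters = score, [sorted(c) for c in clusters]
    return best_clusters
```

For example, `group_news([(0, 0), (0, 1), (10, 10), (10, 11), (50, 50)])` keeps the two tight pairs together and leaves the distant point in its own group, because that level of the dendrogram has the largest Dunn index.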
  • the above-mentioned news recommendation device further includes a topic matrix construction module, which analyzes the word vector matrix using latent Dirichlet allocation (LDA) to obtain, for each news item, the topic probability matrix over multiple topics and the word probability matrix of the different word vectors corresponding to each topic. The topic value of each news item is obtained by combining its topic probability matrix, word probability matrix, and word vector matrix, and the topic values of all news items form the topic matrix.
  • the clustering module 130 obtains the topic vector of each news group through the topic matrix constructed by the topic matrix building module; the first similarity obtaining module 150 uses the vector similarity measurement method to determine the long-term portrait of the user and the topic of each news group The first similarity of the vector; the second similarity obtaining module 170 uses a vector similarity measurement method to determine the second similarity between the short-term portrait of the user and the first set number of each news group.
  • this application also provides a news recommendation method based on users' short-term interests.
  • Referring to FIG. 3, a flowchart of a preferred embodiment of the news recommendation method based on a user's short-term interest in this application is shown.
  • the method can be executed by a device, and the device can be implemented by software and/or hardware.
  • the news recommendation method based on the user's short-term interest includes:
  • Step S1 Collect user behavior data about news.
  • the behavior data includes a user matrix and, preferably, a news matrix and a behavior matrix.
  • the behavior matrix is a matrix of behavior indicators, that is, the behavior data of each user in the user matrix for each news item in the news matrix.
  • $N = [n_1, n_2, \ldots, n_b]$
  • where $U$ is the user matrix, $a$ is the total number of users, $N$ is the news matrix, $b$ is the total number of news items, $UN$ is the behavior matrix formed by each user's behavior indicators for each news item, $UN_a$ is the behavior vector of the $a$-th user, and $un_{ab}$ is the behavior indicator of the $a$-th user on the $b$-th news item.
  • the behavior indicators include one or more of the number of clicks, the number of reads, the number of likes, the number of evaluations, the reading duration, the click frequency (the number of clicks per unit time), the reading frequency, the like frequency, and the evaluation frequency. For example, the users' browsing history on news websites is collected through web crawler technology, the user identifiers are arranged into a user matrix, the news identifiers of the news websites are arranged into a news matrix, and the number of times any user clicks on any news item is used as that user's behavior indicator for that news item. When a user has not browsed a news item, the user's number of clicks on it is 0. Together these values constitute the behavior matrix;
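As a minimal sketch of the click-count example above, the following assumes the browsing history has already been collected as a list of `(user_id, news_id)` pairs, one per click. The function name `build_behavior_matrix` and the input format are assumptions for illustration, not part of the application.

```python
def build_behavior_matrix(click_log):
    """Build the user list, news list, and click-count behavior matrix.

    click_log is a list of (user_id, news_id) pairs, one entry per click.
    """
    users = sorted({u for u, _ in click_log})  # user matrix U
    news = sorted({n for _, n in click_log})   # news matrix N
    u_index = {u: i for i, u in enumerate(users)}
    n_index = {n: j for j, n in enumerate(news)}
    # Entries default to 0 for news the user never clicked.
    matrix = [[0] * len(news) for _ in users]
    for u, n in click_log:
        matrix[u_index[u]][n_index[n]] += 1
    return users, news, matrix
```

Row `a` of the returned matrix plays the role of the behavior vector of the a-th user, and entry `[a][b]` plays the role of that user's behavior indicator for the b-th news item.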
  • Step S2 Obtain the corresponding word vector matrix according to the news matrix; that is, convert the words of each news item in the news matrix into word vectors to form the corresponding word vector matrix,
  • where $W$ is the word vector matrix of all news, $c$ is the largest number of word vectors in any single news item, $w_{bc}$ is the word vector of the $c$-th word in the $b$-th news item (when a news item has fewer than $c$ word vectors, the remainder is filled with zeros), and $W_b$ is the word vector matrix of the $b$-th news item;
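The zero-filling described above can be illustrated as follows. The sketch assumes the words have already been converted to fixed-length word vectors by some embedding model (the source does not name one), and `build_word_vector_matrix` is a hypothetical helper name.

```python
def build_word_vector_matrix(news_word_vectors):
    """Pad every news item to c word vectors, where c is the longest item.

    news_word_vectors holds one list of word vectors per news item.
    """
    c = max(len(vectors) for vectors in news_word_vectors)
    # Dimension of a single word vector, taken from the first one available.
    dim = len(next(v for vectors in news_word_vectors for v in vectors))
    padded = []
    for vectors in news_word_vectors:
        missing = c - len(vectors)
        padded.append(vectors + [[0.0] * dim for _ in range(missing)])
    return padded
```

After padding, every news item contributes a matrix of the same shape, so the per-news matrices can be stacked into one word vector matrix.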
  • Step S3 clustering the word vector matrix to obtain a grouping result of each news, and grouping each news into a corresponding news group according to the grouping result, and the news group represents the grouping of news clusters;
  • Step S4 Obtain a long-term portrait and a short-term portrait of each user through the long-term behavior data and short-term behavior data of each user for each news. The long term and the short term are defined in terms of time (for example, the long term can be one month and the short term one week), the long term includes a plurality of short terms, and the long-term portrait and the short-term portrait represent the user's preference for the word vectors corresponding to the words contained in the news;
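To make the long-term and short-term split concrete, the sketch below counts clicks inside two time windows ending at the analysis time. The one-month and one-week windows follow the example in the text; the event format, the function name `window_counts`, and the sample dates are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def window_counts(events, now, window):
    """Count clicks per (user, news) with a timestamp in (now - window, now]."""
    counts = {}
    for user, news, ts in events:
        if now - window < ts <= now:
            counts[(user, news)] = counts.get((user, news), 0) + 1
    return counts


# Hypothetical click events: (user, news, timestamp).
now = datetime(2019, 8, 30)
events = [
    ("u1", "n1", datetime(2019, 8, 5)),
    ("u1", "n2", datetime(2019, 8, 28)),
    ("u1", "n2", datetime(2019, 8, 29)),
]
long_term = window_counts(events, now, timedelta(days=30))  # about one month
short_term = window_counts(events, now, timedelta(days=7))  # one week
```

Here the long-term counts cover both news items, while the short-term counts contain only the recently clicked one, which is the kind of difference the two portraits are meant to capture.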
  • Step S5 separately analyze the first similarity of the word vector between the long-term portrait of each user and each news group;
  • Step S6 sort the plurality of first similarities in descending order, and obtain a first set number of newsgroups corresponding to each user based on the sorting result;
  • Step S7 respectively analyze the second similarity of the word vector between the short-term portrait of each user closest to the analysis time and each news in the first set number of news groups;
  • Step S8 construct a user-news bipartite graph according to the second similarity
  • Step S9 Use the absorbing random walk method to select the recommended news on the bipartite graph, so as to obtain the recommended news of each user.
  • the above-mentioned news recommendation method based on users' short-term interests emphasizes the influence of the evolution of the user's interests when establishing user portraits, seamlessly integrates the long-term and short-term portraits as the user's reading preferences, establishes a relationship graph between specific news and users, and then applies the absorbing random walk method on the graph to select news articles on different topics.
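The source does not spell out which absorbing random walk variant is used, so the following is only one plausible sketch. It assumes the target user is the single absorbing state and ranks news by expected absorbing time (the expected number of steps before a walk started at a news node reaches the target user), with smaller times ranked first. The edge format and the names `absorbing_times` and `recommend` are hypothetical, and the target user is assumed to have at least one edge in a connected graph.

```python
def absorbing_times(edges, target_user, iterations=500):
    """Expected steps for a walk from each news node to reach target_user.

    edges maps (user, news) -> positive weight on the user-news bipartite graph.
    """
    # Undirected weighted adjacency; nodes are tagged so ids cannot clash.
    adj = {}
    for (user, news), w in edges.items():
        adj.setdefault(("user", user), {})[("news", news)] = w
        adj.setdefault(("news", news), {})[("user", user)] = w
    target = ("user", target_user)
    times = {node: 0.0 for node in adj}
    for _ in range(iterations):  # fixed-point iteration of t = 1 + P t
        updated = {}
        for node, nbrs in adj.items():
            if node == target:
                updated[node] = 0.0  # absorbing state
            else:
                total = sum(nbrs.values())
                updated[node] = 1.0 + sum(
                    w / total * times[nb] for nb, w in nbrs.items()
                )
        times = updated
    return {name: t for (kind, name), t in times.items() if kind == "news"}


def recommend(edges, target_user, k):
    """Return the k news ids with the smallest absorbing time."""
    times = absorbing_times(edges, target_user)
    return sorted(times, key=times.get)[:k]
```

With edges `{("u1", "n1"): 0.9, ("u1", "n2"): 0.1, ("u2", "n1"): 0.1, ("u2", "n2"): 0.9, ("u2", "n3"): 0.5}`, calling `recommend(edges, "u1", 2)` ranks `n1` first and still surfaces `n2`, which is mainly connected through another user. Other variants, such as turning already selected news into absorbing states to encourage topic diversity, would fit the description equally well.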
  • the foregoing news recommendation method based on the user's short-term interest includes:
  • In step S4, the word vectors of each news item are used as labels, and the long-term portrait and the short-term portrait are the user's preference weights for each label,
  • where $P$ is the short-term portrait of a user, $P'$ is the long-term portrait of a user, $P_b$ is the user's short-term weight vector for the $b$-th news item, and $p_{bc}$ is the user's weight for the $c$-th word vector in the $b$-th news item.
  • In step S5, a matrix similarity measurement method is used to determine the first similarity between the user's long-term portrait and each news group, for example the matrix correlation coefficient or the cosine of the angle between space vectors, applied to the news group matrix formed by the word vectors of the news in the group and the corresponding long-term portrait sub-matrix (which holds the user's preferences for the word vectors of the news in that group).
  • As another example, the news group matrix and the long-term portrait sub-matrix can be flattened into vectors and compared with a vector similarity method such as the cosine function; alternatively, the element-wise differences between the news group matrix and the long-term portrait sub-matrix can be squared and summed to obtain the first similarity;
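The flatten-and-compare option can be sketched as below, assuming the news group matrix and the long-term portrait sub-matrix have the same shape. The helper names `flatten` and `squared_difference` are hypothetical.

```python
def flatten(matrix):
    """Concatenate the rows of a matrix into a single vector."""
    return [x for row in matrix for x in row]


def squared_difference(group_matrix, portrait_submatrix):
    """Sum of squared element-wise differences between two same-shape matrices."""
    a, b = flatten(group_matrix), flatten(portrait_submatrix)
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

Note that a sum of squared differences behaves like a distance: a smaller value means the news group is closer to the portrait, so it would need to be negated or inverted before being sorted in descending order alongside true similarity scores.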
  • step S7 a matrix similarity measurement method is used to determine the second similarity between the short-term portrait of the user and the first set number of each news group;
  • In step S8, the news groups of each user are sorted in descending order of second similarity, and the top second set number of news groups (fewer than the first set number) are taken, giving all the news in those news groups for each user.
  • A user-news bipartite graph is then constructed from each user and the news in that user's second set number of news groups, where the weight of each edge of the bipartite graph is set according to the user's rating of the news: the higher the rating, the greater the weight.
  • the above-mentioned news recommendation method based on the user's short-term interest screens newsgroups through the user's long-term portraits and short-term portraits, so that the selected newsgroups not only conform to the users' long-term preferences but also conform to the users' short-term interests, and improve the accuracy of news recommendation
  • Vector similarity measurement methods such as Euclidean distance, Manhattan distance, Chebyshev distance, Minkowski distance, standardized Euclidean distance, Mahalanobis distance, angle cosine, Hamming distance, Jaccard distance and Jaccard similarity coefficient, or correlation coefficient and correlation distance are used to obtain the second similarity between the user's short-term portrait and each news item in the first set number of news groups, for example, after filtering by the user's long-term portrait,
  • where $d(P_i, W_i)$ is the second similarity between the user and news $n_1$;
  • Each user's news items are sorted in descending order of second similarity, and the top third set number of news items are taken for each user.
  • A user-news bipartite graph is constructed from each user and that user's third set number of news items, where the weight of each edge of the bipartite graph is set according to the user's rating of the news.
  • Alternatively, the second similarity can be used as the weight of each edge of the bipartite graph, or the user-news bipartite graph can be constructed directly without ranking by second similarity.
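A minimal sketch of this graph construction follows, using the variant in which the second similarity itself is the edge weight. The nested-dictionary input, the parameter `top_k` (standing in for the third set number), and the name `build_bipartite_graph` are assumptions for illustration.

```python
def build_bipartite_graph(second_similarity, top_k):
    """Keep each user's top_k news by second similarity and build weighted edges.

    second_similarity maps user -> {news: similarity}.
    Returns a dict mapping (user, news) -> edge weight.
    """
    edges = {}
    for user, sims in second_similarity.items():
        ranked = sorted(sims, key=sims.get, reverse=True)  # descending order
        for news in ranked[:top_k]:
            edges[(user, news)] = sims[news]
    return edges
```

If user ratings are available, the same structure could carry the rating as the weight instead of the similarity, as the text also allows.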
  • The above-mentioned news recommendation method based on users' short-term interests selects news in two stages. First, long-term portraits are used to determine whether news groups match the user's preferences; then short-term portraits are used to filter specific news articles for the user, so that the user's long-term and short-term preferences are seamlessly connected, which improves the accuracy of recommendations.
  • the news recommendation method based on the user's short-term interest includes:
  • In step S2, LDA (Latent Dirichlet Allocation) is used to analyze the word vector matrix to obtain the topic value of each news item and thereby the topic matrix. Specifically, LDA yields, for each news item in the news matrix, the topic probability matrix over multiple topics and the word probability matrix of the different word vectors corresponding to each topic,
  • where $\theta_b$ is the topic probability matrix of the $b$-th news item, whose entries give the probability that the $b$-th news item corresponds to the $d$-th topic, and the word probability matrix of the $b$-th news item gives the probability that the $d$-th topic generates the $c$-th word vector in the $b$-th news item;
  • $T_b$ is the topic value of the $b$-th news item, and "." denotes matrix multiplication.
  • In step S3, the word vector matrix is clustered to obtain the news group to which each news item belongs, thereby obtaining the topic vector of each news group. For example, if a news group is $[n_i, n_j]$, the corresponding topic vector is $[z_i, z_j]$.
  • In step S4, LDA is used as a language model for detecting latent topics, and a long-term portrait and a short-term portrait of each user are obtained. Specifically, the long-term portrait and the short-term portrait are obtained from the topic probability matrix and word probability matrix of each news item and the behavior matrix, where the user's behavior indicator for a news item is taken as the user's behavior indicator for each word vector in that news item,
  • $z_a = [z_{a1}, z_{a2}, \ldots, z_{ab}]$
  • where $un_{ab}(c)$ is the behavior vector of the $a$-th user for the $c$ word vectors in the $b$-th news item, that is, $un_{ab}(c)$ consists of $c$ copies of $un_{ab}$; $z_{ab}$ is the $a$-th user's topic value for the $b$-th news item; and $z_a$ is the long-term portrait or short-term portrait of the $a$-th user.
  • In step S5, a similarity measurement method is used to determine the first similarity between the long-term portrait of the user and each news group. The cosine similarity method is used to obtain the first similarity:
  • $s_{m,n} = \dfrac{\sum_{i=1}^{b} x_i y_i}{\sqrt{\sum_{i=1}^{b} x_i^2}\,\sqrt{\sum_{i=1}^{b} y_i^2}}$
  • where $s_{m,n}$ is the similarity between the $m$-th long-term portrait and the $n$-th news group, $(x_1, x_2, \ldots, x_b)$ is the topic vector of the $m$-th long-term portrait, and $(y_1, y_2, \ldots, y_b)$ is the topic vector of the $n$-th news group.
  • For example, if a news group X includes the first news item and the third news item, the topic vector of the news group is $(z_1, z_3)$ and the corresponding long-term portrait vector of the $a$-th user is $(z_{a1}, z_{a3})$.
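Steps S5 and S6 in this embodiment can be sketched as a cosine similarity followed by a descending sort. The dictionaries keyed by news group, the parameter `first_set_number`, and the function names are assumptions for illustration; as in the example above, each group's portrait vector is assumed to contain only the entries for the news in that group.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def top_news_groups(portrait_by_group, topic_by_group, first_set_number):
    """Rank news groups by first similarity and keep the leading ones.

    portrait_by_group maps group -> the user's long-term portrait entries for that group.
    topic_by_group maps group -> the group's topic vector.
    """
    sims = {
        g: cosine_similarity(portrait_by_group[g], topic_by_group[g])
        for g in topic_by_group
    }
    return sorted(sims, key=sims.get, reverse=True)[:first_set_number]
```

The same `cosine_similarity` helper could be reused for the second similarity in step S7, with the short-term portrait in place of the long-term one.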
  • step S7 the similarity measurement method of step S5 is used to determine the second similarity between the short-term portrait of the user and the first set number of each news group.
  • In step S8, the news groups of each user are sorted in descending order of second similarity, and the top second set number of news groups (fewer than the first set number) are taken, giving all the news in those news groups for each user.
  • A user-news bipartite graph is then constructed from each user and the news in that user's second set number of news groups, where the weight of each edge of the bipartite graph is set according to the user's rating of the news.
  • the above-mentioned news recommendation method based on the user's short-term interest obtains the topic vector of each news item and the user's short-term and long-term portrait vectors through LDA analysis, and screens news groups by similarity, which reduces the amount of calculation while ensuring the accuracy of recommendation.
  • step S4 the long-term portrait is obtained by formula (3), and the short-term portrait is obtained by the following formula (5)
  • in step S7, a similarity measure is used to determine the second similarity between each user's short-term portrait and each news item in each of the first set number of news groups.
  • the cosine similarity method is used to obtain the second similarity.
  • $s'_{m,n}$ represents the similarity between the m-th short-term portrait and the n-th news item
  • $(x_1, x_2, \ldots, x_c)$ is the topic vector of the m-th short-term portrait
  • $(y_1, y_2, \ldots, y_c)$ is the word vector of the n-th news item; both are $1 \times c$ vectors.
  • for each user, the news items are sorted in descending order of second similarity, and the top third set number of news items are taken to obtain the third set number of news items for that user.
  • a user-news bipartite graph is constructed from each user and that user's third set number of news items, where the weight of each edge of the bipartite graph is set according to the user's rating of the news.
  • alternatively, the user-news bipartite graph is constructed with the second similarity as the edge weight, or the user-news bipartite graph can be constructed directly without ranking by the second similarity.
  • the above news recommendation method based on the user's short-term interest obtains the topic vector of each news item and the user's short-term and long-term portrait vectors through LDA analysis, and screens news groups and news items respectively, which reduces the amount of calculation, speeds up recommendation, and improves recommendation accuracy.
  • step S2 LDA is used to analyze the word vector matrix, and the topic vector of each news is obtained by the following formula (7)
  • step S7 the second similarity between each user's short-term portrait and each news is obtained by the similarity between each user's short-term portrait and the topic vector of each news.
  • step S4 the step of obtaining the long-term portrait and the short-term portrait of each user through the long-term behavior data and short-term behavior data of each user for each news respectively further includes:
  • the long-term portrait of the user is obtained by weighting the user's portraits in each time frame, where a short-term portrait closer to the analysis time has a higher weight.
  • a time equation is used to combine multiple short-term portraits of a user into a long-term portrait of the user by weighting
  • $P_u$ represents the long-term portrait
  • is the constant parameter of the time equation
  • the aforementioned news recommendation method based on the user's short-term interests first constructs a long-term portrait of a given user based on time-sensitive weighting, and then analyzes the user's latest reading history to analyze his short-term preferences.
  • step S3 the step of clustering the word vector matrix includes:
  • the above method of clustering the word vector matrix first uses a hierarchical agglomerative clustering algorithm to construct a news hierarchy based purely on the content of news articles, and then uses the Dunn validity index to determine the best cut of the hierarchical dendrogram, which avoids having to decide the number of clusters in advance.
  • the Dunn index is the shortest distance between any two elements of different clusters (between clusters) divided by the largest distance between any two elements of the same cluster (within cluster). The larger the index, the greater the distance between clusters and the smaller the distance within clusters. The Dunn index is used to decide at which level to cut the dendrogram. After the news groups are obtained, LDA can be used to analyze each group, and the topic of each group can be represented by a topic vector that is matched against the long-term user portrait for group filtering.
  • in step S9, news is selected across different topics by the absorbing random walk method.
  • the absorbing random walk method first chooses an initial point and then jumps to a random point on the graph with probability p. The remaining probability 1-p is distributed among the adjacent points according to edge weight. At every step the walk jumps to a random point or an adjacent point with these same probabilities, and the transition matrix is used to calculate the jump probabilities. After several iterations the jump probabilities stabilize, and the news item with the highest transition probability is recommended. The method then lowers the jump probability of articles of the same kind as the recommended article, so that more types of news are selected. In this way, the news recommendation method based on the user's short-term interest described in this application can not only provide news articles relevant to the user's interests, but also broaden the user's preferences by introducing articles on different topics.
  • step S9 includes:
  • each user acts as a node, and each news also acts as a node.
  • the random walk restart method is used to obtain the correlation value between the nodes;
  • the adjacent nodes of each user node form that user's adjacent set, and the correlation values between any two nodes in the adjacent set form the first sub-correlation matrix of each user;
  • the reciprocal of the mean value of the off-diagonal elements in the first sub-correlation matrix is used as the bridging value of each user, which is combined with the bridging values of the user nodes in the adjacent set to form the bridging matrix of each user. For example, for a user node $u_1$ whose adjacent set is $[n_2, n_4, u_3]$, the first sub-correlation matrix of user node $u_1$ is formed from the pairwise correlation values of these nodes, where $r_{23}$ is the correlation value between news node $n_2$ and user node $u_3$; the bridging value $q_1$ of user node $u_1$ is the mean value of the off-diagonal elements in the first sub-correlation matrix, and the bridging matrix of user node $u_1$ is $[q_1, q_3]$;
  • the correlation values between each user node, the user nodes in its adjacent set, and the news nodes in its adjacent set constitute the second sub-correlation matrix of each user, as with the second sub-correlation matrix of user node $u_1$ in the above example;
  • the bridging matrix of each user and the second sub-correlation matrix are multiplied to obtain the recommended value of each news node
  • the news nodes are sorted in descending order of recommended value, and the top set number of news items are recommended to the user.
  • the step of using a random walk restart method to obtain correlation values between nodes includes:
  • Iterative processing is performed on the adjacency matrix until the adjacency matrix converges, and the elements in the adjacency matrix after the convergence are the correlation values between the one node and the other node.
  • an embodiment of the present application also proposes a computer non-volatile readable storage medium that includes a news recommendation program based on the user's short-term interest; when the news recommendation program is executed by a processor, the following steps are implemented:
  • Step S1: collect user behavior data on news, the behavior data including a user matrix
  • Step S2: obtain a corresponding word vector matrix according to the news matrix
  • Step S3: cluster the word vector matrix to obtain a grouping result for each news item, and assign each news item to the corresponding news group according to the grouping result;
  • Step S4: obtain a long-term portrait and a short-term portrait of each user from the long-term behavior data and short-term behavior data of each user for each news item.
  • the long-term portrait and the short-term portrait are used to represent the user's preference for the word vectors corresponding to the words contained in the news.
  • Step S5: analyze the similarity between the long-term portrait of each user and different news groups to obtain multiple first similarities
  • Step S6: sort the multiple first similarities in descending order, and obtain a first set number of news groups for each user based on the sorting result;
  • Step S7: analyze the second similarity between the latest short-term portrait of each user and each news item in the first set number of news groups;
  • Step S8: construct a user-news bipartite graph according to the second similarity
  • Step S9: use the absorbing random walk method to select recommended news on the user-news bipartite graph, thereby obtaining the recommended news for each user.
  • the specific implementation of the computer non-volatile readable storage medium of the present application is substantially the same as that of the above news recommendation method and apparatus based on the user's short-term interest and of the electronic device, and is not repeated here.
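
By way of illustration only, the random walk with restart used to obtain correlation values between nodes can be sketched as follows. The restart probability, the convergence tolerance, the node names and the toy edge weights are assumptions made for this example and do not come from the application; the sketch restarts at the start node rather than at a uniformly random node.

```python
def random_walk_with_restart(adjacency, start, restart_prob=0.15,
                             tol=1e-10, max_iter=1000):
    """Return the converged visiting probability of every node.

    adjacency: dict mapping node -> dict of neighbour -> positive edge weight.
    start: node at which the walk restarts (hypothetical parameter name).
    """
    nodes = sorted(adjacency)
    # Normalise each node's outgoing edge weights into transition probabilities.
    transition = {}
    for u in nodes:
        total = sum(adjacency[u].values())
        transition[u] = {v: w / total for v, w in adjacency[u].items()}

    scores = {n: 1.0 if n == start else 0.0 for n in nodes}
    for _ in range(max_iter):
        nxt = {n: 0.0 for n in nodes}
        for u in nodes:
            for v, p in transition[u].items():
                # With probability 1 - restart_prob, follow a weighted edge.
                nxt[v] += (1.0 - restart_prob) * scores[u] * p
        # With probability restart_prob, jump back to the start node.
        nxt[start] += restart_prob
        delta = sum(abs(nxt[n] - scores[n]) for n in nodes)
        scores = nxt
        if delta < tol:  # the probabilities have stabilised
            break
    return scores


# Toy user-news bipartite graph (illustrative weights only).
graph = {
    "u1": {"n1": 0.9, "n2": 0.4},
    "u2": {"n2": 0.8, "n3": 0.7},
    "n1": {"u1": 0.9},
    "n2": {"u1": 0.4, "u2": 0.8},
    "n3": {"u2": 0.7},
}
scores = random_walk_with_restart(graph, "u1")
ranked_news = sorted((n for n in scores if n.startswith("n")),
                     key=scores.get, reverse=True)
```

Here `ranked_news` orders the news nodes by their converged correlation with user `u1`; a full implementation would additionally lower the probability of articles similar to those already recommended, as described above.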

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A news recommendation method and apparatus based on the short-term interest of a user, an electronic device, and a storage medium, relating to the field of data analysis and capable of combining the long-term and short-term preferences of the user. The method comprises: collecting behavior data of users on news (S1); obtaining a word vector matrix corresponding to a news matrix (S2); clustering the word vector matrix to assign each piece of news to a news group (S3); obtaining a long-term portrait and a short-term portrait of each user from the long-term behavior data and short-term behavior data of each user for each piece of news (S4); analyzing a first similarity between the long-term portrait of each user and each news group (S5); sorting the news groups of each user in descending order of first similarity, and taking a first set number of top-ranked news groups (S6); analyzing a second similarity between the most recent short-term portrait of each user and each piece of news in the first set number of news groups (S7); constructing a user-news bipartite graph according to the second similarity (S8); and selecting recommended news on the bipartite graph by using an absorbing random walk method (S9).

Description

News recommendation method and apparatus based on a user's short-term interest, electronic device, and medium
This application claims priority to the patent application with application number 201910379183.5, filed on May 8, 2019, and entitled "News recommendation method, apparatus and medium based on a user's short-term interest".
Technical field
This application relates to the technical field of data analysis and, more specifically, to a news recommendation method and apparatus based on a user's short-term interest, an electronic device, and a medium.
Background
Referring to a user's reading history is essential when recommending news. A profile of a user drawn from content is called a user portrait. The key problem in content-based news recommendation is how to construct the user portrait from the user's reading history. Most content-based recommendation systems treat the user's reading history as a whole. A user's long-term interests may be relatively stable, but in the short term the content the user pays attention to changes. A sports enthusiast's focus, for example, may shift as competitions in different events take place. Determining a user's preferences from long-term reading history alone therefore cannot recommend news accurately, nor can it effectively stimulate the user's interest in reading.
Summary of the invention
In view of the above problems, the purpose of this application is to provide a news recommendation method and apparatus based on a user's short-term interest, an electronic device, and a medium that recommend news to a user by combining the user's long-term and short-term preferences.
According to one aspect of this application, a news recommendation apparatus based on a user's short-term interest is provided, including: a collection module, which collects users' behavior data on news, the behavior data including a news matrix; a word vector matrix module, which obtains a corresponding word vector matrix according to the news matrix; a clustering module, which clusters the word vector matrix to obtain a grouping result for each news item and assigns each news item to the corresponding news group according to the grouping result; a user portrait obtaining module, which obtains a long-term portrait and a short-term portrait of each user from the long-term behavior data and short-term behavior data of each user for each news item, respectively, the long-term portrait and short-term portrait being used to represent the user's preference for the word vectors corresponding to the words contained in the news; a first similarity obtaining module, which analyzes the similarity between the long-term portrait of each user and different news groups to obtain multiple first similarities; a preferred news group obtaining module, which sorts the multiple first similarities in descending order and obtains a first set number of news groups for each user based on the sorting result; a second similarity obtaining module, which analyzes the second similarity between the latest short-term portrait of each user and each news item in the first set number of news groups; a bipartite graph construction module, which constructs a user-news bipartite graph according to the second similarity; and a recommendation module, which selects recommended news on the bipartite graph using an absorbing random walk method, thereby obtaining the recommended news for each user.
According to a second aspect of this application, a news recommendation method based on a user's short-term interest is provided, including: step S1, collecting users' behavior data on news, the behavior data including a news matrix; step S2, obtaining a corresponding word vector matrix according to the news matrix; step S3, clustering the word vector matrix to obtain a grouping result for each news item, and assigning each news item to the corresponding news group according to the grouping result; step S4, obtaining a long-term portrait and a short-term portrait of each user from the long-term behavior data and short-term behavior data of each user for each news item, respectively, the long-term portrait and short-term portrait being used to represent the user's preference for the word vectors corresponding to the words contained in the news; step S5, analyzing the similarity between the long-term portrait of each user and different news groups to obtain multiple first similarities; step S6, sorting the multiple first similarities in descending order, and obtaining a first set number of news groups for each user based on the sorting result; step S7, analyzing the second similarity between the latest short-term portrait of each user and each news item in the first set number of news groups; step S8, constructing a user-news bipartite graph according to the second similarity; and step S9, selecting recommended news on the user-news bipartite graph using an absorbing random walk method, thereby obtaining the recommended news for each user.
In addition, to achieve the above purpose, this application also provides an electronic device including a memory and a processor, the memory including a news recommendation program based on a user's short-term interest; when the news recommendation program is executed by the processor, the above news recommendation method based on a user's short-term interest is implemented.
In addition, to achieve the above purpose, this application also provides a computer non-volatile readable storage medium that includes a news recommendation program based on a user's short-term interest; when the news recommendation program is executed by a processor, the steps of the above news recommendation method based on a user's short-term interest are implemented.
The news recommendation method and apparatus based on a user's short-term interest, the electronic device, and the medium described in this application build a user-item bipartite graph based on long-term and short-term user portraits, seamlessly integrating the long-term and short-term portraits to represent the user's reading preferences. News is selected across different topics by an absorbing random walk algorithm, which not only provides news articles relevant to the user's interests but also broadens the user's preferences by introducing articles on different topics.
Description of the drawings
FIG. 1 is a schematic diagram of the application environment of a preferred embodiment of the news recommendation method based on a user's short-term interest of this application;
FIG. 2 is a schematic diagram of the news recommendation apparatus based on a user's short-term interest of this application;
FIG. 3 is a flowchart of a preferred embodiment of the news recommendation method based on a user's short-term interest of this application.
Detailed description
It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
Specific embodiments of this application are described in detail below with reference to the drawings.
This application provides a news recommendation method based on a user's short-term interest, applied to an electronic device 1. FIG. 1 is a schematic diagram of the application environment of a preferred embodiment of this method.
In this embodiment, the electronic device 1 may be a terminal client with computing capability, such as a server, a mobile phone, a tablet computer, a portable computer, or a desktop computer.
The memory 11 includes at least one type of readable storage medium. The at least one type of readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, or a card-type memory. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. In other embodiments, the readable storage medium may be an external memory of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the electronic device 1.
In this embodiment, the readable storage medium of the memory 11 is generally used to store the news recommendation program 10 based on a user's short-term interest installed on the electronic device 1, and the like. The memory 11 can also be used to temporarily store data that has been output or is to be output.
In some embodiments, the processor 12 may be a central processing unit (CPU), a microprocessor, or another data processing chip, and is used to run program code or process data stored in the memory 11, for example to execute the news recommendation program 10 based on a user's short-term interest.
The network interface 13 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface), and is generally used to establish a communication connection between the electronic device 1 and other electronic clients.
The communication bus 14 is used to implement connection and communication between these components.
FIG. 1 shows only the electronic device 1 with components 11-14, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
Optionally, the electronic device 1 may also include a user interface. The user interface may include an input unit such as a keyboard, a voice input device such as a microphone or another client with a voice recognition function, and a voice output device such as a speaker or earphones. Optionally, the user interface may also include a standard wired interface and a wireless interface.
Optionally, the electronic device 1 may also include a display, which may also be called a display screen or display unit.
In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an organic light-emitting diode (OLED) touch device, or the like. The display is used to show the information processed in the electronic device 1 and to present a visual user interface.
Optionally, the electronic device 1 further includes a touch sensor. The area that the touch sensor provides for the user's touch operations is called the touch area. The touch sensor described here may be a resistive touch sensor, a capacitive touch sensor, or the like. The touch sensor includes not only contact touch sensors but may also include proximity touch sensors. In addition, the touch sensor may be a single sensor or multiple sensors arranged, for example, in an array.
Optionally, the electronic device 1 may also include logic gate circuits, sensors, audio circuits, and so on, which are not described further here.
In the apparatus embodiment shown in FIG. 1, the memory 11, as a computer storage medium, may include an operating system and the news recommendation program 10 based on a user's short-term interest; when the processor 12 executes the news recommendation program 10 stored in the memory 11, the following steps are implemented:
Step S1: collect users' behavior data on news, the behavior data including a news matrix;
Step S2: obtain a corresponding word vector matrix according to the news matrix;
Step S3: cluster the word vector matrix to obtain a grouping result for each news item, and assign each news item to the corresponding news group according to the grouping result;
Step S4: obtain a long-term portrait and a short-term portrait of each user from the long-term behavior data and short-term behavior data of each user for each news item, respectively, the long-term portrait and short-term portrait being used to represent the user's preference for the word vectors corresponding to the words contained in the news;
Step S5: analyze the similarity between the long-term portrait of each user and different news groups to obtain multiple first similarities;
Step S6: sort the multiple first similarities in descending order, and obtain a first set number of news groups for each user based on the sorting result;
Step S7: analyze the second similarity between the latest short-term portrait of each user and each news item in the first set number of news groups;
Step S8: construct a user-news bipartite graph according to the second similarity;
Step S9: select recommended news on the user-news bipartite graph using an absorbing random walk method, thereby obtaining the recommended news for each user.
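
As a minimal sketch of step S8 only, the following builds a weighted user-news bipartite graph from second similarities, keeping a fixed number of top-ranked news items per user and using the second similarity as the edge weight. The function name, the `top_n` parameter, the node identifiers and the similarity values are hypothetical and chosen for illustration.

```python
def build_user_news_graph(second_similarity, top_n):
    """Build an undirected weighted bipartite graph as nested dicts.

    second_similarity: dict mapping user id -> dict of news id -> similarity.
    top_n: number of highest-similarity news items kept per user (assumed name).
    User ids and news ids are assumed not to collide.
    """
    graph = {}
    for user, sims in second_similarity.items():
        # Sort this user's news in descending order of second similarity.
        best = sorted(sims, key=sims.get, reverse=True)[:top_n]
        for news in best:
            weight = sims[news]
            # Store the edge in both directions so either side can be walked.
            graph.setdefault(user, {})[news] = weight
            graph.setdefault(news, {})[user] = weight
    return graph


# Illustrative similarities for two users and three news items.
second_similarity = {
    "u1": {"n1": 0.9, "n2": 0.4, "n3": 0.1},
    "u2": {"n1": 0.2, "n2": 0.8, "n3": 0.7},
}
graph = build_user_news_graph(second_similarity, top_n=2)
```

The resulting nested dictionary can be handed to any random-walk routine that expects neighbour weights; if user ratings are available, they could replace the similarity as the edge weight.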
In other embodiments, the news recommendation program 10 based on a user's short-term interest may also be divided into one or more modules, which are stored in the memory 11 and executed by the processor 12 to carry out this application. A module in this application refers to a series of computer program instruction segments capable of performing a specific function.
While obtaining the user's long-term portrait, the above electronic device also models the user's short-term reading preferences and, according to those preferences, recommends articles likely to arouse the user's interest, thereby increasing the amount the user reads.
FIG. 2 is a schematic diagram of the news recommendation apparatus based on a user's short-term interest of this application. As shown in FIG. 2, the news recommendation apparatus includes:
a collection module 110, which collects users' behavior data on news, the behavior data including a news matrix; preferably, the behavior data further includes a news matrix and a behavior matrix, the behavior matrix being the matrix formed by the behavior index of each user in the user matrix for each news item in the news matrix;
a word vector matrix module 120, which obtains a corresponding word vector matrix according to the news matrix;
a clustering module 130, which clusters the word vector matrix to obtain a grouping result for each news item and assigns each news item to the corresponding news group according to the grouping result;
a user portrait obtaining module 140, which obtains a long-term portrait and a short-term portrait of each user from the long-term behavior data and short-term behavior data of each user for each news item, respectively, the long-term portrait and short-term portrait being used to represent the user's preference for the word vectors corresponding to the words contained in the news;
a first similarity obtaining module 150, which analyzes the similarity between the long-term portrait of each user and different news groups to obtain multiple first similarities;
a preferred news group obtaining module 160, which sorts the multiple first similarities in descending order and obtains a first set number of news groups for each user based on the sorting result;
a second similarity obtaining module 170, which analyzes the second similarity between the latest short-term portrait of each user and each news item in the first set number of news groups;
a bipartite graph construction module 180, which constructs a user-news bipartite graph according to the second similarity;
a recommendation module 190, which selects recommended news on the bipartite graph using an absorbing random walk method, thereby obtaining the recommended news for each user.
Preferably, the above clustering module 130 includes:
a hierarchical clustering unit, which performs hierarchical clustering on the word vector matrix of the word vector matrix module to obtain a hierarchical clustering dendrogram, in which each leaf node corresponds to one news item;
a Dunn index obtaining unit, which obtains the Dunn index corresponding to each clustering result of the hierarchical clustering unit;
a cutting unit, which cuts the hierarchical clustering dendrogram of the hierarchical clustering unit at the level corresponding to the maximum Dunn index obtained by the Dunn index obtaining unit, to obtain the best hierarchical clustering dendrogram;
a news grouping unit, which assigns the news items corresponding to leaf nodes that share the same parent node in the best hierarchical clustering dendrogram formed by the cutting unit to the same news group, thereby obtaining the news group of each news item.
In addition, preferably, the news recommendation apparatus further includes a topic matrix construction module, which analyzes the word vector matrix using Latent Dirichlet Allocation (LDA) to obtain, for each news item, a topic probability matrix over a plurality of topics and a word probability matrix of the word vectors corresponding to each topic; the topic value of each news item is obtained by combining its topic probability matrix, word probability matrix and word vector matrix, and the topic values of all news items form the topic matrix,
wherein the clustering module 130 obtains the topic vector of each news group from the topic matrix constructed by the topic matrix construction module; the first similarity obtaining module 150 uses a vector similarity measure to determine the first similarity between the user's long-term portrait and the topic vector of each news group; and the second similarity obtaining module 170 uses a vector similarity measure to determine the second similarity between the user's short-term portrait and each of the first set number of news groups.
In addition, the present application provides a news recommendation method based on the short-term interest of a user. FIG. 3 is a flowchart of a preferred embodiment of this method. The method may be executed by an apparatus, and the apparatus may be implemented in software and/or hardware.
In this embodiment, the news recommendation method based on the short-term interest of a user includes:
Step S1: collect the users' behavior data for news. The behavior data includes a user matrix and preferably also a news matrix and a behavior matrix, where the behavior matrix consists of the behavior indicator of each user in the user matrix for each news item in the news matrix:
$$U = [u_1, u_2, \ldots, u_a]$$
$$N = [n_1, n_2, \ldots, n_b]$$
$$UN = \begin{bmatrix} UN_1 \\ UN_2 \\ \vdots \\ UN_a \end{bmatrix} = \begin{bmatrix} un_{11} & un_{12} & \cdots & un_{1b} \\ un_{21} & un_{22} & \cdots & un_{2b} \\ \vdots & \vdots & \ddots & \vdots \\ un_{a1} & un_{a2} & \cdots & un_{ab} \end{bmatrix}$$
where $U$ is the user matrix, $a$ is the total number of users, $N$ is the news matrix, $b$ is the total number of news items, $UN$ is the behavior matrix formed by each user's behavior indicators for each news item, $UN_a$ is the behavior vector of the a-th user, and $un_{ab}$ is the behavior indicator of the a-th user for the b-th news item. The behavior indicators include one or more of the number of clicks, number of reads, number of likes, number of comments, reading duration, click frequency (clicks per unit time), reading frequency, like frequency and comment frequency. For example, the browsing history of the users of a news website is collected with a web crawler; the user identifiers are sorted to form the user matrix; the news identifiers of the website are sorted to form the news matrix; and the number of times a user clicks on a news item is used as that user's behavior indicator for that news item, with a click count of 0 when the user has not viewed the news item. Together these values form the behavior matrix;
Step S2: obtain the corresponding word vector matrix from the news matrix; that is, convert the words of each news item in the news matrix into word vectors, which form the corresponding word vector matrix
$$W = \begin{bmatrix} W_1 \\ W_2 \\ \vdots \\ W_b \end{bmatrix} = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1c} \\ w_{21} & w_{22} & \cdots & w_{2c} \\ \vdots & \vdots & \ddots & \vdots \\ w_{b1} & w_{b2} & \cdots & w_{bc} \end{bmatrix}$$
where $W$ is the word vector matrix of all news items, $c$ is the number of word vectors in the longest news item, $w_{bc}$ is the word vector of the c-th word in the b-th news item (a news item with fewer than $c$ word vectors is padded with zeros), and $W_b$ is the word vector matrix of the b-th news item;
Step S3: cluster the word vector matrix to obtain a grouping result for each news item, and assign each news item to the corresponding news group according to the grouping result, where a news group denotes one cluster of news items;
Step S4: obtain a long-term portrait and a short-term portrait of each user from that user's long-term behavior data and short-term behavior data for each news item, respectively. "Long-term" and "short-term" refer to time spans (for example, the long term may be one month and the short term one week), and the long term contains a plurality of short terms. The long-term portrait and the short-term portrait express the user's preference for the word vectors corresponding to the words contained in the news;
Step S5: analyze, for each user, the first similarity of the word vectors between the user's long-term portrait and each news group;
Step S6: sort the plurality of first similarities in descending order and, based on the sorting result, obtain a first set number of news groups for each user;
Step S7: analyze, for each user, the second similarity of the word vectors between the user's short-term portrait closest to the time of analysis and each news item in the first set number of news groups;
Step S8: construct a user-news bipartite graph according to the second similarity;
Step S9: select the news to be recommended by applying an absorbing random walk method on the bipartite graph, thereby obtaining the recommended news for each user.
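As a minimal sketch of steps S1 and S2, the following Python code builds a behavior matrix from click events and pads the word vectors of each news item with zero vectors up to the common length c. The click-log format, the function names and the toy two-dimensional word vectors are assumptions made for illustration; the application does not prescribe them.

```python
from collections import Counter

def build_behavior_matrix(click_log):
    """click_log: list of (user_id, news_id) click events (hypothetical format)."""
    users = sorted({u for u, _ in click_log})   # user matrix U
    news = sorted({n for _, n in click_log})    # news matrix N
    counts = Counter(click_log)                 # missing pairs count as 0
    matrix = [[counts[(u, n)] for n in news] for u in users]
    return users, news, matrix

def pad_word_vectors(news_vectors, dim):
    """Pad each news item's list of word vectors with zero vectors to length c."""
    c = max(len(vectors) for vectors in news_vectors)
    return [vectors + [[0.0] * dim for _ in range(c - len(vectors))]
            for vectors in news_vectors]

log = [("u2", "n1"), ("u1", "n1"), ("u1", "n1"), ("u2", "n3")]
U, N, UN = build_behavior_matrix(log)

# Two news items with one and two word vectors of dimension 2.
W = pad_word_vectors([[[1.0, 2.0]], [[3.0, 4.0], [5.0, 6.0]]], dim=2)
```

Here the click count is the only behavior indicator; any of the other indicators listed in step S1 could be substituted in the same structure.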
When building user portraits, the above news recommendation method based on the short-term interest of a user emphasizes the influence of the evolution of user interests. It seamlessly integrates the long-term and short-term user portraits to represent the user's reading preferences, builds a graph of the relationships between specific news items and users, and then runs an absorbing random walk on this graph to select news articles on different topics.
In one embodiment of the present application, the above news recommendation method based on the short-term interest of a user includes:
In step S4, the word vectors of each news item are used as labels, and the long-term portrait and the short-term portrait are the user's preference weights for each label,
Figure PCTCN2019103700-appb-000003
Figure PCTCN2019103700-appb-000004
where $P$ is the short-term portrait of a user, $P'$ is the long-term portrait of the user, $P_b$ is the user's short-term weight vector for the b-th news item, and $p_{bc}$ is the user's short-term weight for the c-th word vector in the b-th news item;
In step S5, a matrix similarity measure is used to determine the first similarity between the user's long-term portrait and each news group. For example, the correlation coefficient of matrices or the cosine of the angle between vectors is applied to the news group matrix, which is composed of the word vectors of the news items in the news group, and the corresponding long-term portrait submatrix, which holds the user's preferences for the word vectors of the news items in that news group. As another example, the news group matrix and the long-term portrait submatrix are each flattened into a vector, and a vector similarity measure such as cosine similarity is applied. As a further example, the long-term portrait submatrix is subtracted from the news group matrix element by element, and the differences are squared and summed to obtain the first similarity;
In step S7, a matrix similarity measure is used to determine the second similarity between the user's short-term portrait and each of the first set number of news groups;
In step S8, the news groups of each user are sorted in descending order of the second similarity, and the top second set number (smaller than the first set number) of news groups are taken as that user's second set number of news groups. The user-news bipartite graph is constructed from each user and the news items in that user's second set number of news groups, where the weight of each edge of the bipartite graph is set according to the user's rating of the news item: the higher the rating, the larger the weight.
The above news recommendation method based on the short-term interest of a user filters the news groups with both the long-term portrait and the short-term portrait of the user, so that the selected news groups match the user's long-term preferences as well as the user's short-term interests, which improves the accuracy of news recommendation.
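The matrix similarity measures mentioned for step S5 can be sketched as follows. This is an illustrative Python example only: the function names and the 2×2 toy matrices are assumptions, and scalar entries stand in for the word vectors and preference weights of the application.

```python
import math

def flatten(matrix):
    """Flatten a matrix (list of rows) into a single vector."""
    return [x for row in matrix for x in row]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an all-zero vector is treated as dissimilar
    return dot / (norm_a * norm_b)

def sum_squared_difference(a, b):
    """Sum of squared element-wise differences (smaller means closer)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

group = [[1.0, 0.0], [0.0, 1.0]]      # toy news group matrix
portrait = [[2.0, 0.0], [0.0, 2.0]]   # toy long-term portrait submatrix

cos = cosine_similarity(flatten(group), flatten(portrait))
ssd = sum_squared_difference(flatten(group), flatten(portrait))
```

Note that the sum of squared differences behaves like a distance, so a descending sort as in step S6 would need it to be negated or otherwise converted into a similarity first; how that conversion is done is left open here.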
In another embodiment, in step S7, a vector similarity measure such as Euclidean distance, Manhattan distance, Chebyshev distance, Minkowski distance, standardized Euclidean distance, Mahalanobis distance, cosine of the included angle, Hamming distance, Jaccard distance and Jaccard similarity coefficient, or correlation coefficient and correlation distance is used to obtain the second similarity between the user's short-term portrait and each news item in the first set number of news groups. For example, suppose that a news item $n_i$ in one of the first set number of news groups remaining after filtering by the long-term portrait has the word vector $W_i = [w_{11}, w_{12}, \ldots, w_{1c}]$, and that the corresponding vector of the user's short-term portrait is $P_i = [p_{11}, p_{12}, \ldots, p_{1c}]$. Taking Euclidean distance as an example, the second similarity is obtained as
$$d(P_i, W_i) = \sqrt{\sum_{k=1}^{c} (p_{1k} - w_{1k})^2}$$
where $d(P_i, W_i)$ is the second similarity between the user and the news item $n_i$;
In step S8, the news items of each user are sorted in descending order of the second similarity, and the top third set number of news items are taken as that user's third set number of news items. The user-news bipartite graph is constructed from each user and that user's third set number of news items, where the weight of each edge of the bipartite graph is set according to the user's rating of the news item. Preferably, in step S8, the second similarity is used as the edge weight when constructing the user-news bipartite graph; the user-news bipartite graph may also be constructed directly, without sorting by the second similarity.
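The Euclidean-distance variant of step S7 and the graph construction of step S8 can be sketched together. The code below is a hedged illustration, not the application's implementation: the dictionary layout, the function names and, in particular, the conversion `1 / (1 + d)` from a distance to a similarity (so that a descending sort keeps the closest news items) are assumptions.

```python
import math

def euclidean_distance(p, w):
    """Euclidean distance between a portrait vector and a news word vector."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, w)))

def build_user_news_edges(portraits, news_vectors, third_n):
    """Return {(user, news): weight} for each user's top third_n news items.

    portraits: {user: {news: short-term portrait vector for that news}}
    news_vectors: {news: word vector of that news}
    """
    edges = {}
    for user, per_news in portraits.items():
        scored = []
        for news, vector in news_vectors.items():
            d = euclidean_distance(per_news[news], vector)
            scored.append((1.0 / (1.0 + d), news))  # assumed conversion
        scored.sort(key=lambda item: item[0], reverse=True)
        for similarity, news in scored[:third_n]:
            edges[(user, news)] = similarity  # second similarity as edge weight
    return edges

portraits = {"u1": {"n1": [1.0, 0.0], "n2": [0.0, 0.0], "n3": [3.0, 4.0]}}
news_vectors = {"n1": [1.0, 0.0], "n2": [0.0, 1.0], "n3": [0.0, 0.0]}
edges = build_user_news_edges(portraits, news_vectors, third_n=2)
```

In this toy input the distances are 0, 1 and 5, so the two retained edges carry weights 1.0 and 0.5.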
The above news recommendation method based on the short-term interest of a user selects news in two stages: it first uses the long-term portrait to decide whether a news group matches the user's preferences, and then uses the short-term portrait to filter specific news articles for the user. The long-term and short-term preferences of the user are thereby connected seamlessly, which improves the accuracy of recommendation.
In a second embodiment of the present application, the news recommendation method based on the short-term interest of a user includes:
In step S2, the word vector matrix is analyzed using LDA (Latent Dirichlet Allocation) to obtain the topic value of each news item and thus the topic matrix. Specifically, this includes: obtaining, through LDA, the topic probability matrix over the plurality of topics of each news item in the news matrix and the word probability matrix of the word vectors corresponding to each topic
Figure PCTCN2019103700-appb-000006
Figure PCTCN2019103700-appb-000007
where $\theta_b$ is the topic probability matrix of the b-th news item, (Figure PCTCN2019103700-appb-000008) is the probability that the b-th news item corresponds to the d-th topic, (Figure PCTCN2019103700-appb-000009) is the word probability matrix of the b-th news item, and (Figure PCTCN2019103700-appb-000010) is the probability that the d-th topic generates the c-th word vector in the b-th news item;
The topic value of each news item is obtained by combining its topic probability matrix, word probability matrix and word vector matrix
Figure PCTCN2019103700-appb-000011
where $T_b$ is the topic value of the b-th news item and "." denotes matrix multiplication;
The topic values of all news items form the topic matrix $Z = [z_1, z_2, \ldots, z_b]$.
In step S3, the word vector matrix is clustered to obtain the news group to which each news item belongs, and thus the topic vector of each news group. For example, for a news group $[n_i, n_j]$ the corresponding topic vector is $[z_i, z_j]$.
In step S4, LDA is used as the language model for detecting latent topics, and the long-term portrait and the short-term portrait of each user are obtained. Specifically, the long-term portrait and the short-term portrait are obtained from the topic probability matrix and the word probability matrix of each news item and from the behavior matrix, where the user's behavior indicator for a news item is used as the user's behavior indicator for every word vector in that news item,
$$un_{ab}(c) = [un_{ab}, un_{ab}, \ldots, un_{ab}]^T$$
Figure PCTCN2019103700-appb-000012
$$z_a = [z_{a1}, z_{a2}, \ldots, z_{ab}]$$
where $un_{ab}(c)$ is the behavior vector of the a-th user for the $c$ word vectors in the b-th news item, that is, $un_{ab}(c)$ consists of $c$ copies of $un_{ab}$; $z_{ab}$ is the topic value of the a-th user for the b-th news item; and $z_a$ is the long-term portrait or the short-term portrait of the a-th user.
In step S5, a similarity measure is used to determine the first similarity between the user's long-term portrait and each news group. Preferably, the first similarity is obtained using cosine similarity
$$s_{m,n} = \frac{\sum_{i=1}^{b} x_i y_i}{\sqrt{\sum_{i=1}^{b} x_i^2}\,\sqrt{\sum_{i=1}^{b} y_i^2}}$$
where $s_{m,n}$ is the similarity between the m-th long-term portrait and the n-th news group, $(x_1, x_2, \ldots, x_b)$ is the topic vector of the m-th long-term portrait, and $(y_1, y_2, \ldots, y_b)$ is the topic vector of the n-th news group. For example, if a news group $X$ contains the first and the third news items, the topic vector of the news group is $(z_1, z_3)$, the corresponding long-term portrait vector of the a-th user is $(z_{a1}, z_{a3})$, and
$$s_{a,X} = \frac{z_{a1} z_1 + z_{a3} z_3}{\sqrt{z_{a1}^2 + z_{a3}^2}\,\sqrt{z_1^2 + z_3^2}}$$
In step S7, the similarity measure of step S5 is used to determine the second similarity between the user's short-term portrait and each of the first set number of news groups.
In step S8, the news groups of each user are sorted in descending order of the second similarity, and the top second set number (smaller than the first set number) of news groups are taken as that user's second set number of news groups. The user-news bipartite graph is constructed from each user and the news items in that user's second set number of news groups, where the weight of each edge of the bipartite graph is set according to the user's rating of the news item.
The above news recommendation method based on the short-term interest of a user obtains the topic vector of each news item and the short-term and long-term portrait vectors of each user through LDA analysis, and filters the news groups by similarity, which reduces the amount of computation while preserving the accuracy of recommendation.
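The group filtering of steps S5 and S6 in this embodiment can be sketched as follows, assuming that each news item has a scalar topic value and that the long-term portrait holds one topic value per news item, as in the (z_1, z_3) example. The function names, identifiers and numeric values are hypothetical.

```python
import math

def cosine(x, y):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx > 0 and ny > 0 else 0.0

def top_news_groups(portrait, topic_values, groups, first_n):
    """Rank news groups by first similarity and keep the top first_n.

    portrait: {news: user's long-term topic value for that news}
    topic_values: {news: topic value of that news}
    groups: {group_id: [news, ...]}
    """
    scores = {}
    for group_id, members in groups.items():
        x = [portrait[n] for n in members]       # portrait sub-vector
        y = [topic_values[n] for n in members]   # topic vector of the group
        scores[group_id] = cosine(x, y)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:first_n], scores

portrait = {"n1": 1.0, "n2": 0.0, "n3": 1.0, "n4": 1.0}
topic_values = {"n1": 1.0, "n2": 1.0, "n3": 2.0, "n4": 0.0}
groups = {"A": ["n1", "n3"], "B": ["n2", "n4"]}
kept, scores = top_news_groups(portrait, topic_values, groups, first_n=1)
```

Only the news items of the retained groups then need to be compared with the short-term portrait in step S7, which is where the reduction in computation comes from.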
In an optional embodiment of the above news recommendation method based on the short-term interest of a user:
In step S4, the long-term portrait is obtained by formula (3), and the short-term portrait is obtained by formula (5) below
Figure PCTCN2019103700-appb-000015
In step S7, a similarity measure is used to determine the second similarity between the user's short-term portrait and each news item of each of the first set number of news groups. Preferably, the second similarity is obtained using cosine similarity
$$s'_{m,n} = \frac{\sum_{i=1}^{c} x_i y_i}{\sqrt{\sum_{i=1}^{c} x_i^2}\,\sqrt{\sum_{i=1}^{c} y_i^2}}$$
where $s'_{m,n}$ is the similarity between the m-th short-term portrait and the n-th news item, $(x_1, x_2, \ldots, x_c)$ is the topic vector of the m-th short-term portrait, and $(y_1, y_2, \ldots, y_c)$ is the word vector of the n-th news item; both are $1 \times c$ vectors.
In step S8, the news items of each user are sorted in descending order of the second similarity, and the top third set number of news items are taken as that user's third set number of news items. The user-news bipartite graph is constructed from each user and that user's third set number of news items, where the weight of each edge of the bipartite graph is set according to the user's rating of the news item. Preferably, in step S8, the second similarity is used as the edge weight when constructing the user-news bipartite graph; the user-news bipartite graph may also be constructed directly, without sorting by the second similarity.
The above news recommendation method based on the short-term interest of a user obtains the topic vector of each news item and the short-term and long-term portrait vectors of each user through LDA analysis, and filters first the news groups and then the news items, which reduces the amount of computation and increases both the speed and the accuracy of recommendation.
Preferably, in step S2, the word vector matrix is analyzed using LDA, and the topic vector of each news item is obtained by formula (7) below
Figure PCTCN2019103700-appb-000017
In step S7, the second similarity between each user's short-term portrait and each news item is obtained from the similarity between that short-term portrait and the topic vector of the news item.
In each of the above embodiments, the step in step S4 of obtaining the long-term portrait and the short-term portrait of each user from that user's long-term and short-term behavior data for each news item further includes:
setting a time frame, where one time frame is the short term and the long term contains a plurality of time frames;
obtaining the user portrait for each time frame from the user's behavior data for the word vectors of the news within that time frame, thereby obtaining the user's short-term portrait for each time frame;
obtaining the user's long-term portrait as a weighted combination of the user portraits of the individual time frames, where a short-term portrait closer to the time of analysis receives a larger weight.
Preferably, a time function is used to combine the plurality of short-term user portraits, with weights, into the long-term user portrait
Figure PCTCN2019103700-appb-000018
where $P_u$ is the long-term portrait, (Figure PCTCN2019103700-appb-000019) is the short-term portrait corresponding to the g-th time frame $t_g$, $f(t)$ is the time function $f(t) = e^{-\lambda t}$, and $\lambda$ is a constant parameter of the time function.
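The time-weighted combination can be illustrated with a short Python sketch. Because the exact formula is only available as an image in the source, the code assumes a plain, unnormalized weighted sum of the short-term portraits with weights f(t) = e^{-λt}, where t is taken to be the age of the time frame relative to the time of analysis; both choices, as well as the function and parameter names, are assumptions.

```python
import math

def long_term_portrait(short_term_portraits, ages, decay):
    """Combine short-term portraits with exponentially decaying weights.

    short_term_portraits: list of equal-length preference vectors, one per frame
    ages: age of each time frame (0 = most recent), in the same order
    decay: the constant parameter lambda of f(t) = exp(-lambda * t)
    """
    combined = [0.0] * len(short_term_portraits[0])
    for portrait, age in zip(short_term_portraits, ages):
        weight = math.exp(-decay * age)  # more recent frames weigh more
        for k, value in enumerate(portrait):
            combined[k] += weight * value
    return combined

# Two time frames; with lambda = ln 2 the older frame gets half the weight.
frames = [[1.0, 0.0], [0.0, 1.0]]
profile = long_term_portrait(frames, ages=[0, 1], decay=math.log(2))
```

A normalized variant would divide the result by the sum of the weights; whether the application normalizes is not recoverable from the text.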
The above news recommendation method based on the short-term interest of a user first builds a long-term portrait of a given user using time-sensitive weighting, and then determines the user's short-term preferences by analyzing the user's most recent reading history. For recommendation, a user-item bipartite graph is built from the long-term and short-term user portraits, and the absorbing random walk algorithm is then used to select news across different topics. This not only provides news articles relevant to the user's interests but also broadens the user's preferences by introducing articles on different topics.
In each of the above embodiments, the step of clustering the word vector matrix in step S3 includes:
performing hierarchical clustering on the word vector matrix to obtain a hierarchical clustering dendrogram, in which each leaf node corresponds to one news item;
obtaining the Dunn index corresponding to each clustering result of the hierarchical clustering, and cutting the dendrogram at the level corresponding to the maximum Dunn index to obtain the optimal hierarchical clustering dendrogram, in which the news items corresponding to leaf nodes that share the same parent node belong to the same news group, thereby obtaining the news group of each news item. This clustering method first uses an agglomerative hierarchical clustering algorithm to build a news hierarchy based purely on the content of the news articles, and then uses Dunn's validity index to determine the optimal dendrogram, which avoids having to fix the number of clusters in advance. The Dunn index is the shortest distance between elements of any two different clusters (inter-cluster distance) divided by the largest distance within any cluster (intra-cluster distance); a larger index indicates larger inter-cluster distances and smaller intra-cluster distances. The Dunn index thus determines at which level the dendrogram is cut. After the news groups have been obtained, each group can be analyzed with LDA and its topics represented by a topic vector, so that the groups can be filtered using the long-term user portrait.
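The clustering step can be sketched in pure Python as below. This is an illustration under stated assumptions rather than the application's procedure: each news item is reduced to a single point, single-linkage merging is used as the agglomeration rule, and levels in which every cluster is a singleton are skipped because their intra-cluster distance is zero. All function names are hypothetical.

```python
import math
from itertools import combinations

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dunn_index(points, clusters):
    """Smallest inter-cluster distance divided by largest intra-cluster distance."""
    inter = min(dist(points[i], points[j])
                for c1, c2 in combinations(clusters, 2) for i in c1 for j in c2)
    intra = max((dist(points[i], points[j])
                 for c in clusters for i, j in combinations(c, 2)), default=0.0)
    return inter / intra if intra > 0 else float("-inf")  # skip trivial levels

def cluster_news(points):
    """Merge clusters bottom-up and return the level with the highest Dunn index."""
    clusters = [[i] for i in range(len(points))]
    best_score, best = float("-inf"), [list(c) for c in clusters]
    while len(clusters) > 2:
        # Single linkage: merge the pair of clusters with the closest members.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: min(dist(points[i], points[j])
                                     for i in clusters[p[0]] for j in clusters[p[1]]))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        score = dunn_index(points, clusters)
        if score > best_score:
            best_score, best = score, [sorted(c) for c in clusters]
    return best, best_score

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
news_groups, score = cluster_news(points)
```

For the four toy points, the level with two clusters has an inter-cluster distance of 10 and an intra-cluster distance of 1, so it is the level at which the dendrogram would be cut.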
In one embodiment, in step S9, news is selected across different topics by the absorbing random walk method. The absorbing random walk method first selects an initial node. The walk then jumps to an arbitrary node of the graph with probability p, and the remaining probability 1-p is distributed over the neighboring nodes in proportion to the edge weights; every subsequent step jumps to a random node or to a neighboring node with the same probabilities. A transition matrix is used to compute the jump probabilities. After several iterations the jump probabilities stabilize, and the news item with the highest probability is recommended. The absorbing random walk method then lowers the jump probabilities of the articles of the same kind as that article, so that a wider variety of news is selected. In this way, the news recommendation method based on the short-term interest of a user described in this application not only provides news articles relevant to the user's interests but also broadens the user's preferences by introducing articles on different topics.
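The walk described above can be approximated with the simplified Python sketch below. It is not a full absorbing random walk: it computes the stationary probabilities of a walk that teleports with probability p, and then imitates the diversity effect by multiplying the scores of news items from an already selected group by a penalty factor. The penalty factor, the function names and the toy graph are assumptions.

```python
def stationary_distribution(weights, teleport, iterations=100):
    """Power iteration for a random walk with uniform teleportation.

    weights: symmetric matrix of non-negative edge weights over all nodes
    teleport: probability p of jumping to an arbitrary node
    """
    n = len(weights)
    prob = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [teleport / n] * n
        for i in range(n):
            total = sum(weights[i])
            for j in range(n):
                # A node without edges spreads its probability uniformly.
                share = weights[i][j] / total if total > 0 else 1.0 / n
                nxt[j] += (1.0 - teleport) * prob[i] * share
        prob = nxt
    return prob

def pick_diverse(prob, news_nodes, group_of, k, penalty=0.5):
    """Greedily pick k news nodes, damping the scores of same-group news."""
    scores = {n: prob[n] for n in news_nodes}
    picked = []
    while scores and len(picked) < k:
        best = max(scores, key=scores.get)
        picked.append(best)
        del scores[best]
        for n in scores:
            if group_of[n] == group_of[best]:
                scores[n] *= penalty
    return picked

# Node 0 is a user; nodes 1 to 3 are news items linked to that user.
weights = [
    [0.0, 0.9, 0.8, 0.5],
    [0.9, 0.0, 0.0, 0.0],
    [0.8, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0],
]
prob = stationary_distribution(weights, teleport=0.15)
picked = pick_diverse(prob, news_nodes=[1, 2, 3],
                      group_of={1: "A", 2: "A", 3: "B"}, k=2)
```

In this toy graph the second pick is news item 3 from group B, although news item 2 has the stronger edge, because item 2 shares a group with the first pick.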
In another embodiment, step S9 includes:
in the user-news bipartite graph, treating each user as a node and each news item as a node, and obtaining the correlation values between the nodes using the random walk with restart method;
obtaining, for each user, the adjacent set formed by the nodes adjacent to that user's node; forming the user's first sub-correlation matrix from the correlation values between any two nodes in the adjacent set; taking the reciprocal of the mean of the off-diagonal elements of the first sub-correlation matrix as the user's bridging value; and combining it with the bridging values of the user nodes in the adjacent set to form the user's bridging matrix. For example, for a user node $u_1$ whose adjacent set is $[n_2, n_4, u_3]$, the first sub-correlation matrix of user node $u_1$ is
Figure PCTCN2019103700-appb-000020
where $r_{23}$ is the correlation value between news node $n_2$ and user node $u_3$; the bridging value $q_1$ of user node $u_1$ is the reciprocal of the mean of the off-diagonal elements of the first sub-correlation matrix, namely
Figure PCTCN2019103700-appb-000021
Figure PCTCN2019103700-appb-000022
and the bridging matrix of user node $u_1$ is $[q_1, q_3]$;
forming the second sub-correlation matrix of each user from the correlation values between, on the one hand, the user node and the user nodes in its adjacent set and, on the other hand, the news nodes in the adjacent set; in the above example, the second sub-correlation matrix of user node $u_1$ is
Figure PCTCN2019103700-appb-000023
multiplying the bridging matrix of each user by the second sub-correlation matrix to obtain the recommendation values of the above news nodes;
sorting the news nodes in descending order of recommendation value, and recommending the set number of top-ranked news items to the user.
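The bridging computation can be sketched as follows. The matrices in the source are only available as images, so the layout used here is an assumption: the first sub-correlation matrix is a square matrix over the adjacent set, the bridging matrix is a row vector with one bridging value per user node, and the second sub-correlation matrix has one row per user node and one column per news node. Function names and numbers are hypothetical.

```python
def bridging_value(sub_correlation):
    """Reciprocal of the mean of the off-diagonal elements of a square matrix."""
    n = len(sub_correlation)
    off_diagonal = [sub_correlation[i][j]
                    for i in range(n) for j in range(n) if i != j]
    return len(off_diagonal) / sum(off_diagonal)

def recommendation_values(bridging, second_sub_correlation):
    """Row vector of bridging values times the users-by-news correlation matrix."""
    columns = len(second_sub_correlation[0])
    return [sum(q * row[j] for q, row in zip(bridging, second_sub_correlation))
            for j in range(columns)]

# Toy first sub-correlation matrix over an adjacent set of three nodes.
first = [
    [1.0, 0.5, 0.25],
    [0.5, 1.0, 0.25],
    [0.25, 0.25, 1.0],
]
q1 = bridging_value(first)

# Bridging matrix [q1, q3] with an assumed q3, and correlations of the two
# user nodes (rows) with two news nodes (columns).
values = recommendation_values([q1, 2.0], [[0.5, 0.25], [0.25, 0.5]])
ranking = sorted(range(len(values)), key=lambda j: values[j], reverse=True)
```

A low average correlation inside the adjacent set yields a high bridging value, so users whose neighborhoods are loosely connected contribute more to the recommendation values.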
Preferably, the step of obtaining the correlation values between the nodes using the random walk with restart method includes:
taking one node as the starting node, using the vector formed by the second similarities between that node and the other nodes as the restart vector, and computing the jump probabilities between the nodes of the bipartite graph;
forming an adjacency matrix from the jump probabilities between the nodes;
iterating on the adjacency matrix until it converges, where each element of the converged adjacency matrix is the correlation value between the starting node and one other node.
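A standard random walk with restart can be sketched as follows. The iteration r ← c·e + (1−c)·Mᵀr used here is the textbook form and is offered as an assumption about what the steps above amount to; the restart probability c, the tolerance and the function name are hypothetical, and the restart vector e is a one-hot vector in the example, whereas the text builds it from second similarities.

```python
def random_walk_with_restart(transition, restart, restart_prob=0.15,
                             tol=1e-10, max_iter=1000):
    """Iterate r <- c * e + (1 - c) * M^T r until the update is below tol.

    transition: row-stochastic matrix M of jump probabilities between nodes
    restart: restart vector e (non-negative, summing to 1)
    """
    n = len(restart)
    r = list(restart)
    for _ in range(max_iter):
        nxt = [restart_prob * restart[j]
               + (1.0 - restart_prob) * sum(r[i] * transition[i][j]
                                            for i in range(n))
               for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, r)) < tol:
            return nxt
        r = nxt
    return r

# Two nodes joined by a single edge; the walk restarts at node 0.
relevance = random_walk_with_restart([[0.0, 1.0], [1.0, 0.0]],
                                     [1.0, 0.0], restart_prob=0.5)
```

Running the walk once per starting node and stacking the resulting vectors gives a matrix of pairwise correlation values of the kind used for the sub-correlation matrices.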
此外,本申请实施例还提出一种计算机非易失性可读存储介质,所述计算机非易失性可读存储介质中包括基于用户短期兴趣的新闻推荐程序,所述基于用户短期兴趣的新闻推荐程序被处理器执行时实现如下步骤:In addition, an embodiment of the present application also proposes a computer non-volatile readable storage medium, the computer non-volatile readable storage medium includes a news recommendation program based on the user's short-term interest, and the news based on the user's short-term interest The following steps are implemented when the recommended program is executed by the processor:
步骤S1,采集用户对新闻的行为数据,所述行为数据包括用户矩阵;Step S1: Collect user behavior data on news, the behavior data includes a user matrix;
Step S2: obtain the corresponding word vector matrix from the news matrix;
Step S3: cluster the word vector matrix to obtain a grouping result for each news item, and assign each news item to its corresponding newsgroup according to the grouping result;
Step S4: obtain a long-term portrait and a short-term portrait of each user from that user's long-term behavior data and short-term behavior data on each news item, respectively, where the long-term and short-term portraits characterize the user's preference for the word vectors corresponding to the words contained in the news;
Step S5: analyze the similarity between each user's long-term portrait and the different newsgroups to obtain a plurality of first similarities;
Step S6: sort the plurality of first similarities in descending order, and obtain a first set number of newsgroups for each user based on the sorting result;
Step S7: analyze the second similarity between each user's latest short-term portrait and each news item in the first set number of newsgroups;
Step S8: construct a user-news bipartite graph according to the second similarity;
Step S9: select the news to be recommended using an absorbing random walk method on the user-news bipartite graph, thereby obtaining the recommended news for each user.
The specific implementation of the computer non-volatile readable storage medium of the present application is substantially the same as that of the above news recommendation method, apparatus, and electronic device based on a user's short-term interest, and is not repeated here.
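By way of example only, steps S5 and S6 above (scoring newsgroups against a long-term portrait and keeping the top ones) can be sketched as follows. The sketch assumes cosine similarity as the similarity measure and assumes that portraits and newsgroups are represented as equal-length topic vectors; the function names, the group labels, and the numbers are hypothetical and serve only as illustration.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # convention for an all-zero vector
    return dot / (norm_x * norm_y)


def top_newsgroups(long_term_portrait, group_vectors, k):
    """Step S5: first similarities; step S6: sort descending and keep k."""
    first_similarities = {
        group: cosine_similarity(long_term_portrait, vector)
        for group, vector in group_vectors.items()
    }
    ranked = sorted(first_similarities, key=first_similarities.get,
                    reverse=True)
    return ranked[:k]


# Hypothetical topic vectors for one user and three newsgroups.
portrait = [0.8, 0.1, 0.1]
groups = {
    "sports": [1.0, 0.0, 0.0],
    "finance": [0.0, 1.0, 0.0],
    "tech": [0.5, 0.5, 0.0],
}
selected = top_newsgroups(portrait, groups, k=2)
```

The same `cosine_similarity` helper could then be applied in step S7 between the latest short-term portrait and each news item in the selected newsgroups, which keeps the second, finer-grained comparison restricted to a small candidate set.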
The above are only preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims (20)

  1. A news recommendation method based on a user's short-term interest, characterized by comprising:
    Step S1: collecting users' behavior data on news, the behavior data including a news matrix;
    Step S2: obtaining the corresponding word vector matrix from the news matrix;
    Step S3: clustering the word vector matrix to obtain a grouping result for each news item, and assigning each news item to its corresponding newsgroup according to the grouping result;
    Step S4: obtaining a long-term portrait and a short-term portrait of each user from that user's long-term behavior data and short-term behavior data on each news item, respectively, where the long-term and short-term portraits characterize the user's preference for the word vectors corresponding to the words contained in the news;
    Step S5: analyzing the similarity between each user's long-term portrait and the different newsgroups to obtain a plurality of first similarities;
    Step S6: sorting the plurality of first similarities in descending order, and obtaining a first set number of newsgroups for each user based on the sorting result;
    Step S7: analyzing the second similarity between each user's latest short-term portrait and each news item in the first set number of newsgroups;
    Step S8: constructing a user-news bipartite graph according to the second similarity;
    Step S9: selecting the news to be recommended using an absorbing random walk method on the user-news bipartite graph, thereby obtaining the recommended news for each user.
  2. The news recommendation method based on a user's short-term interest according to claim 1, characterized in that, in step S3, the step of clustering the word vector matrix comprises:
    performing hierarchical clustering on the word vector matrix to obtain a hierarchical clustering dendrogram, one leaf node of which corresponds to one news item;
    obtaining the Dunn index corresponding to each clustering result of the hierarchical clustering, and cutting the hierarchical clustering dendrogram at the level corresponding to the maximum Dunn index to obtain an optimal hierarchical clustering dendrogram, in which the news items corresponding to leaf nodes belonging to the same parent node belong to the same newsgroup, thereby obtaining the newsgroup of each news item.
  3. The news recommendation method based on a user's short-term interest according to claim 2, characterized in that:
    in step S2, the word vector matrix is analyzed using a linear discriminant analysis method to obtain, for each news item, a topic probability matrix over multiple topics and, for each topic, a word probability matrix over the different word vectors; the topic value of each news item is obtained by combining that news item's topic probability matrix, word probability matrix, and word vector matrix, and the topic values of the news items form a topic matrix;
    in step S3, the word vector matrix is clustered to obtain a grouping result for each news item, and each news item is assigned to its corresponding newsgroup according to the grouping result, thereby obtaining, for each newsgroup, a topic vector formed by the topic values of its news items;
    in step S4, the linear discriminant analysis method is used as a language model for detecting latent topics to obtain the long-term portrait and short-term portrait of each user;
    in step S5, a vector similarity measure is used to determine the first similarity between the user's long-term portrait and each newsgroup;
    in step S7, a vector similarity measure is used to determine the second similarity between the user's short-term portrait and each of the first set number of newsgroups;
    in step S8, for each user, the newsgroups are sorted in descending order of second similarity and the top second set number of newsgroups are taken, giving each user's second set number of newsgroups; a user-news bipartite graph is constructed from each user and the news items in that user's second set number of newsgroups, where the weight of an edge of the bipartite graph is set according to the user's rating of the news: the higher the rating, the greater the edge weight.
  4. The news recommendation method based on a user's short-term interest according to claim 3, characterized in that:
    in step S1, the behavior data further includes a user matrix and a behavior matrix, the behavior matrix being the matrix formed by the behavior indicators of each user in the user matrix for each news item in the news matrix;
    in step S4, the method of obtaining the long-term portrait and short-term portrait of each user, using the linear discriminant analysis method as a language model for detecting latent topics, comprises:
    analyzing the word vector matrix using the linear discriminant analysis method to obtain, for each news item, a topic probability matrix over multiple topics and, for each topic, a word probability matrix over the different word vectors;
    obtaining the long-term portrait and short-term portrait from each news item's topic probability matrix and word probability matrix and from the behavior matrix according to the following formula, where the user's behavior indicator for a news item is used as the user's behavior indicator for each word vector in that news item:
    Figure PCTCN2019103700-appb-100001
    where $un_{ab}(c) = [un_{ab}, un_{ab}, \ldots, un_{ab}]^T$; $un_{ab}(c)$ denotes the long-term or short-term behavior vector of the a-th user for the c word vectors in the b-th news item; $z_{ab}$ is the long-term or short-term topic value of the a-th user for the b-th news item; $z_a = [z_{a1}, z_{a2}, \ldots, z_{ab}]$, where $z_a$ is the long-term or short-term portrait of the a-th user; $\theta_b$ is the topic probability matrix of the b-th news item; and
    Figure PCTCN2019103700-appb-100002
    is the word probability matrix of the b-th news item.
  5. The news recommendation method based on a user's short-term interest according to claim 4, characterized in that, in step S4, the short-term portrait is obtained by the following formula:
    Figure PCTCN2019103700-appb-100003
    in step S7, a similarity measure is used to determine the second similarity between the user's short-term portrait and each news item of each of the first set number of newsgroups;
    in step S8, for each user, the news items are sorted in descending order of second similarity and the top third set number of news items are taken, giving each user's third set number of news items; a user-news bipartite graph is constructed from each user and that user's third set number of news items, where the weight of an edge of the bipartite graph is set according to the user's rating of the news.
  6. The news recommendation method based on a user's short-term interest according to claim 5, characterized in that, in step S8, the user-news bipartite graph is constructed with the second similarity as the weight of the edges of the bipartite graph, with or without first sorting by the second similarity.
  7. The news recommendation method based on a user's short-term interest according to claim 3, characterized in that, in step S5, the step of using a similarity measure to determine the first similarity between the user's long-term portrait and each newsgroup comprises: obtaining the first similarity by the cosine similarity method:
    Figure PCTCN2019103700-appb-100004
    where $s_{m,n}$ denotes the similarity between the m-th long-term portrait and the n-th newsgroup, $(x_1, x_2, \ldots, x_b)$ is the topic vector of the m-th long-term portrait, and $(y_1, y_2, \ldots, y_b)$ is the topic vector of the n-th newsgroup.
  8. The news recommendation method based on a user's short-term interest according to claim 3, characterized in that, in step S2, the word vector matrix is analyzed using LDA, and the topic vector of each news item is obtained by the following formula:
    Figure PCTCN2019103700-appb-100005
    where $\theta_b$ is the topic probability matrix of the b-th news item,
    Figure PCTCN2019103700-appb-100006
    is the word probability matrix of the b-th news item, and $z'_b$ is the topic vector of the b-th news item;
    in step S7, the second similarity between each user's short-term portrait and each news item is obtained from the similarity between that short-term portrait and the topic vector of the news item.
  9. The news recommendation method based on a user's short-term interest according to claim 1, characterized in that the step of obtaining the long-term portrait and short-term portrait of each user from that user's long-term behavior data and short-term behavior data on each news item comprises:
    setting a time frame, one time frame being treated as the short term and the long term comprising multiple time frames;
    obtaining the user's portrait in each time frame from the user's behavior data, within that time frame, on the word vectors of the news, thereby obtaining the user's short-term portrait for each time frame;
    obtaining the user's long-term portrait by weighting the user's portraits of the individual time frames, where a short-term portrait closer to the time of analysis receives a greater weight.
  10. The news recommendation method based on a user's short-term interest according to claim 9, characterized in that the step of obtaining the user's long-term portrait by weighting the user's portraits of the individual time frames comprises:
    using a time equation to combine multiple short-term portraits of the user, by weighting, into the user's long-term portrait:
    Figure PCTCN2019103700-appb-100007
    where $P_u$ denotes the long-term portrait,
    Figure PCTCN2019103700-appb-100008
    denotes the short-term portrait corresponding to the g-th time frame $t_g$, $f(t)$ is the time equation $f(t) = e^{-\lambda t}$, and $\lambda$ is a constant parameter of the time equation.
  11. The news recommendation method based on a user's short-term interest according to claim 1, characterized in that: in step S4, the word vectors of each news item are used as tags, and the long-term portrait and short-term portrait are the user's preference weights for each tag; in step S5, a matrix similarity measure is used to determine the first similarity between the user's long-term portrait and each newsgroup; in step S7, a matrix similarity measure is used to determine the second similarity between the user's short-term portrait and each of the first set number of newsgroups; in step S8, for each user, the newsgroups are sorted in descending order of second similarity and the top second set number of newsgroups are taken, giving each user's second set number of newsgroups, and a user-news bipartite graph is constructed from each user and the news items in that user's second set number of newsgroups, where the weight of an edge of the bipartite graph is set according to the user's rating of the news: the higher the rating, the greater the weight.
  12. The news recommendation method based on a user's short-term interest according to claim 11, characterized in that: in step S7, a vector similarity measure is used to obtain the second similarity between the user's short-term portrait and each news item in the first set number of newsgroups; in step S8, for each user, the news items are sorted in descending order of second similarity and the top third set number of news items are taken, giving each user's third set number of news items, and a user-news bipartite graph is constructed from each user and that user's third set number of news items, where the weight of an edge of the bipartite graph is set according to the user's rating of the news.
  13. The news recommendation method based on a user's short-term interest according to claim 1, characterized in that, in step S9, the step of selecting the news to be recommended using an absorbing random walk method on the user-news bipartite graph comprises: first selecting an initial point, then jumping to a random point on the graph with probability p, the remaining probability 1-p being distributed among the adjacent points according to the edge weights; thereafter, at every step, jumping to a random point or to an adjacent point with the same probabilities; a transition matrix is used to calculate the jump probabilities, the jump probabilities stabilize after several iterations, and the news items with the highest transition probability are recommended.
  14. The news recommendation method based on a user's short-term interest according to claim 1, characterized in that step S9 comprises:
    in the user-news bipartite graph, treating each user as a node and each news item as a node, and obtaining the correlation values between the nodes using a random walk with restart method;
    obtaining, for each user, the adjacent set formed by the nodes adjacent to that user's node; forming each user's first sub-correlation matrix from the correlation values between any two nodes in the adjacent set; taking the reciprocal of the mean of the off-diagonal elements of the first sub-correlation matrix as each user's bridging value; and combining the bridging values of the user nodes in the adjacent set to form each user's bridging matrix;
    forming each user's second sub-correlation matrix from the correlation values between, on the one hand, the user node and the user nodes in its adjacent set and, on the other hand, the news nodes in the adjacent set;
    multiplying each user's bridging matrix by the second sub-correlation matrix to obtain the recommendation values of said news nodes;
    sorting the news nodes in descending order of recommendation value, and recommending the top set number of news items to the user.
  15. The news recommendation method based on a user's short-term interest according to claim 14, characterized in that the step of obtaining the correlation values between the nodes using a random walk with restart method comprises:
    taking one node as the starting node, using the vector formed by the second similarities between that node and the other nodes as the restart vector, and calculating the jump probabilities between the nodes of the bipartite graph;
    assembling the jump probabilities between the nodes into an adjacency matrix;
    iterating on the adjacency matrix until it converges, each element of the converged adjacency matrix being the correlation value between the starting node and one other node.
  16. A news recommendation apparatus based on a user's short-term interest, characterized by comprising:
    a collection module, which collects users' behavior data on news, the behavior data including a news matrix;
    a word vector matrix module, which obtains the corresponding word vector matrix from the news matrix;
    a clustering module, which clusters the word vector matrix to obtain a grouping result for each news item and assigns each news item to its corresponding newsgroup according to the grouping result;
    a user portrait obtaining module, which obtains a long-term portrait and a short-term portrait of each user from that user's long-term behavior data and short-term behavior data on each news item, respectively, where the long-term and short-term portraits characterize the user's preference for the word vectors corresponding to the words contained in the news;
    a first similarity obtaining module, which analyzes the similarity between each user's long-term portrait and the different newsgroups to obtain a plurality of first similarities;
    a preferred newsgroup obtaining module, which sorts the plurality of first similarities in descending order and obtains a first set number of newsgroups for each user based on the sorting result;
    a second similarity obtaining module, which analyzes the second similarity between each user's latest short-term portrait and each news item in the first set number of newsgroups;
    a bipartite graph construction module, which constructs a user-news bipartite graph according to the second similarity;
    a recommendation module, which selects the news to be recommended using an absorbing random walk method on the bipartite graph, thereby obtaining the recommended news for each user.
  17. The news recommendation apparatus based on a user's short-term interest according to claim 16, characterized in that the clustering module comprises:
    a hierarchical clustering unit, which performs hierarchical clustering on the word vector matrix from the word vector matrix module to obtain a hierarchical clustering dendrogram, one leaf node of which corresponds to one news item;
    a Dunn index obtaining unit, which obtains the Dunn index corresponding to each clustering result of the hierarchical clustering unit;
    a cutting unit, which cuts the hierarchical clustering dendrogram of the hierarchical clustering unit at the level corresponding to the maximum Dunn index obtained by the Dunn index obtaining unit, to obtain an optimal hierarchical clustering dendrogram;
    a news grouping unit, which assigns to the same newsgroup the news items corresponding to leaf nodes that belong to the same parent node in the optimal hierarchical clustering dendrogram produced by the cutting unit, thereby obtaining the newsgroup of each news item.
  18. The news recommendation apparatus based on a user's short-term interest according to claim 16, characterized by further comprising:
    a topic matrix construction module, which analyzes the word vector matrix using a linear discriminant analysis method to obtain, for each news item, a topic probability matrix over multiple topics and, for each topic, a word probability matrix over the different word vectors, and which obtains the topic value of each news item by combining that news item's topic probability matrix, word probability matrix, and word vector matrix, the topic values of the news items forming a topic matrix,
    where the clustering module obtains the topic vector of each newsgroup from the topic matrix constructed by the topic matrix construction module; the first similarity obtaining module uses a vector similarity measure to determine the first similarity between the user's long-term portrait and the topic vector of each newsgroup; and the second similarity obtaining module uses a vector similarity measure to determine the second similarity between the user's short-term portrait and each of the first set number of newsgroups.
  19. An electronic device, characterized by comprising a memory and a processor, the memory storing a news recommendation program based on a user's short-term interest which, when executed by the processor, implements the following steps:
    Step S1: collecting users' behavior data on news, the behavior data including a news matrix;
    Step S2: obtaining the corresponding word vector matrix from the news matrix;
    Step S3: clustering the word vector matrix to obtain a grouping result for each news item, and assigning each news item to its corresponding newsgroup according to the grouping result;
    Step S4: obtaining a long-term portrait and a short-term portrait of each user from that user's long-term behavior data and short-term behavior data on each news item, respectively, where the long-term and short-term portraits characterize the user's preference for the word vectors corresponding to the words contained in the news;
    Step S5: analyzing the similarity between each user's long-term portrait and the different newsgroups to obtain a plurality of first similarities;
    Step S6: sorting the plurality of first similarities in descending order, and obtaining a first set number of newsgroups for each user based on the sorting result;
    Step S7: analyzing the second similarity between each user's latest short-term portrait and each news item in the first set number of newsgroups;
    Step S8: constructing a user-news bipartite graph according to the second similarity;
    Step S9: selecting the news to be recommended using an absorbing random walk method on the user-news bipartite graph, thereby obtaining the recommended news for each user.
  20. A computer non-volatile readable storage medium, characterized in that the computer non-volatile readable storage medium includes a news recommendation program based on a user's short-term interest which, when executed by a processor, implements the steps of the news recommendation method based on a user's short-term interest according to any one of claims 1 to 15.
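
For illustration only, the time-weighted combination of short-term portraits recited in claims 9 and 10 can be sketched as follows. The time equation $f(t) = e^{-\lambda t}$ is taken from claim 10; everything else is an assumption made for this sketch, in particular that $t$ is the age of a time frame relative to the time of analysis, that the weighted sum is normalized by the sum of the weights, and the names `long_term_portrait`, `ages`, and `decay`.

```python
import math


def long_term_portrait(short_term_portraits, ages, decay=0.1):
    """Combine per-time-frame portraits into one long-term portrait.

    short_term_portraits: list of equal-length preference vectors.
    ages: age of each time frame relative to the analysis time (assumed
          meaning of t); a smaller age means a more recent frame.
    decay: the constant lambda in f(t) = exp(-lambda * t).
    """
    # More recent frames (smaller t) receive larger weights.
    weights = [math.exp(-decay * t) for t in ages]
    total = sum(weights)
    dim = len(short_term_portraits[0])
    # Normalising by the total weight is an assumption of this sketch.
    return [
        sum(w * p[i] for w, p in zip(weights, short_term_portraits)) / total
        for i in range(dim)
    ]


# Two hypothetical frames: a recent one (age 0) and an older one (age 10).
combined = long_term_portrait([[1.0, 0.0], [0.0, 1.0]], ages=[0, 10])
```

With these inputs the first component of `combined` is the larger one, reflecting that the more recent short-term portrait dominates the long-term portrait.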
PCT/CN2019/103700 2019-05-08 2019-08-30 News recommendation method and apparatus based on short-term interest of user, and electronic device and medium WO2020224128A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910379183.5A CN110275952A (en) 2019-05-08 2019-05-08 News recommended method, device and medium based on user's short-term interest
CN201910379183.5 2019-05-08

Publications (1)

Publication Number Publication Date
WO2020224128A1 true WO2020224128A1 (en) 2020-11-12

Family

ID=67959845

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/103700 WO2020224128A1 (en) 2019-05-08 2019-08-30 News recommendation method and apparatus based on short-term interest of user, and electronic device and medium

Country Status (2)

Country Link
CN (1) CN110275952A (en)
WO (1) WO2020224128A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112883292A (en) * 2021-02-06 2021-06-01 西北大学 User behavior recommendation model establishment and position recommendation method based on spatio-temporal information
CN114969566A (en) * 2022-06-27 2022-08-30 中国测绘科学研究院 Distance-measuring government affair service item collaborative filtering recommendation method

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112733006B (en) * 2019-10-14 2022-12-02 中国移动通信集团上海有限公司 User portrait generation method, device and equipment and storage medium
CN111062757B (en) * 2019-12-17 2023-09-01 山大地纬软件股份有限公司 Information recommendation method and system based on multipath optimizing matching
CN111444428B (en) * 2020-03-27 2022-08-30 腾讯科技(深圳)有限公司 Information recommendation method and device based on artificial intelligence, electronic equipment and storage medium
CN111680218B (en) * 2020-06-10 2023-08-11 网易传媒科技(北京)有限公司 User interest identification method and device, electronic equipment and storage medium
CN111680073A (en) * 2020-06-11 2020-09-18 天元大数据信用管理有限公司 Financial service platform policy information recommendation method based on user data
CN114817753B (en) * 2022-06-29 2022-09-09 京东方艺云(杭州)科技有限公司 Method and device for recommending art painting

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5782487B2 (en) * 2013-08-07 2015-09-24 日本電信電話株式会社 Action purpose extraction method and apparatus
CN106503014A (en) * 2015-09-08 2017-03-15 腾讯科技(深圳)有限公司 A kind of recommendation methods, devices and systems of real time information
CN107133290A (en) * 2017-04-19 2017-09-05 中国人民解放军国防科学技术大学 A kind of Personalized search and device
CN108197335A (en) * 2018-03-09 2018-06-22 中国人民解放军国防科技大学 Personalized query recommendation method and device based on user behaviors
CN108446350A (en) * 2018-03-09 2018-08-24 华中科技大学 A kind of recommendation method based on topic model analysis and user's length interest

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8589378B2 (en) * 2010-10-11 2013-11-19 Yahoo! Inc. Topic-oriented diversified item recommendation
CN103116639B (en) * 2013-02-20 2016-05-11 新浪网技术(中国)有限公司 Based on article recommend method and the system of user-article bipartite graph model
CN104572797A (en) * 2014-05-12 2015-04-29 深圳市智搜信息技术有限公司 Individual service recommendation system and method based on topic model
CN105022840B (en) * 2015-08-18 2018-06-05 新华网股份有限公司 A kind of news information processing method, news recommend method and relevant apparatus
CN105913296B (en) * 2016-04-01 2020-01-03 北京理工大学 Personalized recommendation method based on graph
CN108197211A (en) * 2017-12-28 2018-06-22 百度在线网络技术(北京)有限公司 A kind of information recommendation method, device, server and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112883292A (en) * 2021-02-06 2021-06-01 西北大学 User behavior recommendation model establishment and position recommendation method based on spatio-temporal information
CN112883292B (en) * 2021-02-06 2023-04-18 西北大学 User behavior recommendation model establishment and position recommendation method based on spatio-temporal information
CN114969566A (en) * 2022-06-27 2022-08-30 中国测绘科学研究院 Distance-measuring government affair service item collaborative filtering recommendation method

Also Published As

Publication number Publication date
CN110275952A (en) 2019-09-24

Similar Documents

Publication Publication Date Title
WO2020224128A1 (en) News recommendation method and apparatus based on short-term interest of user, and electronic device and medium
US11710054B2 (en) Information recommendation method, apparatus, and server based on user data in an online forum
US20180349384A1 (en) Differentially private database queries involving rank statistics
CN105224699B (en) News recommendation method and device
CN107862022B (en) Culture resource recommendation system
WO2021068610A1 (en) Resource recommendation method and apparatus, electronic device and storage medium
TWI636416B (en) Method and system for multi-phase ranking for content personalization
US8880548B2 (en) Dynamic search interaction
US20110258148A1 (en) Active prediction of diverse search intent based upon user browsing behavior
CN110503506B (en) Item recommendation method, device and medium based on grading data
CN109753601B (en) Method and device for determining click rate of recommended information and electronic equipment
JP2014500988A (en) Text set matching
CN105426514A (en) Personalized mobile APP recommendation method
CN105531701A (en) Personalized trending image search suggestion
CN106951527B (en) Song recommendation method and device
KR101590976B1 (en) Method and Apparatus for Collaborative Filtering of Matrix Localization by Using Semantic Clusters Generated from Linked Data
US11144783B2 (en) Servers, non-transitory computer-readable media and methods for providing articles
US20140297628A1 (en) Text Information Processing Apparatus, Text Information Processing Method, and Computer Usable Medium Having Text Information Processing Program Embodied Therein
CN112825089B (en) Article recommendation method, device, equipment and storage medium
CN111967914A (en) User portrait based recommendation method and device, computer equipment and storage medium
CN106354867A (en) Multimedia resource recommendation method and device
JP5424393B2 (en) Word theme relevance calculation device, word theme relevance calculation program, and information search device
Bendimerad et al. User-driven geolocated event detection in social media
Lorenz-Spreen et al. Tracking online topics over time: understanding dynamic hashtag communities
CN108446378B (en) Method, system and computer storage medium based on user search

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19927848; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 19927848; Country of ref document: EP; Kind code of ref document: A1)