CN105913296B

CN105913296B - Personalized recommendation method based on graph

Info

Publication number: CN105913296B
Application number: CN201610202059.8A
Authority: CN
Inventors: 胡晶晶; 刘琳竹; 薛静锋; 单纯; 段智伟
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2016-04-01
Filing date: 2016-04-01
Publication date: 2020-01-03
Anticipated expiration: 2036-04-01
Also published as: CN105913296A

Abstract

The invention provides a graph-based personalized recommendation method that effectively reduces the influence of data sparsity on recommendation quality. Step one: a latent factor model is applied to the users' historical rating records to obtain latent relations among users and among items. Step two: the latent relations from step one are used to calculate user-user and item-item similarities, and graphs are constructed among similar users and among similar items. Step three: a user-item graph model is constructed from the user graph and item graph obtained in step two, together with the user-item bipartite graph obtained from the users' historical rating records. Step four: the random-walk-based PersonalRank algorithm computes visit probabilities for the items the user has not rated. These items are sorted in descending order of probability, and the first N form the user's recommendation list.

Description

Personalized recommendation method based on graph

Technical Field

The invention belongs to the technical field of recommendation methods, and relates to a personalized recommendation method based on a graph.

Background

A recommendation system (RS) proactively recommends goods or items to a user based on that user's preferences. It mines the user's historical data to infer interests and pushes items the user is likely to find interesting. A good recommendation system can bring merchants considerable economic benefit.

A complete recommendation system must contain three elements: a user model, a recommendation-object model and a recommendation algorithm. The recommendation algorithm is the core of the system. The more mature recommendation algorithms currently include collaborative-filtering-based recommendation, latent factor models, graph-model-based recommendation and hybrid recommendation.

Compared with a similar existing invention, the present invention differs in three main respects:

(1) Similarity calculation. That invention calculates the similarity between users from tag annotation information. The present invention instead calculates user and item similarities from the latent factors obtained with the latent factor model.

(2) Graph model construction. That invention constructs a weighted undirected graph model and does not consider the similarity between items. The present invention constructs unweighted undirected graph models and establishes direct links between similar items.

(3) Running speed. That patent does not consider the running speed of the random walk on the graph. The present invention uses CUDA to parallelize the PersonalRank algorithm.

Latent semantic models: latent semantic analysis can discover latent topics or categories and use them to establish relations between features. Common latent semantic analysis techniques include LFM, LSI, LDA and topic models. These techniques were originally proposed for text mining and have since been applied successfully in other fields.

Graph-model-based recommendation: user behavior can be represented as a bipartite graph. Recommending items to user u then becomes a matter of measuring the relevance between the user vertex v_u and the item vertices that have no direct edge to v_u. The higher an item's relevance, the higher its weight in the recommendation list.

A pair of vertices with high relevance generally has the following characteristics:

1) many paths connect the two vertices;

2) the paths connecting the two vertices are short;

3) the paths connecting the two vertices do not pass through vertices with large out-degree.

Researchers have devised many methods to calculate the relevance of vertices in a graph. One of them is the random-walk-based PersonalRank algorithm.

CUDA is a general-purpose parallel computing architecture based on the graphics processing unit (GPU), introduced by NVIDIA in 2006. Its idea is to exploit the respective strengths of the CPU and the GPU within one application, using the GPU to solve complex computational problems at high speed. Parallel programs can be written in a high-level language similar to C. CUDA is widely used for numerical computation in many fields.

CUDA programming distinguishes a host side (Host) and a device side (Device), and one system can contain one host and several devices. In this programming model the CPU and the GPU cooperate. The CPU handles logic-heavy transaction processing and serial computation, while the GPU performs highly threaded parallel tasks. The host program runs serially on the CPU. When it reaches a kernel function (Kernel), it invokes the GPU, which executes the kernel with many threads in parallel.

Existing recommendation systems mainly suffer from sparsity, cold-start and scalability problems. Sparsity arises because data sets are large while the overlap between any two users is small. The rating data are therefore sparse, and the recommendation quality of the algorithm degrades severely. Scalability problems arise because the computation time of the recommendation algorithm grows sharply as the numbers of users and items increase.

Disclosure of Invention

The invention provides a graph-based personalized recommendation method. It uses the results of a latent factor model to build a graph among users and a graph among items, establishing as many latent relations among users and among items as possible. This effectively reduces the influence of sparsity on the recommendation effect.

The invention is realized by the following technical scheme:

A graph-based personalized recommendation method comprises the following steps:

Step one: apply a latent factor model to the users' historical rating records to obtain latent relations among users and among items.

Step two: use the latent relations obtained in step one to calculate user-user and item-item similarities, and construct graphs among similar users and among similar items. Users and items are nodes, and an edge is created between two users (or two items) whose similarity exceeds a preset threshold, until the user graph model and the item graph model are complete. A connection graph between users and items is also constructed from the historical rating records.

Step three: construct a user-item graph model from the user graph model and item graph model obtained in step two, together with the user-item bipartite graph obtained from the users' historical rating records.

Step four: use the random-walk-based PersonalRank algorithm to compute the visit probabilities of the items the user has not rated. Sort these items in descending order of probability and take the first N to form the user's recommendation list.

Further, a matrix-based solution method combined with CUDA parallelization is used to increase the running speed of the PersonalRank algorithm.

Detailed Description

The invention is further described below.

(1) Latent semantic analysis

The invention adopts a latent factor model (LFM). Its main idea is to express the user-item rating matrix as the product of two low-dimensional matrices. First the users' historical rating records on items are collected. The LFM is then used to model them, giving the model shown in the following figure, in which $R \approx PQ$.

R is the user-item matrix. Its element R_ij represents the interest of user i in item j, which is the value to be predicted. The LFM extracts a number of classes from the users' ratings of items. These classes act as a bridge between users and items, and R is represented as the product of the matrices P and Q.

P is the user-class matrix; its element P_ij represents the interest of user i in class j. Q is the class-item matrix; its element Q_ij represents the weight of item j in class i. The higher the weight, the more representative the item is of the class. The LFM calculates the interest of user u in item i as $\mathrm{Preference}(u,i) = r_{ui} = p_u^{T} q_i = \sum_{f=1}^{F} p_{u,f}\, q_{i,f}$.

The parameters in P and Q are obtained by minimizing the following loss function with stochastic gradient descent, where K is the set of observed (user, item) rating pairs: $C = \sum_{(u,i) \in K} \left[ \left( r_{ui} - \sum_{f=1}^{F} p_{u,f}\, q_{i,f} \right)^2 + \lambda \left( \lVert p_u \rVert^2 + \lVert q_i \rVert^2 \right) \right]$.

Here $\lambda \left( \lVert p_u \rVert^2 + \lVert q_i \rVert^2 \right)$ is a regularization term that prevents overfitting. The number of latent features F and the regularization parameter λ must be determined experimentally.
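As a hedged illustration of step (1), the following C sketch fits P and Q to a handful of ratings. It runs stochastic gradient descent on the regularized squared loss described above. The names `Rating`, `lfm_predict`, `lfm_loss` and `lfm_train`, the fixed number of factors F = 2, and the array limits are assumptions of this sketch, not values taken from the invention.

```c
#include <assert.h>

#define F 2           /* number of latent factors (assumed for the sketch) */
#define MAX_USERS 8
#define MAX_ITEMS 8

/* One observed rating: user u gave item i the score r. */
typedef struct { int u; int i; double r; } Rating;

/* Predicted interest of user u in item i: dot product of p_u and q_i. */
double lfm_predict(double p[][F], double q[][F], int u, int i) {
    double s = 0.0;
    for (int f = 0; f < F; ++f) s += p[u][f] * q[i][f];
    return s;
}

/* Regularized squared loss summed over the observed ratings. */
double lfm_loss(double p[][F], double q[][F], const Rating *data, int n, double lambda) {
    double loss = 0.0;
    for (int k = 0; k < n; ++k) {
        int u = data[k].u, i = data[k].i;
        double e = data[k].r - lfm_predict(p, q, u, i);
        loss += e * e;
        for (int f = 0; f < F; ++f)
            loss += lambda * (p[u][f] * p[u][f] + q[i][f] * q[i][f]);
    }
    return loss;
}

/* SGD: for each rating, move p_u and q_i against the gradient of its loss term.
   The constant factor 2 of the gradient is folded into the learning rate lr. */
void lfm_train(double p[][F], double q[][F], const Rating *data, int n,
               int epochs, double lr, double lambda) {
    for (int ep = 0; ep < epochs; ++ep) {
        for (int k = 0; k < n; ++k) {
            int u = data[k].u, i = data[k].i;
            assert(u < MAX_USERS && i < MAX_ITEMS);
            double e = data[k].r - lfm_predict(p, q, u, i);
            for (int f = 0; f < F; ++f) {
                double pu = p[u][f], qi = q[i][f];   /* use old values for both updates */
                p[u][f] += lr * (e * qi - lambda * pu);
                q[i][f] += lr * (e * pu - lambda * qi);
            }
        }
    }
}
```

Suppose P and Q are initialized with small nonzero values and `lfm_train(p, q, data, n, 200, 0.05, 0.01)` is called. The loss reported by `lfm_loss` should then drop well below its starting value. The rows of the trained `p` (and `q`) are the latent vectors used for the similarity step.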

This step uses a latent factor model, which clusters items automatically from statistics of user behavior, that is, from the data itself. Latent semantic analysis has the following four advantages:

1) Its classes come from statistics on user behavior and therefore reflect the users' own view of how items should be classified.

2) The classification granularity can be controlled: the larger the number of classes, the finer the granularity, and the smaller the number, the coarser the granularity.

3) The weight of each item in each class is determined from user-behavior statistics, so no item is rigidly assigned to a single class.

4) The classes it produces are not restricted to a single dimension; they are computed entirely from the users' historical data.

(2) Similarity calculation

In this step, user-user and item-item similarities are calculated from the matrices P and Q obtained in step (1), and graphs are constructed among similar users and among similar items. Users (items) are nodes, and an edge is created between two users (items) whose similarity exceeds a given threshold.

Similarity is computed from the Euclidean distance. For two n-dimensional vectors a = (x11, x12, … x1n) and b = (x21, x22, … x2n), the Euclidean distance is $d(a,b) = \sqrt{\sum_{k=1}^{n} (x_{1k} - x_{2k})^2}$.
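The following is a minimal C sketch of the thresholded similarity graph. It assumes the latent vectors are stored row-wise: rows of P for users, and rows of the transpose of Q for items. Any similarity that decreases with Euclidean distance turns "similarity above a threshold" into "distance below a threshold". The sketch therefore compares squared distances against a squared distance threshold `max_dist`, which avoids a square root. The names `MAX_NODES`, `squared_distance` and `build_similarity_graph` are hypothetical.

```c
#include <assert.h>

#define MAX_NODES 16

/* Squared Euclidean distance between two dim-dimensional vectors. */
double squared_distance(const double *a, const double *b, int dim) {
    double s = 0.0;
    for (int k = 0; k < dim; ++k) {
        double d = a[k] - b[k];
        s += d * d;
    }
    return s;
}

/* Build an unweighted, undirected graph over n vectors (row-major, n x dim).
   An edge a-b exists when the Euclidean distance is at most max_dist, which
   corresponds to "similarity above a threshold" for a distance-decreasing similarity. */
void build_similarity_graph(const double *vecs, int n, int dim, double max_dist,
                            int adj[MAX_NODES][MAX_NODES]) {
    assert(n <= MAX_NODES);
    for (int a = 0; a < n; ++a) {
        adj[a][a] = 0;                       /* no self-loops */
        for (int b = a + 1; b < n; ++b) {
            int edge = squared_distance(vecs + a * dim, vecs + b * dim, dim)
                       <= max_dist * max_dist;
            adj[a][b] = adj[b][a] = edge;    /* undirected: keep the matrix symmetric */
        }
    }
}
```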

(3) User-item graph model based recommendations

A user-item graph model is constructed from the user graph model and item graph model obtained in step (2), together with the user-item bipartite graph obtained from the users' historical rating records. The PersonalRank algorithm is then used to predict the items the user may be interested in.
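The combined graph of step (3) can be sketched in C as a single 0/1 adjacency matrix. The node layout (users 0..nu-1 followed by items nu..nu+ni-1), the fixed-size arrays and the function name are assumptions of this sketch. Each observed rating adds one undirected user-item edge, consistent with the unweighted graphs described above.

```c
#include <assert.h>

#define MAX_V 16

/* Assemble the user-item graph g (0/1 adjacency, undirected).
   Assumed node layout: users 0..nu-1, then items nu..nu+ni-1. */
void build_user_item_graph(int nu, int ni,
                           int user_adj[MAX_V][MAX_V], int item_adj[MAX_V][MAX_V],
                           const int *rated_u, const int *rated_i, int n_ratings,
                           int g[MAX_V][MAX_V]) {
    assert(nu + ni <= MAX_V);
    for (int a = 0; a < nu + ni; ++a)
        for (int b = 0; b < nu + ni; ++b) g[a][b] = 0;
    /* user-user edges from the user similarity graph */
    for (int a = 0; a < nu; ++a)
        for (int b = 0; b < nu; ++b) g[a][b] = user_adj[a][b];
    /* item-item edges from the item similarity graph, shifted past the user block */
    for (int a = 0; a < ni; ++a)
        for (int b = 0; b < ni; ++b) g[nu + a][nu + b] = item_adj[a][b];
    /* user-item edges from the historical rating records (the bipartite part) */
    for (int k = 0; k < n_ratings; ++k) {
        int u = rated_u[k], v = nu + rated_i[k];
        g[u][v] = g[v][u] = 1;
    }
}
```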

The idea of the algorithm is that a random walk for user u starts from the starting node v_u. At each node reached, the walk first decides, according to the probability α, whether to continue. If it does not continue, it terminates and restarts from v_u. If it continues, the next node is chosen uniformly at random from the nodes the current node points to. After enough walking, the probability that each item node is visited converges to a fixed value, which is used as that node's final visit probability. Formally, $PR(i) = (1-d)\, r_i + d \sum_{j \in in(i)} \frac{PR(j)}{|out(j)|}$, where $r_i = 1$ if $i = v_u$ and $r_i = 0$ otherwise.

Here d is the probability of continuing the walk (the α above), |out(i)| is the out-degree of node i, and PR(i) is the visit probability of node i.
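The node-wise iteration can be sketched in C for an unweighted, undirected adjacency matrix. This is a hedged illustration, not the invention's implementation. The name `personal_rank`, the fixed iteration count and `MAX_V` are assumptions, and `alpha` plays the role of d in the formula. Each round gives the root 1 - alpha and spreads alpha·PR(j)/|out(j)| from every node j to its neighbours.

```c
#include <assert.h>

#define MAX_V 16

/* Iterative PersonalRank on a 0/1 adjacency matrix g with n nodes.
   alpha = probability of continuing the walk; root = v_u; result in pr. */
void personal_rank(int g[MAX_V][MAX_V], int n, int root, double alpha,
                   int iters, double pr[MAX_V]) {
    assert(n <= MAX_V && root >= 0 && root < n);
    int deg[MAX_V];
    double next[MAX_V];
    for (int v = 0; v < n; ++v) {
        deg[v] = 0;
        for (int w = 0; w < n; ++w) deg[v] += g[v][w];   /* out-degree |out(v)| */
        pr[v] = (v == root) ? 1.0 : 0.0;                 /* all mass starts at v_u */
    }
    for (int t = 0; t < iters; ++t) {
        /* restart term: (1 - alpha) goes back to the root */
        for (int v = 0; v < n; ++v) next[v] = (v == root) ? 1.0 - alpha : 0.0;
        /* walk term: each node passes alpha * PR / out-degree to every neighbour */
        for (int v = 0; v < n; ++v) {
            if (deg[v] == 0) continue;                   /* isolated node: nothing to pass on */
            for (int w = 0; w < n; ++w)
                if (g[v][w]) next[w] += alpha * pr[v] / deg[v];
        }
        for (int v = 0; v < n; ++v) pr[v] = next[v];
    }
}
```

On the path graph 0-1-2 with root 0 and alpha = 0.8, the values converge to 17/45, 4/9 and 8/45. Because every node has at least one edge, they sum to 1.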

Graph-based recommendation models the relation between users and items intuitively and naturally yields a Top-N result set. However, the algorithm iterates over the whole graph, and every vertex's PR value must converge, so its time complexity is high.

PersonalRank is therefore rewritten as a matrix operation instead of being computed iteratively. Let M be the transition probability matrix of the graph, i.e. $M(v, v') = \frac{1}{|out(v)|}$ if $v' \in out(v)$, and $M(v, v') = 0$ otherwise.

The iterative formula can then be written in matrix form:

$r = (1-\alpha)\, r_0 + \alpha M^{T} r$

Solving for r gives:

$r = (1-\alpha)\,(I - \alpha M^{T})^{-1} r_0$
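The closed form follows from the iteration by collecting the r terms. The rearrangement below is standard algebra. As a supporting note, each row of M sums to at most 1, so $\lVert \alpha M^{T} \rVert_1 \le \alpha < 1$ for $0 < \alpha < 1$, which guarantees that $I - \alpha M^{T}$ is invertible.

```latex
\begin{aligned}
r &= (1-\alpha)\, r_0 + \alpha M^{T} r \\
(I - \alpha M^{T})\, r &= (1-\alpha)\, r_0 \\
r &= (1-\alpha)\,(I - \alpha M^{T})^{-1} r_0
\end{aligned}
```

For user u, $r_0$ is the indicator vector of $v_u$, so r is (1 - α) times column $v_u$ of the inverse. This is why a single inversion can serve every user.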

Hence $(I - \alpha M^{T})^{-1}$ needs to be computed only once, but this requires fast inversion of the sparse matrix $I - \alpha M^{T}$. The invention therefore combines CUDA-based parallel programming with the Gauss-Jordan algorithm to compute $(I - \alpha M^{T})^{-1}$. This addresses the running-speed problem of computing PersonalRank by repeated iteration.

The idea of the Gauss-Jordan algorithm is as follows. To invert a matrix A, place the identity matrix I to its right, forming [A | I]. Then apply elementary row operations to the whole until the left half becomes the identity matrix. The right half is then $A^{-1}$.
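The following is a serial C sketch of Gauss-Jordan inversion on the augmented matrix [A | I]. It is organized into the same three steps as the device-side kernels: row exchange, row normalization and column elimination. It is an illustration under assumptions (dense storage, `N_MAX`, the function name), not the CUDA implementation. Like the description, it swaps rows only when a pivot is exactly zero; general-purpose code would normally use partial pivoting. For $I - \alpha M^{T}$ with $0 < \alpha < 1$ and no self-loops, the matrix is strictly diagonally dominant by columns, so in exact arithmetic a zero pivot should not arise.

```c
#include <assert.h>

#define N_MAX 8

/* Invert the n x n row-major matrix a into inv via Gauss-Jordan on [A | I].
   Returns 1 on success, 0 if the matrix is singular. */
int gauss_jordan_inverse(const double *a, int n, double *inv) {
    assert(n <= N_MAX);
    double aug[N_MAX][2 * N_MAX];
    int cols = 2 * n;
    for (int r = 0; r < n; ++r)            /* build [A | I] */
        for (int c = 0; c < cols; ++c)
            aug[r][c] = (c < n) ? a[r * n + c] : (c - n == r ? 1.0 : 0.0);
    for (int k = 0; k < n; ++k) {
        /* step a (cf. rowExchange): if the pivot is 0, swap in a lower row
           whose entry in column k is nonzero */
        if (aug[k][k] == 0.0) {
            int s = k + 1;
            while (s < n && aug[s][k] == 0.0) ++s;
            if (s == n) return 0;          /* no usable pivot: singular */
            for (int c = 0; c < cols; ++c) {
                double t = aug[k][c]; aug[k][c] = aug[s][c]; aug[s][c] = t;
            }
        }
        /* step b (cf. fixRows): divide row k by its pivot so the diagonal becomes 1 */
        double piv = aug[k][k];
        for (int c = 0; c < cols; ++c) aug[k][c] /= piv;
        /* step c (cf. fixColumns): zero column k in every other row */
        for (int r = 0; r < n; ++r) {
            if (r == k) continue;
            double factor = aug[r][k];
            for (int c = 0; c < cols; ++c) aug[r][c] -= factor * aug[k][c];
        }
    }
    for (int r = 0; r < n; ++r)            /* right half now holds A^-1 */
        for (int c = 0; c < n; ++c) inv[r * n + c] = aug[r][n + c];
    return 1;
}
```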

The following is the detailed design of the CUDA-parallelized PersonalRank:

The host-side pseudocode is as follows:

a) Allocate threads (integer ceiling division gives enough blocks to cover every row element or matrix element):

dim3 thread(threads);

dim3 rBlock((columns + threads - 1) / threads);

dim3 cBlock((size * columns + threads - 1) / threads);

b) Allocate GPU global memory for the augmented matrix [$I - \alpha M^{T}$ | I], which has size rows and columns = 2 * size:

cudaMalloc((void**)&devMatrix, size * columns * sizeof(float));

c) Copy the augmented matrix (the $I - \alpha M^{T}$ matrix and the identity matrix I) from host memory to device memory:

cudaMemcpy(devMatrix, matrix, size * columns * sizeof(float), cudaMemcpyHostToDevice);

d) Compute the inverse of $I - \alpha M^{T}$ by running the device-side kernels described below for each pivot row. Afterwards, the right half of devMatrix holds $(I - \alpha M^{T})^{-1}$.

e) Copy the matrix from device memory back to host memory:

cudaMemcpy(matrix, devMatrix, size * columns * sizeof(float), cudaMemcpyDeviceToHost);

f) Release the GPU global memory:

cudaFree(devMatrix);

The device-side pseudocode is as follows, for the current pivot row:

a) rowExchange() is called in the kernel: if the diagonal element is 0, the row is swapped with a row whose element in that column is not 0.

b) FixRows() is called to divide the whole row by its diagonal element, so that the diagonal element becomes 1.

c) fixColumns() is called to make the other elements of the pivot column 0.
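The `cBlock` grid in the host code launches one thread per element of the size × columns augmented matrix. The serial C sketch below simulates such a launch for the column-elimination step. It shows the index mapping and one pitfall: GPU threads run concurrently, so the multipliers in pivot column k are copied before any thread overwrites them. The function name, the `n_threads_total` parameter (standing in for the rounded-up grid) and the 64-row limit are hypothetical, and this is not claimed to be the invention's kernel.

```c
#include <assert.h>

#define MAX_ROWS 64

/* Simulate a one-thread-per-element fixColumns launch on a row-major
   size x columns augmented matrix m, after row k has been normalized. */
void fix_columns_simulated(double *m, int size, int columns, int k, int n_threads_total) {
    assert(size <= MAX_ROWS && k >= 0 && k < size);
    /* Snapshot the pivot column first: in a concurrent launch, the thread that
       zeroes m[r][k] could otherwise race with threads in row r that still read it. */
    double factor[MAX_ROWS];
    for (int r = 0; r < size; ++r) factor[r] = m[r * columns + k];
    for (int idx = 0; idx < n_threads_total; ++idx) {   /* each iteration = one thread */
        if (idx >= size * columns) continue;            /* surplus threads from the rounded-up grid */
        int r = idx / columns, c = idx % columns;       /* thread index -> (row, column) */
        if (r == k) continue;                           /* pivot row is left unchanged */
        m[r * columns + c] -= factor[r] * m[k * columns + c];
    }
}
```

Without the snapshot, the element in column k of each row would be zeroed before the rest of that row had used it, which is exactly the kind of error a concurrent kernel must avoid.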

(4) Top-N recommendation

For each user, the unrated items obtained in step (3) are sorted in descending order of visit probability. The top N items form the recommendation list for that user.
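Step (4) reduces to a per-user selection. The C sketch below uses hypothetical names and a fixed 256-item limit. It repeatedly picks the highest-probability item that the user has not rated. It runs in O(N × items) time, which is fine for illustration; a heap or partial sort would suit large catalogs.

```c
#include <assert.h>

/* Write up to n_top unrated item indices into out, in descending order of
   visit probability; returns how many were written. rated[i] != 0 marks items
   the user has already rated. Ties keep the lower index. */
int top_n(const double *prob, const int *rated, int n_items, int n_top, int *out) {
    int used[256] = {0};
    int chosen = 0;
    assert(n_items <= 256);
    while (chosen < n_top) {
        int best = -1;
        for (int i = 0; i < n_items; ++i) {
            if (rated[i] || used[i]) continue;          /* skip rated or already chosen */
            if (best < 0 || prob[i] > prob[best]) best = i;
        }
        if (best < 0) break;                            /* fewer than n_top unrated items */
        used[best] = 1;
        out[chosen++] = best;
    }
    return chosen;
}
```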

Claims

1. A personalized recommendation method based on a graph is characterized by comprising the following steps:

Step four: use the random-walk-based PersonalRank algorithm to predict the items the user may be interested in. Sort the items the user has not rated in descending order of visit probability, and take the first N items to form the user's recommendation list.

2. The method according to claim 1, further comprising combining a matrix-based solution method with CUDA parallelization to increase the running speed of the PersonalRank algorithm.

3. The graph-based personalized recommendation method according to claim 1 or 2, characterized in that the PersonalRank algorithm starts a random walk for user u from the starting node v_u. At each node reached, it first decides, according to the probability α, whether to continue the walk or to terminate it and restart from v_u. If it continues, the next node is chosen uniformly at random from the nodes the current node points to. As the walk proceeds, the probability that each item node is visited converges to a fixed value, which is used as that item node's final visit probability.