CN105913296B - Personalized recommendation method based on graph - Google Patents

Info

Publication number
CN105913296B
CN105913296B CN201610202059.8A CN201610202059A CN105913296B CN 105913296 B CN105913296 B CN 105913296B CN 201610202059 A CN201610202059 A CN 201610202059A CN 105913296 B CN105913296 B CN 105913296B
Authority
CN
China
Prior art keywords
articles
user
users
graph
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610202059.8A
Other languages
Chinese (zh)
Other versions
CN105913296A (en)
Inventor
胡晶晶
刘琳竹
薛静锋
单纯
段智伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN201610202059.8A
Publication of CN105913296A
Application granted
Publication of CN105913296B
Legal status: Active
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a graph-based personalized recommendation method that effectively reduces the influence of data sparsity on recommendation quality. Step one: apply a latent factor model (LFM) to the users' historical rating records to obtain the latent relations between users and between items. Step two: use the latent relations obtained in step one to calculate the similarity between users and the similarity between items, and construct a graph among similar users and a graph among similar items. Step three: construct a user-item graph model from the user graph model and the item graph model obtained in step two together with the user-item bipartite graph derived from the users' historical rating records. Step four: use the random-walk-based PersonalRank algorithm to obtain the access probabilities of the items the user has not rated, sort them in descending order, and take the first N items as the recommendation list for the user.

Description

Personalized recommendation method based on graph
Technical Field
The invention belongs to the technical field of recommendation methods, and relates to a personalized recommendation method based on a graph.
Background
A recommendation system (RS) actively recommends commodities or items to a user according to the user's preferences. It mines the user's historical data to discover interest preferences and then pushes items the user may be interested in to that specific user. A good recommendation system can bring considerable economic benefits to merchants.
A complete recommendation system comprises three elements: a user model, a recommendation object model and a recommendation algorithm, of which the recommendation algorithm is the core. The more mature recommendation algorithms at present include collaborative filtering, latent factor models, graph-model-based recommendation and combined (hybrid) recommendation.
A similar prior invention exists. In summary, the present invention differs from it in three respects:
(1) Similarity calculation. The prior invention calculates the similarity between users from tag-annotation information.
(2) Graph model construction. The prior invention constructs a weighted undirected graph model and does not consider the similarity among items, whereas the present invention constructs unweighted undirected graph models and establishes direct connections among similar items.
(3) Running speed. The prior patent does not consider the running speed of the random walk on the graph, whereas the present invention uses CUDA to parallelize the PersonalRank algorithm.
Latent factor model: latent semantic analysis techniques can discover latent topics or categories and establish relations between features through them. Common latent semantic analysis techniques include LFM, LSI, LDA and topic models. These techniques were originally proposed in the field of text mining; in recent years they have been applied to other fields with good results.
Recommendation algorithm based on a graph model: user behavior can be represented by a bipartite graph, and the task of recommending items to user u can be converted into measuring the relevance between the user vertex v_u and the item vertices that are not directly connected to v_u by an edge. The higher the relevance of an item, the higher its weight in the recommendation list.
In general, a pair of highly relevant vertices has the following characteristics:
1) there are many paths connecting the two vertices;
2) the paths connecting the two vertices are short;
3) the paths connecting the two vertices do not pass through vertices with a large out-degree.
Researchers have devised many ways to calculate the relevance of vertices in graphs. Among them is the personalrank algorithm based on random walk.
CUDA is a general-purpose parallel computing architecture for graphics processing units (GPUs), introduced by NVIDIA in 2006. Its idea is to exploit the respective strengths of the CPU and the GPU within one application, in particular the GPU's ability to solve computation-heavy problems at high speed. Parallel programs can be written in a high-level language similar to C, and CUDA is widely used for numerical computation in many fields.
CUDA programming is divided into a host side (Host) and a device side (Device); one system can contain one host and several devices. In this programming model the CPU and the GPU cooperate: the CPU handles logic-heavy transaction processing and serial computation, while the GPU is dedicated to highly threaded parallel tasks. The host-side program runs serially on the CPU; when it reaches a kernel function (Kernel), the GPU is invoked to execute it with many threads in parallel.
Existing recommendation systems mainly suffer from sparsity, cold-start and scalability problems. The sparsity problem arises because data sets are large while the overlap between the items rated by any two users is small, so the rating data are sparse and the recommendation quality of the algorithm drops severely. The scalability problem arises because the computation time of the recommendation algorithm grows sharply as the numbers of users and items increase.
Disclosure of Invention
The invention provides a graph-based personalized recommendation method that uses the results of a latent factor model to build a graph model among users and a graph model among items, capturing as many latent relations among users and among items as possible and thereby effectively reducing the influence of sparsity on recommendation quality.
The invention is realized by the following technical scheme:
A graph-based personalized recommendation method comprises the following steps:
Step one: apply a latent factor model to the users' historical rating records to obtain the latent relations between users and between items.
Step two: use the latent relations obtained in step one to calculate the similarity between users and the similarity between items, and construct graphs among similar users and among similar items. Users and items serve as nodes, and an edge is established whenever the similarity between two users (or between two items) is higher than a preset threshold, until the graph models among users and among items are complete; a graph connecting users and items is constructed from the historical rating records.
Step three: construct a user-item graph model from the user graph model and the item graph model obtained in step two together with the user-item bipartite graph derived from the users' historical rating records.
Step four: use the random-walk-based PersonalRank algorithm to obtain the access probabilities of the items the user has not rated, sort them in descending order, and take the first N items as the recommendation list for the user.
Further, a matrix-based solving method combined with CUDA parallelization is adopted to improve the running speed of the PersonalRank algorithm.
Detailed Description
The invention is further described below.
(1) Implicit semantic analysis
The invention adopts a latent factor model (LFM), whose main idea is to express the users' item rating matrix as the product of two low-dimensional matrices. First, the users' historical rating records for items are collected; the LFM is then used to model them, giving the model shown in the following figure:
[Figure not reproduced: LFM model of the rating matrix]
The R matrix is the user-item matrix; its entry $R_{ij}$ represents the interest of user i in item j and is the value to be computed. The LFM algorithm extracts several classes from the users' ratings of items and uses these classes as a bridge between users and items, so that the R matrix is represented as the product of a P matrix and a Q matrix:
$$R = P \times Q$$
Here P is the user-class matrix, whose entry $P_{ij}$ represents the interest of user i in class j, and Q is the class-item matrix, whose entry $Q_{ij}$ represents the weight of item j in class i; the higher the weight, the more representative the item is of that class. The LFM calculates the interest of user u in item i from the row of P for user u and the column of Q for item i.
The parameter values in matrix P and matrix Q are obtained by minimizing a loss function with stochastic gradient descent.
In the loss function, $\lambda(\lVert p_u \rVert^2 + \lVert q_i \rVert^2)$ is a regularization term that prevents overfitting. The number of latent features F and the regularization parameter λ must be determined experimentally.
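The formulas for the predicted interest and for the loss function are not present in the text of this step. As a hedged reconstruction, the standard LFM forms consistent with the symbols $p_u$, $q_i$, F and λ used here would read as follows. The observed rating $r_{ui}$, the prediction $\hat{r}_{ui}$, the set K of observed user-item pairs, the error $e_{ui}$ and the learning rate η are notation assumed for this sketch, not taken from the source.

```latex
\begin{aligned}
\hat{r}_{ui} &= p_u^{\top} q_i = \sum_{f=1}^{F} P_{u,f}\, Q_{f,i} \\
C &= \sum_{(u,i)\in K} \Big[ \big(r_{ui} - \hat{r}_{ui}\big)^2
     + \lambda\big(\lVert p_u\rVert^2 + \lVert q_i\rVert^2\big) \Big] \\
e_{ui} &= r_{ui} - \hat{r}_{ui} \\
P_{u,f} &\leftarrow P_{u,f} + \eta\,\big(e_{ui}\, Q_{f,i} - \lambda\, P_{u,f}\big) \\
Q_{f,i} &\leftarrow Q_{f,i} + \eta\,\big(e_{ui}\, P_{u,f} - \lambda\, Q_{f,i}\big)
\end{aligned}
```

The last two lines are the stochastic gradient descent updates obtained by differentiating C with respect to one entry of P and Q, with the constant factor 2 absorbed into η.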
This step uses a latent factor model, which performs automatic clustering from the data itself on the basis of user behavior statistics. The latent semantic analysis technique has the following four advantages:
1) The classes come from statistics on user behavior and therefore represent the users' own view of how items should be classified.
2) The classification granularity can be controlled: the larger the final number of classes, the finer the granularity, and conversely the coarser.
3) The weight of an item in each class is determined from user behavior statistics, so an item is not rigidly assigned to a single class.
4) Each class may lie along a different dimension, since the classes are calculated entirely from the users' historical data.
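The factorization described in this step can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the dictionary layout, the learning rate `lr`, the epoch count and the random initialization are all hypothetical choices, and Q is stored per item (one factor vector per item) for convenience.

```python
import random


def train_lfm(ratings, n_factors=2, lam=0.02, lr=0.02, epochs=500, seed=0):
    """Fit LFM factors by stochastic gradient descent.

    ratings: dict mapping (user, item) -> observed score.
    Returns (P, Q): P[user] and Q[item] are lists of n_factors floats.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _ in ratings})
    items = sorted({i for _, i in ratings})
    # Small positive random start values (an assumption of this sketch).
    P = {u: [rng.uniform(0.0, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.uniform(0.0, 0.1) for _ in range(n_factors)] for i in items}
    samples = list(ratings.items())
    for _ in range(epochs):
        rng.shuffle(samples)
        for (u, i), r in samples:
            err = r - predict(P, Q, u, i)
            for f in range(n_factors):
                pf, qf = P[u][f], Q[i][f]
                # Gradient step on squared error plus L2 regularization.
                P[u][f] += lr * (err * qf - lam * pf)
                Q[i][f] += lr * (err * pf - lam * qf)
    return P, Q


def predict(P, Q, user, item):
    """Predicted interest of `user` in `item`: dot product of the factors."""
    return sum(pf * qf for pf, qf in zip(P[user], Q[item]))


def squared_error(ratings, P, Q):
    """Sum of squared errors over the observed ratings."""
    return sum((r - predict(P, Q, u, i)) ** 2 for (u, i), r in ratings.items())
```

The number of factors and the regularization weight correspond to F and λ in the text; as the description notes, both would need to be tuned experimentally.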
(2) Similarity calculation
In this step, the matrix P and the matrix Q obtained in step (1) are used to calculate the similarity between users and the similarity between items, and a graph is constructed among similar users and among similar items. Users and items serve as nodes; if the similarity between two users (or two items) is higher than a certain threshold, an edge is established between them.
The similarity calculation uses the Euclidean distance. The Euclidean distance between two n-dimensional vectors $a = (x_{11}, x_{12}, \ldots, x_{1n})$ and $b = (x_{21}, x_{22}, \ldots, x_{2n})$ is:
$$d(a, b) = \sqrt{\sum_{k=1}^{n} (x_{1k} - x_{2k})^2}$$
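The thresholded similarity graph can be sketched as follows. The text does not say how a Euclidean distance is turned into a similarity that grows as vectors get closer, so the mapping `1 / (1 + distance)` used here is an assumption of this sketch, as are the function names and the adjacency-set representation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_graph(vectors, threshold):
    """Connect every pair of nodes whose similarity exceeds `threshold`.

    vectors: dict mapping node -> latent factor vector (a row of P for
    users, a column of Q for items).
    Returns an undirected, unweighted graph as dict node -> set of neighbors.
    """
    nodes = sorted(vectors)
    graph = {n: set() for n in nodes}
    for idx, a in enumerate(nodes):
        for b in nodes[idx + 1:]:
            # Assumed conversion: distance 0 gives similarity 1,
            # large distances give similarity close to 0.
            sim = 1.0 / (1.0 + euclidean(vectors[a], vectors[b]))
            if sim > threshold:
                graph[a].add(b)
                graph[b].add(a)
    return graph
```

The same function would be called once with the user vectors and once with the item vectors, giving the user graph and the item graph used in the next step.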
(3) User-item graph model based recommendations
A user-item graph model is constructed from the user graph model and the item graph model obtained in step (2) together with the user-item bipartite graph derived from the users' historical rating records. The PersonalRank algorithm is then used to predict the items the user may be interested in.
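Merging the three graphs can be sketched as below, assuming each graph is a dictionary from node to a set of neighbors and that user and item identifiers never collide (for example because they carry different prefixes). Both the representation and the function name are illustrative assumptions.

```python
def build_user_item_graph(user_graph, item_graph, rated_pairs):
    """Merge the user graph, the item graph and the rating bipartite graph.

    user_graph, item_graph: dict node -> set of similar nodes.
    rated_pairs: iterable of (user, item) pairs taken from the rating history.
    Returns one undirected, unweighted graph as dict node -> set of neighbors.
    """
    graph = {}

    def add_edge(a, b):
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    for part in (user_graph, item_graph):
        for node, neighbors in part.items():
            graph.setdefault(node, set())  # keep isolated nodes as well
            for other in neighbors:
                add_edge(node, other)
    for user, item in rated_pairs:
        add_edge(user, item)
    return graph
```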
The idea of the algorithm is that a random walk for user u starts from the start node $v_u$. On arriving at any node, it is first decided according to the probability α whether to continue the walk or to terminate it and restart from the start node $v_u$. If the walk continues, one of the nodes pointed to by the current node is selected uniformly at random as the next node. After many rounds of walking, the probability of each item node being visited converges to a fixed value, which is taken as the final access probability of that item node. This is formulated as follows:
$$PR(i) = \alpha \sum_{j \in in(i)} \frac{PR(j)}{|out(j)|} \qquad (i \neq v_u)$$
$$PR(i) = (1 - \alpha) + \alpha \sum_{j \in in(i)} \frac{PR(j)}{|out(j)|} \qquad (i = v_u)$$
where α is the probability of continuing the walk, $in(i)$ is the set of nodes pointing to node i, $|out(j)|$ is the out-degree of node j, and $PR(i)$ is the access probability of node i.
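The random walk with restart described above can be sketched iteratively as follows. This is a minimal illustration, assuming the graph is a dictionary from node to a set of neighbors, that every node has at least one neighbor, and that a fixed number of iterations stands in for a convergence check; the function name and defaults are hypothetical.

```python
def personal_rank(graph, root, alpha=0.8, iterations=100):
    """Iterative PersonalRank: access probability of every node for `root`.

    graph: dict node -> set of neighbors (undirected, unweighted).
    alpha: probability of continuing the walk; with probability 1 - alpha
    the walk restarts from `root`.
    """
    rank = {node: 0.0 for node in graph}
    rank[root] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        for node, neighbors in graph.items():
            if not neighbors:
                continue
            # A continuing walker moves to each neighbor with equal probability.
            share = alpha * rank[node] / len(neighbors)
            for other in neighbors:
                nxt[other] += share
        # Walkers that stop are sent back to the start node.
        nxt[root] += 1.0 - alpha
        rank = nxt
    return rank
```

Each outer iteration touches every edge of the graph, and the whole loop must be repeated for every user, which is the cost the matrix formulation is meant to avoid.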
Graph-based recommendation establishes the relationship between users and items more directly and intuitively and yields a Top-N recommendation set more naturally. However, the algorithm must iterate over the graph until the PR value of every vertex converges, so its time complexity is high.
PersonalRank is therefore converted from the iterative method into a matrix operation. Let M be the transition probability matrix of the bipartite graph, i.e.
$$M(v, v') = \begin{cases} \dfrac{1}{|out(v)|} & \text{if there is an edge from } v \text{ to } v' \\ 0 & \text{otherwise} \end{cases}$$
Then the iterative formula can be rewritten as:
$$r = (1 - \alpha)\, r_0 + \alpha M^T r$$
Solving for r gives:
$$r = (1 - \alpha)(I - \alpha M^T)^{-1} r_0$$
Thus $(I - \alpha M^T)^{-1}$ needs to be computed only once. Since $I - \alpha M^T$ is a sparse matrix, the task reduces to inverting a sparse matrix quickly. The invention therefore combines CUDA parallel programming with the Gauss-Jordan algorithm to compute the inverse matrix $(I - \alpha M^T)^{-1}$, which resolves the running-speed problem of computing PersonalRank by repeated iteration.
The idea of the Gauss-Jordan algorithm is as follows: to find the inverse of a matrix A, place the identity matrix I to the right of A and apply row transformations to the whole augmented matrix until the left side becomes the identity matrix; the right side is then the inverse of A.
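A serial sketch of the matrix form is given below: it inverts the matrix by Gauss-Jordan elimination on the augmented matrix [A | I] and then reads the access probabilities from one column of the inverse. It assumes dense Python lists, a graph stored as a dictionary from node to a set of neighbors, every node having at least one neighbor, and a start vector r_0 that is 1 at the start node and 0 elsewhere (the source does not define r_0). It is not the CUDA implementation; function names are hypothetical.

```python
def gauss_jordan_inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination on [A | I]."""
    n = len(a)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Row exchange: bring the largest remaining pivot onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Fix the row: scale so that the diagonal element becomes 1.
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Fix the column: eliminate every other entry in this column.
        for r in range(n):
            factor = aug[r][col]
            if r != col and factor != 0.0:
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def personal_rank_closed_form(graph, root, alpha=0.8):
    """Evaluate r = (1 - alpha) * (I - alpha * M^T)^-1 * r0 for `root`."""
    nodes = sorted(graph)
    index = {node: k for k, node in enumerate(nodes)}
    n = len(nodes)
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for v, neighbors in graph.items():
        for w in neighbors:
            # Entry (w, v) of M^T equals M(v, w) = 1 / |out(v)|.
            a[index[w]][index[v]] -= alpha / len(neighbors)
    inverse = gauss_jordan_inverse(a)
    col = index[root]  # r0 selects the column of the start node
    return {node: (1.0 - alpha) * inverse[index[node]][col] for node in nodes}
```

Because the inverse does not depend on the start node, it can be computed once and reused: each user's probabilities are one column of the inverse scaled by (1 - α).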
The detailed design of the CUDA-parallelized PersonalRank is as follows.
The host-side pseudo-code is as follows:
a) Allocate threads:
dim3 thread(threads);
dim3 rBlock((int)ceil(columns / threads) + 1);
dim3 cBlock((int)ceil(size * columns / threads) + 1);
b) Allocate space in the global memory of the GPU for the $(I - \alpha M^T)$ matrix and the identity matrix I:
cudaMalloc((void**)&devMatrix, size * columns * sizeof(float));
c) Copy the $(I - \alpha M^T)$ matrix and the identity matrix I from host memory to GPU memory:
// Assumes one augmented buffer [A | I], matching the allocation in b) and the copy-back in e).
cudaMemcpy(devMatrix, matrix, size * columns * sizeof(float), cudaMemcpyHostToDevice);
d) Compute the inverse of the $(I - \alpha M^T)$ matrix:
[Code listing not reproduced in the source text]
e) Copy the matrix and the identity matrix from GPU memory back to host memory:
cudaMemcpy(matrix, devMatrix, size * columns * sizeof(float), cudaMemcpyDeviceToHost);
f) Release the GPU global memory space:
cudaFree(devMatrix);
The device-side pseudo-code is as follows:
a) Call rowExchange() in the kernel, which swaps rows so that the diagonal element is not 0.
b) Call fixRows(), which divides the whole row by its diagonal element so that the diagonal element becomes 1.
[Code listing not reproduced in the source text]
c) Call fixColumns(), which makes the other elements of that column 0.
[Code listing not reproduced in the source text]
(4) Top-N recommendation
For each user, the access probabilities obtained in step (3) for the items the user has not rated are sorted in descending order, and the first N items form the recommendation list for that user.
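The final selection can be sketched as below. The function name, the argument layout and the tie-break by item identifier are assumptions of this sketch; `rank` is taken to be the access-probability dictionary produced for one user in step (3).

```python
def recommend_top_n(rank, rated_items, candidate_items, n):
    """Return the n unrated items with the highest access probability.

    rank: dict node -> access probability for one user.
    rated_items: set of items the user has already rated.
    candidate_items: iterable of all item nodes (user nodes are excluded).
    """
    unrated = [item for item in candidate_items if item not in rated_items]
    # Descending probability; ties are broken by item identifier.
    unrated.sort(key=lambda item: (-rank.get(item, 0.0), item))
    return unrated[:n]
```

For example, with probabilities 0.25, 0.15 and 0.10 for the unrated items b, d and c, a request for N = 2 returns b and d.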

Claims (3)

1. A personalized recommendation method based on a graph is characterized by comprising the following steps:
step one, applying a latent factor model to the users' historical rating records to obtain the latent relations between users and between items;
step two, using the latent relations obtained in step one to calculate the similarity between users and the similarity between items, and constructing graphs among similar users and among similar items, wherein users and items serve as nodes and an edge is established whenever the similarity between two users (or between two items) is higher than a preset threshold, until the graph models among users and among items are complete, and constructing a graph connecting users and items from the historical rating records;
step three, constructing a user-item graph model from the user graph model and the item graph model obtained in step two together with the user-item bipartite graph derived from the users' historical rating records;
step four, using the random-walk-based PersonalRank algorithm to predict the items the user may be interested in, sorting in descending order the access probabilities of the items the user has not rated, and taking the first N items as the recommendation list for the user.
2. The graph-based personalized recommendation method according to claim 1, characterized in that a matrix-based solving method combined with CUDA parallelization is further adopted to increase the running speed of the PersonalRank algorithm.
3. The graph-based personalized recommendation method according to claim 1 or 2, characterized in that, in the PersonalRank algorithm, a random walk for user u starts from the start node v_u; on arriving at any node, it is first decided according to the probability α whether to continue the walk or to terminate it and restart from the start node v_u; if the walk continues, one of the nodes pointed to by the current node is selected uniformly at random as the next node, so that after many rounds of walking the probability of each item node being visited converges to a fixed value, which is taken as the final access probability of that item node.
CN201610202059.8A 2016-04-01 2016-04-01 Personalized recommendation method based on graph Active CN105913296B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610202059.8A CN105913296B (en) 2016-04-01 2016-04-01 Personalized recommendation method based on graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610202059.8A CN105913296B (en) 2016-04-01 2016-04-01 Personalized recommendation method based on graph

Publications (2)

Publication Number Publication Date
CN105913296A (en) 2016-08-31
CN105913296B (en) 2020-01-03

Family

ID=56745394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610202059.8A Active CN105913296B (en) 2016-04-01 2016-04-01 Personalized recommendation method based on graph

Country Status (1)

Country Link
CN (1) CN105913296B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107506480B (en) * 2017-09-13 2020-05-05 浙江工业大学 Double-layer graph structure recommendation method based on comment mining and density clustering
CN107657043B (en) * 2017-09-30 2021-04-16 北京工业大学 Content-based mixed graph model image recommendation method
CN109754274A (en) * 2017-11-06 2019-05-14 北京京东尚科信息技术有限公司 A kind of method and apparatus of determining target object
CN108446297B (en) * 2018-01-24 2021-03-26 北京三快在线科技有限公司 Recommendation method and device and electronic equipment
CN108320218B (en) * 2018-02-05 2020-12-11 湖南大学 Personalized commodity recommendation method based on trust-score time evolution two-way effect
CN108681913A (en) * 2018-04-04 2018-10-19 淮阴工学院 A kind of digraph recommendation method based on AUC optimizations
CN109471978B (en) * 2018-11-22 2022-01-28 腾讯科技(深圳)有限公司 Electronic resource recommendation method and device
CN109885758B (en) * 2019-01-16 2022-07-26 西北工业大学 Random walk recommendation method based on bipartite graph
CN110162696A (en) * 2019-04-11 2019-08-23 北京三快在线科技有限公司 Recommended method, device, electronic equipment and storage medium based on figure
CN110275952A (en) * 2019-05-08 2019-09-24 平安科技(深圳)有限公司 News recommended method, device and medium based on user's short-term interest
CN110210944B (en) * 2019-06-05 2021-04-23 齐鲁工业大学 Multi-task recommendation method and system combining Bayesian inference and weighted rejection sampling
CN111104606B (en) * 2019-12-06 2022-10-21 成都理工大学 Weight-based conditional wandering chart recommendation method
CN111144976B (en) * 2019-12-10 2022-08-09 支付宝(杭州)信息技术有限公司 Training method and device for recommendation model
CN113516524B (en) * 2020-04-10 2024-06-18 北京沃东天骏信息技术有限公司 Method and device for pushing information
CN111723578B (en) * 2020-06-09 2023-11-17 平安科技(深圳)有限公司 Hot spot prediction method and device based on random walk model and computer equipment
CN112541407B (en) * 2020-08-20 2022-05-13 同济大学 Visual service recommendation method based on user service operation flow
CN112100489B (en) * 2020-08-27 2022-07-15 北京百度网讯科技有限公司 Object recommendation method, device and computer storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7490071B2 (en) * 2003-08-29 2009-02-10 Oracle Corporation Support vector machines processing system
CN103955535A (en) * 2014-05-14 2014-07-30 南京大学镇江高新技术研究院 Individualized recommending method and system based on element path
CN104346476B (en) * 2014-11-20 2017-07-04 西安电子科技大学 Personalized item recommendation method based on article similarity and network structure
CN104935963B (en) * 2015-05-29 2018-03-16 中国科学院信息工程研究所 A kind of video recommendation method based on timing driving

Also Published As

Publication number Publication date
CN105913296A (en) 2016-08-31

Similar Documents

Publication Publication Date Title
CN105913296B (en) Personalized recommendation method based on graph
Babbar et al. Dismec: Distributed sparse machines for extreme multi-label classification
Wu et al. Time matters: Multi-scale temporalization of social media popularity
Zhou et al. Subspace segmentation-based robust multiple kernel clustering
Cheng et al. HFS: Hierarchical feature selection for efficient image segmentation
Jia et al. Bagging-based spectral clustering ensemble selection
Truyen et al. Ordinal Boltzmann machines for collaborative filtering
US6260036B1 (en) Scalable parallel algorithm for self-organizing maps with applications to sparse data mining problems
Rahman et al. Link prediction in dynamic networks using graphlet
EP3300002A1 (en) Method for determining the similarity of digital images
Zhao et al. Spectral clustering based on iterative optimization for large-scale and high-dimensional data
US20200226504A1 (en) Method and system for hierarchical forecasting
De Santo et al. A deep learning approach for semi-supervised community detection in online social networks
Xu et al. Bayesian deep matrix factorization network for multiple images denoising
Niu et al. One-step multi-view subspace clustering with incomplete views
Tong et al. A deep discriminative and robust nonnegative matrix factorization network method with soft label constraint
Liu et al. Multi-perspective User2Vec: Exploiting re-pin activity for user representation learning in content curation social network
Xiao et al. A survey of parallel clustering algorithms based on spark
Kim et al. Object synthesis by learning part geometry with surface and volumetric representations
Shi et al. CEGAT: A CNN and enhanced-GAT based on key sample selection strategy for hyperspectral image classification
Czech Invariants of distance k-graphs for graph embedding
Zhang et al. A GPU-accelerated non-negative sparse latent semantic analysis algorithm for social tagging data
Hu et al. Multi-view content-context information bottleneck for image clustering
CN116842267A (en) Personalized decoration scheme recommendation method, system and medium based on deep learning
Pandove et al. Local graph based correlation clustering

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant