CN104391982A - Information recommendation method and information recommendation system - Google Patents

Publication number
CN104391982A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410746660.4A
Other languages
Chinese (zh)
Other versions
CN104391982B (en
Inventor
黄通文
张俊林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CHANJET INFORMATION TECHNOLOGY Co Ltd
Original Assignee
CHANJET INFORMATION TECHNOLOGY Co Ltd
Application filed by CHANJET INFORMATION TECHNOLOGY Co Ltd
Priority to CN201410746660.4A
Publication of CN104391982A
Application granted
Publication of CN104391982B
Current legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Abstract

The invention provides an information recommendation method and an information recommendation system. The information recommendation method includes: generating an adjacency matrix according to the behavior logs in a behavior log database of a server; converting the adjacency matrix into a hyperlink matrix; selecting, according to the hyperlink matrix, initial parameters for a preset PageRank model trainer of the server; computing a PageRank vector with the preset PageRank model trainer according to the initial parameters, and recording the number of iterations; and outputting the iterated PageRank vector in descending order. The calculation formula of the preset PageRank model trainer is given in the description. The method and system address two problems of the classical PageRank algorithm, namely the equal distribution of authority values and the consideration of out-links only. In practical application the iteration over the data is faster and the different levels of authority of different users are taken into account, so that the method and system achieve higher search and recommendation quality in practical enterprise recommendation and search.

Description

Information recommendation method and information recommendation system
Technical field
The present invention relates to the technical field of data processing, and in particular to an information recommendation method and an information recommendation system.
Background art
At present, the behavior logs of users' work circles contain a large amount of behavior information, including interactions between users and interactions between users and circles. Most of this behavior information, however, remains unmined, and it is desirable to mine relevant data from it in order to improve the quality of search and recommendation. Search and recommendation in the prior art mainly rank results by combining user behavior with the matching of segmented query-string terms against an index. Such search and recommendation have the following two shortcomings:
First, users who have no behavior information are recommended mainly by index matching, which does not take the behavior information of the group into account and cannot recommend users who are frequently active, well known and more authoritative.
Second, although enterprise data is relatively reliable and has little redundancy, when the volume of data to be searched and recommended is large, a user can cheat by stuffing redundant content such as keywords into certain fields, which then enters the index entries and deceives the search system.
Therefore, a new technical solution is needed that can improve the quality of user recommendation.
Summary of the invention
In view of the above problems, the present invention proposes a new technical solution that can improve the quality of user recommendation.
In view of this, an embodiment of the first aspect of the present invention proposes an information recommendation method, comprising: generating an adjacency matrix according to the behavior logs in the behavior log database of the server; converting the adjacency matrix into a hyperlink matrix; selecting, according to the hyperlink matrix, initial parameters for the preset PageRank model trainer of the server; computing a PageRank vector with the preset PageRank model trainer according to the initial parameters, and recording the number of iterations; and outputting the iterated PageRank vector in descending order; wherein the calculation formula of the preset PageRank model trainer is:
$$PR(A) = \left[1 - \frac{n}{\sum_{i=1}^{n} C(T_i)} \cdot \alpha\right] \frac{1}{N} + \frac{n}{\sum_{i=1}^{n} C(T_i)} \cdot \alpha \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$
where PR(A) is the PageRank vector of the recommended user A, n is the total number of users who recommend user A, N is the total number of people involved in recommendation behavior, T_i is any user who recommends user A, C(T_i) is the total number of times that user T_i recommends other users, PR(T_i) is the PageRank vector of user T_i, and i = 1, 2, ..., n.
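The formula above can be evaluated for a single recommended user as in the following minimal Python sketch. The function name, the argument names and the sample numbers are illustrative assumptions and are not taken from the patent. In particular, the handling of a user with no recommenders, for which the coefficient n / ΣC(T_i) is undefined, is an assumption made here so that the code runs.

```python
def improved_pagerank_score(recommenders, pr, out_counts, alpha, total_users):
    """Evaluate PR(A) for one recommended user A.

    recommenders: users T_i who recommend A
    pr:           current PageRank value of each user
    out_counts:   C(T_i), total recommendations made by each user
    alpha:        random damping factor
    total_users:  N, number of people involved in recommendation behavior
    """
    n = len(recommenders)
    if n == 0:
        # Assumption: with no recommenders only the random-jump term remains.
        return 1.0 / total_users
    total_out = sum(out_counts[t] for t in recommenders)
    damping = n / total_out * alpha  # per-user coefficient n / sum C(T_i) * alpha
    link_sum = sum(pr[t] / out_counts[t] for t in recommenders)
    return (1.0 - damping) / total_users + damping * link_sum


# Hypothetical data: A is recommended by t1 and t2 in a population of 4 users.
score = improved_pagerank_score(
    recommenders=["t1", "t2"],
    pr={"t1": 0.25, "t2": 0.25},
    out_counts={"t1": 3, "t2": 5},
    alpha=0.85,
    total_users=4,
)
```

With these assumed inputs the per-user coefficient is 2 / 8 × 0.85 = 0.2125, so the random-jump term carries most of the weight for this user.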
In the prior art, Google described its classical PageRank model in a published paper in the following form:
$$PR(A) = \frac{1 - \alpha}{N} + \alpha \cdot \sum_{i=1}^{N} \frac{PR(T_i)}{C(T_i)}$$
where PR(A) is the PageRank vector of the recommended user A, N is the total number of web pages, web page T_i is the i-th source page (in-link page) pointing to web page A, C(T_i) is the total out-degree of the out-links of web page T_i, and i = 1, 2, ..., n. The model means that a user staying on a given page may browse to a page at random with probability (1 - α)/N, or may follow a link with probability α.
The technical solution of the present invention improves on two problems of the above prior art: the uniform allocation of the random damping factor α, and the fact that only the out-degree C(T_i) is considered, so that every recommendation of a user is given equal weight.
The authority value α is improved as follows:
Regarding the uniform allocation of authority values, the random damping factor (damping coefficient) should differ from page to page. For example, users who are frequently active, well known and more authoritative are more likely to be recommended than users who receive few recommendations and have a lower reputation. The classical PageRank model can therefore be improved to:
$$PR(A) = \left[1 - \frac{n}{\sum_{i=1}^{n} C(T_i)}\right] \frac{1}{N} + \frac{n}{\sum_{i=1}^{n} C(T_i)} \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$
where PR(A) is the PageRank vector of the recommended user A, n is the total number of users who recommend user A, N is the total number of people involved in recommendation behavior, user T_i is a user who recommends user A, and C(T_i) is the total number of times that user T_i recommends other users.
In this way, the random damping factor changes from a fixed value into a value that varies with users of different levels. This improved model, however, introduces a problem of its own: it no longer matches the random surfing model on which Google's algorithm was proposed, and the size of the random damping factor can no longer be controlled manually. The model can therefore be further improved to:
$$PR(A) = \left[1 - \frac{n}{\sum_{i=1}^{n} C(T_i)} \cdot \alpha\right] \frac{1}{N} + \frac{n}{\sum_{i=1}^{n} C(T_i)} \cdot \alpha \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$
This further improvement restores the random damping factor, and with it the random surfing element of PageRank, while also allowing different users to be assigned different authority values.
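As a worked illustration of how the coefficient varies from user to user, consider two hypothetical recommended users with α = 0.85. The numbers and the symbols d_A and d_B are invented here for illustration and do not come from the patent. User A is recommended by n = 2 users who made C(T_1) = 3 and C(T_2) = 5 recommendations in total; user B is recommended by n = 3 users who each made a single recommendation:

$$d_A = \frac{n}{\sum_{i=1}^{n} C(T_i)} \cdot \alpha = \frac{2}{3 + 5} \cdot 0.85 = 0.2125, \qquad d_B = \frac{3}{1 + 1 + 1} \cdot 0.85 = 0.85$$

Under these assumed numbers, the recommenders of B direct all of their recommendations to B, so B keeps the full damping factor α, whereas the recommenders of A spread theirs over many users, so the link term of A is weighted less and the random-jump term more.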
Therefore, in this technical solution, through steps such as preprocessing the data into an adjacency matrix, converting the adjacency matrix into a hyperlink matrix, selecting initial parameters and building the PageRank model trainer, the problems of uniform authority-value allocation and of considering only out-links in the classical PageRank algorithm are improved. Iterating with the preset PageRank model trainer makes iteration over the data faster in practical application and better reflects that different users have different levels of authority, so that search and recommendation quality in practical enterprise recommendation and search is higher, which improves the working efficiency of the system and the user experience.
According to an embodiment of the present invention, generating the adjacency matrix according to the behavior logs in the behavior log database of the server specifically comprises: extracting all recommendation information from the behavior logs in the behavior log database; taking each user of the server as a node, creating an edge that starts at the user who makes a recommendation and ends at the recommended user, with the number of recommendations as the weight of the edge, thereby building a weighted directed graph; and storing the weighted directed graph in the adjacency matrix.
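The construction described above can be sketched in Python as follows. The record format (one `(recommender, recommended)` pair per recommendation event), the function name and the sample users are assumptions for illustration; the patent does not specify how the behavior log is stored.

```python
def build_adjacency(records):
    """Build a weighted adjacency matrix from recommendation events.

    records: list of (recommender, recommended) pairs, one per event.
    Returns (users, adj), where adj[a][b] counts how many times
    users[a] recommended users[b].
    """
    users = sorted({user for pair in records for user in pair})
    index = {user: k for k, user in enumerate(users)}
    adj = [[0] * len(users) for _ in users]
    for src, dst in records:
        adj[index[src]][index[dst]] += 1  # edge weight = number of recommendations
    return users, adj


# Hypothetical log extract: alice recommends bob twice, and so on.
records = [
    ("alice", "bob"),
    ("alice", "bob"),
    ("carol", "bob"),
    ("bob", "alice"),
]
users, adj = build_adjacency(records)
```

A dense list-of-lists matrix keeps the sketch short; for a large user base a sparse representation would likely be preferable.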
This technical solution improves on the problem that only the out-degree C(T_i) is considered, so that every recommendation of a user is given equal weight.
The classical model proposed by Google only takes into account the out-degree of out-links, namely C(T_i), so that the out-degree of every user recommended by user A is given the same weight. Taking the influence of out-degree into account, the classical model can be improved into the following in-link/out-link model:
$$PR(i) = (1 - \alpha) + \alpha \cdot \sum_{j \in B_i} PR(j)\, W(j, i)$$
$$W(j, i) = W_{\mathrm{in}}(j, i) \cdot W_{\mathrm{out}}(j, i)$$
where W_in(j, i) and W_out(j, i) are defined as follows:
$$W_{\mathrm{in}}(j, i) = \frac{I_i}{\sum_{k \in N_j} I_k}$$
$$W_{\mathrm{out}}(j, i) = \frac{O_i}{\sum_{k \in N_j} O_k}$$
Here N_i is the set of all users recommended by user i (the out-link user set), and B_i is the set of all users pointing to user i (the in-link user set). The model can be read as follows: for a relatively well-known user i, where user j belongs to the in-link user set B_i of user i, the weight W(j, i) of the link (j, i) should depend on all users recommended by user j and on all users that have a link relationship with user i, namely
$$W(j, i) = W_{\mathrm{in}}(j, i) \cdot W_{\mathrm{out}}(j, i)$$
where W_in(j, i) is the weight associated with the recommendation links of the users related to link (j, i). It is determined by the recommendation-link value I_i of user i and the recommendation-link values I_k of the users k (k belonging to the set of all users recommended by user j), namely
$$W_{\mathrm{in}}(j, i) = \frac{I_i}{\sum_{k \in N_j} I_k}$$
The in-link/out-link model proposed in this application thus solves the problem that every recommendation in the directed graph is given the same weight. The improved algorithm computes a different weight for each recommendation from the recommendation behavior of the users involved, so that frequently recommended, high-quality users are combined with the PageRank algorithm, which improves the effectiveness of the PageRank algorithm.
According to an embodiment of the present invention, converting the adjacency matrix into the hyperlink matrix specifically comprises: converting the weighted directed graph into the hyperlink matrix, wherein the conversion formula is:
$$H(i, j) = \begin{cases} \dfrac{Adj(i, j)}{colSum(i)}, & \text{when } colSum(i) \neq 0 \\ \dfrac{1}{n}, & \text{otherwise} \end{cases}$$
where H(i, j) is the hyperlink matrix, i is any user, colSum(i) is the total number of times that user i recommends other users in the adjacency matrix, and n is the total number of people involved in recommendation behavior.
In this technical solution, converting the adjacency matrix into the hyperlink matrix makes it convenient to carry out PageRank model training with the initial parameters and the converted hyperlink matrix.
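The conversion formula can be sketched in Python as follows. The sketch assumes the adjacency matrix is stored so that `adj[i][j]` is the number of times user i recommended user j, which makes the quantity the text calls colSum(i) the sum of row i in this layout. The function name and sample matrix are illustrative.

```python
def to_hyperlink_matrix(adj):
    """Normalize each user's recommendation counts into a hyperlink matrix.

    adj[i][j] is assumed to be the number of times user i recommended user j.
    A user who recommends nobody gets the uniform value 1/n in every entry.
    """
    n = len(adj)
    hyperlink = []
    for row in adj:
        total = sum(row)  # total recommendations made by this user
        if total != 0:
            hyperlink.append([value / total for value in row])
        else:
            hyperlink.append([1.0 / n] * n)
    return hyperlink


# Hypothetical 3-user adjacency matrix; the third user recommends nobody.
adj = [
    [0, 2, 0],
    [1, 0, 0],
    [0, 0, 0],
]
H = to_hyperlink_matrix(adj)
```

After the conversion every row sums to 1, including the row of a user with no outgoing recommendations.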
According to an embodiment of the present invention, after the PageRank vector is computed with the preset PageRank model trainer and the number of iterations is recorded, the method comprises: determining whether the number of iterations exceeds a predetermined iteration-count threshold, and determining whether the difference between the PageRank vector and the previous PageRank vector exceeds a predetermined iteration precision; when both determinations are affirmative, continuing the iteration with the preset PageRank model trainer; otherwise, outputting the iterated PageRank vector in descending order.
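A stopping rule of this kind can be sketched in Python as below. The sketch assumes the conventional reading, in which iteration continues only while the iteration count is still below the threshold and the change between successive vectors is still larger than the precision. The L1 norm used to measure the change, the function names and the toy update rule are assumptions, since the text does not specify them.

```python
def iterate(update, initial, precision, max_iterations):
    """Repeat `update` until the vector stabilizes or the budget is spent.

    update:         function mapping the current vector to the next one
    precision:      stop once the L1 change falls to this value or below
    max_iterations: stop once this many iterations have been run
    Returns (vector, iterations).
    """
    current = list(initial)
    iterations = 0
    while True:
        following = update(current)
        iterations += 1
        change = sum(abs(a - b) for a, b in zip(following, current))
        current = following
        # Continue only if budget remains AND the vector is still changing.
        if not (iterations < max_iterations and change > precision):
            return current, iterations


# Toy update with fixed point 2.0, standing in for one PageRank sweep.
def halve_towards_two(vector):
    return [0.5 * x + 1.0 for x in vector]


vector, iterations = iterate(halve_towards_two, [0.0], 1e-6, 100)
capped_vector, capped_iterations = iterate(halve_towards_two, [0.0], 1e-12, 5)
```

The second call shows the iteration-count threshold taking effect before the precision is reached.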
In this technical solution, the problems of uniform authority-value allocation and of considering only out-links in the classical PageRank algorithm are improved. Iterating with the preset PageRank model trainer makes iteration over the data faster in practical application and better reflects that different users have different levels of authority, so that search and recommendation quality in practical enterprise recommendation and search is higher, which improves the working efficiency of the system and the user experience.
According to an embodiment of the present invention, the initial parameters comprise an iteration vector, the random damping factor, the predetermined iteration precision and the predetermined iteration-count threshold.
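The four initial parameters can be grouped as in the following Python sketch. The class and field names are hypothetical, and the default values (a uniform starting vector, a damping factor of 0.85, a precision of 1e-6 and a limit of 100 iterations) are common choices assumed here; the patent does not state concrete values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InitialParameters:
    iteration_vector: List[float]  # starting PageRank vector
    damping_factor: float          # random damping factor alpha
    precision: float               # predetermined iteration precision
    max_iterations: int            # predetermined iteration-count threshold


def default_parameters(hyperlink_matrix, damping_factor=0.85,
                       precision=1e-6, max_iterations=100):
    """Choose parameters from the size of the hyperlink matrix (assumed defaults)."""
    n = len(hyperlink_matrix)
    uniform = [1.0 / n] * n  # every user starts with the same value
    return InitialParameters(uniform, damping_factor, precision, max_iterations)


# Hypothetical 2-user hyperlink matrix.
params = default_parameters([[0.0, 1.0], [1.0, 0.0]])
```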
An embodiment of the second aspect of the present invention proposes an information recommendation system, comprising: an information preprocessing unit, which generates an adjacency matrix according to the behavior logs in the behavior log database of the server; a matrix conversion unit, which converts the adjacency matrix into a hyperlink matrix; a parameter selection unit, which selects, according to the hyperlink matrix, initial parameters for the preset PageRank model trainer of the server; a training unit, which computes a PageRank vector with the preset PageRank model trainer according to the initial parameters and records the number of iterations; and a recommendation unit, which outputs the iterated PageRank vector in descending order; wherein
the calculation formula of the preset PageRank model trainer is:
$$PR(A) = \left[1 - \frac{n}{\sum_{i=1}^{n} C(T_i)} \cdot \alpha\right] \frac{1}{N} + \frac{n}{\sum_{i=1}^{n} C(T_i)} \cdot \alpha \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$
where PR(A) is the PageRank vector of the recommended user A, n is the total number of users who recommend user A, N is the total number of people involved in recommendation behavior, T_i is any user who recommends user A, C(T_i) is the total number of times that user T_i recommends other users, PR(T_i) is the PageRank vector of user T_i, and i = 1, 2, ..., n.
In the prior art, Google described its classical PageRank model in a published paper in the following form:
$$PR(A) = \frac{1 - \alpha}{N} + \alpha \cdot \sum_{i=1}^{N} \frac{PR(T_i)}{C(T_i)}$$
where PR(A) is the PageRank vector of the recommended user A, N is the total number of web pages, web page T_i is the i-th source page (in-link page) pointing to web page A, C(T_i) is the total out-degree of the out-links of web page T_i, and i = 1, 2, ..., n. The model means that a user staying on a given page may browse to a page at random with probability (1 - α)/N, or may follow a link with probability α.
The technical solution of the present invention improves on two problems of the above prior art: the uniform allocation of the random damping factor α, and the fact that only the out-degree C(T_i) is considered, so that every recommendation of a user is given equal weight.
The authority value α is improved as follows:
Regarding the uniform allocation of authority values, the random damping factor (damping coefficient) should differ from page to page. For example, users who are frequently active, well known and more authoritative are more likely to be recommended than users who receive few recommendations and have a lower reputation. The classical PageRank model can therefore be improved to:
$$PR(A) = \left[1 - \frac{n}{\sum_{i=1}^{n} C(T_i)}\right] \frac{1}{N} + \frac{n}{\sum_{i=1}^{n} C(T_i)} \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$
where PR(A) is the PageRank vector of the recommended user A, n is the total number of users who recommend user A, N is the total number of people involved in recommendation behavior, user T_i is a user who recommends user A, and C(T_i) is the total number of times that user T_i recommends other users.
In this way, the random damping factor changes from a fixed value into a value that varies with users of different levels. This improved model, however, introduces a problem of its own: it no longer matches the random surfing model on which Google's algorithm was proposed, and the size of the random damping factor can no longer be controlled manually. The model can therefore be further improved to:
$$PR(A) = \left[1 - \frac{n}{\sum_{i=1}^{n} C(T_i)} \cdot \alpha\right] \frac{1}{N} + \frac{n}{\sum_{i=1}^{n} C(T_i)} \cdot \alpha \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$
This further improvement restores the random damping factor, and with it the random surfing element of PageRank, while also allowing different users to be assigned different authority values.
Therefore, in this technical solution, through steps such as preprocessing the data into an adjacency matrix, converting the adjacency matrix into a hyperlink matrix, selecting initial parameters and building the PageRank model trainer, the problems of uniform authority-value allocation and of considering only out-links in the classical PageRank algorithm are improved. Iterating with the preset PageRank model trainer makes iteration over the data faster in practical application and better reflects that different users have different levels of authority, so that search and recommendation quality in practical enterprise recommendation and search is higher, which improves the working efficiency of the system and the user experience.
According to an embodiment of the present invention, the information preprocessing unit comprises: a weighted directed graph building unit, which extracts all recommendation information from the behavior logs in the behavior log database and, taking each user of the server as a node, creates an edge that starts at the user who makes a recommendation and ends at the recommended user, with the number of recommendations as the weight of the edge, thereby building a weighted directed graph; and a storage unit, which stores the weighted directed graph in the adjacency matrix.
This technical solution improves on the problem that only the out-degree C(T_i) is considered, so that every recommendation of a user is given equal weight.
The classical model proposed by Google only takes into account the out-degree of out-links, namely C(T_i), so that the out-degree of every user recommended by user A is given the same weight. Taking the influence of out-degree into account, the classical model can be improved into the following in-link/out-link model:
$$PR(i) = (1 - \alpha) + \alpha \cdot \sum_{j \in B_i} PR(j)\, W(j, i)$$
$$W(j, i) = W_{\mathrm{in}}(j, i) \cdot W_{\mathrm{out}}(j, i)$$
where W_in(j, i) and W_out(j, i) are defined as follows:
$$W_{\mathrm{in}}(j, i) = \frac{I_i}{\sum_{k \in N_j} I_k}$$
$$W_{\mathrm{out}}(j, i) = \frac{O_i}{\sum_{k \in N_j} O_k}$$
Here N_i is the set of all users recommended by user i (the out-link user set), and B_i is the set of all users pointing to user i (the in-link user set). The model can be read as follows: for a relatively well-known user i, where user j belongs to the in-link user set B_i of user i, the weight W(j, i) of the link (j, i) should depend on all users recommended by user j and on all users that have a link relationship with user i, namely
$$W(j, i) = W_{\mathrm{in}}(j, i) \cdot W_{\mathrm{out}}(j, i)$$
where W_in(j, i) is the weight associated with the recommendation links of the users related to link (j, i). It is determined by the recommendation-link value I_i of user i and the recommendation-link values I_k of the users k (k belonging to the set of all users recommended by user j), namely
$$W_{\mathrm{in}}(j, i) = \frac{I_i}{\sum_{k \in N_j} I_k}$$
The in-link/out-link model proposed in this application thus solves the problem that every recommendation in the directed graph is given the same weight. The improved algorithm computes a different weight for each recommendation from the recommendation behavior of the users involved, so that frequently recommended, high-quality users are combined with the PageRank algorithm, which improves the effectiveness of the PageRank algorithm.
According to an embodiment of the present invention, the matrix conversion unit is specifically configured to convert the weighted directed graph into the hyperlink matrix, wherein the conversion formula is:
$$H(i, j) = \begin{cases} \dfrac{Adj(i, j)}{colSum(i)}, & \text{when } colSum(i) \neq 0 \\ \dfrac{1}{n}, & \text{otherwise} \end{cases}$$
where H(i, j) is the hyperlink matrix, i is any user, colSum(i) is the total number of times that user i recommends other users in the adjacency matrix, and n is the total number of people involved in recommendation behavior.
In this technical solution, converting the adjacency matrix into the hyperlink matrix makes it convenient to carry out PageRank model training with the initial parameters and the converted hyperlink matrix.
According to an embodiment of the present invention, the system further comprises: a judging unit, which, after the training unit completes training, determines whether the number of iterations exceeds the predetermined iteration-count threshold and whether the difference between the PageRank vector and the PageRank vector of the previous iteration exceeds the predetermined iteration precision; when both determinations are affirmative, iteration continues with the preset PageRank model trainer; otherwise, the iterated PageRank vector is output in descending order.
In this technical solution, the problems of uniform authority-value allocation and of considering only out-links in the classical PageRank algorithm are improved. Iterating with the preset PageRank model trainer makes iteration over the data faster in practical application and better reflects that different users have different levels of authority, so that search and recommendation quality in practical enterprise recommendation and search is higher, which improves the working efficiency of the system and the user experience.
According to an embodiment of the present invention, the initial parameters comprise an iteration vector, the random damping factor, the predetermined iteration precision and the predetermined iteration-count threshold.
Through the technical solution of the present invention, the problems of uniform authority-value allocation and of considering only out-links in the classical PageRank algorithm are improved. Iterating with the preset PageRank model trainer makes iteration over the data faster in practical application and better reflects that different users have different levels of authority, so that search and recommendation quality in practical enterprise recommendation and search is higher, which improves the working efficiency of the system and the user experience.
Brief description of the drawings
Fig. 1 shows a flowchart of an information recommendation method according to an embodiment of the present invention;
Fig. 2 shows a flowchart of an information recommendation method according to another embodiment of the present invention;
Fig. 3 shows a block diagram of an information recommendation system according to an embodiment of the present invention.
Detailed description of the embodiments
In order that the above objects, features and advantages of the present invention may be understood more clearly, the present invention is described in further detail below with reference to the drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of the application and the features in the embodiments may be combined with one another.
Many specific details are set forth in the following description to facilitate a full understanding of the present invention; however, the present invention may also be implemented in ways other than those described here, and the scope of protection of the present invention is therefore not limited by the specific embodiments disclosed below.
Fig. 1 shows a flowchart of an information recommendation method according to an embodiment of the present invention.
As shown in Fig. 1, an information recommendation method according to an embodiment of the present invention comprises:
Step 102: generate an adjacency matrix according to the behavior logs in the behavior log database of the server.
Step 104: convert the adjacency matrix into a hyperlink matrix.
Step 106: select, according to the hyperlink matrix, initial parameters for the preset PageRank model trainer of the server.
Step 108: compute a PageRank vector with the preset PageRank model trainer according to the initial parameters, and record the number of iterations.
Step 110: output the iterated PageRank vector in descending order; wherein the calculation formula of the preset PageRank model trainer is:
$$PR(A) = \left[1 - \frac{n}{\sum_{i=1}^{n} C(T_i)} \cdot \alpha\right] \frac{1}{N} + \frac{n}{\sum_{i=1}^{n} C(T_i)} \cdot \alpha \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$
where PR(A) is the PageRank vector of the recommended user A, n is the total number of users who recommend user A, N is the total number of people involved in recommendation behavior, T_i is any user who recommends user A, C(T_i) is the total number of times that user T_i recommends other users, PR(T_i) is the PageRank vector of user T_i, and i = 1, 2, ..., n.
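Step 110 amounts to sorting users by their computed value. A minimal Python sketch follows; the function name, the optional `top_k` cut-off and the sample scores are assumptions added for illustration and are not described in the patent.

```python
def rank_users(scores, top_k=None):
    """Return (user, score) pairs ordered from highest to lowest score.

    scores: mapping from user to final PageRank value
    top_k:  optional number of leading entries to keep (assumed convenience)
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Hypothetical final values for three users.
scores = {"alice": 0.21, "bob": 0.47, "carol": 0.32}
ranking = rank_users(scores)
best = rank_users(scores, top_k=1)
```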
In the prior art, Google described its classical PageRank model in a published paper in the following form:
$$PR(A) = \frac{1 - \alpha}{N} + \alpha \cdot \sum_{i=1}^{N} \frac{PR(T_i)}{C(T_i)}$$
where PR(A) is the PageRank vector of the recommended user A, N is the total number of web pages, web page T_i is the i-th source page (in-link page) pointing to web page A, C(T_i) is the total out-degree of the out-links of web page T_i, and i = 1, 2, ..., n. The model means that a user staying on a given page may browse to a page at random with probability (1 - α)/N, or may follow a link with probability α.
The technical solution of the present invention improves on two problems of the above prior art: the uniform allocation of the random damping factor α, and the fact that only the out-degree C(T_i) is considered, so that every recommendation of a user is given equal weight.
The authority value α is improved as follows:
Regarding the uniform allocation of authority values, the random damping factor (damping coefficient) should differ from page to page. For example, users who are frequently active, well known and more authoritative are more likely to be recommended than users who receive few recommendations and have a lower reputation. The classical PageRank model can therefore be improved to:
$$PR(A) = \left[1 - \frac{n}{\sum_{i=1}^{n} C(T_i)}\right] \frac{1}{N} + \frac{n}{\sum_{i=1}^{n} C(T_i)} \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$
where PR(A) is the PageRank vector of the recommended user A, n is the total number of users who recommend user A, N is the total number of people involved in recommendation behavior, user T_i is a user who recommends user A, and C(T_i) is the total number of times that user T_i recommends other users.
In this way, the random damping factor changes from a fixed value into a value that varies with users of different levels. This improved model, however, introduces a problem of its own: it no longer matches the random surfing model on which Google's algorithm was proposed, and the size of the random damping factor can no longer be controlled manually. The model can therefore be further improved to:
$$PR(A) = \left[1 - \frac{n}{\sum_{i=1}^{n} C(T_i)} \cdot \alpha\right] \frac{1}{N} + \frac{n}{\sum_{i=1}^{n} C(T_i)} \cdot \alpha \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$
This further improvement restores the random damping factor, and with it the random surfing element of PageRank, while also allowing different users to be assigned different authority values.
Therefore, in this technical solution, through steps such as preprocessing the data into an adjacency matrix, converting the adjacency matrix into a hyperlink matrix, selecting initial parameters and building the PageRank model trainer, the problems of uniform authority-value allocation and of considering only out-links in the classical PageRank algorithm are improved. Iterating with the preset PageRank model trainer makes iteration over the data faster in practical application and better reflects that different users have different levels of authority, so that search and recommendation quality in practical enterprise recommendation and search is higher, which improves the working efficiency of the system and the user experience.
According to an embodiment of the present invention, step 102 specifically comprises: extracting all recommendation information from the behavior logs in the behavior log database; taking each user of the server as a node, creating an edge that starts at the user who makes a recommendation and ends at the recommended user, with the number of recommendations as the weight of the edge, thereby building a weighted directed graph; and storing the weighted directed graph in the adjacency matrix.
In this technical scheme, to only considering out-degree problem (C (T i)) each AT user problem of giving equal weight improve.
In the middle of the classical model that Google proposes, only take into account the impact of the out-degree that chain goes out, i.e. C (T i), wherein, each by user A the out-degree of user of recommending impart identical weight.Consider the impact of out-degree, classical model improved, following chain can be obtained and enter chain and go out model:
PR ( i ) = ( 1 - α ) + α · Σ j ∈ B i PR ( j ) W ( j , i )
W(j,i)=W in(j,i)*W out(j,i)
W(j,i)=W in(j,i)*W out(j,i)
Wherein W in(j, i), W out(j, i) is defined as follows:
W in ( j , i ) = I i Σ k ∈ N j I k
W out ( j , i ) = O i Σ k ∈ N j O k
where $N_i$ is the set of all users that user i recommends (the out-link user set), and $B_i$ is the set of all users pointing to user i (the in-link user set). The model can be read as follows: for a relatively popular user i and a user j belonging to the in-link user set $B_i$ of user i, the weight $W(j,i)$ of the link (j, i) should depend on all users that user j recommends and on all users that have a link relationship with user i, namely
$$W(j,i) = W_{in}(j,i) \cdot W_{out}(j,i)$$
where $W_{in}(j,i)$ is the weight of link (j, i) associated with the recommendation links of the users involved. It is determined by the recommendation-link value $I_i$ of user i and the recommendation-link values $I_k$ of every user k in $N_j$, the set of users recommended by user j, namely
$$W_{in}(j,i) = \frac{I_i}{\sum_{k \in N_j} I_k}$$
Thus, the in-link/out-link model proposed in this application solves the problem that every recommendation in the directed graph is given the same weight. The improved algorithm computes a different weight for each recommendation from the recommendation behavior of the users involved, so that frequently recommended, high-quality users are taken into account by the PageRank algorithm, which improves its effectiveness.
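A minimal sketch of the link weights and one update of the in-link/out-link model follows. The text does not define $I_k$ and $O_k$ precisely, so the sketch assumes the reading used in the widely known Weighted PageRank formulation: $I_k$ is the number of distinct users who recommend user k, and $O_k$ is the number of distinct users that user k recommends. The function names `link_weights` and `weighted_step` and the sample graph are hypothetical.

```python
def link_weights(adj):
    """Return W with W[j][i] = W_in(j, i) * W_out(j, i) for every edge j -> i."""
    n = len(adj)
    # Assumed reading: I[k] counts users recommending k, O[k] counts users k recommends.
    in_deg = [sum(1 for j in range(n) if adj[j][k] > 0) for k in range(n)]
    out_deg = [sum(1 for i in range(n) if adj[k][i] > 0) for k in range(n)]

    weights = [[0.0] * n for _ in range(n)]
    for j in range(n):
        targets = [k for k in range(n) if adj[j][k] > 0]  # N_j
        in_total = sum(in_deg[k] for k in targets)
        out_total = sum(out_deg[k] for k in targets)
        for i in targets:
            w_in = in_deg[i] / in_total if in_total else 0.0
            w_out = out_deg[i] / out_total if out_total else 0.0
            weights[j][i] = w_in * w_out
    return weights


def weighted_step(pr, weights, alpha):
    """One update PR(i) = (1 - alpha) + alpha * sum over j in B_i of PR(j) * W(j, i)."""
    n = len(pr)
    # weights[j][i] is zero when j does not recommend i, so summing over all j
    # is the same as summing over the in-link set B_i.
    return [
        (1 - alpha) + alpha * sum(pr[j] * weights[j][i] for j in range(n))
        for i in range(n)
    ]


# Hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
adj = [[0, 1, 1], [0, 0, 1], [1, 0, 0]]
weights = link_weights(adj)
pr_next = weighted_step([1.0, 1.0, 1.0], weights, alpha=0.85)
```

In this example user 0 splits its influence unevenly: the link to user 2, who has more in-links, receives a larger weight than the link to user 1.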
According to one embodiment of the present invention, step 104 specifically comprises: converting the weighted directed graph into a hyperlink matrix, where the conversion formula is:
$$H(i,j) = \begin{cases} \dfrac{Adj(i,j)}{colSum(i)}, & \text{when } colSum(i) \neq 0 \\ \dfrac{1}{n}, & \text{otherwise} \end{cases}$$
where $H(i,j)$ is the hyperlink matrix, i is any user, $colSum(i)$ is the total number of times user i recommends other users in the adjacency matrix, and n is the total number of users involved in recommendation behavior.
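The conversion formula can be sketched as below, assuming the adjacency matrix stores the recommender in the row index so that $colSum(i)$ is the sum of row i. The function name `to_hyperlink_matrix` and the sample matrix are assumptions for illustration.

```python
def to_hyperlink_matrix(adj):
    """Normalise each user's outgoing weights; users with none get 1/n everywhere."""
    n = len(adj)
    hyper = []
    for row in adj:
        total = sum(row)  # colSum(i): total recommendations made by user i
        if total != 0:
            hyper.append([weight / total for weight in row])
        else:
            # A user who never recommends anyone is spread uniformly over all n users.
            hyper.append([1.0 / n] * n)
    return hyper


# Hypothetical matrix: user 0 recommends users 1 and 2 twice each,
# user 1 recommends nobody, user 2 recommends user 0 once.
adj = [[0, 2, 2], [0, 0, 0], [1, 0, 0]]
hyper = to_hyperlink_matrix(adj)
```

Every row of the result sums to 1, which is what lets the later iteration treat the matrix as transition probabilities.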
In this technical solution, converting the adjacency matrix into a hyperlink matrix makes it convenient to subsequently train the PageRank model from the initial parameters and the converted hyperlink matrix.
According to one embodiment of the present invention, after step 108 the method comprises: judging whether the iteration count exceeds a predetermined iteration count threshold, and judging whether the difference between the PageRank vector and the previous PageRank vector exceeds a predetermined iteration precision; when both judgments are affirmative, continuing the iterative operation with the preset PageRank model trainer; otherwise, outputting the iterated PageRank vector in descending order.
In this technical solution, two problems of the classical PageRank algorithm are addressed: the uniform allocation of authority values, and the fact that only out-links are considered. Iterating with the preset PageRank model trainer makes the iteration converge faster in practice and better reflects that different users have different levels of authority, so that enterprise recommendation and search achieve higher quality, improving the working efficiency of the system and the user experience.
According to one embodiment of the present invention, the initial parameters comprise an iteration vector, a random damping factor, the predetermined iteration precision, and the predetermined iteration count threshold.
Fig. 2 shows a flowchart of an information recommendation method according to another embodiment of the present invention.
As shown in Fig. 2, the information recommendation method according to another embodiment of the present invention comprises the following steps:
Step 202, obtain behavior logs from the behavior log database.
Step 204, preprocess the behavior log data, then assemble the processed data into a link matrix/link table.
Step 206, convert the processed link matrix/link table into a hyperlink matrix.
Step 208, choose initial parameters for the preset PageRank model trainer of the server; the parameters comprise: damping coefficient alpha, iteration precision eps, iteration threshold thresHold, and initial vector V0.
Step 210, input the hyperlink matrix and the initial parameters into the preset PageRank model trainer.
Step 212, calculate the PageRank vector V with the preset PageRank model trainer, and record the iteration count count.
Step 214, judge whether the PageRank vector is within the iteration precision or whether the iteration count exceeds the iteration threshold, where the count is updated as count = count + 1 and the previous vector as V0 = V; when the judgment is yes, go to step 216; otherwise, return to step 210.
Step 216, output the iterated PageRank vector V in descending order.
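Steps 208 to 216 can be sketched as the loop below. It is an illustration under stated assumptions, not the trainer of the invention: the update inside the loop is the classical PageRank formula applied to a row-normalised hyperlink matrix, the stopping test is read as "stop when the largest change is below eps or the count reaches thresHold", and the names `pagerank_iterate`, `ranked`, and `thres_hold` are hypothetical.

```python
def pagerank_iterate(hyper, alpha=0.85, eps=1e-8, thres_hold=100, v0=None):
    """Iterate until V is within eps of V0 or the count reaches thres_hold."""
    n = len(hyper)
    v_prev = list(v0) if v0 is not None else [1.0 / n] * n  # initial vector V0
    count = 0
    while True:
        # Assumed classical update: V[i] = (1 - alpha)/n + alpha * sum_j V0[j] * H[j][i]
        v = [
            (1 - alpha) / n + alpha * sum(v_prev[j] * hyper[j][i] for j in range(n))
            for i in range(n)
        ]
        count += 1  # count = count + 1
        diff = max(abs(a - b) for a, b in zip(v, v_prev))
        if diff < eps or count >= thres_hold:
            return v, count
        v_prev = v  # V0 = V


def ranked(users, v):
    """Pair users with their scores and sort from high to low (step 216)."""
    return sorted(zip(users, v), key=lambda item: item[1], reverse=True)


# Hypothetical hyperlink matrix for three users a, b, c.
hyper = [[0.0, 0.5, 0.5], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
v, count = pagerank_iterate(hyper)
order = [user for user, _ in ranked(["a", "b", "c"], v)]
```

Because each row of the hyperlink matrix sums to 1, the scores keep summing to 1 across iterations, which gives a simple sanity check on any implementation of the loop.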
The information recommendation method of one embodiment of the present invention is described below in connection with a concrete application scenario.
The work-circle behavior logs from July 10, 2014 to August 21, 2014 were extracted, and the two improved schemes were compared with the classical PageRank model. As shown in Table 1, under identical conditions the improved models converge to the specified iteration precision more quickly, and popular, more authoritative users can be selected more accurately for work-circle recommendation or search. Because the recommendation behavior of users in the work circle involves privacy, only the parameter choices of the experimental setup and the final iteration precision and iteration counts are listed here.
Table 1
Fig. 3 shows a block diagram of an information recommendation system according to an embodiment of the present invention.
As shown in Fig. 3, the information recommendation system 300 according to an embodiment of the present invention comprises: an information preprocessing unit 302, which generates an adjacency matrix according to the behavior logs in the behavior log database of the server; a matrix conversion unit 304, which converts the adjacency matrix into a hyperlink matrix; a parameter choosing unit 306, which chooses, according to the hyperlink matrix, initial parameters for the preset PageRank model trainer of the server; a training unit 308, which calculates a PageRank vector with the preset PageRank model trainer according to the initial parameters and records the iteration count; and a recommendation unit 310, which outputs the iterated PageRank vector in descending order; wherein
the calculation formula of the preset PageRank model trainer is:
$$PR(A) = \left[1 - \frac{n}{\sum_{i=1}^{n} C(T_i)} \cdot \alpha\right] \frac{1}{N} + \frac{n}{\sum_{i=1}^{n} C(T_i)} \cdot \alpha \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$
where $PR(A)$ is the PageRank value of the recommended user A, n is the total number of users who recommend user A, N is the total number of users involved in recommendation behavior, $T_i$ is any user who recommends user A, $C(T_i)$ is the total number of times user $T_i$ recommends other users, $PR(T_i)$ is the PageRank value of user $T_i$, and i = 1, 2, ..., n.
In the prior art, Google gave the classical PageRank model in a published paper in the following form:
$$PR(A) = \frac{1-\alpha}{N} + \alpha \cdot \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$
where $PR(A)$ is the PageRank value of web page A, N is the total number of web pages, $T_i$ is the i-th source page (in-link page) pointing to page A, $C(T_i)$ is the out-degree of page $T_i$ (the number of its outgoing links), and i = 1, 2, ..., n, with n the number of pages pointing to page A. The model means that a user staying on a certain page jumps to a random page with probability $1-\alpha$ (each page being chosen with probability $(1-\alpha)/N$), and follows a link on the page with probability $\alpha$.
In the technical solution of the present invention, improvements are made both to the uniform allocation of the random damping factor $\alpha$ in the above prior art and to the problem that, when only the out-degree $C(T_i)$ is considered, every recommending (AT) user is given equal weight.
The improvement to the authority value $\alpha$ is as follows:
To address the uniform allocation of authority values, the random damping factor (damping coefficient) is made different for different pages. For example, a user with frequent recommendation activity, who is popular and more authoritative, is more likely to be recommended than users with fewer recommendations and a lower reputation. The classical PageRank model can therefore be improved to:
$$PR(A) = \left[1 - \frac{n}{\sum_{i=1}^{n} C(T_i)}\right] \frac{1}{N} + \frac{n}{\sum_{i=1}^{n} C(T_i)} \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$
where $PR(A)$ is the PageRank value of the recommended user A, n is the total number of users who recommend user A, N is the total number of users involved in recommendation behavior, $T_i$ is a user who recommends user A, and $C(T_i)$ is the total number of times user $T_i$ recommends other users.
In this way, the random damping factor changes from a fixed value into a value that varies across users of different levels. However, this improved model also brings a problem: it no longer matches the random-surfer model under which Google proposed the algorithm, and the size of the random damping factor can no longer be controlled manually. The model can therefore be further improved to:
$$PR(A) = \left[1 - \frac{n}{\sum_{i=1}^{n} C(T_i)} \cdot \alpha\right] \frac{1}{N} + \frac{n}{\sum_{i=1}^{n} C(T_i)} \cdot \alpha \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$
This improvement reintroduces the random damping factor $\alpha$, restoring the random-surfer element of PageRank while still assigning different authority values to different users.
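One update of the further improved model can be sketched as follows, reading the formula as a per-user damping value $\alpha \cdot n / \sum_{i=1}^{n} C(T_i)$. The sketch assumes that $C(T)$ is the row sum of the weighted adjacency matrix and that a user with no recommenders gets a damping value of 0 (so the score falls back to 1/N); both choices, and the names `adaptive_damping` and `improved_step`, are assumptions not stated in the text.

```python
def adaptive_damping(adj, alpha):
    """Per-user damping alpha * n_A / sum of C(T_i) over the recommenders T_i of A."""
    n_users = len(adj)
    out_total = [sum(row) for row in adj]  # C(T): total recommendations made by T
    damping = []
    for a in range(n_users):
        sources = [t for t in range(n_users) if adj[t][a] > 0]  # users recommending A
        denom = sum(out_total[t] for t in sources)
        # Assumption: a user nobody recommends gets damping 0.
        damping.append(alpha * len(sources) / denom if denom else 0.0)
    return damping


def improved_step(pr, adj, alpha):
    """One update PR(A) = (1 - d_A)/N + d_A * sum_i PR(T_i)/C(T_i)."""
    n_users = len(adj)
    out_total = [sum(row) for row in adj]
    damping = adaptive_damping(adj, alpha)
    new_pr = []
    for a in range(n_users):
        link_sum = sum(
            pr[t] / out_total[t] for t in range(n_users) if adj[t][a] > 0
        )
        new_pr.append((1 - damping[a]) / n_users + damping[a] * link_sum)
    return new_pr


# Hypothetical graph: user 0 recommends user 1 twice, 1 recommends 2, 2 recommends 0.
adj = [[0, 2, 0], [0, 0, 1], [1, 0, 0]]
damping = adaptive_damping(adj, alpha=0.85)
pr_next = improved_step([1 / 3, 1 / 3, 1 / 3], adj, alpha=0.85)
```

Since every recommender has $C(T_i) \geq 1$, the ratio $n / \sum C(T_i)$ never exceeds 1, so the per-user damping stays within $[0, \alpha]$.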
Therefore, in this technical solution, through the steps of preprocessing the data into an adjacency matrix, converting the adjacency matrix into a hyperlink matrix, choosing initial parameters, and establishing the PageRank model trainer, two problems of the classical PageRank algorithm are addressed: the uniform allocation of authority values, and the fact that only out-links are considered. Iterating with the preset PageRank model trainer makes the iteration converge faster in practice and better reflects that different users have different levels of authority, so that enterprise recommendation and search achieve higher quality, improving the working efficiency of the system and the user experience.
According to one embodiment of the present invention, the information preprocessing unit 302 comprises: a weighted directed graph building unit 3022, which extracts all recommendation information from the behavior logs in the behavior log database, takes each user of the server as a node, takes the user who makes a recommendation as the start point and the recommended user as the end point of an edge, and takes the number of recommendations as the edge weight, so as to build a weighted directed graph; and a storage unit 3024, which stores the weighted directed graph in an adjacency matrix.
This technical solution addresses the problem that, when only the out-degree $C(T_i)$ is considered, every recommending (AT) user is given equal weight.
In the classical model proposed by Google, only the out-degree of outgoing links, i.e. $C(T_i)$, is taken into account, so the out-degree of every user recommended by user A is given the same weight. Taking both in-links and out-links into account, the classical model can be improved into the following in-link/out-link model:
$$PR(i) = (1-\alpha) + \alpha \cdot \sum_{j \in B_i} PR(j)\, W(j,i)$$
$$W(j,i) = W_{in}(j,i) \cdot W_{out}(j,i)$$
where $W_{in}(j,i)$ and $W_{out}(j,i)$ are defined as follows:
$$W_{in}(j,i) = \frac{I_i}{\sum_{k \in N_j} I_k}$$
$$W_{out}(j,i) = \frac{O_i}{\sum_{k \in N_j} O_k}$$
where $N_i$ is the set of all users that user i recommends (the out-link user set), and $B_i$ is the set of all users pointing to user i (the in-link user set). The model can be read as follows: for a relatively popular user i and a user j belonging to the in-link user set $B_i$ of user i, the weight $W(j,i)$ of the link (j, i) should depend on all users that user j recommends and on all users that have a link relationship with user i, namely
$$W(j,i) = W_{in}(j,i) \cdot W_{out}(j,i)$$
where $W_{in}(j,i)$ is the weight of link (j, i) associated with the recommendation links of the users involved. It is determined by the recommendation-link value $I_i$ of user i and the recommendation-link values $I_k$ of every user k in $N_j$, the set of users recommended by user j, namely
$$W_{in}(j,i) = \frac{I_i}{\sum_{k \in N_j} I_k}$$
Thus, the in-link/out-link model proposed in this application solves the problem that every recommendation in the directed graph is given the same weight. The improved algorithm computes a different weight for each recommendation from the recommendation behavior of the users involved, so that frequently recommended, high-quality users are taken into account by the PageRank algorithm, which improves its effectiveness.
According to one embodiment of the present invention, the matrix conversion unit 304 is specifically configured to convert the weighted directed graph into a hyperlink matrix, where the conversion formula is:
$$H(i,j) = \begin{cases} \dfrac{Adj(i,j)}{colSum(i)}, & \text{when } colSum(i) \neq 0 \\ \dfrac{1}{n}, & \text{otherwise} \end{cases}$$
where $H(i,j)$ is the hyperlink matrix, i is any user, $colSum(i)$ is the total number of times user i recommends other users in the adjacency matrix, and n is the total number of users involved in recommendation behavior.
In this technical solution, converting the adjacency matrix into a hyperlink matrix makes it convenient to subsequently train the PageRank model from the initial parameters and the converted hyperlink matrix.
According to one embodiment of the present invention, the system further comprises: a judging unit 312, which, after the training unit 308 completes training, judges whether the iteration count exceeds a predetermined iteration count threshold and judges whether the difference between the PageRank vector and the PageRank vector of the previous iteration exceeds a predetermined iteration precision; when both judgments are affirmative, the iterative operation continues with the preset PageRank model trainer; otherwise, the iterated PageRank vector is output in descending order.
In this technical solution, two problems of the classical PageRank algorithm are addressed: the uniform allocation of authority values, and the fact that only out-links are considered. Iterating with the preset PageRank model trainer makes the iteration converge faster in practice and better reflects that different users have different levels of authority, so that enterprise recommendation and search achieve higher quality, improving the working efficiency of the system and the user experience.
According to one embodiment of the present invention, the initial parameters comprise an iteration vector, a random damping factor, the predetermined iteration precision, and the predetermined iteration count threshold.
The technical solution of the present invention has been described above with reference to the accompanying drawings. With this technical solution, two problems of the classical PageRank algorithm are addressed: the uniform allocation of authority values, and the fact that only out-links are considered. Iterating with the preset PageRank model trainer makes the iteration converge faster in practice and better reflects that different users have different levels of authority, so that enterprise recommendation and search achieve higher quality, improving the working efficiency of the system and the user experience.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. An information recommendation method, characterized by comprising:
Generating an adjacency matrix according to the behavior logs in the behavior log database of a server;
Converting the adjacency matrix into a hyperlink matrix;
Choosing, according to the hyperlink matrix, initial parameters for the preset PageRank model trainer of the server;
Calculating a PageRank vector with the preset PageRank model trainer according to the initial parameters, and recording the iteration count;
Outputting the iterated PageRank vector in descending order; wherein
The calculation formula of the preset PageRank model trainer is:
$$PR(A) = \left[1 - \frac{n}{\sum_{i=1}^{n} C(T_i)} \cdot \alpha\right] \frac{1}{N} + \frac{n}{\sum_{i=1}^{n} C(T_i)} \cdot \alpha \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$
where $PR(A)$ is the PageRank value of the recommended user A, n is the total number of users who recommend the user A, N is the total number of users involved in recommendation behavior, $T_i$ is any user who recommends the user A, $C(T_i)$ is the total number of times the user $T_i$ recommends other users, $PR(T_i)$ is the PageRank value of the user $T_i$, and i = 1, 2, ..., n.
2. The information recommendation method according to claim 1, characterized in that generating the adjacency matrix according to the behavior logs in the behavior log database of the server specifically comprises:
Extracting all recommendation information from the behavior logs in the behavior log database, taking each user of the server as a node, taking the user who makes a recommendation as the start point and the recommended user as the end point of an edge, and taking the number of recommendations as the weight of the edge, so as to build a weighted directed graph;
Storing the weighted directed graph in the adjacency matrix.
3. The information recommendation method according to claim 2, characterized in that converting the adjacency matrix into the hyperlink matrix specifically comprises:
Converting the weighted directed graph into the hyperlink matrix, where the conversion formula is:
$$H(i,j) = \begin{cases} \dfrac{Adj(i,j)}{colSum(i)}, & \text{when } colSum(i) \neq 0 \\ \dfrac{1}{n}, & \text{otherwise} \end{cases}$$
where $H(i,j)$ is the hyperlink matrix, i is any user, $colSum(i)$ is the total number of times the user i recommends other users in the adjacency matrix, and n is the total number of users involved in the recommendation behavior.
4. The information recommendation method according to any one of claims 1 to 3, characterized in that, after calculating the PageRank vector with the preset PageRank model trainer and recording the iteration count, the method comprises:
Judging whether the iteration count exceeds a predetermined iteration count threshold, and judging whether the difference between the PageRank vector and the previous PageRank vector exceeds a predetermined iteration precision;
When both judgments are affirmative, continuing the iterative operation with the preset PageRank model trainer; otherwise, outputting the iterated PageRank vector in descending order.
5. The information recommendation method according to claim 4, characterized in that the initial parameters comprise an iteration vector, a random damping factor, the predetermined iteration precision, and the predetermined iteration count threshold.
6. An information recommendation system, characterized by comprising:
An information preprocessing unit, which generates an adjacency matrix according to the behavior logs in the behavior log database of a server;
A matrix conversion unit, which converts the adjacency matrix into a hyperlink matrix;
A parameter choosing unit, which chooses, according to the hyperlink matrix, initial parameters for the preset PageRank model trainer of the server;
A training unit, which calculates a PageRank vector with the preset PageRank model trainer according to the initial parameters, and records the iteration count;
A recommendation unit, which outputs the iterated PageRank vector in descending order; wherein
The calculation formula of the preset PageRank model trainer is:
$$PR(A) = \left[1 - \frac{n}{\sum_{i=1}^{n} C(T_i)} \cdot \alpha\right] \frac{1}{N} + \frac{n}{\sum_{i=1}^{n} C(T_i)} \cdot \alpha \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$
where $PR(A)$ is the PageRank value of the recommended user A, n is the total number of users who recommend the user A, N is the total number of users involved in recommendation behavior, $T_i$ is any user who recommends the user A, $C(T_i)$ is the total number of times the user $T_i$ recommends other users, $PR(T_i)$ is the PageRank value of the user $T_i$, and i = 1, 2, ..., n.
7. The information recommendation system according to claim 6, characterized in that the information preprocessing unit comprises:
A weighted directed graph building unit, which extracts all recommendation information from the behavior logs in the behavior log database, takes each user of the server as a node, takes the user who makes a recommendation as the start point and the recommended user as the end point of an edge, and takes the number of recommendations as the weight of the edge, so as to build a weighted directed graph;
A storage unit, which stores the weighted directed graph in the adjacency matrix.
8. The information recommendation system according to claim 7, characterized in that the matrix conversion unit is specifically configured to:
Convert the weighted directed graph into the hyperlink matrix, where the conversion formula is:
$$H(i,j) = \begin{cases} \dfrac{Adj(i,j)}{colSum(i)}, & \text{when } colSum(i) \neq 0 \\ \dfrac{1}{n}, & \text{otherwise} \end{cases}$$
where $H(i,j)$ is the hyperlink matrix, i is any user, $colSum(i)$ is the total number of times the user i recommends other users in the adjacency matrix, and n is the total number of users involved in the recommendation behavior.
9. The information recommendation system according to any one of claims 6 to 8, characterized by further comprising:
A judging unit, which, after the training unit completes training, judges whether the iteration count exceeds a predetermined iteration count threshold and judges whether the difference between the PageRank vector and the PageRank vector of the previous iteration exceeds a predetermined iteration precision; when both judgments are affirmative, the iterative operation continues with the preset PageRank model trainer; otherwise, the iterated PageRank vector is output in descending order.
10. The information recommendation system according to claim 9, characterized in that the initial parameters comprise an iteration vector, a random damping factor, the predetermined iteration precision, and the predetermined iteration count threshold.
CN201410746660.4A 2014-12-08 2014-12-08 Information recommendation method and information recommendation system Active CN104391982B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410746660.4A CN104391982B (en) 2014-12-08 2014-12-08 Information recommendation method and information recommendation system

Publications (2)

Publication Number Publication Date
CN104391982A true CN104391982A (en) 2015-03-04
CN104391982B CN104391982B (en) 2018-07-20

Family

ID=52609886

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410746660.4A Active CN104391982B (en) 2014-12-08 2014-12-08 Information recommendation method and information recommendation system

Country Status (1)

Country Link
CN (1) CN104391982B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040193698A1 (en) * 2003-03-24 2004-09-30 Sadasivuni Lakshminarayana Method for finding convergence of ranking of web page
CN102270246A (en) * 2011-09-08 2011-12-07 胡辉 Method for calculating importance of web page
CN102799671A (en) * 2012-07-17 2012-11-28 西安电子科技大学 Network individual recommendation method based on PageRank algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
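Shao Jingjing et al.: "An improved PageRank algorithm: adjusting the damping factor", Applied Mathematics (Supplement) *
Guo Ye et al.: "Personalized recommendation system based on massive data mining", Journal of Northwest University *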

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106991592A (en) * 2017-03-22 2017-07-28 南京财经大学 A kind of personalized recommendation method based on purchase user behavior analysis
CN108536590A (en) * 2018-02-09 2018-09-14 武汉楚鼎信息技术有限公司 A kind of method and system device of system service significance level grading
TWI739359B (en) * 2019-03-28 2021-09-11 南韓商韓領有限公司 Computer-implemented system, computer-implemented method for arranging hyperlinks on a graphical user-interface and non-transitory computer-readable storage medium
US11328328B2 (en) 2019-03-28 2022-05-10 Coupang Corp. Computer-implemented method for arranging hyperlinks on a grapical user-interface

Also Published As

Publication number Publication date
CN104391982B (en) 2018-07-20

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant