CN104391982A - Information recommendation method and information recommendation system - Google Patents
- Publication number: CN104391982A
- Application number: CN201410746660.4A
- Authority: CN (China)
- Prior art keywords
- user
- pagerank
- matrix
- vector
- sigma
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Abstract
The invention provides an information recommendation method and an information recommendation system. The method includes: generating an adjacency matrix from the behavior logs in a behavior log database of a server; converting the adjacency matrix into a hyperlink matrix; selecting, according to the hyperlink matrix, initial parameters for a preset PageRank model trainer of the server; computing the PageRank vector with the preset PageRank model trainer according to the initial parameters, and recording the number of iterations; and outputting the iterated PageRank vector in descending order. The calculation formula of the preset PageRank model trainer is shown below. The method and system address two problems of the classical PageRank algorithm: authority is distributed equally, and only out-links are considered. In practical use the iteration converges faster and different users are assigned different levels of authority, which yields higher search and recommendation quality in enterprise recommendation and search.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to an information recommendation method and an information recommendation system.
Background
At present, the behavior logs of a user's work circle contain a great deal of behavioral information, including user-to-user interactions and user-to-circle interactions, but most of this information remains unmined. It is desirable to mine relevant data from the behavioral information to improve search and recommendation quality. Search and recommendation in the prior art mainly rank results by combining user behavior with matches between the word-segmented query string and the index. However, prior-art recommendation and search have the following two shortcomings:
First, recommendations for users without behavioral information still rely mainly on index matching. This approach does not take the behavioral information of the group into account and cannot recommend users who are active, popular, and more authoritative.
Second, although enterprise data is generally highly valid and has little redundancy, when the volume of data to be searched and recommended is large, a user can cheat by stuffing certain fields with redundant keywords so that they enter the index entries, thereby deceiving the search system.
Therefore, a new technical solution is needed that can improve the quality of user recommendation.
Summary of the invention
Based on the above problems, the present invention proposes a new technical solution that can improve the quality of user recommendation.
In view of this, an embodiment of the first aspect of the present invention provides an information recommendation method, comprising: generating an adjacency matrix according to the behavior logs in a behavior log database of a server; converting the adjacency matrix into a hyperlink matrix; selecting, according to the hyperlink matrix, initial parameters for a preset PageRank model trainer of the server; computing the PageRank vector with the preset PageRank model trainer according to the initial parameters, and recording the number of iterations; and outputting the iterated PageRank vector in descending order; wherein the calculation formula of the preset PageRank model trainer is:
where $PR(A)$ is the PageRank value of the recommended user $A$, $n$ is the total number of users who recommend user $A$, $N$ is the total number of people involved in recommendation behavior, $T_i$ is any user who recommends user $A$, $C(T_i)$ is the total number of times user $T_i$ recommends other users, $PR(T_i)$ is the PageRank value of user $T_i$, and $i = 1, 2, \ldots, n$.
In the prior art, the classical PageRank model described in a paper published by Google has the following form:
$$PR(A) = \frac{1-\alpha}{N} + \alpha \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$
where $PR(A)$ is the PageRank value of web page $A$ (the counterpart of the recommended user $A$), $N$ is the total number of web pages, $T_i$ is the $i$-th source page (in-link page) pointing to web page $A$, $C(T_i)$ is the out-degree of web page $T_i$ (its total number of out-links), and $i = 1, 2, \ldots, n$. The model means that a user staying on a page either jumps to a random page, reaching each page with probability $(1-\alpha)/N$, or follows a link on the current page with probability $\alpha$.
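As a small illustration of the classical model, the sketch below evaluates one PageRank value, assuming the standard normalized form $PR(A) = (1-\alpha)/N + \alpha \sum_i PR(T_i)/C(T_i)$ that matches the variable definitions above. The function name `pagerank_update` and the sample numbers are hypothetical and do not come from the patent.

```python
def pagerank_update(alpha, num_pages, sources):
    """Classical PageRank value of one page (assumed normalized form).

    sources: list of (pr_of_source, out_degree_of_source) pairs,
    one pair per page T_i that links to the target page.
    """
    # Each source page passes on its own value divided by its out-degree.
    link_term = sum(pr / out_degree for pr, out_degree in sources)
    # Random-jump share plus damped share received over links.
    return (1.0 - alpha) / num_pages + alpha * link_term


# Hypothetical example: 4 pages in total, two of which link to page A.
pr_a = pagerank_update(alpha=0.85, num_pages=4,
                       sources=[(0.25, 2), (0.25, 1)])
print(pr_a)
```

A source page with a large out-degree contributes less per link, which is the equal-weight behavior that the later sections set out to change.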
In the technical solution of the present invention, two problems of the above prior art are addressed: the damping factor $\alpha$ is distributed equally, and, because only the out-degree $C(T_i)$ is considered, every recommended user is given equal weight.
The improvement to the authority value $\alpha$ is as follows:
Regarding the equal distribution of authority values: the damping factor (damping coefficient) should differ between web pages. For example, a user who is active, popular, and more authoritative is more likely to be recommended than a user who is recommended less and has a lower reputation. The classical PageRank model can therefore be improved to:
where $PR(A)$ is the PageRank value of the recommended user $A$, $n$ is the total number of users who recommend user $A$, $N$ is the total number of people involved in recommendation behavior, $T_i$ is a user who recommends user $A$, and $C(T_i)$ is the total number of times user $T_i$ recommends other users.
In this way, the damping factor changes from a fixed value into a value that varies with the level of each user. However, this improved model also introduces a problem: it no longer matches the random-surfer model on which Google based the algorithm, and the user can no longer control the size of the damping factor manually. The model can therefore be further improved to:
This further improvement restores the damping factor, and with it the random-surfer element of PageRank, while still allowing different users to be assigned different authority values.
Therefore, in this technical solution, the data is preprocessed into an adjacency matrix, the adjacency matrix is converted into a hyperlink matrix, initial parameters are selected, and a PageRank model trainer is set up. These steps address the equal distribution of authority values and the out-link-only weighting in the classical PageRank algorithm. Iterating with the preset PageRank model trainer makes the iteration converge faster in practice and better reflects the different levels of authority of different users, which yields higher search and recommendation quality in real enterprise recommendation and search and improves the working efficiency and user experience of the system.
According to an embodiment of the present invention, generating the adjacency matrix according to the behavior logs in the behavior log database of the server specifically comprises: extracting all recommendation information from the behavior logs in the behavior log database; taking each user of the server as a node; creating an edge whose starting point is the user who makes a recommendation and whose end point is the recommended user; taking the number of recommendations as the weight of the edge, thereby building a weighted directed graph; and storing the weighted directed graph in the adjacency matrix.
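The construction of the weighted directed graph can be sketched as follows. This is a minimal illustration that assumes the recommendation information has already been extracted from the logs as (recommender, recommended) pairs; the function name `build_adjacency`, the row-equals-recommender orientation, and the sample log are hypothetical.

```python
from collections import Counter


def build_adjacency(recommendations):
    """Build a weighted adjacency matrix from (recommender, recommended) pairs.

    Entry matrix[s][t] holds how many times user s recommended user t.
    """
    counts = Counter(recommendations)           # edge -> number of recommendations
    users = sorted({u for pair in counts for u in pair})
    index = {user: k for k, user in enumerate(users)}
    n = len(users)
    matrix = [[0] * n for _ in range(n)]
    for (src, dst), times in counts.items():
        matrix[index[src]][index[dst]] = times  # edge weight = recommendation count
    return users, matrix


# Hypothetical log: ann recommends bob twice, bob recommends cat, cat recommends ann.
log = [("ann", "bob"), ("ann", "bob"), ("bob", "cat"), ("cat", "ann")]
users, adjacency = build_adjacency(log)
print(users)
print(adjacency)
```

A dense list-of-lists matrix keeps the example short; for a large user base a sparse representation would likely be preferable.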
In this technical solution, the problem that every recommended user is given equal weight because only the out-degree $C(T_i)$ is considered is addressed.
The classical model proposed by Google considers only the out-degree of out-links, i.e. $C(T_i)$, so every user recommended by a given user is assigned the same weight. Taking the influence of out-degree into account and improving the classical model gives the following in-link/out-link model:
$$W(j,i) = W_{in}(j,i) \cdot W_{out}(j,i)$$
where $W_{in}(j,i)$ and $W_{out}(j,i)$ are defined as follows:
where $N_i$ is the set of all users recommended by user $i$ (the out-link user set) and $B_i$ is the set of all users pointing to user $i$ (the in-link user set). The model can be interpreted as follows: for a relatively popular user $i$ and a user $j$ belonging to the in-link user set $B_i$ of user $i$, the weight $W(j,i)$ of the link $(j,i)$ should depend on all users recommended by user $j$ and on all users that have a link relationship with user $i$, namely
$$W(j,i) = W_{in}(j,i) \cdot W_{out}(j,i)$$
where $W_{in}(j,i)$ is the weight associated with the link $(j,i)$ that reflects how the related users recommend others. It is determined by the recommendation-link value $I_i$ of user $i$ and the recommendation-link values $I_k$ of the users $k$ (where $k$ belongs to the set of all users recommended by user $j$), namely
Thus, the in-link/out-link model proposed in this application solves the problem that every recommendation in the directed graph is given the same weight. The improved algorithm computes a weight from the recommendation behavior of each user involved in a recommendation, so that each recommendation receives a different weight. In this way, frequently recommended, high-quality users are combined with the PageRank algorithm, which improves its effectiveness.
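The defining formulas for $W_{in}$ and $W_{out}$ are not reproduced in this text. As a hedged sketch only, the code below assumes they take the widely used Weighted PageRank form: $W_{in}(j,i)$ is the in-degree of user $i$ divided by the summed in-degree of all users recommended by $j$, and $W_{out}(j,i)$ is the same ratio computed on out-degrees. The function name `link_weights` and the sample edges are hypothetical, and the patent's actual definitions may differ.

```python
def link_weights(edges):
    """Return {(j, i): W(j, i)} with W = W_in * W_out (assumed definitions).

    edges: iterable of (j, i) pairs meaning "user j recommends user i".
    """
    edges = set(edges)
    out_sets, in_count, out_count = {}, {}, {}
    for j, i in edges:
        out_sets.setdefault(j, set()).add(i)        # users recommended by j
        in_count[i] = in_count.get(i, 0) + 1        # in-degree of i
        out_count[j] = out_count.get(j, 0) + 1      # out-degree of j
    weights = {}
    for j, i in edges:
        targets = out_sets[j]
        in_total = sum(in_count.get(k, 0) for k in targets)
        out_total = sum(out_count.get(k, 0) for k in targets)
        w_in = in_count[i] / in_total               # in_total >= 1 because i is a target
        # If none of j's targets recommends anyone, fall back to 0 (an assumption).
        w_out = out_count.get(i, 0) / out_total if out_total else 0.0
        weights[(j, i)] = w_in * w_out
    return weights


# Hypothetical graph: a recommends b and c, b recommends c, c recommends a.
w = link_weights([("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")])
print(w)
```

In this sample, user c is recommended more often than user b, so the link from a to c receives a larger weight than the link from a to b, which is the unequal weighting the paragraph above describes.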
According to an embodiment of the present invention, converting the adjacency matrix into the hyperlink matrix specifically comprises: converting the weighted directed graph into the hyperlink matrix, where the conversion formula is:
where $H(i,j)$ is the hyperlink matrix, $i$ is any user, $colSum(i)$ is the total number of times user $i$ recommends other users in the adjacency matrix, and $n$ is the total number of people involved in recommendation behavior.
In this technical solution, the adjacency matrix is converted into a hyperlink matrix so that the PageRank model can then be trained on the initial parameters and the converted hyperlink matrix.
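The conversion formula itself is not reproduced in this text. The sketch below shows one common normalization that is consistent with the symbols $colSum(i)$ and $n$: each user's recommendation counts are divided by that user's total, and a user who recommends nobody is assumed to spread weight uniformly as $1/n$. Both the uniform fallback and the row-equals-recommender orientation are assumptions, and `to_hyperlink_matrix` is a hypothetical name.

```python
def to_hyperlink_matrix(adjacency):
    """Normalize a weighted adjacency matrix into a row-stochastic matrix.

    adjacency[i][j] is the number of times user i recommended user j.
    """
    n = len(adjacency)
    hyperlink = []
    for row in adjacency:
        total = sum(row)  # total recommendations made by this user (colSum in the source)
        if total > 0:
            hyperlink.append([value / total for value in row])
        else:
            # Assumed handling of users with no outgoing recommendations.
            hyperlink.append([1.0 / n] * n)
    return hyperlink


# Hypothetical 3-user example; the second user recommends nobody.
H = to_hyperlink_matrix([[0, 2, 0],
                         [0, 0, 0],
                         [1, 0, 1]])
print(H)
```

With this normalization every row sums to 1, which keeps the total PageRank mass constant during iteration.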
According to an embodiment of the present invention, after the PageRank vector is computed with the preset PageRank model trainer and the number of iterations is recorded, the method further comprises: determining whether the iteration count exceeds a predetermined iteration-count threshold, and determining whether the difference between the PageRank vector and the previous PageRank vector exceeds a predetermined iteration precision; when both determinations are affirmative, continuing to iterate with the preset PageRank model trainer; otherwise, outputting the iterated PageRank vector in descending order.
In this technical solution, the equal distribution of authority values and the out-link-only weighting in the classical PageRank algorithm are addressed. Iterating with the preset PageRank model trainer makes the iteration converge faster in practice and better reflects the different levels of authority of different users, which yields higher search and recommendation quality in real enterprise recommendation and search and improves the working efficiency and user experience of the system.
According to an embodiment of the present invention, the initial parameters comprise an initial iteration vector, a damping factor, the predetermined iteration precision, and the predetermined iteration-count threshold.
An embodiment of the second aspect of the present invention provides an information recommendation system, comprising: an information preprocessing unit, which generates an adjacency matrix according to the behavior logs in a behavior log database of a server; a matrix conversion unit, which converts the adjacency matrix into a hyperlink matrix; a parameter selection unit, which selects, according to the hyperlink matrix, initial parameters for a preset PageRank model trainer of the server; a training unit, which computes the PageRank vector with the preset PageRank model trainer according to the initial parameters and records the number of iterations; and a recommendation unit, which outputs the iterated PageRank vector in descending order; wherein
the calculation formula of the preset PageRank model trainer is:
where $PR(A)$ is the PageRank value of the recommended user $A$, $n$ is the total number of users who recommend user $A$, $N$ is the total number of people involved in recommendation behavior, $T_i$ is any user who recommends user $A$, $C(T_i)$ is the total number of times user $T_i$ recommends other users, $PR(T_i)$ is the PageRank value of user $T_i$, and $i = 1, 2, \ldots, n$.
In the prior art, the classical PageRank model described in a paper published by Google has the following form:
$$PR(A) = \frac{1-\alpha}{N} + \alpha \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$
where $PR(A)$ is the PageRank value of web page $A$ (the counterpart of the recommended user $A$), $N$ is the total number of web pages, $T_i$ is the $i$-th source page (in-link page) pointing to web page $A$, $C(T_i)$ is the out-degree of web page $T_i$ (its total number of out-links), and $i = 1, 2, \ldots, n$. The model means that a user staying on a page either jumps to a random page, reaching each page with probability $(1-\alpha)/N$, or follows a link on the current page with probability $\alpha$.
In the technical solution of the present invention, two problems of the above prior art are addressed: the damping factor $\alpha$ is distributed equally, and, because only the out-degree $C(T_i)$ is considered, every recommended user is given equal weight.
The improvement to the authority value $\alpha$ is as follows:
Regarding the equal distribution of authority values: the damping factor (damping coefficient) should differ between web pages. For example, a user who is active, popular, and more authoritative is more likely to be recommended than a user who is recommended less and has a lower reputation. The classical PageRank model can therefore be improved to:
where $PR(A)$ is the PageRank value of the recommended user $A$, $n$ is the total number of users who recommend user $A$, $N$ is the total number of people involved in recommendation behavior, $T_i$ is a user who recommends user $A$, and $C(T_i)$ is the total number of times user $T_i$ recommends other users.
In this way, the damping factor changes from a fixed value into a value that varies with the level of each user. However, this improved model also introduces a problem: it no longer matches the random-surfer model on which Google based the algorithm, and the user can no longer control the size of the damping factor manually. The model can therefore be further improved to:
This further improvement restores the damping factor, and with it the random-surfer element of PageRank, while still allowing different users to be assigned different authority values.
Therefore, in this technical solution, the data is preprocessed into an adjacency matrix, the adjacency matrix is converted into a hyperlink matrix, initial parameters are selected, and a PageRank model trainer is set up. These steps address the equal distribution of authority values and the out-link-only weighting in the classical PageRank algorithm. Iterating with the preset PageRank model trainer makes the iteration converge faster in practice and better reflects the different levels of authority of different users, which yields higher search and recommendation quality in real enterprise recommendation and search and improves the working efficiency and user experience of the system.
According to an embodiment of the present invention, the information preprocessing unit comprises: a weighted directed graph building unit, which extracts all recommendation information from the behavior logs in the behavior log database, takes each user of the server as a node, creates an edge whose starting point is the user who makes a recommendation and whose end point is the recommended user, and takes the number of recommendations as the weight of the edge, thereby building a weighted directed graph; and a storage unit, which stores the weighted directed graph in the adjacency matrix.
In this technical solution, the problem that every recommended user is given equal weight because only the out-degree $C(T_i)$ is considered is addressed.
The classical model proposed by Google considers only the out-degree of out-links, i.e. $C(T_i)$, so every user recommended by a given user is assigned the same weight. Taking the influence of out-degree into account and improving the classical model gives the following in-link/out-link model:
$$W(j,i) = W_{in}(j,i) \cdot W_{out}(j,i)$$
where $W_{in}(j,i)$ and $W_{out}(j,i)$ are defined as follows:
where $N_i$ is the set of all users recommended by user $i$ (the out-link user set) and $B_i$ is the set of all users pointing to user $i$ (the in-link user set). The model can be interpreted as follows: for a relatively popular user $i$ and a user $j$ belonging to the in-link user set $B_i$ of user $i$, the weight $W(j,i)$ of the link $(j,i)$ should depend on all users recommended by user $j$ and on all users that have a link relationship with user $i$, namely
$$W(j,i) = W_{in}(j,i) \cdot W_{out}(j,i)$$
where $W_{in}(j,i)$ is the weight associated with the link $(j,i)$ that reflects how the related users recommend others. It is determined by the recommendation-link value $I_i$ of user $i$ and the recommendation-link values $I_k$ of the users $k$ (where $k$ belongs to the set of all users recommended by user $j$), namely
Thus, the in-link/out-link model proposed in this application solves the problem that every recommendation in the directed graph is given the same weight. The improved algorithm computes a weight from the recommendation behavior of each user involved in a recommendation, so that each recommendation receives a different weight. In this way, frequently recommended, high-quality users are combined with the PageRank algorithm, which improves its effectiveness.
According to an embodiment of the present invention, the matrix conversion unit is specifically configured to convert the weighted directed graph into the hyperlink matrix, where the conversion formula is:
where $H(i,j)$ is the hyperlink matrix, $i$ is any user, $colSum(i)$ is the total number of times user $i$ recommends other users in the adjacency matrix, and $n$ is the total number of people involved in recommendation behavior.
In this technical solution, the adjacency matrix is converted into a hyperlink matrix so that the PageRank model can then be trained on the initial parameters and the converted hyperlink matrix.
According to an embodiment of the present invention, the system further comprises: a judging unit, which, after the training unit completes training, determines whether the iteration count exceeds a predetermined iteration-count threshold and whether the difference between the PageRank vector and the PageRank vector of the previous iteration exceeds a predetermined iteration precision; when both determinations are affirmative, iteration continues with the preset PageRank model trainer; otherwise, the iterated PageRank vector is output in descending order.
In this technical solution, the equal distribution of authority values and the out-link-only weighting in the classical PageRank algorithm are addressed. Iterating with the preset PageRank model trainer makes the iteration converge faster in practice and better reflects the different levels of authority of different users, which yields higher search and recommendation quality in real enterprise recommendation and search and improves the working efficiency and user experience of the system.
According to an embodiment of the present invention, the initial parameters comprise an initial iteration vector, a damping factor, the predetermined iteration precision, and the predetermined iteration-count threshold.
With the technical solution of the present invention, the equal distribution of authority values and the out-link-only weighting in the classical PageRank algorithm are addressed. Iterating with the preset PageRank model trainer makes the iteration converge faster in practice and better reflects the different levels of authority of different users, which yields higher search and recommendation quality in real enterprise recommendation and search and improves the working efficiency and user experience of the system.
Brief description of the drawings
Fig. 1 is a flowchart of an information recommendation method according to an embodiment of the present invention;
Fig. 2 is a flowchart of an information recommendation method according to another embodiment of the present invention;
Fig. 3 is a block diagram of an information recommendation system according to an embodiment of the present invention.
Detailed description
To make the above objects, features, and advantages of the present invention easier to understand, the present invention is described in further detail below with reference to the drawings and specific embodiments. It should be noted that, provided there is no conflict, the embodiments of this application and the features in the embodiments can be combined with one another.
Many specific details are set forth in the following description to provide a thorough understanding of the present invention. However, the present invention can also be implemented in ways other than those described here, and the protection scope of the present invention is therefore not limited by the specific embodiments disclosed below.
Fig. 1 is a flowchart of an information recommendation method according to an embodiment of the present invention.
As shown in Fig. 1, the information recommendation method according to an embodiment of the present invention comprises:
Step 102: generate an adjacency matrix according to the behavior logs in the behavior log database of the server.
Step 104: convert the adjacency matrix into a hyperlink matrix.
Step 106: according to the hyperlink matrix, select initial parameters for the preset PageRank model trainer of the server.
Step 108: according to the initial parameters, compute the PageRank vector with the preset PageRank model trainer, and record the number of iterations.
Step 110: output the iterated PageRank vector in descending order; wherein the calculation formula of the preset PageRank model trainer is:
where $PR(A)$ is the PageRank value of the recommended user $A$, $n$ is the total number of users who recommend user $A$, $N$ is the total number of people involved in recommendation behavior, $T_i$ is any user who recommends user $A$, $C(T_i)$ is the total number of times user $T_i$ recommends other users, $PR(T_i)$ is the PageRank value of user $T_i$, and $i = 1, 2, \ldots, n$.
In the prior art, the classical PageRank model described in a paper published by Google has the following form:
$$PR(A) = \frac{1-\alpha}{N} + \alpha \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$
where $PR(A)$ is the PageRank value of web page $A$ (the counterpart of the recommended user $A$), $N$ is the total number of web pages, $T_i$ is the $i$-th source page (in-link page) pointing to web page $A$, $C(T_i)$ is the out-degree of web page $T_i$ (its total number of out-links), and $i = 1, 2, \ldots, n$. The model means that a user staying on a page either jumps to a random page, reaching each page with probability $(1-\alpha)/N$, or follows a link on the current page with probability $\alpha$.
In the technical solution of the present invention, two problems of the above prior art are addressed: the damping factor $\alpha$ is distributed equally, and, because only the out-degree $C(T_i)$ is considered, every recommended user is given equal weight.
The improvement to the authority value $\alpha$ is as follows:
Regarding the equal distribution of authority values: the damping factor (damping coefficient) should differ between web pages. For example, a user who is active, popular, and more authoritative is more likely to be recommended than a user who is recommended less and has a lower reputation. The classical PageRank model can therefore be improved to:
where $PR(A)$ is the PageRank value of the recommended user $A$, $n$ is the total number of users who recommend user $A$, $N$ is the total number of people involved in recommendation behavior, $T_i$ is a user who recommends user $A$, and $C(T_i)$ is the total number of times user $T_i$ recommends other users.
In this way, the damping factor changes from a fixed value into a value that varies with the level of each user. However, this improved model also introduces a problem: it no longer matches the random-surfer model on which Google based the algorithm, and the user can no longer control the size of the damping factor manually. The model can therefore be further improved to:
This further improvement restores the damping factor, and with it the random-surfer element of PageRank, while still allowing different users to be assigned different authority values.
Therefore, in this technical solution, the data is preprocessed into an adjacency matrix, the adjacency matrix is converted into a hyperlink matrix, initial parameters are selected, and a PageRank model trainer is set up. These steps address the equal distribution of authority values and the out-link-only weighting in the classical PageRank algorithm. Iterating with the preset PageRank model trainer makes the iteration converge faster in practice and better reflects the different levels of authority of different users, which yields higher search and recommendation quality in real enterprise recommendation and search and improves the working efficiency and user experience of the system.
According to an embodiment of the present invention, step 102 specifically comprises: extracting all recommendation information from the behavior logs in the behavior log database; taking each user of the server as a node; creating an edge whose starting point is the user who makes a recommendation and whose end point is the recommended user; taking the number of recommendations as the weight of the edge, thereby building a weighted directed graph; and storing the weighted directed graph in the adjacency matrix.
In this technical solution, the problem that every recommended user is given equal weight because only the out-degree $C(T_i)$ is considered is addressed.
The classical model proposed by Google considers only the out-degree of out-links, i.e. $C(T_i)$, so every user recommended by a given user is assigned the same weight. Taking the influence of out-degree into account and improving the classical model gives the following in-link/out-link model:
$$W(j,i) = W_{in}(j,i) \cdot W_{out}(j,i)$$
where $W_{in}(j,i)$ and $W_{out}(j,i)$ are defined as follows:
where $N_i$ is the set of all users recommended by user $i$ (the out-link user set) and $B_i$ is the set of all users pointing to user $i$ (the in-link user set). The model can be interpreted as follows: for a relatively popular user $i$ and a user $j$ belonging to the in-link user set $B_i$ of user $i$, the weight $W(j,i)$ of the link $(j,i)$ should depend on all users recommended by user $j$ and on all users that have a link relationship with user $i$, namely
$$W(j,i) = W_{in}(j,i) \cdot W_{out}(j,i)$$
where $W_{in}(j,i)$ is the weight associated with the link $(j,i)$ that reflects how the related users recommend others. It is determined by the recommendation-link value $I_i$ of user $i$ and the recommendation-link values $I_k$ of the users $k$ (where $k$ belongs to the set of all users recommended by user $j$), namely
Thus, the in-link/out-link model proposed in this application solves the problem that every recommendation in the directed graph is given the same weight. The improved algorithm computes a weight from the recommendation behavior of each user involved in a recommendation, so that each recommendation receives a different weight. In this way, frequently recommended, high-quality users are combined with the PageRank algorithm, which improves its effectiveness.
According to an embodiment of the present invention, step 104 specifically comprises: converting the weighted directed graph into the hyperlink matrix, where the conversion formula is:
where $H(i,j)$ is the hyperlink matrix, $i$ is any user, $colSum(i)$ is the total number of times user $i$ recommends other users in the adjacency matrix, and $n$ is the total number of people involved in recommendation behavior.
In this technical solution, the adjacency matrix is converted into a hyperlink matrix so that the PageRank model can then be trained on the initial parameters and the converted hyperlink matrix.
According to an embodiment of the present invention, after step 108 the method further comprises: determining whether the iteration count exceeds a predetermined iteration-count threshold, and determining whether the difference between the PageRank vector and the previous PageRank vector exceeds a predetermined iteration precision; when both determinations are affirmative, continuing to iterate with the preset PageRank model trainer; otherwise, outputting the iterated PageRank vector in descending order.
In this technical solution, the equal distribution of authority values and the out-link-only weighting in the classical PageRank algorithm are addressed. Iterating with the preset PageRank model trainer makes the iteration converge faster in practice and better reflects the different levels of authority of different users, which yields higher search and recommendation quality in real enterprise recommendation and search and improves the working efficiency and user experience of the system.
According to an embodiment of the present invention, the initial parameters comprise an initial iteration vector, a damping factor, a predetermined iteration precision, and a predetermined iteration-count threshold.
Fig. 2 is a flowchart of an information recommendation method according to another embodiment of the present invention.
As shown in Fig. 2, the information recommendation method according to another embodiment of the present invention comprises the following steps:
Step 202: obtain the behavior logs from the behavior log database.
Step 204: preprocess the behavior log data, then assemble the processed data into a link matrix/link table.
Step 206: convert the processed link matrix/link table into a hyperlink matrix.
Step 208: select initial parameters for the preset PageRank model trainer of the server. The parameters comprise the damping coefficient alpha, the iteration precision eps, the iteration threshold thresHold, and the initial vector V0.
Step 210: input the hyperlink matrix and the initial parameters into the preset PageRank model trainer.
Step 212: compute the PageRank vector V with the preset PageRank model trainer, and record the iteration count count.
Step 214, judge PageRank vector value whether within the scope of iteration precision or iterations whether exceed iterations, namely whether meet count=count+1, or whether meet V0=V, when judged result is for being, enter step 216, otherwise, return step 210.
Step 216, exports the PageRank vector value V after iteration according to mode from high to low.
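Steps 208 to 216 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the update inside the loop is the classical PageRank step, used here as a stand-in because the improved formula of the preset trainer is not reproduced in this text; the function name `train_pagerank`, the default parameter values, and the L1 distance used for the precision check are likewise hypothetical.

```python
import numpy as np

def train_pagerank(h, alpha=0.85, eps=1e-8, thres_hold=100, v0=None):
    """Iterate until the precision eps or the iteration threshold is reached.

    h is assumed row-stochastic: h[i, j] is the share of user i's
    recommendations that go to user j.
    """
    h = np.asarray(h, dtype=float)
    n = h.shape[0]
    v_prev = np.full(n, 1.0 / n) if v0 is None else np.asarray(v0, dtype=float)
    count = 0
    while True:
        # Stand-in update (classical PageRank), not the patent's improved formula.
        v = (1.0 - alpha) / n + alpha * (v_prev @ h)
        count += 1                                   # step 212: record iterations
        converged = np.abs(v - v_prev).sum() < eps   # step 214: precision check
        if converged or count >= thres_hold:         # step 214: threshold check
            break
        v_prev = v                                   # V0 = V, then iterate again
    order = np.argsort(-v)                           # step 216: high to low
    return v, order, count

h = np.array([[0.0, 0.5, 0.5],
              [1/3, 1/3, 1/3],
              [1.0, 0.0, 0.0]])
v, order, count = train_pagerank(h)
```

Here `order` lists user indices from the highest PageRank value to the lowest, which corresponds to the descending output of step 216.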
The information recommendation method of one embodiment of the present invention is described below with reference to a specific application scenario.
The building ring behavior logs from July 10, 2014 to August 21, 2014 were extracted, and the two improvement schemes were compared with the classical PageRank model. As shown in Table 1, under the same conditions the improved models converge to the specified iteration precision faster, and can more accurately select well-known, more authoritative users for recommendation or search in the building ring. Because the users' recommendation behavior in the building ring involves privacy, only the parameter selection and the final iteration precision and iteration count in the experimental environment are listed here.
Table 1
Fig. 3 shows a block diagram of an information recommendation system according to an embodiment of the present invention.
As shown in Fig. 3, the information recommendation system 300 according to an embodiment of the present invention comprises: an information preprocessing unit 302, which generates an adjacency matrix according to the behavior logs in the behavior log database of the server; a matrix conversion unit 304, which converts the adjacency matrix into a hyperlink matrix; a parameter selection unit 306, which selects initial parameters for the preset PageRank model trainer of the server according to the hyperlink matrix; a training unit 308, which calculates the PageRank vector with the preset PageRank model trainer according to the initial parameters and records the iteration count; and a recommendation unit 310, which outputs the iterated PageRank vector in descending order; wherein,
The calculation formula of the preset PageRank model trainer is:
Here, PR(A) is the PageRank vector of the recommended user A, n is the total number of users who recommend user A, N is the total number of users involved in recommendation behavior, $T_i$ is any user who recommends user A, $C(T_i)$ is the total number of times user $T_i$ recommends other users, and $PR(T_i)$ is the PageRank vector of user $T_i$, i = 1, 2, ..., n.
In the prior art, a paper published by Google gives the classical PageRank model in the following form:
Here, PR(A) is the PageRank vector of A, N is the total number of web pages, web page $T_i$ is the i-th source page (in-link page) pointing to web page A, and $C(T_i)$ is the total out-degree (number of out-links) of web page $T_i$, i = 1, 2, ..., n. The model means that a user resting on a page may jump to a random page with probability (1-α)/N, or may follow a link to another page with probability α.
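For reference, with the symbols defined above, the classical model is commonly written in the normalized form below. This is a reconstruction from the standard PageRank literature and from the probabilities stated above, under the assumption that the random-jump term is (1-α)/N; it is not a copy of the patent's own rendering.

$$PR(A) = \frac{1-\alpha}{N} + \alpha \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$

In this form every source page $T_i$ passes an equal share $1/C(T_i)$ of its value along each of its out-links, and α is a single fixed constant for all pages. These are the two properties that the improvements discussed in this document target.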
In the technical solution of the present invention, two problems of the above prior art are improved: the equal distribution of the random damping factor α, and the fact that only the out-degree $C(T_i)$ is considered, so that the out-degree of every user is given equal weight.
The improvement of the authority value α is as follows:
Regarding the equal distribution of authority values, the random damping factor (damping coefficient) should differ between web pages. For example, users who are recommended often, are well known, and are more authoritative are more likely to be recommended than users who are recommended less and have a lower reputation. The classical PageRank model can therefore be improved to:
Here, PR(A) is the PageRank vector of the recommended user A, n is the total number of users who recommend user A, N is the total number of users involved in recommendation behavior, user $T_i$ is a user who recommends user A, and $C(T_i)$ is the total number of times user $T_i$ recommends other users.
In this way, the random damping factor changes from a fixed value into a value that varies with users of different levels. However, this improved model also brings a problem: it no longer matches the random surfing model on which Google's algorithm was based, and the user cannot manually control the size of the random damping factor. The model can therefore be further improved to:
This improvement adds back the random damping factor, and with it the random surfing component of PageRank, while also solving the problem of assigning different authority values to different users.
Therefore, in this technical solution, through the steps of preprocessing the data into an adjacency matrix, converting the adjacency matrix into a hyperlink matrix, selecting initial parameters, and building the PageRank model trainer, the classical PageRank algorithm is improved with respect to both the equal distribution of authority values and the consideration of out-links only. Iterating with the preset PageRank model trainer makes the data converge faster in practical applications and better accounts for users having different levels of authority, so that enterprise recommendation and search achieve higher quality, improving the working efficiency of the system and the user experience.
According to one embodiment of the present invention, the information preprocessing unit 302 comprises: a weighted directed graph building unit 3022, which extracts all recommendation information from the behavior logs in the behavior log database, takes each user of the server as a node, builds an edge from the recommending user as the start point to the recommended user as the end point, uses the number of recommendations as the weight of the edge, and thereby builds a weighted directed graph; and a storage unit 3024, which stores the weighted directed graph in the adjacency matrix.
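The construction of the weighted directed graph and its storage in an adjacency matrix can be sketched as follows. The log format (a list of `(recommender, recommended)` pairs) and the function name `build_adjacency` are hypothetical; the text does not specify how the behavior logs are stored.

```python
from collections import Counter

def build_adjacency(records):
    """Build a weighted adjacency matrix from recommendation records.

    records: iterable of (recommender, recommended) pairs (assumed log format).
    Returns the sorted user list and adj, where adj[i][j] is the number of
    times users[i] recommended users[j].
    """
    weights = Counter(records)  # edge -> number of recommendations (edge weight)
    users = sorted({u for edge in weights for u in edge})  # every user is a node
    index = {u: k for k, u in enumerate(users)}
    n = len(users)
    adj = [[0] * n for _ in range(n)]
    for (src, dst), w in weights.items():
        adj[index[src]][index[dst]] = w  # start point: recommender, end point: recommended
    return users, adj

records = [("a", "b"), ("a", "b"), ("b", "c")]
users, adj = build_adjacency(records)
```

Repeated recommendations between the same pair of users accumulate into a single edge whose weight is the recommendation count, matching the edge weights described above.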
In this technical solution, the problem that only the out-degree $C(T_i)$ is considered, and that the out-degree of every user is given equal weight, is improved.
The classical model proposed by Google only takes into account the influence of the out-degree of out-links, i.e., $C(T_i)$, and gives the same weight to the out-degree of every user who recommends user A. Taking this into account, the classical model is improved to obtain the following in-link/out-link model:
$$W(j,i) = W_{in}(j,i) \cdot W_{out}(j,i)$$
where $W_{in}(j,i)$ and $W_{out}(j,i)$ are defined as follows:
Here, $N_i$ is the set of all users recommended by user i (i.e., the out-link user set), and $B_i$ is the set of all users pointing to user i (the in-link user set). The model can be interpreted as follows: for a well-known user i and a user j belonging to the in-link user set $B_i$ of user i, the weight W(j, i) of link(j, i) should depend on all users recommended by user j and on all users that have a link relationship with user i, namely
$$W(j,i) = W_{in}(j,i) \cdot W_{out}(j,i)$$
Here, $W_{in}(j,i)$ is the in-link weight associated with link(j, i). It is determined by the link value $I_i$ of user i and the link values $I_k$ of each user k (k belongs to the set of all users recommended by user j), namely
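Because the defining formulas are not reproduced in this text, the Python sketch below assumes the normalized form used in the weighted PageRank literature: $W_{in}(j,i) = I_i / \sum_{k \in N_j} I_k$, with an analogous out-link weight built from out-degrees. The function name `link_weight`, the dictionary layout, and the use of plain in-degree and out-degree counts as the link values are all assumptions made for illustration.

```python
def link_weight(out_links, j, i):
    """Return W(j, i) = W_in(j, i) * W_out(j, i) under the assumed definitions.

    out_links: dict mapping each user to the set of users that user recommends.
    """
    # I_u: number of users who recommend u (in-degree).
    in_deg = {u: 0 for u in out_links}
    for src, targets in out_links.items():
        for t in targets:
            in_deg[t] = in_deg.get(t, 0) + 1
    # Out-degree: number of users that u recommends.
    out_deg = {u: len(out_links.get(u, ())) for u in in_deg}

    refs = out_links[j]  # N_j: all users recommended by user j
    in_total = sum(in_deg[k] for k in refs)
    out_total = sum(out_deg[k] for k in refs)
    w_in = in_deg[i] / in_total if in_total else 0.0
    w_out = out_deg[i] / out_total if out_total else 0.0
    return w_in * w_out

out_links = {"a": {"b", "c"}, "b": {"c"}, "c": {"a"}}
w_ac = link_weight(out_links, "a", "c")
w_ab = link_weight(out_links, "a", "b")
```

In this small graph user c is recommended by two users and user b by one, so the link from a to c receives a larger weight than the link from a to b, which is the intended effect of giving different recommendations different weights.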
Thus, the in-link/out-link model proposed in this application solves the problem that every recommendation in the directed graph is given the same weight. The improved algorithm computes a different weight for each recommendation from the recommendation behavior of the users involved. In this way, frequently recommended, high-quality users are combined with the PageRank algorithm, which improves the effectiveness of the PageRank algorithm.
According to one embodiment of the present invention, the matrix conversion unit 304 is specifically configured to convert the weighted directed graph into the hyperlink matrix, wherein the conversion formula is:
Here, H(i, j) is the hyperlink matrix, i is any user, colSum(i) is the total number of times that user recommends other users according to the adjacency matrix, and n is the total number of users involved in recommendation behavior.
In this technical solution, the adjacency matrix is converted into a hyperlink matrix, which makes it convenient to train the PageRank model from the initial parameters and the converted hyperlink matrix.
According to one embodiment of the present invention, the system further comprises: a judging unit 312, which, after the training unit 308 completes training, judges whether the iteration count exceeds a predetermined iteration threshold, and judges whether the difference between the PageRank vector and the PageRank vector of the previous iteration exceeds a predetermined iteration precision; when both judgments are yes, the iterative operation continues with the preset PageRank model trainer; otherwise, the iterated PageRank vector is output in descending order.
In this technical solution, the classical PageRank algorithm is improved with respect to both the equal distribution of authority values and the consideration of out-links only. Iterating with the preset PageRank model trainer makes the data converge faster in practical applications and better accounts for users having different levels of authority, so that enterprise recommendation and search achieve higher quality, improving the working efficiency of the system and the user experience.
According to one embodiment of the present invention, the initial parameters comprise an iteration vector, a random damping factor, the predetermined iteration precision, and the predetermined iteration threshold.
The technical solution of the present invention has been described above with reference to the accompanying drawings. With this technical solution, the classical PageRank algorithm is improved with respect to both the equal distribution of authority values and the consideration of out-links only. Iterating with the preset PageRank model trainer makes the data converge faster in practical applications and better accounts for users having different levels of authority, so that enterprise recommendation and search achieve higher quality, improving the working efficiency of the system and the user experience.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. An information recommendation method, characterized by comprising:
Generating an adjacency matrix according to the behavior logs in the behavior log database of the server;
Converting the adjacency matrix into a hyperlink matrix;
Selecting, according to the hyperlink matrix, initial parameters for the preset PageRank model trainer of the server;
Calculating a PageRank vector with the preset PageRank model trainer according to the initial parameters, and recording the iteration count;
Outputting the iterated PageRank vector in descending order; wherein,
The calculation formula of the preset PageRank model trainer is:
Wherein PR(A) is the PageRank vector of the recommended user A, n is the total number of users who recommend the user A, N is the total number of users involved in recommendation behavior, $T_i$ is any user who recommends the user A, $C(T_i)$ is the total number of times the user $T_i$ recommends other users, and $PR(T_i)$ is the PageRank vector of the user $T_i$, i = 1, 2, ..., n.
2. The information recommendation method according to claim 1, characterized in that generating the adjacency matrix according to the behavior logs in the behavior log database of the server specifically comprises:
Extracting all recommendation information from the behavior logs in the behavior log database, taking each user of the server as a node, building an edge from the recommending user as the start point to the recommended user as the end point, and using the number of recommendations as the weight of the edge, thereby building a weighted directed graph;
Storing the weighted directed graph in the adjacency matrix.
3. The information recommendation method according to claim 2, characterized in that converting the adjacency matrix into a hyperlink matrix specifically comprises:
Converting the weighted directed graph into the hyperlink matrix, wherein the conversion formula is:
Wherein H(i, j) is the hyperlink matrix, i is the any user, colSum(i) is the total number of times the any user recommends the other users according to the adjacency matrix, and n is the total number of users involved in the recommendation behavior.
4. The information recommendation method according to any one of claims 1 to 3, characterized in that, after calculating the PageRank vector with the preset PageRank model trainer and recording the iteration count, the method comprises:
Judging whether the iteration count exceeds a predetermined iteration threshold, and judging whether the difference between the PageRank vector and the previous PageRank vector exceeds a predetermined iteration precision;
When both judgments are yes, continuing the iterative operation with the preset PageRank model trainer; otherwise, outputting the iterated PageRank vector in descending order.
5. The information recommendation method according to claim 4, characterized in that the initial parameters comprise an iteration vector, a random damping factor, the predetermined iteration precision, and the predetermined iteration threshold.
6. An information recommendation system, characterized by comprising:
An information preprocessing unit, which generates an adjacency matrix according to the behavior logs in the behavior log database of the server;
A matrix conversion unit, which converts the adjacency matrix into a hyperlink matrix;
A parameter selection unit, which selects initial parameters for the preset PageRank model trainer of the server according to the hyperlink matrix;
A training unit, which calculates a PageRank vector with the preset PageRank model trainer according to the initial parameters and records the iteration count;
A recommendation unit, which outputs the iterated PageRank vector in descending order; wherein,
The calculation formula of the preset PageRank model trainer is:
Wherein PR(A) is the PageRank vector of the recommended user A, n is the total number of users who recommend the user A, N is the total number of users involved in recommendation behavior, $T_i$ is any user who recommends the user A, $C(T_i)$ is the total number of times the user $T_i$ recommends other users, and $PR(T_i)$ is the PageRank vector of the user $T_i$, i = 1, 2, ..., n.
7. The information recommendation system according to claim 6, characterized in that the information preprocessing unit comprises:
A weighted directed graph building unit, which extracts all recommendation information from the behavior logs in the behavior log database, takes each user of the server as a node, builds an edge from the recommending user as the start point to the recommended user as the end point, and uses the number of recommendations as the weight of the edge, thereby building a weighted directed graph;
A storage unit, which stores the weighted directed graph in the adjacency matrix.
8. The information recommendation system according to claim 7, characterized in that the matrix conversion unit is specifically configured to:
Convert the weighted directed graph into the hyperlink matrix, wherein the conversion formula is:
Wherein H(i, j) is the hyperlink matrix, i is the any user, colSum(i) is the total number of times the any user recommends the other users according to the adjacency matrix, and n is the total number of users involved in the recommendation behavior.
9. The information recommendation system according to any one of claims 6 to 8, characterized by further comprising:
A judging unit, which, after the training unit completes training, judges whether the iteration count exceeds a predetermined iteration threshold, and judges whether the difference between the PageRank vector and the PageRank vector of the previous iteration exceeds a predetermined iteration precision; when both judgments are yes, the iterative operation continues with the preset PageRank model trainer; otherwise, the iterated PageRank vector is output in descending order.
10. The information recommendation system according to claim 9, characterized in that the initial parameters comprise an iteration vector, a random damping factor, the predetermined iteration precision, and the predetermined iteration threshold.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410746660.4A CN104391982B (en) | 2014-12-08 | 2014-12-08 | Information recommendation method and information recommendation system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410746660.4A CN104391982B (en) | 2014-12-08 | 2014-12-08 | Information recommendation method and information recommendation system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104391982A true CN104391982A (en) | 2015-03-04 |
CN104391982B CN104391982B (en) | 2018-07-20 |
Family
ID=52609886
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410746660.4A Active CN104391982B (en) | 2014-12-08 | 2014-12-08 | Information recommendation method and information recommendation system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104391982B (en) |
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040193698A1 (en) * | 2003-03-24 | 2004-09-30 | Sadasivuni Lakshminarayana | Method for finding convergence of ranking of web page |
CN102270246A (en) * | 2011-09-08 | 2011-12-07 | 胡辉 | Method for calculating importance of web page |
CN102799671A (en) * | 2012-07-17 | 2012-11-28 | 西安电子科技大学 | Network individual recommendation method based on PageRank algorithm |
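Non-Patent Citations (2)
Title |
---|
Shao Jingjing et al.: "An improved PageRank algorithm: adjusting the damping factor", Applied Mathematics (Supplement) * |
Guo Ye et al.: "A personalized recommendation system based on massive data mining", Journal of Northwest University * |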
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106991592A (en) * | 2017-03-22 | 2017-07-28 | 南京财经大学 | A kind of personalized recommendation method based on purchase user behavior analysis |
CN108536590A (en) * | 2018-02-09 | 2018-09-14 | 武汉楚鼎信息技术有限公司 | A kind of method and system device of system service significance level grading |
TWI739359B (en) * | 2019-03-28 | 2021-09-11 | 南韓商韓領有限公司 | Computer-implemented system, computer-implemented method for arranging hyperlinks on a graphical user-interface and non-transitory computer-readable storage medium |
US11328328B2 (en) | 2019-03-28 | 2022-05-10 | Coupang Corp. | Computer-implemented method for arranging hyperlinks on a grapical user-interface |
Also Published As
Publication number | Publication date |
---|---|
CN104391982B (en) | 2018-07-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||