CN104391982B - Information recommendation method and information recommendation system - Google Patents
- Publication number
- CN104391982B (application number CN201410746660.4A)
- Authority
- CN
- China
- Prior art keywords
- user
- pagerank
- matrix
- vectors
- recommendation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The present invention provides an information recommendation method and an information recommendation system. The information recommendation method includes: generating an adjacency matrix according to the behavior log in the behavior log database of a server; converting the adjacency matrix into a hyperlink matrix; choosing initial parameters for a preset PageRank model trainer of the server according to the hyperlink matrix; calculating a PageRank vector with the preset PageRank model trainer according to the initial parameters, and recording the number of iterations; and outputting the iterated PageRank vector in descending order, the preset PageRank model trainer applying a modified PageRank calculation formula. The technical solution of the invention addresses two problems of the original classical PageRank algorithm, namely that authority values are distributed evenly and that only out-links are considered. As a result, the iteration converges faster in practical applications and takes better account of the different levels of authority of different users, which yields higher search and recommendation quality in actual enterprise recommendation and search.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to an information recommendation method and an information recommendation system.
Background
At present, the behavior logs of users' work circles contain a great deal of behavioral information, including interactions between users and interactions between users and circles. Most of this information is still in a raw, unmined state, and it is desirable to mine relevant data from it in order to improve search and recommendation quality. In the prior art, search and recommendation mainly rely on a combined ranking of user behavior, query-string word segmentation and index matching. Recommendation and search in the prior art, however, have the following two shortcomings:
First, for users with no behavioral information, recommendation still relies mainly on index matching. This approach does not take group behavior information into account and cannot recommend users who are "active, well known and relatively authoritative".
Second, although enterprise data is relatively authentic and has little redundancy, when the volume of data to be searched and recommended is large, users can cheat in certain fields by adding redundant information such as keywords, which then enters the index and deceives the search system.
A new technical solution that can improve the quality of user recommendation is therefore needed.
Summary of the invention
In view of the above problems, the present invention proposes a new technical solution that can improve the quality of user recommendation.
To this end, an embodiment of the first aspect of the present invention proposes an information recommendation method, including: generating an adjacency matrix according to the behavior log in the behavior log database of a server; converting the adjacency matrix into a hyperlink matrix; choosing initial parameters for a preset PageRank model trainer of the server according to the hyperlink matrix; calculating a PageRank vector with the preset PageRank model trainer according to the initial parameters, and recording the number of iterations; and outputting the iterated PageRank vector in descending order. The calculation formula of the preset PageRank model trainer is:
where PR(A) is the PageRank vector of the recommended user A; n is the total number of users who recommend user A; N is the total number of people involved in recommendation behavior; T_i is any user who recommends user A; C(T_i) is the total number of times user T_i recommends other users; PR(T_i) is the PageRank vector of user T_i; and i = 1, 2, ..., n.
In the prior art, Google described its classical PageRank model in a published paper in the following form:
$$PR(A) = \frac{1-\alpha}{N} + \alpha \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$
where PR(A) is the PageRank vector of web page A (corresponding to the recommended user A); N is the total number of web pages; web page T_i is the i-th source page pointing to web page A (an in-link page); C(T_i) is the out-degree of web page T_i, that is, its total number of out-links; and i = 1, 2, ..., n. The model means that a user staying on some page either jumps to a randomly chosen page, each page being reached with probability (1 - α)/N, or follows a link from the current page with probability α.
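The classical model above can be illustrated with a minimal Python sketch. It is not the trainer of the invention; the function name `classical_pagerank`, the dictionary layout `out_links` and the default values are assumptions made for illustration, and the sketch further assumes that every page has at least one out-link.

```python
def classical_pagerank(out_links, alpha=0.85, eps=1e-10, max_iter=1000):
    """Iterate PR(A) = (1 - alpha)/N + alpha * sum(PR(T_i) / C(T_i)).

    out_links maps each page to the list of pages it links to
    (hypothetical input format; every page must have an out-link).
    """
    nodes = sorted(out_links)
    n_total = len(nodes)
    pr = {v: 1.0 / n_total for v in nodes}  # uniform start vector
    for _ in range(max_iter):
        new_pr = {}
        for a in nodes:
            # Sum over every source page T_i that links to page a.
            incoming = sum(
                pr[t] / len(out_links[t]) for t in nodes if a in out_links[t]
            )
            new_pr[a] = (1.0 - alpha) / n_total + alpha * incoming
        delta = max(abs(new_pr[v] - pr[v]) for v in nodes)
        pr = new_pr
        if delta < eps:  # stop once the vector no longer changes noticeably
            break
    return pr


graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = classical_pagerank(graph)
```

In this toy graph page C collects links from both A and B, so it ends up ranked above A, which in turn ranks above B.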
The technical solution of the present invention improves on two problems in the above prior art: the damping factor α is distributed evenly, and, because only the out-degree C(T_i) is considered, every AT user is given the same weight.
The authority value α is improved as follows:
Regarding the even distribution of the authority value, the damping factor (damping coefficient) should differ from page to page. For example, a user who is active, well known and relatively authoritative is more likely to be recommended than a user who makes few recommendations and has a lower reputation. The classical PageRank model can therefore be improved to:
where PR(A) is the PageRank vector of the recommended user A; n is the total number of users who recommend user A; N is the total number of people involved in recommendation behavior; user T_i is a user who recommends user A; and C(T_i) is the total number of times user T_i recommends other users.
In this way the damping factor changes from a fixed value into a value that varies with the level of each user. This improved model, however, introduces a problem of its own: it no longer matches the random surfer model that Google assumed when proposing the algorithm, so the size of the damping factor can no longer be controlled manually. The model can therefore be improved further to:
This improvement restores the damping factor and the random surfer factor of the original PageRank proposal, while also allowing different users to be assigned different authority values.
Therefore, in this technical solution, the data is preprocessed into an adjacency matrix, the adjacency matrix is converted into a hyperlink matrix, initial parameters are chosen, and a PageRank model trainer is established. This addresses two problems of the original classical PageRank algorithm, namely the even distribution of authority values and the consideration of out-links only. Iterating with the preset PageRank model trainer makes the iteration converge faster in practical applications and takes better account of the different levels of authority of different users, which yields higher search and recommendation quality in actual enterprise recommendation and search and improves the working efficiency of the system and the user experience.
According to one embodiment of the present invention, generating the adjacency matrix according to the behavior log in the behavior log database of the server specifically includes: extracting all recommendation information from the behavior log in the behavior log database; taking each user of the server as a node; establishing an edge that starts at the user who makes a recommendation and ends at the recommended user, with the number of recommendations as the weight of the edge, so as to build a weighted directed graph; and storing the weighted directed graph in the adjacency matrix.
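The graph construction described above can be sketched in a few lines of Python. The record format, a list of (recommender, recommended) pairs, and the name `build_adjacency` are assumptions for illustration; the source does not specify how the behavior log is laid out.

```python
from collections import Counter


def build_adjacency(records):
    """Build a weighted directed graph from recommendation records.

    records is an iterable of (recommender, recommended) pairs
    (hypothetical log format). Entry [s][d] of the returned matrix is
    the number of times user s recommended user d.
    """
    counts = Counter(records)  # edge -> number of recommendations
    users = sorted({u for pair in counts for u in pair})
    index = {u: k for k, u in enumerate(users)}
    matrix = [[0] * len(users) for _ in users]
    for (src, dst), times in counts.items():
        matrix[index[src]][index[dst]] = times
    return users, matrix


log = [("ann", "bob"), ("ann", "bob"), ("bob", "cat"), ("cat", "ann")]
users, adjacency = build_adjacency(log)
```

Here the recommending user indexes the row and the recommended user indexes the column; a transposed layout would work equally well as long as the later steps use the same convention.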
This technical solution improves on the problem that, because only the out-degree C(T_i) is considered, every AT user is given the same weight.
The classical model proposed by Google considers only the influence of the out-degree of out-links, C(T_i), so the out-degree of every user recommended by user A is given the same weight. Taking the influence of the out-degree into account, the classical model can be improved into the following in-link/out-link model:
W(j, i) = W_in(j, i) * W_out(j, i)
where W_in(j, i) and W_out(j, i) are defined as follows:
where N_i is the set of all users that user i recommends (the out-link user set) and B_i is the set of all users pointing to user i (the in-link user set). The model can be read as follows: for a relatively well-known user i, if user j belongs to the in-link user set B_i of user i, then the weight W(j, i) of the link link(j, i) should depend on all users that user j recommends and on all users that have a link relationship with user i, that is,
W(j, i) = W_in(j, i) · W_out(j, i)
where W_in(j, i) is the weight of link(j, i) associated with the recommendation links of the users involved. It is determined by the value I_i of the recommendation links of user i and the value I_k of the recommendation links of user k (k belongs to the set of all users who recommend user j), that is,
The in-link/out-link model proposed in this application therefore solves the problem that every recommendation in the directed graph is given the same weight. The improved algorithm can compute a different weight for each recommendation from the recommendation behavior of the users associated with it. In this way, frequently recommended, high-quality users are combined with the PageRank algorithm, which improves the effectiveness of the PageRank algorithm.
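Because the definitions of W_in and W_out are not reproduced in the text, the following Python sketch does not implement the patented weights. It swaps in the normalisation used by the Weighted PageRank algorithm known from the literature, in which the in-link count and out-link count of user i are each divided by the corresponding totals over all users that j recommends. The name `link_weight` and the input format are likewise assumptions.

```python
def link_weight(out_links, j, i):
    """Weight of link j -> i as W_in(j, i) * W_out(j, i).

    Assumed form (borrowed from Weighted PageRank, not taken from the
    source): W_in = I_i / sum of I_k and W_out = O_i / sum of O_k, where
    k ranges over the users that j recommends, I is the number of
    in-links and O is the number of out-links.
    """
    in_count = {u: 0 for u in out_links}
    for targets in out_links.values():
        for dst in targets:
            in_count[dst] += 1
    out_count = {u: len(t) for u, t in out_links.items()}
    peers = out_links[j]  # everyone user j recommends
    in_total = sum(in_count[k] for k in peers)
    out_total = sum(out_count[k] for k in peers)
    w_in = in_count[i] / in_total if in_total else 0.0
    w_out = out_count[i] / out_total if out_total else 0.0
    return w_in * w_out


g = {"j": ["i", "k"], "i": ["k", "j"], "k": ["i"]}
w_ji = link_weight(g, "j", "i")
w_jk = link_weight(g, "j", "k")
```

Under this assumed form, the link from j to i receives a larger weight than the link from j to k, because i itself recommends more users than k does while both receive the same number of recommendations.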
According to one embodiment of the present invention, converting the adjacency matrix into the hyperlink matrix specifically includes: converting the weighted directed graph into the hyperlink matrix, where the conversion formula is:
where H(i, j) is the hyperlink matrix, i is any user, colSum(i) is the total number of times that user i recommends other users in the adjacency matrix, and n is the total number of people involved in the recommendation behavior.
In this technical solution, the adjacency matrix is converted into a hyperlink matrix so that PageRank model training can then be carried out with the initial parameters and the converted hyperlink matrix.
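The conversion formula itself is missing from the text, so the sketch below is only one plausible reading of the symbols that are defined: each entry of the adjacency matrix is divided by the total number of recommendations made by that user (the role played by colSum(i)), and a user who recommends nobody is given the uniform value 1/n. Both the function name `to_hyperlink_matrix` and the treatment of users without recommendations are assumptions.

```python
def to_hyperlink_matrix(adjacency):
    """Normalise a weighted adjacency matrix into a hyperlink matrix.

    adjacency[i][j] is the number of times user i recommended user j.
    Assumed rule: H[i][j] = adjacency[i][j] / total(i), or 1/n for every
    j when user i made no recommendations at all.
    """
    n = len(adjacency)
    hyperlink = []
    for row in adjacency:
        total = sum(row)  # plays the role of colSum(i) in the text
        if total == 0:
            hyperlink.append([1.0 / n] * n)
        else:
            hyperlink.append([w / total for w in row])
    return hyperlink


H = to_hyperlink_matrix([[0, 2, 0], [0, 0, 1], [0, 0, 0]])
```

With this rule every row of the result sums to 1, which keeps the total PageRank mass constant during iteration.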
According to one embodiment of the present invention, after the PageRank vector is calculated by the preset PageRank model trainer and the number of iterations is recorded, the method includes: judging whether the number of iterations is greater than a predetermined iteration-count threshold, and judging whether the difference between the PageRank vector and the previous PageRank vector exceeds a predetermined iteration precision; when both judgments are yes, continuing the iterative operation with the preset PageRank model trainer; otherwise, outputting the iterated PageRank vector in descending order.
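The wording of the iteration-count condition above is ambiguous about its direction. The following Python sketch assumes the usual reading, namely that iteration continues only while the count is still below the cap and the change between successive vectors still exceeds the precision. The name `should_continue` and the use of the largest absolute component difference as the change measure are assumptions.

```python
def should_continue(count, current, previous, eps, threshold):
    """Decide whether another PageRank iteration is needed.

    Assumed reading: keep iterating while the iteration count is below
    the threshold AND the largest component-wise change exceeds eps.
    """
    change = max(abs(c - p) for c, p in zip(current, previous))
    return count < threshold and change > eps


still_moving = should_continue(3, [0.5, 0.5], [0.4, 0.6], 1e-6, 100)
hit_cap = should_continue(100, [0.5, 0.5], [0.4, 0.6], 1e-6, 100)
converged = should_continue(3, [0.5, 0.5], [0.5, 0.5], 1e-6, 100)
```

Once either condition fails, the current vector would be sorted in descending order and output.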
This technical solution addresses the even distribution of authority values and the consideration of out-links only in the original classical PageRank algorithm. Iterating with the preset PageRank model trainer makes the iteration converge faster in practical applications and takes better account of the different levels of authority of different users, which yields higher search and recommendation quality in actual enterprise recommendation and search and improves the working efficiency of the system and the user experience.
According to one embodiment of the present invention, the initial parameters include an iteration vector, a damping factor, the predetermined iteration precision and the predetermined iteration-count threshold.
An embodiment of the second aspect of the present invention proposes an information recommendation system, including: an information preprocessing unit, which generates an adjacency matrix according to the behavior log in the behavior log database of a server; a matrix conversion unit, which converts the adjacency matrix into a hyperlink matrix; a parameter selection unit, which chooses initial parameters for a preset PageRank model trainer of the server according to the hyperlink matrix; a training unit, which calculates a PageRank vector with the preset PageRank model trainer according to the initial parameters and records the number of iterations; and a recommendation unit, which outputs the iterated PageRank vector in descending order.
The calculation formula of the preset PageRank model trainer is:
where PR(A) is the PageRank vector of the recommended user A; n is the total number of users who recommend user A; N is the total number of people involved in recommendation behavior; T_i is any user who recommends user A; C(T_i) is the total number of times user T_i recommends other users; PR(T_i) is the PageRank vector of user T_i; and i = 1, 2, ..., n.
In the prior art, Google described its classical PageRank model in a published paper in the following form:
$$PR(A) = \frac{1-\alpha}{N} + \alpha \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$
where PR(A) is the PageRank vector of web page A (corresponding to the recommended user A); N is the total number of web pages; web page T_i is the i-th source page pointing to web page A (an in-link page); C(T_i) is the out-degree of web page T_i, that is, its total number of out-links; and i = 1, 2, ..., n. The model means that a user staying on some page either jumps to a randomly chosen page, each page being reached with probability (1 - α)/N, or follows a link from the current page with probability α.
The technical solution of the present invention improves on two problems in the above prior art: the damping factor α is distributed evenly, and, because only the out-degree C(T_i) is considered, every AT user is given the same weight.
The authority value α is improved as follows:
Regarding the even distribution of the authority value, the damping factor (damping coefficient) should differ from page to page. For example, a user who is active, well known and relatively authoritative is more likely to be recommended than a user who makes few recommendations and has a lower reputation. The classical PageRank model can therefore be improved to:
where PR(A) is the PageRank vector of the recommended user A; n is the total number of users who recommend user A; N is the total number of people involved in recommendation behavior; user T_i is a user who recommends user A; and C(T_i) is the total number of times user T_i recommends other users.
In this way the damping factor changes from a fixed value into a value that varies with the level of each user. This improved model, however, introduces a problem of its own: it no longer matches the random surfer model that Google assumed when proposing the algorithm, so the size of the damping factor can no longer be controlled manually. The model can therefore be improved further to:
This improvement restores the damping factor and the random surfer factor of the original PageRank proposal, while also allowing different users to be assigned different authority values.
Therefore, in this technical solution, the data is preprocessed into an adjacency matrix, the adjacency matrix is converted into a hyperlink matrix, initial parameters are chosen, and a PageRank model trainer is established. This addresses two problems of the original classical PageRank algorithm, namely the even distribution of authority values and the consideration of out-links only. Iterating with the preset PageRank model trainer makes the iteration converge faster in practical applications and takes better account of the different levels of authority of different users, which yields higher search and recommendation quality in actual enterprise recommendation and search and improves the working efficiency of the system and the user experience.
According to one embodiment of the present invention, the information preprocessing unit includes: a weighted directed graph establishing unit, which extracts all recommendation information from the behavior log in the behavior log database, takes each user of the server as a node, establishes an edge that starts at the user who makes a recommendation and ends at the recommended user, with the number of recommendations as the weight of the edge, and thereby builds a weighted directed graph; and a storage unit, which stores the weighted directed graph in the adjacency matrix.
This technical solution improves on the problem that, because only the out-degree C(T_i) is considered, every AT user is given the same weight.
The classical model proposed by Google considers only the influence of the out-degree of out-links, C(T_i), so the out-degree of every user recommended by user A is given the same weight. Taking the influence of the out-degree into account, the classical model can be improved into the following in-link/out-link model:
W(j, i) = W_in(j, i) * W_out(j, i)
where W_in(j, i) and W_out(j, i) are defined as follows:
where N_i is the set of all users that user i recommends (the out-link user set) and B_i is the set of all users pointing to user i (the in-link user set). The model can be read as follows: for a relatively well-known user i, if user j belongs to the in-link user set B_i of user i, then the weight W(j, i) of the link link(j, i) should depend on all users that user j recommends and on all users that have a link relationship with user i, that is,
W(j, i) = W_in(j, i) · W_out(j, i)
where W_in(j, i) is the weight of link(j, i) associated with the recommendation links of the users involved. It is determined by the value I_i of the recommendation links of user i and the value I_k of the recommendation links of user k (k belongs to the set of all users who recommend user j), that is,
The in-link/out-link model proposed in this application therefore solves the problem that every recommendation in the directed graph is given the same weight. The improved algorithm can compute a different weight for each recommendation from the recommendation behavior of the users associated with it. In this way, frequently recommended, high-quality users are combined with the PageRank algorithm, which improves the effectiveness of the PageRank algorithm.
According to one embodiment of the present invention, the matrix conversion unit is specifically configured to convert the weighted directed graph into the hyperlink matrix, where the conversion formula is:
where H(i, j) is the hyperlink matrix, i is any user, colSum(i) is the total number of times that user i recommends other users in the adjacency matrix, and n is the total number of people involved in the recommendation behavior.
In this technical solution, the adjacency matrix is converted into a hyperlink matrix so that PageRank model training can then be carried out with the initial parameters and the converted hyperlink matrix.
According to one embodiment of the present invention, the system further includes a judging unit which, after the training unit completes training, judges whether the number of iterations is greater than a predetermined iteration-count threshold and judges whether the difference between the PageRank vector and the PageRank vector of the previous iteration exceeds a predetermined iteration precision; when both judgments are yes, the iterative operation continues with the preset PageRank model trainer; otherwise, the iterated PageRank vector is output in descending order.
This technical solution addresses the even distribution of authority values and the consideration of out-links only in the original classical PageRank algorithm. Iterating with the preset PageRank model trainer makes the iteration converge faster in practical applications and takes better account of the different levels of authority of different users, which yields higher search and recommendation quality in actual enterprise recommendation and search and improves the working efficiency of the system and the user experience.
According to one embodiment of the present invention, the initial parameters include an iteration vector, a damping factor, the predetermined iteration precision and the predetermined iteration-count threshold.
With the technical solution of the present invention, the even distribution of authority values and the consideration of out-links only in the original classical PageRank algorithm are both improved. Iterating with the preset PageRank model trainer makes the iteration converge faster in practical applications and takes better account of the different levels of authority of different users, which yields higher search and recommendation quality in actual enterprise recommendation and search and improves the working efficiency of the system and the user experience.
Brief description of the drawings
Fig. 1 shows a flowchart of an information recommendation method according to an embodiment of the present invention;
Fig. 2 shows a flowchart of an information recommendation method according to another embodiment of the present invention;
Fig. 3 shows a block diagram of an information recommendation system according to an embodiment of the present invention.
Detailed description
To make the above objects, features and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of this application and the features in the embodiments may be combined with one another.
Many specific details are set out in the following description to aid a full understanding of the present invention. The invention may, however, also be implemented in ways other than those described here, so the scope of protection of the present invention is not limited by the specific embodiments described below.
Fig. 1 shows a flowchart of an information recommendation method according to an embodiment of the present invention.
As shown in Fig. 1, the information recommendation method according to an embodiment of the present invention includes:
Step 102: generate an adjacency matrix according to the behavior log in the behavior log database of the server.
Step 104: convert the adjacency matrix into a hyperlink matrix.
Step 106: choose initial parameters for the preset PageRank model trainer of the server according to the hyperlink matrix.
Step 108: according to the initial parameters, calculate the PageRank vector with the preset PageRank model trainer and record the number of iterations.
Step 110: output the iterated PageRank vector in descending order. The calculation formula of the preset PageRank model trainer is:
where PR(A) is the PageRank vector of the recommended user A; n is the total number of users who recommend user A; N is the total number of people involved in recommendation behavior; T_i is any user who recommends user A; C(T_i) is the total number of times user T_i recommends other users; PR(T_i) is the PageRank vector of user T_i; and i = 1, 2, ..., n.
In the prior art, Google described its classical PageRank model in a published paper in the following form:
$$PR(A) = \frac{1-\alpha}{N} + \alpha \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$
where PR(A) is the PageRank vector of web page A (corresponding to the recommended user A); N is the total number of web pages; web page T_i is the i-th source page pointing to web page A (an in-link page); C(T_i) is the out-degree of web page T_i, that is, its total number of out-links; and i = 1, 2, ..., n. The model means that a user staying on some page either jumps to a randomly chosen page, each page being reached with probability (1 - α)/N, or follows a link from the current page with probability α.
The technical solution of the present invention improves on two problems in the above prior art: the damping factor α is distributed evenly, and, because only the out-degree C(T_i) is considered, every AT user is given the same weight.
The authority value α is improved as follows:
Regarding the even distribution of the authority value, the damping factor (damping coefficient) should differ from page to page. For example, a user who is active, well known and relatively authoritative is more likely to be recommended than a user who makes few recommendations and has a lower reputation. The classical PageRank model can therefore be improved to:
where PR(A) is the PageRank vector of the recommended user A; n is the total number of users who recommend user A; N is the total number of people involved in recommendation behavior; user T_i is a user who recommends user A; and C(T_i) is the total number of times user T_i recommends other users.
In this way the damping factor changes from a fixed value into a value that varies with the level of each user. This improved model, however, introduces a problem of its own: it no longer matches the random surfer model that Google assumed when proposing the algorithm, so the size of the damping factor can no longer be controlled manually. The model can therefore be improved further to:
This improvement restores the damping factor and the random surfer factor of the original PageRank proposal, while also allowing different users to be assigned different authority values.
Therefore, in this technical solution, the data is preprocessed into an adjacency matrix, the adjacency matrix is converted into a hyperlink matrix, initial parameters are chosen, and a PageRank model trainer is established. This addresses two problems of the original classical PageRank algorithm, namely the even distribution of authority values and the consideration of out-links only. Iterating with the preset PageRank model trainer makes the iteration converge faster in practical applications and takes better account of the different levels of authority of different users, which yields higher search and recommendation quality in actual enterprise recommendation and search and improves the working efficiency of the system and the user experience.
According to one embodiment of the present invention, step 102 specifically includes: extracting all recommendation information from the behavior log in the behavior log database; taking each user of the server as a node; establishing an edge that starts at the user who makes a recommendation and ends at the recommended user, with the number of recommendations as the weight of the edge, so as to build a weighted directed graph; and storing the weighted directed graph in the adjacency matrix.
This technical solution improves on the problem that, because only the out-degree C(T_i) is considered, every AT user is given the same weight.
The classical model proposed by Google considers only the influence of the out-degree of out-links, C(T_i), so the out-degree of every user recommended by user A is given the same weight. Taking the influence of the out-degree into account, the classical model can be improved into the following in-link/out-link model:
W(j, i) = W_in(j, i) * W_out(j, i)
where W_in(j, i) and W_out(j, i) are defined as follows:
where N_i is the set of all users that user i recommends (the out-link user set) and B_i is the set of all users pointing to user i (the in-link user set). The model can be read as follows: for a relatively well-known user i, if user j belongs to the in-link user set B_i of user i, then the weight W(j, i) of the link link(j, i) should depend on all users that user j recommends and on all users that have a link relationship with user i, that is,
W(j, i) = W_in(j, i) · W_out(j, i)
where W_in(j, i) is the weight of link(j, i) associated with the recommendation links of the users involved. It is determined by the value I_i of the recommendation links of user i and the value I_k of the recommendation links of user k (k belongs to the set of all users who recommend user j), that is,
The in-link/out-link model proposed in this application therefore solves the problem that every recommendation in the directed graph is given the same weight. The improved algorithm can compute a different weight for each recommendation from the recommendation behavior of the users associated with it. In this way, frequently recommended, high-quality users are combined with the PageRank algorithm, which improves the effectiveness of the PageRank algorithm.
According to one embodiment of the present invention, step 104 specifically includes: converting the weighted directed graph into the hyperlink matrix, where the conversion formula is:
where H(i, j) is the hyperlink matrix, i is any user, colSum(i) is the total number of times that user i recommends other users in the adjacency matrix, and n is the total number of people involved in the recommendation behavior.
In this technical solution, the adjacency matrix is converted into a hyperlink matrix so that PageRank model training can then be carried out with the initial parameters and the converted hyperlink matrix.
According to one embodiment of the present invention, after step 108 the method includes: judging whether the number of iterations is greater than a predetermined iteration-count threshold, and judging whether the difference between the PageRank vector and the previous PageRank vector exceeds a predetermined iteration precision; when both judgments are yes, continuing the iterative operation with the preset PageRank model trainer; otherwise, outputting the iterated PageRank vector in descending order.
This technical solution addresses the even distribution of authority values and the consideration of out-links only in the original classical PageRank algorithm. Iterating with the preset PageRank model trainer makes the iteration converge faster in practical applications and takes better account of the different levels of authority of different users, which yields higher search and recommendation quality in actual enterprise recommendation and search and improves the working efficiency of the system and the user experience.
According to one embodiment of the present invention, the initial parameters include an iteration vector, a damping factor, a predetermined iteration precision and a predetermined iteration-count threshold.
Fig. 2 shows a flowchart of an information recommendation method according to another embodiment of the present invention.
As shown in Fig. 2, the information recommendation method according to another embodiment of the present invention includes the following steps:
Step 202, obtain behavior logs from the behavior log database.
Step 204, preprocess the behavior log data, then assemble the processed data into a link matrix/link table.
Step 206, process the resulting link matrix/link table into a hyperlink matrix.
Step 208, select initial parameters for the server's preset PageRank model trainer. The parameters include the damping coefficient alpha, the iteration precision eps, the iteration threshold thresHold, and the initial vector V0.
Step 210, input the hyperlink matrix and the initial parameters into the preset PageRank model trainer.
Step 212, compute the PageRank vector V with the preset PageRank model trainer, and record the iteration count count.
Step 214, determine whether the PageRank vector value is within the iteration precision or whether the iteration count exceeds the iteration threshold. If either condition holds, proceed to step 216; otherwise, set count = count + 1 and V0 = V, and return to step 210.
Step 216, output the iterated PageRank vector value V in descending order.
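The loop in steps 208 to 216 can be sketched as follows. This is an illustration under stated assumptions, not the patented trainer: the update rule is the classical PageRank power iteration, used as a stand-in because the trainer's own formula is not reproduced in this text, and the stop test uses the conventional reading (stop when the change is within eps or the iteration budget is exhausted). The parameter names alpha, eps, thresHold, V0, V, and count come from the steps above; train_pagerank, rank_descending, and the sample matrix are hypothetical.

```python
from typing import List, Tuple


def train_pagerank(
    H: List[List[float]],
    alpha: float,
    eps: float,
    thresHold: int,
    V0: List[float],
) -> Tuple[List[float], int]:
    """Iterate until the vector settles or the iteration budget is used up."""
    n = len(H)
    count = 0
    while True:
        # Step 212 (classical stand-in): V[j] = (1 - alpha)/n + alpha * sum_i V0[i] * H[i][j]
        V = [
            (1.0 - alpha) / n + alpha * sum(V0[i] * H[i][j] for i in range(n))
            for j in range(n)
        ]
        count += 1
        # Step 214: measure the change against the previous vector.
        delta = sum(abs(V[j] - V0[j]) for j in range(n))
        if delta <= eps or count >= thresHold:
            return V, count
        V0 = V  # otherwise feed V back in as the new starting vector


def rank_descending(V: List[float]) -> List[Tuple[int, float]]:
    """Step 216: (user index, score) pairs ordered from highest to lowest score."""
    return sorted(enumerate(V), key=lambda pair: pair[1], reverse=True)


# Hypothetical row-normalized hyperlink matrix for three users.
H = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
V, count = train_pagerank(H, alpha=0.85, eps=1e-8, thresHold=200, V0=[1 / 3] * 3)
ranking = rank_descending(V)
```

Returning count alongside V mirrors the requirement to record the number of iterations, which is also the quantity compared across models in the experiment.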
The information recommendation method of one embodiment of the present invention is described below with reference to a concrete application scenario.
Behavior logs of the work circle from July 10, 2014 to August 21, 2014 were extracted, and the two improvement schemes were compared with the classical PageRank model. As shown in Table 1, under the same conditions the improved models converge to the specified iteration precision faster and select popular, relatively authoritative users more accurately for work circle recommendation or search. Because the recommendation behavior of users in the work circle involves privacy, only the parameter choices in the experimental environment and the final iteration precision and iteration counts are listed here.
Table 1
Fig. 3 shows a block diagram of an information recommendation system according to an embodiment of the present invention.
As shown in Fig. 3, an information recommendation system 300 according to an embodiment of the present invention includes: an information preprocessing unit 302, which generates an adjacency matrix according to the behavior logs in the behavior log database of the server; a matrix conversion unit 304, which converts the adjacency matrix into a hyperlink matrix; a parameter selection unit 306, which selects initial parameters for the server's preset PageRank model trainer according to the hyperlink matrix; a training unit 308, which computes the PageRank vector with the preset PageRank model trainer according to the initial parameters and records the number of iterations; and a recommendation unit 310, which outputs the iterated PageRank vector in descending order; wherein
the calculation formula of the preset PageRank model trainer is:
where PR(A) is the PageRank vector of the recommended user A, n is the total number of users who recommend user A, N is the total number of people involved in recommendation behavior, T_i is any user who recommends user A, C(T_i) denotes the total number of times user T_i recommends other users, PR(T_i) is the PageRank vector of user T_i, and i = 1, 2, ..., n.
In the prior art, Google's published paper gives its classical PageRank model in the following form:
$$PR(A) = \frac{1-\alpha}{N} + \alpha \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$
where PR(A) is the PageRank vector of the recommended user A, N is the total number of web pages, T_i is the i-th source page (in-link page) pointing to page A, C(T_i) is the out-degree (total number of out-links) of page T_i, and i = 1, 2, ..., n. The model means that a user staying on some page may jump to a page at random with probability (1-α)/N, or may follow a link to browse a page with probability α.
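As a worked illustration of the classical model, the sketch below evaluates PR(A) once in its widely known normalized form, PR(A) = (1 - α)/N + α Σ PR(T_i)/C(T_i). The function name classical_pr, the dictionary layout, and the four-page sample data are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from typing import Dict, List


def classical_pr(
    target: str,
    pr: Dict[str, float],
    in_links: Dict[str, List[str]],
    out_degree: Dict[str, int],
    alpha: float,
) -> float:
    """Evaluate PR(target) = (1 - alpha)/N + alpha * sum_i PR(T_i)/C(T_i) once."""
    N = len(pr)  # total number of pages (or users)
    # Each source T_i passes on its score divided evenly over its C(T_i) out-links.
    link_part = sum(pr[t] / out_degree[t] for t in in_links.get(target, []))
    return (1.0 - alpha) / N + alpha * link_part


# Hypothetical data: B and C link to A; B has two out-links, C has one.
pr = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
in_links = {"A": ["B", "C"]}
out_degree = {"A": 1, "B": 2, "C": 1, "D": 1}
pr_a = classical_pr("A", pr, in_links, out_degree, alpha=0.85)
# (1 - 0.85)/4 + 0.85 * (0.25/2 + 0.25/1) = 0.0375 + 0.31875 = 0.35625
```

Note that every source divides its score equally among its out-links and that α is the same for every page, which are the two points the improvements in this document target.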
In the technical solution of the present invention, two problems of the above prior art are improved: the even distribution of the random damping factor α, and the problem that, because only the out-degree C(T_i) is considered, every AT (@-mentioned) user is given equal weight.
The improvement to the authority value α is as follows:
Regarding the even distribution of authority values, the random damping factor (damping coefficient) should differ for different web pages. For example, users who are popular and relatively authoritative are more likely to be recommended than users who have fewer recommendation behaviors and lower reputation. The classical PageRank model can therefore be improved to:
where PR(A) is the PageRank vector of the recommended user A, n is the total number of users who recommend user A, N is the total number of people involved in recommendation behavior, user T_i is a user who recommends user A, and C(T_i) denotes the total number of times user T_i recommends other users.
In this way, the random damping factor changes from a fixed value into a value that varies with users of different levels. However, this improved model also introduces a problem: it no longer conforms to the random surfer model on which Google based the algorithm, so the size of the random damping factor can no longer be controlled manually. The model can therefore be further improved to:
The above improvement adds the random damping factor back, restoring the random surfer factor of the original PageRank proposal, while also solving the problem of assigning different authority values to different users.
Therefore, in this technical solution, the data is preprocessed into an adjacency matrix, the adjacency matrix is converted into a hyperlink matrix, initial parameters are selected, and a PageRank model trainer is established. This addresses the problem of evenly distributed authority values in the original classical PageRank algorithm and the problem of considering only out-links. Iterating with the preset PageRank model trainer makes iteration faster in practical applications and better accounts for the fact that different users have different levels of authority, which yields higher search and recommendation quality in actual enterprise recommendation and search and improves the working efficiency and user experience of the system.
According to one embodiment of the present invention, the information preprocessing unit 302 includes: a weighted directed graph establishing unit 3022, which extracts all recommendation information from the behavior logs in the behavior log database and establishes a weighted directed graph by taking each user of the server as a node, establishing an edge with the user who initiates a recommendation as the start point and the recommended user as the end point, and taking the number of recommendations as the weight of the edge; and a storage unit 3024, which stores the weighted directed graph in the adjacency matrix.
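The graph construction described above can be sketched as follows. The log record layout is not specified in this text, so the sketch assumes each recommendation has already been reduced to a (recommender, recommended) pair; build_adjacency and the sample user names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def build_adjacency(
    events: Iterable[Tuple[str, str]],
) -> Tuple[List[str], List[List[int]]]:
    """Build a weighted directed graph from (recommender, recommended) pairs.

    Returns the ordered user list and a matrix where adj[a][b] counts how many
    times users[a] recommended users[b].
    """
    events = list(events)
    # Every user that appears on either end of a recommendation becomes a node.
    users = sorted({user for pair in events for user in pair})
    index: Dict[str, int] = {user: k for k, user in enumerate(users)}
    adj = [[0] * len(users) for _ in users]
    for recommender, recommended in events:
        # Edge from the recommending user to the recommended user; weight = count.
        adj[index[recommender]][index[recommended]] += 1
    return users, adj


# Hypothetical extracted recommendation records.
log = [("ann", "bob"), ("ann", "bob"), ("bob", "cat"), ("cat", "ann")]
users, adj = build_adjacency(log)
```

A dense list-of-lists matrix keeps the sketch simple; for a large user base a sparse structure would likely be preferable.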
In this technical solution, the problem that considering only the out-degree C(T_i) gives every AT (@-mentioned) user equal weight is addressed.
The classical model proposed by Google considers only the influence of the out-degree of out-links, that is, C(T_i), where the out-degree of every user recommended by user A is given the same weight. Taking this influence into account, the classical model can be improved to obtain the following in-link/out-link model:
$$w(j, i) = W^{in}(j, i) \cdot W^{out}(j, i)$$
where W^{in}(j, i) and W^{out}(j, i) are defined as follows:
where N_i is the set of all users that user i recommends (the out-link user set), and B_i is the set of all users pointing to user i (the in-link user set). The model is interpreted as follows: for a relatively popular user i, if user j belongs to the in-link user set B_i of user i, then the weight w(j, i) of the link link(j, i) should be related both to all users that user j recommends and to all users that have a link relationship with user i, that is,
$$w(j, i) = W^{in}(j, i) \cdot W^{out}(j, i)$$
where W^{in}(j, i) is the weight associated with link(j, i) that reflects the links by which related users recommend other users. It is determined by the value I_i of the links by which user i recommends other users and the value I_k of the links by which each user k recommends other users (k belongs to the set of all users recommended by user j), that is,
Thus, the in-link/out-link model proposed in this application solves the problem that every recommendation in the directed graph is given the same weight. The improved algorithm can compute, from the recommendation behaviors of the users associated with each recommendation, a different weight for each recommendation. In this way, frequently recommended, high-quality users are combined with the PageRank algorithm, which improves the effectiveness of the PageRank algorithm.
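Because the definitions of the two weight factors are not reproduced in this text, the sketch below borrows the definitions of the widely used Weighted PageRank formulation as an assumption: the in-weight is the in-link count of user i divided by the summed in-link counts of everyone user j recommends, and the out-weight is the same ratio computed with out-link counts. Whether the patent uses exactly these ratios cannot be confirmed from the text. The name link_weight and the sample graph are hypothetical.

```python
from typing import Dict, Set


def link_weight(j: str, i: str, out_sets: Dict[str, Set[str]]) -> float:
    """w(j, i) = W_in(j, i) * W_out(j, i) under Weighted-PageRank-style definitions.

    out_sets[x] is the set of users that x recommends (N_x in the text).
    """
    users = set(out_sets) | {v for targets in out_sets.values() for v in targets}
    # in_count[x]: size of B_x (users pointing to x); out_count[x]: size of N_x.
    in_count = {u: 0 for u in users}
    for targets in out_sets.values():
        for v in targets:
            in_count[v] += 1
    out_count = {u: len(out_sets.get(u, set())) for u in users}

    neighbours = out_sets[j]  # everyone user j recommends; i is expected to be among them
    in_total = sum(in_count[k] for k in neighbours)
    out_total = sum(out_count[k] for k in neighbours)
    # Assumption: a zero denominator yields a zero weight rather than an error.
    w_in = in_count[i] / in_total if in_total else 0.0
    w_out = out_count[i] / out_total if out_total else 0.0
    return w_in * w_out


# Hypothetical graph: j recommends i and k; i is recommended by j, k, and m.
out_sets = {"j": {"i", "k"}, "i": {"k", "m"}, "k": {"i"}, "m": {"i"}}
w = link_weight("j", "i", out_sets)
# W_in = 3 / (3 + 2) = 0.6, W_out = 2 / (2 + 1) = 2/3, so w is about 0.4
```

Under these assumed definitions, a recommendation toward a user who is already frequently recommended receives a larger share of the recommender's score than one toward a rarely recommended user.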
According to one embodiment of the present invention, the matrix conversion unit 304 is specifically configured to: convert the weighted directed graph into a hyperlink matrix, where the conversion formula is:
where H(i, j) is the hyperlink matrix, i is any user, colSum(i) is the total number of times that user i in the adjacency matrix recommends other users, and n is the total number of people involved in recommendation behavior.
In this technical solution, the adjacency matrix is converted into a hyperlink matrix, so that PageRank model training can then be carried out from the initial parameters and the converted hyperlink matrix.
According to one embodiment of the present invention, the system further includes: a judging unit 312, which, after the training unit 308 completes training, determines whether the number of iterations exceeds the predetermined iteration-count threshold, and determines whether the difference between the PageRank vector and the PageRank vector of the previous iteration exceeds the predetermined iteration precision; when both determinations are affirmative, the iterative operation continues with the preset PageRank model trainer; otherwise, the iterated PageRank vector is output in descending order.
In this technical solution, the problem of evenly distributed authority values in the original classical PageRank algorithm and the problem of considering only out-links are addressed. Iterating with the preset PageRank model trainer makes iteration faster in practical applications and better accounts for the fact that different users have different levels of authority. This yields higher search and recommendation quality in actual enterprise recommendation and search, and improves the working efficiency and user experience of the system.
According to one embodiment of the present invention, the initial parameters include an iteration vector, a random damping factor, a predetermined iteration precision, and a predetermined iteration-count threshold.
The technical solution of the present invention has been described in detail above with reference to the accompanying drawings. With this technical solution, the problem of evenly distributed authority values in the original classical PageRank algorithm and the problem of considering only out-links are addressed. Iterating with the preset PageRank model trainer makes iteration faster in practical applications and better accounts for the fact that different users have different levels of authority, which yields higher search and recommendation quality in actual enterprise recommendation and search and improves the working efficiency and user experience of the system.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the invention. For those skilled in the art, the invention may be modified and varied in various ways. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (8)
1. An information recommendation method, characterized by comprising:
generating an adjacency matrix according to behavior logs in a behavior log database of a server;
converting the adjacency matrix into a hyperlink matrix;
selecting, according to the hyperlink matrix, initial parameters for a preset PageRank model trainer of the server;
computing a PageRank vector with the preset PageRank model trainer according to the initial parameters, and recording the number of iterations;
outputting the iterated PageRank vector in descending order; wherein
the calculation formula of the preset PageRank model trainer is:
where PR(A) is the PageRank vector of the recommended user A, n is the total number of users who recommend the user A, N is the total number of people involved in recommendation behavior, T_i is any user who recommends the user A, C(T_i) denotes the total number of times the user T_i recommends other users, PR(T_i) is the PageRank vector of the user T_i, and i = 1, 2, ..., n;
the generating an adjacency matrix according to behavior logs in a behavior log database of a server specifically comprises:
extracting all recommendation information from the behavior logs in the behavior log database, and establishing a weighted directed graph by taking each user of the server as a node, establishing an edge with the user who initiates a recommendation as the start point and the recommended user as the end point, and taking the number of recommendations as the weight of the edge;
storing the weighted directed graph in the adjacency matrix.
2. The information recommendation method according to claim 1, characterized in that the converting the adjacency matrix into a hyperlink matrix specifically comprises:
converting the weighted directed graph into the hyperlink matrix, where the conversion formula is:
where H(i, j) is the hyperlink matrix, i is any user, colSum(i) is the total number of times that the user i in the adjacency matrix recommends the other users, and n is the total number of people involved in the recommendation behavior.
3. The information recommendation method according to claim 1 or 2, characterized in that, after the computing a PageRank vector with the preset PageRank model trainer and recording the number of iterations, the method comprises:
determining whether the number of iterations exceeds a predetermined iteration-count threshold, and determining whether the difference between the PageRank vector and the original PageRank vector exceeds a predetermined iteration precision;
when both determinations are affirmative, continuing the iterative operation with the preset PageRank model trainer; otherwise, outputting the iterated PageRank vector in descending order.
4. The information recommendation method according to claim 3, characterized in that the initial parameters include an iteration vector, a random damping factor, the predetermined iteration precision, and the predetermined iteration-count threshold.
5. An information recommendation system, characterized by comprising:
an information preprocessing unit, which generates an adjacency matrix according to behavior logs in a behavior log database of a server;
a matrix conversion unit, which converts the adjacency matrix into a hyperlink matrix;
a parameter selection unit, which selects, according to the hyperlink matrix, initial parameters for a preset PageRank model trainer of the server;
a training unit, which computes a PageRank vector with the preset PageRank model trainer according to the initial parameters, and records the number of iterations;
a recommendation unit, which outputs the iterated PageRank vector in descending order; wherein
the calculation formula of the preset PageRank model trainer is:
where PR(A) is the PageRank vector of the recommended user A, n is the total number of users who recommend the user A, N is the total number of people involved in recommendation behavior, T_i is any user who recommends the user A, C(T_i) denotes the total number of times the user T_i recommends other users, PR(T_i) is the PageRank vector of the user T_i, and i = 1, 2, ..., n;
the information preprocessing unit comprises:
a weighted directed graph establishing unit, which extracts all recommendation information from the behavior logs in the behavior log database, and establishes a weighted directed graph by taking each user of the server as a node, establishing an edge with the user who initiates a recommendation as the start point and the recommended user as the end point, and taking the number of recommendations as the weight of the edge;
a storage unit, which stores the weighted directed graph in the adjacency matrix.
6. The information recommendation system according to claim 5, characterized in that the matrix conversion unit is specifically configured to:
convert the weighted directed graph into the hyperlink matrix, where the conversion formula is:
where H(i, j) is the hyperlink matrix, i is any user, colSum(i) is the total number of times that the user i in the adjacency matrix recommends the other users, and n is the total number of people involved in the recommendation behavior.
7. The information recommendation system according to claim 5 or 6, characterized by further comprising:
a judging unit, which, after the training unit completes training, determines whether the number of iterations exceeds a predetermined iteration-count threshold, and determines whether the difference between the PageRank vector and the PageRank vector of the previous iteration exceeds a predetermined iteration precision; and, when both determinations are affirmative, continues the iterative operation with the preset PageRank model trainer; otherwise, outputs the iterated PageRank vector in descending order.
8. The information recommendation system according to claim 7, characterized in that the initial parameters include an iteration vector, a random damping factor, the predetermined iteration precision, and the predetermined iteration-count threshold.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410746660.4A CN104391982B (en) | 2014-12-08 | 2014-12-08 | Information recommendation method and information recommendation system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104391982A CN104391982A (en) | 2015-03-04 |
CN104391982B true CN104391982B (en) | 2018-07-20 |
Family
ID=52609886
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410746660.4A Active CN104391982B (en) | 2014-12-08 | 2014-12-08 | Information recommendation method and information recommendation system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104391982B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106991592B (en) * | 2017-03-22 | 2021-01-01 | 南京财经大学 | Personalized recommendation method based on purchasing user behavior analysis |
CN108536590A (en) * | 2018-02-09 | 2018-09-14 | 武汉楚鼎信息技术有限公司 | A kind of method and system device of system service significance level grading |
US10460359B1 (en) * | 2019-03-28 | 2019-10-29 | Coupang, Corp. | Computer-implemented method for arranging hyperlinks on a graphical user-interface |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102270246A (en) * | 2011-09-08 | 2011-12-07 | 胡辉 | Method for calculating importance of web page |
CN102799671A (en) * | 2012-07-17 | 2012-11-28 | 西安电子科技大学 | Network individual recommendation method based on PageRank algorithm |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040193698A1 (en) * | 2003-03-24 | 2004-09-30 | Sadasivuni Lakshminarayana | Method for finding convergence of ranking of web page |
Non-Patent Citations (2)
Title |
---|
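An improved PageRank algorithm: adjusting the damping factor; Shao Jingjing et al.; Applied Mathematics (Supplement); 20081231; main text pp. 58-59 * |
Personalized recommendation system based on massive data mining; Guo Ye et al.; Journal of Northwest University; 20061231; Vol. 36 (No. 6); main text pp. 899-901 * |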
Also Published As
Publication number | Publication date |
---|---|
CN104391982A (en) | 2015-03-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |