CN104391982B - Information recommendation method and information recommendation system - Google Patents

Publication number
CN104391982B
Authority
CN
China
Prior art keywords
user
pagerank
matrix
vectors
recommendation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410746660.4A
Other languages
Chinese (zh)
Other versions
CN104391982A (en)
Inventor
黄通文
张俊林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CHANJET INFORMATION TECHNOLOGY Co Ltd
Original Assignee
CHANJET INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CHANJET INFORMATION TECHNOLOGY Co Ltd
Priority to CN201410746660.4A
Publication of CN104391982A
Application granted
Publication of CN104391982B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/951: Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention provides an information recommendation method and an information recommendation system. The information recommendation method includes: generating an adjacency matrix from the behavior logs in a behavior log database of a server; converting the adjacency matrix into a hyperlink matrix; selecting, according to the hyperlink matrix, initial parameters for a preset PageRank model trainer of the server; computing a PageRank vector with the preset PageRank model trainer according to the initial parameters, and recording the number of iterations; and outputting the iterated PageRank vector in descending order; wherein the calculation formula of the preset PageRank model trainer is: [formula not reproduced in this text]. The technical solution of the present invention improves on two problems of the classical PageRank algorithm, namely the even distribution of authority values and the consideration of out-links only, so that iteration is faster in practical applications and the different levels of authority of different users are better taken into account, which yields higher search and recommendation quality in real enterprise recommendation and search.

Description

Information recommendation method and information recommendation system
Technical field
The present invention relates to the technical field of data processing, and in particular to an information recommendation method and an information recommendation system.
Background art
At present, the behavior logs of users' work circles contain a great deal of behavioral information, including user-to-user interactions and user-to-circle interactions. Most of this information, however, remains unmined, and it is desirable to mine relevant data from it in order to improve search and recommendation quality. In the prior art, search and recommendation mainly rely on a combined ranking of user behavior together with query-string word segmentation and index matching. Such recommendation and search have the following two disadvantages:
First, for users without behavioral information, recommendation still relies mainly on index matching. This approach ignores group behavior information and cannot recommend users who are active, popular, and relatively authoritative.
Second, although enterprise data is relatively reliable and has little redundancy, when the volume of searched and recommended data is large, users can cheat in certain fields by adding redundant content such as keywords, which then enters the index and deceives the search system.
Therefore, a new technical solution is needed that can improve the quality of user recommendation.
Summary of the invention
In view of the above problems, the present invention proposes a new technical solution that can improve the quality of user recommendation.
Accordingly, an embodiment of the first aspect of the present invention proposes an information recommendation method, including: generating an adjacency matrix from the behavior logs in the behavior log database of a server; converting the adjacency matrix into a hyperlink matrix; selecting, according to the hyperlink matrix, initial parameters for the preset PageRank model trainer of the server; computing a PageRank vector with the preset PageRank model trainer according to the initial parameters, and recording the number of iterations; and outputting the iterated PageRank vector in descending order; wherein the calculation formula of the preset PageRank model trainer is:
where PR(A) is the PageRank vector of the recommended user A, n is the total number of users who recommend user A, N is the total number of people involved in recommendation behavior, T_i is any user who recommends user A, C(T_i) is the total number of times user T_i recommends other users, PR(T_i) is the PageRank vector of user T_i, and i = 1, 2, ..., n.
In the prior art, Google described its classical PageRank model in a published paper in the following form:
where PR(A) is the PageRank vector of web page A, N is the total number of web pages, T_i is the i-th source page (in-link page) pointing to page A, C(T_i) is the out-degree of page T_i, and i = 1, 2, ..., n. The model means that a user staying on some page either jumps to a random page, reaching each page with probability (1-α)/N, or follows a link with probability α.
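The formula itself did not survive in this text. As a hedged reconstruction from the symbols defined above (α, N, n, T_i, C(T_i)), the commonly cited normalized form of the classical PageRank model is given below; the exact notation used in the original filing may differ.

```latex
PR(A) = \frac{1-\alpha}{N} + \alpha \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}
```

Here the first term is the random-jump contribution spread evenly over all N pages, and the sum is the authority passed on by the n pages that link to A.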
The technical solution of the present invention improves on two aspects of the above prior art: the even distribution of the random damping factor α, and the fact that only the out-degree C(T_i) is considered, so that every AT user is given the same weight.
The improvement to the authority value α is as follows:
Regarding the even distribution of authority values, the random damping factor (damping coefficient) should differ from page to page. For example, users who are active, popular, and relatively authoritative are more likely to be recommended than users with few recommendations and low reputation. The classical PageRank model can therefore be improved to:
where PR(A) is the PageRank vector of the recommended user A, n is the total number of users who recommend user A, N is the total number of people involved in recommendation behavior, T_i is a user who recommends user A, and C(T_i) is the total number of times user T_i recommends other users.
In this way, the random damping factor changes from a fixed value into a value that varies with the level of each user. However, this improved model introduces a problem of its own: it no longer conforms to the random surfer model on which Google based the algorithm, and the size of the random damping factor can no longer be controlled manually. The model can therefore be further improved to:
This further improvement restores the random damping factor and the random surfer factor of the original PageRank proposal, while still assigning different authority values to different users.
Therefore, in this technical solution, the data is preprocessed into an adjacency matrix, the adjacency matrix is converted into a hyperlink matrix, initial parameters are selected, and a PageRank model trainer is established. This improves on the even distribution of authority values and the consideration of out-links only in the classical PageRank algorithm. Iterating with the preset PageRank model trainer makes iteration faster in practical applications and better accounts for the different levels of authority of different users, which yields higher search and recommendation quality in real enterprise recommendation and search and improves the working efficiency of the system and the user experience.
According to one embodiment of the present invention, generating the adjacency matrix from the behavior logs in the behavior log database of the server specifically includes: extracting all recommendation information from the behavior logs in the behavior log database; building a weighted directed graph in which each user of the server is a node, each edge runs from the recommending user to the recommended user, and the weight of the edge is the number of recommendations; and storing the weighted directed graph in the adjacency matrix.
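The graph construction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the recommendation information has already been extracted from the logs as (recommender, recommended) pairs, and the names build_adjacency, log, and the sample users are hypothetical.

```python
from collections import Counter

def build_adjacency(log_records, users):
    """Build a weighted adjacency matrix from recommendation records.

    log_records: iterable of (recommender, recommended) pairs (assumed format).
    users: list of all users; its order fixes the row/column order.
    adj[s][t] is the number of times user s recommended user t.
    """
    index = {user: k for k, user in enumerate(users)}
    counts = Counter(log_records)  # edge -> number of recommendations
    n = len(users)
    adj = [[0] * n for _ in range(n)]
    for (src, dst), times in counts.items():
        adj[index[src]][index[dst]] = times
    return adj

# Hypothetical sample log: alice recommends bob twice, and so on.
log = [("alice", "bob"), ("alice", "bob"), ("carol", "bob"), ("bob", "alice")]
adj = build_adjacency(log, ["alice", "bob", "carol"])
```

In this layout each row holds one user's outgoing recommendations, so a row sum is the C(T_i) of the formulas above.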
This technical solution addresses the problem that considering only the out-degree C(T_i) gives every AT user the same weight.
The classical model proposed by Google considers only the influence of the out-degree of out-links, C(T_i), so that the out-degree of every user recommended by user A carries the same weight. Taking the influence of out-degree into account, the classical model can be improved into the following link-in/link-out model:
W(j, i) = W_in(j, i) * W_out(j, i)
where W_in(j, i) and W_out(j, i) are defined as follows:
where N_i is the set of all users that user i recommends (the link-out user set) and B_i is the set of all users pointing to user i (the link-in user set). The model can be read as follows: for a relatively popular user i, if user j belongs to the link-in user set B_i of user i, then the weight W(j, i) of link(j, i) should depend on all users recommended by user j and on all users that have a link relationship with user i, that is,
W(j, i) = W_in(j, i) · W_out(j, i)
where W_in(j, i) is the weight of link(j, i) with respect to the recommendation links of the related users; it is determined by the value I_i of user i's recommendation links and the value I_k of user k's recommendation links (k belongs to the set of all users recommended by user j), that is,
Therefore, the link-in/link-out model proposed in this application solves the problem that every recommendation in the directed graph carries the same weight. The improved algorithm assigns a different weight to each recommendation according to the recommendation behavior of the users associated with it. In this way, users who are recommended often and with high quality are taken into account by the PageRank algorithm, which improves its effectiveness.
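The definitions of W_in and W_out are not reproduced in this text, so the sketch below is only one plausible reading. It assumes the form used by the published Weighted PageRank algorithm: W_in(j, i) is user i's in-link count divided by the summed in-link counts of all users that j recommends, and W_out(j, i) is the same ratio for out-link counts. The function name link_weights and the sample matrix are hypothetical.

```python
def link_weights(adj):
    """Compute W(j, i) = W_in(j, i) * W_out(j, i) for every edge j -> i.

    Assumption (not confirmed by the source): W_in and W_out follow the
    Weighted PageRank definitions, using in-link and out-link counts.
    adj[j][i] > 0 means user j recommended user i.
    """
    n = len(adj)
    # N_j: users that j recommends (link-out set).
    out_links = [[i for i in range(n) if adj[j][i] > 0] for j in range(n)]
    # |B_i|: number of users who recommend i (size of the link-in set).
    in_count = [sum(1 for j in range(n) if adj[j][i] > 0) for i in range(n)]
    out_count = [len(out_links[j]) for j in range(n)]

    weights = [[0.0] * n for _ in range(n)]
    for j in range(n):
        in_total = sum(in_count[k] for k in out_links[j])
        out_total = sum(out_count[k] for k in out_links[j])
        for i in out_links[j]:
            w_in = in_count[i] / in_total if in_total else 0.0
            w_out = out_count[i] / out_total if out_total else 0.0
            weights[j][i] = w_in * w_out
    return weights

# Hypothetical graph: user 0 recommends users 1 and 2, 1 recommends 0, 2 recommends 1.
W = link_weights([[0, 2, 1], [1, 0, 0], [0, 1, 0]])
```

With these assumed definitions, user 0's recommendation of user 1 (who has two recommenders) receives a larger weight than its recommendation of user 2 (who has one), which matches the stated goal of not weighting every recommendation equally.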
According to one embodiment of the present invention, converting the adjacency matrix into the hyperlink matrix specifically includes: converting the weighted directed graph into the hyperlink matrix, where the conversion formula is:
where H(i, j) is the hyperlink matrix, i is any user, colSum(i) is the total number of times that user recommends other users in the adjacency matrix, and n is the total number of people involved in recommendation behavior.
In this technical solution, converting the adjacency matrix into a hyperlink matrix makes it convenient to carry out PageRank model training on the converted hyperlink matrix with the initial parameters.
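The conversion formula is not reproduced in this text. The sketch below shows one common way to perform such a conversion, under two labeled assumptions: each entry is divided by the recommending user's total recommendation count (the quantity the text calls colSum(i); in the row-per-recommender layout assumed here it is a row sum), and a user who recommends nobody is given a uniform 1/n row. The name to_hyperlink and the sample matrix are hypothetical.

```python
def to_hyperlink(adj):
    """Normalize a weighted adjacency matrix into a hyperlink matrix.

    Assumptions (not confirmed by the source):
    - adj[i][j] counts how often user i recommended user j;
    - H[i][j] = adj[i][j] / total recommendations made by user i;
    - a user with no recommendations gets a uniform row of 1/n.
    """
    n = len(adj)
    hyperlink = []
    for row in adj:
        total = sum(row)
        if total == 0:
            hyperlink.append([1.0 / n] * n)  # assumed handling of inactive users
        else:
            hyperlink.append([weight / total for weight in row])
    return hyperlink

# Hypothetical input: user 2 has made no recommendations.
H = to_hyperlink([[0, 2, 0], [1, 0, 0], [0, 0, 0]])
```

Every row of the result sums to 1, so the matrix can be used directly as a transition matrix in the iteration step.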
According to one embodiment of the present invention, after computing the PageRank vector with the preset PageRank model trainer and recording the number of iterations, the method includes: determining whether the number of iterations exceeds a predetermined iteration count threshold, and determining whether the PageRank vector differs from the previous PageRank vector by more than a predetermined iteration precision; when both results are yes, continuing the iteration with the preset PageRank model trainer; otherwise, outputting the iterated PageRank vector in descending order.
This technical solution improves on the even distribution of authority values and the consideration of out-links only in the classical PageRank algorithm. Iterating with the preset PageRank model trainer makes iteration faster in practical applications and better accounts for the different levels of authority of different users, which yields higher search and recommendation quality in real enterprise recommendation and search and improves the working efficiency of the system and the user experience.
According to one embodiment of the present invention, the initial parameters include an iteration vector, a random damping factor, the predetermined iteration precision, and the predetermined iteration count threshold.
An embodiment of the second aspect of the present invention proposes an information recommendation system, including: an information preprocessing unit, which generates an adjacency matrix from the behavior logs in the behavior log database of the server; a matrix conversion unit, which converts the adjacency matrix into a hyperlink matrix; a parameter selection unit, which selects, according to the hyperlink matrix, initial parameters for the preset PageRank model trainer of the server; a training unit, which computes a PageRank vector with the preset PageRank model trainer according to the initial parameters and records the number of iterations; and a recommendation unit, which outputs the iterated PageRank vector in descending order; wherein
the calculation formula of the preset PageRank model trainer is:
where PR(A) is the PageRank vector of the recommended user A, n is the total number of users who recommend user A, N is the total number of people involved in recommendation behavior, T_i is any user who recommends user A, C(T_i) is the total number of times user T_i recommends other users, PR(T_i) is the PageRank vector of user T_i, and i = 1, 2, ..., n.
According to one embodiment of the present invention, the information preprocessing unit includes: a weighted directed graph building unit, which extracts all recommendation information from the behavior logs in the behavior log database and builds a weighted directed graph in which each user of the server is a node, each edge runs from the recommending user to the recommended user, and the weight of the edge is the number of recommendations; and a storage unit, which stores the weighted directed graph in the adjacency matrix.
According to one embodiment of the present invention, the matrix conversion unit is specifically configured to convert the weighted directed graph into the hyperlink matrix, where the conversion formula is:
where H(i, j) is the hyperlink matrix, i is any user, colSum(i) is the total number of times that user recommends other users in the adjacency matrix, and n is the total number of people involved in recommendation behavior.
In this technical solution, converting the adjacency matrix into a hyperlink matrix makes it convenient to carry out PageRank model training on the converted hyperlink matrix with the initial parameters.
According to one embodiment of the present invention, the system further includes: a judging unit which, after the training unit completes training, determines whether the number of iterations exceeds the predetermined iteration count threshold and whether the PageRank vector differs from the PageRank vector of the previous iteration by more than the predetermined iteration precision; when both results are yes, the iteration continues with the preset PageRank model trainer; otherwise, the iterated PageRank vector is output in descending order.
This technical solution improves on the even distribution of authority values and the consideration of out-links only in the classical PageRank algorithm. Iterating with the preset PageRank model trainer makes iteration faster in practical applications and better accounts for the different levels of authority of different users, which yields higher search and recommendation quality in real enterprise recommendation and search and improves the working efficiency of the system and the user experience.
According to one embodiment of the present invention, the initial parameters include an iteration vector, a random damping factor, the predetermined iteration precision, and the predetermined iteration count threshold.
In summary, the technical solution of the present invention addresses both of the above problems of the classical PageRank algorithm through iteration with the preset PageRank model trainer, giving faster iteration in practice, better treatment of users with different levels of authority, and higher search and recommendation quality in real enterprise recommendation and search.
Brief description of the drawings
Fig. 1 shows the flow chart of information recommendation method according to an embodiment of the invention;
Fig. 2 shows the flow charts of information recommendation method according to another embodiment of the invention;
Fig. 3 shows the block diagram of information recommendation system according to an embodiment of the invention.
Detailed description
To make the objects, features, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of this application and the features in them may be combined with one another.
Many specific details are set forth in the following description to provide a thorough understanding of the present invention. However, the invention may also be implemented in ways other than those described here, so the scope of protection of the invention is not limited by the specific embodiments described below.
Fig. 1 shows the flow chart of information recommendation method according to an embodiment of the invention.
As shown in Fig. 1, an information recommendation method according to one embodiment of the present invention includes:
Step 102, generate an adjacency matrix from the behavior logs in the behavior log database of the server.
Step 104, convert the adjacency matrix into a hyperlink matrix.
Step 106, select, according to the hyperlink matrix, initial parameters for the preset PageRank model trainer of the server.
Step 108, compute a PageRank vector with the preset PageRank model trainer according to the initial parameters, and record the number of iterations.
Step 110, output the iterated PageRank vector in descending order; wherein the calculation formula of the preset PageRank model trainer is:
where PR(A) is the PageRank vector of the recommended user A, n is the total number of users who recommend user A, N is the total number of people involved in recommendation behavior, T_i is any user who recommends user A, C(T_i) is the total number of times user T_i recommends other users, PR(T_i) is the PageRank vector of user T_i, and i = 1, 2, ..., n.
In the prior art, Google described its classical PageRank model in a published paper in the following form:
where PR(A) is the PageRank vector of web page A, N is the total number of web pages, T_i is the i-th source page (in-link page) pointing to page A, C(T_i) is the out-degree of page T_i, and i = 1, 2, ..., n. The model means that a user staying on some page either jumps to a random page, reaching each page with probability (1-α)/N, or follows a link with probability α.
The technical solution of the present invention improves on two aspects of the above prior art: the even distribution of the random damping factor α, and the fact that only the out-degree C(T_i) is considered, so that every AT user is given the same weight.
The improvement to the authority value α is as follows:
Regarding the even distribution of authority values, the random damping factor (damping coefficient) should differ from page to page. For example, users who are active, popular, and relatively authoritative are more likely to be recommended than users with few recommendations and low reputation. The classical PageRank model can therefore be improved to:
where PR(A) is the PageRank vector of the recommended user A, n is the total number of users who recommend user A, N is the total number of people involved in recommendation behavior, T_i is a user who recommends user A, and C(T_i) is the total number of times user T_i recommends other users.
In this way, the random damping factor changes from a fixed value into a value that varies with the level of each user. However, this improved model introduces a problem of its own: it no longer conforms to the random surfer model on which Google based the algorithm, and the size of the random damping factor can no longer be controlled manually. The model can therefore be further improved to:
This further improvement restores the random damping factor and the random surfer factor of the original PageRank proposal, while still assigning different authority values to different users.
Therefore, in this technical solution, the data is preprocessed into an adjacency matrix, the adjacency matrix is converted into a hyperlink matrix, initial parameters are selected, and a PageRank model trainer is established. This improves on the even distribution of authority values and the consideration of out-links only in the classical PageRank algorithm. Iterating with the preset PageRank model trainer makes iteration faster in practical applications and better accounts for the different levels of authority of different users, which yields higher search and recommendation quality in real enterprise recommendation and search and improves the working efficiency of the system and the user experience.
According to one embodiment of the present invention, step 102 specifically includes: extracting all recommendation information from the behavior logs in the behavior log database; building a weighted directed graph in which each user of the server is a node, each edge runs from the recommending user to the recommended user, and the weight of the edge is the number of recommendations; and storing the weighted directed graph in the adjacency matrix.
This technical solution addresses the problem that considering only the out-degree C(T_i) gives every AT user the same weight.
The classical model proposed by Google considers only the influence of the out-degree of out-links, C(T_i), so that the out-degree of every user recommended by user A carries the same weight. Taking the influence of out-degree into account, the classical model can be improved into the following link-in/link-out model:
W(j, i) = W_in(j, i) * W_out(j, i)
where W_in(j, i) and W_out(j, i) are defined as follows:
where N_i is the set of all users that user i recommends (the link-out user set) and B_i is the set of all users pointing to user i (the link-in user set). The model can be read as follows: for a relatively popular user i, if user j belongs to the link-in user set B_i of user i, then the weight W(j, i) of link(j, i) should depend on all users recommended by user j and on all users that have a link relationship with user i, that is,
W(j, i) = W_in(j, i) · W_out(j, i)
where W_in(j, i) is the weight of link(j, i) with respect to the recommendation links of the related users; it is determined by the value I_i of user i's recommendation links and the value I_k of user k's recommendation links (k belongs to the set of all users recommended by user j), that is,
Therefore, the link-in/link-out model proposed in this application solves the problem that every recommendation in the directed graph carries the same weight. The improved algorithm assigns a different weight to each recommendation according to the recommendation behavior of the users associated with it. In this way, users who are recommended often and with high quality are taken into account by the PageRank algorithm, which improves its effectiveness.
According to one embodiment of the present invention, step 104 specifically includes: converting the weighted directed graph into the hyperlink matrix, where the conversion formula is:
where H(i, j) is the hyperlink matrix, i is any user, colSum(i) is the total number of times that user recommends other users in the adjacency matrix, and n is the total number of people involved in recommendation behavior.
In this technical solution, converting the adjacency matrix into a hyperlink matrix makes it convenient to carry out PageRank model training on the converted hyperlink matrix with the initial parameters.
According to one embodiment of the present invention, after step 108 the method includes: determining whether the number of iterations exceeds the predetermined iteration count threshold, and determining whether the PageRank vector differs from the previous PageRank vector by more than the predetermined iteration precision; when both results are yes, continuing the iteration with the preset PageRank model trainer; otherwise, outputting the iterated PageRank vector in descending order.
This technical solution improves on the even distribution of authority values and the consideration of out-links only in the classical PageRank algorithm. Iterating with the preset PageRank model trainer makes iteration faster in practical applications and better accounts for the different levels of authority of different users, which yields higher search and recommendation quality in real enterprise recommendation and search and improves the working efficiency of the system and the user experience.
According to one embodiment of the present invention, the initial parameters include an iteration vector, a random damping factor, a predetermined iteration precision, and a predetermined iteration count threshold.
Fig. 2 shows a flowchart of an information recommendation method according to another embodiment of the present invention.
As shown in Fig. 2, the information recommendation method according to another embodiment of the present invention includes the following steps:
Step 202: obtain the behavior logs from the behavior log database.
Step 204: preprocess the behavior-log data, then assemble the processed data into a link matrix or link table.
Step 206: convert the processed link matrix or link table into a hyperlink matrix.
Step 208: select the initial parameters for the server's preset PageRank model trainer. The parameters include the damping coefficient alpha, the iteration precision eps, the iteration threshold thresHold, and the initial vector V0.
Step 210: input the hyperlink matrix and the initial parameters into the preset PageRank model trainer.
Step 212: compute the PageRank vector V with the preset PageRank model trainer, and record the iteration count count.
Step 214: judge whether the PageRank vector value is within the iteration precision or whether the iteration count exceeds the iteration threshold. If so, go to step 216; otherwise, set count = count + 1 and V0 = V, and return to step 210.
Step 216: output the iterated PageRank vector V in descending order.
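Steps 208 to 216 can be sketched as a plain power iteration. This is a minimal illustration only: it uses the classical PageRank update rather than the patent's improved formula (which is not reproduced in this text), it measures precision as the L1 difference between successive vectors, and it stops when either the precision is reached or the iteration threshold is hit. The function name `train_pagerank` and the default values are assumptions; the parameter names mirror alpha, eps, thresHold, and V0 from step 208.

```python
def train_pagerank(H, alpha=0.85, eps=1e-8, thres_hold=100, v0=None):
    """Iterate V = alpha * H * V0 + (1 - alpha) / n until a stop condition.

    H is an n x n column-stochastic hyperlink matrix given as nested lists.
    Returns (V, count, ranking), where ranking lists user indices from the
    highest PageRank value to the lowest.
    """
    n = len(H)
    v_prev = list(v0) if v0 is not None else [1.0 / n] * n
    count = 0
    while True:
        # Step 212: one update of the PageRank vector.
        v = [
            alpha * sum(H[r][c] * v_prev[c] for c in range(n)) + (1.0 - alpha) / n
            for r in range(n)
        ]
        count += 1
        # Step 214: stop on precision reached or iteration threshold hit.
        diff = sum(abs(a - b) for a, b in zip(v, v_prev))
        if diff < eps or count >= thres_hold:
            break
        v_prev = v  # V0 = V, then iterate again
    # Step 216: order users from high to low.
    ranking = sorted(range(n), key=lambda u: v[u], reverse=True)
    return v, count, ranking


# Hypothetical 3-user hyperlink matrix (each column sums to 1).
H = [
    [0.0, 0.5, 1.0 / 3.0],
    [0.25, 0.0, 1.0 / 3.0],
    [0.75, 0.5, 1.0 / 3.0],
]
V, count, ranking = train_pagerank(H)
```

Here user 2 receives most of the recommendation weight from the other two users, so it comes first in `ranking`.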
The information recommendation method of one embodiment of the present invention is described below with reference to a concrete application scenario.
Behavior logs of the Work Circle application from July 10, 2014 to August 21, 2014 were extracted, and the two improvement schemes were compared with the classical PageRank model. As shown in Table 1, under the same conditions the improved models converge to the specified iteration precision faster and select well-known, relatively authoritative users more accurately for Work Circle recommendation or search. Because the recommendation behavior of Work Circle users involves privacy, only the parameters selected under similar experimental conditions and the final iteration precision and iteration counts are listed here.
Table 1
Fig. 3 shows a block diagram of an information recommendation system according to an embodiment of the present invention.
As shown in Fig. 3, an information recommendation system 300 according to an embodiment of the present invention includes: an information preprocessing unit 302, which generates an adjacency matrix from the behavior logs in the behavior log database of the server; a matrix conversion unit 304, which converts the adjacency matrix into a hyperlink matrix; a parameter selection unit 306, which selects initial parameters for the server's preset PageRank model trainer according to the hyperlink matrix; a training unit 308, which computes the PageRank vector with the preset PageRank model trainer according to the initial parameters and records the number of iterations; and a recommendation unit 310, which outputs the PageRank vector after iteration in descending order; wherein
the calculation formula of the preset PageRank model trainer is:
where $PR(A)$ is the PageRank vector of the recommended user A, n is the total number of users who recommend user A, N is the total number of people involved in recommendation behavior, $T_i$ is any user who recommends user A, $C(T_i)$ is the total number of times user $T_i$ recommends other users, $PR(T_i)$ is the PageRank vector of user $T_i$, and $i = 1, 2, \ldots, n$.
In the prior art, Google stated in a published paper that its classical PageRank model has the following form:
where $PR(A)$ is the PageRank vector of A (a web page here, corresponding to the recommended user A), N is the total number of web pages, $T_i$ is the i-th source page (in-link page) pointing to page A, $C(T_i)$ is the out-degree of page $T_i$, that is, its total number of out-link pages, and $i = 1, 2, \ldots, n$. The model means that a user staying on some page may jump to a random page with probability $(1-\alpha)/N$ per page, or may follow a link to browse pages with probability $\alpha$.
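The formula image for the classical model is not reproduced in this text. A reconstruction that is consistent with the symbol definitions and the two browsing probabilities described above, and that matches the commonly cited normalized form of PageRank, would be the following. It should be read as an assumption about what the patent shows rather than a quotation.

```latex
PR(A) = \frac{1-\alpha}{N} + \alpha \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}
```

The first term is the share that page A receives from random jumps, and each summand is the share of page $T_i$'s PageRank passed along one of its $C(T_i)$ out-links.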
In the technical solution of the present invention, the above prior art is improved in two respects: the even distribution of the random damping factor $\alpha$, and the fact that only out-degree $C(T_i)$ is considered, so that each @-mention (AT) of a user is given the same weight.
The improvement to the authority value $\alpha$ is as follows:
Regarding the even distribution of authority values, the random damping factor (damping coefficient) should differ for different web pages. For example, users who are well known and relatively authoritative are recommended more easily than users with little recommendation activity and a lower reputation. The classical PageRank model can therefore be improved to:
where $PR(A)$ is the PageRank vector of the recommended user A, n is the total number of users who recommend user A, N is the total number of people involved in recommendation behavior, user $T_i$ is a user who recommends user A, and $C(T_i)$ is the total number of times user $T_i$ recommends other users.
In this way, the random damping factor changes from a fixed value into a value that varies with users at different levels. However, this improved model also brings a problem: it no longer conforms to the random surfer model on which Google based the algorithm, so the user cannot manually control the size of the random damping factor. The model can therefore be further improved to:
This improvement adds the random damping factor back, which restores the random surfer factor proposed with PageRank while also solving the problem of assigning different authority values to different users.
Therefore, in this technical solution, the data are preprocessed into an adjacency matrix, the adjacency matrix is converted into a hyperlink matrix, initial parameters are selected, and a PageRank model trainer is established. The classical PageRank algorithm is thereby improved with respect to the even distribution of authority values and the fact that only out-links are considered. Iterating with the preset PageRank model trainer makes iteration faster in practical applications and better reflects the fact that different users have different levels of authority, so that actual enterprise recommendation and search achieve higher quality, which improves the working efficiency and user experience of the system.
According to one embodiment of the present invention, the information preprocessing unit 302 includes: a weighted directed graph establishing unit 3022, which extracts all recommendation information from the behavior logs in the behavior log database and establishes a weighted directed graph by taking each user of the server as a node, drawing an edge from the user who makes a recommendation (the start point) to the recommended user (the end point), and using the number of recommendations as the weight of the edge; and a storage unit 3024, which stores the weighted directed graph in the adjacency matrix.
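The graph construction described for units 3022 and 3024 can be sketched as follows, assuming the recommendation information has already been extracted from the logs as (recommender, recommended) pairs. The function name `build_adjacency`, the input format, and the matrix layout (rows for recommended users, columns for recommenders) are illustrative assumptions; the patent does not specify them in this text.

```python
def build_adjacency(recommendations):
    """Build a weighted directed graph as an adjacency matrix.

    recommendations: list of (recommender, recommended) pairs.
    Returns (users, adj), where adj[r][c] counts how many times
    users[c] recommended users[r].
    """
    # Every user that appears in the log becomes a node.
    users = sorted({user for pair in recommendations for user in pair})
    index = {user: k for k, user in enumerate(users)}
    n = len(users)
    adj = [[0] * n for _ in range(n)]
    for src, dst in recommendations:
        # Edge from recommender (start) to recommended user (end);
        # the edge weight is the number of recommendations.
        adj[index[dst]][index[src]] += 1
    return users, adj


# Hypothetical log extract.
log_pairs = [
    ("alice", "bob"),
    ("alice", "bob"),
    ("carol", "bob"),
    ("bob", "alice"),
]
users, adj = build_adjacency(log_pairs)
```

Repeated recommendations between the same pair of users accumulate in a single cell, which is how the number of recommendations becomes the edge weight.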
In this technical solution, an improvement is made to the problem that only out-degree $C(T_i)$ is considered, so that each @-mention (AT) of a user is given the same weight.
The classical model proposed by Google considers only the influence of the out-degree of out-links, that is, $C(T_i)$, and gives the same weight to the out-degree of every user recommended by user A. Considering the influence of out-degree, the classical model can be improved to obtain the following in-link/out-link model:
$$W(j, i) = W_{in}(j, i) \cdot W_{out}(j, i)$$
where $W_{in}(j, i)$ and $W_{out}(j, i)$ are defined as follows:
where $N_i$ is the set of all users that user i recommends (the out-link user set), and $B_i$ is the set of all users who point to user i (the in-link user set). The model can be read as follows: for a relatively well-known user i, with user j belonging to the in-link user set $B_i$ of user i, the weight $W(j, i)$ of link(j, i) should depend on all users that user j recommends and on all users that have a link relationship with user i, that is,
$$W(j, i) = W_{in}(j, i) \cdot W_{out}(j, i)$$
where $W_{in}(j, i)$ is the weight associated with the recommendation links related to link(j, i). It is determined by the recommendation-link value $I_i$ of user i and the recommendation-link values $I_k$ of the users k, where k belongs to the set of all users recommended by user j, that is,
The in-link/out-link model proposed in this application therefore solves the problem that every recommendation in the directed graph is given the same weight. The improved algorithm assigns a different weight to each recommendation, computed from the recommendation behavior of the users associated with that recommendation. In this way, users who are recommended often and are of high quality are combined with the PageRank algorithm, which improves its effectiveness.
According to one embodiment of the present invention, the matrix conversion unit 304 is specifically configured to convert the weighted directed graph into a hyperlink matrix, where the conversion formula is:
where $H(i, j)$ is the hyperlink matrix, i is any user, colSum(i) is the total number of times that user recommends other users in the adjacency matrix, and n is the total number of people involved in recommendation behavior.
In this technical solution, the adjacency matrix is converted into a hyperlink matrix so that PageRank model training can then be carried out with the initial parameters and the converted hyperlink matrix.
According to one embodiment of the present invention, the system further includes a judging unit 312, which, after the training unit 308 completes training, judges whether the number of iterations exceeds the predetermined iteration-count threshold and judges whether the PageRank vector differs from the PageRank vector of the previous iteration by more than the predetermined iteration precision; when both judgment results are yes, the iterative operation continues with the preset PageRank model trainer; otherwise, the PageRank vector after iteration is output in descending order.
In this technical solution, the classical PageRank algorithm is improved with respect to the even distribution of authority values and the fact that only out-links are considered. Iterating with the preset PageRank model trainer makes iteration faster in practical applications and better reflects the fact that different users have different levels of authority, so that actual enterprise recommendation and search achieve higher quality, which improves the working efficiency and user experience of the system.
According to one embodiment of the present invention, the initial parameters include an iteration vector, a random damping factor, the predetermined iteration precision, and the predetermined iteration-count threshold.
The technical solution of the present invention has been described in detail above with reference to the accompanying drawings. With this technical solution, the classical PageRank algorithm is improved with respect to the even distribution of authority values and the fact that only out-links are considered. Iterating with the preset PageRank model trainer makes iteration faster in practical applications and better reflects the fact that different users have different levels of authority, so that actual enterprise recommendation and search achieve higher quality, which improves the working efficiency and user experience of the system.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the invention. For those skilled in the art, the invention may be modified and varied in various ways. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (8)

1. An information recommendation method, characterized by comprising:
generating an adjacency matrix according to behavior logs in a behavior log database of a server;
converting the adjacency matrix into a hyperlink matrix;
selecting, according to the hyperlink matrix, initial parameters for a preset PageRank model trainer of the server;
calculating a PageRank vector by the preset PageRank model trainer according to the initial parameters, and recording the number of iterations;
outputting the PageRank vector after iteration in descending order; wherein
the calculation formula of the preset PageRank model trainer is:
where $PR(A)$ is the PageRank vector of a recommended user A, n is the total number of users who recommend the user A, N is the total number of people involved in recommendation behavior, $T_i$ is any user who recommends the user A, $C(T_i)$ is the total number of times the user $T_i$ recommends other users, $PR(T_i)$ is the PageRank vector of the user $T_i$, and $i = 1, 2, \ldots, n$;
the generating an adjacency matrix according to behavior logs in a behavior log database of a server specifically comprises:
extracting all recommendation information from the behavior logs in the behavior log database, and establishing a weighted directed graph by taking each user of the server as a node, establishing an edge with the user who makes a recommendation as the start point and the recommended user as the end point, and taking the number of recommendations as the weight of the edge;
storing the weighted directed graph in the adjacency matrix.
2. The information recommendation method according to claim 1, characterized in that the converting the adjacency matrix into a hyperlink matrix specifically comprises:
converting the weighted directed graph into the hyperlink matrix, where the conversion formula is:
where $H(i, j)$ is the hyperlink matrix, i is any user, colSum(i) is the total number of times that user recommends other users in the adjacency matrix, and n is the total number of people involved in the recommendation behavior.
3. The information recommendation method according to claim 1 or 2, characterized in that, after the calculating a PageRank vector by the preset PageRank model trainer and recording the number of iterations, the method comprises:
judging whether the number of iterations exceeds a predetermined iteration-count threshold, and judging whether the PageRank vector differs from the original PageRank vector by more than a predetermined iteration precision;
when both judgment results are yes, continuing the iterative operation by the preset PageRank model trainer; otherwise, outputting the PageRank vector after iteration in descending order.
4. The information recommendation method according to claim 3, characterized in that the initial parameters include an iteration vector, a random damping factor, the predetermined iteration precision, and the predetermined iteration-count threshold.
5. An information recommendation system, characterized by comprising:
an information preprocessing unit, which generates an adjacency matrix according to behavior logs in a behavior log database of a server;
a matrix conversion unit, which converts the adjacency matrix into a hyperlink matrix;
a parameter selection unit, which selects, according to the hyperlink matrix, initial parameters for a preset PageRank model trainer of the server;
a training unit, which calculates a PageRank vector by the preset PageRank model trainer according to the initial parameters and records the number of iterations;
a recommendation unit, which outputs the PageRank vector after iteration in descending order; wherein
the calculation formula of the preset PageRank model trainer is:
where $PR(A)$ is the PageRank vector of a recommended user A, n is the total number of users who recommend the user A, N is the total number of people involved in recommendation behavior, $T_i$ is any user who recommends the user A, $C(T_i)$ is the total number of times the user $T_i$ recommends other users, $PR(T_i)$ is the PageRank vector of the user $T_i$, and $i = 1, 2, \ldots, n$;
the information preprocessing unit comprises:
a weighted directed graph establishing unit, which extracts all recommendation information from the behavior logs in the behavior log database and establishes a weighted directed graph by taking each user of the server as a node, establishing an edge with the user who makes a recommendation as the start point and the recommended user as the end point, and taking the number of recommendations as the weight of the edge;
a storage unit, which stores the weighted directed graph in the adjacency matrix.
6. The information recommendation system according to claim 5, characterized in that the matrix conversion unit is specifically configured to:
convert the weighted directed graph into the hyperlink matrix, where the conversion formula is:
where $H(i, j)$ is the hyperlink matrix, i is any user, colSum(i) is the total number of times that user recommends other users in the adjacency matrix, and n is the total number of people involved in the recommendation behavior.
7. The information recommendation system according to claim 5 or 6, characterized by further comprising:
a judging unit, which, after the training unit completes training, judges whether the number of iterations exceeds a predetermined iteration-count threshold and judges whether the PageRank vector differs from the PageRank vector of the previous iteration by more than a predetermined iteration precision; and, when both judgment results are yes, continues the iterative operation by the preset PageRank model trainer; otherwise, outputs the PageRank vector after iteration in descending order.
8. The information recommendation system according to claim 7, characterized in that the initial parameters include an iteration vector, a random damping factor, the predetermined iteration precision, and the predetermined iteration-count threshold.
CN201410746660.4A 2014-12-08 2014-12-08 Information recommendation method and information recommendation system Active CN104391982B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410746660.4A CN104391982B (en) 2014-12-08 2014-12-08 Information recommendation method and information recommendation system

Publications (2)

Publication Number Publication Date
CN104391982A CN104391982A (en) 2015-03-04
CN104391982B true CN104391982B (en) 2018-07-20

Family

ID=52609886

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410746660.4A Active CN104391982B (en) 2014-12-08 2014-12-08 Information recommendation method and information recommendation system

Country Status (1)

Country Link
CN (1) CN104391982B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106991592B (en) * 2017-03-22 2021-01-01 南京财经大学 Personalized recommendation method based on purchasing user behavior analysis
CN108536590A (en) * 2018-02-09 2018-09-14 武汉楚鼎信息技术有限公司 A kind of method and system device of system service significance level grading
US10460359B1 (en) * 2019-03-28 2019-10-29 Coupang, Corp. Computer-implemented method for arranging hyperlinks on a graphical user-interface

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102270246A (en) * 2011-09-08 2011-12-07 胡辉 Method for calculating importance of web page
CN102799671A (en) * 2012-07-17 2012-11-28 西安电子科技大学 Network individual recommendation method based on PageRank algorithm

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040193698A1 (en) * 2003-03-24 2004-09-30 Sadasivuni Lakshminarayana Method for finding convergence of ranking of web page

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
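An Improved PageRank Algorithm: Adjusting the Damping Factor; Shao Jingjing et al.; Applied Mathematics (Supplement); 2008-12-31; main text pp. 58-59 *
A Personalized Recommendation System Based on Massive Data Mining; Guo Ye et al.; Journal of Northwest University; 2006-12-31; Vol. 36, No. 6; main text pp. 899-901 *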

Also Published As

Publication number Publication date
CN104391982A (en) 2015-03-04

Similar Documents

Publication Publication Date Title
CA2805391C (en) Determining relevant information for domains of interest
CN104199965B (en) Semantic information retrieval method
US7895195B2 (en) Method and apparatus for constructing a link structure between documents
CN103186574B (en) A kind of generation method and apparatus of Search Results
CN105183781B (en) Information recommendation method and device
US20130066887A1 (en) Determining relevant information for domains of interest
CN106776881A (en) A kind of realm information commending system and method based on microblog
CN105809473B (en) Training method for matching model parameters, service recommendation method and corresponding device
CN109189990B (en) Search word generation method and device and electronic equipment
CN105975459B (en) A kind of the weight mask method and device of lexical item
CN103049470A (en) Opinion retrieval method based on emotional relevancy
CN112559895B (en) Data processing method and device, electronic equipment and storage medium
CN105389329A (en) Open source software recommendation method based on group comments
Bouadjenek et al. Using social annotations to enhance document representation for personalized search
CN111639247A (en) Method, apparatus, device and computer-readable storage medium for evaluating quality of review
CN104391982B (en) Information recommendation method and information recommendation system
CN106407316B (en) Software question and answer recommendation method and device based on topic model
CN105468649A (en) Method and apparatus for determining matching of to-be-displayed object
CN110110218B (en) Identity association method and terminal
Zhang et al. An ensemble method for job recommender systems
CN108153735B (en) Method and system for acquiring similar meaning words
Sajeev et al. Effective web personalization system based on time and semantic relatedness
CN103914490B (en) Webpage operation method and system
CN105069034A (en) Recommendation information generation method and apparatus
CN107766419A (en) A kind of TextRank file summarization methods and device based on threshold denoising

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant