CN104391982B

CN104391982B - Information recommendation method and information recommendation system

Info

Publication number: CN104391982B
Application number: CN201410746660.4A
Authority: CN
Inventors: 黄通文; 张俊林
Original assignee: CHANJET INFORMATION TECHNOLOGY Co Ltd
Current assignee: CHANJET INFORMATION TECHNOLOGY Co Ltd
Priority date: 2014-12-08
Filing date: 2014-12-08
Publication date: 2018-07-20
Anticipated expiration: 2034-12-08
Also published as: CN104391982A

Abstract

The present invention provides an information recommendation method and an information recommendation system. The information recommendation method includes: generating an adjacency matrix according to the behavior logs in a behavior log database of a server; converting the adjacency matrix into a hyperlink matrix; selecting initial parameters for a preset PageRank model trainer of the server according to the hyperlink matrix; calculating a PageRank vector with the preset PageRank model trainer according to the initial parameters, and recording the number of iterations; and outputting the iterated PageRank vector in descending order, wherein the preset PageRank model trainer applies a specified calculation formula. The technical solution of the invention improves on two problems of the classical PageRank algorithm, namely the even distribution of authority values and the consideration of out-links only, so that iteration is faster in practical applications and the different levels of authority of different users are better taken into account, giving higher search and recommendation quality in real enterprise recommendation and search.

Description

Information recommendation method and information recommendation system

Technical field

The present invention relates to the technical field of data processing, and in particular to an information recommendation method and an information recommendation system.

Background art

At present, the behavior logs of users' work circles contain a large amount of behavior information, including interaction information between users and interaction information between users and circles. Most of this behavior information, however, remains in an unmined raw state, and it is desirable to mine relevant data from it to improve search and recommendation quality. In the prior art, search and recommendation mainly rely on a combined ranking of user behavior, query-string word segmentation, and index matching. However, recommendation and search in the prior art have the following two disadvantages:

First, for users with no behavior information, recommendation still relies mainly on index matching. This approach does not take group behavior information into account, and cannot recommend users who are active, well known, and comparatively authoritative.

Second, although enterprise data is relatively valid and has little redundancy, when the volume of data for search and recommendation is large, users can cheat in certain fields by adding redundant information such as keywords, which then enters the index and deceives the search system.

Therefore, a new technical solution is needed that can improve the quality of user recommendation.

Summary of the invention

In view of the above problems, the present invention proposes a new technical solution that can improve the quality of user recommendation.

In view of this, an embodiment of the first aspect of the present invention proposes an information recommendation method, including: generating an adjacency matrix according to the behavior logs in a behavior log database of a server; converting the adjacency matrix into a hyperlink matrix; selecting initial parameters for a preset PageRank model trainer of the server according to the hyperlink matrix; calculating a PageRank vector with the preset PageRank model trainer according to the initial parameters, and recording the number of iterations; and outputting the iterated PageRank vector in descending order; wherein the calculation formula of the preset PageRank model trainer is:

where PR(A) is the PageRank vector of the recommended user A, n is the total number of users who recommend user A, N is the total number of persons involved in recommendation behavior, T_i is any user who recommends user A, C(T_i) is the total number of times that user T_i recommends other users, PR(T_i) is the PageRank vector of user T_i, and i = 1, 2, ..., n.

In the prior art, Google stated in a published paper that its classical PageRank model takes the following form:

$$PR(A) = \frac{1-\alpha}{N} + \alpha \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$

where PR(A) is the PageRank vector of web page A, N is the total number of web pages, web page T_i is the i-th source page (in-link page) pointing to web page A, C(T_i) is the out-degree of web page T_i (its total number of out-links), and i = 1, 2, ..., n. The model describes a user on some page who either jumps to a page chosen at random, each page being reached with probability (1 - α)/N, or follows a link on the current page with probability α.
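As a minimal sketch of the classical model described above, in its commonly published form PR(A) = (1 - α)/N + α Σ PR(T_i)/C(T_i), one synchronous update over a small graph may look as follows. The function name `classic_pagerank_step`, the dictionary layout, the three-page graph, and the default α = 0.85 are illustrative assumptions and are not taken from this application.

```python
def classic_pagerank_step(pr, in_links, out_degree, alpha=0.85):
    """One synchronous update of PR(A) = (1 - alpha)/N + alpha * sum(PR(T_i) / C(T_i))."""
    n_pages = len(pr)
    new_pr = {}
    for page in pr:
        # Each source page T_i passes on its score divided by its out-degree C(T_i).
        link_share = sum(pr[src] / out_degree[src] for src in in_links.get(page, []))
        new_pr[page] = (1.0 - alpha) / n_pages + alpha * link_share
    return new_pr


# Hypothetical three-page graph: A -> B, A -> C, B -> C, C -> A
in_links = {"A": ["C"], "B": ["A"], "C": ["A", "B"]}
out_degree = {"A": 2, "B": 1, "C": 1}
pr = {page: 1.0 / 3 for page in out_degree}   # uniform starting scores
pr = classic_pagerank_step(pr, in_links, out_degree)
```

Because every page in this example has at least one out-link, the scores still sum to 1 after the update.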

In the technical solution of the present invention, improvements are made to two problems of the above prior art: the even distribution of the random damping factor α, and the fact that considering only the out-degree C(T_i) gives equal weight to each user recommended by user A.

The improvement to the authority value α is as follows:

Regarding the even distribution of authority values: the random damping factor (damping coefficient) differs from one web page to another. For example, users who are active, well known, and comparatively authoritative are more likely to be recommended than users who recommend less and have a lower reputation. The classical PageRank model can therefore be improved to:

where PR(A) is the PageRank vector of the recommended user A, n is the total number of users who recommend user A, N is the total number of persons involved in recommendation behavior, user T_i is a user who recommends user A, and C(T_i) is the total number of times that user T_i recommends other users.

In this way, the random damping factor changes from a fixed value to a value that varies with users of different levels. However, this improved model also introduces a problem: it no longer conforms to the random surfer model on which Google based the algorithm, so the size of the random damping factor can no longer be controlled manually. The model can therefore be further improved to:

This improvement adds back the random damping factor and the random surfer factor proposed with PageRank, while also solving the problem of assigning different authority values to different users.

Therefore, in this technical solution, the data is preprocessed into an adjacency matrix, the adjacency matrix is converted into a hyperlink matrix, initial parameters are selected, and a PageRank model trainer is established. This improves on the even distribution of authority values and the consideration of out-links only in the classical PageRank algorithm. Iterating with the preset PageRank model trainer makes iteration faster in practical applications and better takes into account that different users have different levels of authority, giving higher search and recommendation quality in real enterprise recommendation and search and improving the working efficiency of the system and the user experience.

According to an embodiment of the present invention, generating the adjacency matrix according to the behavior logs in the behavior log database of the server specifically includes: extracting all recommendation information from the behavior logs in the behavior log database; establishing a weighted directed graph in which each user of the server is a node, each edge runs from the user who makes a recommendation to the user who is recommended, and the weight of the edge is the number of recommendations; and storing the weighted directed graph in the adjacency matrix.
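The construction just described may be sketched as follows, assuming each recommendation in the log has already been reduced to a (recommender, recommended) pair. The function name `build_adjacency`, the record layout, and the sample users are hypothetical; the row/column orientation (row = recommender, column = recommended user) is likewise an assumption.

```python
from collections import Counter


def build_adjacency(records):
    """records: iterable of (recommender, recommended) pairs extracted from the log."""
    counts = Counter(records)                       # edge weight = number of recommendations
    users = sorted({u for pair in counts for u in pair})
    index = {user: k for k, user in enumerate(users)}
    matrix = [[0] * len(users) for _ in users]
    for (src, dst), times in counts.items():
        matrix[index[src]][index[dst]] = times      # row = recommender, column = recommended
    return users, matrix


# Hypothetical log: ann recommends bob twice, bob recommends cat, cat recommends ann
log = [("ann", "bob"), ("ann", "bob"), ("bob", "cat"), ("cat", "ann")]
users, adjacency = build_adjacency(log)
```

Repeated recommendations between the same pair of users accumulate into a single weighted edge, which matches the use of the recommendation count as the edge weight.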

In this technical solution, the problem that considering only the out-degree C(T_i) gives equal weight to each user recommended by user A is improved.

The classical model proposed by Google considers only the influence of the out-degree of out-links, that is, C(T_i), and gives the same weight to the out-degree of each user recommended by user A. Taking the influence of the out-degree into account, the classical model is improved to obtain the following in-link/out-link model:

$$W(j, i) = W_{in}(j, i) \cdot W_{out}(j, i)$$

where W_in(j, i) and W_out(j, i) are defined as follows:

where N_i is the set of all users that user i recommends (the out-link user set), and B_i is the set of all users pointing to user i (the in-link user set). The model is interpreted as follows: for a comparatively well-known user i, if user j belongs to the in-link user set B_i of user i, then the weight W(j, i) of the link link(j, i) should be related to all users recommended by user j and to all users that have a link relationship with user i, that is:

$$W(j, i) = W_{in}(j, i) \cdot W_{out}(j, i)$$

where W_in(j, i) is the weight of link(j, i) associated with the recommendations of the users involved; it is determined by the recommendation-link value I_i of user i and the recommendation-link values I_k of the users k, where k belongs to the set of all users recommended by user j, that is:

The in-link/out-link model proposed in this application therefore solves the problem that every recommendation in the directed graph is given the same weight. The improved algorithm can assign a different weight to each recommendation, computed from the recommendation behavior of the users associated with that recommendation. In this way, users who are frequently recommended and of high quality are combined with the PageRank algorithm, which improves the effectiveness of the PageRank algorithm.
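A minimal sketch of how such link weights may be computed is given here under a loudly stated assumption: the in-link and out-link factors are taken in the form used by the published Weighted PageRank algorithm (Xing and Ghorbani), which may differ from the definitions intended by this application. Under that assumption, W_in(j, i) is the number of recommendations user i receives divided by the sum of that number over all users recommended by j, and W_out(j, i) is the analogous ratio for recommendations made. The function name `link_weights` and the sample graph are hypothetical.

```python
def link_weights(recommends):
    """recommends: dict mapping each user to the set of users that user recommends."""
    users = set(recommends) | {u for targets in recommends.values() for u in targets}
    out_count = {u: len(recommends.get(u, ())) for u in users}   # recommendations made
    in_count = {u: 0 for u in users}                             # recommendations received
    for targets in recommends.values():
        for u in targets:
            in_count[u] += 1
    weights = {}
    for j, targets in recommends.items():
        # Totals run over every user k that j recommends (assumed normalization set).
        in_total = sum(in_count[k] for k in targets)
        out_total = sum(out_count[k] for k in targets)
        for i in targets:
            w_in = in_count[i] / in_total if in_total else 0.0
            w_out = out_count[i] / out_total if out_total else 0.0
            weights[(j, i)] = w_in * w_out                       # W(j, i) = W_in * W_out
    return weights


# Hypothetical graph: ann recommends bob and cat, bob recommends cat, cat recommends ann
graph = {"ann": {"bob", "cat"}, "bob": {"cat"}, "cat": {"ann"}}
w = link_weights(graph)
```

In this example ann's recommendation of cat receives a larger weight than ann's recommendation of bob, because cat is recommended by more users.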

According to an embodiment of the present invention, converting the adjacency matrix into a hyperlink matrix specifically includes: converting the weighted directed graph into the hyperlink matrix, where the conversion formula is:

where H(i, j) is the hyperlink matrix, i is any user, colSum(i) is the total number of times that user i in the adjacency matrix recommends other users, and n is the total number of persons involved in recommendation behavior.
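One plausible reading of this conversion, offered only as a sketch, divides each adjacency entry by the total number of recommendations made by that user (the quantity the description calls colSum(i)) and gives a uniform value 1/n to users who recommend nobody. Both the uniform fallback and the orientation (row = recommender) are assumptions, and the function name `to_hyperlink_matrix` is hypothetical.

```python
def to_hyperlink_matrix(adjacency):
    """Normalize each user's recommendation counts so that they sum to 1."""
    n = len(adjacency)
    hyperlink = []
    for row in adjacency:
        total = sum(row)                      # total recommendations made by this user
        if total:
            hyperlink.append([weight / total for weight in row])
        else:
            # Assumed handling of a user who recommends nobody: spread evenly over n users.
            hyperlink.append([1.0 / n] * n)
    return hyperlink


# Hypothetical adjacency matrix for three users; the third recommends nobody
H = to_hyperlink_matrix([[0, 2, 0], [0, 0, 1], [0, 0, 0]])
```

With this normalization every row of the resulting matrix sums to 1, which keeps the total score constant during iteration.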

In this technical solution, the adjacency matrix is converted into a hyperlink matrix so that PageRank model training can then be carried out according to the initial parameters and the converted hyperlink matrix.

According to an embodiment of the present invention, after calculating the PageRank vector with the preset PageRank model trainer and recording the number of iterations, the method includes: judging whether the number of iterations exceeds a predetermined iteration count threshold, and judging whether the difference between the PageRank vector and the previous PageRank vector exceeds a predetermined iteration precision; when both judgments are yes, continuing the iterative operation with the preset PageRank model trainer; otherwise, outputting the iterated PageRank vector in descending order.
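The stopping test may be sketched as follows, reading it as the usual rule for iterative ranking: iteration stops once the change between successive vectors is within the iteration precision or once the iteration count reaches the threshold. That reading, the L1 measure of change, and the names `iterate_until_stable`, `step`, `eps`, and `max_iter` are assumptions made for illustration.

```python
def iterate_until_stable(step, v0, eps=1e-8, max_iter=100):
    """Apply `step` repeatedly; stop when the L1 change is within eps or max_iter is reached."""
    vector, count = list(v0), 0
    while count < max_iter:
        new_vector = step(vector)
        count += 1                                           # record the number of iterations
        change = sum(abs(a - b) for a, b in zip(new_vector, vector))
        vector = new_vector
        if change <= eps:                                    # within the iteration precision
            break
    return vector, count


# Toy update that halves the distance to [0.5, 0.5] on each pass (stand-in for the trainer)
halve = lambda v: [(x + 0.5) / 2 for x in v]
result, iterations = iterate_until_stable(halve, [1.0, 0.0], eps=1e-6)
```

Keeping the update rule as a parameter lets the same loop drive either the classical update or an improved one.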

In this technical solution, the even distribution of authority values and the consideration of out-links only in the classical PageRank algorithm are improved. Iterating with the preset PageRank model trainer makes iteration faster in practical applications and better takes into account that different users have different levels of authority, giving higher search and recommendation quality in real enterprise recommendation and search and improving the working efficiency of the system and the user experience.

According to an embodiment of the present invention, the initial parameters include an iteration vector, a random damping factor, the predetermined iteration precision, and the predetermined iteration count threshold.
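The four initial parameters may be grouped as in the following sketch. The field names follow the parameter names this description uses for the Fig. 2 flow (alpha, eps, thresHold, V0); the class name `TrainerParams`, the helper `default_params`, the default values, and the uniform starting vector are assumptions and are not specified by this application.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TrainerParams:
    alpha: float = 0.85          # random damping factor (0.85 is a common default, assumed here)
    eps: float = 1e-8            # predetermined iteration precision (assumed value)
    thresHold: int = 100         # predetermined iteration count threshold (assumed value)
    V0: List[float] = field(default_factory=list)   # initial iteration vector


def default_params(n_users):
    """Start from the uniform vector, a common choice for PageRank-style iteration."""
    return TrainerParams(V0=[1.0 / n_users] * n_users)


params = default_params(4)
```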

An embodiment of the second aspect of the present invention proposes an information recommendation system, including: an information preprocessing unit, which generates an adjacency matrix according to the behavior logs in a behavior log database of a server; a matrix conversion unit, which converts the adjacency matrix into a hyperlink matrix; a parameter selection unit, which selects initial parameters for a preset PageRank model trainer of the server according to the hyperlink matrix; a training unit, which calculates a PageRank vector with the preset PageRank model trainer according to the initial parameters and records the number of iterations; and a recommendation unit, which outputs the iterated PageRank vector in descending order; wherein

the calculation formula of the preset PageRank model trainer is:

The improvement to the authority value α is as follows:

According to an embodiment of the present invention, the information preprocessing unit includes: a weighted directed graph establishing unit, which extracts all recommendation information from the behavior logs in the behavior log database and establishes a weighted directed graph in which each user of the server is a node, each edge runs from the user who makes a recommendation to the user who is recommended, and the weight of the edge is the number of recommendations; and a storage unit, which stores the weighted directed graph in the adjacency matrix.

$$W(j, i) = W_{in}(j, i) \cdot W_{out}(j, i)$$

where W_in(j, i) and W_out(j, i) are defined as follows:

$$W(j, i) = W_{in}(j, i) \cdot W_{out}(j, i)$$

According to an embodiment of the present invention, the matrix conversion unit is specifically configured to convert the weighted directed graph into the hyperlink matrix, where the conversion formula is:

According to an embodiment of the present invention, the system further includes: a judging unit, which, after the training unit completes training, judges whether the number of iterations exceeds a predetermined iteration count threshold and judges whether the difference between the PageRank vector and the PageRank vector of the previous iteration exceeds a predetermined iteration precision; and which, when both judgments are yes, continues the iterative operation with the preset PageRank model trainer, and otherwise outputs the iterated PageRank vector in descending order.

The technical solution of the present invention improves on the even distribution of authority values and the consideration of out-links only in the classical PageRank algorithm. Iterating with the preset PageRank model trainer makes iteration faster in practical applications and better takes into account that different users have different levels of authority, giving higher search and recommendation quality in real enterprise recommendation and search and improving the working efficiency of the system and the user experience.

Description of the drawings

Fig. 1 shows a flow chart of an information recommendation method according to an embodiment of the present invention;

Fig. 2 shows a flow chart of an information recommendation method according to another embodiment of the present invention;

Fig. 3 shows a block diagram of an information recommendation system according to an embodiment of the present invention.

Detailed description

To make the above objects, features and advantages of the present invention easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another.

Many specific details are set forth in the following description to provide a thorough understanding of the present invention. However, the present invention may also be implemented in ways other than those described here, and the protection scope of the present invention is therefore not limited by the specific embodiments described below.

Fig. 1 shows a flow chart of an information recommendation method according to an embodiment of the present invention.

As shown in Fig. 1, the information recommendation method according to an embodiment of the present invention includes:

Step 102, generate an adjacency matrix according to the behavior logs in the behavior log database of the server.

Step 104, convert the adjacency matrix into a hyperlink matrix.

Step 106, select initial parameters for the preset PageRank model trainer of the server according to the hyperlink matrix.

Step 108, calculate a PageRank vector with the preset PageRank model trainer according to the initial parameters, and record the number of iterations.

Step 110, output the iterated PageRank vector in descending order; wherein the calculation formula of the preset PageRank model trainer is:

where PR(A) is the PageRank vector of the recommended user A, n is the total number of users who recommend user A, N is the total number of persons involved in recommendation behavior, T_i is any user who recommends user A, C(T_i) is the total number of times that user T_i recommends other users, PR(T_i) is the PageRank vector of user T_i, and i = 1, 2, ..., n.

In the classical PageRank model, PR(A) is the PageRank vector of web page A, N is the total number of web pages, web page T_i is the i-th source page (in-link page) pointing to web page A, C(T_i) is the out-degree of web page T_i (its total number of out-links), and i = 1, 2, ..., n. The model describes a user on some page who either jumps to a page chosen at random, each page being reached with probability (1 - α)/N, or follows a link on the current page with probability α.

The improvement to the authority value α is as follows:

where PR(A) is the PageRank vector of the recommended user A, n is the total number of users who recommend user A, N is the total number of persons involved in recommendation behavior, user T_i is a user who recommends user A, and C(T_i) is the total number of times that user T_i recommends other users.

According to an embodiment of the present invention, step 102 specifically includes: extracting all recommendation information from the behavior logs in the behavior log database; establishing a weighted directed graph in which each user of the server is a node, each edge runs from the user who makes a recommendation to the user who is recommended, and the weight of the edge is the number of recommendations; and storing the weighted directed graph in the adjacency matrix.

$$W(j, i) = W_{in}(j, i) \cdot W_{out}(j, i)$$

where W_in(j, i) and W_out(j, i) are defined as follows:

$$W(j, i) = W_{in}(j, i) \cdot W_{out}(j, i)$$

According to an embodiment of the present invention, step 104 specifically includes: converting the weighted directed graph into the hyperlink matrix, where the conversion formula is:

where H(i, j) is the hyperlink matrix, i is any user, colSum(i) is the total number of times that user i in the adjacency matrix recommends other users, and n is the total number of persons involved in recommendation behavior.

According to an embodiment of the present invention, after step 108, the method includes: judging whether the number of iterations exceeds the predetermined iteration count threshold, and judging whether the difference between the PageRank vector and the previous PageRank vector exceeds the predetermined iteration precision; when both judgments are yes, continuing the iterative operation with the preset PageRank model trainer; otherwise, outputting the iterated PageRank vector in descending order.

According to an embodiment of the present invention, the initial parameters include an iteration vector, a random damping factor, the predetermined iteration precision, and the predetermined iteration count threshold.

Fig. 2 shows a flow chart of an information recommendation method according to another embodiment of the present invention.

As shown in Fig. 2, the information recommendation method according to another embodiment of the present invention includes the following steps:

Step 202, obtain the behavior logs from the behavior log database.

Step 204, preprocess the data of the behavior logs, then assemble the processed data into a link matrix/link table.

Step 206, process the resulting link matrix/link table into a hyperlink matrix.

Step 208, select initial parameters for the preset PageRank model trainer of the server; the parameters include: damping coefficient alpha, iteration precision eps, iteration threshold thresHold, and initial vector V0.

Step 210, input the hyperlink matrix and the initial parameters into the preset PageRank model trainer.

Step 212, calculate the PageRank vector V with the preset PageRank model trainer, and record the number of iterations count.

Step 214, judge whether the PageRank vector value is within the iteration precision range or whether the number of iterations exceeds the iteration threshold, with count = count + 1 and V0 = V updated on each pass; when the judgment result is yes, go to step 216; otherwise, return to step 210.

Step 216, output the iterated PageRank vector value V in descending order.
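Steps 206 to 216 may be sketched end to end as follows. The update inside the loop uses the classical PageRank form only as a stand-in and is not the improved formula of the invention; the uniform handling of users who recommend nobody, the L1 precision test, the default parameter values, the function name `rank_users`, and the sample data are all assumptions. The parameter names alpha, eps, thresHold, V0, V, and count follow the steps above.

```python
def rank_users(adjacency, users, alpha=0.85, eps=1e-8, thresHold=100):
    n = len(users)
    # Step 206: normalize each user's recommendation counts (uniform row if none, assumed).
    H = []
    for row in adjacency:
        total = sum(row)
        H.append([w / total for w in row] if total else [1.0 / n] * n)
    V0, count = [1.0 / n] * n, 0                 # step 208: initial vector (uniform, assumed)
    while True:
        # Step 212 (classical stand-in): V[j] = (1 - alpha)/n + alpha * sum_i V0[i] * H[i][j]
        V = [(1 - alpha) / n + alpha * sum(V0[i] * H[i][j] for i in range(n))
             for j in range(n)]
        count += 1
        # Step 214: stop once within the precision or at the iteration threshold.
        if sum(abs(a - b) for a, b in zip(V, V0)) <= eps or count >= thresHold:
            break
        V0 = V
    # Step 216: output from high to low.
    return sorted(zip(users, V), key=lambda pair: pair[1], reverse=True), count


# Hypothetical data: ann recommends bob twice and cat once, bob recommends cat, cat recommends ann
users = ["ann", "bob", "cat"]
adjacency = [[0, 2, 1], [0, 0, 1], [1, 0, 0]]
ranking, count = rank_users(adjacency, users)
```

Swapping the line marked as step 212 for a different update rule would leave the surrounding flow unchanged.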

The information recommendation method of an embodiment of the present invention is described below with reference to a concrete application scenario.

Work-circle behavior logs from July 10, 2014 to August 21, 2014 were extracted, and the two improvement schemes were compared with the classical PageRank model. As shown in Table 1, under the same conditions the improved models converge to the specified iteration precision faster and can more accurately select well-known, comparatively authoritative users for work-circle recommendation or search. Because users' recommendation behavior in the work circle involves privacy, only the parameters selected in the experimental environment and the final iteration precision and number of iterations are listed here.

Table 1

As shown in Fig. 3, the information recommendation system 300 according to an embodiment of the present invention includes: an information preprocessing unit 302, which generates an adjacency matrix according to the behavior logs in the behavior log database of the server; a matrix conversion unit 304, which converts the adjacency matrix into a hyperlink matrix; a parameter selection unit 306, which selects initial parameters for the preset PageRank model trainer of the server according to the hyperlink matrix; a training unit 308, which calculates a PageRank vector with the preset PageRank model trainer according to the initial parameters and records the number of iterations; and a recommendation unit 310, which outputs the iterated PageRank vector in descending order; wherein

the calculation formula of the preset PageRank model trainer is:

The improvement to the authority value α is as follows:

where PR(A) is the PageRank vector of the recommended user A, n is the total number of users who recommend user A, N is the total number of persons involved in recommendation behavior, user T_i is a user who recommends user A, and C(T_i) is the total number of times that user T_i recommends other users.

According to an embodiment of the present invention, the information preprocessing unit 302 includes: a weighted directed graph establishing unit 3022, which extracts all recommendation information from the behavior logs in the behavior log database and establishes a weighted directed graph in which each user of the server is a node, each edge runs from the user who makes a recommendation to the user who is recommended, and the weight of the edge is the number of recommendations; and a storage unit 3024, which stores the weighted directed graph in the adjacency matrix.

$$W(j, i) = W_{in}(j, i) \cdot W_{out}(j, i)$$

where W_in(j, i) and W_out(j, i) are defined as follows:

$$W(j, i) = W_{in}(j, i) \cdot W_{out}(j, i)$$

According to an embodiment of the present invention, the matrix conversion unit 304 is specifically configured to convert the weighted directed graph into the hyperlink matrix, where the conversion formula is:

According to an embodiment of the present invention, the system further includes: a judging unit 312, which, after the training unit 308 completes training, judges whether the number of iterations exceeds the predetermined iteration count threshold and judges whether the difference between the PageRank vector and the PageRank vector of the previous iteration exceeds the predetermined iteration precision; and which, when both judgments are yes, continues the iterative operation with the preset PageRank model trainer, and otherwise outputs the iterated PageRank vector in descending order.

The technical solution of the present invention has been described in detail above with reference to the accompanying drawings. The technical solution improves on the even distribution of authority values and the consideration of out-links only in the classical PageRank algorithm. Iterating with the preset PageRank model trainer makes iteration faster in practical applications and better takes into account that different users have different levels of authority, giving higher search and recommendation quality in real enterprise recommendation and search and improving the working efficiency of the system and the user experience.

The foregoing is only a preferred embodiment of the present invention and is not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims

1. An information recommendation method, characterized by including:

generating an adjacency matrix according to the behavior logs in a behavior log database of a server;

converting the adjacency matrix into a hyperlink matrix;

selecting initial parameters for a preset PageRank model trainer of the server according to the hyperlink matrix;

calculating a PageRank vector with the preset PageRank model trainer according to the initial parameters, and recording the number of iterations;

outputting the iterated PageRank vector in descending order; wherein

the calculation formula of the preset PageRank model trainer is:

where PR(A) is the PageRank vector of the recommended user A, n is the total number of users who recommend the user A, N is the total number of persons involved in recommendation behavior, T_i is any user who recommends the user A, C(T_i) is the total number of times that the user T_i recommends other users, PR(T_i) is the PageRank vector of the user T_i, and i = 1, 2, ..., n;

wherein generating the adjacency matrix according to the behavior logs in the behavior log database of the server specifically includes:

extracting all recommendation information from the behavior logs in the behavior log database, and establishing a weighted directed graph in which each user of the server is a node, each edge runs from the user who makes a recommendation to the user who is recommended, and the weight of the edge is the number of recommendations;

storing the weighted directed graph in the adjacency matrix.

2. The information recommendation method according to claim 1, characterized in that converting the adjacency matrix into a hyperlink matrix specifically includes:

converting the weighted directed graph into the hyperlink matrix, where the conversion formula is:

where H(i, j) is the hyperlink matrix, i is any user, colSum(i) is the total number of times that the user i in the adjacency matrix recommends other users, and n is the total number of persons involved in the recommendation behavior.

3. The information recommendation method according to claim 1 or 2, characterized in that, after calculating the PageRank vector with the preset PageRank model trainer and recording the number of iterations, the method includes:

judging whether the number of iterations exceeds a predetermined iteration count threshold, and judging whether the difference between the PageRank vector and the previous PageRank vector exceeds a predetermined iteration precision;

when both judgments are yes, continuing the iterative operation with the preset PageRank model trainer; otherwise, outputting the iterated PageRank vector in descending order.

4. The information recommendation method according to claim 3, characterized in that the initial parameters include an iteration vector, a random damping factor, the predetermined iteration precision, and the predetermined iteration count threshold.

5. An information recommendation system, characterized by including:

an information preprocessing unit, which generates an adjacency matrix according to the behavior logs in a behavior log database of a server;

a matrix conversion unit, which converts the adjacency matrix into a hyperlink matrix;

a parameter selection unit, which selects initial parameters for a preset PageRank model trainer of the server according to the hyperlink matrix;

a training unit, which calculates a PageRank vector with the preset PageRank model trainer according to the initial parameters and records the number of iterations;

a recommendation unit, which outputs the iterated PageRank vector in descending order; wherein

the calculation formula of the preset PageRank model trainer is:

the information preprocessing unit includes:

a weighted directed graph establishing unit, which extracts all recommendation information from the behavior logs in the behavior log database and establishes a weighted directed graph in which each user of the server is a node, each edge runs from the user who makes a recommendation to the user who is recommended, and the weight of the edge is the number of recommendations;

a storage unit, which stores the weighted directed graph in the adjacency matrix.

6. The information recommendation system according to claim 5, characterized in that the matrix conversion unit is specifically configured to:

7. The information recommendation system according to claim 5 or 6, characterized by further including:

a judging unit, which, after the training unit completes training, judges whether the number of iterations exceeds a predetermined iteration count threshold and judges whether the difference between the PageRank vector and the PageRank vector of the previous iteration exceeds a predetermined iteration precision; and which, when both judgments are yes, continues the iterative operation with the preset PageRank model trainer, and otherwise outputs the iterated PageRank vector in descending order.

8. The information recommendation system according to claim 7, characterized in that the initial parameters include an iteration vector, a random damping factor, the predetermined iteration precision, and the predetermined iteration count threshold.