CN1281027C

CN1281027C - Collaborative filtering recommendation approach for dealing with ultra-mass users

Info

Publication number: CN1281027C
Application number: CN 200310109063
Authority: CN
Inventors: 申瑞民; 谢波; 韩鹏; 杨帆
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2003-12-04
Filing date: 2003-12-04
Publication date: 2006-10-18
Anticipated expiration: 2023-12-04
Also published as: CN1547351A

Abstract

The present invention relates to a collaborative filtering recommendation method for handling a very large number of users, and belongs to the technical field of networks. The method comprises the following steps: first, the users' item rating data is stored in a distributed manner, so that each user stores his or her own item rating data; then the rating data of similar users is obtained through a distributed hash table, and a collaborative filtering method is applied locally to obtain predicted values, from which recommendations are derived. During this process, the number of similar users returned from the distributed hash table overlay network for each item is limited, and the influence of similar users is adjusted so that the more two-tuples with the same item number and rating a user shares, the greater that user's influence; prediction accuracy is thereby improved. The invention introduces the routing algorithm of the distributed hash table into the collaborative filtering system and improves that routing algorithm. This solves the inherently poor scalability of existing centralized collaborative filtering systems and also improves recommendation quality.

Description

Collaborative filtering recommendation method for handling a very large number of users

Technical field

The present invention relates to a collaborative filtering recommendation method, in particular a collaborative filtering recommendation method for handling a very large number of users, and belongs to the field of network technology.

Background art

Information resources on the Internet are expanding exponentially, which has brought about the so-called "information overload" and "information disorientation" problems: people find it hard to locate the information that interests them, and even when they find some, it is often mixed with a great deal of "noise". Internet-oriented technologies such as information retrieval, information filtering and collaborative filtering have therefore emerged. Information retrieval, however, is not intelligent and cannot learn a user's interests; in particular, users with specific professional interests who enter the same keywords can only obtain the same search results. Information filtering cannot distinguish the quality of filtered results on the same topic, and as information resources grow sharply, more effective filtering needs to incorporate people's quality evaluations. Collaborative filtering does take user evaluations into account. It analyses user interests, finds users in the user population who are similar (in interest) to a given user, and combines these similar users' evaluations of a piece of information to form the system's prediction of how much the given user will like that information. However, current user-based collaborative filtering methods search for similar users on the basis of the existing data: every prediction requires computing the similarity between all users, and as the user database and the number of data entries grow, computing the similarity between all users consumes too many resources, so real-time performance and scalability are both poor. The traditional centralized user-based collaborative filtering recommendation algorithm has a computational complexity of O(M*M*N) for M users and N items; when faced with a very large number of users, for example when M is on the order of one million, the centralized user-based collaborative filtering recommendation algorithm cannot work because its computational complexity is too high.

A literature search found that Stoica et al., in the article "Chord: A scalable peer-to-peer lookup service for Internet applications" (ACM SIGCOMM, San Diego, CA, USA, 2001, pp. 149-160), proposed a peer-to-peer lookup algorithm based on a distributed hash table. In a Chord network formed by N peer nodes, each node only needs to keep track of log N other nodes, and when a peer joins or leaves, only log N other peer nodes need to be notified to change their application-layer routing tables. The advantage of this distributed hash table routing algorithm is good scalability, but at present the method is mainly used in the field of distributed file access.

Summary of the invention

The object of the invention is to address the defects and deficiencies of the prior art by providing a collaborative filtering recommendation method for handling a very large number of users, so as to solve the problems that the traditional centralized user-based collaborative filtering recommendation method encounters when faced with a very large number of users.

The invention is achieved through the following technical solution. The invention first stores the rating data of M users for N items in a distributed manner, i.e., each user stores his or her own rating data for the N items. It then obtains the rating data of similar users through a distributed hash table routing method, applies a collaborative filtering method locally to obtain predicted values, and from these obtains recommendations. This reduces the computational complexity for each user to O(M*N) and improves the scalability of the system. In addition, while the similar users' rating data is being obtained, the number of similar users returned from the distributed hash table overlay network for each item is limited, and the influence of similar users is adjusted so that the more identical <item number, rating> two-tuples a user shares, the greater that user's influence. The complexity of the method is thereby further reduced to O(N*N), which makes it independent of the number of users and further improves the scalability of the system considerably; test data show that it also improves prediction accuracy and recommendation quality.

The method of the invention is further described below. The method steps are as follows:

A. Each user runs an agent program in the background on his or her computer, and the agents construct a distributed hash table overlay network;

B. The agent hashes each <ITEM_ID, VOTE> (i.e., <item number, rating>) two-tuple formed by the user's rating of an item into the distributed hash table overlay network;

C. For each <item number, rating> two-tuple of the user, the agent obtains similar users from the distributed hash table overlay network (users who share an identical <item number, rating> two-tuple are considered potentially similar);

D. The agent applies a collaborative filtering method locally to the retrieved rating data of the similar users to obtain predicted values, and from these produces recommendations for the user.

● Step A is specifically as follows:

A1. The user starts the local agent program and lets it run in the background at all times;

A2. The agent hashes the user name to obtain a hash value K_Local (the local hash value), which uniquely identifies the agent;

A3. All agents build distributed hash routing tables from their hash values, thereby interconnecting with the other agents and constructing the distributed hash table overlay network;
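Steps A2 and A3 can be sketched as follows. This is only a minimal illustration under stated assumptions: the text does not name a hash function, a key length, or a routing table layout, so SHA-1, six-digit decimal keys (chosen to resemble the keys used in the worked example of the description), and the helper names `hash_key`, `local_key`, `shared_prefix_len` and `build_routing_table` are all hypothetical.

```python
import hashlib

HASH_DIGITS = 6  # assumption: six-digit keys, as in the description's example


def hash_key(text: str, digits: int = HASH_DIGITS) -> str:
    """Map arbitrary text to a fixed-length decimal key (SHA-1 is an assumption)."""
    value = int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)
    return str(value % 10**digits).zfill(digits)


def local_key(user_name: str) -> str:
    """Step A2: derive K_Local by hashing the user name."""
    return hash_key(user_name)


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that two keys have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def build_routing_table(own_key: str, peer_keys: list) -> dict:
    """Step A3 (simplified): group known peers by shared prefix length."""
    table = {}
    for key in peer_keys:
        if key != own_key:
            table.setdefault(shared_prefix_len(own_key, key), []).append(key)
    return table
```

Grouping peers by shared prefix length is one simple way to make the "most similar neighbour" lookup used in the later steps cheap; a real overlay would also bound the number of entries per prefix length.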

● Step B, carried out after the distributed hash table overlay network has been constructed, is specifically as follows:

B1. The user's agent hashes each of the user's <item number, rating> two-tuples to obtain its hash value K;

B2. For each hash value K, the agent constructs a "PUT message" (i.e., a "push message") carrying K and forwards it to the neighbour most similar to K (here "most similar" means that the n-digit prefix of the neighbour's hash value K_Local is closest to that of K);

B3. When an agent receives a push message routed from another neighbour, it first places the user rating data and the hash value K from the message in its local buffer; then, if it is not itself the neighbour most similar to the hash value K carried by the push message, it forwards the push message to the neighbour most similar to that K.
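The push routing of steps B2 and B3 can be sketched as an in-memory simulation. The `Agent` class, the `push` function and the use of shared prefix length as the similarity measure are illustrative assumptions; the text only says that the most similar neighbour is the one whose prefix is closest to K.

```python
from dataclasses import dataclass, field


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that two hash keys have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


@dataclass
class Agent:
    key: str                                         # K_Local
    neighbours: list = field(default_factory=list)   # other Agent objects
    buffer: dict = field(default_factory=dict)       # K -> {user: ratings}

    def closest_to(self, k: str) -> "Agent":
        """Whichever of this agent and its neighbours is most similar to k.

        Ties go to the agent itself, so forwarding always makes progress.
        """
        return max([self] + self.neighbours,
                   key=lambda a: shared_prefix_len(a.key, k))


def push(origin: Agent, k: str, user: str, ratings: dict) -> list:
    """Route a PUT message for key k; return the keys of agents that buffered it."""
    visited = []
    node = origin
    while True:
        nxt = node.closest_to(k)
        if nxt is node:          # no neighbour is more similar: stop forwarding
            return visited
        nxt.buffer.setdefault(k, {})[user] = ratings   # B3: buffer on receipt
        visited.append(nxt.key)
        node = nxt
```

Because every receiving agent buffers the data before forwarding, copies of a rating end up on all agents along the route, which is what later allows queries to be answered before they reach the final node.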

● Step C is specifically as follows:

C1. The user's agent hashes each of the user's <item number, rating> two-tuples to obtain its hash value K;

C2. For each hash value K, the agent constructs a "LOOKUP message" (i.e., a "query message") carrying K and forwards it to the neighbour most similar to K;

C3. When an agent receives a query message routed from another neighbour, it first checks whether its own local buffer holds the rating data of F users associated with the hash value K (F is a constant, for example 5). If so, these F users and their rating data are returned to the sender of the query message, and every agent along the return route places the rating data of these F users in its local buffer. If not, the agent forwards the query message to the neighbour most similar to the hash value K that the message carries;

C4. In any case, the sender of the query message eventually obtains the rating data of similar users returned by some agent;

C5. The agent collects the similar-user rating data obtained for all of the user's <item number, rating> two-tuples, yielding a local similar-user rating data set.
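The query routing of steps C2 to C4 can be sketched in the same in-memory style. Several points are assumptions rather than statements from the text: the `Agent` class and `lookup` function are hypothetical, a "hit" is taken to be any non-empty buffer entry for K (truncated to at most F users), and an empty result is returned when no agent holds the key.

```python
from dataclasses import dataclass, field

F = 5  # maximum number of similar users returned per key (the text's example value)


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that two hash keys have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


@dataclass
class Agent:
    key: str
    neighbours: list = field(default_factory=list)
    buffer: dict = field(default_factory=dict)   # K -> {user: ratings}

    def closest_to(self, k: str) -> "Agent":
        # Ties go to the agent itself, so routing always terminates.
        return max([self] + self.neighbours,
                   key=lambda a: shared_prefix_len(a.key, k))


def lookup(origin: Agent, k: str, limit: int = F) -> dict:
    """Route a query for k and return at most `limit` buffered users."""
    path = []                     # agents that forwarded the query
    node = origin
    while True:
        nxt = node.closest_to(k)
        if nxt is node:
            return {}             # assumption: nobody holds data for k
        hit = nxt.buffer.get(k, {})
        if hit:
            result = dict(list(hit.items())[:limit])
            for agent in path:    # C3: agents on the route buffer the reply
                agent.buffer.setdefault(k, {}).update(result)
            return result
        path.append(nxt)
        node = nxt
```

Capping the reply at F users keeps the amount of data returned per two-tuple constant, and buffering the reply along the route means a repeated query for the same key can be answered by an earlier agent.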

● Step D, carried out after a similar-user rating data set has been obtained locally, is specifically as follows:

D1. Use the correlation similarity or cosine similarity method to compute the similarity between the local user and each user in the local rating data set;

D2. From the similarities obtained, select the N most similar users (N is a constant, for example 100);

D3. From the similarities of these N users, obtain the predicted value by the mean-deviation method; if the predicted value exceeds a certain threshold, recommend the item to the user.
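Step D can be sketched as follows. The text names correlation similarity and a mean-deviation prediction but gives no formulas, so this sketch assumes the commonly used forms: Pearson correlation over co-rated items, and a prediction equal to the active user's mean rating plus the similarity-weighted average of the neighbours' deviations from their own means. The function names and the threshold value are hypothetical.

```python
import math


def pearson(a: dict, b: dict) -> float:
    """Correlation similarity over the items both users have rated (D1)."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[j] for j in common) / len(common)
    mean_b = sum(b[j] for j in common) / len(common)
    num = sum((a[j] - mean_a) * (b[j] - mean_b) for j in common)
    den = math.sqrt(sum((a[j] - mean_a) ** 2 for j in common)
                    * sum((b[j] - mean_b) ** 2 for j in common))
    return num / den if den else 0.0


def predict(active: dict, others: dict, item, top_n: int = 100) -> float:
    """Mean-deviation prediction of the active user's rating for `item` (D2, D3)."""
    mean_active = sum(active.values()) / len(active)
    scored = [(pearson(active, r), r) for r in others.values() if item in r]
    scored.sort(key=lambda pair: pair[0], reverse=True)   # most similar first
    top = scored[:top_n]
    norm = sum(abs(w) for w, _ in top)
    if norm == 0:
        return mean_active        # no usable neighbours: fall back to the mean
    deviation = sum(w * (r[item] - sum(r.values()) / len(r)) for w, r in top)
    return mean_active + deviation / norm


def should_recommend(active: dict, others: dict, item, threshold: float = 4.0) -> bool:
    """Recommend the item if the prediction exceeds an assumed threshold."""
    return predict(active, others, item) > threshold
```

For example, an active user with ratings `{1: 5, 2: 3, 3: 4}` and a single neighbour with ratings `{1: 5, 2: 3, 3: 4, 4: 5}` has similarity 1.0 with that neighbour, and the predicted rating for item 4 is 4 + (5 - 4.25) = 4.75.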

To obtain still better collaborative filtering recommendation quality, the invention proposes two improvements:

1) Limit the number of similar users that the distributed hash table overlay network returns for each <item number, rating> two-tuple. Experimental test data show that returning at most 5 similar users for each <item number, rating> two-tuple yields fairly good recommendation quality. This also reduces the total number of returned users from the order of O(N) to O(1), where N denotes the total number of users.

2) After the rating data of similar users has been obtained from the distributed hash table overlay network, use formula (1) to multiply up the similarity weights of "highly correlated" users, thereby reducing the relative weight of "weakly correlated" users. Experimental test data show that this improves recommendation quality considerably.

w'_{a,i} = \begin{cases} w_{a,i}, & N_{a,i} = 0 \\ w_{a,i} \cdot \alpha, & 0 < N_{a,i} \leq \gamma \\ w_{a,i} \cdot \beta, & N_{a,i} > \gamma \end{cases} \qquad (1)

Here w_{a,i} and w'_{a,i} denote the similarity weights before and after adjustment respectively, and N_{a,i} denotes the number of items to which user a and user i have given the same rating. The coefficients α, β and γ can be tuned for different experimental environments; in the actual tests α was set to 2.0, β to 4.0 and γ to 4, and the resulting recommendation quality improved by 7.6%.
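Formula (1) translates directly into a small function. The default coefficients are the values reported for the actual tests (α = 2.0, β = 4.0, γ = 4); the function names `count_same_ratings` and `adjust_weight` are illustrative only.

```python
def count_same_ratings(a: dict, b: dict) -> int:
    """N_{a,i}: number of items that both users rated with the same score."""
    return sum(1 for item, vote in a.items() if b.get(item) == vote)


def adjust_weight(w: float, n_same: int,
                  alpha: float = 2.0, beta: float = 4.0, gamma: int = 4) -> float:
    """Apply formula (1) to a similarity weight w given n_same = N_{a,i}."""
    if n_same == 0:
        return w              # no identical ratings: weight unchanged
    if n_same <= gamma:
        return w * alpha      # 0 < N_{a,i} <= gamma
    return w * beta           # N_{a,i} > gamma
```

With these defaults, a weight of 0.2 stays 0.2 when no ratings coincide, becomes 0.4 for one identical rating, and becomes 0.8 for five identical ratings.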

Compared with the prior art, the invention introduces the distributed hash table routing method into the collaborative filtering system and improves it, which solves the inherently poor scalability of existing centralized collaborative filtering systems and improves recommendation quality, thereby solving the problem of a very large number of users. The core of the method is to use the improved distributed hash table routing method to find similar users efficiently and retrieve their rating data, and then to apply collaborative filtering locally to compute predicted values and produce recommendations. Because the similar users returned by the improved distributed hash table routing method are highly similar, much noisy user rating data is eliminated, so the quality of the collaborative filtering recommendations is also better than that of traditional methods.

Description of drawings

Fig. 1 is a schematic diagram of data access and forwarding in the method of the invention.

Fig. 2 is a flow chart of the distributed-hash-table-based collaborative filtering method of the invention.

Fig. 3 is a flow chart of the push method of the distributed-hash-table-based collaborative filtering of the invention.

Fig. 4 is a flow chart of the query method of the distributed-hash-table-based collaborative filtering of the invention.

Embodiment

Embodiments of the method of the invention are described in further detail below with reference to the accompanying drawings.

Fig. 1 shows the data access and forwarding scheme of the invention. Each user agent keeps two sets of data locally: one is the user's own rating data, and the other is the buffered rating data of other users. Users are connected to one another through the distributed hash table overlay network; the routing method finds the neighbour most similar to the hash value K attached to a message and forwards the message to that neighbour.

The flow of the overall distributed-hash-table-based collaborative filtering method is shown in Fig. 2 and is specifically as follows:

B. The agent hashes each <item number, rating> two-tuple formed by the user's rating of an item into the distributed hash table overlay network;

As shown in Fig. 3, the method flow of step A is specifically as follows:

A2. The agent hashes the user name to obtain a local hash value, which uniquely identifies the agent;

As shown in Fig. 3, after the distributed hash table overlay network has been constructed, the method flow of step B is specifically as follows:

B2. For each hash value K, the agent constructs a push message carrying K and forwards it to the neighbour most similar to K;

As shown in Fig. 4, the query method flow of step C is specifically as follows:

C2. For each hash value K, the agent constructs a query message carrying K and forwards it to the neighbour most similar to K;

C4. The sender of the query message obtains the rating data of similar users returned by the agent whose buffer was successfully hit;

C5. The agent collects the similar-user rating data obtained for all of the user's <item number, rating> two-tuples, yielding a local similar-user rating data set.

Step C is the core of the whole system. Its goal is to find similar users efficiently within the distributed collaborative filtering recommendation method and to retrieve their rating data. Its effectiveness and scalability stem from the fact that the intermediate forwarding nodes of the distributed hash table overlay network buffer the rating data of many users, so that a similar user with an identical <item number, rating> two-tuple can be found and returned after very few forwarding hops. For example, suppose a user's local hash value is 123456 and the hash value of one of the user's <item number, rating> two-tuples is 654321. In the worst case, 6 hops are needed to route the query message to the node responsible for hash value 654321, assuming the local hash values of the intermediate nodes are 698765, 659823, 654987, 654398 and 654329 respectively. Because of the buffers, however, the query message may already find a similar user with an identical <item number, rating> two-tuple when it reaches the 2nd node, whose local hash value is 659823, and that user is then returned.
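The worked example above can be checked with a few lines of code. The route and target values are taken from the text; the helper names and the idea of modelling a buffer hit as a set of node keys are assumptions made only for illustration.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that two hash keys have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


TARGET = "654321"   # hash value of the <item number, rating> two-tuple
ROUTE = ["698765", "659823", "654987", "654398", "654329", "654321"]

# Each hop shares one more leading digit with the target than the previous one.
prefix_lengths = [shared_prefix_len(node, TARGET) for node in ROUTE]


def hops_to_answer(route: list, nodes_with_buffered_answer: set):
    """Hop count at which the query first reaches a node holding the answer."""
    for hop, node in enumerate(route, start=1):
        if node in nodes_with_buffered_answer:
            return hop
    return None  # no node on the route holds the answer


worst_case = hops_to_answer(ROUTE, {"654321"})              # only the final node
with_buffer = hops_to_answer(ROUTE, {"659823", "654321"})   # 2nd node has a copy
```

Here `prefix_lengths` is `[1, 2, 3, 4, 5, 6]`, `worst_case` is 6 and `with_buffer` is 2, matching the 6-hop worst case and the hit at the 2nd node described in the text.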

Step D, carried out after a similar-user rating data set has been obtained locally, is specifically as follows:

D2. From the similarities obtained, select a constant number of the most similar users;

D3. From the similarities of this constant number of users, obtain the predicted value by the mean-deviation method; if the predicted value exceeds the mean value, recommend the item to the user.

Step D can use a traditional collaborative filtering method to produce predicted values and recommendations. To obtain higher recommendation accuracy, formula (1) is used to multiply up the similarity weights of "highly correlated" users, thereby reducing the relative weight of "weakly correlated" users. For example, suppose that for user P three similar users A, B and C have all obtained the same similarity weight, 0.2, from the correlation similarity or cosine similarity method, but A has no item rated identically to P, B has 1 item rated identically to P, and C has 5 items rated identically to P. With the coefficients used in the experimental tests (α = 2.0, β = 4.0, γ = 4), the similarity weight of A is unchanged at 0.2, the similarity weight of B becomes 0.4, and the similarity weight of C becomes 0.8. As this shows, the more items a user has rated identically, the higher that user's similarity should be, so the similarity weight is increased accordingly. Experimental test data show that this does improve prediction accuracy and recommendation quality.

Claims

1. A collaborative filtering recommendation method for handling a very large number of users, characterized in that: first, the users' rating data for items is stored in a distributed manner, i.e., each user stores his or her own rating data for items; the rating data of similar users is then obtained through a distributed hash table routing method, a collaborative filtering method is applied locally to obtain predicted values, and recommendations are further obtained from them; at the same time, while the similar users' rating data is being obtained, the number of similar users returned from the distributed hash table overlay network for each item is limited, and the influence of similar users is adjusted so that the more identical <item number, rating> two-tuples a user shares, the greater that user's influence, thereby improving prediction accuracy.

2. The collaborative filtering recommendation method for handling a very large number of users according to claim 1, characterized in that the method steps are as follows:

B. The user agent hashes each <item number, rating> two-tuple formed by the user's rating of an item into the distributed hash table overlay network;

C. For each <item number, rating> two-tuple of the user, the user agent obtains similar users from the distributed hash table overlay network;

D. The user agent applies a collaborative filtering method locally to the retrieved rating data of the similar users to obtain predicted values, and from these produces recommendations for the user.

3. The collaborative filtering recommendation method for handling a very large number of users according to claim 2, characterized in that step A is specifically as follows:

A2. The user agent hashes the user name to obtain a local hash value, which uniquely identifies the agent;

A3. All user agents build distributed hash routing tables from their hash values, thereby interconnecting with the other agents and constructing the distributed hash table overlay network.

4. The collaborative filtering recommendation method for handling a very large number of users according to claim 2, characterized in that step B, carried out after the distributed hash table overlay network has been constructed, is specifically as follows:

B1. The user agent hashes each of the user's <item number, rating> two-tuples to obtain its hash value K;

B2. For each hash value K, the user agent constructs a push message carrying K and forwards it to the neighbour most similar to K;

B3. When a user agent receives a push message routed from another neighbour, it first places the user rating data and the hash value K from the message in its local buffer; then, if there is a neighbour more similar to the hash value K carried by the push message, it forwards the push message to the neighbour most similar to that K.

5. The collaborative filtering recommendation method for handling a very large number of users according to claim 2, characterized in that step C is specifically as follows:

C1. The user agent hashes each of the user's <item number, rating> two-tuples to obtain its hash value K;

C2. For each hash value K, the user agent constructs a query message carrying K and forwards it to the neighbour most similar to K;

C3. When an agent receives a query message routed from another neighbour, it first checks whether its own local buffer holds the rating data of F users associated with the hash value K; if so, these F users and their rating data are returned to the sender of the query message, and every agent along the return route places the rating data of these F users in its local buffer; otherwise, the agent forwards the query message to the neighbour most similar to the hash value K that the message carries;

C4. The sender of the query message eventually obtains the rating data of similar users returned by some agent;

C5. The agent collects the similar-user rating data obtained for all <item number, rating> two-tuples, yielding a local similar-user rating data set.

6. The collaborative filtering recommendation method for handling a very large number of users according to claim 2, characterized in that step D, carried out after the similar-user rating data set has been obtained locally, is specifically as follows:

D2. From the similarities obtained, select the N most similar users;