CN1281027C - Collaborative filtering recommendation approach for dealing with ultra-mass users - Google Patents


Info

Publication number
CN1281027C
Authority
CN
China
Prior art keywords
user
score data
cryptographic hash
collaborative filtering
similar
Prior art date
Legal status
Expired - Fee Related
Application number
CN 200310109063
Other languages
Chinese (zh)
Other versions
CN1547351A (en)
Inventor
申瑞民
谢波
韩鹏
杨帆
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN 200310109063
Publication of CN1547351A
Application granted
Publication of CN1281027C
Anticipated expiration
Expired - Fee Related (current legal status)

Abstract

The present invention relates to a collaborative filtering recommendation method for handling a very large number of users, and belongs to the field of network technology. First, users' item rating data are stored in a distributed way, so that each user stores his or her own item ratings. The rating data of similar users are then obtained through a distributed hash table, a predicted value is computed locally with a collaborative filtering method, and a recommendation is derived from it. In this process, the number of similar users returned by the distributed hash table overlay network for each item is limited, and the influence of each similar user is adjusted so that the more identical <item ID, rating> pairs a user shares with the active user, the greater that user's influence; this improves prediction accuracy. The invention introduces a distributed hash table routing algorithm into the collaborative filtering system and improves it, which solves the inherently poor scalability of existing centralized collaborative filtering systems and also improves recommendation quality.

Description

Collaborative filtering recommendation method for handling a very large number of users
Technical field
The present invention relates to a collaborative filtering recommendation method, in particular a collaborative filtering recommendation method for handling a very large number of users, and belongs to the field of network technology.
Background art
Information resources on the Internet are growing exponentially, which has brought the so-called "information overload" and "information disorientation" problems: people find it hard to locate the information they are interested in, and even when they find some, it is often mixed with a great deal of "noise". Internet-oriented techniques such as information retrieval, information filtering and collaborative filtering have emerged in response. Information retrieval, however, is not intelligent: it cannot learn a user's interests, and in particular, users with specialized professional interests who enter the same keywords always get the same results. Information filtering cannot distinguish high-quality from low-quality results on the same topic, and as information resources grow rapidly, more effective filtering needs to incorporate people's quality judgments. Collaborative filtering does take users' evaluations into account. It analyzes user interests, finds users with interests similar to a given user within the user population, and combines these similar users' evaluations of a piece of information to predict how much the given user will like it. Current user-based collaborative filtering methods, however, search for similar users over all existing data, so every prediction requires computing the similarity between all users. As the user database and the number of data entries grow, computing similarities between all users consumes too many resources, and both real-time performance and scalability are very poor. For M users and N items, the traditional centralized user-based collaborative filtering algorithm requires O(M*M*N) computation; with a very large user base, for example M on the order of one million, the centralized user-based algorithm cannot work because its computational complexity is too high.
A literature search found that Stoica et al., in "Chord: A scalable peer-to-peer lookup service for Internet applications" (ACM SIGCOMM 2001, San Diego, CA, USA, pp. 149-160, proceedings of the 2001 annual conference of the ACM Special Interest Group on Data Communication), proposed a peer-to-peer lookup algorithm based on a distributed hash table. In a Chord network formed by N peer nodes, each node only needs to track information about log N other nodes, and when a peer joins or leaves, only log N other peers need to be notified to update their application-level routing tables. The advantage of this distributed hash table routing algorithm is its good scalability, but at present it is mainly used for distributed file access.
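The log N routing state described above can be illustrated with a toy Chord ring. This single-process Python sketch is background only, not part of the patented method: it assumes a 6-bit identifier space, a fixed set of illustrative node IDs, and no joins, departures or failures, and it routes a lookup greedily through each node's finger table (finger i of node n points to the first node at or after n + 2^i).

```python
import bisect

M_BITS = 6            # identifier space of 2**6 = 64 positions (illustrative)
RING = 2 ** M_BITS
NODES = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]   # sorted node IDs (illustrative)


def successor(nodes, key):
    """First node at or clockwise after `key` on the ring."""
    idx = bisect.bisect_left(nodes, key % RING)
    return nodes[idx % len(nodes)]


def between(x, a, b):
    """True if x lies strictly inside the clockwise interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b     # the interval wraps past zero


def fingers(nodes, n):
    """Chord finger table of node n: finger i = successor(n + 2**i)."""
    return [successor(nodes, n + 2 ** i) for i in range(M_BITS)]


def lookup(nodes, start, key):
    """Route from node `start` to the node responsible for `key`; return (node, hops)."""
    n, hops = start, 0
    owner = successor(nodes, key)
    while n != owner:
        succ = successor(nodes, n + 1)
        if between(key, n, succ) or key == succ:
            n = succ              # key falls in (n, succ], so succ owns it
        else:
            # jump to the farthest finger that still precedes the key
            n = next(f for f in reversed(fingers(nodes, n)) if between(f, n, key))
        hops += 1
    return n, hops


print(lookup(NODES, 8, 54))   # (56, 3): 8 -> 42 -> 51 -> 56
```

Each node stores only M_BITS fingers, yet every lookup reaches the correct owner in a few hops; this is the scalability property the patent borrows.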
Summary of the invention
The objective of the invention is to overcome the defects and shortcomings of the prior art by providing a collaborative filtering recommendation method for handling a very large number of users, which solves the problems that the traditional centralized user-based collaborative filtering recommendation method encounters when facing a very large number of users.
The invention is achieved through the following technical solution. The rating data of M users for N items are first stored in a distributed way, that is, each user stores his or her own ratings for the N items. The rating data of similar users are then obtained through distributed hash table routing, a predicted value is computed locally with a collaborative filtering method, and a recommendation is derived from it. This reduces the computational complexity for each user to O(M*N) and improves the scalability of the system. At the same time, while the similar users' rating data are being obtained, the number of similar users returned from the distributed hash table overlay network for each item is limited, and the influence of similar users is adjusted so that the more identical <item ID, rating> pairs a user has, the greater that user's influence. This further reduces the complexity to O(N*N), independent of the number of users, which further greatly improves the scalability of the system; test data show that it also improves prediction accuracy and recommendation quality.
The method of the invention is described further below; its steps are as follows:
A, each user runs an agent program in the background on his or her computer, and the agents construct a distributed hash table overlay network;
B, the agent hashes each <ITEM_ID, VOTE> pair (that is, <item ID, rating>) formed by the user's rating of an item into the distributed hash table overlay network;
C, for each of the user's <item ID, rating> pairs, the agent obtains similar users from the distributed hash table overlay network (users who share an identical <item ID, rating> pair are considered likely to be similar);
D, the agent applies a collaborative filtering method locally to the rating data fetched from similar users to obtain predicted values, and generates recommendations for the user from them.
● Step A proceeds as follows:
A1, the user starts the local agent program and keeps it running in the background;
A2, the agent hashes the user name to obtain a hash value K_local (the local hash value), which uniquely identifies it;
A3, all agents build distributed hash routing tables from their hash values, interconnecting with other agents to form the distributed hash table overlay network;
● Step B, once the distributed hash table overlay network has been constructed, proceeds as follows:
B1, the user's agent hashes each of the user's <item ID, rating> pairs to obtain its hash value K;
B2, for each hash value K, the agent constructs a "PUT message" (that is, a "PUSH message") carrying K and forwards it to the neighbour most similar to K (here "most similar" means the neighbour whose hash value K_local shares the longest prefix with K);
B3, when an agent receives a "PUSH message" routed from a neighbour, it first stores the user rating data and hash value K carried in the message in its local buffer; then, if it is not itself the node most similar to the hash value K carried by the message, it forwards the "PUSH message" to its neighbour most similar to K.
● Step C proceeds as follows:
C1, the user's agent hashes each of the user's <item ID, rating> pairs to obtain its hash value K;
C2, for each hash value K, the agent constructs a "LOOKUP message" (that is, a "query message") carrying K and forwards it to the neighbour most similar to K;
C3, when an agent receives a "query message" routed from a neighbour, it first checks whether its local buffer contains the rating data of F users associated with hash value K (F is a constant, for example 5). If so, it returns these F users and their rating data to the sender of the "query message", and every agent along the return path stores these F users' rating data in its local buffer. If not, it forwards the "query message" to its neighbour most similar to the hash value K carried by the message;
C4, in either case, the sender of the "query message" eventually obtains the rating data of similar users returned by some agent;
C5, the agent gathers the rating data of the similar users obtained for all of its <item ID, rating> pairs, forming a local data set of similar users' ratings.
● Step D, once the similar-user rating data set has been obtained locally, proceeds as follows:
D1, compute the similarity between the local user and each user in the data set using correlation-based (Pearson) similarity or cosine similarity;
D2, from the computed similarities, select the N most similar users (N is a constant, for example 100);
D3, obtain a predicted value from these N users' similarities using the mean-deviation method; if the predicted value exceeds a certain threshold, recommend the item to the user.
To obtain better collaborative filtering recommendation quality, the invention proposes two further improvements:
1) Limit the number of similar users the distributed hash table overlay network returns for each <item ID, rating> pair. Experimental data show that returning at most 5 similar users per <item ID, rating> pair yields good recommendation quality. This also reduces the total number of returned users from O(N) to O(1), where N here denotes the total number of users.
2) After the similar users' rating data have been obtained from the distributed hash table overlay network, formula (1) is used to increase the similarity weights of "highly correlated" users and, relatively, reduce those of "weakly correlated" users; experimental data show that this improves recommendation quality considerably.
$$
w'_{a,i} =
\begin{cases}
w_{a,i}, & N_{a,i} = 0 \\
w_{a,i} \cdot \alpha, & 0 < N_{a,i} \le \gamma \\
w_{a,i} \cdot \beta, & N_{a,i} > \gamma
\end{cases}
\qquad (1)
$$
Here w_{a,i} and w'_{a,i} denote the similarity weights before and after adjustment, respectively, and N_{a,i} is the number of items to which user a and user i gave identical ratings. The coefficients α, β and γ can be tuned for different experimental environments; in actual tests α = 2.0, β = 4.0 and γ = 4, which improved recommendation quality by 7.6%.
Compared with the prior art, the invention introduces distributed hash table routing into the collaborative filtering system and improves it, which solves the inherently poor scalability of existing centralized collaborative filtering systems, improves recommendation quality, and thereby handles very large user populations. The core of the method is to use improved distributed hash table routing to find similar users efficiently and fetch their rating data, then compute predicted values locally with collaborative filtering and generate recommendations. Because the similar users returned by the improved routing are highly similar, much noisy user rating data is filtered out, so the collaborative filtering recommendations are also of better quality than those of traditional methods.
Description of drawings
Fig. 1 is a schematic diagram of data access and forwarding in the method of the invention
Fig. 2 is a flow chart of the distributed-hash-table-based collaborative filtering method of the invention
Fig. 3 is a flow chart of the push procedure of the distributed-hash-table-based collaborative filtering method of the invention
Fig. 4 is a flow chart of the query procedure of the distributed-hash-table-based collaborative filtering method of the invention
Embodiment
The embodiments of the method of the invention are described in further detail below with reference to the accompanying drawings.
Fig. 1 shows the data access and forwarding scheme of the invention. Each user agent keeps two sets of data locally: the user's own rating data, and buffered rating data of other users. Users are connected to one another through the distributed hash table overlay network; the routing method uses the hash value K carried by a message to find the neighbour most similar to K and forwards the message to that neighbour.
The overall flow of the distributed hash table collaborative filtering method is shown in Fig. 2 and proceeds as follows:
A, each user runs an agent program in the background on his or her computer, and the agents construct a distributed hash table overlay network;
B, the agent hashes each <item ID, rating> pair formed by the user's rating of an item into the distributed hash table overlay network;
C, for each of the user's <item ID, rating> pairs, the agent obtains similar users from the distributed hash table overlay network (users who share an identical <item ID, rating> pair are considered likely to be similar);
D, the agent applies a collaborative filtering method locally to the rating data fetched from similar users to obtain predicted values, and generates recommendations for the user from them.
As shown in Fig. 3, step A proceeds as follows:
A1, the user starts the local agent program and keeps it running in the background;
A2, the agent hashes the user name to obtain a local hash value that uniquely identifies it;
A3, all agents build distributed hash routing tables from their hash values, interconnecting with other agents to form the distributed hash table overlay network.
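Step A can be sketched as follows. The patent does not name a hash function or describe the routing table layout, so this Python sketch assumes SHA-1 truncated to 8 hex digits for K_local and a Pastry-style prefix table with one entry per (shared-prefix length, next digit) slot; both choices, and the helper names, are illustrative.

```python
import hashlib

ID_DIGITS = 8  # assumption: identifiers are the first 8 hex digits of a SHA-1 digest


def local_hash(user_name: str) -> str:
    """A2: derive K_local, the agent's identifier, by hashing the user name."""
    return hashlib.sha1(user_name.encode("utf-8")).hexdigest()[:ID_DIGITS]


def shared_prefix(a: str, b: str) -> int:
    """Number of leading digits two identifiers have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def build_routing_table(me: str, known: list) -> dict:
    """A3 (assumed layout): slot (l, d) holds a known agent whose identifier shares
    exactly l leading digits with `me` and has digit d at position l."""
    table = {}
    for other in known:
        if other == me:
            continue
        l = shared_prefix(me, other)
        table.setdefault((l, other[l]), other)  # keep the first agent seen per slot
    return table


ids = {name: local_hash(name) for name in ["alice", "bob", "carol", "dave", "erin"]}
table = build_routing_table(ids["alice"], list(ids.values()))
for (length, digit), peer in sorted(table.items()):
    print(f"prefix {length}, next digit {digit}: {peer}")
```

A real deployment would learn peers through a join protocol rather than from a global list; the global list here only keeps the example self-contained.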
As shown in Fig. 3, once the distributed hash table overlay network has been constructed, step B proceeds as follows:
B1, the user's agent hashes each of the user's <item ID, rating> pairs to obtain its hash value K;
B2, for each hash value K, the agent constructs a "PUSH message" carrying K and forwards it to the neighbour most similar to K;
B3, when an agent receives a "PUSH message" routed from a neighbour, it first stores the user rating data and hash value K carried in the message in its local buffer; then, if it is not itself the node most similar to the hash value K carried by the message, it forwards the "PUSH message" to its neighbour most similar to K.
As shown in Fig. 4, the query procedure of step C proceeds as follows:
C1, the user's agent hashes each of the user's <item ID, rating> pairs to obtain its hash value K;
C2, for each hash value K, the agent constructs a "query message" carrying K and forwards it to the neighbour most similar to K;
C3, when an agent receives a "query message" routed from a neighbour, it first checks whether its local buffer contains the rating data of F users associated with hash value K (F is a constant, for example 5). If so, it returns these F users and their rating data to the sender of the "query message", and every agent along the return path stores these F users' rating data in its local buffer. If not, it forwards the "query message" to its neighbour most similar to the hash value K carried by the message;
C4, the sender of the "query message" obtains the rating data of similar users returned by the agent whose buffer produced a successful hit;
C5, the agent gathers the rating data of the similar users obtained for all of its <item ID, rating> pairs, forming a local data set of similar users' ratings.
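The query side (step C) can be sketched in the same style. Assumptions in this Python sketch: identifiers are digit strings compared by longest common prefix, any buffered data for K counts as a hit (one reading of C3), at most F users are returned, and agents on the return path cache the answer. Agent, on_lookup and collect_similar_users are hypothetical names.

```python
from collections import defaultdict

F = 5  # at most F users returned per <item ID, rating> pair (5 is the value suggested above)


def shared_prefix(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


class Agent:
    def __init__(self, k_local):
        self.k_local = k_local
        self.neighbours = []
        self.buffer = defaultdict(dict)   # K -> {user: ratings}

    def next_hop(self, key):
        best = max(self.neighbours, key=lambda a: shared_prefix(a.k_local, key), default=None)
        if best is None or shared_prefix(best.k_local, key) <= shared_prefix(self.k_local, key):
            return None
        return best

    def on_lookup(self, key, path):
        """C3: answer from the buffer on a hit, otherwise forward toward key.
        Agents on the return path cache whatever comes back."""
        path.append(self.k_local)
        cached = self.buffer.get(key)
        if cached:
            return dict(list(cached.items())[:F])
        nxt = self.next_hop(key)
        if nxt is None:
            return {}
        hits = nxt.on_lookup(key, path)
        if hits:
            self.buffer[key].update(hits)
        return hits


def collect_similar_users(origin, user, my_ratings, hash_pair):
    """C1, C2, C4, C5: one query per <item ID, rating> pair, results merged locally."""
    pool = {}
    for item_id, vote in my_ratings.items():
        key = hash_pair(item_id, vote)
        nxt = origin.next_hop(key)
        if nxt is not None:
            pool.update(nxt.on_lookup(key, []))
    pool.pop(user, None)  # the user's own pushed ratings may come back; drop them
    return pool


ids = ["1111", "5000", "5200", "5230", "5234"]
agents = {k: Agent(k) for k in ids}
for a, b in zip(ids, ids[1:]):
    agents[a].neighbours.append(agents[b])
    agents[b].neighbours.append(agents[a])
# Pretend an earlier PUSH left u7's ratings in the buffer of agent 5200.
agents["5200"].buffer["5234"]["u7"] = {"i3": 4, "i8": 2}

path = []
hits = agents["5000"].on_lookup("5234", path)   # as if forwarded by agent 1111
print(path)  # ['5000', '5200']: hit after 2 agents; 5000 now caches u7 as well

pool = collect_similar_users(agents["1111"], "me", {"i3": 4}, lambda item, vote: "5234")
print(pool)  # {'u7': {'i3': 4, 'i8': 2}}
```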
Step C is the core of the whole system: its goal is to find similar users efficiently within the distributed collaborative filtering recommendation method and to fetch their rating data. Its efficiency and scalability come from the fact that intermediate forwarding nodes in the distributed hash table overlay network buffer many users' rating data, so a similar user with an identical <item ID, rating> pair can be found and returned after only a few forwarding steps. For example, suppose a user's local hash value is 123456 and the hash value of one of its <item ID, rating> pairs is 654321. In the worst case, routing the "query message" to the node responsible for hash value 654321 takes 6 hops, passing through intermediate nodes whose local hash values are 698765, 659823, 654987, 654398 and 654329. Because of the buffers, however, the query may already succeed when it reaches the 2nd node (local hash value 659823), which finds a similar user with an identical <item ID, rating> pair and returns it.
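The routing example above can be checked digit by digit. This small Python sketch (an illustration, not part of the patent) computes how many leading digits each node on the worst-case route shares with the target hash value 654321: each hop extends the shared prefix by one digit, so a 6-digit identifier needs at most 6 hops, and a buffer hit at the 2nd node ends the query early.

```python
def shared_prefix(a: str, b: str) -> int:
    """Number of leading digits a and b have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


target = "654321"
# Worst-case route from the example: the source, five intermediate nodes, the target.
route = ["123456", "698765", "659823", "654987", "654398", "654329", target]
progress = [shared_prefix(node, target) for node in route]
print(progress)               # [0, 1, 2, 3, 4, 5, 6]
print(len(route) - 1)         # 6 hops in the worst case
print(route.index("659823"))  # 2: a buffer hit here stops the query after 2 hops
```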
Step D, once the similar-user rating data set has been obtained locally, proceeds as follows:
D1, compute the similarity between the local user and each user in the data set using correlation-based (Pearson) similarity or cosine similarity;
D2, from the computed similarities, select a constant number of the most similar users;
D3, obtain a predicted value from these users' similarities using the mean-deviation method; if the predicted value exceeds the mean value, recommend the item to the user.
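Step D uses standard user-based collaborative filtering. The Python sketch below assumes Pearson correlation over co-rated items for D1, keeps the top_n most similar users for D2, and uses the mean-deviation (Resnick-style) formula for D3. The threshold is a parameter because the text gives both "a certain threshold" and "the mean value"; the example passes the local user's mean rating. Function names are hypothetical.

```python
import math


def mean(r):
    return sum(r.values()) / len(r)


def pearson(ra, rb):
    """D1: Pearson correlation over co-rated items (0.0 if undefined)."""
    common = ra.keys() & rb.keys()
    if len(common) < 2:
        return 0.0
    ma = sum(ra[j] for j in common) / len(common)
    mb = sum(rb[j] for j in common) / len(common)
    num = sum((ra[j] - ma) * (rb[j] - mb) for j in common)
    den = math.sqrt(sum((ra[j] - ma) ** 2 for j in common)
                    * sum((rb[j] - mb) ** 2 for j in common))
    return num / den if den else 0.0


def neighbourhood(me, pool, top_n=100):
    """D2: the top_n users in the collected pool most similar to the local user."""
    scored = [(pearson(me, r), r) for r in pool.values()]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:top_n]


def predict(me, neighbours, item):
    """D3: mean-deviation prediction of the local user's rating for `item`."""
    used = [(w, r) for w, r in neighbours if item in r]
    den = sum(abs(w) for w, _ in used)
    if den == 0:
        return mean(me)
    return mean(me) + sum(w * (r[item] - mean(r)) for w, r in used) / den


def recommend(me, pool, threshold, top_n=100):
    """Recommend unrated items whose predicted value exceeds the threshold."""
    neighbours = neighbourhood(me, pool, top_n)
    candidates = {j for _, r in neighbours for j in r} - set(me)
    return sorted(j for j in candidates if predict(me, neighbours, j) > threshold)


me = {"i1": 5, "i2": 3, "i3": 4}
pool = {
    "u1": {"i1": 4, "i2": 2, "i3": 3, "i4": 5},
    "u2": {"i1": 1, "i2": 5, "i3": 2, "i4": 1},
}
print(recommend(me, pool, threshold=mean(me)))  # ['i4']
```

Here u1 agrees with the local user (correlation 1.0) and rates i4 above its own mean, while u2 disagrees (negative correlation) and rates i4 below its mean; both push the prediction for i4 upward.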
Step D can use a traditional collaborative filtering method to produce predicted values and recommendations. To obtain higher recommendation accuracy, formula (1) is used to increase the similarity weights of "highly correlated" users and, relatively, reduce those of "weakly correlated" users. For example, suppose user P has three similar users A, B and C, and correlation-based or cosine similarity gives all three the same similarity weight of 0.2. A and P have no items with identical ratings, B and P have 1 item with an identical rating, and C and P have 5. With α = 2.0, β = 4.0 and γ = 4, as in the experiments, A's weight stays at 0.2, B's becomes 0.4, and C's becomes 0.8. The more items two users have rated identically, the more similar they should be considered, so their similarity weight is raised accordingly. Experimental data show that this does improve prediction accuracy and recommendation quality.
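Formula (1) and the A/B/C example above can be reproduced in a few lines of Python. adjust_weight is a hypothetical helper name; its defaults are the experimental values α = 2.0, β = 4.0, γ = 4 quoted in the text.

```python
def adjust_weight(w, n_same, alpha=2.0, beta=4.0, gamma=4):
    """Formula (1): scale w_{a,i} by how many items N_{a,i} users a and i rated identically."""
    if n_same == 0:
        return w
    if n_same <= gamma:
        return w * alpha
    return w * beta


# Worked example from the text: P's similar users A, B, C all start at weight 0.2.
same_items = {"A": 0, "B": 1, "C": 5}
adjusted = {u: adjust_weight(0.2, n) for u, n in same_items.items()}
print(adjusted)  # {'A': 0.2, 'B': 0.4, 'C': 0.8}
```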

Claims (6)

1, a collaborative filtering recommendation method for handling a very large number of users, characterized in that users' rating data for items are first stored in a distributed way, that is, each user stores his or her own ratings for items; the rating data of similar users are then obtained through distributed hash table routing; a predicted value is computed locally with a collaborative filtering method, and a recommendation is derived from it; and, while the similar users' rating data are being obtained, the number of similar users returned from the distributed hash table overlay network for each item is limited and the influence of similar users is adjusted so that the more identical <item ID, rating> pairs a user has, the greater that user's influence, thereby improving prediction accuracy.
2, the collaborative filtering recommendation method for handling a very large number of users according to claim 1, characterized in that the method comprises the following steps:
A, each user runs an agent program in the background on his or her computer, and the agents construct a distributed hash table overlay network;
B, the user agent hashes each <item ID, rating> pair formed by the user's rating of an item into the distributed hash table overlay network;
C, for each of the user's <item ID, rating> pairs, the user agent obtains similar users from the distributed hash table overlay network;
D, the user agent applies a collaborative filtering method locally to the rating data fetched from similar users to obtain predicted values, and generates recommendations for the user from them.
3, the collaborative filtering recommendation method for handling a very large number of users according to claim 2, characterized in that step A proceeds as follows:
A1, the user starts the local agent program and keeps it running in the background;
A2, the user agent hashes the user name to obtain a local hash value that uniquely identifies it;
A3, all user agents build distributed hash routing tables from their hash values, interconnecting with other agents to form the distributed hash table overlay network.
4, the collaborative filtering recommendation method for handling a very large number of users according to claim 2, characterized in that step B, once the distributed hash table overlay network has been constructed, proceeds as follows:
B1, the user agent hashes each of the user's <item ID, rating> pairs to obtain its hash value K;
B2, for each hash value K, the agent constructs a PUSH message carrying K and forwards it to the neighbour most similar to K;
B3, when a user agent receives a PUSH message routed from a neighbour, it first stores the user rating data and hash value K carried in the message in its local buffer; then, if it has a neighbour more similar to the hash value K carried by the PUSH message, it forwards the PUSH message to its neighbour most similar to K.
5, the collaborative filtering recommendation method for handling a very large number of users according to claim 2, characterized in that step C proceeds as follows:
C1, the user agent hashes each of the user's <item ID, rating> pairs to obtain its hash value K;
C2, for each hash value K, the agent constructs a query message carrying K and forwards it to the neighbour most similar to K;
C3, when an agent receives a query message routed from a neighbour, it first checks whether its local buffer contains the rating data of F users associated with hash value K; if so, it returns these F users and their rating data to the sender of the query message, and every agent along the return path stores these F users' rating data in its local buffer; otherwise, it forwards the query message to its neighbour most similar to the hash value K carried by the message;
C4, the sender of the query message eventually obtains the rating data of similar users returned by some agent;
C5, the agent gathers the rating data of the similar users obtained for all of its <item ID, rating> pairs, forming a local data set of similar users' ratings.
6, the collaborative filtering recommendation method for handling a very large number of users according to claim 2, characterized in that step D, once the similar-user rating data set has been obtained locally, proceeds as follows:
D1, compute the similarity between the local user and each user in the data set using correlation-based similarity or cosine similarity;
D2, from the computed similarities, select the N most similar users;
D3, obtain a predicted value from these N users' similarities using the mean-deviation method; if the predicted value exceeds a certain threshold, recommend the item to the user.
CN 200310109063 2003-12-04 2003-12-04 Collaborative filtering recommendation approach for dealing with ultra-mass users Expired - Fee Related CN1281027C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200310109063 CN1281027C (en) 2003-12-04 2003-12-04 Collaborative filtering recommendation approach for dealing with ultra-mass users

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200310109063 CN1281027C (en) 2003-12-04 2003-12-04 Collaborative filtering recommendation approach for dealing with ultra-mass users

Publications (2)

Publication Number Publication Date
CN1547351A CN1547351A (en) 2004-11-17
CN1281027C CN1281027C (en) 2006-10-18

Family

ID=34335005

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200310109063 Expired - Fee Related CN1281027C (en) 2003-12-04 2003-12-04 Collaborative filtering recommendation approach for dealing with ultra-mass users

Country Status (1)

Country Link
CN (1) CN1281027C (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7636325B2 (en) * 2004-12-07 2009-12-22 Hewlett-Packard Development Company, L.P. Determining highest workloads for nodes in an overlay network
CN100364296C (en) * 2005-06-24 2008-01-23 清华大学 Routing method of sectional interaction to goal node according to scale value based on optimized diameter network
CN101339593B (en) * 2007-07-04 2012-05-09 联想(北京)有限公司 Software security evaluation system, user capability and confidence level evaluation system and method
CN101242365B (en) * 2008-03-11 2010-06-09 南京邮电大学 Peer network secure routing method based on multi-dimension distributed hash table
CN101369923B (en) * 2008-09-24 2010-12-29 中兴通讯股份有限公司 Method for improving cluster web service performance by using distributed hash table
CN103019860B (en) * 2012-12-05 2015-12-09 北京奇虎科技有限公司 Collaborative-filtering-based processing method and system
CN103049486B (en) * 2012-12-05 2015-10-07 北京奇虎科技有限公司 Collaborative filtering distance processing method and system
CN103049488B (en) * 2012-12-05 2015-11-25 北京奇虎科技有限公司 Collaborative filtering processing method and system
CN104008193B (en) * 2014-06-12 2017-04-05 安徽融数信息科技有限责任公司 Information recommendation method based on typical user group discovery
CN106997381B (en) * 2017-03-21 2021-03-09 海信集团有限公司 Method and device for recommending movies to target user
CN107122411B (en) * 2017-03-29 2020-08-14 浙江大学 Collaborative filtering recommendation method based on discrete multi-view Hash

Also Published As

Publication number Publication date
CN1547351A (en) 2004-11-17

Similar Documents

Publication Publication Date Title
Wu et al. Identifying link farm spam pages
CN1281027C (en) Collaborative filtering recommendation approach for dealing with ultra-mass users
US8935274B1 (en) System and method for deriving user expertise based on data propagating in a network environment
JP2005353039A5 (en)
CN101272399A (en) Method for implementing full text retrieval system based on P2P network
CN104021125A (en) Search engine sorting method and system and search engine
US7765204B2 (en) Method of finding candidate sub-queries from longer queries
CN102378407B (en) Object name resolution system and method in internet of things
CN106909626A (en) Improved Decision Tree Algorithm realizes search engine optimization technology
Billah et al. Social network analysis for predicting emerging researchers
Li et al. Routing of XML and XPath queries in data dissemination networks
KR101556714B1 (en) Method, system and computer readable recording medium for providing search results
Nakashe et al. Smart approach to crawl web interfaces using a two stage framework of crawler
Roul et al. An effective approach for web document classification using the concept of association analysis of data mining
Yadav et al. Architecture for parallel crawling and algorithm for change detection in web pages
Zhou et al. Distributed query processing in an ad-hoc semantic web data sharing system
Gupta et al. Distributed popularity indices
Sardhara A flowchart to reduce mutual reinforcement effect on web page ranking based on web strucuture mining
JP2004152209A (en) Web access log analysis method
Zlamaniec et al. A framework for workload-aware views materialisation of semantic databases
Papapetrou et al. On the usage of global document occurrences in peer-to-peer information systems
Ren et al. haps: Supporting effective and efficient full-text p2p search with peer dynamics
de La Robertie et al. Identifying authoritative researchers in digital libraries using external a priori knowledge
Papapetrou Full-text indexing and Information Retrieval in P2P Systems
Zhou et al. Adaptive indexing for content-based search in P2P systems

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20061018

Termination date: 20151204

EXPY Termination of patent right or utility model