CN103019860B - Processing method and system based on collaborative filtering - Google Patents



Publication number
CN103019860B
Authority
CN
China
Prior art keywords
matrix
calculation server
component identity
row
user
Legal status
Expired - Fee Related
Application number
CN201210518378.1A
Other languages
Chinese (zh)
Other versions
CN103019860A (en)
Inventor
齐路
何锐邦
唐会军
Current Assignee
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Application filed by Beijing Qihoo Technology Co Ltd and Qizhi Software Beijing Co Ltd
Priority to CN201210518378.1A
Publication of CN103019860A
Application granted
Publication of CN103019860B
Expired - Fee Related (current legal status)


Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a processing method and system based on collaborative filtering, and relates to the field of computer technology. The system comprises a request receiving module and at least two calculation servers, and includes the request receiving module, a depended-on server determination module, a component sending/receiving module, and a recommended item calculation module. The recommended item calculation module is adapted for each participating calculation server, for each subject in its local block of the subject-item weight matrix, to recommend at least one item to that subject using the local subject-item weight block matrix data, the local collaborative filtering block matrix data, and the received components, according to the correspondence among them. The calculation servers send and receive data by calling the Message Passing Interface (MPI). The invention can quickly compute recommended items for huge matrix data, lowers the hardware requirements of the computing system, and thereby reduces hardware cost overall.

Description

Processing method and system based on collaborative filtering
Technical field
The present invention relates to the field of computer technology, and in particular to a processing method and system based on collaborative filtering.
Background
The exponential growth of information resources on the Internet has brought the so-called "information overload" and "information disorientation" problems: people find it hard to locate information they are interested in, and even when they do, it is often mixed with a great deal of "noise". Technologies such as Internet information retrieval, information filtering, and collaborative filtering emerged in response. Information retrieval, however, is not intelligent: it cannot learn a user's interests, so users with specialized interests who enter the same keywords can only obtain the same results. Information filtering cannot distinguish the quality of filtered results on the same subject, and as information resources grow rapidly, more effective filtering needs to draw on human quality assessments. Recommender systems arose from this need. A recommender system is an intelligent agent system proposed to solve the information overload problem; it automatically recommends to a user, from a large body of information, resources that match the user's interests, preferences, or needs. With the spread and rapid development of the Internet, recommender systems have been widely applied in many fields, and in e-commerce in particular they have attracted growing research and application. At present, almost all e-commerce websites use recommender systems of various forms to varying degrees, for example Amazon, CDNOW, eBay, and the Dangdang.com online bookstore. Among these approaches, collaborative filtering has been particularly successful in current recommender-system applications.
Collaborative filtering is a class of recommendation algorithms that takes users' evaluations into account. It analyzes user interests, finds users in the user population whose interests are similar to those of a designated user, and aggregates these similar users' evaluations of a given piece of information to predict how much the designated user will like it. It is mainly divided into two kinds: user-based (Userbased) and item-based (Itembased). In user-based collaborative filtering, for example, the basic idea is: to recommend to a user items the user does not yet have, first compute how similar each other user's preferences are to this user's, then recommend items that this user does not have, taken from the users whose preferences are most similar. Item-based collaborative filtering works on a similar principle.
In an Internet environment both users and items are massive in number, and timeliness imposes time requirements on the algorithm. In the prior art, however, collaborative filtering is essentially computed on a single calculation server. With a very large number of users, for example on the order of one million, the various matrices are also enormous, so computing on a single server is too slow and places high demands on hardware.
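The user-based idea just described can be illustrated with a minimal single-machine sketch. It assumes cosine similarity, treats a rating of 0 as "not rated", and scores each unrated item by the similarity-weighted average rating of the k most similar users; the function names and the parameters k and n are hypothetical, not taken from the patent.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length rating vectors (0 = not rated)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend_user_based(ratings, user, k=2, n=1):
    """Return up to n item indices the user has not rated, ranked by the
    similarity-weighted average rating of the k most similar other users."""
    target = ratings[user]
    # Rank all other users by similarity to the target and keep the top k.
    neighbours = sorted(
        ((cosine(target, vec), other) for other, vec in ratings.items() if other != user),
        reverse=True,
    )[:k]
    scores = {}
    for item, rating in enumerate(target):
        if rating:  # the user already has this item
            continue
        raters = [(sim, v) for sim, v in neighbours if ratings[v][item]]
        den = sum(abs(sim) for sim, _ in raters)
        if den:
            scores[item] = sum(sim * ratings[v][item] for sim, v in raters) / den
    return sorted(scores, key=scores.get, reverse=True)[:n]


ratings = {
    "File1": [70, 60, 80, 90, 0],
    "File2": [40, 90, 50, 0, 70],
    "File3": [0, 70, 80, 80, 0],
}
print(recommend_user_based(ratings, "File1"))  # [4], i.e. Item5
```

On one machine every user's row must be compared with every other row, which is exactly the cost the patent sets out to spread over several servers.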
Summary of the invention
In view of the above problems, the present invention is proposed to provide a processing system based on collaborative filtering, and a corresponding processing method based on collaborative filtering, that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a processing method based on collaborative filtering is provided, comprising:
Receiving a request to recommend at least one item for a subject, and starting at least two calculation servers according to the request to carry out a recommendation computation process, the process comprising:
Among the at least two calculation servers, each participating calculation server obtains its various block matrix data, the block matrix data comprising block matrix data of the subject-item weight matrix and block matrix data of the collaborative filtering matrix;
Each participating calculation server determines, according to the subject-item weight matrix and the collaborative filtering matrix, each server on which the current calculation server depends, and the components of the block matrix data on each depended-on calculation server;
Each participating calculation server sends the components that other servers depend on to each calculation server that depends on them, and receives the components sent by the other calculation servers;
For each subject in its local subject-item weight block matrix data, each participating calculation server recommends at least one item to the subject using the local subject-item weight block matrix data, the local collaborative filtering block matrix data, and the received components, according to the correspondence among them;
The calculation servers send and receive data by calling the Message Passing Interface (MPI).
Optionally, the step in which each participating calculation server determines, according to the subject-item weight matrix and the collaborative filtering matrix, each server on which it depends and the components of the block matrix data on each depended-on server comprises:
Each participating calculation server obtains the component identifiers in the block matrix data processed by all other calculation servers;
Each participating calculation server determines each calculation server on which the current calculation server depends, and the component identifiers of the block matrix data on each depended-on server, according to the block matrix data in its local collaborative filtering distance matrix, its local subject-item weight block matrix data, and the component identifiers in the block matrix data processed by all other calculation servers.
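The identifier-based dependency check can be sketched as follows, under stated assumptions. Each server is identified by a hypothetical integer rank, the identifier lists gathered from all servers are passed in as a dictionary (in a real MPI deployment they might be collected with an allgather, for example `comm.allgather` in mpi4py), and `needed_ids` stands for the row or column identifiers referred to by the local collaborative filtering block and weight block. The helper names are illustrative, not taken from the patent.

```python
from collections import defaultdict


def build_directory(ids_by_server):
    """Map every component identifier to the rank of the server that holds it.

    ids_by_server: {rank: iterable of row (or column) identifiers}, i.e. the
    identifier lists gathered from all participating servers."""
    directory = {}
    for rank, ids in ids_by_server.items():
        for cid in ids:
            directory[cid] = rank
    return directory


def find_dependencies(rank, needed_ids, directory):
    """Group the identifiers this server needs but does not hold by owner rank."""
    deps = defaultdict(set)
    for cid in needed_ids:
        owner = directory[cid]
        if owner != rank:  # locally held components need no exchange
            deps[owner].add(cid)
    return dict(deps)


directory = build_directory({0: ["u1", "u2"], 1: ["u3"], 2: ["u4", "u5"]})
print(find_dependencies(0, {"u1", "u3", "u5"}, directory))
```

Identifiers owned by the current server are skipped, so an empty result means the server can proceed without any exchange.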
According to another aspect of the present invention, a processing system based on collaborative filtering is provided, comprising:
A request receiving module and at least two calculation servers;
The request receiving module is adapted to receive a request to recommend at least one item for a subject and to start at least two calculation servers according to the request;
Among the at least two calculation servers, each calculation server comprises:
A block matrix data acquisition module, adapted for each participating calculation server to obtain its various block matrix data, the block matrix data comprising block matrix data of the subject-item weight matrix and block matrix data of the collaborative filtering matrix;
A depended-on server determination module, adapted for each participating calculation server to determine, according to the subject-item weight matrix and the collaborative filtering matrix, each server on which the current calculation server depends, and the components of the block matrix data on each depended-on calculation server;
A component sending/receiving module, adapted for each participating calculation server to send the components that other servers depend on to each calculation server that depends on them, and to receive the components sent by the other calculation servers;
A recommended item calculation module, adapted for each participating calculation server, for each subject in its local subject-item weight block matrix data, to recommend at least one item to the subject using the local subject-item weight block matrix data, the local collaborative filtering block matrix data, and the received components, according to the correspondence among them;
The calculation servers send and receive data by calling the Message Passing Interface.
Optionally, the depended-on server determination module comprises:
A component identifier acquisition module, adapted for each participating calculation server to obtain the component identifiers in the block matrix data processed by all other calculation servers;
A first depended-on server determination module, adapted for each participating calculation server to determine each calculation server on which the current calculation server depends, and the component identifiers of the block matrix data on each depended-on server, according to the block matrix data in the collaborative filtering distance matrix and/or the subject-item weight block matrix data, together with the component identifiers in the block matrix data processed by all other calculation servers.
Optionally, the system further comprises:
An identifier sending module, adapted for each participating calculation server to send the component identifiers of the block matrix data on each depended-on calculation server to the respective depended-on server;
Further, the component sending/receiving module comprises:
A first component sending/receiving module, adapted for each participating calculation server to send, according to the component identifiers it has been asked for, the corresponding components to each calculation server that depends on them, and to receive the components sent by the other calculation servers.
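The two rounds (identifiers first, then components) can be simulated in a single process. This is a minimal sketch: the dictionaries stand in for MPI messages, which in a real deployment could be exchanged with point-to-point sends or a collective such as `comm.alltoall` in mpi4py; the function name and data layout are hypothetical.

```python
def exchange(deps_by_server, rows_by_server):
    """Simulate the two message rounds in one process.

    deps_by_server: {rank: {owner_rank: set of ids}}  ids each rank depends on
    rows_by_server: {rank: {id: vector}}              components each rank holds
    Returns {rank: {id: vector}}: the components each rank received."""
    # Round 1: each rank sends the identifiers it needs to every owner.
    requests = {rank: {} for rank in rows_by_server}
    for rank, deps in deps_by_server.items():
        for owner, ids in deps.items():
            requests[owner][rank] = sorted(ids)
    # Round 2: each owner replies with the requested components.
    received = {rank: {} for rank in rows_by_server}
    for owner, reqs in requests.items():
        for requester, ids in reqs.items():
            for cid in ids:
                received[requester][cid] = rows_by_server[owner][cid]
    return received


rows_by_server = {0: {"u1": [1, 0]}, 1: {"u2": [0, 1], "u3": [1, 1]}}
deps_by_server = {0: {1: {"u3"}}, 1: {0: {"u1"}}}
print(exchange(deps_by_server, rows_by_server))
# {0: {'u3': [1, 1]}, 1: {'u1': [1, 0]}}
```

Sending identifiers before vectors keeps the first round small: only the requested rows travel in the second round, rather than every server broadcasting its whole block.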
Optionally, the subject-item weight matrix comprises a user ID-item weight matrix and a user ID-mean weight matrix;
The collaborative filtering matrix is the user ID similarity matrix corresponding to the user ID-item weight matrix;
Further, the components comprise components of the user ID-item weight matrix and components of the user ID-mean weight matrix.
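The claims do not spell out how these three matrices are combined into a score. One common mean-centred form of user-based prediction, given here only as an assumption about how the user ID-item weight matrix (entries r), the user ID-mean weight matrix (entries \bar{r}) and the user ID similarity matrix (entries s) can fit together, is:

```latex
\hat{r}_{u,i} = \bar{r}_u
  + \frac{\sum_{v \in N_i(u)} s_{u,v}\,\bigl(r_{v,i} - \bar{r}_v\bigr)}
         {\sum_{v \in N_i(u)} \lvert s_{u,v} \rvert}
```

Here N_i(u) denotes the neighbours of user u (for example, those kept after pruning) who have rated item i. Under this form, the weight-matrix row r_v and the mean \bar{r}_v of every such neighbour held on another server are exactly the components that must be received.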
Optionally, the first depended-on server determination module comprises:
A pruning module, adapted for each participating calculation server to prune the block matrix data of the user ID similarity matrix;
A second depended-on server determination module, adapted for each participating calculation server to determine each calculation server on which the current calculation server depends, and the component identifiers of the block matrix data on each depended-on server, according to the pruned block matrix data of the user ID similarity matrix and the component identifiers in the block matrix data processed by all other calculation servers.
Optionally, the pruning module comprises:
A first pruning module, adapted to sort, for each row or each column of the block matrix data of the user ID similarity matrix, the values of its dimensions, and to retain at least one top-ranked dimension in each row or column.
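Pruning by sorting and keeping the top-ranked dimensions of each row can be sketched as below. The row layout (a dictionary from neighbour ID to similarity), the parameter k, and the choice to drop each user's similarity with itself are assumptions for illustration.

```python
def prune_top_k(sim_rows, k):
    """Keep only the k largest similarities in each row of a block of S.

    sim_rows: {user_id: {other_id: similarity}}. Each user's similarity with
    itself is dropped first (an assumption; it would always rank first)."""
    pruned = {}
    for uid, row in sim_rows.items():
        others = [(sim, vid) for vid, sim in row.items() if vid != uid]
        # Sort by descending similarity; break ties by identifier for determinism.
        others.sort(key=lambda pair: (-pair[0], pair[1]))
        pruned[uid] = {vid: sim for sim, vid in others[:k]}
    return pruned


block = {"File1": {"File1": 1.0, "File2": 0.615, "File3": 0.882}}
print(prune_top_k(block, 1))  # {'File1': {'File3': 0.882}}
```

Pruning matters for the distributed design because every retained neighbour may become a remote dependency; a small k bounds how many rows each server has to fetch.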
Optionally, the second depended-on server determination module comprises:
A first row/column component identifier transposition module, adapted for each participating calculation server to transpose the component identifiers of the user ID-item weight matrix and the user ID-mean weight matrix into row component identifiers or column component identifiers;
A first row/column component identifier alignment module, adapted to align the result of the row identifier transposition with the column component identifiers of the user ID similarity matrix, or to align the result of the column identifier transposition with the row component identifiers of the user ID similarity matrix;
A first retaining module, adapted to mark, for the dimensions retained in each current row or column, the row or column component identifiers corresponding to the retained dimensions;
A first judging module, adapted to compare the marked row or column component identifiers with the row or column component identifiers in the locally obtained block matrix data of the user ID-item weight matrix and the user ID-mean weight matrix, and to determine which row or column component identifiers are not present locally;
A third depended-on server determination module, adapted to determine, from the calculation servers to which the locally absent row or column component identifiers belong, each calculation server on which the current server depends, and the component identifiers of the user ID-item weight matrix and the user ID-mean weight matrix on each depended-on server.
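The alignment and local-absence check performed by these modules can be sketched like this, assuming a row-partitioned user ID similarity matrix: the neighbour IDs retained in each pruned row are the row IDs of R and A that will be read, so any of them not held locally must be fetched from its owner. The `directory` argument and the function name are hypothetical.

```python
def nonlocal_neighbours(pruned_rows, local_ids, directory):
    """Return {owner_rank: sorted ids} of the R and A rows this server must fetch.

    pruned_rows: {user_id: {neighbour_id: similarity}} after pruning
    local_ids:   ids of the R and A rows already held on this server
    directory:   {user_id: owner_rank}"""
    # Alignment: the retained column ids of S are the row ids of R and A
    # that the weighted sums will read.
    wanted = {vid for row in pruned_rows.values() for vid in row}
    missing = wanted - set(local_ids)  # the locally absent identifiers
    by_owner = {}
    for vid in sorted(missing):
        by_owner.setdefault(directory[vid], []).append(vid)
    return by_owner


pruned = {"u1": {"u2": 0.9, "u4": 0.8}, "u2": {"u1": 0.9, "u5": 0.7}}
directory = {"u1": 0, "u2": 0, "u3": 1, "u4": 1, "u5": 2}
print(nonlocal_neighbours(pruned, ["u1", "u2"], directory))  # {1: ['u4'], 2: ['u5']}
```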
Optionally, the subject-item weight matrix comprises a user ID-item weight matrix;
The collaborative filtering matrix is the item-item similarity matrix corresponding to the user ID-item weight matrix;
Further, the components comprise components of the item-item similarity matrix.
Optionally, the depended-on server determination module comprises:
A first candidate recommendation set computation module, adapted for each participating calculation server to compute a candidate recommendation set according to the user ID-item weight matrix;
A fourth depended-on server determination module, adapted for each participating calculation server to determine each calculation server on which the current calculation server depends, and the component identifiers of the block matrix data on each depended-on server, according to the candidate recommendation set, the block matrix data of the item-item similarity matrix, and the user ID-item weight matrix.
Optionally, the fourth depended-on server determination module comprises:
A second row/column component identifier transposition module, adapted for each participating calculation server to transpose the component identifiers of the item-item similarity matrix into row component identifiers or column component identifiers;
A second row/column component identifier alignment module, adapted to align the result of the row identifier transposition with the column component identifiers of the user ID-item weight matrix, or to align the result of the column identifier transposition with the row component identifiers of the user ID similarity matrix;
A second judging module, adapted to compare the marked row or column component identifiers with the row or column component identifiers in the locally obtained block matrix data of the user ID-item weight matrix and the user ID-mean weight matrix, and to determine which row or column component identifiers are not present locally;
A fifth depended-on server determination module, adapted to determine, from the calculation servers to which the locally absent row or column component identifiers belong, each calculation server on which the current server depends, and the component identifiers of the user ID-item weight matrix and the user ID-mean weight matrix on each depended-on server.
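For the Itembased variant, a minimal sketch of the candidate recommendation set and the resulting dependencies is given below. It assumes that a user's candidates are the items the user has not yet rated, that a candidate item is scored from its row of the item-item similarity matrix, and that those rows are spread across servers according to a hypothetical `directory`; none of these details is fixed by the text above.

```python
def itembased_dependencies(local_ratings, n_items, local_item_rows, directory):
    """Candidate sets and item-similarity rows to fetch (Itembased sketch).

    local_ratings:   {user_id: [weight per item, 0 = not rated]} held locally
    local_item_rows: item indices whose item-item similarity rows are local
    directory:       {item_index: owner_rank} for rows of the item-item matrix
    Returns (candidates, {owner_rank: sorted item indices to fetch})."""
    # Assumed candidate rule: every item the user has not rated yet.
    candidates = {
        uid: [i for i in range(n_items) if not row[i]]
        for uid, row in local_ratings.items()
    }
    wanted = {i for items in candidates.values() for i in items}
    fetch = {}
    for i in sorted(wanted - set(local_item_rows)):
        fetch.setdefault(directory[i], []).append(i)
    return candidates, fetch


local = {"File1": [70, 60, 80, 90, 0], "File3": [0, 70, 80, 80, 0]}
directory = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
print(itembased_dependencies(local, 5, [0, 1, 2], directory))
# ({'File1': [4], 'File3': [0, 4]}, {1: [4]})
```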
The processing method based on collaborative filtering according to the present invention can use multiple computing nodes to carry out collaborative filtering recommendation computation in parallel. This solves the prior-art problems that computation on huge matrix data is slow and that hardware requirements are high, and meets the need to compute recommended items quickly. It achieves the beneficial effects of computing recommended items for huge matrix data quickly, lowering the hardware requirements of the computing system, and reducing hardware cost overall.
The above description is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be understood more clearly and implemented according to the contents of the specification, and that the above and other objects, features, and advantages of the present invention may become more apparent, specific embodiments of the present invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the present invention. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
Fig. 1 is a schematic flowchart of Embodiment 1 of a processing method based on collaborative filtering according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of Embodiment 2 of a processing method based on collaborative filtering according to an embodiment of the present invention;
Fig. 3 is a schematic comparison of the matrices used in the Userbased computation principle according to an embodiment of the present invention;
Fig. 4 shows an example comparing the matrices on two calculation servers under the Userbased computation principle according to an embodiment of the present invention;
Fig. 5 is a schematic flowchart of Embodiment 3 of a processing method based on collaborative filtering according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the Itembased computation principle according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of Embodiment 1 of a processing system based on collaborative filtering according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of Embodiment 2 of a processing system based on collaborative filtering according to an embodiment of the present invention; and
Fig. 9 is a schematic structural diagram of Embodiment 3 of a processing system based on collaborative filtering according to an embodiment of the present invention.
Embodiment
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.
Referring to Fig. 1, which shows a schematic flowchart of Embodiment 1 of a processing method based on collaborative filtering of the present invention, the method may specifically comprise:
Step 100: receive a request to recommend at least one item for a subject, and start at least two calculation servers according to the request to carry out the recommendation computation process;
In embodiments of the present invention, the subject may be, for example, a user ID in a network. For the various items that user IDs have used or own in the network, the system or a user can request that one or more items be recommended for each user ID; for example, based on the products a user has bought online, related products are recommended to that user.
The recommendation computation process carried out by the at least two calculation servers comprises:
Step 110: among the at least two calculation servers, each participating calculation server obtains its various block matrix data, the block matrix data comprising block matrix data of the subject-item weight matrix and block matrix data of the collaborative filtering matrix;
In the present invention, the subject-item weight matrix may be a user ID-item rating matrix, as shown in Table one:
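User ID  Item1  Item2  Item3  Item4  Item5
File1    70     60     80     90     -
File2    40     90     50     -      70
File3    -      70     80     80     -
Table one (a dash marks an item the user has not rated)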
In Table one, the items are product categories and the user IDs are File1, File2, and File3. The user corresponding to File1 rated the items it has used, Item1 to Item4, 70, 60, 80, and 90 respectively, and did not rate Item5; the user corresponding to File2 rated Item1 to Item3 40, 90, and 50 respectively, did not rate Item4, and rated Item5 70; the user corresponding to File3 rated Item2 to Item4 70, 80, and 80 respectively, and rated neither Item1 nor Item5.
The collaborative filtering matrix may be the user-user similarity matrix of the users in Table one. For example, to compute Sim(File1, File2), take File1 = (70, 60, 80, 90, 0) and File2 = (40, 90, 50, 0, 70); Sim may be the cosine of the angle between the two vectors, or another function. The matrix formed by the similarities between every pair of rows is shown in Table two. Alternatively, the collaborative filtering matrix may be the item similarity matrix corresponding to the items, that is, the similarities between every pair of columns.
S(File1,File1) S(File1,File2) S(File1,File3)
S(File2,File1) S(File2,File2) S(File2,File3)
S(File3,File1) S(File3,File2) S(File3,File3)
Table two
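Using the ratings described for Table one, the entries of Table two can be computed with cosine similarity, one of the options named above. A short sketch, with unrated cells stored as 0 as in the File1 and File2 vectors; the variable names are illustrative.

```python
import math

# Table one: rows are user IDs, columns Item1..Item5, 0 = not rated.
R = {
    "File1": [70, 60, 80, 90, 0],
    "File2": [40, 90, 50, 0, 70],
    "File3": [0, 70, 80, 80, 0],
}


def cosine(u, v):
    """Cosine of the angle between two rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Table two: S[a][b] = Sim(a, b) for every pair of user rows.
S = {a: {b: cosine(R[a], R[b]) for b in R} for a in R}
print(round(S["File1"]["File2"], 4))  # 12200 / sqrt(23000 * 17100), printed as 0.6152
```

The matrix is symmetric with ones on the diagonal, so in practice only one triangle needs to be computed and stored.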
When computing recommended items for a user, the matrices of Table one and Table two can then be used.
In embodiments of the present invention, each of the N calculation servers of the parallel computing system obtains the block matrix data assigned to it, for example the block matrix data of the subject-item weight matrix and the block matrix data of the collaborative filtering matrix.
In embodiments of the present invention, a matrix like Table one may be partitioned by rows and the row blocks sent to the calculation servers; alternatively, it may first be transposed and the transposed matrix sent to the calculation servers partitioned by rows (that is, the original matrix is partitioned by columns).
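The two partitioning options can be sketched as follows. The contiguous, near-equal split and the function names are assumptions; the text does not say how rows are assigned to servers.

```python
def row_blocks(matrix, n_servers):
    """Split {row_id: row_vector} into n contiguous row blocks, one per server."""
    ids = sorted(matrix)
    size = -(-len(ids) // n_servers)  # ceiling division
    return [{rid: matrix[rid] for rid in ids[k * size:(k + 1) * size]}
            for k in range(n_servers)]


def column_blocks(matrix, n_servers):
    """Column blocking: transpose, then split the transpose by rows."""
    ids = sorted(matrix)
    n_cols = len(next(iter(matrix.values())))
    transposed = {c: [matrix[rid][c] for rid in ids] for c in range(n_cols)}
    return row_blocks(transposed, n_servers)


R = {
    "File1": [70, 60, 80, 90, 0],
    "File2": [40, 90, 50, 0, 70],
    "File3": [0, 70, 80, 80, 0],
}
print([list(b) for b in row_blocks(R, 2)])  # [['File1', 'File2'], ['File3']]
```

Row blocking suits the Userbased case, where each server owns whole user rows; column blocking keeps each item's ratings together, which fits the Itembased case.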
Before this step, the embodiment of the present invention further comprises:
Each calculation server computes the collaborative filtering matrix from the subject-item weight matrix.
In embodiments of the present invention, there may be N participating calculation servers, where N is greater than or equal to 2.
Step 120: each participating calculation server determines, according to the subject-item weight matrix and the collaborative filtering matrix, each server on which the current calculation server depends, and the components of the block matrix data on each depended-on calculation server;
That is, based on the subject-item weight matrix and the collaborative filtering matrix, each participating calculation server determines which one or more calculation servers it depends on, and which components of the block matrices on those servers it needs.
Optionally, the step in which each participating calculation server determines, according to the subject-item weight matrix and the collaborative filtering matrix, each server on which it depends and the components of the block matrix data on each depended-on server comprises:
Step S121: each participating calculation server obtains the component identifiers in the block matrix data processed by all other calculation servers;
In embodiments of the present invention, each calculation server needs to determine which calculation servers it depends on. To do so, it needs to know the row identifier of each row component of each matrix (if the matrix data obtained in Step 110 are partitioned by rows), or the column identifier of each column component of each matrix (if the matrix data obtained in Step 110 are partitioned by columns).
Therefore, in the present invention, the server that stores the source matrix data may send each participating calculation server the identity of the calculation server that processes each block together with the identifiers of the corresponding components; alternatively, each calculation server may itself send the identifiers of the components it processes to all other calculation servers. When sending, each calculation server can call MPI (Message Passing Interface, a message-passing programming interface for which function libraries implementing it are available in multiple languages).
Step S122: each participating calculation server determines each calculation server on which the current calculation server depends, and the component identifiers of the block matrix data on each depended-on server, according to the block matrix data in the collaborative filtering distance matrix and/or the subject-item weight block matrix data, together with the component identifiers in the block matrix data processed by all other calculation servers;
In the Userbased case, once each participating calculation server has obtained the block matrix data of the user similarity matrix it processes and its block matrix data of the user ID-item weight matrix, and has obtained the component identifiers in the block matrix data processed by all other calculation servers, it can determine from these the components it depends on and the one or more calculation servers that hold them.
In the Itembased case, once each participating calculation server has obtained the block matrix data of the item similarity matrix it processes and its block matrix data of the user ID-item weight matrix, and has obtained the component identifiers in the block matrix data processed by all other calculation servers, it can likewise determine the components it depends on and the one or more calculation servers that hold them.
Step 130: each participating calculation server sends the components that other servers depend on to each calculation server that depends on them, and receives the components sent by the other calculation servers;
After each calculation server has determined the components of each block matrix on the one or more calculation servers it depends on, each depended-on server sends those components to every calculation server that depends on them.
Optionally, the method further comprises:
Step S131: each participating calculation server sends the component identifiers of the block matrix data on each depended-on calculation server to the respective depended-on server;
After each participating calculation server has determined which calculation servers it depends on and which components on those servers it needs, it notifies each depended-on server which vectors must be sent to the current server. That is, each participating calculation server sends the component identifiers of the block matrix data on each depended-on calculation server to the respective depended-on server.
The participating calculation servers also send and receive these component identifiers by calling MPI.
Further, the step in which each participating calculation server sends the depended-on components to each calculation server that depends on them, and receives the components sent by the other calculation servers, comprises:
Step S132: according to the component identifiers it has been asked for, each participating calculation server sends the corresponding components to each calculation server that depends on them, and receives the components sent by the other calculation servers;
Each participating calculation server calls MPI to send the depended-on components to the calculation servers that depend on them, and receives the components sent by the other servers.
Step 160: for each subject in its local subject-item weight block matrix data, each participating calculation server recommends at least one item to the subject using the local subject-item weight block matrix data, the local collaborative filtering block matrix data, and the received components, according to the correspondence among them.
After each participating calculation server has received the components it needs, it performs processing such as transposition, sorting, and summation to obtain the final recommendation data.
With reference to Fig. 2, it illustrates the schematic flow sheet of a kind of disposal route embodiment two based on collaborative filtering of the present invention, specifically can comprise:
Step 200, receives the request recommending at least one project for main body, starts at least two calculation servers carry out recommendation computation process according to described request;
In embodiments of the present invention, described main body can user ID in such as network, so for user ID used or original various project in a network, system or user then can ask to recommend certain or certain several project for each user ID, such as the product bought in a network, recommend Related product to user.
At least two calculation servers so carry out recommendation computation process and comprise:
Step 210, at least two servers, the calculation server that each participation calculates obtains various partitioned matrix data; Described each partitioned matrix data comprise the partitioned matrix data of user ID-project main body-Term Weight matrix and user ID-weight equal value partitioning of matrix data, user ID similar matrix;
This embodiment describes the concrete parallel procedure for the user-based (Userbased) case. For clarity, the user-based collaborative filtering recommendation computation is introduced first:
Referring to Fig. 3, which compares the matrices used in the user-based computation: 201 is the user-item weight matrix R of the ratings users give to items; 202 is the transpose R' of R; 203 is the vector of each user's mean rating, i.e., the user mean-weight matrix A, together with its transpose A'; and 204 is the user similarity matrix S, i.e., the similarity between every pair of users.
The rating of user u for item i is predicted from the similarity matrix by formula (1), where sim(u, u') is the similarity between users u and u' and can be calculated with cosine similarity, the Pearson correlation coefficient, or a similar algorithm:

$$r_{u,i} = \bar{r}_u + \frac{\sum_{u' \in U} \mathrm{sim}(u,u')\,\left(r_{u',i} - \bar{r}_{u'}\right)}{\sum_{u' \in U} \mathrm{sim}(u,u')} \qquad (1)$$
The computation proceeds roughly as follows:
1. From R, take the row for u and find the items u has not rated; these form the full candidate set I.
2. From R', obtain the ratings of each user u', and from A', obtain the mean rating of u'. For each item i in I, compute sim(u, u')(r_{u',i} - avg(r_{u'})).
3. For all u' ∈ U, compute sim(u, u')(r_{u',i} - avg(r_{u'})) as in step 2, together with sim(u, u'), and sum each.
4. Take the mean rating of u from A and substitute the sums from step 3 into formula (1) to obtain the predicted values for all items in I.
5. Select the items in I with the highest predicted ratings for u, as required, to obtain the final recommendations.
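The five steps and formula (1) can be illustrated with a small single-machine sketch. It assumes a hypothetical sparse layout in which R is a dictionary of per-user ratings, A is derived from it, and S is given as a dictionary of rows. It also assumes that only users u' who have rated item i contribute to the sums, which the formula leaves implicit. The function names are illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical sparse layout: one row of R per user, only rated items stored.
Ratings = Dict[str, Dict[str, float]]


def mean_rating(ratings: Ratings, user: str) -> float:
    """Entry of the user mean-weight matrix A for one user."""
    row = ratings[user]
    return sum(row.values()) / len(row)


def predict_user_based(ratings: Ratings, sim: Dict[str, Dict[str, float]],
                       u: str, top_k: int = 2) -> List[Tuple[str, float]]:
    """Score u's unrated items with formula (1) and return the top_k of them."""
    all_items = {i for row in ratings.values() for i in row}
    candidates = all_items - set(ratings[u])            # step 1: candidate set I
    mean_u = mean_rating(ratings, u)
    scores: Dict[str, float] = {}
    for i in candidates:
        num = den = 0.0
        for v, s in sim.get(u, {}).items():             # steps 2-3: sum over users u'
            if v == u or i not in ratings[v]:
                continue                                # assumption: u' must have rated i
            num += s * (ratings[v][i] - mean_rating(ratings, v))
            den += s
        if den != 0:
            scores[i] = mean_u + num / den              # step 4: formula (1)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]                               # step 5: highest predictions
```

Ties in step 5 are broken by item identifier, which is an arbitrary choice made only to keep the output deterministic.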
Accordingly, in step 210 each participating calculation server obtains the partitions assigned to it of the user-item weight matrix, the user mean-weight matrix, and the user similarity matrix.
The partition of the user mean-weight matrix can be computed from the partition of the user-item weight matrix, or it can be precomputed. The user similarity matrix can likewise be computed from the user-item weight matrix.
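As one way of deriving A and S from R, the sketch below computes each user's mean rating and a cosine user similarity (cosine being one of the measures named for formula (1)). It is a minimal sketch assuming the same hypothetical dictionary layout, with unrated items treated as zero in the cosine; a Pearson-based similarity could be substituted.

```python
import math
from typing import Dict

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}, rated items only


def user_means(ratings: Ratings) -> Dict[str, float]:
    """User mean-weight matrix A: each user's average rating."""
    return {u: sum(row.values()) / len(row) for u, row in ratings.items()}


def cosine_user_similarity(ratings: Ratings) -> Dict[str, Dict[str, float]]:
    """User similarity matrix S using cosine similarity; unrated items count as 0."""
    norms = {u: math.sqrt(sum(x * x for x in row.values())) for u, row in ratings.items()}
    sim: Dict[str, Dict[str, float]] = {}
    for a, row_a in ratings.items():
        sim[a] = {}
        for b, row_b in ratings.items():
            # Dot product over the items both users have rated.
            dot = sum(r * row_b[i] for i, r in row_a.items() if i in row_b)
            denom = norms[a] * norms[b]
            sim[a][b] = dot / denom if denom else 0.0
    return sim
```

This computes the full matrix on one machine; in the partitioned setting each server would compute only the rows of S it is assigned.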
For ease of explanation, the partitioned matrices are preferably obtained by partitioning the matrices of Fig. 3 by rows.
Step 220: each participating calculation server obtains the component identifiers in the partitioned matrix data processed by all other calculation servers;
In embodiments of the present invention, each participating calculation server obtains the row identifier of every row component in the user-item weight partitions held by the other calculation servers, together with the row identifiers of the user mean-weight partitions held by those servers.
Step 230: each participating calculation server prunes its partition of the user similarity matrix;
Because the number of users is very large, users with low similarity to the user of the current row are pruned in this step so that they do not degrade computational efficiency.
Optionally, this step comprises:
Step S11: for each row (or column) of the partition of the user similarity matrix, sort the values of its dimensions and retain at least one top-ranked dimension in each row (or column).
For example, in the matrix S above, the first row corresponds to user u_0 and holds the similarity components Sim(u_0, u_0), Sim(u_0, u_1), ..., Sim(u_0, u_m); the n largest similarity values in this row can be retained and the remaining values set to empty.
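Step S11 amounts to a per-row top-n filter. A minimal sketch, assuming S is stored as a dictionary of rows, that "set to empty" means the entry is dropped, and that ties are broken by column identifier (the source does not specify tie handling); `prune_top_n` is a hypothetical name.

```python
from typing import Dict


def prune_top_n(sim: Dict[str, Dict[str, float]], n: int) -> Dict[str, Dict[str, float]]:
    """Keep the n largest entries of every row of S; all other entries are dropped."""
    pruned: Dict[str, Dict[str, float]] = {}
    for u, row in sim.items():
        # Highest similarity first; ties broken by column id (an assumption).
        ranked = sorted(row.items(), key=lambda kv: (-kv[1], kv[0]))
        pruned[u] = dict(ranked[:n])
    return pruned
```

With n = 4 and a row in which u_3 has the smallest value, the row keeps (u_0, u_1, u_2, -, u_4), the pattern used in the Fig. 4 example.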
Step 240: from the pruned partition of the user similarity matrix and the component identifiers of the partitions processed by all other calculation servers, each participating calculation server determines which calculation servers it relies on and the component identifiers of the partitioned matrix data it needs from each of them.
For the current calculation server, its partition of the user similarity matrix S contains all column identifiers, but after pruning each row holds values for only some of the columns. According to the computation principle above, the user-item weight matrix and the user mean-weight matrix must be transposed, so that the row identifiers of these two matrices, after transposition, are aligned with the column identifiers of S. This identifies the row identifiers that are actually needed, and the correspondence between row identifiers and calculation servers then determines which calculation servers the current server relies on and which component identifiers it needs from each of them.
Optionally, this step comprises:
Step S21: each participating calculation server transposes the row (or column) identifiers of the user-item weight matrix and the user mean-weight matrix;
Referring to Fig. 4, which shows an example of how the embodiment determines the calculation servers relied upon and the vectors needed. Fig. 4 depicts two calculation servers with the data distributed by rows as described above: the odd-numbered rows are assigned to calculation server N0 and the even-numbered rows to calculation server N1. Because pruning discards some entries of the similarity matrix S, S becomes a sparse matrix. The left side of the figure shows matrix R and vector A. The computation uses the transposes R' and A'; since transposition changes only the position of the data and not the data itself, no additional storage is needed, and the row/column conversion is performed when the data is fetched. For ease of illustration, R and A are drawn in transposed form at the top of the figure.
The embodiments describe distributing the data to calculation servers by rows; distributing it by columns follows a similar computation and requires only transposition, so it is not described in detail here.
In Fig. 4, the original user-item weight matrix R has 5 rows, each with a row identifier (row number) u_0, u_1, u_2, u_3, u_4; the user mean-weight matrix A correspondingly has 5 rows with identifiers u_0, u_1, u_2, u_3, u_4; and the user similarity matrix S has 5 columns, each with a column identifier (column number) u_0, u_1, u_2, u_3, u_4.
Once N0 has obtained the row identifiers u_1 and u_3 of the part of R computed by N1, it has all row identifiers of R; likewise, once it has obtained the row identifiers u_1 and u_3 of the part of A computed by N1, it has all row identifiers of A. N0 then transposes the row identifiers (u_0, u_1, u_2, u_3, u_4)' of R into (u_0, u_1, u_2, u_3, u_4), and transposes the row identifiers (u_0, u_1, u_2, u_3, u_4)' of A into (u_0, u_1, u_2, u_3, u_4).
Step S22: align the transposed row identifiers with the column identifiers of the user similarity matrix, or align the transposed column identifiers with the row identifiers of the user similarity matrix;
The transposed row identifiers of R and A are aligned with the column identifiers (u_0, u_1, u_2, u_3, u_4) of S.
Step S23: for the dimensions retained in each current row (or column), mark the corresponding row (or column) identifiers;
Step S24: compare the marked row (or column) identifiers with the row (or column) identifiers in the locally obtained partitions of the user-item weight matrix and the user mean-weight matrix, and determine which identifiers are not available locally;
Step S25: from the calculation servers to which the locally missing identifiers belong, determine the calculation servers the current server relies on and the component identifiers of the user-item weight matrix and the user mean-weight matrix needed from each of them.
For calculation server N0, the rows of S it processes are the first, third and fifth rows, each of which retains (u_0, u_1, u_2, -, u_4) after pruning. The rows of R and A it needs are therefore u_0, u_1, u_2 and u_4. As established above, u_1 is held by N1, so N0 relies on the row component u_1 in N1.
For calculation server N1, the rows of S it processes are the second and fourth rows, each of which retains (-, u_1, u_2, u_3, u_4). The rows of R and A it needs are therefore u_1, u_2, u_3 and u_4. As established above, u_2 and u_4 are held by N0, so N1 relies on the row components u_2 and u_4 in N0.
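Steps S21 to S25 reduce to set operations once the identifiers have been gathered. Because the column identifiers of S and the row identifiers of R and A are the same user identifiers, the transposition and alignment of S21/S22 can be modelled, as an assumption of this sketch, by reading each retained column identifier of S directly as a needed row of R and A. The `owner` map (identifier to server) is a hypothetical structure built from the identifiers exchanged in step 220.

```python
from typing import Dict, Set


def find_dependencies(me: int,
                      pruned_rows: Dict[str, Dict[str, float]],
                      owner: Dict[str, int]) -> Dict[int, Set[str]]:
    """Return {remote server: row ids of R and A that server `me` needs from it}."""
    needed: Set[str] = set()
    for row in pruned_rows.values():
        # S22/S23: each retained column id of S names a row of R and A to use.
        needed.update(row)
    deps: Dict[int, Set[str]] = {}
    for uid in sorted(needed):
        server = owner[uid]            # S24: is this row held locally?
        if server != me:               # S25: record the remote server holding it
            deps.setdefault(server, set()).add(uid)
    return deps
```

With the Fig. 4 data, N0 obtains {N1: {u_1}} and N1 obtains {N0: {u_2, u_4}}, matching the worked example.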
Step 250: each participating calculation server sends the component identifiers it needs to each calculation server it relies on;
N0 therefore notifies N1 that the u_1 row component must be sent to N0, and N1 notifies N0 that the u_2 and u_4 row components must be sent to N1.
Step 260: according to the received component identifiers, each participating calculation server sends the corresponding components to each calculation server that relies on them, and receives the components sent by the other calculation servers;
Continuing the example, N0 sends (u_2, u_4) to N1, which receives them, and N1 sends (u_1) to N0, which receives it.
Step 270: for each subject in its local partition of the subject-item weight matrix, each participating calculation server recommends at least one item to that subject, using the correspondence among its local subject-item weight partition, its local collaborative filtering matrix partition, and the received components.
Each computing node then computes the recommended items for every user u_i on that node according to formula (1): N0 computes recommendations for u_0, u_2 and u_4, and N1 computes recommendations for u_1 and u_3.
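To see why the dependency sets are sufficient, the per-node computation of step 270 can be simulated: each node evaluates formula (1) for its own users while reading only its local rows of R and A plus the rows named by its pruned rows of S, and the merged result is compared with a single-machine run. This is an illustrative sketch; the data, similarity values and function names are hypothetical, and the exchange is simulated by taking subsets of global dictionaries.

```python
from typing import Dict, Set

Ratings = Dict[str, Dict[str, float]]


def predict(rows: Ratings, means: Dict[str, float], sim_row: Dict[str, float],
            u: str, items: Set[str]) -> Dict[str, float]:
    """Formula (1) for user u, reading only the R rows and A entries passed in."""
    out: Dict[str, float] = {}
    for i in sorted(items - set(rows[u])):
        num = den = 0.0
        for v, s in sim_row.items():
            if v != u and i in rows[v]:
                num += s * (rows[v][i] - means[v])
                den += s
        if den != 0:
            out[i] = means[u] + num / den
    return out


def run_node(me: int, owner: Dict[str, int], R: Ratings, A: Dict[str, float],
             S_pruned: Dict[str, Dict[str, float]], items: Set[str]):
    """Predictions for the users assigned to server `me`."""
    local = [u for u in R if owner[u] == me]
    needed = {v for u in local for v in S_pruned[u]}   # ids named by the local rows of S
    visible = needed | set(local)                      # local rows + received rows
    rows = {v: R[v] for v in visible}                  # simulated receipt of R rows
    means = {v: A[v] for v in visible}                 # ... and of A entries
    return {u: predict(rows, means, S_pruned[u], u, items) for u in local}
```

If a node could produce the same predictions as the single-machine run while seeing only `visible`, then the rows it requested were all it needed.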
The calculation servers send and receive data among themselves by calling the Message Passing Interface (MPI).
This embodiment is a preferred embodiment for the user-based case; the order of some steps may be changed as circumstances require and is not limited here.
Referring to Fig. 5, a flow diagram of Embodiment 3 of the collaborative-filtering-based processing method of the present invention is shown, which may specifically comprise:
Step 300: receive a request to recommend at least one item to a subject, and start at least two calculation servers to perform the recommendation computation according to the request;
In embodiments of the present invention, the subject may be, for example, a user ID in a network. For the items a user ID has used or already has in the network, the system or the user may request that one or more items be recommended for each user ID; for example, based on the products a user has bought online, related products are recommended to that user.
The recommendation computation performed by the at least two calculation servers comprises:
Step 310: among N calculation servers, each participating calculation server obtains its partitioned matrix data, which comprise partitions of the user-item weight matrix and of the item-item similarity matrix;
Referring to Fig. 6, which illustrates the computation principle of the item-based (Itembased) embodiment: 301 and 302 are the user-item weight matrix R, and 303 is the item-item similarity matrix S.
Its computing formula is as follows:
$$r_{u,i} = \bar{r}_u + \frac{\sum_{i' \in I} \mathrm{sim}(i,i')\, r_{u,i'}}{\sum_{i' \in I} \mathrm{sim}(i,i')} \qquad (2)$$
The computation proceeds as follows:
1. From R, take the row for u and find the items u has not rated; these form the full candidate set I.
2. For each item i in I, take the row of the similarity matrix for i to obtain the set of items similar to i, and sum according to formula (2) to obtain the predicted rating.
3. Select the items in I with the highest predicted ratings for u, as required, to obtain the final recommendations.
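A single-machine sketch of the item-based steps follows formula (2) as written, including its mean-rating term. It assumes that the sums run over the items i' that u has rated and that appear in row i of S (step 2 speaks of "the set of items similar to i"), and that S has a row for every item; the dictionary layout and function name are hypothetical.

```python
from typing import Dict, List, Tuple


def predict_item_based(ratings_u: Dict[str, float],
                       sim: Dict[str, Dict[str, float]],
                       top_k: int = 2) -> List[Tuple[str, float]]:
    """ratings_u: one row of R (items rated by u); sim[i]: row i of the item-item matrix S."""
    mean_u = sum(ratings_u.values()) / len(ratings_u)
    candidates = set(sim) - set(ratings_u)               # step 1: candidate set I
    scores: Dict[str, float] = {}
    for i in candidates:                                  # step 2: use row i of S
        num = den = 0.0
        for j, s in sim[i].items():
            if j in ratings_u:                            # assumption: only items u rated
                num += s * ratings_u[j]
                den += s
        if den != 0:
            scores[i] = mean_u + num / den                # formula (2) as written
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]                                 # step 3: highest predictions
```

Because the mean-rating term is the same for all of one user's candidates, it shifts the predicted values but does not change their ranking in step 3.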
In embodiments of the present invention, the item-item similarity matrix S can be computed from the user-item weight matrix R.
For ease of explanation, this embodiment also partitions the matrices by rows and supplies the partitions to the calculation servers. Partitioning by columns follows the same computation principle and procedure as partitioning by rows, requiring only the corresponding transposition, and is not described in detail in this or the subsequent steps.
Step 320: each participating calculation server obtains the component identifiers in the partitioned matrix data processed by all other calculation servers;
That is, each calculation server sends the row identifiers of the S partition it has obtained to the other N-1 servers.
Step 330: each participating calculation server computes a candidate recommendation set from the user-item weight matrix;
From its current partition of R, each participating calculation server takes the row for u and finds the items u has not rated, which form the candidate recommendation set I.
Step 340: from the candidate recommendation set, the partition of the item-item similarity matrix, and the partition of the user-item weight matrix, each participating calculation server determines the calculation servers it relies on and the component identifiers of the partitioned matrix data it needs from each of them;
The vector of row identifiers of the whole matrix S is transposed and aligned with the column identifiers of R. Using the dimensions of R that correspond to the candidate set I, the server determines which row components of S it needs and which calculation servers hold them.
Optionally, this step comprises:
Step S31: each participating calculation server transposes the row (or column) identifiers of the item-item similarity matrix;
For example, calculation server N0 holds rows u_0, u_2, u_4 of R and rows i_0, i_2, i_4 of S, and calculation server N1 holds rows u_1, u_3 of R and rows i_1, i_3 of S. After N0 obtains the row identifiers from N1, it transposes the vector of all row identifiers (i_0, i_1, i_2, i_3, i_4)'.
Step S32: align the transposed row identifiers with the column identifiers of the user-item weight matrix, or align the transposed column identifiers with the row identifiers of the associated user similarity matrix;
Calculation server N0 aligns the transposed row identifiers of S with a row of R, i.e., with each of the rows (u_0, u_1, u_2, u_3, u_4) of R.
Step S33: compare the marked row (or column) identifiers with the row (or column) identifiers in the locally obtained partitions of the user-item weight matrix and the user mean-weight matrix, and determine which identifiers are not available locally;
For example, for row u_0 in N0 the candidate set is (i_2, i_3, i_4), of which i_3 is in N1; for row u_2 the candidate set is (i_2, i_3, i_4), of which i_3 is in N1; and for row u_4 the candidate set is (i_0, i_3, i_4), of which i_3 is in N1.
For row u_1 in N1, the candidate set is (i_1, i_3, i_4), of which i_4 is in N0; for row u_3, the candidate set is (i_0, i_3, i_4), of which i_0 and i_4 are in N0.
Step S34: from the calculation servers to which the locally missing identifiers belong, determine the calculation servers the current server relies on and the component identifiers of the user-item weight matrix and the user mean-weight matrix needed from each of them.
Hence N0 relies on i_3 from N1, and N1 relies on i_0 and i_4 from N0.
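The dependency rule of steps S31 to S34 can likewise be sketched with sets: a server needs row i of S for every item i in the candidate sets of its local users, and any such row held elsewhere is a dependency. The sketch assumes a hypothetical item-to-server map; the rating values in the example are placeholders, since only which items were rated matters here.

```python
from typing import Dict, Set


def item_dependencies(me: int, local_R: Dict[str, Dict[str, float]],
                      owner_of_item: Dict[str, int]) -> Dict[int, Set[str]]:
    """Return {remote server: item ids whose rows of S server `me` needs from it}."""
    all_items = set(owner_of_item)
    needed: Set[str] = set()
    for row in local_R.values():
        needed |= all_items - set(row)       # step 330: this user's candidate set
    deps: Dict[int, Set[str]] = {}
    for item in sorted(needed):
        server = owner_of_item[item]         # server holding row `item` of S
        if server != me:
            deps.setdefault(server, set()).add(item)
    return deps
```

With ratings chosen to reproduce the candidate sets listed above, N0 obtains {N1: {i_3}} and N1 obtains {N0: {i_0, i_4}}.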
Step 350: each participating calculation server sends the component identifiers it needs to each calculation server it relies on;
N0 therefore notifies N1 to send the i_3 row vector to N0, and N1 notifies N0 to send the i_0 and i_4 row vectors to N1.
Step 360: according to the received component identifiers, each participating calculation server sends the corresponding components to each calculation server that relies on them, and receives the components sent by the other calculation servers;
N1 then sends the i_3 row vector to N0, and N0 sends the i_0 and i_4 row vectors to N1.
Step 370: for each subject in its local partition of the subject-item weight matrix, each participating calculation server recommends at least one item to that subject, using the correspondence among its local subject-item weight partition, its local collaborative filtering matrix partition, and the received components.
Then, using formula (2) with the local weight partition, the local collaborative filtering partition, and the received components, the recommended items are computed for each u_i.
This embodiment is a preferred embodiment for the item-based case; the order of some steps may be changed as circumstances require and is not limited here.
Referring to Fig. 7, a structural diagram of Embodiment 1 of the collaborative-filtering-based processing system of the present invention is shown, which may specifically comprise:
A request receiving module 700 and at least two calculation servers;
The request receiving module 700 is adapted to receive a request to recommend at least one item to a subject and to start at least two calculation servers according to the request;
Of the at least two calculation servers, each calculation server comprises:
A partitioned matrix data acquisition module 710, adapted to cause each participating calculation server among the at least two calculation servers to obtain its partitioned matrix data, which comprise a partition of the subject-item weight matrix and a partition of the collaborative filtering matrix;
A relied-on calculation server confirmation module 720, adapted to cause each participating calculation server to determine, from the subject-item weight matrix and the collaborative filtering matrix, the servers the current calculation server relies on and the components of the partitioned matrix data it needs from each of them;
A component sending/receiving module 730, adapted to cause each participating calculation server to send the components that other servers rely on to each calculation server relying on them, and to receive the components sent by the other calculation servers;
A recommended item computation module 740, adapted to cause each participating calculation server, for each subject in its local partition of the subject-item weight matrix, to recommend at least one item to that subject using the correspondence among the local subject-item weight partition, the local collaborative filtering partition, and the received components.
Optionally, the relied-on calculation server confirmation module comprises:
A component identifier acquisition module, adapted to cause each participating calculation server to obtain the component identifiers in the partitioned matrix data processed by all other calculation servers;
A first relied-on server confirmation module, adapted to cause each participating calculation server to determine, from its partition of the collaborative filtering distance matrix and/or its partition of the subject-item weight matrix, together with the component identifiers of the partitions processed by all other calculation servers, the calculation servers the current server relies on and the component identifiers of the partitioned matrix data it needs from each of them.
Optionally, each calculation server further comprises:
An identifier sending module, adapted to cause each participating calculation server to send the component identifiers it needs to each calculation server it relies on;
Further, the component sending/receiving module comprises:
A first component sending/receiving module, adapted to cause each participating calculation server to send, according to the received component identifiers, the corresponding components to each calculation server that relies on them, and to receive the components sent by the other calculation servers.
Optionally, the subject-item weight matrix comprises the user-item weight matrix and the user mean-weight matrix;
the collaborative filtering matrix is the user similarity matrix corresponding to the user-item weight matrix;
further, the components comprise components of the user-item weight matrix and components of the user mean-weight matrix.
Optionally, the first relied-on server confirmation module comprises:
A pruning module, adapted to cause each participating calculation server to prune its partition of the user similarity matrix;
A second relied-on server confirmation module, adapted to cause each participating calculation server to determine, from its pruned partition of the user similarity matrix and the component identifiers of the partitions processed by all other calculation servers, the calculation servers the current server relies on and the component identifiers of the partitioned matrix data it needs from each of them.
Optionally, the pruning module comprises:
A first pruning module, adapted to sort, for each row (or column) of the partition of the user similarity matrix, the values of its dimensions and to retain at least one top-ranked dimension in each row (or column).
Optionally, the second relied-on server confirmation module comprises:
A first row/column identifier transposition module, adapted to cause each participating calculation server to transpose the row (or column) identifiers of the user-item weight matrix and the user mean-weight matrix;
A first row/column identifier alignment module, adapted to align the transposed row identifiers with the column identifiers of the user similarity matrix, or to align the transposed column identifiers with the row identifiers of the user similarity matrix;
A first retention module, adapted to mark, for the dimensions retained in each current row (or column), the corresponding row (or column) identifiers;
A first judgment module, adapted to compare the marked row (or column) identifiers with the row (or column) identifiers in the locally obtained partitions of the user-item weight matrix and the user mean-weight matrix, and to determine which identifiers are not available locally;
A third relied-on server confirmation module, adapted to determine, from the calculation servers to which the locally missing identifiers belong, the calculation servers the current server relies on and the component identifiers of the user-item weight matrix and the user mean-weight matrix needed from each of them.
Optionally, the subject-item weight matrix comprises the user-item weight matrix;
the collaborative filtering matrix is the item-item similarity matrix corresponding to the user-item weight matrix;
further, the components comprise components of the item-item similarity matrix.
Optionally, the relied-on server confirmation module comprises:
A first candidate recommendation set computation module, adapted to cause each participating calculation server to compute a candidate recommendation set from the user-item weight matrix;
A fourth relied-on server confirmation module, adapted to cause each participating calculation server to determine, from the candidate recommendation set, the partition of the item-item similarity matrix, and the partition of the user-item weight matrix, the calculation servers the current server relies on and the component identifiers of the partitioned matrix data it needs from each of them.
Optionally, the fourth relied-on server confirmation module comprises:
A second row/column identifier transposition module, adapted to cause each participating calculation server to transpose the row (or column) identifiers of the item-item similarity matrix;
A second row/column identifier alignment module, adapted to align the transposed row identifiers with the column identifiers of the user-item weight matrix, or to align the transposed column identifiers with the row identifiers of the associated user similarity matrix;
A second judgment module, adapted to compare the marked row (or column) identifiers with the row (or column) identifiers in the locally obtained partitions of the user-item weight matrix and the user mean-weight matrix, and to determine which identifiers are not available locally;
A fifth relied-on server confirmation module, adapted to determine, from the calculation servers to which the locally missing identifiers belong, the calculation servers the current server relies on and the component identifiers of the user-item weight matrix and the user mean-weight matrix needed from each of them.
Optionally, the calculation servers send and receive data among themselves by calling the Message Passing Interface (MPI).
Referring to Fig. 8, a schematic structural diagram of Embodiment 2 of the collaborative filtering-based processing system according to an embodiment of the present invention is shown, which may specifically include:
A request receiving module 800 and at least two computing servers;
The request receiving module 800 is adapted to receive a request to recommend at least one item for a subject, and to start at least two computing servers according to the request;
Among the at least two computing servers, each computing server comprises:
First block matrix data acquisition module 810, adapted to cause each computing server participating in the computation to obtain its block matrix data; the block matrix data comprise the block matrix data of the user ID-item weight matrix, of the user ID-weight mean matrix and of the user ID similarity matrix;
First component identifier acquisition module 820, adapted to cause each participating computing server to obtain the component identifiers in the block matrix data processed by all other computing servers;
Pruning module 830, adapted to cause each participating computing server to prune its block matrix data of the user ID similarity matrix;
Second dependent-server confirmation module 840, adapted to cause each participating computing server to determine, from its pruned block matrix data of the user ID similarity matrix and the component identifiers in the block matrix data processed by all other computing servers, each computing server on which the current computing server depends, and the component identifiers of the block matrix data on each of those servers;
First identifier sending module 850, adapted to cause each participating computing server to send the component identifiers of the block matrix data it depends on to each computing server on which it depends;
First component sending/receiving module 860, adapted to cause each participating computing server to send, according to the component identifiers depended upon, the corresponding components to each computing server that depends on them, and to receive the components sent by the other computing servers;
First recommended item computation module 870, adapted to cause each participating computing server, for each subject in its local subject-item weight block matrix data, to recommend at least one item to the subject using the local subject-item weight block matrix data and the correspondence between the local collaborative filtering block matrix data and the received components.
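One way the recommended item computation of Embodiment 2 could combine the local weights, the user ID-weight mean components and the pruned similarity rows is the mean-centered, similarity-weighted average commonly used in user-based collaborative filtering. The patent does not specify a scoring formula, so the formula, the function name `recommend_user_based` and the dictionary layout below are assumptions made for illustration:

```python
def recommend_user_based(user, weights, means, sim_row, n=1):
    """Score items `user` has not weighted yet, using pruned neighbor similarities.

    weights: {user_id: {item_id: weight}}, local rows plus rows received from other servers
    means:   {user_id: mean weight}, local plus received user ID-weight mean components
    sim_row: {neighbor_id: similarity}, the pruned similarity row of `user`
    Assumed score: mean(u) + sum_v s(u,v) * (w(v,i) - mean(v)) / sum_v |s(u,v)|
    """
    seen = set(weights.get(user, {}))
    num, den = {}, {}
    for v, s in sim_row.items():
        for item, w in weights.get(v, {}).items():
            if item in seen:
                continue  # only score items the user has not interacted with
            num[item] = num.get(item, 0.0) + s * (w - means[v])
            den[item] = den.get(item, 0.0) + abs(s)
    scores = {i: means[user] + num[i] / den[i] for i in num if den[i] > 0}
    # Highest score first; ties broken by item id for a stable result.
    ranked = sorted(scores, key=lambda i: (-scores[i], i))
    return ranked[:n], scores
```

Dividing by the sum of absolute similarities keeps the adjustment on the same scale as the weights even when only a few neighbors survive pruning.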
Referring to Fig. 9, a schematic structural diagram of Embodiment 3 of the collaborative filtering-based processing system according to an embodiment of the present invention is shown, which may specifically include:
A request receiving module 900 and at least two computing servers;
The request receiving module 900 is adapted to receive a request to recommend at least one item for a subject, and to start at least two computing servers according to the request;
Among the at least two computing servers, each computing server comprises:
Second block matrix data acquisition module 910, adapted to cause each computing server participating in the computation to obtain its block matrix data; the block matrix data comprise the block matrix data of the user ID-item weight matrix, of the user ID-weight mean matrix and of the user ID similarity matrix;
Second component identifier acquisition module 920, adapted to cause each participating computing server to obtain the component identifiers in the block matrix data processed by all other computing servers;
First candidate recommendation set computation module 930, adapted to cause each participating computing server to compute a candidate recommendation set from the user ID-item weight matrix;
Fourth dependent-server confirmation module 940, adapted to cause each participating computing server to determine, from the candidate recommendation set and the block matrix data of the item-item similarity matrix and the user ID-item weight matrix, each computing server on which the current computing server depends, and the component identifiers of the block matrix data on each of those servers;
Second identifier sending module 950, adapted to cause each participating computing server to send the component identifiers of the block matrix data it depends on to each computing server on which it depends;
Second component sending/receiving module 960, adapted to cause each participating computing server to send, according to the component identifiers depended upon, the corresponding components to each computing server that depends on them, and to receive the components sent by the other computing servers;
Second recommended item computation module 970, adapted to cause each participating computing server, for each subject in its local subject-item weight block matrix data, to recommend at least one item to the subject using the local subject-item weight block matrix data and the correspondence between the local collaborative filtering block matrix data and the received components.
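For the item-based variant of Embodiment 3, a natural reading of the recommended item computation is a similarity-weighted average over the items the user has already weighted. The sketch below assumes that formula (the patent does not state one) and a dictionary layout in which each received item-item similarity row is keyed by the candidate item; `recommend_item_based` is a hypothetical name:

```python
def recommend_item_based(user_weights, item_sim, candidates, n=1):
    """Score candidate items for one user with item-item similarities.

    user_weights: {item_id: weight}, this user's row of the user ID-item weight matrix
    item_sim:     {item_id: {other_item_id: similarity}}, local plus received rows
    candidates:   items in this user's candidate recommendation set
    Assumed score: sum_j s(c,j) * w(u,j) / sum_j |s(c,j)| over items j the user has weighted
    """
    scores = {}
    for c in candidates:
        row = item_sim.get(c, {})
        shared = [j for j in user_weights if j in row]
        den = sum(abs(row[j]) for j in shared)
        if den > 0:  # skip candidates with no similarity to anything the user weighted
            scores[c] = sum(row[j] * user_weights[j] for j in shared) / den
    # Highest score first; ties broken by item id for a stable result.
    ranked = sorted(scores, key=lambda i: (-scores[i], i))
    return ranked[:n], scores
```

A candidate whose similarity row never arrived (for example because no server reported owning it) simply receives no score rather than raising an error.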
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein can be implemented in a variety of programming languages, and that the above description of a specific language is given to disclose the best mode of the invention.
Numerous specific details are set forth in the description provided herein. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid in understanding one or more of the various inventive aspects, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components of an embodiment may be combined into one module, unit or component, and may furthermore be divided into a plurality of sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent or a similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described herein include certain features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the collaborative filtering-based processing system according to embodiments of the present invention. The invention may also be implemented as device or apparatus programs (for example, computer programs and computer program products) for performing part or all of the methods described herein. Such programs implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.

Claims (20)

1. A collaborative filtering-based processing method, comprising:
Receiving a request to recommend at least one item for a subject, and starting at least two computing servers according to the request to perform a recommendation computation process, the process comprising:
For the at least two computing servers, each computing server participating in the computation obtains its block matrix data, the block matrix data comprising subject-item weight block matrix data and collaborative filtering block matrix data;
Each participating computing server determines, from its local subject-item weight matrix and collaborative filtering matrix and from the component identifiers of the subject-item weight matrix components of the other computing servers, each server on which the current computing server depends, and the components of the block matrix data on each of those servers; wherein the component identifiers comprise row component identifiers and column component identifiers;
Each participating computing server sends the components depended upon to each computing server that depends on them, and receives the components sent by the other computing servers;
Each participating computing server, for each subject in its local subject-item weight block matrix data, recommends at least one item to the subject using the local subject-item weight block matrix data and the correspondence between the local collaborative filtering block matrix data and the received components;
Wherein the computing servers send and receive data by calling the Message Passing Interface.
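Claim 1 leaves open how the block matrix data are distributed among the computing servers. A common choice, assumed here purely for illustration, is to give each server a contiguous range of rows; any server can then find the owner of a row component identifier locally, without extra communication. The helper names `make_row_partition` and `owner_of_row` are hypothetical:

```python
import bisect


def make_row_partition(num_rows, num_servers):
    """Split rows 0..num_rows-1 into contiguous blocks; return each block's start row.

    The first (num_rows % num_servers) servers receive one extra row.
    """
    base, extra = divmod(num_rows, num_servers)
    starts, start = [], 0
    for rank in range(num_servers):
        starts.append(start)
        start += base + (1 if rank < extra else 0)
    return starts


def owner_of_row(row, starts):
    """Rank of the computing server whose block contains `row` (binary search)."""
    return bisect.bisect_right(starts, row) - 1
```

With 10 rows on 3 servers the block start rows are `[0, 4, 7]`, so row 5 belongs to server 1.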
2. The method of claim 1, wherein the step in which each participating computing server determines, from the subject-item weight matrix and the collaborative filtering matrix, each server on which the current computing server depends and the components of the block matrix data on each of those servers comprises:
Each participating computing server obtains the component identifiers in the block matrix data processed by all other computing servers;
Each participating computing server determines, from the block matrix data in its local collaborative filtering distance matrix, its subject-item weight block matrix data, and the component identifiers in the block matrix data processed by all other computing servers, each computing server on which the current computing server depends, and the component identifiers of the block matrix data on each of those servers.
3. The method of claim 2, further comprising:
Each participating computing server sends the component identifiers of the block matrix data it depends on to each computing server on which it depends;
And the step in which each participating computing server sends the components depended upon to each computing server that depends on them, and receives the components sent by the other computing servers, comprises:
Each computing server, according to the component identifiers depended upon, sends the components corresponding to those identifiers to each computing server that depends on them, and receives the components sent by the other computing servers.
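The two-phase exchange of claim 3 (first the identifiers a server needs, then the matching components) can be simulated in a single process to show the data flow. In an MPI deployment each phase would typically be one all-to-all collective (for instance `MPI_Alltoallv`, or `comm.alltoall` in mpi4py); the function name `exchange_components` and the nested-dictionary layout below are hypothetical:

```python
def exchange_components(requests, stores):
    """Simulate the identifier exchange followed by the component exchange.

    requests: {rank: {owner_rank: [component_id, ...]}}, identifiers each server depends on
    stores:   {rank: {component_id: component}}, components each server holds locally
    Returns {rank: {component_id: component}}, the components received by each server.
    """
    # Phase 1: every server sends the identifiers it needs to each owning server.
    inbox_ids = {rank: {} for rank in stores}
    for requester, by_owner in requests.items():
        for owner, ids in by_owner.items():
            inbox_ids[owner][requester] = ids
    # Phase 2: each owner replies with the components matching the identifiers it received.
    received = {rank: {} for rank in stores}
    for owner, by_requester in inbox_ids.items():
        for requester, ids in by_requester.items():
            for cid in ids:
                received[requester][cid] = stores[owner][cid]
    return received
```

Sending identifiers first means each owner transmits only the rows that are actually needed, instead of broadcasting its whole block.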
4. The method of claim 3, wherein
The subject-item weight matrix comprises a user ID-item weight matrix and a user ID-weight mean matrix;
The collaborative filtering matrix is the user ID similarity matrix corresponding to the user ID-item weight matrix;
And the components comprise components of the user ID-item weight matrix and components of the user ID-weight mean matrix.
5. The method of claim 4, wherein the step in which each participating computing server determines, from the block matrix data in its local collaborative filtering distance matrix, its subject-item weight block matrix data, and the component identifiers in the block matrix data processed by all other computing servers, each computing server on which the current computing server depends and the component identifiers of the block matrix data on each of those servers comprises:
Each participating computing server prunes its block matrix data of the user ID similarity matrix;
Each participating computing server determines, from the pruned block matrix data of the user ID similarity matrix and the component identifiers in the block matrix data processed by all other computing servers, each computing server on which the current computing server depends, and the component identifiers of the block matrix data on each of those servers.
6. The method of claim 5, wherein the step in which each participating computing server prunes its block matrix data of the user ID similarity matrix comprises:
For each row or column of the block matrix data of the user ID similarity matrix, sorting the values of its dimensions and retaining at least one of the top-ranked dimensions in each row or column.
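The pruning of claim 6 amounts to keeping the top-ranked dimensions of each row. The sketch below assumes each row of the user ID similarity block is held as a dictionary and that a fixed number `k` of dimensions is retained per row (the claim only requires at least one); `prune_rows` is a hypothetical name. Instead of fully sorting each row, `heapq.nsmallest` with a negated key selects the same k largest values:

```python
import heapq


def prune_rows(sim_rows, k):
    """Keep the k largest-valued dimensions in each row of a similarity block.

    sim_rows: {row_id: {col_id: similarity}}
    """
    pruned = {}
    for row_id, row in sim_rows.items():
        # Negating the value turns "k smallest" into "k largest";
        # ties are broken by column id so the result is deterministic.
        top = heapq.nsmallest(k, row.items(), key=lambda kv: (-kv[1], kv[0]))
        pruned[row_id] = dict(top)
    return pruned
```

Because the retained columns are exactly the neighbors whose data must later be fetched, a smaller `k` directly reduces the number of components exchanged between servers.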
7. The method of claim 6, wherein the step in which each participating computing server determines, from the pruned block matrix data of the user ID similarity matrix and the component identifiers in the block matrix data processed by all other computing servers, each computing server on which the current computing server depends and the component identifiers of the block matrix data on each of those servers comprises:
Each participating computing server transposes the row or column component identifiers of the user ID-item weight matrix and the user ID-weight mean matrix;
Aligning the result of transposing the row component identifiers with the column component identifiers of the user ID similarity matrix, or aligning the result of transposing the column component identifiers with the row component identifiers of the user ID similarity matrix;
For the dimensions retained in each current row or column, marking the row or column component identifiers corresponding to the retained dimensions;
Comparing the marked row or column component identifiers with the row or column component identifiers in the locally obtained block matrix data of the user ID-item weight matrix and the user ID-weight mean matrix, and determining the row or column component identifiers not present locally;
Determining, from the computing servers to which the locally absent row or column component identifiers belong, each computing server on which the current server depends, and the component identifiers of the user ID-item weight matrix and the user ID-weight mean matrix on each of those servers.
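The marking and comparison steps of claim 7 can be illustrated as follows, under the assumption (consistent with the alignment step) that a column identifier of the user ID similarity matrix names a user, and is therefore also a row identifier of the user ID-item weight and user ID-weight mean matrices. The function name `mark_retained_ids` and the data layout are hypothetical:

```python
def mark_retained_ids(pruned_sim_rows, local_weight_row_ids):
    """Mark the identifiers retained after pruning and find those missing locally.

    pruned_sim_rows:      {user_id: {neighbor_user_id: similarity}}, local pruned block
                          of the user ID similarity matrix
    local_weight_row_ids: user IDs whose rows of the weight and weight mean matrices
                          are stored on this server
    """
    # Alignment: each retained column identifier is used directly as a row
    # identifier of the weight matrices.
    marked = set()
    for row in pruned_sim_rows.values():
        marked.update(row)  # iterating a dict yields its keys (the retained columns)
    # Judge step: marked identifiers that this server does not hold.
    missing = marked - set(local_weight_row_ids)
    return sorted(marked), sorted(missing)
```

The `missing` list is what the final step of the claim maps to owning servers.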
8. The method of claim 3, wherein the subject-item weight matrix comprises a user ID-item weight matrix;
The collaborative filtering matrix is the item-item similarity matrix corresponding to the user ID-item weight matrix;
And the components comprise components of the item-item similarity matrix.
9. The method of claim 8, wherein the step in which each participating computing server determines, from the block matrix data in its local collaborative filtering distance matrix, its subject-item weight block matrix data, and the component identifiers in the block matrix data processed by all other computing servers, each computing server on which the current computing server depends and the component identifiers of the block matrix data on each of those servers comprises:
Each participating computing server computes a candidate recommendation set from the user ID-item weight matrix;
Each participating computing server determines, from the candidate recommendation set and the block matrix data of the item-item similarity matrix and the user ID-item weight matrix, each computing server on which the current computing server depends, and the component identifiers of the block matrix data on each of those servers.
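Claim 9 does not say how the candidate recommendation set is derived from the user ID-item weight matrix. Purely as an assumed example, the sketch below takes the `m` items that occur most often in the local weight block and that the user has not yet weighted; `candidate_set` is a hypothetical name, and popularity ranking is only one possible rule:

```python
from collections import Counter


def candidate_set(user, weights, m):
    """Hypothetical rule: the m most frequent items in the local user ID-item weight
    block that `user` has not interacted with yet.

    weights: {user_id: {item_id: weight}}
    """
    seen = set(weights.get(user, {}))
    counts = Counter(item for row in weights.values() for item in row if item not in seen)
    # Descending frequency, then item id, for a deterministic result.
    ranked = sorted(counts, key=lambda i: (-counts[i], i))
    return ranked[:m]
```

The candidate items then determine which rows of the item-item similarity matrix the server must obtain from other servers.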
10. The method of claim 9, wherein the step in which each participating computing server determines, from the candidate recommendation set and the block matrix data of the item-item similarity matrix and the user ID-item weight matrix, each computing server on which the current computing server depends and the component identifiers of the block matrix data on each of those servers comprises:
Each participating computing server transposes the row or column component identifiers of the item-item similarity matrix;
Aligning the result of transposing the row component identifiers with the column component identifiers of the user ID-item weight matrix, or aligning the result of transposing the column component identifiers with the row component identifiers of the user ID similarity matrix;
Comparing the row or column component identifiers of the candidate recommendation set with the row or column component identifiers in the locally obtained block matrix data of the user ID-item weight matrix and the user ID-weight mean matrix, and determining the row or column component identifiers not present locally;
Determining, from the computing servers to which the locally absent row or column component identifiers belong, each computing server on which the current server depends, and the component identifiers of the user ID-item weight matrix and the user ID-weight mean matrix on each of those servers.
11. A collaborative filtering-based processing system, comprising:
A request receiving module and at least two computing servers;
The request receiving module is adapted to receive a request to recommend at least one item for a subject, and to start at least two computing servers according to the request;
Among the at least two computing servers, each computing server comprises:
A block matrix data acquisition module, adapted to cause each participating computing server to obtain its block matrix data, the block matrix data comprising subject-item weight block matrix data and collaborative filtering block matrix data;
A dependent computing server confirmation module, adapted to cause each participating computing server to determine, from its local subject-item weight matrix and collaborative filtering matrix and from the component identifiers of the subject-item weight matrix components of the other computing servers, each server on which the current computing server depends, and the components of the block matrix data on each of those servers; wherein the component identifiers comprise row component identifiers and column component identifiers;
A component sending/receiving module, adapted to cause each participating computing server to send the components depended upon to each computing server that depends on them, and to receive the components sent by the other computing servers;
A recommended item computation module, adapted to cause each participating computing server, for each subject in its local subject-item weight block matrix data, to recommend at least one item to the subject using the local subject-item weight block matrix data and the correspondence between the local collaborative filtering block matrix data and the received components;
Wherein the computing servers send and receive data by calling the Message Passing Interface.
12. The system of claim 11, wherein the dependent computing server confirmation module comprises:
A component identifier acquisition module, adapted to cause each participating computing server to obtain the component identifiers in the block matrix data processed by all other computing servers;
A first dependent-server confirmation module, adapted to cause each participating computing server to determine, from the block matrix data in the collaborative filtering distance matrix and/or the subject-item weight block matrix data, and from the component identifiers in the block matrix data processed by all other computing servers, each computing server on which the current computing server depends, and the component identifiers of the block matrix data on each of those servers.
13. The system of claim 12, further comprising:
An identifier sending module, adapted to cause each participating computing server to send the component identifiers of the block matrix data it depends on to each computing server on which it depends;
And the component sending/receiving module comprises:
A first component sending/receiving module, adapted to cause each participating computing server, according to the component identifiers depended upon, to send the components corresponding to those identifiers to each computing server that depends on them, and to receive the components sent by the other computing servers.
14. The system of claim 13, wherein
The subject-item weight matrix comprises a user ID-item weight matrix and a user ID-weight mean matrix;
The collaborative filtering matrix is the user ID similarity matrix corresponding to the user ID-item weight matrix;
And the components comprise components of the user ID-item weight matrix and components of the user ID-weight mean matrix.
15. The system of claim 14, wherein the first dependent-server confirmation module comprises:
A pruning module, adapted to cause each participating computing server to prune its block matrix data of the user ID similarity matrix;
A second dependent-server confirmation module, adapted to cause each participating computing server to determine, from the pruned block matrix data of the user ID similarity matrix and the component identifiers in the block matrix data processed by all other computing servers, each computing server on which the current computing server depends, and the component identifiers of the block matrix data on each of those servers.
16. The system of claim 15, wherein the pruning module comprises:
A first pruning module, adapted, for each row or column of the block matrix data of the user ID similarity matrix, to sort the values of its dimensions and retain at least one of the top-ranked dimensions in each row or column.
17. The system of claim 16, wherein the second dependent-server confirmation module comprises:
A first row/column component identifier transposition module, adapted to cause each participating computing server to transpose the row or column component identifiers of the user ID-item weight matrix and the user ID-weight mean matrix;
A first row/column component identifier alignment module, adapted to align the result of transposing the row component identifiers with the column component identifiers of the user ID similarity matrix, or to align the result of transposing the column component identifiers with the row component identifiers of the user ID similarity matrix;
A first retaining module, adapted, for the dimensions retained in each current row or column, to mark the row or column component identifiers corresponding to the retained dimensions;
A first judge module, adapted to compare the marked row or column component identifiers with the row or column component identifiers in the locally obtained block matrix data of the user ID-item weight matrix and the user ID-weight mean matrix, and to determine the row or column component identifiers not present locally;
A third dependent-server confirmation module, adapted to determine, from the computing servers to which the locally absent row or column component identifiers belong, each computing server on which the current server depends, and the component identifiers of the user ID-item weight matrix and the user ID-weight mean matrix on each of those servers.
18. The system of claim 13, wherein
The subject-item weight matrix comprises a user ID-item weight matrix;
The collaborative filtering matrix is the item-item similarity matrix corresponding to the user ID-item weight matrix;
And the components comprise components of the item-item similarity matrix.
19. The system of claim 18, wherein the dependent-server confirmation module comprises:
A first candidate recommendation set computation module, adapted to cause each participating computing server to compute a candidate recommendation set from the user ID-item weight matrix;
A fourth dependent-server confirmation module, adapted to cause each participating computing server to determine, from the candidate recommendation set and the block matrix data of the item-item similarity matrix and the user ID-item weight matrix, each computing server on which the current computing server depends, and the component identifiers of the block matrix data on each of those servers.
20. The system of claim 19, wherein the fourth dependent-server confirmation module comprises:
A second row/column component identifier transposition module, adapted to cause each participating computing server to transpose the row or column component identifiers of the item-item similarity matrix;
A second row/column component identifier alignment module, adapted to align the result of transposing the row component identifiers with the column component identifiers of the user ID-item weight matrix, or to align the result of transposing the column component identifiers with the row component identifiers of the user ID similarity matrix;
A second judge module, adapted to compare the row or column component identifiers of the candidate recommendation set with the row or column component identifiers in the locally obtained block matrix data of the user ID-item weight matrix and the user ID-weight mean matrix, and to determine the row or column component identifiers not present locally;
A fifth dependent-server confirmation module, adapted to determine, from the computing servers to which the locally absent row or column component identifiers belong, each computing server on which the current server depends, and the component identifiers of the user ID-item weight matrix and the user ID-weight mean matrix on each of those servers.
CN201210518378.1A (priority date 2012-12-05, filing date 2012-12-05): Collaborative filtering-based processing method and system. Legal status: Expired - Fee Related. Granted as CN103019860B.

Publications (2)

CN103019860A, published 2013-04-03
CN103019860B, published 2015-12-09 (granted patent)

Family ID: 47968490

Country Status (1)

Country Link
CN (1) CN103019860B (en)

Citations (3)

Publication number Priority date Publication date Assignee Title
CN1547351A (en) * 2003-12-04 2004-11-17 上海交通大学 Collaborative filtering recommendation approach for dealing with ultra-mass users
CN102346751A (en) * 2010-08-03 2012-02-08 阿里巴巴集团控股有限公司 Information transmitting method and equipment
CN102385586A (en) * 2010-08-27 2012-03-21 日电(中国)有限公司 Multiparty cooperative filtering method and system

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
GB2447868A (en) * 2007-03-29 2008-10-01 Motorola Inc A distributed content item recommendation system

Also Published As

Publication number Publication date
CN103019860A (en) 2013-04-03

Similar Documents

Publication Publication Date Title
CN103019860B (en) Based on disposal route and the system of collaborative filtering
Petricek et al. The web structure of e-government-developing a methodology for quantitative evaluation
US8655949B2 (en) Correlated information recommendation
CN1716259B (en) Method and system for ranking objects based on intra-type and inter-type relationships
CN102663064B (en) A kind of disposal route of favorites data and device
CN102043862B (en) Directional web data extraction method
CN104268664A (en) Method and device for recommending carpooling route
CN105247507A (en) Influence score of a brand
CN103049488B (en) A kind of collaborative filtering disposal route and system
CN105740380A (en) Data fusion method and system
US8433758B2 (en) Method and system for user information processing and resource recommendation in a network environment
CN104346446A (en) Paper associated information recommendation method and device based on mapping knowledge domain
CN105095474A (en) Method and device for establishing recommendation relation between searching terms and application data
CN104462327A (en) Computing method, search processing method, computing device and search processing device for sentence similarity
CN102521341A (en) Web-relevance based query classification
CN110795471B (en) Data matching method and device, computer readable storage medium and electronic equipment
CN103049486B (en) A kind of disposal route of collaborative filtering distance and system
CN103530337B (en) Identify the device and method of Invalid parameter in uniform resource position mark URL
Li et al. Cross-domain recommendation via coupled factorization machines
CN106326270A (en) Data interaction processing method, device and system
CN104462556A (en) Method and device for recommending question and answer page related questions
CN107423382A (en) network crawling method and device
CN102789615A (en) Book information correlation recommendation method, server and system
CN110544140A (en) method and device for processing browsing data
CN106104618A (en) A kind of method setting up interactive relation and interactive device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220714

Address after: Room 801, 8th floor, No. 104, floors 1-19, building 2, yard 6, Jiuxianqiao Road, Chaoyang District, Beijing 100015

Patentee after: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Address before: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park)

Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Patentee before: Qizhi software (Beijing) Co.,Ltd.

TR01 Transfer of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20151209