Summary of the invention
In view of the above problems, the present invention is proposed to provide a processing system based on collaborative filtering, and a corresponding processing method based on collaborative filtering, that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a processing method based on collaborative filtering is provided, comprising:
Receiving a request to recommend at least one project for a main body, and starting a recommendation calculation process on at least two calculation servers according to the request, the process comprising:
For the at least two calculation servers, each calculation server participating in the calculation obtains its partitioned matrix data; the partitioned matrix data comprise partitioned matrix data of the main body-project weight matrix and partitioned matrix data of the collaborative filtering matrix;
Each calculation server participating in the calculation determines, according to the main body-project weight matrix and the collaborative filtering matrix, each calculation server on which the current calculation server depends, and the components of the partitioned matrix data in each of those calculation servers;
Each calculation server participating in the calculation sends the components that other servers depend on to each calculation server that depends on them, and receives the components sent by the other calculation servers;
Each calculation server participating in the calculation, for each main body in the local partitioned matrix data of the main body-project weight matrix, recommends at least one project to the main body by using the correspondence among the local partitioned matrix data of the main body-project weight matrix, the local partitioned matrix data of the collaborative filtering matrix, and the received components;
Wherein the calculation servers send and receive data among themselves by calling a message passing interface.
Optionally, the step in which each calculation server participating in the calculation determines, according to the main body-project weight matrix and the collaborative filtering matrix, each calculation server on which the current calculation server depends, and the components of the partitioned matrix data in each of those calculation servers, comprises:
Each calculation server participating in the calculation obtains the component identifiers of the partitioned matrix data processed by all other calculation servers;
Each calculation server participating in the calculation determines, according to the local partitioned matrix data of the collaborative filtering matrix, the partitioned matrix data of the main body-project weight matrix, and the component identifiers of the partitioned matrix data processed by all other calculation servers, each calculation server on which the current calculation server depends, and the component identifiers of the partitioned matrix data in each of those calculation servers.
According to another aspect of the present invention, a processing system based on collaborative filtering is provided, comprising:
A request receiving module and at least two calculation servers;
The request receiving module is adapted to receive a request to recommend at least one project for a main body, and to start at least two calculation servers according to the request;
Each of the at least two calculation servers comprises:
A partitioned matrix data acquisition module, adapted for each calculation server participating in the calculation to obtain its partitioned matrix data; the partitioned matrix data comprise partitioned matrix data of the main body-project weight matrix and partitioned matrix data of the collaborative filtering matrix;
A dependent calculation server determination module, adapted for each calculation server participating in the calculation to determine, according to the main body-project weight matrix and the collaborative filtering matrix, each calculation server on which the current calculation server depends, and the components of the partitioned matrix data in each of those calculation servers;
A component sending/receiving module, adapted for each calculation server participating in the calculation to send the components that other servers depend on to each calculation server that depends on them, and to receive the components sent by the other calculation servers;
A recommended project calculation module, adapted for each calculation server participating in the calculation, for each main body in the local partitioned matrix data of the main body-project weight matrix, to recommend at least one project to the main body by using the correspondence among the local partitioned matrix data of the main body-project weight matrix, the local partitioned matrix data of the collaborative filtering matrix, and the received components;
The calculation servers send and receive data among themselves by calling a message passing interface.
Optionally, the dependent calculation server determination module comprises:
A component identifier acquisition module, adapted for each calculation server participating in the calculation to obtain the component identifiers of the partitioned matrix data processed by all other calculation servers;
A first dependent server determination module, adapted for each calculation server participating in the calculation to determine, according to the partitioned matrix data of the collaborative filtering matrix and/or the partitioned matrix data of the main body-project weight matrix, together with the component identifiers of the partitioned matrix data processed by all other calculation servers, each calculation server on which the current calculation server depends, and the component identifiers of the partitioned matrix data in each of those calculation servers.
Optionally, the system further comprises:
An identifier sending module, adapted for each calculation server participating in the calculation to send the component identifiers of the partitioned matrix data it depends on to each calculation server on which the current calculation server depends;
Further, the component sending/receiving module comprises:
A first component sending/receiving module, adapted for each calculation server participating in the calculation to send, according to the component identifiers that other servers depend on, the corresponding components to each calculation server that depends on them, and to receive the components sent by the other calculation servers.
Optionally, the main body-project weight matrix comprises: a user ID-project weight matrix and a user ID-weight mean matrix;
The collaborative filtering matrix is the user ID similarity matrix corresponding to the user ID-project weight matrix;
Further, the components comprise components of the user ID-project weight matrix and components of the user ID-weight mean matrix.
Optionally, the first dependent server determination module comprises:
A pruning module, adapted for each calculation server participating in the calculation to perform pruning on the partitioned matrix data of the user ID similarity matrix;
A second dependent server determination module, adapted for each calculation server participating in the calculation to determine, according to the pruned partitioned matrix data of the user ID similarity matrix and the component identifiers of the partitioned matrix data processed by all other calculation servers, each calculation server on which the current calculation server depends, and the component identifiers of the partitioned matrix data in each of those calculation servers.
Optionally, the pruning module comprises:
A first pruning module, adapted to sort, for each row or each column of the partitioned matrix data of the user ID similarity matrix, the values of its dimensions, and to retain at least one top-ranked dimension in each row or each column.
Optionally, the second dependent server determination module comprises:
A first row/column component identifier transposition module, adapted for each calculation server participating in the calculation to transpose the row component identifiers or column component identifiers of each component of the user ID-project weight matrix and the user ID-weight mean matrix;
A first row/column component identifier alignment module, adapted to align the result obtained by transposing the row component identifiers with the column component identifiers of the user ID similarity matrix, or to align the result obtained by transposing the column component identifiers with the row component identifiers of the user ID similarity matrix;
A first retention module, adapted to mark, for the dimensions retained in each current row or column, the row component identifiers or column component identifiers corresponding to the retained dimensions;
A first judgment module, adapted to compare the marked row or column component identifiers with the row or column component identifiers in the locally obtained partitioned matrix data of the user ID-project weight matrix and the user ID-weight mean matrix, and to determine the row or column component identifiers that do not exist locally;
A third dependent server determination module, adapted to determine, according to the calculation servers to which the locally non-existent row or column component identifiers belong, each calculation server on which the current server depends, and the component identifiers of the user ID-project weight matrix and the user ID-weight mean matrix in each of those calculation servers.
Optionally, the main body-project weight matrix comprises: a user ID-project weight matrix;
The collaborative filtering matrix is the project-project similarity matrix corresponding to the user ID-project weight matrix;
Further, the components comprise components of the project-project similarity matrix.
Optionally, the dependent calculation server determination module comprises:
A first candidate recommendation set calculation module, adapted for each calculation server participating in the calculation to calculate a candidate recommendation set according to the user ID-project weight matrix;
A fourth dependent server determination module, adapted for each calculation server participating in the calculation to determine, according to the candidate recommendation set, the partitioned matrix data of the project-project similarity matrix and the user ID-project weight matrix, each calculation server on which the current calculation server depends, and the component identifiers of the partitioned matrix data in each of those calculation servers.
Optionally, the fourth dependent server determination module comprises:
A second row/column component identifier transposition module, adapted for each calculation server participating in the calculation to transpose the row component identifiers or column component identifiers of each component of the project-project similarity matrix;
A second row/column component identifier alignment module, adapted to align the result obtained by transposing the row component identifiers with the column component identifiers of the user ID-project weight matrix, or to align the result obtained by transposing the column component identifiers with the row component identifiers of the user ID similarity matrix;
A second judgment module, adapted to compare the marked row or column component identifiers with the row or column component identifiers in the locally obtained partitioned matrix data of the user ID-project weight matrix and the user ID-weight mean matrix, and to determine the row or column component identifiers that do not exist locally;
A fifth dependent server determination module, adapted to determine, according to the calculation servers to which the locally non-existent row or column component identifiers belong, each calculation server on which the current server depends, and the component identifiers of the user ID-project weight matrix and the user ID-weight mean matrix in each of those calculation servers.
The processing method based on collaborative filtering according to the present invention can use a plurality of computing nodes to perform the collaborative filtering recommendation calculation in parallel. This solves the problems that the prior art computes slowly on huge matrix data and places high demands on hardware, and it meets the demand for calculating recommended projects quickly. It thereby achieves the beneficial effects of quickly calculating recommended projects for huge matrix data, lowering the hardware requirements of the computing system, and reducing hardware cost overall.
The above description is only an overview of the technical solution of the present invention. So that the technical means of the present invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features and advantages of the present invention become more apparent, specific embodiments of the present invention are set forth below.
Embodiment
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope conveyed completely to those skilled in the art.
With reference to Fig. 1, which shows a schematic flow chart of embodiment one of a processing method based on collaborative filtering according to the present invention, the method may specifically comprise:
Step 100, receiving a request to recommend at least one project for a main body, and starting a recommendation calculation process on at least two calculation servers according to the request;
In embodiments of the present invention, the main body may be, for example, a user ID in a network. For the various projects that a user ID has used or already has in the network, the system or a user can request that one or several projects be recommended for each user ID, for example recommending related products to a user based on the products purchased in the network.
The recommendation calculation process on the at least two calculation servers then comprises:
Step 110, for the at least two calculation servers, each calculation server participating in the calculation obtains its partitioned matrix data; the partitioned matrix data comprise partitioned matrix data of the main body-project weight matrix and partitioned matrix data of the collaborative filtering matrix;
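In the present invention, the main body-project weight matrix may be a user ID-project rating matrix, such as Table one (a dash marks a project that has not been scored):
      | Item1 | Item2 | Item3 | Item4 | Item5
File1 | 70 | 60 | 80 | 90 | -
File2 | 40 | 90 | 50 | - | 70
File3 | - | 70 | 80 | 80 | -
Table one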
In Table one, the product categories are taken as Items and the user IDs are File1, File2 and File3. The user corresponding to File1 scored the Items it has used, Item1 to Item4, as 70, 60, 80 and 90 respectively, and did not score Item5. The user corresponding to File2 scored Item1 to Item3 as 40, 90 and 50 respectively, did not score Item4, and scored Item5 as 70. The user corresponding to File3 scored Item2 to Item4 as 70, 80 and 80 respectively, and did not score Item1 or Item5.
The collaborative filtering matrix may be the user-to-user similarity matrix derived from the matrix of Table one. For example, to calculate Sim(File1, File2) with File1 = (70, 60, 80, 90, 0) and File2 = (40, 90, 50, 0, 70), Sim may be the cosine of the angle between the two vectors, or another function. The result is a matrix such as Table two, formed from the similarity between the components of every two rows. Alternatively, the collaborative filtering matrix may be the project similarity matrix, i.e. the similarity between every two columns.
S(File1,File1) | S(File1,File2) | S(File1,File3)
S(File2,File1) | S(File2,File2) | S(File2,File3)
S(File3,File1) | S(File3,File2) | S(File3,File3)
Table two
When recommended projects are calculated for a user, the matrices of Table one and Table two can then be used.
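The derivation of Table two from Table one can be illustrated with a minimal sketch. It assumes the cosine of the angle between two rows as the Sim function (the text allows other functions) and treats unscored Items as 0, as in the File1 and File2 vectors above. The names `ratings`, `cosine` and `similarity_matrix` are hypothetical and introduced only for illustration.

```python
import math

# Rows of Table one; an unscored Item is represented as 0 (assumption
# taken from the example vectors for File1 and File2 in the text).
ratings = {
    "File1": [70, 60, 80, 90, 0],
    "File2": [40, 90, 50, 0, 70],
    "File3": [0, 70, 80, 80, 0],
}


def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a user with no scores is similar to nobody
    return dot / (norm_a * norm_b)


def similarity_matrix(rows):
    """Build the user-to-user similarity matrix (the shape of Table two)."""
    ids = list(rows)
    return {u: {v: cosine(rows[u], rows[v]) for v in ids} for u in ids}


sim = similarity_matrix(ratings)
```

Each entry `sim[u][v]` corresponds to one cell S(u, v) of Table two; the matrix is symmetric and its diagonal is 1.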
In embodiments of the present invention, for the N calculation servers of the parallel computing system, each server obtains the partitioned matrix data assigned to it, including the partitioned matrix data of the main body-project weight matrix and the partitioned matrix data of the collaborative filtering matrix.
In embodiments of the present invention, a matrix such as Table one may be partitioned by rows and the blocks sent to the calculation servers; alternatively, the matrix may be transposed and then partitioned by columns and sent to the calculation servers.
Before this step, the embodiment of the invention may further comprise:
Each calculation server calculates the collaborative filtering matrix from the main body-project weight matrix.
In the embodiment of the invention, N calculation servers may participate in the calculation, where N is greater than or equal to 2.
Step 120, each calculation server participating in the calculation determines, according to the main body-project weight matrix and the collaborative filtering matrix, each calculation server on which the current calculation server depends, and the components of the partitioned matrix data in each of those calculation servers;
That is, based on the main body-project weight matrix and the collaborative filtering matrix, each calculation server participating in the calculation determines the components of the partitioned matrices in the one or more calculation servers on which it depends.
Optionally, this determination comprises:
Step S121, each calculation server participating in the calculation obtains the component identifiers of the partitioned matrix data processed by all other calculation servers;
In embodiments of the present invention, for each calculation server to determine the calculation servers on which it depends, it needs to know the row identifier of each row component of each matrix (if the matrix data obtained in step 110 is partitioned by rows), or the column identifier of each column component of each matrix (if the matrix data obtained in step 110 is partitioned by columns).
Therefore, in the present invention, the server that stores the source matrix data may send, to every calculation server participating in the calculation, the identifier of each calculation server together with the identifiers of the components of the partitioned matrices it processes. Alternatively, each calculation server may itself send the component identifiers it processes to all other calculation servers. When sending, each calculation server can call MPI (Message Passing Interface, a message passing interface for programs, for which function libraries implementing the interface are available in multiple languages).
Step S122, each calculation server participating in the calculation determines, according to the partitioned matrix data of the collaborative filtering matrix and/or the partitioned matrix data of the main body-project weight matrix, together with the component identifiers of the partitioned matrix data processed by all other calculation servers, each calculation server on which the current calculation server depends, and the component identifiers of the partitioned matrix data in each of those calculation servers;
In the User based case, once a calculation server participating in the calculation has obtained the partitioned matrix data of the aforementioned user similarity matrix that it processes, the partitioned matrix data of the user ID-project weight matrix, and the component identifiers of the partitioned matrix data processed by all other calculation servers, it can determine from these the components in the one or more calculation servers on which it depends.
In the Item based case, once a calculation server participating in the calculation has obtained the partitioned matrix data of the aforementioned project similarity matrix that it processes, the partitioned matrix data of the user ID-project weight matrix, and the component identifiers of the partitioned matrix data processed by all other calculation servers, it can likewise determine from these the components in the one or more calculation servers on which it depends.
Step 130, each calculation server participating in the calculation sends the components that other servers depend on to each calculation server that depends on them, and receives the components sent by the other calculation servers;
After each calculation server has determined the components of the partitioned matrices in the one or more calculation servers on which it depends, each server that is depended on sends the requested components to the calculation servers that depend on them.
Optionally, this step further comprises:
Step S131, each calculation server participating in the calculation sends the component identifiers of the partitioned matrix data it depends on to each calculation server on which the current calculation server depends;
After a calculation server participating in the calculation has determined the calculation servers on which it depends and the components held by them, it notifies those calculation servers that the vectors it depends on need to be sent to it. That is, each calculation server participating in the calculation sends the component identifiers of the partitioned matrix data it depends on to each calculation server on which it depends.
The calculation servers participating in the calculation also send and receive the component identifiers by calling MPI.
Further, the step in which each calculation server participating in the calculation sends the components that other servers depend on to each calculation server that depends on them, and receives the components sent by the other calculation servers, comprises:
Step S132, each calculation server participating in the calculation sends, according to the component identifiers that other servers depend on, the corresponding components to each calculation server that depends on them, and receives the components sent by the other calculation servers;
Each calculation server participating in the calculation calls MPI to send the components that are depended on to the calculation servers that depend on them, and to receive the components sent by other servers.
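The request and reply pattern of steps S131 and S132 can be sketched as follows. This is only an in-memory simulation under stated assumptions: plain dictionaries stand in for the MPI send and receive calls named in the text, and the function `exchange` and its argument layout are hypothetical.

```python
def exchange(requests, local_data):
    """Simulate steps S131/S132 without a real message passing layer.

    requests:   {requester: {owner: set of component identifiers}}
                (what S131 would send to each owner)
    local_data: {server: {component identifier: component}}
    Returns an inbox per server holding the components it received
    (what S132 would deliver).
    """
    inbox = {server: {} for server in local_data}
    for requester, by_owner in requests.items():
        for owner, ids in by_owner.items():
            for comp_id in ids:
                # The owner looks up the requested component and "sends" it.
                inbox[requester][comp_id] = local_data[owner][comp_id]
    return inbox


# Two servers: server 0 needs row u1 from server 1, and vice versa for u2.
local_data = {0: {"u0": [1, 2], "u2": [3, 4]}, 1: {"u1": [5, 6]}}
requests = {0: {1: {"u1"}}, 1: {0: {"u2"}}}
received = exchange(requests, local_data)
```

In an actual deployment each dictionary lookup would be replaced by a pair of message passing calls between the two servers involved.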
Step 160, each calculation server participating in the calculation, for each main body in the local partitioned matrix data of the main body-project weight matrix, recommends at least one project to the main body by using the correspondence among the local partitioned matrix data of the main body-project weight matrix, the local partitioned matrix data of the collaborative filtering matrix, and the received components.
After each calculation server participating in the calculation has received the components it needs, it performs processing such as transposition, sorting and summation to obtain the final recommendation data.
With reference to Fig. 2, which shows a schematic flow chart of embodiment two of a processing method based on collaborative filtering according to the present invention, the method may specifically comprise:
Step 200, receiving a request to recommend at least one project for a main body, and starting a recommendation calculation process on at least two calculation servers according to the request;
In embodiments of the present invention, the main body may be, for example, a user ID in a network. For the various projects that a user ID has used or already has in the network, the system or a user can request that one or several projects be recommended for each user ID, for example recommending related products to a user based on the products purchased in the network.
The recommendation calculation process on the at least two calculation servers then comprises:
Step 210, for the at least two calculation servers, each calculation server participating in the calculation obtains its partitioned matrix data; the partitioned matrix data comprise the block data of the user ID-project weight matrix and of the user ID-weight mean matrix, and the partitioned matrix data of the user ID similarity matrix;
This embodiment of the invention concerns the concrete parallel process for User based collaborative filtering. To facilitate the description, the User based collaborative filtering recommendation calculation is introduced first:
With reference to Fig. 3, which shows the matrices used in the User based calculation: 201 is the user ID-project weight matrix R of the users' scores for items; 202 is the transposed matrix R' of R; 203 is the vector of each user's mean score over items, i.e. the user ID-weight mean matrix, denoted A; 203 is the transposed matrix A' of A; and 204 is the user ID similarity matrix S, i.e. the similarity between every two users.
The formula for predicting the score of user u for item i from the similarity matrix is as follows, where sim(u, u') is the similarity between users u and u' and can be calculated with the cosine, the Pearson coefficient, or a similar algorithm.
... formula (1)
Its computation process is roughly as follows:
1. Obtain the row corresponding to u from R and find the items that u has not scored, i.e. the full set I of recommendation candidates;
2. Obtain the column corresponding to u' from R', and the mean corresponding to u' from A'. For each item i in I, calculate sim(u, u') · (r_{u'i} - avg(r_{u'}));
3. For all u' ∈ U of u, sum the values sim(u, u') · (r_{u'i} - avg(r_{u'})) calculated in step 2, and sum the values sim(u, u');
4. Obtain the mean corresponding to u from A and substitute the results of step 3 to obtain the predicted values for all items in I;
5. Select, as required, the highest of u's predicted scores for I to obtain the final recommended items.
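The five steps above can be sketched in a few lines. Since formula (1) is not reproduced in this text, the sketch assumes the commonly used mean-centred form, pred(u, i) = avg(r_u) + Σ sim(u, u') · (r_{u'i} - avg(r_{u'})) / Σ |sim(u, u')|, with both sums taken over the neighbours u' that have scored item i. That normalisation, and the names `predict` and `recommend`, are assumptions made for illustration.

```python
def predict(u, R, S):
    """Predict scores of user u for every item u has not scored.

    R: {user: {item: score}}  (an absent item means "not scored")
    S: {user: {other user: similarity}}
    """
    all_items = {i for row in R.values() for i in row}
    candidates = all_items - set(R[u])                       # step 1: set I
    avg = {v: sum(r.values()) / len(r) for v, r in R.items() if r}  # vector A
    preds = {}
    for i in candidates:
        num = den = 0.0
        for v, s in S[u].items():
            if v == u or i not in R.get(v, {}):
                continue
            num += s * (R[v][i] - avg[v])                    # steps 2 and 3
            den += abs(s)
        if den > 0:
            preds[i] = avg[u] + num / den                    # step 4
    return preds


def recommend(u, R, S, k=1):
    """Step 5: return the k candidate items with the highest predictions."""
    preds = predict(u, R, S)
    return sorted(preds, key=preds.get, reverse=True)[:k]


R = {
    "u0": {"i1": 4, "i2": 2},
    "u1": {"i1": 5, "i2": 3, "i3": 4},
    "u2": {"i1": 1, "i3": 5},
}
S = {"u0": {"u1": 0.9, "u2": 0.1}}
scores = predict("u0", R, S)
```

The sketch assumes that user u has scored at least one item, so that its mean exists.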
Based on the above description, in step 210 each calculation server participating in the calculation obtains the block data of the user ID-project weight matrix and the user ID-weight mean matrix assigned to it for processing, and the partitioned matrix data of the user ID similarity matrix.
The block data of the user ID-weight mean matrix can be calculated from the block data of the user ID-project weight matrix, or calculated in advance. The user ID similarity matrix can also be calculated from the user ID-project weight matrix.
For convenience in the following explanation, the present invention preferably obtains the matrices of Fig. 3 partitioned by rows.
Step 220, each calculation server participating in the calculation obtains the component identifiers of the partitioned matrix data processed by all other calculation servers;
In embodiments of the present invention, each calculation server participating in the calculation obtains the row identifier of each row component of the user ID-project partitioned matrices in the other calculation servers, and the row identifiers of the user ID-weight partitioned matrices in the other calculation servers.
Step 230, each calculation server participating in the calculation performs pruning on the partitioned matrix data of the user ID similarity matrix;
Because there may be a very large number of users, users with low relevance to the user of the current row are pruned in this step, to reduce their impact on calculation efficiency.
Optionally, this step comprises:
Step S11, for each row or each column of the partitioned matrix data of the user ID similarity matrix, sort the values of its dimensions and retain at least one top-ranked dimension in each row or each column.
In the aforementioned matrix S, the first row is the similarity component of user u_0, i.e. Sim(u_0, u_0), Sim(u_0, u_1), ..., Sim(u_0, u_M). The similarity values of the n top-ranked dimensions in this row component can be retained, and the other values set to empty.
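The per-row pruning of step S11 can be sketched as follows, assuming each row component of S is held as a mapping from column identifier to similarity value and that "set to empty" is modelled by dropping the entry. The function name `prune_row` is hypothetical.

```python
def prune_row(row, n):
    """Keep the n largest similarity values of one row of S; drop the rest.

    row: {column identifier: similarity value}
    """
    kept = sorted(row, key=row.get, reverse=True)[:n]
    return {col: row[col] for col in kept}


row_u0 = {"u0": 1.0, "u1": 0.2, "u2": 0.7, "u3": 0.5}
pruned = prune_row(row_u0, 2)
```

Applying this to every row turns the local block of S into a sparse matrix, which is what limits the number of remote components a server later has to request.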
Step 240, each calculation server participating in the calculation determines, according to the pruned partitioned matrix data of the user ID similarity matrix and the component identifiers of the partitioned matrix data processed by all other calculation servers, each calculation server on which the current calculation server depends, and the component identifiers of the partitioned matrix data in each of those calculation servers.
For the current calculation server, its user similarity partitioned matrix S contains all column identifiers, but only some of the columns in each row component have values. Therefore, following the calculation principle above, the user ID-project weight matrix and the user ID-weight mean matrix need to be transposed, so that the column vectors obtained by transposing their row identifiers align with the row vectors of S. The row identifiers needed for the calculation are then determined, and from the correspondence between these row identifiers and the calculation servers, each calculation server on which the current calculation server depends can be determined, together with the component identifiers of the partitioned matrix data in each of those calculation servers.
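A minimal sketch of this dependency determination follows. It assumes that the pruned local block of S is a mapping of row identifier to retained columns, that each server knows which row identifiers it holds locally, and that a table `owner_of` (built from the component identifiers exchanged in step 220) maps every row identifier to its server. All names are hypothetical.

```python
def find_dependencies(local_pruned_S, local_rows, owner_of):
    """Return {server: set of row identifiers to request from it}.

    local_pruned_S: {row id: {column id: similarity}} after pruning
    local_rows:     set of row ids of R and A held on this server
    owner_of:       {row id: server that holds that row of R and A}
    """
    needed = {}
    for retained in local_pruned_S.values():
        for col_id in retained:
            if col_id not in local_rows:          # not available locally
                needed.setdefault(owner_of[col_id], set()).add(col_id)
    return needed


owner_of = {"u0": 0, "u1": 1, "u2": 0, "u3": 1, "u4": 0}
pruned_S_on_server0 = {
    "u0": {"u0": 1.0, "u1": 0.8},
    "u2": {"u2": 1.0, "u4": 0.6},
    "u4": {"u3": 0.9, "u1": 0.7},
}
deps = find_dependencies(pruned_S_on_server0, {"u0", "u2", "u4"}, owner_of)
```

Here server 0 depends only on server 1, and only for the rows of u1 and u3; rows whose similarity was pruned away never have to be transferred.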
Optionally, this step comprises:
Step S21: each calculation server participating in the calculation transposes the row component identifiers or the column component identifiers among the component identifiers of the user ID-project weight matrix and the user ID-weight mean matrix;
With reference to Fig. 4, an example of confirming the calculation servers depended on and the vectors depended on according to an embodiment of the invention is shown. Fig. 4 is a schematic diagram of two calculation servers. The data are distributed by row as illustrated above: odd-numbered rows are assigned to calculation server N0 and even-numbered rows are assigned to calculation server N1. Because the relationship matrix S is pruned during calculation, some data may be discarded, so that S is a sparse matrix. The left side of the figure shows matrix R and vector A. The transposes R' and A' are used in the calculation. Because transposition changes only the positions of the data and not the data themselves, no additional copy of the data is stored; the row-column conversion is performed when the data are read during calculation. For convenience of description, R and A are placed at the top of the figure in transposed form.
The embodiment of the invention is described with the data distributed to the calculation servers by row. Distributing the data to the calculation servers by column follows a similar computation process and only requires a transposition, so it is not described in detail here.
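The row-wise distribution described for Fig. 4 can be sketched as follows. This is a minimal illustration that assumes a round-robin assignment generalizes the odd/even split of the two-server example; the function name distribute_rows and the use of integer server numbers are assumptions, not part of the embodiment.

```python
def distribute_rows(row_ids, num_servers):
    """Map each row identifier to a server index in round-robin order."""
    # With two servers, the 1st, 3rd, 5th rows go to server 0 (N0)
    # and the 2nd, 4th rows go to server 1 (N1).
    return {row_id: index % num_servers for index, row_id in enumerate(row_ids)}


owner = distribute_rows(["u0", "u1", "u2", "u3", "u4"], 2)
local_rows = {s: [r for r, o in owner.items() if o == s] for s in range(2)}
print(local_rows)  # {0: ['u0', 'u2', 'u4'], 1: ['u1', 'u3']}
```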
In Fig. 4, the original user ID-project weight matrix R comprises 5 rows, each with a row identifier; the row numbers are u0, u1, u2, u3, u4. The user ID-weight mean matrix A likewise comprises 5 corresponding rows, each with a row identifier; the row numbers are u0, u1, u2, u3, u4. The user similarity matrix S comprises 5 columns, each with a column identifier; the column numbers are u0, u1, u2, u3, u4.
Thus, after N0 obtains the row component identifiers u1, u3 of R processed by N1, it has all row component identifiers of R; after N0 obtains the row component identifiers u1, u3 of A processed by N1, it has all row component identifiers of A. N0 then transposes the row component identifiers of R, (u0, u1, u2, u3, u4)', to obtain (u0, u1, u2, u3, u4), and transposes the row component identifiers of A, (u0, u1, u2, u3, u4)', to obtain (u0, u1, u2, u3, u4).
Step S22: align the result obtained by transposing the row component identifiers with the column component identifiers of the user ID similarity matrix, or align the result obtained by transposing the column component identifiers with the row component identifiers of the user ID similarity matrix;
That is, the result obtained by transposing the row component identifiers of R and A is aligned with the column component identifiers (u0, u1, u2, u3, u4) of S.
Step S23: for the dimensions currently retained in each row or each column, mark the row component identifiers or column component identifiers corresponding to the retained dimensions;
Step S24: compare the marked row component identifiers or column component identifiers with the row component identifiers or column component identifiers in the locally obtained partitioned matrix data of the user ID-project weight matrix and the user ID-weight mean matrix, and determine which row component identifiers or column component identifiers do not exist locally;
Step S25: according to the calculation servers to which the locally non-existent row component identifiers or column component identifiers belong, confirm each calculation server on which the current server depends, and the component identifiers of the user ID-project weight matrix and the user ID-weight mean matrix in each calculation server depended on.
For calculation server N0, the rows of matrix S that it processes are the first, third and fifth rows, and each row retains (u0, u1, u2, -, u4). The row vectors of matrix R and matrix A needed for the calculation are therefore rows u0, u1, u2, u4. As established above, u1 is in N1, so N0 depends on the row component u1 in N1.
For calculation server N1, the rows of matrix S that it processes are the second and fourth rows, and each row retains (-, u1, u2, u3, u4). The row vectors of matrix R and matrix A needed for the calculation are therefore rows u1, u2, u3, u4. As established above, u2 and u4 are in N0, so N1 depends on the row components u2, u4 in N0.
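The dependency confirmation of steps S23 to S25 can be sketched for the two-server example above. The sketch assumes that a pruned row is stored as a list in which emptied dimensions are None; the function name find_dependencies and all similarity values are hypothetical.

```python
def find_dependencies(local_pruned_rows, column_ids, owner, me):
    """Return {server: [row ids]} that server `me` must obtain from other servers."""
    # Mark the identifiers of every dimension still retained after pruning.
    needed = set()
    for row in local_pruned_rows.values():
        for col_id, value in zip(column_ids, row):
            if value is not None:
                needed.add(col_id)
    # Keep only identifiers whose rows of R and A are held by another server.
    deps = {}
    for row_id in sorted(needed):
        if owner[row_id] != me:
            deps.setdefault(owner[row_id], []).append(row_id)
    return deps


users = ["u0", "u1", "u2", "u3", "u4"]
owner = {"u0": 0, "u1": 1, "u2": 0, "u3": 1, "u4": 0}
s_n0 = {"u0": [1.0, 0.8, 0.6, None, 0.7],
        "u2": [0.6, 0.5, 1.0, None, 0.4],
        "u4": [0.7, 0.3, 0.4, None, 1.0]}
s_n1 = {"u1": [None, 1.0, 0.5, 0.9, 0.3],
        "u3": [None, 0.9, 0.2, 1.0, 0.6]}
print(find_dependencies(s_n0, users, owner, 0))  # {1: ['u1']}
print(find_dependencies(s_n1, users, owner, 1))  # {0: ['u2', 'u4']}
```

The two results correspond to N0 depending on u1 in N1, and N1 depending on u2, u4 in N0.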
Step 250: each calculation server participating in the calculation sends the component identifiers of the partitioned matrix data it depends on to each calculation server on which it depends;
N0 then notifies N1 that the row component u1 needs to be sent to N0, and N1 notifies N0 that the row components u2, u4 need to be sent to N1.
Step 260: each calculation server participating in the calculation sends, according to the component identifiers that other servers depend on, the corresponding components to each calculation server that depends on them, and receives the components sent by other calculation servers;
As above, N0 sends (u2, u4) to N1 and N1 receives (u2, u4); N1 sends (u1) to N0 and N0 receives (u1).
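The request and transfer of steps 250 and 260 can be simulated in a single process as follows. This is only a sketch: real calculation servers would exchange messages over a network, whereas here the "send" is a dictionary lookup, and the function name exchange and the weight values are assumptions.

```python
def exchange(local_data, requests):
    """Simulate steps 250 and 260.

    local_data: {server: {row_id: vector}} rows held by each server
    requests:   {requester: {provider: [row_ids]}} identifiers sent in step 250
    Returns {server: {row_id: vector}} with the components each server receives.
    """
    received = {server: {} for server in local_data}
    for requester, per_provider in requests.items():
        for provider, row_ids in per_provider.items():
            for row_id in row_ids:
                # The provider looks up the requested component and "sends" it.
                received[requester][row_id] = local_data[provider][row_id]
    return received


local_r = {0: {"u0": [5, 0, 3], "u2": [4, 1, 0], "u4": [0, 2, 5]},
           1: {"u1": [0, 4, 4], "u3": [2, 0, 1]}}
requests = {0: {1: ["u1"]}, 1: {0: ["u2", "u4"]}}
received = exchange(local_r, requests)
print(received[0])  # {'u1': [0, 4, 4]}
print(received[1])  # {'u2': [4, 1, 0], 'u4': [0, 2, 5]}
```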
Step 270: each calculation server participating in the calculation, for each main body in its local partitioned matrix data of the main body-project weight matrix, recommends at least one project to that main body by using the correspondence among the local partitioned matrix data of the main body-project weight matrix, the local partitioned matrix data of the collaborative filtering matrix, and the received components.
Each computing node can then calculate, according to formula (1), the recommended projects for each user ui held in that computing node. That is, N0 calculates the recommended projects for u0, u2, u4, and N1 calculates the recommended projects for u1, u3.
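Formula (1) is not reproduced in this passage. Purely as an illustration of how the local data and the received components could be combined, the sketch below assumes a common mean-centered weighted-average form of user-based prediction; this form, the function name predict_user_based and all numeric values are assumptions and may differ from formula (1).

```python
def predict_user_based(u, item, sims, weights, means):
    """Assumed mean-centered prediction of the weight of `item` for user `u`.

    sims:    {neighbor_id: similarity or None}, the pruned row of S for u
    weights: {user_id: {item_id: weight}}, local and received rows of R
    means:   {user_id: mean weight}, local and received rows of A
    """
    numerator = 0.0
    denominator = 0.0
    for v, s in sims.items():
        # Skip the user itself, pruned entries and neighbors without a weight.
        if v == u or s is None or item not in weights.get(v, {}):
            continue
        numerator += s * (weights[v][item] - means[v])
        denominator += abs(s)
    if denominator == 0.0:
        return means[u]  # no usable neighbor: fall back to the user's own mean
    return means[u] + numerator / denominator


sims_u0 = {"u0": 1.0, "u1": 0.5, "u2": 0.5, "u3": None}
weights = {"u1": {"i3": 4.0}, "u2": {"i3": 5.0}}
means = {"u0": 3.5, "u1": 3.0, "u2": 3.0}
print(predict_user_based("u0", "i3", sims_u0, weights, means))  # 5.0
```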
The calculation servers send and receive data among themselves by calling a message passing interface.
This embodiment is a preferred embodiment for the user-based case. The order of some steps may be changed as circumstances require and is not limited here.
With reference to Fig. 5, a schematic flowchart of embodiment three of a collaborative-filtering-based processing method of the present invention is shown, which may specifically comprise:
Step 300: receive a request to recommend at least one project for a main body, and start a recommendation calculation process on at least two calculation servers according to the request;
In embodiments of the present invention, the main body may be, for example, a user ID in a network. For the various projects that a user ID has used or that exist in the network, the system or a user may request that one or several projects be recommended for each user ID; for example, related products are recommended to a user based on the products purchased in the network.
The recommendation calculation process of the at least two calculation servers comprises:
Step 310: among N calculation servers, each calculation server participating in the calculation obtains its partitioned matrix data; the partitioned matrix data comprise the partitioned matrix data of the user ID-project weight matrix and the partitioned matrix data of the project-project similarity matrix;
With reference to Fig. 6, a schematic diagram of the item-based computation principle of an embodiment of the invention is shown. In the figure, 301 and 302 are the user ID-project weight matrix R, and 303 is the project-project similarity matrix S.
Its computing formula is as follows:
... formula (2)
Computation process is as follows:
1. Obtain the row of R corresponding to u and find the items that u has not scored, i.e. the full set I of recommendation candidates;
2. For each i in I, obtain the corresponding column of the sim matrix to get the set of items similar to i, and sum according to the formula to obtain the predicted score;
3. Select, as required, the highest-scoring items among the predicted scores of u for I to obtain the final recommended items.
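The three steps above can be sketched as follows. Because formula (2) is not reproduced in this passage, the sketch assumes a plain similarity-weighted sum over the items that u has scored; this form, the function name recommend_items and the sample values are assumptions and may differ from formula (2).

```python
def recommend_items(user_row, sim, top_k):
    """Item-based recommendation for one user.

    user_row: {item: score} containing only the items the user has scored
    sim:      {item: {other_item: similarity}}
    """
    # Step 1: candidate set I = items the user has not scored.
    candidates = [i for i in sim if i not in user_row]
    # Step 2: predicted score = sum of similarity * known score (assumed form).
    predicted = {
        i: sum(s * user_row[j] for j, s in sim[i].items() if j in user_row)
        for i in candidates
    }
    # Step 3: keep the top_k candidates with the highest predicted score.
    ranked = sorted(predicted, key=lambda i: (-predicted[i], i))
    return ranked[:top_k]


user_u = {"i0": 5.0, "i1": 3.0}
sim = {"i0": {"i1": 0.5, "i2": 0.5},
       "i1": {"i0": 0.5, "i3": 1.0},
       "i2": {"i0": 0.5, "i1": 0.0},
       "i3": {"i0": 1.0, "i1": 1.0},
       "i4": {"i0": 0.0, "i1": 0.5}}
print(recommend_items(user_u, sim, 2))  # ['i3', 'i2']
```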
In embodiments of the present invention, the project-project similarity matrix S can be calculated from the user ID-project weight matrix R.
For convenience of description, this embodiment also partitions the matrices by row and provides the blocks to the calculation servers for calculation. Partitioning the matrices by column and providing the blocks to the calculation servers follows a computation principle and process similar to partitioning by row and only requires the corresponding transposition, so it is not described in detail in this or subsequent steps.
Step 320: each calculation server participating in the calculation obtains the component identifiers in the partitioned matrix data processed by all other calculation servers;
That is, each calculation server sends each row identifier in the partition of S that it has obtained to the other N-1 servers.
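Once every server has received the row identifiers announced by the others, it can build a map from identifier to holding server. The sketch below shows this bookkeeping only; the function name build_owner_map and the dictionary representation of the announcements are assumptions.

```python
def build_owner_map(announcements):
    """Build {row_id: server} from the identifiers each server announced.

    announcements: {server: [row_ids of its S partition]}
    """
    owner = {}
    for server, row_ids in announcements.items():
        for row_id in row_ids:
            owner[row_id] = server
    return owner


# N0 holds rows i0, i2, i4 of S; N1 holds rows i1, i3.
item_owner = build_owner_map({0: ["i0", "i2", "i4"], 1: ["i1", "i3"]})
print(item_owner["i3"])  # 1
```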
Step 330: each calculation server participating in the calculation calculates a candidate recommendation set according to the user ID-project weight matrix;
Each calculation server participating in the calculation obtains the row corresponding to u from its current partition of R and finds the items that u has not scored, i.e. the candidate recommendation set I.
Step 340: each calculation server participating in the calculation confirms, according to the candidate recommendation set, the partitioned matrix data of the project-project similarity matrix and the user ID-project weight matrix, each calculation server on which the current calculation server depends, and the component identifiers of the partitioned matrix data in each such calculation server;
The row identifiers of the whole matrix S can then be transposed as a vector and aligned with the column identifiers of matrix R. According to the dimensions of the candidate recommendation set I in matrix R, it is confirmed which row components of matrix S are needed and in which calculation servers those row components reside.
Optionally, this step comprises:
Step S31: each calculation server participating in the calculation transposes the row component identifiers or the column component identifiers among the component identifiers of the project-project similarity matrix;
For example, calculation server N0 obtains rows u0, u2, u4 of R and rows i0, i2, i4 of S, and calculation server N1 obtains rows u1, u3 of R and rows i1, i3 of S. After N0 obtains the row component identifiers of N1, it transposes the vector (i0, i1, i2, i3, i4)' corresponding to the row component identifiers.
Step S32: align the result obtained by transposing the row component identifiers with the column component identifiers of the user ID-project weight matrix, or align the result obtained by transposing the column component identifiers with the row component identifiers of the user ID similarity matrix;
Calculation server N0 aligns the transposed row component identifiers of S with the columns of matrix R, that is, with the columns of each row component (u0, u1, u2, u3, u4) of R.
Step S33: compare the marked row component identifiers or column component identifiers with the row component identifiers or column component identifiers in the locally obtained partitioned matrix data of the user ID-project weight matrix and the user ID-weight mean matrix, and determine which row component identifiers or column component identifiers do not exist locally;
For example, for row u0 in N0, the corresponding recommendation candidate set is (i2, i3, i4), so i3 is in N1; for row u2, the corresponding recommendation candidate set is (i2, i3, i4), so i3 is in N1; for row u4, the corresponding recommendation candidate set is (i0, i3, i4), so i3 is in N1.
For row u1 in N1, the corresponding recommendation candidate set is (i1, i3, i4), so i4 is in N0; for row u3, the corresponding recommendation candidate set is (i0, i3, i4), so i0, i4 are in N0;
Step S34: according to the calculation servers to which the locally non-existent row component identifiers or column component identifiers belong, confirm each calculation server on which the current server depends, and the component identifiers of the user ID-project weight matrix and the user ID-weight mean matrix in each calculation server depended on.
N0 therefore depends on i3 of N1, and N1 depends on i0, i4 of N0.
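The item-based dependency confirmation of steps S33 and S34 can be sketched with the candidate sets of the example above. The function name item_dependencies and the data layout are assumptions made for illustration.

```python
def item_dependencies(candidate_sets, item_owner, me):
    """Return {server: [item ids]} whose rows of S server `me` must obtain.

    candidate_sets: {user_id: iterable of candidate item ids} for local users
    item_owner:     {item_id: server holding that row of S}
    """
    # Union of the candidate items over all local users.
    needed = set().union(*candidate_sets.values())
    deps = {}
    for item in sorted(needed):
        if item_owner[item] != me:  # row of S is not held locally
            deps.setdefault(item_owner[item], []).append(item)
    return deps


item_owner = {"i0": 0, "i1": 1, "i2": 0, "i3": 1, "i4": 0}
candidates_n0 = {"u0": ("i2", "i3", "i4"),
                 "u2": ("i2", "i3", "i4"),
                 "u4": ("i0", "i3", "i4")}
candidates_n1 = {"u1": ("i1", "i3", "i4"),
                 "u3": ("i0", "i3", "i4")}
print(item_dependencies(candidates_n0, item_owner, 0))  # {1: ['i3']}
print(item_dependencies(candidates_n1, item_owner, 1))  # {0: ['i0', 'i4']}
```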
Step 350: each calculation server participating in the calculation sends the component identifiers of the partitioned matrix data it depends on to each calculation server on which it depends;
N0 therefore notifies N1 to send the row vector i3 to N0, and N1 notifies N0 to send the row vectors i0, i4 to N1.
Step 360: each calculation server participating in the calculation sends, according to the component identifiers that other servers depend on, the corresponding components to each calculation server that depends on them, and receives the components sent by other calculation servers;
N1 therefore sends the row vector i3 to N0, and N0 sends the row vectors i0, i4 to N1.
Step 370: each calculation server participating in the calculation, for each main body in its local partitioned matrix data of the main body-project weight matrix, recommends at least one project to that main body by using the correspondence among the local partitioned matrix data of the main body-project weight matrix, the local partitioned matrix data of the collaborative filtering matrix, and the received components.
The recommended projects for each ui are then calculated according to formula (2) from the local partitioned matrix data of the weight matrix, the local partitioned matrix data of the collaborative filtering matrix, and the received components.
This embodiment is a preferred embodiment for the item-based case. The order of some steps may be changed as circumstances require and is not limited here.
With reference to Fig. 7, a schematic structural diagram of embodiment one of a collaborative-filtering-based processing system of the present invention is shown, which may specifically comprise:
A request receiving module 700 and at least two calculation servers;
The request receiving module 700 is adapted to receive a request to recommend at least one project for a main body, and to start at least two calculation servers according to the request;
Each of the at least two calculation servers comprises:
A partitioned matrix data acquisition module 710, adapted so that, among the at least two calculation servers, each calculation server participating in the calculation obtains its partitioned matrix data; the partitioned matrix data comprise the partitioned matrix data of the main body-project weight matrix and the partitioned matrix data of the collaborative filtering matrix;
A dependent calculation server confirmation module 720, adapted so that each calculation server participating in the calculation confirms, according to the main body-project weight matrix and the collaborative filtering matrix, each server on which the current calculation server depends, and the components of the partitioned matrix data in each such calculation server;
A component sending/receiving module 730, adapted so that each calculation server participating in the calculation sends the components depended on to each calculation server that depends on them, and receives the components sent by other calculation servers;
A recommended project calculation module 740, adapted so that each calculation server participating in the calculation, for each main body in its local partitioned matrix data of the main body-project weight matrix, recommends at least one project to that main body by using the correspondence among the local partitioned matrix data of the main body-project weight matrix, the local partitioned matrix data of the collaborative filtering matrix, and the received components.
Optionally, the dependent calculation server confirmation module comprises:
A component identifier acquisition module, adapted so that each calculation server participating in the calculation obtains the component identifiers in the partitioned matrix data processed by all other calculation servers;
A first dependent server confirmation module, adapted so that each calculation server participating in the calculation confirms, according to the partitioned matrix data of the collaborative filtering matrix and/or the partitioned matrix data of the main body-project weight matrix, together with the component identifiers in the partitioned matrix data processed by all other calculation servers, each calculation server on which the current calculation server depends, and the component identifiers of the partitioned matrix data in each such calculation server.
Optionally, the system further comprises:
An identifier sending module, adapted so that each calculation server participating in the calculation sends the component identifiers of the partitioned matrix data it depends on to each calculation server on which it depends;
Further, the component sending/receiving module comprises:
A first component sending/receiving module, adapted so that each calculation server participating in the calculation sends, according to the component identifiers that other servers depend on, the corresponding components to each calculation server that depends on them, and receives the components sent by other calculation servers.
Optionally, the main body-project weight matrix comprises: a user ID-project weight matrix and a user ID-weight mean matrix;
The collaborative filtering matrix is the user ID similarity matrix corresponding to the user ID-project weight matrix;
Further, the components comprise components of the user ID-project weight matrix and components of the user ID-weight mean matrix.
Optionally, the first dependent server confirmation module comprises:
A pruning module, adapted so that each calculation server participating in the calculation prunes the partitioned matrix data of the user ID similarity matrix;
A second dependent server confirmation module, adapted so that each calculation server participating in the calculation confirms, according to the pruned partitioned matrix data of the user ID similarity matrix and the component identifiers in the partitioned matrix data processed by all other calculation servers, each calculation server on which the current calculation server depends, and the component identifiers of the partitioned matrix data in each such calculation server.
Optionally, the pruning module comprises:
A first pruning module, adapted to sort, for each row or each column of the partitioned matrix data of the user ID similarity matrix, the values of the dimensions, and to keep at least one top-ranked dimension in each row or each column.
Optionally, the second dependent server confirmation module comprises:
A first row/column component identifier transposition module, adapted so that each calculation server participating in the calculation transposes the row component identifiers or the column component identifiers among the component identifiers of the user ID-project weight matrix and the user ID-weight mean matrix;
A first row/column component identifier alignment module, adapted to align the result obtained by transposing the row component identifiers with the column component identifiers of the user ID similarity matrix, or to align the result obtained by transposing the column component identifiers with the row component identifiers of the user ID similarity matrix;
A first retention module, adapted to mark, for the dimensions currently retained in each row or each column, the row component identifiers or column component identifiers corresponding to the retained dimensions;
A first judgment module, adapted to compare the marked row component identifiers or column component identifiers with the row component identifiers or column component identifiers in the locally obtained partitioned matrix data of the user ID-project weight matrix and the user ID-weight mean matrix, and to determine which row component identifiers or column component identifiers do not exist locally;
A third dependent server confirmation module, adapted to confirm, according to the calculation servers to which the locally non-existent row component identifiers or column component identifiers belong, each calculation server on which the current server depends, and the component identifiers of the user ID-project weight matrix and the user ID-weight mean matrix in each calculation server depended on.
Optionally, the main body-project weight matrix comprises: a user ID-project weight matrix;
The collaborative filtering matrix is the project-project similarity matrix corresponding to the user ID-project weight matrix;
Further, the components comprise components of the project-project similarity matrix.
Optionally, the dependent server confirmation module comprises:
A first candidate recommendation set calculation module, adapted so that each calculation server participating in the calculation calculates a candidate recommendation set according to the user ID-project weight matrix;
A fourth dependent server confirmation module, adapted so that each calculation server participating in the calculation confirms, according to the candidate recommendation set, the partitioned matrix data of the project-project similarity matrix and the user ID-project weight matrix, each calculation server on which the current calculation server depends, and the component identifiers of the partitioned matrix data in each such calculation server.
Optionally, the fourth dependent server confirmation module comprises:
A second row/column component identifier transposition module, adapted so that each calculation server participating in the calculation transposes the row component identifiers or the column component identifiers among the component identifiers of the project-project similarity matrix;
A second row/column component identifier alignment module, adapted to align the result obtained by transposing the row component identifiers with the column component identifiers of the user ID-project weight matrix, or to align the result obtained by transposing the column component identifiers with the row component identifiers of the user ID similarity matrix;
A second judgment module, adapted to compare the marked row component identifiers or column component identifiers with the row component identifiers or column component identifiers in the locally obtained partitioned matrix data of the user ID-project weight matrix and the user ID-weight mean matrix, and to determine which row component identifiers or column component identifiers do not exist locally;
A fifth dependent server confirmation module, adapted to confirm, according to the calculation servers to which the locally non-existent row component identifiers or column component identifiers belong, each calculation server on which the current server depends, and the component identifiers of the user ID-project weight matrix and the user ID-weight mean matrix in each calculation server depended on.
Optionally, the calculation servers send and receive data among themselves by calling a message passing interface.
With reference to Fig. 8, a schematic structural diagram of embodiment two of a collaborative-filtering-based processing system according to an embodiment of the invention is shown, which may specifically comprise:
A request receiving module 800 and at least two calculation servers;
The request receiving module 800 is adapted to receive a request to recommend at least one project for a main body, and to start at least two calculation servers according to the request;
Each of the at least two calculation servers comprises:
A first partitioned matrix data acquisition module 810, adapted so that each calculation server participating in the calculation obtains its partitioned matrix data; the partitioned matrix data comprise the partitioned data of the user ID-project weight matrix and the user ID-weight mean matrix, and the partitioned matrix data of the user ID similarity matrix;
A first component identifier acquisition module 820, adapted so that each calculation server participating in the calculation obtains the component identifiers in the partitioned matrix data processed by all other calculation servers;
A pruning module 830, adapted so that each calculation server participating in the calculation prunes the partitioned matrix data of the user ID similarity matrix;
A second dependent server confirmation module 840, adapted so that each calculation server participating in the calculation confirms, according to the pruned partitioned matrix data of the user ID similarity matrix and the component identifiers in the partitioned matrix data processed by all other calculation servers, each calculation server on which the current calculation server depends, and the component identifiers of the partitioned matrix data in each such calculation server;
A first identifier sending module 850, adapted so that each calculation server participating in the calculation sends the component identifiers of the partitioned matrix data it depends on to each calculation server on which it depends;
A first component sending/receiving module 860, adapted so that each calculation server participating in the calculation sends, according to the component identifiers that other servers depend on, the corresponding components to each calculation server that depends on them, and receives the components sent by other calculation servers;
A first recommended project calculation module 870, adapted so that each calculation server participating in the calculation, for each main body in its local partitioned matrix data of the main body-project weight matrix, recommends at least one project to that main body by using the correspondence among the local partitioned matrix data of the main body-project weight matrix, the local partitioned matrix data of the collaborative filtering matrix, and the received components.
With reference to Fig. 9, a schematic structural diagram of embodiment three of a collaborative-filtering-based processing system according to an embodiment of the invention is shown, which may specifically comprise:
A request receiving module 900 and at least two calculation servers;
The request receiving module 900 is adapted to receive a request to recommend at least one project for a main body, and to start at least two calculation servers according to the request;
Each of the at least two calculation servers comprises:
A second partitioned matrix data acquisition module 910, adapted so that each calculation server participating in the calculation obtains its partitioned matrix data; the partitioned matrix data comprise the partitioned data of the user ID-project weight matrix and the user ID-weight mean matrix, and the partitioned matrix data of the user ID similarity matrix;
A second component identifier acquisition module 920, adapted so that each calculation server participating in the calculation obtains the component identifiers in the partitioned matrix data processed by all other calculation servers;
A first candidate recommendation set calculation module 930, adapted so that each calculation server participating in the calculation calculates a candidate recommendation set according to the user ID-project weight matrix;
A fourth dependent server confirmation module 940, adapted so that each calculation server participating in the calculation confirms, according to the candidate recommendation set, the partitioned matrix data of the project-project similarity matrix and the user ID-project weight matrix, each calculation server on which the current calculation server depends, and the component identifiers of the partitioned matrix data in each such calculation server;
A second identifier sending module 950, adapted so that each calculation server participating in the calculation sends the component identifiers of the partitioned matrix data it depends on to each calculation server on which it depends;
A second component sending/receiving module 960, adapted so that each calculation server participating in the calculation sends, according to the component identifiers that other servers depend on, the corresponding components to each calculation server that depends on them, and receives the components sent by other calculation servers;
A second recommended project calculation module 970, adapted so that each calculation server participating in the calculation, for each main body in its local partitioned matrix data of the main body-project weight matrix, recommends at least one project to that main body by using the correspondence among the local partitioned matrix data of the main body-project weight matrix, the local partitioned matrix data of the collaborative filtering matrix, and the received components.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that a variety of programming languages may be used to implement the content of the invention described herein, and that the above description of a specific language is given in order to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. Therefore, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and may furthermore be divided into a plurality of sub-modules or sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
In addition, those skilled in the art will appreciate that, although some embodiments described herein include some features but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the processing system based on collaborative filtering according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order. These words may be interpreted as names.