CN106971053A - A recommendation method based on hybrid collaborative filtering

Info

Publication number: CN106971053A
Application number: CN201610012329.9A
Authority: CN
Inventors: 车海莺
Original assignee: Individual
Current assignee: Individual
Priority date: 2016-01-08
Filing date: 2016-01-08
Publication date: 2017-07-21

Abstract

The invention discloses a recommendation method based on hybrid collaborative filtering, comprising the following steps: a user-item rating matrix is built and decomposed, using an optimized singular value decomposition, into two latent factor matrices, one for users and one for items; gradient descent is applied to the two latent factor matrices to iteratively approach the optimal objective parameter; the two latent factor matrices are multiplied to obtain a full-rank user-item matrix; on the full-rank user-item matrix, a user rating bias and an item rating bias are introduced, and two KNN-based collaborative filtering methods are combined to obtain an item-based collaborative filtering predicted rating and a user-based collaborative filtering predicted rating; the weighted sum of the two predicted ratings gives the user's predicted rating for an item; finally, the items with the highest predicted ratings form the recommendation list for the user.

Description

A recommendation method based on hybrid collaborative filtering

Technical field

The present invention relates to the field of recommendation technology, and in particular to a recommendation method based on hybrid collaborative filtering.

Background art

In the fast-changing Internet era, network technology is developing rapidly and the volume of information is growing geometrically. Each user can obtain more and more information on the Internet, far more than can be screened by hand. This means that although the amount of information has increased, the efficiency with which users obtain useful information has decreased. How to select, from massive amounts of information, the information a user may be interested in or may find useful has therefore become an important research topic. Recommender systems emerged to solve this problem.

A recommender system recommends products to users. These products may be the most popular products or the result of a calculation based on user demographics; the most common approach, however, is to predict a user's preferences from the user's historical behavior. The work of a recommender system can thus be seen as personalizing an e-commerce site: it adapts the site to the needs of each user and offers that user specific goods. In the past, recommender systems were treated by some e-commerce sites as a novel, niche application; today they have become a very important business tool and are gradually changing the landscape of Internet commerce.

In industry, the most commonly used collaborative filtering recommendation approach at present is KNN-based collaborative filtering. Its biggest problem, however, is that the sparsity of the matrix reduces the accuracy of the predictions, and in practical applications the user-item rating matrix is usually quite sparse, so many recommendation results are unsatisfactory.

Summary of the invention

In view of this, the invention provides a recommendation method based on hybrid collaborative filtering that compensates for the sparsity of the user-item rating matrix and produces higher-quality recommendation results when that matrix is sparse.

To achieve the above object, the technical solution of the invention comprises the following steps:

Step 1, build a user-item rating matrix and, using an optimized singular value decomposition, decompose the user-item matrix into two latent factor matrices, one for users and one for items;

Step 2, use gradient descent on the two latent factor matrices to iteratively approach the optimal objective parameter;

Step 3, multiply the two latent factor matrices to obtain a full-rank user-item matrix;

Step 4, on the full-rank user-item matrix, introduce the user rating bias and the item rating bias, then combine two KNN-based collaborative filtering methods to obtain an item-based collaborative filtering predicted rating and a user-based collaborative filtering predicted rating; take the weighted sum of the two predicted ratings as the user's predicted rating for the item; finally, generate the recommendation list for the user from the items with the highest predicted ratings.

Further, the user-item rating matrix in step 1 contains m users and n items. Singular value decomposition of this matrix yields the latent factor matrix U for users and the latent factor matrix M for items, where U is an F × m matrix, M is an F × n matrix, and F is the number of singular values, i.e. the number of latent factors.

Further, the optimal objective parameter is:

E = \frac{1}{2} \sum_{(i, j) \in R} (r_{ij} - d_{i} - d_{j} - u_{i} m_{j}^{T})^{2} + \lambda (\| u_{i} \|^{2} + \| m_{j} \|^{2} + d_{i}^{2} + d_{j}^{2});

where u_{i} = [u_{i1}, u_{i2}, ..., u_{if}, ..., u_{iF}] and m_{j} = [m_{j1}, m_{j2}, ..., m_{jf}, ..., m_{jF}]; r_{ij} is the actual rating; λ is a preset regularization balance coefficient; d_{i} is the rating bias set for user i and d_{j} is the rating bias set for item j; f ∈ [1, F]; u_{if} is the value in row i, column f of matrix U, and m_{jf} is the value in row j, column f of matrix M.

Further, the user's predicted rating for an item is:

\hat{r}_{ij} = d_{i} + d_{j} + \sum_{f=1}^{F} u_{if} m_{jf};

where \hat{r}_{ij} is the predicted rating of user i for item j; d_{i} is the rating bias set for user i and d_{j} is the rating bias set for item j; f ∈ [1, F]; u_{if} is the value in row i, column f of matrix U, and m_{jf} is the value in row j, column f of matrix M.

Further, the item-based collaborative filtering predicted rating is obtained as follows: user u's rating of item i is predicted from user u's actual ratings of the set of items similar to the target item i;

The user-based collaborative filtering predicted rating is obtained as follows: user u's rating of item i is predicted from similar users' ratings of the target item i.

Further, the KNN algorithm is parallelized using MapReduce + Hadoop.

Beneficial effects:

The invention provides a recommendation method based on hybrid collaborative filtering, SVD & KNN Hybrid Collaborative Filtering (SKHCF), which comprises a matrix filling technique based on optimized singular value decomposition (Singular Value Decomposition, SVD) and a hybrid KNN-based collaborative filtering algorithm (Hybrid KNN-Based Collaborative Filtering, H-KNN). The method first uses optimized singular value decomposition to decompose the user-item matrix into two latent factor matrices, one for users and one for items, and iteratively approaches the optimal objective parameter by gradient descent. The two matrices are then multiplied to obtain a full-rank user-item matrix, on which the H-KNN algorithm predicts users' ratings of items and finally produces a recommendation list for the target user.

On the one hand, by using a recommendation method based on hybrid collaborative filtering, the invention addresses the inaccurate, low-precision recommendation results that traditional collaborative filtering produces when the user-item rating matrix is sparse; on the other hand, it forms a practicable system scheme that produces relatively good recommendation results for qualifying input.

Detailed description

The present invention is described in detail below with reference to embodiments.

Embodiment 1: this embodiment provides a recommendation method based on hybrid collaborative filtering, comprising the following steps:

Step 1, build a user-item rating matrix and, using an optimized singular value decomposition, decompose the user-item matrix into two latent factor matrices, one for users and one for items; in this embodiment the user-item rating matrix contains m users and n items, and singular value decomposition of this matrix yields the latent factor matrix U for users and the latent factor matrix M for items, where U is an F × m matrix, M is an F × n matrix, and F is the number of singular values, i.e. the number of latent factors.

Step 2, use gradient descent on the two latent factor matrices to iteratively approach the optimal objective parameter; in this embodiment, the optimal objective parameter is:

E = \frac{1}{2} \sum_{(i, j) \in R} (r_{ij} - d_{i} - d_{j} - u_{i} m_{j}^{T})^{2} + \lambda (\| u_{i} \|^{2} + \| m_{j} \|^{2} + d_{i}^{2} + d_{j}^{2});

where u_{i} = [u_{i1}, u_{i2}, ..., u_{if}, ..., u_{iF}] and m_{j} = [m_{j1}, m_{j2}, ..., m_{jf}, ..., m_{jF}]; r_{ij} is the actual rating; λ is a preset regularization balance coefficient; d_{i} is the rating bias set for user i and d_{j} is the rating bias set for item j; f ∈ [1, F]; u_{if} is the value in row i, column f of matrix U, and m_{jf} is the value in row j, column f of matrix M.
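As a worked step, the per-rating update used by gradient descent can be derived from the objective E stated in claim 3. The derivation below is a sketch rather than part of the original text: it writes e_{ij} for the residual of a single rating (i, j), considers only that rating's contribution to E, and assumes the penalty terms are the squares of u_i, m_j, d_i and d_j.

```latex
e_{ij} = r_{ij} - d_{i} - d_{j} - \sum_{f=1}^{F} u_{if} m_{jf}

\frac{\partial E}{\partial u_{if}} = -e_{ij} m_{jf} + 2 \lambda u_{if},
\qquad
\frac{\partial E}{\partial m_{jf}} = -e_{ij} u_{if} + 2 \lambda m_{jf}

u_{if} \leftarrow u_{if} + \eta \, (e_{ij} m_{jf} - 2 \lambda u_{if}),
\qquad
m_{jf} \leftarrow m_{jf} + \eta \, (e_{ij} u_{if} - 2 \lambda m_{jf})
```

Stepping against the gradient with learning rate η gives the last line. It has the same form as the update rule listed in embodiment 3, up to the constant factor 2 on λ, which can be absorbed into the choice of λ.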

Step 3, multiply the two latent factor matrices to obtain a full-rank user-item matrix;
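Step 3 can be sketched as a single matrix product. The snippet below is a minimal illustration, assuming U is stored as F × m and M as F × n as stated in step 1, so that the filled m × n matrix is the product of the transpose of U with M. The function name `fill_matrix` and the optional bias arguments are hypothetical; adding the rating biases to every entry is an assumption based on the residual that appears in the objective.

```python
import numpy as np

def fill_matrix(U, M, d_user=None, d_item=None):
    """Return the dense m x n matrix of predicted ratings.

    U is F x m (one column per user), M is F x n (one column per item).
    d_user / d_item are optional bias vectors (an assumption, see lead-in).
    """
    R_full = U.T @ M                                   # shape (m, n)
    if d_user is not None:
        R_full = R_full + np.asarray(d_user)[:, None]  # add user bias per row
    if d_item is not None:
        R_full = R_full + np.asarray(d_item)[None, :]  # add item bias per column
    return R_full

# Toy factors: F = 2 latent factors, m = 2 users, n = 3 items.
U = np.array([[1.0, 0.5],
              [0.0, 2.0]])
M = np.array([[2.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
R_full = fill_matrix(U, M, d_user=[0.5, -0.5], d_item=[0.0, 1.0, 0.0])
```

Every entry of `R_full` is populated, which is what allows the KNN stage to run on a matrix without missing ratings.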

Once the matrix has been filled in by singular value decomposition, the H-KNN algorithm is used to produce the recommendation results. The H-KNN algorithm has two main tasks: finding similar neighbors and predicting item ratings. The most important step is finding similar neighbors, and before doing so a two-dimensional user-item matrix must be built.

The H-KNN algorithm is essentially a combination of two KNN-based collaborative filtering algorithms: item-based collaborative filtering and user-based collaborative filtering. Item-based collaborative filtering predicts user u's rating of item i from user u's actual ratings of the set of items similar to the target item i, whereas user-based collaborative filtering predicts user u's rating of item i from similar users' ratings of the target item i.
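The two KNN predictors described above can be sketched as follows. This is a minimal illustration on a small dense matrix, not the patented implementation: the text does not name a similarity measure or an aggregation rule, so cosine similarity and a similarity-weighted average are assumptions, and the function names are hypothetical.

```python
import numpy as np

def cosine_similarity(X):
    """Cosine similarity between every pair of rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0           # avoid dividing by zero for all-zero rows
    unit = X / norms
    return unit @ unit.T

def knn_predict(R, row, col, k):
    """Predict R[row, col] from the k rows most similar to `row`.

    Assumes R has at least two rows and non-negative ratings.
    """
    sims = cosine_similarity(R)[row].copy()
    sims[row] = -np.inf                 # never pick the target row itself
    k = min(k, R.shape[0] - 1)
    neighbours = np.argsort(sims)[-k:]  # indices of the k most similar rows
    weights = sims[neighbours]
    return float(weights @ R[neighbours, col] / np.abs(weights).sum())

def user_based(R, u, i, k=2):
    """Rating of item i by user u, from users similar to u."""
    return knn_predict(R, u, i, k)

def item_based(R, u, i, k=2):
    """Rating of item i by user u, from user u's ratings of items similar to i."""
    return knn_predict(R.T, i, u, k)

# Toy filled matrix: 4 users x 3 items.
R = np.array([[5.0, 3.0, 4.0],
              [4.0, 3.0, 5.0],
              [1.0, 3.0, 2.0],
              [2.0, 3.0, 1.0]])
r_u = user_based(R, 0, 1)
r_i = item_based(R, 0, 1)
```

Transposing the matrix lets one routine serve both directions: rows are users for the user-based predictor and items for the item-based one.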

Either of the two predictions of user u's rating of item i could be presented directly as the final result. To obtain a more accurate result, a weighting factor t is introduced here: the two predicted ratings are combined by weighted sum, and the Top-N of all predicted ratings are selected to generate the recommendation list. The formula is as follows.

r_{u, i} = t × r_u + (1 - t) × r_i

Here t is the control parameter introduced above. Its value range is [0, 1] and its values are taken at intervals of 0.1; in theory, the optimal value of t for a specific data set can be found by experiment, and if experimental conditions permit, the interval can be reduced to obtain a better value of t. r_u is the target user's rating of an item obtained by user-based collaborative filtering, and r_i is the target user's rating of an item obtained by item-based collaborative filtering. When t is 0, only item-based collaborative filtering influences the result; when t is 1, only user-based collaborative filtering does.
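The weighted sum, the search over t in steps of 0.1, and the Top-N selection can be sketched as below. This is a minimal illustration under stated assumptions: the helper names are hypothetical, and using RMSE against held-out ratings to choose t is an assumption, since the text only says that t is found by experiment.

```python
import numpy as np

def blend(r_user, r_item, t):
    """Weighted sum t * r_user + (1 - t) * r_item, with t in [0, 1]."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return t * r_user + (1.0 - t) * r_item

def pick_t(r_user, r_item, r_true, step=0.1):
    """Try t = 0, step, ..., 1 and return the value with the lowest RMSE."""
    candidates = np.linspace(0.0, 1.0, int(round(1.0 / step)) + 1)
    rmse = [np.sqrt(np.mean((blend(r_user, r_item, t) - r_true) ** 2))
            for t in candidates]
    return float(candidates[int(np.argmin(rmse))])

def top_n(scores, n, already_rated=()):
    """Indices of the n highest-scoring items, skipping items already rated."""
    scores = np.array(scores, dtype=float)
    scores[list(already_rated)] = -np.inf
    return [int(j) for j in np.argsort(-scores)[:n]]

# Toy predictions for three (user, item) pairs with known ratings.
r_user = np.array([1.0, 2.0, 3.0])
r_item = np.array([2.0, 2.0, 2.0])
best_t = pick_t(r_user, r_item, r_true=r_user)
recommended = top_n([3.1, 4.8, 2.0, 4.9], n=2)
```

A smaller `step` gives a finer search over t at the cost of more evaluations, which mirrors the remark about reducing the interval when conditions permit.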

This gives user u's final predicted rating of item i. In this embodiment, the user's predicted rating for an item is:

\hat{r}_{ij} = d_{i} + d_{j} + \sum_{f=1}^{F} u_{if} m_{jf}

Embodiment 2: building on embodiment 1 above, in a real system the cost of the KNN algorithm is usually very high. Imagine a user base in the millions and a catalogue of songs in the millions: the cost of computing nearest neighbors is prohibitive. In particular, the H-KNN algorithm proposed here combines user-based collaborative filtering (UB-CF) and item-based collaborative filtering (IB-CF) and obtains its result by weighted sum, which means that similarity has to be computed twice. Some optimization is therefore needed to address the high time and space complexity and the unsatisfactory running speed of the KNN algorithm.

Given the limited performance of a single machine, parallel optimization on a distributed system is clearly a better choice. Here the KNN algorithm is parallelized using MapReduce + Hadoop, implemented as follows.

1. Input the user-item rating matrix. The Map stage receives input key-value pairs <key, value>, where key is the line number in the data set and value is the content of that line, i.e. one user's rating of one song. The content is then split, the Shuffle process is carried out by key, and the key-value pairs passed to the Reduce stage are produced; at this point key is the user id and value is the song id and rating. The Reduce stage then assembles the output of the Map stage into the user-item rating matrix.

2. Carry out step 1 of the H-KNN algorithm: compute the similarities between users and between items. The input of the Map stage is the user-item rating matrix. The Shuffle process extracts each user's ratings of items and forms key-value pairs in which key is the item pair (user id (a), user id (b)) and value is the rating pair (rating (a), rating (b)). These key-value pairs are the input of the Reduce stage, which computes the similarity between items and saves and outputs the result. Returning to step 1, the similarity between users is computed in a similar way.

3. The input of the Map stage in this step is the similarities between users and between items. The Shuffle process computes the N nearest neighbors and forms key-value pairs in which key is a user id or song id and value is the N nearest neighbors. The Reduce stage completes steps 3 and 4 of the H-KNN algorithm: it computes the target user's predicted ratings of items and forms the recommendation list from them.
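The first two jobs above can be imitated in memory to show how the keys and values flow. The sketch below is plain Python, not Hadoop code: `run_job` is a hypothetical stand-in for one map, shuffle and reduce pass, the input line format `user,song,rating` is an assumption, and the second job keys on pairs of songs rated by the same user and uses cosine similarity, both of which are assumptions about how the item similarity is computed.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

def run_job(records, mapper, reducer):
    """In-memory stand-in for one MapReduce job: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)          # shuffle: group values by key
    return {key: reducer(key, values) for key, values in groups.items()}

# Job 1: raw lines -> one row of the rating matrix per user.
def parse_line(line):
    user, song, rating = line.split(",")
    yield user, (song, float(rating))

def build_row(user, song_ratings):
    return dict(song_ratings)

# Job 2: rating matrix rows -> similarity for every co-rated pair of songs.
def emit_item_pairs(entry):
    user, row = entry
    for a, b in combinations(sorted(row), 2):
        yield (a, b), (row[a], row[b])

def cosine(pair, rating_pairs):
    dot = sum(x * y for x, y in rating_pairs)
    norm_a = sqrt(sum(x * x for x, _ in rating_pairs))
    norm_b = sqrt(sum(y * y for _, y in rating_pairs))
    return dot / (norm_a * norm_b)             # assumes non-zero ratings

lines = ["u1,s1,5", "u1,s2,3", "u2,s1,4", "u2,s2,2", "u2,s3,1"]
matrix = run_job(lines, parse_line, build_row)
item_sims = run_job(matrix.items(), emit_item_pairs, cosine)
```

Swapping the roles of users and songs in the second job yields the user similarities in the same way, and a third job would group these similarities by id to keep the N nearest neighbors.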

Embodiment 3: this method uses stochastic gradient descent to optimize the objective parameter. Stochastic gradient descent is an optimization algorithm, a variant of gradient descent, which is also commonly called the steepest descent method. It takes the negative gradient direction as the search direction; the closer it gets to the target value, the smaller the step and the slower the progress. The specific steps of stochastic gradient descent are as follows:

1. Determine the number of factors F, the penalty parameter λ and the learning rate η, and initialize the user latent factor matrix U and the item latent factor matrix M;

2. For each user-item rating pair (user u and item m belonging to matrix R):

201. Compute the rating error e_{ui}, i.e. the actual rating minus the predicted rating;

202. Update the latent factor vectors u_{if} and m_{jf} of user u and item m:

u_{if} += η (e_{ui} · m_{jf} - λ · u_{if})

m_{jf} += η (e_{ui} · u_{if} - λ · m_{jf})

3. Compute the objective parameter E. If the value of E is smaller than before, update the learning rate to η = 0.9 × η and return to step 2, until the value of E oscillates within a small range or the preset number of iterations is reached.
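The three steps above can be sketched as a short training loop. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the small random initialization, the tolerance used to detect that E has settled, and the bias updates (which the text does not list) are all assumptions. The factor updates follow step 202, and the objective follows claim 3 with the penalty accumulated per observed rating.

```python
import numpy as np

def predict(U, M, d_user, d_item, i, j):
    """Predicted rating: user bias + item bias + dot product of the factors."""
    return d_user[i] + d_item[j] + U[:, i] @ M[:, j]

def objective(ratings, U, M, d_user, d_item, lam):
    """Squared error plus penalty, summed over the observed ratings."""
    E = 0.0
    for i, j, r in ratings:
        e = r - predict(U, M, d_user, d_item, i, j)
        E += 0.5 * e ** 2 + lam * (U[:, i] @ U[:, i] + M[:, j] @ M[:, j]
                                   + d_user[i] ** 2 + d_item[j] ** 2)
    return E

def train(ratings, m, n, F=2, lam=0.02, eta=0.05, max_iter=200, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(F, m))     # step 1: initialize U (F x m)
    M = rng.normal(scale=0.1, size=(F, n))     # step 1: initialize M (F x n)
    d_user, d_item = np.zeros(m), np.zeros(n)
    history = [objective(ratings, U, M, d_user, d_item, lam)]
    for _ in range(max_iter):
        for i, j, r in ratings:                # step 2: one pass over the ratings
            e = r - predict(U, M, d_user, d_item, i, j)      # step 201
            u_old = U[:, i].copy()
            U[:, i] += eta * (e * M[:, j] - lam * U[:, i])   # step 202
            M[:, j] += eta * (e * u_old - lam * M[:, j])     # step 202
            d_user[i] += eta * (e - lam * d_user[i])         # assumed bias update
            d_item[j] += eta * (e - lam * d_item[j])         # assumed bias update
        E = objective(ratings, U, M, d_user, d_item, lam)    # step 3
        settled = abs(history[-1] - E) < tol
        improved = E < history[-1]
        history.append(E)
        if settled:
            break
        if improved:
            eta *= 0.9                         # shrink the learning rate
    return U, M, d_user, d_item, history

# Toy data: (user index, item index, rating) triples.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
U, M, d_user, d_item, history = train(ratings, m=3, n=3)
```

Keeping `history` makes it easy to check that E falls during training and to see when the loop stopped because E had settled rather than because `max_iter` was reached.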

In summary, the above are only preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A recommendation method based on hybrid collaborative filtering, characterized in that it comprises the following steps:

Step 1, build a user-item rating matrix and, using an optimized singular value decomposition, decompose the user-item matrix into two latent factor matrices, one for users and one for items;

Step 2, use gradient descent on the two latent factor matrices to iteratively approach the optimal objective parameter;

Step 3, multiply the two latent factor matrices to obtain a full-rank user-item matrix;

Step 4, on the full-rank user-item matrix, introduce the user rating bias and the item rating bias, then combine two KNN-based collaborative filtering methods to obtain an item-based collaborative filtering predicted rating and a user-based collaborative filtering predicted rating; take the weighted sum of the two predicted ratings as the user's predicted rating for the item; finally, generate the recommendation list for the user from the items with the highest predicted ratings.

2. The recommendation method based on hybrid collaborative filtering according to claim 1, characterized in that the user-item rating matrix in step 1 contains m users and n items, and singular value decomposition of the user-item rating matrix yields the latent factor matrix U for users and the latent factor matrix M for items, where U is an F × m matrix, M is an F × n matrix, and F is the number of singular values, i.e. the number of latent factors.

3. The recommendation method based on hybrid collaborative filtering according to claim 2, characterized in that

the optimal objective parameter is:

E = \frac{1}{2} \sum_{(i, j) \in R} (r_{ij} - d_{i} - d_{j} - u_{i} m_{j}^{T})^{2} + \lambda (\| u_{i} \|^{2} + \| m_{j} \|^{2} + d_{i}^{2} + d_{j}^{2});

where u_{i} = [u_{i1}, u_{i2}, ..., u_{if}, ..., u_{iF}] and m_{j} = [m_{j1}, m_{j2}, ..., m_{jf}, ..., m_{jF}]; r_{ij} is the actual rating; λ is a preset regularization balance coefficient; d_{i} is the rating bias set for user i and d_{j} is the rating bias set for item j; f ∈ [1, F]; u_{if} is the value in row i, column f of matrix U, and m_{jf} is the value in row j, column f of matrix M.

4. The recommendation method based on hybrid collaborative filtering according to claim 3, characterized in that

the user's predicted rating for an item is:

\hat{r}_{ij} = d_{i} + d_{j} + \sum_{f=1}^{F} u_{if} m_{jf};

where \hat{r}_{ij} is the predicted rating of user i for item j; d_{i} is the rating bias set for user i and d_{j} is the rating bias set for item j; f ∈ [1, F]; u_{if} is the value in row i, column f of matrix U, and m_{jf} is the value in row j, column f of matrix M.

5. The recommendation method based on hybrid collaborative filtering according to claim 4, characterized in that the item-based collaborative filtering predicted rating is obtained as follows: user u's rating of item i is predicted from user u's actual ratings of the set of items similar to the target item i;

the user-based collaborative filtering predicted rating is obtained as follows: user u's rating of item i is predicted from similar users' ratings of the target item i.

6. The recommendation method based on hybrid collaborative filtering according to claim 5, characterized in that the KNN algorithm is parallelized using MapReduce + Hadoop.