CN109636509A - Rating prediction method based on constructing submatrices with an asymmetric distance - Google Patents

Rating prediction method based on constructing submatrices with an asymmetric distance

Info

Publication number
CN109636509A
Authority
CN
China
Prior art keywords
user
item
anchor point
distance
matrix
Legal status
Granted
Application number
CN201811382976.4A
Other languages
Chinese (zh)
Other versions
CN109636509B
Inventor
王邦
杨雪娇
刘生昊
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Application filed by Huazhong University of Science and Technology
Priority to CN201811382976.4A
Publication of CN109636509A
Application granted
Publication of CN109636509B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a rating prediction method that constructs submatrices using an asymmetric distance. The asymmetric distance measures the relationship between each user-item rating and the anchor points, so that the interests within each anchor point's neighborhood are concentrated as closely as possible around the anchor point. Compared with a symmetric distance, the resulting submatrices better reflect the concentrated interests of a particular group of users, which improves the prediction of the missing entries in the matrix. Anchor points are selected with a fast density-peak clustering method, so that each anchor point has a high neighborhood density and the anchor points are far apart from one another; the submatrices obtained by the partition are therefore evenly distributed and representative, and they effectively cover the original rating matrix. Measuring the relationship between each user-item rating and the anchor points with the asymmetric distance also reduces the distance penalty that a symmetric distance imposes when there are very few ratings, so inactive users and unpopular items with few ratings are more likely to be assigned to a submatrix, which increases the submatrices' coverage of the data.

Description

Rating prediction method based on constructing submatrices with an asymmetric distance
Technical field
The invention belongs to the field of personalized recommendation, and more particularly relates to a rating prediction method that constructs submatrices based on an asymmetric distance.
Background art
With the arrival of the Web 2.0 era and the substantial increase in network bandwidth, a wide variety of social networking platforms have emerged, and fragmented information has begun to flood people's lives. To address information overload, personalized recommendation systems are increasingly demonstrating their value. In e-commerce, for example, a recommender system builds a model of each user's interests from the user's historical behavior, estimates how much the user would like items the user has not yet purchased, and then recommends items the user is likely to like.
In practice, user preferences are usually predicted with collaborative filtering. The basic idea is to find neighbor users whose preferences are similar to the current user's, based on their ratings of items, and then recommend the items those neighbors like to the current user. The best-known technique of this kind is matrix factorization. Assuming that the original rating matrix is globally low-rank, matrix factorization borrows the idea of SVD and decomposes the matrix into a user factor matrix and an item factor matrix whose product reconstructs the original rating matrix as closely as possible, thereby also predicting its missing entries. In contrast to these factorization methods based on global low rank, a decomposition method based on local low rank has been proposed in recent years: several anchor points are randomly selected in the rating matrix, each data point is assigned to different submatrices according to its distance to the anchor points, and predictions are made within the submatrices.
However, this method partitions the submatrices using the symmetric distance from each point to the anchor points, so the resulting submatrices cannot adequately reflect the concentrated interests of a group of similar users. In addition, the partition depends heavily on the preset anchor points (their number and distribution), and a small number of data points become "isolated points" because they are not assigned to any submatrix. These problems limit the accuracy of the rating predictions.
Summary of the invention
In view of the above defects of the prior art, the object of the invention is to solve the technical problem of the limited accuracy of rating predictions made by recommender systems in the prior art.
To achieve the above object, in a first aspect, an embodiment of the invention provides a rating prediction method that constructs submatrices based on an asymmetric distance. The method comprises the following steps:
S1. Decompose the user-item rating matrix with non-negative matrix factorization to obtain a user feature matrix and an item feature matrix;
S2. Construct an asymmetric distance matrix between users from the user feature matrix, and an asymmetric distance matrix between items from the item feature matrix;
S3. Select q anchor users according to the inter-user asymmetric distance matrix and q anchor items according to the inter-item asymmetric distance matrix, and randomly pair the anchor users with the anchor items to form q anchor points;
S4. For each anchor point, compute the asymmetric distance from each user to the anchor user using the user feature matrix, and the asymmetric distance from each item to the anchor item using the item feature matrix;
S5. For each anchor point, compute the similarity between each user-item pair and the anchor point from the user-to-anchor-user and item-to-anchor-item asymmetric distances, and determine the anchor point's neighborhood from the similarity; the anchor point and its entire neighborhood form the submatrix centered on that anchor point;
S6. For each submatrix, train the submatrix with a weighted matrix factorization method to obtain the users' predicted ratings for the items in the submatrix;
S7. Compute a weighted average of the predicted ratings from the q submatrices to obtain each user's final predicted rating for each item.
Specifically, in step S2, for user $i'$, the asymmetric distance from every other user to $i'$ is computed from the user feature matrix as
$D_{KL}(P_{j'} \| P_{i'}) = \sum_{k=1}^{K} P_{j'k} \log \frac{P_{j'k}}{P_{i'k}}$
where $D_{KL}(P_{j'} \| P_{i'})$ denotes the distance from user $j'$ to user $i'$, $P_{i'}$ denotes row $i'$ of the user feature matrix, $K$ is the number of columns of the user feature matrix, $i' = 1, 2, \ldots, M$, $j' = 1, 2, \ldots, M$, and $M$ is the number of users; this value is the entry in row $i'$, column $j'$ of the inter-user asymmetric distance matrix.
For item $i''$, the asymmetric distance from every other item to $i''$ is computed from the item feature matrix as
$D_{KL}(Q_{j''} \| Q_{i''}) = \sum_{k=1}^{K} Q_{j''k} \log \frac{Q_{j''k}}{Q_{i''k}}$
where $D_{KL}(Q_{j''} \| Q_{i''})$ denotes the distance from item $j''$ to item $i''$, $Q_{i''}$ denotes row $i''$ of the item feature matrix, $K$ is the number of columns of the item feature matrix, $i'' = 1, 2, \ldots, N$, $j'' = 1, 2, \ldots, N$, and $N$ is the number of items; this value is the entry in row $i''$, column $j''$ of the inter-item asymmetric distance matrix.
Specifically, step S3 comprises the following sub-steps:
S301. Count the elements in row $i$ of the inter-user asymmetric distance matrix that are smaller than the user distance threshold, and take this count as the density $\rho_i$ of user $i$; count the elements in row $j$ of the inter-item asymmetric distance matrix that are smaller than the item distance threshold, and take this count as the density $\rho_j$ of item $j$;
S302. For user $i$, find all users whose density is greater than that of user $i$, compute the average distance between each of these users and user $i$, and take the minimum of these average distances as the separation distance $\delta_i$ of user $i$; for item $j$, find all items whose density is greater than that of item $j$, compute the average distance between each of these items and item $j$, and take the minimum of these average distances as the separation distance $\delta_j$ of item $j$;
S303. Compute the representative degree $\gamma_i$ of user $i$ from its density and separation distance, and select the q users with the largest representative degrees as anchor users; compute the representative degree $\gamma_j$ of item $j$ in the same way, and select the q items with the largest representative degrees as anchor items;
S304. Randomly pair the q anchor users with the q anchor items to obtain q anchor-user/anchor-item pairs; each such pair constitutes one anchor point;
where $i = 1, 2, \ldots, M$, $j = 1, 2, \ldots, N$, $M$ is the number of users, and $N$ is the number of items.
Specifically, the average distance between points a and b equals $(dis_{ab} + dis_{ba})/2$, where $dis_{ab}$ is the asymmetric distance from a to b and $dis_{ba}$ is the asymmetric distance from b to a.
Specifically, the user distance threshold is chosen so that the average number of neighbors per user is as close as possible to 4% of the total number of users, and the item distance threshold is chosen so that the average number of neighbors per item is as close as possible to 4% of the total number of items.
Specifically, the number of anchor points q is determined automatically from the data set, as follows:
(1) Sort all user representative degrees in descending order. In this ordering, compute the difference between the representative degree of the i-th user and the mean representative degree of the 10 users ranked immediately after it. If the difference is greater than a preset threshold, add the user to the candidate anchor user list and continue with user i+1; if the difference is not greater than the preset threshold, go to step (2);
(2) Sort all item representative degrees in descending order. In this ordering, compute the difference between the representative degree of the j-th item and the mean representative degree of the 10 items ranked immediately after it. If the difference is greater than the preset threshold, add the item to the candidate anchor item list and continue with item j+1; if the difference is not greater than the preset threshold, go to step (3);
(3) Compare the number of users in the candidate user list with the number of items in the candidate item list, and take the larger value as the number of anchor points q.
Specifically, step S5 comprises the following sub-steps:
S501. Compute the similarity $K^t_{IJ}$ between user-item pair $(I, J)$ and anchor point $(u_t, v_t)$ from the asymmetric distance from user $I$ to anchor user $u_t$ and the asymmetric distance from item $J$ to anchor item $v_t$,
where $D_{KL}(s_1 \| s_2)$ denotes the asymmetric distance from $s_1$ to $s_2$, $h$ is the distance threshold, and $t = 1, 2, \ldots, q$;
S502. Determine whether the similarity $K^t_{IJ}$ of $(I, J)$ to anchor point $(u_t, v_t)$ is nonzero; if so, the user-item pair $(I, J)$ belongs to the neighborhood of anchor point $(u_t, v_t)$; otherwise, it does not;
S503. Anchor point $(u_t, v_t)$ and its entire neighborhood form the submatrix centered on that anchor point.
Specifically, step S6 is as follows: the submatrix centered on anchor point $(u_t, v_t)$ is decomposed into a user factor matrix $U_t$ and an item factor matrix $V_t$, which are trained iteratively by gradient descent to minimize an objective function consisting of a similarity-weighted loss between predicted and actual ratings plus a regularization term. In this objective, $K^t_{IJ}$, the similarity between user-item pair $(I, J)$ and anchor point $(u_t, v_t)$, serves as the weight of the matrix factorization; $R_{IJ}$ is the actual rating of user $I$ for item $J$; and $\lambda$ is the regularization coefficient. When training converges, $U_t$ and $V_t$ are the desired user factor matrix and item factor matrix, $t = 1, 2, \ldots, q$;
The predicted rating $\hat{R}^t_{IJ}$ of user $I$ for item $J$ in the submatrix is the $(I, J)$ entry of the product of $U_t$ and $V_t$.
Specifically, in step S7 the final predicted rating of a user for an item is computed as
$\hat{R}_{IJ} = \frac{\sum_{t=1}^{q} K^t_{IJ}\, \hat{R}^t_{IJ}}{\sum_{t=1}^{q} K^t_{IJ}}$
where $\hat{R}_{IJ}$ denotes the final predicted rating of user $I$ for item $J$, $\hat{R}^t_{IJ}$ denotes the predicted rating of user $I$ for item $J$ in the t-th submatrix, $K^t_{IJ}$ denotes the similarity between user-item pair $(I, J)$ and anchor point $(u_t, v_t)$, and $(u_t, v_t)$ denotes the t-th anchor point.
In a second aspect, an embodiment of the invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the rating prediction method of the first aspect.
In general, compared with the prior art, the above technical solution conceived by the invention achieves the following beneficial effects:
1. The invention uses the asymmetric distance to measure the relationship between each user-item rating in the matrix and the anchor points, so that the interests within each anchor point's neighborhood are concentrated as closely as possible around the anchor point. Compared with a symmetric distance, the resulting submatrices better reflect the concentrated interests of a particular group of users, so the missing entries of the matrix are predicted better and the prediction accuracy is higher.
2. The invention selects anchor points with a fast density-peak clustering method, so that each anchor point has a high neighborhood density and the anchor points are far apart from one another; the submatrices obtained by the partition are therefore evenly distributed and representative, and they effectively cover the original rating matrix.
3. The neighborhood density and separation distance measured for each point during anchor selection follow a characteristic distribution, which allows the number of anchor points to be determined automatically and adapted to the size of the data set, improving training efficiency.
4. By measuring the relationship between each user-item rating and the anchor points with the asymmetric distance, the invention reduces the distance penalty that a symmetric distance imposes when there are very few ratings. Inactive users and unpopular items with few ratings are more likely to be assigned to a submatrix, which increases the submatrices' coverage of the data and improves prediction accuracy.
Brief description of the drawings
Fig. 1 is a flowchart of a rating prediction method that constructs submatrices based on an asymmetric distance according to an embodiment of the invention;
Fig. 2 is a schematic diagram of the representative degrees γ sorted in descending order during anchor point selection in an embodiment of the invention;
Fig. 3 is a schematic diagram of an anchor point neighborhood determined with the asymmetric distance according to an embodiment of the invention;
Fig. 4 is a schematic diagram of an anchor point neighborhood determined with a symmetric distance in the prior art;
Fig. 5 is a schematic diagram of constructing submatrices from anchor points according to an embodiment of the invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and are not intended to limit it.
The overall idea of the invention is as follows. First, the optimal number of anchor points is determined automatically from the density of each user and item in the data set and its distance to points of higher density, and suitable users and items are selected to form anchor points. Next, a neighborhood is found for each anchor point according to the asymmetric distances from the other users or items in the data set to the anchor point, forming the submatrices one by one. Weighted matrix factorization is then performed within each submatrix to predict ratings for the target items. Finally, the predictions from all submatrices are combined into the final prediction.
As shown in Fig. 1, a rating prediction method that constructs submatrices based on an asymmetric distance comprises the following steps:
S1. Decompose the user-item rating matrix with non-negative matrix factorization to obtain a user feature matrix and an item feature matrix;
S2. Construct an asymmetric distance matrix between users from the user feature matrix, and an asymmetric distance matrix between items from the item feature matrix;
S3. Select q anchor users according to the inter-user asymmetric distance matrix and q anchor items according to the inter-item asymmetric distance matrix, and randomly pair the anchor users with the anchor items to form q anchor points;
S4. For each anchor point, compute the asymmetric distance from each user to the anchor user using the user feature matrix, and the asymmetric distance from each item to the anchor item using the item feature matrix;
S5. For each anchor point, compute the similarity between each user-item pair and the anchor point from the user-to-anchor-user and item-to-anchor-item asymmetric distances, and determine the anchor point's neighborhood from the similarity; the anchor point and its entire neighborhood form the submatrix centered on that anchor point;
S6. For each submatrix, train the submatrix with a weighted matrix factorization method to obtain the users' predicted ratings for the items in the submatrix;
S7. Compute a weighted average of the predicted ratings from the q submatrices to obtain each user's final predicted rating for each item.
Step S1. Decompose the user-item rating matrix with non-negative matrix factorization to obtain the user feature matrix and the item feature matrix.
From the user data, the item data, and the users' ratings of items, construct the user-item rating matrix $R_{M\times N}$,
where $M$ is the number of users, $N$ is the number of items, and $R_{ij}$ is the rating of user $i$ for item $j$, $i = 1, 2, \ldots, M$, $j = 1, 2, \ldots, N$.
Decompose the user-item rating matrix $R_{M\times N}$ with non-negative matrix factorization to obtain the user feature matrix $P_{M\times K}$ and the item feature matrix $Q_{N\times K}$, such that
$R_{M\times N} \approx P_{M\times K}\, Q_{N\times K}^{\top}, \quad P_{M\times K} \ge 0, \quad Q_{N\times K} \ge 0$
where $P_{M\times K}$ describes the relationship between the M users and K topics, and $Q_{N\times K}$ describes the relationship between the N items and the K topics. Each row of $P_{M\times K}$ is the K-dimensional feature vector of one user, and each row of $Q_{N\times K}$ is the K-dimensional feature vector of one item.
Extracting the user and item feature vectors through non-negative matrix factorization avoids the excessive sparsity that would result from using a row or column of the rating matrix directly as a feature vector. In addition, non-negative matrix factorization guarantees that every dimension of each feature vector is non-negative, which facilitates the subsequent processing.
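Step S1 can be sketched in plain Python as below. The patent does not name a particular NMF algorithm, so this sketch assumes the Lee-Seung multiplicative updates for the squared Frobenius loss, and it assumes unrated entries are stored as 0; `nmf`, `matmul`, `transpose`, and `frob_error` are hypothetical helper names, not part of the patent.

```python
import random


def matmul(A, B):
    """Product of an m x n matrix A and an n x p matrix B (lists of lists)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frob_error(R, P, Q):
    """Squared Frobenius error ||R - P Q^T||^2."""
    approx = matmul(P, transpose(Q))
    return sum((r - x) ** 2 for rr, xr in zip(R, approx) for r, x in zip(rr, xr))


def nmf(R, K, iters=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates so that R (M x N) ~= P (M x K) Q^T.

    Unrated entries are stored as 0 here, which is a simplifying assumption.
    The updates keep P and Q non-negative when R is non-negative.
    """
    rng = random.Random(seed)
    M, N = len(R), len(R[0])
    P = [[rng.random() + 0.1 for _ in range(K)] for _ in range(M)]
    Q = [[rng.random() + 0.1 for _ in range(K)] for _ in range(N)]
    for _ in range(iters):
        RQ = matmul(R, Q)                          # M x K
        PQtQ = matmul(P, matmul(transpose(Q), Q))  # M x K
        P = [[P[i][k] * RQ[i][k] / (PQtQ[i][k] + eps) for k in range(K)]
             for i in range(M)]
        RtP = matmul(transpose(R), P)              # N x K
        QPtP = matmul(Q, matmul(transpose(P), P))  # N x K
        Q = [[Q[j][k] * RtP[j][k] / (QPtP[j][k] + eps) for k in range(K)]
             for j in range(N)]
    return P, Q


# Toy 5-user x 4-item rating matrix (0 = not rated).
R = [[5, 3, 0, 1],
     [4, 0, 0, 1],
     [1, 1, 0, 5],
     [1, 0, 0, 4],
     [0, 1, 5, 4]]
P, Q = nmf(R, K=2)
```

A library solver would normally be used instead; the multiplicative form is shown because it makes the non-negativity of P and Q easy to see: each update multiplies a non-negative entry by a non-negative ratio.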
Step S2. Construct the asymmetric distance matrix between users from the user feature matrix, and the asymmetric distance matrix between items from the item feature matrix.
An asymmetric distance is one for which the distance from point a to point b is not necessarily equal to the distance from point b to point a. Letting $dis_{ab}$ denote the distance from a to b and $dis_{ba}$ the distance from b to a, in general $dis_{ab} \ne dis_{ba}$.
For user $i'$, use the user feature matrix $P_{M\times K}$ to compute the asymmetric distance from each other user to $i'$:
$D_{KL}(P_{j'} \| P_{i'}) = \sum_{k=1}^{K} P_{j'k} \log \frac{P_{j'k}}{P_{i'k}} \qquad (4)$
where $D_{KL}(P_{j'} \| P_{i'})$ denotes the distance from user $j'$ to user $i'$, $P_{i'}$ denotes row $i'$ of the user feature matrix, and $K$ is the number of columns of the user feature matrix. These distances form an $M \times M$ asymmetric distance matrix whose row $i'$ holds the asymmetric distances from the other users to user $i'$, that is, its entry in row $i'$, column $j'$ is $D_{KL}(P_{j'} \| P_{i'})$, where $i' = 1, 2, \ldots, M$ and $j' = 1, 2, \ldots, M$.
For item $i''$, use the item feature matrix $Q_{N\times K}$ to compute the asymmetric distance from each other item to $i''$:
$D_{KL}(Q_{j''} \| Q_{i''}) = \sum_{k=1}^{K} Q_{j''k} \log \frac{Q_{j''k}}{Q_{i''k}} \qquad (5)$
where $D_{KL}(Q_{j''} \| Q_{i''})$ denotes the distance from item $j''$ to item $i''$, $Q_{i''}$ denotes row $i''$ of the item feature matrix, and $K$ is the number of columns of the item feature matrix. These distances form an $N \times N$ asymmetric distance matrix whose row $i''$ holds the asymmetric distances from the other items to item $i''$, that is, its entry in row $i''$, column $j''$ is $D_{KL}(Q_{j''} \| Q_{i''})$, where $i'' = 1, 2, \ldots, N$ and $j'' = 1, 2, \ldots, N$.
Step S3. Select q anchor users according to the inter-user asymmetric distance matrix and q anchor items according to the inter-item asymmetric distance matrix, and randomly pair the anchor users with the anchor items to form q anchor points.
S301. Count the elements in row $i$ of the inter-user asymmetric distance matrix that are smaller than the user distance threshold, and take this count as the density $\rho_i$ of user $i$; count the elements in row $j$ of the inter-item asymmetric distance matrix that are smaller than the item distance threshold, and take this count as the density $\rho_j$ of item $j$; $i = 1, 2, \ldots, M$, $j = 1, 2, \ldots, N$.
If the distance from user $j'$ to user $i'$ is smaller than the distance threshold, user $j'$ is a neighbor of user $i'$. The number of elements in row $i$ of the inter-user asymmetric distance matrix that are smaller than the threshold is therefore the number of neighbors of user $i$. The threshold is chosen so that the average number of neighbors per user is as close as possible to 4% of the total number of users.
If the distance from item $j''$ to item $i''$ is smaller than the distance threshold, item $j''$ is a neighbor of item $i''$. The number of elements in row $j$ of the inter-item asymmetric distance matrix that are smaller than the threshold is therefore the number of neighbors of item $j$. The threshold is chosen so that the average number of neighbors per item is as close as possible to 4% of the total number of items.
S302. For user $i$, find all users whose density is greater than that of user $i$, compute the average distance between each of these users and user $i$, and take the minimum of these average distances as the separation distance $\delta_i$ of user $i$. For item $j$, find all items whose density is greater than that of item $j$, compute the average distance between each of these items and item $j$, and take the minimum of these average distances as the separation distance $\delta_j$ of item $j$.
The average distance between points a and b equals $(dis_{ab} + dis_{ba})/2$, where $dis_{ab}$ is the asymmetric distance from a to b and $dis_{ba}$ is the asymmetric distance from b to a.
S303. Compute the representative degree $\gamma_i$ of user $i$ from its density and separation distance, and select the q users with the largest representative degrees as anchor users; compute the representative degree $\gamma_j$ of item $j$ in the same way, and select the q items with the largest representative degrees as anchor items.
The larger ρ and δ are, the larger γ is, and the more suitable the point is as an anchor point. The selection criterion is that an anchor point should have a high neighborhood density and be far from points of higher density. Requiring this ensures that each submatrix centered on an anchor point has enough training data, while the submatrices still differ from one another; the resulting partition is more reasonable, and the submatrices cover as much of the original matrix as possible.
The number of anchor points q is determined automatically from the data set, as follows:
(1) Sort all user representative degrees in descending order. In this ordering, compute the difference between the representative degree of the i-th user and the mean representative degree of the 10 users ranked immediately after it. If the difference is greater than a preset threshold, add the user to the candidate anchor user list and continue with user i+1; if the difference is not greater than the preset threshold, stop; i = 1, 2, …, M;
(2) Sort all item representative degrees in descending order. In this ordering, compute the difference between the representative degree of the j-th item and the mean representative degree of the 10 items ranked immediately after it. If the difference is greater than the preset threshold, add the item to the candidate anchor item list and continue with item j+1; if the difference is not greater than the preset threshold, stop; j = 1, 2, …, N;
(3) Compare the number of users in the candidate user list with the number of items in the candidate item list, and take the larger value as the number of anchor points q.
The preset threshold is preferably 0.1. When the difference is not greater than the threshold, the γ values are considered to have leveled off. Because data sets differ, this automatic anchor-count procedure adaptively determines the minimum number of anchor points suitable for each data set, which avoids building useless submatrices and improves efficiency.
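Steps (1) to (3) can be sketched as follows, using the window of 10 and the preferred threshold 0.1. Two details are assumptions: near the end of the ordering, fewer than 10 following values are averaged, and γ is compared with the threshold as-is, because the patent does not say whether γ is rescaled first. The function names are hypothetical.

```python
def candidate_anchors(gamma, window=10, threshold=0.1):
    """Steps (1)/(2): scan gamma in descending order and keep each point whose
    gamma exceeds the mean of the next `window` values by more than
    `threshold`; stop at the first point that does not."""
    order = sorted(range(len(gamma)), key=lambda i: gamma[i], reverse=True)
    picked = []
    for pos, idx in enumerate(order):
        following = [gamma[k] for k in order[pos + 1:pos + 1 + window]]
        if not following:  # assumption: nothing left to compare against
            break
        if gamma[idx] - sum(following) / len(following) > threshold:
            picked.append(idx)
        else:
            break
    return picked


def anchor_count(gamma_users, gamma_items, window=10, threshold=0.1):
    """Step (3): q is the larger of the two candidate-list lengths."""
    return max(len(candidate_anchors(gamma_users, window, threshold)),
               len(candidate_anchors(gamma_items, window, threshold)))


gamma_users = [3.0, 2.5] + [1.0] * 12
gamma_items = [2.0] + [0.5] * 12
q = anchor_count(gamma_users, gamma_items)
```

Here two users stand clearly above the plateau at 1.0 and one item stands above the plateau at 0.5, so q is 2.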
As shown in Fig. 2, the abscissa n is the rank of each point after sorting by γ in descending order, and the ordinate γ is the point's representative degree. The higher a point ranks in this ordering, the more suitable it is as an anchor point. The γ values of users suitable as anchor points vary considerably, those of users unsuitable as anchor points vary smoothly, and there is a distinct jump between the two groups; locating this jump determines the number of anchor points q. In the figure, a dashed line separates the points suitable as anchor points from the remaining points, showing that automatic selection chooses a number of anchor points suited to the actual data set and thereby improves efficiency.
S304. Randomly pair the q anchor users with the q anchor items to obtain q anchor-user/anchor-item pairs; each such pair constitutes one anchor point.
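Step S304 amounts to randomly permuting one list against the other. A short sketch, with a hypothetical `pair_anchors` helper and an optional seed for reproducibility:

```python
import random


def pair_anchors(anchor_users, anchor_items, seed=None):
    """Randomly pair q anchor users with q anchor items (step S304)."""
    if len(anchor_users) != len(anchor_items):
        raise ValueError("expected the same number q of anchor users and items")
    items = list(anchor_items)
    random.Random(seed).shuffle(items)
    return list(zip(anchor_users, items))


anchors = pair_anchors([3, 7, 12], [0, 5, 9], seed=42)
```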
Step S4. For each anchor point, compute the asymmetric distance from each user to the anchor user using the user feature matrix, and the asymmetric distance from each item to the anchor item using the item feature matrix.
The asymmetric distance from a user to the anchor user is computed with formula (4), and the asymmetric distance from an item to the anchor item is computed with formula (5).
Step S5. For each anchor point, compute the similarity between each user-item pair and the anchor point from the user-to-anchor-user and item-to-anchor-item asymmetric distances, and determine the anchor point's neighborhood from the similarity; the anchor point and its entire neighborhood form the submatrix centered on that anchor point.
S501. Compute the similarity $K_{IJ}$ between user-item pair $(I, J)$ and anchor point $(u, v)$ from the asymmetric distance from user $I$ to anchor user $u$ and the asymmetric distance from item $J$ to anchor item $v$,
where $D_{KL}(s_1 \| s_2)$ denotes the asymmetric distance from $s_1$ to $s_2$, and $h$ is the distance threshold, set to 8.
S502. Determine whether the similarity $K_{IJ}$ of $(I, J)$ to anchor point $(u, v)$ is nonzero; if so, the user-item pair $(I, J)$ belongs to the neighborhood of anchor point $(u, v)$; otherwise, it does not.
A user-item pair $(I, J)$ belongs to the neighborhood of anchor point $(u, v)$ only if user $I$ is a neighbor of anchor user $u$ and item $J$ is a neighbor of anchor item $v$.
As shown in figure 3, using the distance of non symmetrical distance measure user A and user B, if the interest distribution of A is B interest point The subset of cloth, then the distance of A to B is small, but the distance of B to A is not necessarily small.Anchor point user and other use are measured with non symmetrical distance The distance at family, the interest distribution of obtained neighborhood user is substantially all within the interest distribution of anchor point user, therefore neighborhood The interest of user can compare concentration.So no matter how anchor neighborhood of a point article characteristics vector is distributed, user in the submatrix Interest is all almost consistent.As shown in figure 4, interest of the A in B is distributed it using the distance of symmetry distance measure user A and B Outside, it is more likely that there are also other interest to be distributed.Therefore the interest of anchor neighborhood of a point user is distributed in the interest distribution of anchor point user Interior consistent degree is big, and it is small to be distributed outer consistent degree in the interest of anchor point user.If the interest of neighborhood article is distributed just in anchor point user Interest distribution except, the interest of these neighborhoods user just cannot keep unanimously, thus submatrix also different surely generation of composition The concentration interest of the table group user.
It follows that submatrices built by finding each anchor point's neighborhood with the asymmetric distance reflect the concentrated interests of a particular group of users better than submatrices built with a symmetric distance. The prediction results of matrix factorization are correspondingly better. The asymmetric distance also reduces the penalty that a small number of ratings imposes on the distance under a symmetric distance. Inactive users and unpopular items with few ratings are therefore more likely to be assigned to some submatrix, which greatly reduces the number of isolated points.
S503. The anchor point (u, v) and all of its neighborhood pairs constitute the submatrix centered on that anchor point.
Each anchor point and its neighborhood constitute a sub-rating matrix centered on that anchor point, so q anchor points yield q submatrices. For example, for anchor point (u_t, v_t), all points (I, J) whose similarity to (u_t, v_t) is nonzero, together with (u_t, v_t) itself, constitute the submatrix centered on (u_t, v_t).
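The similarity formula of S501 is not reproduced in this text. The sketch below assumes an Epanechnikov-style kernel, 1 - (d/h)^2 for d < h and 0 otherwise, applied separately to the user distance and the item distance. It also assumes the two kernel values are multiplied. With the product, a pair is nonzero only when both the user and the item are neighbors of the anchor, which matches the condition stated for S502. Both choices are assumptions, and `epanechnikov` and `anchor_weights` are hypothetical names. The text sets h = 8 for its own distances; the toy example uses h = 1.0.

```python
import numpy as np

def epanechnikov(d, h):
    """ASSUMED kernel: 1 - (d/h)^2 inside the threshold h, 0 outside."""
    d = np.asarray(d, dtype=float)
    return np.where(d < h, 1.0 - (d / h) ** 2, 0.0)

def anchor_weights(user_dist, item_dist, anchor, h):
    """Similarity of every user-item pair (I, J) to one anchor (u, v).

    user_dist[u, I] = asymmetric distance from user I to user u (row = target,
    as in claim 2); item_dist[v, J] is defined the same way for items.
    """
    u, v = anchor
    w_user = epanechnikov(user_dist[u], h)   # shape (M,)
    w_item = epanechnikov(item_dist[v], h)   # shape (N,)
    # Product: nonzero only if I is near u AND J is near v (S502).
    return np.outer(w_user, w_item)          # shape (M, N)

user_dist = np.array([[0.0, 0.5, 3.0],
                      [0.4, 0.0, 2.0],
                      [2.5, 1.5, 0.0]])
item_dist = np.array([[0.0, 0.9],
                      [2.0, 0.0]])
W = anchor_weights(user_dist, item_dist, anchor=(0, 0), h=1.0)
neighborhood = W != 0   # S502/S503: pairs forming the submatrix of anchor (0, 0)
```

User 2 lies farther than h from anchor user 0, so its whole row is excluded. The remaining pairs are weighted by how close they are to the anchor.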
Step S6. For each submatrix, train it with a weighted matrix factorization method to obtain the users' predicted ratings for the items in that submatrix.
The submatrix is decomposed into a user factor matrix U and an item factor matrix V. These are trained iteratively by gradient descent so that the product of U and V comes ever closer to the rating matrix. Bringing the predicted values close to the true values amounts to minimizing the loss function; this is our objective function.
In this objective function, the weight of each term is the similarity of the user-item pair (I, J) to the anchor point (u_t, v_t). R_IJ denotes the actual rating of user I for item J, and λ denotes the regularization coefficient, generally taken between 0.001 and 0.1.
When training converges, U_t and V_t are the user factor matrix and item factor matrix we seek, t = 1, 2, ..., q.
The predicted rating of user I for item J in this submatrix is given by the corresponding entry of the product of U_t and V_t. Here U_t denotes the user factor matrix and V_t the item factor matrix of the t-th submatrix, t = 1, 2, ..., q.
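The objective function itself is not reproduced in this text. From its listed terms (similarity weight, actual rating R_IJ, regularization coefficient λ), the sketch assumes the following form: the weighted squared error over observed entries, plus λ times the squared Frobenius norms of U and V. It minimizes that assumed objective with full-batch gradient descent. It also assumes 0 marks a missing rating. The rank, learning rate and epoch count are illustrative values, not taken from the patent, and `weighted_mf` is a hypothetical name.

```python
import numpy as np

def weighted_mf(R, W, rank=5, lam=0.01, lr=0.01, epochs=200, seed=0):
    """Weighted matrix factorization of one submatrix by gradient descent.

    R: (M, N) ratings, 0 where unobserved (ASSUMPTION: 0 marks missing).
    W: (M, N) similarity of each pair to the anchor (0 outside the neighborhood).
    Minimizes sum_obs W_IJ (R_IJ - U_I . V_J)^2 + lam (||U||^2 + ||V||^2).
    """
    rng = np.random.default_rng(seed)
    M, N = R.shape
    U = 0.1 * rng.standard_normal((M, rank))
    V = 0.1 * rng.standard_normal((N, rank))
    C = W * (R > 0)                      # weight only observed neighborhood entries
    for _ in range(epochs):
        E = C * (R - U @ V.T)            # weighted residuals
        grad_U = -2 * E @ V + 2 * lam * U
        grad_V = -2 * E.T @ U + 2 * lam * V
        U -= lr * grad_U                 # both gradients use the old U and V
        V -= lr * grad_V
    return U, V

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)
U, V = weighted_mf(R, np.ones_like(R), rank=3, epochs=2000)
pred = U @ V.T   # predicted ratings of this submatrix
```

Full-batch updates keep the sketch short. Stochastic updates over the observed entries are an equally valid reading of "gradient descent".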
As shown in Fig. 5, after q anchor points are selected, the neighborhood data of each anchor point is found, which yields q submatrices. Matrix factorization is then performed on each submatrix to obtain that submatrix's predicted ratings.
Step S7. Compute a weighted average of the predicted ratings of the q submatrices to obtain the user's final predicted rating for the item.
In this formula, the result is the final predicted rating of user I for item J, and (u_t, v_t) denotes the t-th anchor point.
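The S7 formula is not reproduced in this text. Claim 9 indicates that the weights are the similarities of (I, J) to each anchor point. The sketch therefore assumes the usual normalized form: the sum of weight times prediction, divided by the sum of weights. Some pairs may be covered by no submatrix. For those, it uses a hypothetical `fallback` value such as the global mean rating; this is an assumption, not part of the claims.

```python
import numpy as np

def combine_predictions(preds, weights, fallback):
    """Similarity-weighted average of q submatrix predictions (S7 sketch).

    preds:    (q, M, N) predicted ratings from each submatrix.
    weights:  (q, M, N) similarity of each pair (I, J) to anchor t.
    fallback: value for pairs with zero total weight (ASSUMPTION).
    """
    num = np.sum(weights * preds, axis=0)
    den = np.sum(weights, axis=0)
    out = np.full(num.shape, float(fallback))
    covered = den > 0
    out[covered] = num[covered] / den[covered]
    return out

preds = np.array([[[4.0, 3.0]], [[2.0, 5.0]]])    # q = 2, M = 1, N = 2
weights = np.array([[[1.0, 0.0]], [[1.0, 0.0]]])
final = combine_predictions(preds, weights, fallback=3.5)
```

Here pair (0, 0) averages 4.0 and 2.0 with equal weights. Pair (0, 1) is in no neighborhood, so it takes the fallback value.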
To verify the prediction performance of the proposed method, three datasets were selected: Zhihu Live, movielens-100k and movielens-1m. The prediction error of the proposed method was compared with that of three baselines:
- a global rating prediction method that treats the whole large matrix as its object;
- a rating prediction method that constructs submatrices with a symmetric distance and randomly chosen anchor points;
- a rating prediction method that constructs submatrices with a symmetric distance and preferentially chosen anchor points.
The comparison results are shown in Tables 1-3.
Table 1
Table 2
Table 3
The comparison results above show that the proposed rating prediction method, based on constructing submatrices with an asymmetric distance, improves significantly on traditional rating prediction methods in both RMSE and MAE. Its test-set coverage is also markedly higher. This alleviates a problem of other submatrix-based rating prediction methods, whose submatrices cannot fully cover the test data.

The improvement has two sources. When constructing submatrices, the method selects anchor points that have more neighbors and lie farther apart from one another. It also uses the asymmetric distance to find each anchor point's neighborhood. The resulting submatrices are therefore more distinct from one another and cover the large matrix more completely. They also better reflect the concentrated interests of groups of users, which reduces prediction error.
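RMSE and MAE below follow their standard definitions. The text does not define "test set coverage" precisely. The sketch assumes it is the fraction of test pairs that fall inside at least one submatrix, meaning their total similarity weight is nonzero. The function name `evaluate` is hypothetical.

```python
import numpy as np

def evaluate(true, pred, covered):
    """RMSE and MAE over test ratings, plus test-set coverage.

    true, pred: 1-D arrays of test ratings and predictions.
    covered:    boolean array, True if at least one submatrix contains the
                test pair (ASSUMED definition of coverage).
    """
    true = np.asarray(true, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = pred - true
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    coverage = float(np.mean(np.asarray(covered, dtype=bool)))
    return rmse, mae, coverage

rmse, mae, coverage = evaluate([4, 3, 5], [3, 3, 5], [True, True, False])
```

RMSE penalizes large errors more heavily than MAE. Reporting both shows whether the gains come from fewer large mistakes or from uniformly smaller ones.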
The above are only preferred specific embodiments of the present application, and the protection scope of the application is not limited to them. Any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed by the present application shall fall within the protection scope of the application. Therefore, the protection scope of the application shall be subject to the protection scope of the claims.

Claims (10)

1. A rating prediction method based on constructing submatrices with an asymmetric distance, characterized in that the method comprises the following steps:
S1. Decompose the user-item rating matrix with a non-negative matrix factorization method to obtain a user feature matrix and an item feature matrix;
S2. Construct the asymmetric distance matrix between users from the user feature matrix, and the asymmetric distance matrix between items from the item feature matrix;
S3. Select q anchor users from the asymmetric distance matrix between users and q anchor items from the asymmetric distance matrix between items, then pair the anchor users with the anchor items at random to form q anchor points;
S4. For each anchor point, compute the asymmetric distance from each user to the anchor user from the user feature matrix, and the asymmetric distance from each item to the anchor item from the item feature matrix;
S5. For each anchor point, compute the similarity of each user-item pair to the anchor point from the user-to-anchor-user and item-to-anchor-item asymmetric distances, and determine the anchor point's neighborhood from the similarity; the anchor point and all of its neighborhood constitute the submatrix centered on the anchor point;
S6. For each submatrix, train it with a weighted matrix factorization method to obtain the users' predicted ratings for the items in the submatrix;
S7. Compute a weighted average of the predicted ratings of the q submatrices to obtain the user's final predicted rating for the item.
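Step S1 names non-negative matrix factorization but not a specific algorithm. The sketch below assumes the classic Lee-Seung multiplicative updates for the Frobenius loss. It also assumes unobserved ratings are stored as 0 and treated as zeros. Both are assumptions, not details from the claims. The rows of the returned P and Q play the role of the user and item feature rows used in S2.

```python
import numpy as np

def nmf(R, k, iters=500, eps=1e-9, seed=0):
    """Multiplicative-update NMF: R is approximated by P @ Q.T with P, Q >= 0.

    ASSUMPTION: unobserved ratings are stored as 0 and treated as zeros.
    Returns the user feature matrix P (M x k) and item feature matrix Q (N x k).
    """
    rng = np.random.default_rng(seed)
    M, N = R.shape
    P = rng.random((M, k)) + eps
    H = rng.random((k, N)) + eps
    for _ in range(iters):
        # Lee-Seung updates; eps guards against division by zero.
        H *= (P.T @ R) / (P.T @ P @ H + eps)
        P *= (R @ H.T) / (P @ H @ H.T + eps)
    return P, H.T

# A nonnegative rating matrix with exact rank-2 structure.
A = np.array([[1, 0], [2, 0], [0, 1], [0, 3], [1, 1]], dtype=float)
B = np.array([[3, 1, 0, 0], [0, 0, 2, 4]], dtype=float)
R = A @ B
P, Q = nmf(R, k=2, iters=2000)
```

Nonnegativity matters here because S2 compares feature rows as distributions, and negative entries would make that comparison meaningless.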
2. The rating prediction method according to claim 1, characterized in that, in step S2, for user i′, the asymmetric distance from every other user to user i′ is computed from the user feature matrix:
In this formula, D_kl(P_j′ || P_i′) denotes the distance from user j′ to user i′, P_i′ denotes the i′-th row of the user feature matrix, K is the number of columns of the user feature matrix, i′ = 1, 2, ..., M, j′ = 1, 2, ..., M, and M denotes the number of users. This value is the entry in row i′, column j′ of the asymmetric distance matrix between users.
For item i″, the asymmetric distance from every other item to item i″ is computed from the item feature matrix:
In this formula, D_kl(Q_j″ || Q_i″) denotes the distance from item j″ to item i″, Q_i″ denotes the i″-th row of the item feature matrix, K is the number of columns of the item feature matrix, i″ = 1, 2, ..., N, j″ = 1, 2, ..., N, and N denotes the number of items. This value is the entry in row i″, column j″ of the asymmetric distance matrix between items.
3. The rating prediction method according to claim 1, characterized in that step S3 comprises the following sub-steps:
S301. Count the elements in row i of the asymmetric distance matrix between users that are smaller than the user distance threshold, and take this count as the density of user i. Count the elements in row j of the asymmetric distance matrix between items that are smaller than the item distance threshold, and take this count as the density of item j;
S302. For user i, find all users whose density is greater than that of user i, and compute the average distance between each of these users and user i; the minimum of these average distances is the separation distance of user i. For item j, find all items whose density is greater than that of item j, and compute the average distance between each of these items and item j; the minimum of these average distances is the separation distance of item j;
S303. Compute the representativeness of user i, and select the q users with the largest representativeness as anchor users. Compute the representativeness of item j, and select the q items with the largest representativeness as anchor items;
S304. Pair the q anchor users with the q anchor items at random to obtain q anchor-user/anchor-item pairs; each such pair constitutes an anchor point;
Here i = 1, 2, ..., M and j = 1, 2, ..., N, where M denotes the number of users and N denotes the number of items.
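The representativeness formula of S303 is not reproduced in this text. Density, plus the minimum distance to a denser point, closely resembles density-peaks clustering. The sketch therefore assumes representativeness = density × separation. It also assumes the densest point, which has no denser neighbor, takes its largest average distance as its separation. Both are assumptions, and `select_anchors` and `pair_anchors` are hypothetical names. The average distance follows claim 4.

```python
import numpy as np

def select_anchors(D, d_c, q):
    """Pick q representative rows of an asymmetric distance matrix (claim 3 sketch).

    D[i, j] is the asymmetric distance from j to i (claim 2 layout).
    density[i]    : entries of row i below the threshold d_c (S301).
    separation[i] : minimum average distance (D[i, j] + D[j, i]) / 2 to any
                    denser point j (S302, claim 4).
    ASSUMPTION: representativeness = density * separation; a point with no
    denser point gets its largest average distance instead.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    density = np.sum(D < d_c, axis=1)
    avg = (D + D.T) / 2.0
    separation = np.empty(n)
    for i in range(n):
        denser = density > density[i]
        separation[i] = avg[i, denser].min() if denser.any() else avg[i].max()
    score = density * separation
    return np.argsort(-score)[:q], score

def pair_anchors(anchor_users, anchor_items, seed=0):
    """S304: random pairing of q anchor users with q anchor items."""
    rng = np.random.default_rng(seed)
    return list(zip(anchor_users, rng.permutation(anchor_items)))

# Two groups: {0, 1, 2} centered on 0, and {3, 4}.
D = np.full((5, 5), 5.0)
np.fill_diagonal(D, 0.0)
D[0, 1] = D[1, 0] = D[0, 2] = D[2, 0] = 0.1
D[1, 2] = D[2, 1] = 2.0
D[3, 4] = D[4, 3] = 0.1
anchors, score = select_anchors(D, d_c=1.0, q=2)
```

The separation term keeps the two anchors apart. Users 1 and 2 sit right next to the denser user 0, so their scores collapse, and the second anchor comes from the other group.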
4. The rating prediction method according to claim 3, characterized in that the average distance between point a and point b equals (dis_ab + dis_ba)/2, where dis_ab is the asymmetric distance from point a to point b and dis_ba is the asymmetric distance from point b to point a.
5. The rating prediction method according to claim 3, characterized in that the user distance threshold is chosen so that the average number of neighbors per user is as close as possible to 4% of the total number of users, and the item distance threshold is chosen so that the average number of neighbors per item is as close as possible to 4% of the total number of items.
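Claim 5 fixes the target but not the procedure. One direct way to reach it follows from averaging the per-row neighbor counts. That average equals the number of off-diagonal entries below the threshold divided by n. The threshold is therefore a quantile of the off-diagonal distances. The sketch below assumes a point does not count as its own neighbor. The function name and `d_c` are hypothetical.

```python
import numpy as np

def neighbor_threshold(D, fraction=0.04):
    """Threshold d_c so that the average neighbor count per row is about fraction * n.

    j counts as a neighbor of i when D[i, j] < d_c and i != j (ASSUMPTION).
    Averaging per-row counts equals (off-diagonal entries below d_c) / n, so
    d_c is taken as a quantile of the off-diagonal distances.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    off_diag = D[~np.eye(n, dtype=bool)]
    level = min(1.0, fraction * n / (n - 1))
    return float(np.quantile(off_diag, level))

rng = np.random.default_rng(0)
D = rng.random((100, 100))
d_c = neighbor_threshold(D)
counts = ((D < d_c) & ~np.eye(100, dtype=bool)).sum(axis=1)
```

With 100 rows, the average neighbor count comes out at about 4, which is 4% of the total.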
6. The rating prediction method according to claim 3, characterized in that the number of anchor points q is determined automatically from the dataset, as follows:
(1) Sort all user representativeness values in descending order. For the i-th user in this ordering, compute the difference between its representativeness and the mean representativeness of the 10 users ranked after it. If the difference is greater than a preset threshold, add the user to the candidate anchor-user list and continue with the (i+1)-th user. If the difference is not greater than the preset threshold, go to step (2);
(2) Sort all item representativeness values in descending order. For the j-th item in this ordering, compute the difference between its representativeness and the mean representativeness of the 10 items ranked after it. If the difference is greater than a preset threshold, add the item to the candidate anchor-item list and continue with the (j+1)-th item. If the difference is not greater than the preset threshold, go to step (3);
(3) Compare the number of users in the candidate user list with the number of items in the candidate item list, and take the larger value as the number of anchor points q.
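The procedure above can be sketched as follows. The claim does not say what happens near the end of the ranking, where fewer than 10 values follow. The sketch assumes the mean is taken over whatever values remain. It also assumes one preset threshold serves both users and items. Both assumptions, and the names `count_candidates` and `choose_q`, are illustrative.

```python
import numpy as np

def count_candidates(scores, threshold, window=10):
    """Length of the candidate list from step (1) or (2) of claim 6."""
    s = np.sort(np.asarray(scores, dtype=float))[::-1]    # descending order
    count = 0
    for i in range(len(s) - 1):
        following = s[i + 1:i + 1 + window]   # up to 10 following values (ASSUMPTION near the end)
        if s[i] - following.mean() > threshold:
            count += 1
        else:
            break                             # first failure ends the list
    return count

def choose_q(user_scores, item_scores, threshold):
    """Step (3): the larger candidate count becomes the anchor number q."""
    return max(count_candidates(user_scores, threshold),
               count_candidates(item_scores, threshold))

user_scores = [100, 90, 80] + [1] * 20
item_scores = [50] + [1] * 20
q = choose_q(user_scores, item_scores, threshold=10)
```

Here three users stand clearly above the rest but only one item does, so q = 3.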
7. The rating prediction method according to claim 1, characterized in that step S5 comprises the following sub-steps:
S501. Compute the similarity of the user-item pair (I, J) to the anchor point (u_t, v_t) using the following formula:
In this formula, D_kl(s1 || s2) denotes the asymmetric distance from s1 to s2, h is the distance threshold, and t = 1, 2, ..., q;
S502. Determine whether the similarity is nonzero. If it is, the user-item pair (I, J) belongs to the neighborhood of anchor point (u_t, v_t). Otherwise, it does not;
S503. The anchor point (u_t, v_t) and all of its neighborhood constitute the submatrix centered on that anchor point.
8. The rating prediction method according to claim 1, characterized in that step S6 is specifically as follows: the submatrix centered on anchor point (u_t, v_t) is decomposed into a user factor matrix U_t and an item factor matrix V_t, which are trained iteratively by gradient descent with the following objective function:
In this objective function, the weight of each term is the similarity of the user-item pair (I, J) to anchor point (u_t, v_t), used as the weight of the matrix factorization. R_IJ denotes the actual rating of user I for item J, and λ denotes the regularization coefficient. U_t and V_t at convergence are the user factor matrix and item factor matrix we seek, t = 1, 2, ..., q;
The predicted rating of user I for item J in the submatrix is then computed from U_t and V_t.
9. The rating prediction method according to claim 1, characterized in that the user's final predicted rating for the item in step S7 is calculated as follows:
In this formula, the terms denote, in order:
- the final predicted rating of user I for item J;
- the predicted rating of user I for item J in the t-th submatrix;
- the similarity of the user-item pair (I, J) to anchor point (u_t, v_t).
(u_t, v_t) denotes the t-th anchor point.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the rating prediction method according to any one of claims 1 to 9.
CN201811382976.4A 2018-11-20 2018-11-20 Scoring prediction method for constructing submatrix based on asymmetric distance Active CN109636509B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811382976.4A CN109636509B (en) 2018-11-20 2018-11-20 Scoring prediction method for constructing submatrix based on asymmetric distance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811382976.4A CN109636509B (en) 2018-11-20 2018-11-20 Scoring prediction method for constructing submatrix based on asymmetric distance

Publications (2)

Publication Number Publication Date
CN109636509A true CN109636509A (en) 2019-04-16
CN109636509B CN109636509B (en) 2020-12-18

Family

ID=66068390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811382976.4A Active CN109636509B (en) 2018-11-20 2018-11-20 Scoring prediction method for constructing submatrix based on asymmetric distance

Country Status (1)

Country Link
CN (1) CN109636509B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111008334A (en) * 2019-12-04 2020-04-14 Huazhong University of Science and Technology Top-K recommendation method and system based on local pairwise ordering and global decision fusion
CN112148584A (en) * 2019-06-28 2020-12-29 Beijing Dajia Internet Information Technology Co., Ltd. Account information processing method and device, electronic equipment and storage medium
CN113239266A (en) * 2021-04-07 2021-08-10 PLA Strategic Support Force Information Engineering University Personalized recommendation method and system based on local matrix decomposition

Citations (3)

Publication number Priority date Publication date Assignee Title
US20130151441A1 (en) * 2011-12-13 2013-06-13 Xerox Corporation Multi-task learning using bayesian model with enforced sparsity and leveraging of task correlations
CN106021298A (en) * 2016-05-03 2016-10-12 Guangdong University of Technology Asymmetrical weighing similarity based collaborative filtering recommendation method and system
CN107885778A (en) * 2017-10-12 2018-04-06 Zhejiang University of Technology A kind of personalized recommendation method based on dynamic point of proximity spectral clustering

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN112148584A (en) * 2019-06-28 2020-12-29 Beijing Dajia Internet Information Technology Co., Ltd. Account information processing method and device, electronic equipment and storage medium
CN111008334A (en) * 2019-12-04 2020-04-14 Huazhong University of Science and Technology Top-K recommendation method and system based on local pairwise ordering and global decision fusion
CN111008334B (en) * 2019-12-04 2023-04-18 Huazhong University of Science and Technology Top-K recommendation method and system based on local pairwise ordering and global decision fusion
CN113239266A (en) * 2021-04-07 2021-08-10 PLA Strategic Support Force Information Engineering University Personalized recommendation method and system based on local matrix decomposition
CN113239266B (en) * 2021-04-07 2023-03-14 PLA Strategic Support Force Information Engineering University Personalized recommendation method and system based on local matrix decomposition

Also Published As

Publication number Publication date
CN109636509B (en) 2020-12-18

Similar Documents

Publication Publication Date Title
CN109636509A (en) A kind of score in predicting method based on non symmetrical distance building submatrix
CN104123398B (en) A kind of information-pushing method and device
CN106776930B (en) A kind of location recommendation method incorporating time and geographical location information
CN110322053A (en) A kind of score in predicting method constructing local matrix based on figure random walk
CN107256241B (en) Movie recommendation method for improving multi-target genetic algorithm based on grid and difference replacement
CN108197285A (en) A kind of data recommendation method and device
CN107885778A (en) A kind of personalized recommendation method based on dynamic point of proximity spectral clustering
CN110059616A (en) Pedestrian's weight identification model optimization method based on fusion loss function
Guo et al. Feature selection based on Rough set and modified genetic algorithm for intrusion detection
CN106980659A (en) A kind of doings based on isomery graph model recommend method
CN107103336A (en) A kind of mixed attributes data clustering method based on density peaks
CN108563749B (en) Online education system resource recommendation method based on multi-dimensional information and knowledge network
CN110032682A (en) A kind of information recommendation list generation method, device and equipment
US20200327598A1 (en) Method and apparatus for interacting with information distribution system
CN110532351A (en) Recommend word methods of exhibiting, device, equipment and computer readable storage medium
CN108171535A (en) A kind of personalized dining room proposed algorithm based on multiple features
CN110532429A (en) It is a kind of based on cluster and correlation rule line on user group's classification method and device
CN105260460B (en) One kind is towards multifarious recommendation method
CN105989005B (en) A kind of method for pushing and device of information
CN108875071A (en) A kind of education resource recommended method based on multi-angle of view interest
CN110008411A (en) It is a kind of to be registered the deep learning point of interest recommended method of sparse matrix based on user
CN111008334B (en) Top-K recommendation method and system based on local pairwise ordering and global decision fusion
CN109544261A (en) A kind of intelligent perception motivational techniques based on diffusion and the quality of data
CN108280548A (en) Intelligent processing method based on network transmission
Eom et al. Improving image tag recommendation using favorite image context

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant