CN109636509A - Rating prediction method based on constructing submatrices with asymmetric distance - Google Patents
- Publication number: CN109636509A (application CN201811382976.4A)
- Authority: CN (China)
- Prior art keywords: user, item, anchor point, distance, matrix
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0631—Item recommendations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Finance (AREA)
- Accounting & Taxation (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a rating prediction method based on constructing submatrices with asymmetric distance. An asymmetric distance is used to measure the relationship between each user-item rating and the anchor points, so that the interests within each anchor point's neighborhood are concentrated around the anchor point as far as possible; compared with a symmetric distance, the resulting submatrices better reflect the concentrated interests of a particular group of users, which improves the prediction of the missing entries of the matrix. The anchor points are selected with a clustering method based on fast search for density peaks, so that each anchor point has a high neighborhood density and the anchor points lie far apart from one another; the submatrices obtained by the partition are therefore evenly distributed and representative, and effectively cover the original rating matrix. Measuring the relationship between each user-item rating in the matrix and the anchor points with an asymmetric distance also reduces the distance penalty that a symmetric distance imposes when the number of ratings is small, so inactive users and unpopular items with few ratings are more likely to be assigned to a submatrix, which improves the submatrices' coverage of the data.
Description
Technical field
The invention belongs to the field of personalized recommendation, and more particularly relates to a rating prediction method based on constructing submatrices with asymmetric distance.
Background art
With the arrival of the Web 2.0 era and the substantial increase in network bandwidth, a variety of social network platforms have emerged, and fragmented information has begun to flood people's lives. To address this information overload, personalized recommender systems have become increasingly valuable. In e-commerce, for example, a recommender system builds a model of a user's interests from the user's historical behavior, estimates how much the user would like items they have not yet purchased, and then recommends the items the user is likely to enjoy.
In practice, user preferences are usually predicted with collaborative filtering, whose basic idea is to find, from a user's ratings of items, neighbor users with similar preferences, and then to recommend the items those neighbors like to the current user. The best-known technique of this kind is matrix factorization. Matrix factorization assumes that the original rating matrix is globally low-rank and, following the idea of SVD, decomposes it into a user factor matrix and an item factor matrix whose product recovers the original rating matrix as closely as possible while predicting its missing entries. In contrast to these methods based on a global low-rank assumption, a factorization method based on a local low-rank assumption has been proposed in recent years: several anchor points are selected at random in the rating matrix, and each data point is assigned to different submatrices for prediction according to its distance to the different anchor points.
However, that method partitions the submatrices according to the symmetric distance from each point to the anchor points, so the resulting submatrices do not reflect the concentrated interests of a group of similar users well. In addition, the partition is strongly affected by the preset anchor points (their number and distribution), and a small number of data points are not assigned to any submatrix and become "isolated points". These problems limit the accuracy of the rating prediction.
Summary of the invention
In view of the shortcomings of the prior art, the object of the invention is to solve the technical problem that the rating predictions of prior-art recommender systems have limited accuracy.
To achieve the above object, in a first aspect, an embodiment of the invention provides a rating prediction method based on constructing submatrices with asymmetric distance, the method comprising the following steps:
S1. Decompose the user-item rating matrix by non-negative matrix factorization to obtain a user feature matrix and an item feature matrix;
S2. Construct the asymmetric distance matrix between users from the user feature matrix, and construct the asymmetric distance matrix between items from the item feature matrix;
S3. Select q anchor users according to the asymmetric distance matrix between users, select q anchor items according to the asymmetric distance matrix between items, and pair the anchor users with the anchor items at random to form q anchor points;
S4. For each anchor point, calculate the asymmetric distance from each user to the anchor user from the user feature matrix, and calculate the asymmetric distance from each item to the anchor item from the item feature matrix;
S5. For each anchor point, calculate the similarity between each user-item pair and the anchor point from the asymmetric distance of the user to the anchor user and the asymmetric distance of the item to the anchor item, and determine the neighborhood of the anchor point from the similarity; the anchor point and its entire neighborhood form the submatrix centered on that anchor point;
S6. For each submatrix, train the submatrix with a weighted matrix factorization method to obtain the predicted ratings of users for items in that submatrix;
S7. Take the weighted average of the predicted ratings from the q submatrices to obtain the final predicted rating of a user for an item.
Specifically, in step S2, for user i′, the asymmetric distance D_kl(P_j′ || P_i′) from each other user j′ to user i′ is calculated from the user feature matrix, where P_i′ denotes row i′ of the user feature matrix, K is the number of columns of the user feature matrix, i′ = 1, 2, …, M, j′ = 1, 2, …, M, and M is the number of users; this distance is the value in row i′, column j′ of the asymmetric distance matrix between users;
For item i″, the asymmetric distance D_kl(Q_j″ || Q_i″) from each other item j″ to item i″ is calculated from the item feature matrix, where Q_i″ denotes row i″ of the item feature matrix, K is the number of columns of the item feature matrix, i″ = 1, 2, …, N, j″ = 1, 2, …, N, and N is the number of items; this distance is the value in row i″, column j″ of the asymmetric distance matrix between items.
Specifically, step S3 comprises the following sub-steps:
S301. Count the number of elements in row i of the asymmetric distance matrix between users that are smaller than the user distance threshold, and take it as the density ρ_i of user i; count the number of elements in row j of the asymmetric distance matrix between items that are smaller than the item distance threshold, and take it as the density ρ_j of item j;
S302. For user i, find all users whose density is greater than ρ_i, calculate the average distance between each of these users and user i, and take the minimum of these average distances as the separation distance δ_i of user i; for item j, find all items whose density is greater than ρ_j, calculate the average distance between each of these items and item j, and take the minimum of these average distances as the separation distance δ_j of item j;
S303. Calculate the representativeness γ_i of user i, and select the q users with the largest representativeness as anchor users; calculate the representativeness γ_j of item j, and select the q items with the largest representativeness as anchor items;
S304. Pair the q anchor users with the q anchor items at random to obtain q anchor-user/anchor-item pairs, each of which forms an anchor point;
where i = 1, 2, …, M, j = 1, 2, …, N, M is the number of users, and N is the number of items.
Specifically, the average distance between point a and point b equals (dis_ab + dis_ba)/2, where dis_ab is the asymmetric distance from point a to point b and dis_ba is the asymmetric distance from point b to point a.
Specifically, the user distance threshold is chosen so that the average number of neighbors per user is as close as possible to 4% of the total number of users, and the item distance threshold is chosen so that the average number of neighbors per item is as close as possible to 4% of the total number of items.
Specifically, the number of anchor points q is determined automatically from the data set, as follows:
(1) Sort the representativeness values of all users in descending order. In this ordering, calculate the difference between the representativeness of the i-th user and the mean representativeness of the 10 users ranked after it. If the difference is greater than a preset threshold, add the user to the candidate anchor user list and continue with the (i+1)-th user; if the difference is not greater than the preset threshold, go to step (2);
(2) Sort the representativeness values of all items in descending order. In this ordering, calculate the difference between the representativeness of the j-th item and the mean representativeness of the 10 items ranked after it. If the difference is greater than the preset threshold, add the item to the candidate anchor item list and continue with the (j+1)-th item; if the difference is not greater than the preset threshold, go to step (3);
(3) Compare the number of users in the candidate anchor user list with the number of items in the candidate anchor item list, and take the larger value as the number of anchor points q.
Specifically, step S5 comprises the following sub-steps:
S501. Calculate the similarity of the user-item pair (I, J) to the anchor point (u_t, v_t), where D_kl(s_1 || s_2) denotes the asymmetric distance from s_1 to s_2, h is the distance threshold, and t = 1, 2, …, q;
S502. Determine whether the similarity is non-zero; if so, the user-item pair (I, J) belongs to the neighborhood of anchor point (u_t, v_t); otherwise, it does not;
S503. The anchor point (u_t, v_t) and its entire neighborhood form the submatrix centered on that anchor point.
Specifically, in step S6, the submatrix centered on anchor point (u_t, v_t) is decomposed into a user factor matrix U_t and an item factor matrix V_t, which are trained iteratively by gradient descent. In the objective function, the similarity of the user-item pair (I, J) to the anchor point (u_t, v_t) serves as the weight of the matrix factorization, R_IJ denotes the actual rating of user I for item J, and λ denotes the regularization coefficient. The U_t and V_t obtained when training converges are the desired user factor matrix and item factor matrix, t = 1, 2, …, q;
The predicted rating of user I for item J in that submatrix is computed from U_t and V_t.
Specifically, in step S7, the final predicted rating of user I for item J is the weighted average of the predicted ratings of user I for item J in the q submatrices, where the weight for the t-th submatrix is the similarity of the user-item pair (I, J) to the t-th anchor point (u_t, v_t).
In a second aspect, an embodiment of the invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the rating prediction method of the first aspect.
In general, compared with the prior art, the above technical solutions conceived by the invention have the following beneficial effects:
1. The invention uses an asymmetric distance to measure the relationship between each user-item rating in the matrix and the anchor points, so that the interests within each anchor point's neighborhood are concentrated around the anchor point as far as possible. Compared with a symmetric distance, the resulting submatrices better reflect the concentrated interests of a particular group of users, so the missing entries of the matrix are predicted more accurately.
2. The invention selects anchor points with a clustering method based on fast search for density peaks, so that each anchor point has a high neighborhood density and the anchor points are far apart from one another. The submatrices obtained by the partition are therefore evenly distributed and representative, and effectively cover the original rating matrix.
3. The neighborhood density and spacing of each point measured during anchor selection follow a characteristic distribution, so the number of anchor points can be determined automatically and adapts to the size of the data set, which improves training efficiency.
4. The invention uses an asymmetric distance to measure the relationship between each user-item rating in the matrix and the anchor points, which reduces the distance penalty that a symmetric distance imposes when the number of ratings is small. Inactive users and unpopular items with few ratings are therefore more likely to be assigned to a submatrix, which improves the submatrices' coverage of the data and the prediction accuracy.
Brief description of the drawings
Fig. 1 is a flowchart of a rating prediction method based on constructing submatrices with asymmetric distance according to an embodiment of the invention;
Fig. 2 is a schematic diagram of the representativeness γ sorted in descending order during anchor point selection in an embodiment of the invention;
Fig. 3 is a schematic diagram of an anchor point neighborhood determined with an asymmetric distance according to an embodiment of the invention;
Fig. 4 is a schematic diagram of an anchor point neighborhood determined with a symmetric distance in the prior art;
Fig. 5 is a schematic diagram of constructing submatrices from anchor points according to an embodiment of the invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and are not intended to limit it.
The overall idea of the invention is as follows. First, the optimal number of anchor points is determined automatically from the density of each user and item in the data set and from its distance to points of higher density, and suitable users and items are selected to form the anchor points. Then, a neighborhood is found for each anchor point according to the asymmetric distances from the other users and items in the data set to the anchor point, forming the individual submatrices. Next, weighted matrix factorization is carried out on each submatrix to predict the ratings of the target items. Finally, the prediction results from the submatrices are combined into the final prediction.
As shown in Fig. 1, a rating prediction method based on constructing submatrices with asymmetric distance comprises the following steps:
S1. Decompose the user-item rating matrix by non-negative matrix factorization to obtain a user feature matrix and an item feature matrix;
S2. Construct the asymmetric distance matrix between users from the user feature matrix, and construct the asymmetric distance matrix between items from the item feature matrix;
S3. Select q anchor users according to the asymmetric distance matrix between users, select q anchor items according to the asymmetric distance matrix between items, and pair the anchor users with the anchor items at random to form q anchor points;
S4. For each anchor point, calculate the asymmetric distance from each user to the anchor user from the user feature matrix, and calculate the asymmetric distance from each item to the anchor item from the item feature matrix;
S5. For each anchor point, calculate the similarity between each user-item pair and the anchor point from the asymmetric distance of the user to the anchor user and the asymmetric distance of the item to the anchor item, and determine the neighborhood of the anchor point from the similarity; the anchor point and its entire neighborhood form the submatrix centered on that anchor point;
S6. For each submatrix, train the submatrix with a weighted matrix factorization method to obtain the predicted ratings of users for items in that submatrix;
S7. Take the weighted average of the predicted ratings from the q submatrices to obtain the final predicted rating of a user for an item.
Step S1. Decompose the user-item rating matrix by non-negative matrix factorization to obtain a user feature matrix and an item feature matrix.
A user-item rating matrix R_{M×N} is constructed from the user data, the item data, and the users' ratings of items,
where M is the number of users, N is the number of items, and R_ij is the rating of user i for item j, i = 1, 2, …, M, j = 1, 2, …, N.
The user-item rating matrix R_{M×N} is decomposed by non-negative matrix factorization into a user feature matrix P_{M×K} and an item feature matrix Q_{N×K}.
At the same time, all entries of P_{M×K} and Q_{N×K} are required to be non-negative.
Here P_{M×K} describes the relationship between the M users and K topics, and Q_{N×K} describes the relationship between the N items and K topics. Each row of P_{M×K} is the K-dimensional feature vector of one user, and each row of Q_{N×K} is the K-dimensional feature vector of one item.
Extracting the feature vectors of users and items by non-negative matrix factorization avoids the excessive sparsity that would result from using a row or a column of the rating matrix directly as a feature vector. Non-negative matrix factorization also guarantees that every dimension of each feature vector is non-negative, which facilitates the subsequent processing.
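As an illustration only, step S1 might be sketched as follows. The text does not say which non-negative factorization algorithm is used or how missing ratings are treated; this sketch assumes masked multiplicative updates in which only observed ratings enter the sums, with `None` marking a missing rating. The names `nmf` and `squared_error`, the iteration count, and the toy matrix are hypothetical.

```python
import random

def nmf(R, K, iters=200, seed=0, eps=1e-9):
    """Factor R (list of rows, None = missing rating) as R ~ P * Q^T with P, Q >= 0.

    Assumption: masked multiplicative updates, so only observed ratings are fitted.
    """
    rng = random.Random(seed)
    M, N = len(R), len(R[0])
    P = [[rng.random() + 0.1 for _ in range(K)] for _ in range(M)]  # user features
    Q = [[rng.random() + 0.1 for _ in range(K)] for _ in range(N)]  # item features

    def predict(i, j):
        return sum(P[i][k] * Q[j][k] for k in range(K))

    for _ in range(iters):
        for i in range(M):  # update user features with item features fixed
            seen = [j for j in range(N) if R[i][j] is not None]
            pred = {j: predict(i, j) for j in seen}
            for k in range(K):
                num = sum(R[i][j] * Q[j][k] for j in seen)
                den = sum(pred[j] * Q[j][k] for j in seen) + eps
                P[i][k] *= num / den
        for j in range(N):  # update item features with user features fixed
            seen = [i for i in range(M) if R[i][j] is not None]
            pred = {i: predict(i, j) for i in seen}
            for k in range(K):
                num = sum(R[i][j] * P[i][k] for i in seen)
                den = sum(pred[i] * P[i][k] for i in seen) + eps
                Q[j][k] *= num / den
    return P, Q

def squared_error(R, P, Q):
    """Sum of squared errors over the observed ratings only."""
    return sum((R[i][j] - sum(p * q for p, q in zip(P[i], Q[j]))) ** 2
               for i in range(len(R)) for j in range(len(R[0]))
               if R[i][j] is not None)

# Toy 4 x 4 rating matrix with K = 2 topics.
R = [[5, 3, None, 1],
     [4, None, None, 1],
     [1, 1, None, 5],
     [None, 1, 5, 4]]
P, Q = nmf(R, K=2)
```

Each row of `P` and `Q` then plays the role of the K-dimensional feature vector of one user or one item. Because the updates only multiply by non-negative ratios, the entries stay non-negative, which is the property the subsequent steps rely on.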
Step S2. Construct the asymmetric distance matrix between users from the user feature matrix, and construct the asymmetric distance matrix between items from the item feature matrix.
An asymmetric distance is one for which the distance from point a to point b is not equal to the distance from point b to point a. Let the distance from point a to point b be dis_ab and the distance from point b to point a be dis_ba; then dis_ab ≠ dis_ba.
For user i′, the asymmetric distance D_kl(P_j′ || P_i′) from each other user j′ to user i′ is calculated from the user feature matrix P_{M×K}, where P_i′ denotes row i′ of the user feature matrix and K is the number of columns of the user feature matrix. These distances form the asymmetric distance matrix between users, whose row i′ contains the asymmetric distances from the other users to user i′, where i′ = 1, 2, …, M and j′ = 1, 2, …, M.
For item i″, the asymmetric distance D_kl(Q_j″ || Q_i″) from each other item j″ to item i″ is calculated from the item feature matrix Q_{N×K}, where Q_i″ denotes row i″ of the item feature matrix and K is the number of columns of the item feature matrix. These distances form the asymmetric distance matrix between items, whose row i″ contains the asymmetric distances from the other items to item i″, where i″ = 1, 2, …, N and j″ = 1, 2, …, N.
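The distance formula itself is not reproduced in this text. The following sketch assumes that D_kl is the Kullback-Leibler divergence, D_kl(p || q) = Σ_k p_k · log(p_k / q_k), taken over the K feature dimensions, and that each feature vector is first normalized to sum to 1; both are assumptions, and the names `normalize`, `kl`, and `asymmetric_distance_matrix` are hypothetical.

```python
import math

def normalize(row, eps=1e-12):
    """Turn a non-negative feature vector into a probability vector.

    The small eps keeps every entry strictly positive so the logarithm is defined.
    """
    total = sum(row) + eps * len(row)
    return [(x + eps) / total for x in row]

def kl(p, q):
    """Kullback-Leibler divergence D_kl(p || q) between two probability vectors."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))

def asymmetric_distance_matrix(features):
    """D[i][j] = D_kl(F_j || F_i), the distance from row j to row i."""
    rows = [normalize(r) for r in features]
    n = len(rows)
    return [[kl(rows[j], rows[i]) for j in range(n)] for i in range(n)]

# User 0 has broad interests; user 1 is concentrated on the first topic.
features = [[0.5, 0.5],
            [0.9, 0.1]]
D = asymmetric_distance_matrix(features)
# D[0][1] is the distance from user 1 to user 0, D[1][0] the reverse direction.
```

With this choice the two directions differ: the distance from the concentrated user to the broad user is smaller than the distance in the opposite direction, which is the behavior the method relies on when it builds neighborhoods around an anchor.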
Step S3. Select q anchor users according to the asymmetric distance matrix between users, select q anchor items according to the asymmetric distance matrix between items, and pair the anchor users with the anchor items at random to form q anchor points.
S301. Count the number of elements in row i of the asymmetric distance matrix between users that are smaller than the user distance threshold, and take it as the density ρ_i of user i; count the number of elements in row j of the asymmetric distance matrix between items that are smaller than the item distance threshold, and take it as the density ρ_j of item j, i = 1, 2, …, M, j = 1, 2, …, N.
If the distance from user j′ to user i′ is smaller than the distance threshold, user j′ is a neighbor of user i′. The number of elements in row i of the asymmetric distance matrix between users that are smaller than the distance threshold is therefore the number of neighbors of user i. The threshold is chosen so that the average number of neighbors per user is as close as possible to about 4% of the total number of users.
If the distance from item j″ to item i″ is smaller than the distance threshold, item j″ is a neighbor of item i″. The number of elements in row j of the asymmetric distance matrix between items that are smaller than the distance threshold is therefore the number of neighbors of item j. The threshold is chosen so that the average number of neighbors per item is as close as possible to about 4% of the total number of items.
S302. For user i, find all users whose density is greater than ρ_i, calculate the average distance between each of these users and user i, and take the minimum of these average distances as the separation distance δ_i of user i; for item j, find all items whose density is greater than ρ_j, calculate the average distance between each of these items and item j, and take the minimum of these average distances as the separation distance δ_j of item j.
The average distance between point a and point b equals (dis_ab + dis_ba)/2, where dis_ab is the asymmetric distance from point a to point b and dis_ba is the asymmetric distance from point b to point a.
S303. Calculate the representativeness γ_i of user i, and select the q users with the largest representativeness as anchor users; calculate the representativeness γ_j of item j, and select the q items with the largest representativeness as anchor items.
The larger ρ and δ are, the larger γ is, and the more suitable the point is as an anchor point. The selection criterion is that an anchor point should have a high neighborhood density and lie far from any point of higher density. Under this criterion, the submatrix centered on each anchor point has enough training data, and the submatrices also differ from one another to some degree. This makes the partition into submatrices more reasonable and lets the submatrices cover as much of the original matrix as possible.
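Sub-steps S301 to S303 can be sketched as follows for one side (users or items). The formula for γ is not reproduced in this text; the sketch assumes γ = ρ · δ, which is consistent with the statement that γ grows with both ρ and δ but remains an assumption. Two further assumptions are that a point is not counted as its own neighbor and that the point of highest density, for which no denser point exists, receives its largest average distance as δ. The function names and the toy data are hypothetical.

```python
def density(D, threshold):
    """rho[i]: number of other points whose distance in row i is below threshold."""
    n = len(D)
    return [sum(1 for j in range(n) if j != i and D[i][j] < threshold)
            for i in range(n)]

def separation(D, rho):
    """delta[i]: smallest average distance from i to any point of higher density."""
    n = len(D)

    def avg(a, b):
        # Average of the two directed distances, (dis_ab + dis_ba) / 2.
        return (D[a][b] + D[b][a]) / 2

    delta = []
    for i in range(n):
        denser = [avg(i, j) for j in range(n) if rho[j] > rho[i]]
        if denser:
            delta.append(min(denser))
        else:
            # Assumption: the densest point gets its largest average distance.
            delta.append(max(avg(i, j) for j in range(n) if j != i))
    return delta

def select_anchors(D, threshold, q):
    """Return the indices of the q most representative points and all gamma values."""
    rho = density(D, threshold)
    delta = separation(D, rho)
    gamma = [r * d for r, d in zip(rho, delta)]  # assumed gamma = rho * delta
    order = sorted(range(len(D)), key=lambda i: gamma[i], reverse=True)
    return order[:q], gamma

# Toy example with symmetric distances for brevity: two groups on a line.
points = [0.0, 0.1, 0.2, 5.0, 5.1]
D = [[abs(a - b) for b in points] for a in points]
anchors, gamma = select_anchors(D, threshold=0.15, q=2)
```

In the toy data the selected points are the center of the left group and one point of the right group, so the two anchors are dense and far apart, as the selection criterion requires.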
The number of anchor points q is determined automatically from the data set, as follows:
(1) Sort the representativeness values of all users in descending order. In this ordering, calculate the difference between the representativeness of the i-th user and the mean representativeness of the 10 users ranked after it. If the difference is greater than a preset threshold, add the user to the candidate anchor user list and continue with the (i+1)-th user; if the difference is not greater than the preset threshold, stop, i = 1, 2, …, M;
(2) Sort the representativeness values of all items in descending order. In this ordering, calculate the difference between the representativeness of the j-th item and the mean representativeness of the 10 items ranked after it. If the difference is greater than the preset threshold, add the item to the candidate anchor item list and continue with the (j+1)-th item; if the difference is not greater than the preset threshold, stop, j = 1, 2, …, N;
(3) Compare the number of users in the candidate anchor user list with the number of items in the candidate anchor item list, and take the larger value as the number of anchor points q.
The preset threshold is preferably 0.1. If the difference is not greater than the threshold, the γ values are considered to have leveled off. Because data sets differ, the automatic anchor-counting method adaptively determines the minimum number of anchor points suited to each data set, which reduces the construction of useless submatrices and improves efficiency.
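The three steps above can be sketched as follows. The window of 10 and the threshold of 0.1 come from the text; the text does not say whether γ is rescaled before the comparison, so the sketch uses the values as given. The names `count_candidates` and `choose_q` and the sample values are hypothetical.

```python
def count_candidates(gamma, window=10, threshold=0.1):
    """Count leading points whose gamma exceeds the mean of the next `window` by > threshold."""
    ordered = sorted(gamma, reverse=True)
    count = 0
    for i, g in enumerate(ordered):
        following = ordered[i + 1:i + 1 + window]
        if not following:
            break  # no points left to compare against
        if g - sum(following) / len(following) > threshold:
            count += 1  # still on the steep part of the curve
        else:
            break  # gamma has leveled off
    return count

def choose_q(user_gamma, item_gamma, window=10, threshold=0.1):
    """The number of anchor points is the larger of the two candidate counts."""
    return max(count_candidates(user_gamma, window, threshold),
               count_candidates(item_gamma, window, threshold))

# Three users and two items stand out from an otherwise flat tail.
user_gamma = [0.9, 0.7, 0.5] + [0.05] * 12
item_gamma = [0.8, 0.6] + [0.02] * 12
q = choose_q(user_gamma, item_gamma)
```

Here the user list yields three candidates and the item list two, so `q` is 3.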
As shown in Fig. 2, the horizontal axis n is the rank of each point after sorting by γ in descending order, and the vertical axis γ is the representativeness of the point. In the descending order of γ, the higher a point ranks, the more suitable it is as an anchor point. The γ values of users suitable as anchor points vary considerably, whereas those of users unsuitable as anchor points are relatively flat, and there is a noticeable jump between the two groups. The number of anchor points q can be determined by locating this jump in γ. The dotted line separates the points suitable as anchor points from the remaining points. Automatic selection thus picks a suitable number of anchor points according to the actual data set, which improves efficiency.
S304. Pair the q anchor users with the q anchor items at random to obtain q anchor-user/anchor-item pairs, each of which forms an anchor point.
Step S4. For each anchor point, calculate the asymmetric distance from each user to the anchor user from the user feature matrix, and calculate the asymmetric distance from each item to the anchor item from the item feature matrix.
The asymmetric distance from a user to the anchor user is calculated with formula (4), and the asymmetric distance from an item to the anchor item is calculated with formula (5).
Step S5. For each anchor point, calculate the similarity between each user-item pair and the anchor point from the asymmetric distance of the user to the anchor user and the asymmetric distance of the item to the anchor item, and determine the neighborhood of the anchor point from the similarity; the anchor point and its entire neighborhood form the submatrix centered on that anchor point.
S501. Calculate the similarity of the user-item pair (I, J) to the anchor point (u, v), where D_kl(s_1 || s_2) denotes the asymmetric distance from s_1 to s_2 and h is the distance threshold, with a value of 8.
S502. Determine whether the similarity is non-zero; if so, the user-item pair (I, J) belongs to the neighborhood of anchor point (u, v); otherwise, it does not.
A user-item pair (I, J) belongs to the neighborhood of anchor point (u, v) only if user I is a neighbor of anchor user u and item J is a neighbor of anchor item v.
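The similarity formula is not reproduced in this text. The sketch below assumes a product of two Epanechnikov kernels, one on the user distance and one on the item distance, each of which is zero once the distance reaches h. This is one common choice in local low-rank matrix approximation and matches the rule that a pair is in the neighborhood only when both the user and the item are neighbors of the anchor point, but the exact kernel is an assumption, as are the names `epanechnikov`, `similarity`, and `neighborhood`.

```python
def epanechnikov(d, h):
    """Kernel weight for a distance d with bandwidth h; zero at or beyond h."""
    if d >= h:
        return 0.0
    return 0.75 * (1.0 - (d / h) ** 2)

def similarity(d_user, d_item, h=8.0):
    """Assumed similarity of a user-item pair to an anchor: product of two kernels."""
    return epanechnikov(d_user, h) * epanechnikov(d_item, h)

def neighborhood(user_dists, item_dists, h=8.0):
    """Map every pair (I, J) with non-zero similarity to its weight for one anchor.

    user_dists[I] is the asymmetric distance from user I to the anchor user,
    item_dists[J] is the asymmetric distance from item J to the anchor item.
    """
    result = {}
    for I, du in enumerate(user_dists):
        for J, dv in enumerate(item_dists):
            weight = similarity(du, dv, h)
            if weight != 0.0:
                result[(I, J)] = weight
    return result

# User 2 and item 1 lie beyond the threshold h = 8 and are excluded.
neighbors = neighborhood(user_dists=[0.0, 4.0, 9.0], item_dists=[0.0, 10.0])
```

The dictionary returned for one anchor point holds exactly the entries of the submatrix centered on it, together with the weights that the later weighted factorization would use.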
As shown in Fig. 3, when the distance between user A and user B is measured with an asymmetric distance, the distance from A to B is small if A's interest distribution is a subset of B's interest distribution, but the distance from B to A is not necessarily small. When the distance between the anchor user and other users is measured with an asymmetric distance, the interest distributions of the resulting neighborhood users lie essentially within the interest distribution of the anchor user, so the interests of the neighborhood users are relatively concentrated. Therefore, however the feature vectors of the items in the anchor point's neighborhood are distributed, the interests of the users in the submatrix are almost consistent. As shown in Fig. 4, when the distance between users A and B is measured with a symmetric distance, A is quite likely to have other interests outside B's interest distribution. The interests of the anchor point's neighborhood users are then highly consistent inside the anchor user's interest distribution but much less consistent outside it. If the interest distribution of the neighborhood items happens to lie outside the anchor user's interest distribution, the interests of these neighborhood users cannot remain consistent, so the resulting submatrix does not necessarily represent the concentrated interests of that group of users.
It follows that the submatrices obtained by finding each anchor point's neighborhood with an asymmetric distance reflect the concentrated interests of a particular group of users better than the submatrices obtained with a symmetric distance, and the matrix factorization predicts better as a result. The asymmetric distance also reduces the distance penalty that a symmetric distance imposes when the number of ratings is small, so inactive users and unpopular items with few ratings are more likely to be assigned to a submatrix, which greatly reduces the number of isolated points.
S503. The anchor point (u, v) and its entire neighborhood form the submatrix centered on that anchor point.
Each anchor point and its neighborhood form a rating submatrix centered on that anchor point, so q anchor points yield q submatrices. For example, for anchor point (u_t, v_t), all points (I, J) whose similarity to it is non-zero together form the submatrix centered on (u_t, v_t).
Step S6. For each submatrix, train the submatrix using a weighted matrix factorization method to obtain the predicted ratings of the users for the items in that submatrix.
The submatrix is factorized into a user factor matrix U and an item factor matrix V, which are trained iteratively by gradient descent so that the product of U and V approaches the rating matrix ever more closely. Bringing the predicted values close to the true values amounts to minimizing the loss function, which serves as the objective function.
In the objective function, the similarity of the user-item pair (I, J) to the anchor point (u_t, v_t) serves as the weight of the matrix factorization; R_IJ denotes the actual rating of item J by user I; and λ denotes the regularization coefficient, typically chosen between 0.001 and 0.1. The U_t and V_t obtained when training converges are the desired user factor matrix and item factor matrix, t = 1, 2, …, q.
The predicted rating of item J by user I in the submatrix is obtained from U_t and V_t, where U_t denotes the user factor matrix of the t-th submatrix and V_t denotes the item factor matrix of the t-th submatrix, t = 1, 2, …, q.
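As a rough illustration of the weighted factorization in step S6, the following Python sketch trains U and V by stochastic gradient descent on a similarity-weighted squared error with L2 regularization. The objective function itself is not reproduced in this text, so the loss form, the names `train_weighted_mf`, `predict` and `weighted_loss`, the rank, the learning rate and the toy data are all assumptions made for illustration, not the patented formula.

```python
import random

def predict(U, V, i, j):
    """Predicted rating: inner product of user row i and item row j."""
    return sum(a * b for a, b in zip(U[i], V[j]))

def train_weighted_mf(ratings, weights, n_users, n_items, rank=2,
                      lam=0.01, lr=0.05, epochs=500, seed=0):
    """ratings and weights are dicts keyed by (user, item).

    Assumed loss: sum_w * (R - U V^T)^2 + lam * (|U|^2 + |V|^2),
    minimized with per-entry SGD updates (factor 2 folded into lr).
    """
    rng = random.Random(seed)
    U = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for (i, j), r in ratings.items():
            w = weights[(i, j)]          # similarity of (i, j) to the anchor point
            err = r - predict(U, V, i, j)
            for k in range(rank):
                u_ik, v_jk = U[i][k], V[j][k]
                U[i][k] += lr * (w * err * v_jk - lam * u_ik)
                V[j][k] += lr * (w * err * u_ik - lam * v_jk)
    return U, V

def weighted_loss(ratings, weights, U, V, lam):
    """Value of the assumed objective for the current factors."""
    data = sum(weights[key] * (r - predict(U, V, *key)) ** 2
               for key, r in ratings.items())
    reg = lam * (sum(x * x for row in U for x in row)
                 + sum(x * x for row in V for x in row))
    return data + reg

if __name__ == "__main__":
    # Toy submatrix: 3 users, 3 items, hypothetical ratings and weights.
    ratings = {(0, 0): 5, (0, 1): 4, (1, 0): 4, (1, 2): 1, (2, 1): 2, (2, 2): 5}
    weights = {(0, 0): 1.0, (0, 1): 0.8, (1, 0): 0.9,
               (1, 2): 0.5, (2, 1): 0.6, (2, 2): 1.0}
    U, V = train_weighted_mf(ratings, weights, 3, 3)
    print(weighted_loss(ratings, weights, U, V, 0.01))
```

Entries with a larger weight pull the factors harder toward their observed rating, which is how the points closest to the anchor point come to dominate each local model.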
As shown in Figure 5, after q anchor points are selected, the neighborhood data of each anchor point is found, so that q submatrices can be constructed. Matrix factorization is performed on each submatrix to obtain the predicted ratings of each submatrix.
Step S7. Compute the weighted average of the predicted ratings of the q submatrices to obtain the final predicted rating of the user for the item.
The result of the weighted average is the final predicted rating of item J by user I, and (u_t, v_t) denotes the t-th anchor point.
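The combination in step S7 can be sketched as follows. The sketch assumes that each submatrix prediction is weighted by the similarity of the pair (I, J) to that submatrix's anchor point and that the weights are normalized by their sum; the function name `combine_predictions` and the handling of uncovered pairs are illustrative choices, not taken from the text.

```python
def combine_predictions(local_preds, similarities):
    """Weighted average of per-submatrix predictions for one (user, item) pair.

    local_preds[t]  -- prediction from the t-th submatrix
    similarities[t] -- similarity of the pair to the t-th anchor point
                       (0 when the pair is outside that submatrix)
    Returns None when no submatrix covers the pair.
    """
    total = sum(similarities)
    if total == 0:
        return None
    return sum(s * p for s, p in zip(similarities, local_preds)) / total
```

A pair that lies in only one submatrix simply receives that submatrix's prediction, since its single non-zero weight cancels in the normalization.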
To verify the prediction performance of the method proposed by the present invention, three data sets were selected: Zhihu Live, movielens-100k and movielens-1m. The prediction errors of the following methods were compared: the method of the present invention; a global rating prediction method that takes the whole large matrix as its object; a rating prediction method that builds submatrices using a symmetric distance and randomly chosen anchor points; and a rating prediction method that builds submatrices using a symmetric distance and preference-based anchor point selection. The comparison results are shown in Tables 1-3.
Table 1
Table 2
Table 3
The comparison results show that the rating prediction method proposed by the present invention, which constructs submatrices based on an asymmetric distance, improves clearly on conventional rating prediction methods in terms of the RMSE and MAE evaluation metrics. In terms of test set coverage, it markedly alleviates the problem, found in other submatrix-based rating prediction methods, that the submatrices cannot completely cover the test data. This is because, when the submatrices are constructed, anchor points that have many neighbors and lie far apart from one another are selected, and the asymmetric distance is used to find each anchor point's neighborhood. As a result, the submatrices are well differentiated, cover the large matrix more completely, and better reflect the concentrated interests of groups of users, which reduces the prediction error.
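For reference, the two error metrics named above can be computed as in the following short sketch, using their standard definitions. The function names are illustrative, and no values from Tables 1-3 are reproduced here.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equally long rating lists."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error between two equally long rating lists."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a fuller picture of prediction quality.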
The above is only a preferred specific embodiment of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A rating prediction method based on constructing submatrices with an asymmetric distance, characterized in that the method comprises the following steps:
S1. decomposing the user-item rating matrix using non-negative matrix factorization to obtain a user feature matrix and an item feature matrix;
S2. constructing an asymmetric distance matrix between users according to the user feature matrix, and constructing an asymmetric distance matrix between items according to the item feature matrix;
S3. selecting q anchor users according to the asymmetric distance matrix between users, selecting q anchor items according to the asymmetric distance matrix between items, and randomly pairing the anchor users with the anchor items to form q anchor points;
S4. for each anchor point, calculating the asymmetric distance from each user to the anchor user according to the user feature matrix, and calculating the asymmetric distance from each item to the anchor item according to the item feature matrix;
S5. for each anchor point, calculating the similarity between each user-item pair and the anchor point according to the asymmetric distance from the user to the anchor user and the asymmetric distance from the item to the anchor item, and determining the neighborhood of the anchor point according to the similarity, the anchor point and all of its neighbors constituting the submatrix with that anchor point as its core;
S6. for each submatrix, training the submatrix using a weighted matrix factorization method to obtain the predicted ratings of the users for the items in that submatrix;
S7. computing the weighted average of the predicted ratings of the q submatrices to obtain the final predicted rating of the user for the item.
2. The rating prediction method according to claim 1, characterized in that, in step S2, for user i′, the asymmetric distance from each other user to user i′ is calculated using the user feature matrix,
wherein D_kl(P_j′||P_i′) denotes the distance from user j′ to user i′, P_i′ denotes the i′-th row of the user feature matrix, K is the number of columns of the user feature matrix, i′ = 1, 2, …, M, j′ = 1, 2, …, M, and M denotes the number of users; this distance is the value in row i′, column j′ of the asymmetric distance matrix between users;
for item i″, the asymmetric distance from each other item to item i″ is calculated using the item feature matrix,
wherein D_kl(Q_j″||Q_i″) denotes the distance from item j″ to item i″, Q_i″ denotes the i″-th row of the item feature matrix, K is the number of columns of the item feature matrix, i″ = 1, 2, …, N, j″ = 1, 2, …, N, and N denotes the number of items; this distance is the value in row i″, column j″ of the asymmetric distance matrix between items.
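The distance formula is not reproduced in this text. The notation D_kl(·||·) and the sum over the K feature columns suggest a Kullback-Leibler divergence between feature rows, so the sketch below assumes that form and assumes each row is first normalized to sum to 1. The function names and the smoothing constant `eps` are illustrative.

```python
import math

def normalize(row, eps=1e-12):
    """Turn a non-negative feature row into a smoothed probability vector."""
    total = sum(row)
    return [(x + eps) / (total + eps * len(row)) for x in row]

def kl_distance(p_row, q_row):
    """Assumed form: D_kl(P || Q) = sum_k P_k * log(P_k / Q_k)."""
    p, q = normalize(p_row), normalize(q_row)
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q))

def asymmetric_distance_matrix(features):
    """Entry [i][j] is the distance from row j to row i, D_kl(F_j || F_i)."""
    n = len(features)
    return [[kl_distance(features[j], features[i]) for j in range(n)]
            for i in range(n)]
```

Because D_kl(P || Q) generally differs from D_kl(Q || P), the resulting matrix is not symmetric, which is the property the method relies on.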
3. The rating prediction method according to claim 1, characterized in that step S3 comprises the following sub-steps:
S301. counting the number of elements in the i-th row of the asymmetric distance matrix between users that are smaller than the user distance threshold, and taking this number as the density of user i; counting the number of elements in the j-th row of the asymmetric distance matrix between items that are smaller than the item distance threshold, and taking this number as the density of item j;
S302. for user i, finding all users whose density is greater than that of user i, calculating the average distance between each of these users and user i, and taking the minimum of these average distances as the separation distance of user i; for item j, finding all items whose density is greater than that of item j, calculating the average distance between each of these items and item j, and taking the minimum of these average distances as the separation distance of item j;
S303. calculating the representativeness of each user i and selecting the q users with the highest representativeness as anchor users; calculating the representativeness of each item j and selecting the q items with the highest representativeness as anchor items;
S304. randomly pairing the q anchor users with the q anchor items to obtain q anchor user-anchor item pairs, each of which constitutes an anchor point;
wherein i = 1, 2, …, M, j = 1, 2, …, N, M denotes the number of users, and N denotes the number of items.
4. The rating prediction method according to claim 3, characterized in that the average distance between point a and point b is equal to (dis_ab + dis_ba)/2, where dis_ab is the asymmetric distance from point a to point b and dis_ba is the asymmetric distance from point b to point a.
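Sub-steps S301 to S303 together with the average distance of claim 4 can be sketched as follows. The representativeness formula is not reproduced in this text; the sketch assumes it is the product of density and separation distance, as in density-peak clustering. It also assumes that the diagonal is excluded when counting neighbors and that a point with no denser point receives its largest average distance as separation distance. All function names are illustrative.

```python
def average_distance(D, a, b):
    """Claim 4: mean of the two directed distances between a and b."""
    return (D[a][b] + D[b][a]) / 2

def densities(D, threshold):
    """S301: number of entries in each row below the threshold (diagonal skipped)."""
    return [sum(1 for j, d in enumerate(row) if j != i and d < threshold)
            for i, row in enumerate(D)]

def separation_distances(D, rho):
    """S302: smallest average distance to any point of higher density."""
    n = len(D)
    sep = []
    for i in range(n):
        denser = [average_distance(D, i, j) for j in range(n) if rho[j] > rho[i]]
        if denser:
            sep.append(min(denser))
        else:
            # Assumption: densest points get their largest average distance.
            sep.append(max(average_distance(D, i, j) for j in range(n) if j != i))
    return sep

def select_anchors(D, threshold, q):
    """S303: pick the q points with the highest assumed representativeness."""
    rho = densities(D, threshold)
    sep = separation_distances(D, rho)
    rep = [r * s for r, s in zip(rho, sep)]   # assumed: density * separation
    order = sorted(range(len(D)), key=lambda i: rep[i], reverse=True)
    return order[:q], rep
```

The same functions apply unchanged to the item distance matrix, after which the selected anchor users and anchor items are paired at random as in S304.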
5. The rating prediction method according to claim 3, characterized in that the user distance threshold is chosen so that the average number of neighbors of a user is as close as possible to 4% of the total number of users, and the item distance threshold is chosen so that the average number of neighbors of an item is as close as possible to 4% of the total number of items.
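One simple way to pick such a threshold is to try each distinct distance value and keep the one whose average neighbor count is closest to the target. The sketch below does this by brute force; the function name `choose_threshold`, the candidate set and the exclusion of the diagonal are assumptions, and a real implementation on large matrices would need a more efficient search.

```python
def choose_threshold(D, target_fraction=0.04):
    """Return the candidate threshold whose average neighbor count per row
    is closest to target_fraction * (number of rows)."""
    n = len(D)
    target = target_fraction * n
    # Candidate thresholds: every distinct off-diagonal distance.
    candidates = sorted({d for i, row in enumerate(D)
                         for j, d in enumerate(row) if i != j})
    best, best_gap = None, float("inf")
    for h in candidates:
        neighbors = sum(1 for i, row in enumerate(D)
                        for j, d in enumerate(row) if i != j and d < h)
        gap = abs(neighbors / n - target)
        if gap < best_gap:
            best, best_gap = h, gap
    return best
```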
6. The rating prediction method according to claim 3, characterized in that the number of anchor points q is determined automatically according to the data set, as follows:
(1) sorting all user representativeness values in descending order; in this ordering, calculating the difference between the representativeness of the i-th user and the mean representativeness of the 10 users ranked after it; if the difference is greater than a preset threshold, adding the user to the candidate anchor user list and continuing with the (i+1)-th user; if the difference is not greater than the preset threshold, proceeding to step (2);
(2) sorting all item representativeness values in descending order; in this ordering, calculating the difference between the representativeness of the j-th item and the mean representativeness of the 10 items ranked after it; if the difference is greater than the preset threshold, adding the item to the candidate anchor item list and continuing with the (j+1)-th item; if the difference is not greater than the preset threshold, proceeding to step (3);
(3) comparing the number of users in the candidate anchor user list with the number of items in the candidate anchor item list, and selecting the larger value as the number of anchor points q.
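Steps (1) to (3) translate fairly directly into code. In the sketch below the function names are illustrative, the window of 10 following values is exposed as a parameter, and the behavior when fewer than 10 values remain (average whatever is left, stop when nothing is left) is an assumption, since the text does not cover that case.

```python
def count_candidates(rep_values, threshold, window=10):
    """Count leading entries, in descending order, whose representativeness
    exceeds the mean of the next `window` entries by more than `threshold`."""
    ranked = sorted(rep_values, reverse=True)
    count = 0
    for i, value in enumerate(ranked):
        following = ranked[i + 1:i + 1 + window]
        if not following:
            break                      # nothing left to compare against
        if value - sum(following) / len(following) > threshold:
            count += 1                 # candidate anchor, move to the next entry
        else:
            break                      # first failure ends the scan
    return count

def choose_q(user_rep, item_rep, threshold, window=10):
    """Step (3): q is the larger of the two candidate list lengths."""
    return max(count_candidates(user_rep, threshold, window),
               count_candidates(item_rep, threshold, window))
```

The scan stops at the first entry that fails the test, so q counts only the leading run of clearly outstanding representativeness values.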
7. The rating prediction method according to claim 1, characterized in that step S5 comprises the following sub-steps:
S501. calculating the similarity of the user-item pair (I, J) to the anchor point (u_t, v_t), wherein D_kl(s_1||s_2) denotes the asymmetric distance from s_1 to s_2, h is the distance threshold, and t = 1, 2, …, q;
S502. judging whether the similarity is not equal to 0; if so, the user-item pair (I, J) is a neighbor of the anchor point (u_t, v_t); otherwise, the user-item pair (I, J) is not a neighbor of the anchor point (u_t, v_t);
S503. the anchor point (u_t, v_t) and all of its neighbors constitute the submatrix with that anchor point as its core.
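The similarity formula of S501 is not reproduced in this text. The following sketch only illustrates the mechanics of S501 to S503 under an assumed form: an Epanechnikov-style kernel that is zero beyond the threshold h, applied separately to the user distance and the item distance and then multiplied. The kernel choice and all function names are assumptions.

```python
def kernel(distance, h):
    """Assumed kernel: positive inside the threshold h, zero at or beyond it."""
    if distance >= h:
        return 0.0
    return 0.75 * (1.0 - (distance / h) ** 2)

def similarity(user_dist, item_dist, h):
    """S501 (assumed form): product of the user kernel and the item kernel."""
    return kernel(user_dist, h) * kernel(item_dist, h)

def build_submatrix(ratings, user_dists, item_dists, h):
    """S502/S503: keep every rated pair whose similarity to the anchor is non-zero.

    ratings       -- {(user, item): rating}
    user_dists[I] -- asymmetric distance from user I to the anchor user
    item_dists[J] -- asymmetric distance from item J to the anchor item
    Returns {(user, item): (rating, similarity)}.
    """
    sub = {}
    for (i, j), r in ratings.items():
        s = similarity(user_dists[i], item_dists[j], h)
        if s != 0:
            sub[(i, j)] = (r, s)
    return sub
```

The stored similarities can then be reused as the weights of the matrix factorization for that submatrix.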
8. The rating prediction method according to claim 1, characterized in that step S6 specifically comprises: decomposing the submatrix with the anchor point (u_t, v_t) as its core into a user factor matrix U_t and an item factor matrix V_t, which are trained iteratively by gradient descent on an objective function in which the similarity of the user-item pair (I, J) to the anchor point (u_t, v_t) serves as the weight of the matrix factorization, R_IJ denotes the actual rating of item J by user I, and λ denotes the regularization coefficient; the U_t and V_t obtained when training converges are the desired user factor matrix and item factor matrix, t = 1, 2, …, q;
the predicted rating of item J by user I in the submatrix is obtained from U_t and V_t.
9. The rating prediction method according to claim 1, characterized in that, in step S7, the final predicted rating of item J by user I is calculated as a weighted average in which the predicted rating of item J by user I in the t-th submatrix is weighted by the similarity of the user-item pair (I, J) to the anchor point (u_t, v_t), where (u_t, v_t) denotes the t-th anchor point.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the rating prediction method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811382976.4A CN109636509B (en) | 2018-11-20 | 2018-11-20 | Scoring prediction method for constructing submatrix based on asymmetric distance |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109636509A (en) | 2019-04-16 |
CN109636509B (en) | 2020-12-18 |
Family
ID=66068390
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811382976.4A Active CN109636509B (en) | 2018-11-20 | 2018-11-20 | Scoring prediction method for constructing submatrix based on asymmetric distance |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109636509B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111008334A (en) * | 2019-12-04 | 2020-04-14 | 华中科技大学 | Top-K recommendation method and system based on local pairwise ordering and global decision fusion |
CN112148584A (en) * | 2019-06-28 | 2020-12-29 | 北京达佳互联信息技术有限公司 | Account information processing method and device, electronic equipment and storage medium |
CN113239266A (en) * | 2021-04-07 | 2021-08-10 | 中国人民解放军战略支援部队信息工程大学 | Personalized recommendation method and system based on local matrix decomposition |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130151441A1 (en) * | 2011-12-13 | 2013-06-13 | Xerox Corporation | Multi-task learning using bayesian model with enforced sparsity and leveraging of task correlations |
CN106021298A (en) * | 2016-05-03 | 2016-10-12 | 广东工业大学 | Asymmetrical weighing similarity based collaborative filtering recommendation method and system |
CN107885778A (en) * | 2017-10-12 | 2018-04-06 | 浙江工业大学 | A kind of personalized recommendation method based on dynamic point of proximity spectral clustering |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112148584A (en) * | 2019-06-28 | 2020-12-29 | 北京达佳互联信息技术有限公司 | Account information processing method and device, electronic equipment and storage medium |
CN111008334A (en) * | 2019-12-04 | 2020-04-14 | 华中科技大学 | Top-K recommendation method and system based on local pairwise ordering and global decision fusion |
CN111008334B (en) * | 2019-12-04 | 2023-04-18 | 华中科技大学 | Top-K recommendation method and system based on local pairwise ordering and global decision fusion |
CN113239266A (en) * | 2021-04-07 | 2021-08-10 | 中国人民解放军战略支援部队信息工程大学 | Personalized recommendation method and system based on local matrix decomposition |
CN113239266B (en) * | 2021-04-07 | 2023-03-14 | 中国人民解放军战略支援部队信息工程大学 | Personalized recommendation method and system based on local matrix decomposition |
Also Published As
Publication number | Publication date |
---|---|
CN109636509B (en) | 2020-12-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109636509A (en) | A kind of score in predicting method based on non symmetrical distance building submatrix | |
CN104123398B (en) | A kind of information-pushing method and device | |
CN106776930B (en) | A kind of location recommendation method incorporating time and geographical location information | |
CN110322053A (en) | A kind of score in predicting method constructing local matrix based on figure random walk | |
CN107256241B (en) | Movie recommendation method for improving multi-target genetic algorithm based on grid and difference replacement | |
CN108197285A (en) | A kind of data recommendation method and device | |
CN107885778A (en) | A kind of personalized recommendation method based on dynamic point of proximity spectral clustering | |
CN110059616A (en) | Pedestrian's weight identification model optimization method based on fusion loss function | |
Guo et al. | Feature selection based on Rough set and modified genetic algorithm for intrusion detection | |
CN106980659A (en) | A kind of doings based on isomery graph model recommend method | |
CN107103336A (en) | A kind of mixed attributes data clustering method based on density peaks | |
CN108563749B (en) | Online education system resource recommendation method based on multi-dimensional information and knowledge network | |
CN110032682A (en) | A kind of information recommendation list generation method, device and equipment | |
US20200327598A1 (en) | Method and apparatus for interacting with information distribution system | |
CN110532351A (en) | Recommend word methods of exhibiting, device, equipment and computer readable storage medium | |
CN108171535A (en) | A kind of personalized dining room proposed algorithm based on multiple features | |
CN110532429A (en) | It is a kind of based on cluster and correlation rule line on user group's classification method and device | |
CN105260460B (en) | One kind is towards multifarious recommendation method | |
CN105989005B (en) | A kind of method for pushing and device of information | |
CN108875071A (en) | A kind of education resource recommended method based on multi-angle of view interest | |
CN110008411A (en) | It is a kind of to be registered the deep learning point of interest recommended method of sparse matrix based on user | |
CN111008334B (en) | Top-K recommendation method and system based on local pairwise ordering and global decision fusion | |
CN109544261A (en) | A kind of intelligent perception motivational techniques based on diffusion and the quality of data | |
CN108280548A (en) | Intelligent processing method based on network transmission | |
Eom et al. | Improving image tag recommendation using favorite image context |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||