CN110134874A - A kind of collaborative filtering method optimizing user's similarity - Google Patents
A kind of collaborative filtering method optimizing user's similarity
- Publication number
- CN110134874A
- Authority
- CN
- China
- Prior art keywords
- user
- similarity
- scoring
- project
- formula
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a collaborative filtering method that optimizes user similarity and improves the accuracy of the recommendation algorithm without increasing server latency. The method standardizes the user rating data as a preprocessing step, then computes the Pearson similarity, an evaluation weight based on the distance between user vectors, and an asymmetric similarity weight. The two weights are used to optimize the Pearson similarity, which improves the recommendation accuracy of traditional collaborative filtering. The method applies to user-item rating data sets.
Description
Technical field
The invention relates to recommender systems based on collaborative filtering, and provides a collaborative filtering method that optimizes user similarity.
Background technique
The rapid development and spread of the Internet has made it far easier for users to obtain, share, and disseminate information. At the same time, the sharp growth in the volume of information has lowered its utilization rate: users struggle to find, in a timely way, the information that is actually useful to them, which is the information overload problem. One effective response is to design a recommender system that recommends content and products of interest to a user based on the user's needs, interests, and similar information. Compared with a search engine, a recommender system studies the user's interests and preferences and performs personalized computation to discover the user's points of interest, and thereby guides users to discover their own information needs and obtain information useful to them. A good recommender system not only provides personalized service but also builds a close relationship with its users, so that users come to rely on its recommendations.
Recommender systems mainly comprise content-based filtering and collaborative filtering. A content-based recommender system derives the features of the items a user cares about from the user's earlier browsing or purchase records, and recommends the new items that best match the user's interest profile. A recommender system based on collaborative filtering instead computes the similarity between users from their historical records, finds other users whose preferences resemble those of the target user, and recommends the items those users are interested in to the target user. A content-based recommender system considers only the target user, whereas a collaborative filtering recommender system makes full use of collective intelligence, that is, it draws answers from the behavior and data of a large population, and its recommendations are more personalized. Collaborative filtering is therefore the most widely used and most effective recommendation algorithm in personalized recommender systems.
Recommender systems based on collaborative filtering are further divided into model-based and memory-based collaborative filtering recommender systems. The former mainly uses methods from machine learning, data mining, and statistics to train on users' historical data, builds a corresponding user model, and uses that model to provide predictions and recommendations; it involves techniques such as matrix factorization and latent semantic analysis. The latter is divided into user-based and item-based collaborative filtering recommender systems.
Although a traditional user-based collaborative filtering recommender system measures similarity with the Pearson formula, it does not preprocess the data set, does not consider the distance between users' rating vectors, and does not consider the asymmetry of the similarity relation between users, all of which can degrade recommendation quality. The invention therefore optimizes these three points for the user-based collaborative filtering recommendation algorithm in order to improve recommendation quality.
Summary of the invention
The invention aims to reduce the mean absolute error of the traditional user-based collaborative filtering recommendation algorithm and to effectively improve the recommendation quality of the recommender system, by providing a collaborative filtering method that optimizes user similarity.
To achieve this aim, the invention proposes the following technical solution:
A collaborative filtering method that optimizes user similarity standardizes the users' rating vectors, optimizes the Pearson similarity with an evaluation weight based on user vector distance and an asymmetric similarity weight, and finally predicts user ratings. The specific steps are as follows:
1) Prepare the experimental database: collect the ratings given by a number of users to different items and build the experimental database;
2) Standardization preprocessing: standardize each user's rating vector with the Z-score method, and generate the user-item rating matrix from the standardized rating vectors;
3) Compute the user similarity matrix: from the user-item rating matrix generated in step 2), compute the Pearson similarity, the evaluation weight of the user vector distance, and the asymmetric similarity weight; use the evaluation weight of the user vector distance and the asymmetric similarity weight to optimize the Pearson similarity and obtain the optimized similarity formula; compute the similarity between each user and every other user with the optimized formula, and finally generate the similarity matrix;
4) Predict ratings: from the similarity between the target user and the other users, determine the neighbor set of the target user, and predict the target user's ratings for unrated items with the rating formula.
Compared with the prior art, the invention has the following advantage:
The method optimizes user similarity inside the recommendation algorithm module of collaborative filtering, so that the recommendation quality of the recommender system improves effectively without increasing server latency.
Brief description of the drawings
Fig. 1 is the flow chart of the method of the invention.
Detailed description of the embodiments
Specific embodiments of the invention are further described below with reference to the accompanying drawing.
This embodiment uses the MovieLens-100k data set (available from https://movielens.org/) as a worked example. The data set contains 100,000 ratings of 1682 films by 943 users. Each rating is an integer from 1 to 5, where 1 is the lowest evaluation and 5 the highest. Each user has rated at least 20 films. 80% of the data is used as the training set and 20% as the test set.
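The 80/20 division described above can be illustrated with a minimal sketch. The text does not say how the division is made, so the uniform random split, the `split_ratings` name, and the `(user, item, rating)` record layout are all assumptions for illustration only.

```python
import numpy as np

def split_ratings(records, test_fraction=0.2, seed=0):
    """Randomly split rating records into (train, test) lists.

    `records` is a list of (user, item, rating) tuples. The random split is
    an assumption; the source only states the 80%/20% proportions.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(records))          # shuffled record indices
    n_test = int(round(len(records) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    train = [records[i] for i in train_idx]
    test = [records[i] for i in test_idx]
    return train, test
```

A fixed seed keeps the split reproducible, which matters when comparing MAE values between similarity measures.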
As shown in Fig. 1, a collaborative filtering method that optimizes user similarity standardizes the users' rating vectors, optimizes the Pearson similarity with the evaluation weight of the user vector distance and the asymmetric similarity weight, and finally predicts user ratings. The specific steps are as follows:
1) Prepare the experimental database: collect the ratings given by a number of users to different items and build the experimental database.
2) Standardization preprocessing: standardize each user's rating vector with the Z-score method, and generate the user-item rating matrix from the standardized rating vectors. The specific steps are as follows:
Let the rating vector of the u-th user in the training set be $R_u = (r_{(u,1)}, r_{(u,2)}, \dots, r_{(u,m)})$, where $r_{(u,m)}$ is the rating of user u for item m. As shown in formula (1), $R_u$ is standardized with the Z-score method, where $z_{(u,m)}$ is the standardized rating of user u for item m, $\bar{r}_u$ is the mean of the components of $R_u$, and $\sigma_u$ is the standard deviation of the components of $R_u$:

$$z_{(u,m)} = \frac{r_{(u,m)} - \bar{r}_u}{\sigma_u} \tag{1}$$

The standardized rating vector of user u is denoted $Z_u = (z_{(u,1)}, z_{(u,2)}, \dots, z_{(u,m)})$; $Z_u$ has mean 0 and standard deviation 1. A user-item rating matrix of size 943 × 1682 is generated, where 943 is the number of users and 1682 is the number of items. $Z_u$ is recorded in row u of the user-item rating matrix, and the score of every item that user u has not rated is recorded as 0.
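The standardization step can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes a dense NumPy matrix in which 0 marks an unrated item, uses the population standard deviation, and leaves a user whose ratings are all identical at 0. The function name `standardize_ratings` is hypothetical.

```python
import numpy as np

def standardize_ratings(ratings):
    """Z-score each user's rated entries; unrated entries stay 0.

    `ratings` is a (users x items) array where 0 means "not rated"
    (an assumption; MovieLens ratings are 1..5).
    Returns (z, means, stds, rated_mask).
    """
    ratings = np.asarray(ratings, dtype=float)
    rated = ratings > 0                              # mask of rated entries
    counts = np.maximum(rated.sum(axis=1), 1)        # avoid division by zero
    means = np.where(rated, ratings, 0.0).sum(axis=1) / counts
    centered = np.where(rated, ratings - means[:, None], 0.0)
    stds = np.sqrt((centered ** 2).sum(axis=1) / counts)  # population std
    safe_stds = np.where(stds > 0, stds, 1.0)        # constant raters: z = 0
    z = centered / safe_stds[:, None]
    return z, means, stds, rated
```

The mask is returned because a rated item whose rating equals the user's mean also standardizes to 0, so the matrix alone cannot distinguish it from an unrated item.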
3) Compute the user similarity matrix: from the user-item rating matrix generated in step 2), compute the Pearson similarity, the evaluation weight of the user vector distance, and the asymmetric similarity weight; use the evaluation weight of the user vector distance and the asymmetric similarity weight to optimize the Pearson similarity and obtain the optimized similarity formula; compute the similarity between each user and every other user with the optimized formula, and finally generate the similarity matrix. Taking any two users u and v in the MovieLens-100k training set as an example, the similarity of user u to user v is computed as follows:
3.1) Compute the Pearson similarity: as shown in formula (2), the Pearson similarity $sim_{(u,v)}$ of user u and user v is measured with the Pearson similarity formula, where the set S is the set of items rated by both user u and user v.
[Formula (2) is missing from this text.]
3.2) Compute the evaluation weight of the user vector distance: as shown in formula (3), compute the evaluation weight $D_{(u,v)}$ of the user vector distance between $Z_u$ and $Z_v$, where S is the set of items rated by both user u and user v, N(S) is the number of elements in S, and α is the threshold on the rating gap for a single item. The larger α is, the closer $D_{(u,v)}$ is to 1; the smaller α is, the closer $D_{(u,v)}$ is to 0.
[Formula (3) is missing from this text.]
3.3) Compute the asymmetric similarity weight: as shown in formula (4), compute the asymmetric similarity weight $w_{(u,v)}$ of user u to user v, where S is the set of items rated by both user u and user v, $I_u$ is the set of items rated by user u, N(S) is the number of elements in S, and $N(I_u)$ is the number of elements in $I_u$:

$$w_{(u,v)} = \frac{N(S)}{N(I_u)} \tag{4}$$
3.4) User similarity formula: as shown in formula (5), combining formulas (2), (3), and (4) gives the optimized similarity $sim'_{(u,v)}$ of user u to user v:

$$sim'_{(u,v)} = D_{(u,v)} \cdot w_{(u,v)} \cdot sim_{(u,v)} \tag{5}$$

3.5) Compute the user similarity matrix: compute the similarity between different users with formula (5), finally obtaining a 943 × 943 user similarity matrix.
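Step 3) can be sketched as follows, under several stated assumptions. The Pearson term is the textbook correlation restricted to the commonly rated set S (the exact form of formula (2) is not available in this text). The asymmetric weight is taken as N(S)/N(I_u), which is what the symbol definitions suggest. Because formula (3) cannot be recovered from this text, the distance weight is left as a caller-supplied hook named `distance_weight` (hypothetical) that defaults to 1. All function names are illustrative.

```python
import numpy as np

def pearson_on_common(z_u, z_v, common):
    """Textbook Pearson correlation over the commonly rated items (assumed form)."""
    if common.sum() < 2:
        return 0.0
    a = z_u[common] - z_u[common].mean()
    b = z_v[common] - z_v[common].mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def asymmetric_weight(rated_u, rated_v):
    """N(S) / N(I_u): share of u's rated items that v has also rated (assumed form)."""
    n_u = int(rated_u.sum())
    return float((rated_u & rated_v).sum() / n_u) if n_u else 0.0

def optimized_similarity(z, rated, distance_weight=None):
    """Fill sim[u, v] = D * w * sim as in formula (5); D defaults to 1.

    `distance_weight(z_u, z_v, common)` is a hypothetical hook standing in
    for formula (3), which is not reproduced in the source text.
    """
    n = z.shape[0]
    sim = np.zeros((n, n))
    for u in range(n):
        for v in range(n):
            if u == v:
                continue
            common = rated[u] & rated[v]
            d = 1.0 if distance_weight is None else distance_weight(z[u], z[v], common)
            sim[u, v] = d * asymmetric_weight(rated[u], rated[v]) \
                          * pearson_on_common(z[u], z[v], common)
    return sim
```

Because the weight divides by the number of items rated by u rather than by v, the resulting matrix is generally not symmetric: a user with few ratings can be highly similar to a heavy rater without the reverse holding.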
4) Predict ratings: from the similarity between the target user and the other users, determine the neighbor set of the target user, and predict the target user's ratings for unrated items with the rating formula. In this example, taking an item a not yet rated by an arbitrary user u in the training set, the predicted rating of user u for item a is computed as follows:
4.1) Compute the neighbor set: in the training set, find the set of users who have rated item a, denoted $U_a = \{u_{(1,a)}, u_{(2,a)}, \dots, u_{(q,a)}\}$, where $u_{(q,a)}$ is the q-th user who has rated item a. Sort these q users in descending order of their similarity to user u, giving $U'_a = \{u'_{(1,a)}, u'_{(2,a)}, \dots, u'_{(q,a)}\}$. Then select the first k users of the sorted set $U'_a$ as the neighbor set of user u, denoted $U = \{u'_{(1,a)}, u'_{(2,a)}, \dots, u'_{(k,a)}\}$.
4.2) Predict the rating of user u for item a: compute the predicted rating $p_{(u,a)}$ of user u for item a with formula (6), where the set U is the neighbor set of user u, $\bar{r}_u$ is the mean of the components of $R_u$, $\sigma_u$ is the standard deviation of the components of $R_u$, $z_{(v,a)}$ is the standardized rating of user v for item a, and $sim'_{(u,v)}$ is the similarity of user u to user v.
[Formula (6) is missing from this text.]
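Step 4) can be sketched as follows. Neighbor selection follows the description directly. The prediction formula is an assumption, since formula (6) is not available in this text: the sketch uses a common Z-score form, the similarity-weighted average of the neighbors' standardized ratings, rescaled by the target user's standard deviation and shifted by the target user's mean. The function name and the fallback to the user mean are illustrative choices.

```python
import numpy as np

def predict_rating(u, a, z, rated, sim, means, stds, k=10):
    """Predict user u's rating of item a from the top-k most similar raters.

    Assumed formula: mean_u + std_u * sum(sim * z) / sum(|sim|) over neighbors.
    Falls back to the user's mean when no usable neighbor exists.
    """
    # Users who rated item a, excluding the target user.
    candidates = [v for v in np.flatnonzero(rated[:, a]) if v != u]
    if not candidates:
        return float(means[u])
    # Sort by similarity of u to v, descending, and keep the first k.
    neighbors = sorted(candidates, key=lambda v: sim[u, v], reverse=True)[:k]
    weights = np.array([sim[u, v] for v in neighbors])
    norm = np.abs(weights).sum()
    if norm == 0:
        return float(means[u])
    z_hat = float((weights * z[neighbors, a]).sum() / norm)
    return float(means[u] + stds[u] * z_hat)
```

The default `k=10` matches the neighborhood size used in this embodiment. The result is not clipped to the 1 to 5 rating range; whether to clip is left to the caller.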
As shown in formula (7), recommendation accuracy is measured with the mean absolute error (MAE). A smaller MAE means a smaller error and a higher accuracy. Here $p_i$ is the predicted rating of the user for item i, $r_i$ is the actual rating of the user for item i in the test set, and n is the number of ratings in the test set:

$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| p_i - r_i \right| \tag{7}$$
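The mean absolute error is the average of the absolute differences between predicted and actual test ratings. A minimal sketch, with a hypothetical function name:

```python
import numpy as np

def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual ratings."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    if predicted.shape != actual.shape:
        raise ValueError("predicted and actual must have the same shape")
    if predicted.size == 0:
        raise ValueError("at least one rating is required")
    return float(np.abs(predicted - actual).mean())
```

For example, predictions (4, 3, 5) against actual ratings (5, 3, 3) give absolute errors 1, 0, and 2, so the MAE is 1.0.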
The neighborhood size is set to 10 in this embodiment. The MAE of the invention is 0.74086, which is 3.06% lower than that of the original method using Pearson similarity. Computing the user similarity matrix with the method of the invention takes 61.3 seconds, and rating prediction takes 7.04 seconds. In practice, the user similarity matrix can be computed offline, so when a user requests online rating predictions from the recommendation algorithm of the invention, the real-time computation time is almost the same as with the original method and the user's online waiting time does not increase.
Claims (4)
1. A collaborative filtering method that optimizes user similarity, which standardizes the users' rating vectors, optimizes the Pearson similarity with an evaluation weight of the user vector distance and an asymmetric similarity weight, and finally predicts user ratings, characterized in that the specific steps are as follows:
1) Prepare the experimental database: collect the ratings given by a number of users to different items and build the experimental database;
2) Standardization preprocessing: standardize each user's rating vector with the Z-score method, and generate the user-item rating matrix from the standardized rating vectors;
3) Compute the user similarity matrix: from the user-item rating matrix generated in step 2), compute the Pearson similarity, the evaluation weight of the user vector distance, and the asymmetric similarity weight; use the evaluation weight of the user vector distance and the asymmetric similarity weight to optimize the Pearson similarity and obtain the optimized similarity formula; compute the similarity between each user and every other user with the optimized formula, and finally generate the similarity matrix;
4) Predict ratings: from the similarity between the target user and the other users, determine the neighbor set of the target user, and predict the target user's ratings for unrated items with the rating formula.
2. The collaborative filtering method that optimizes user similarity according to claim 1, characterized in that the specific steps of step 2) are as follows: let the rating vector of the u-th user in the training set be $R_u = (r_{(u,1)}, r_{(u,2)}, \dots, r_{(u,m)})$, where $r_{(u,m)}$ is the rating of user u for item m; as shown in formula (1), $R_u$ is standardized with the Z-score method, where $z_{(u,m)}$ is the standardized rating of user u for item m, $\bar{r}_u$ is the mean of the components of $R_u$, and $\sigma_u$ is the standard deviation of the components of $R_u$:

$$z_{(u,m)} = \frac{r_{(u,m)} - \bar{r}_u}{\sigma_u} \tag{1}$$

The standardized rating vector of user u is denoted $Z_u = (z_{(u,1)}, z_{(u,2)}, \dots, z_{(u,m)})$; $Z_u$ has mean 0 and standard deviation 1. The user-item rating matrix is then generated: $Z_u$ is recorded in row u of the user-item rating matrix, and the score of every item that user u has not rated is recorded as 0.
3. The collaborative filtering method that optimizes user similarity according to claim 1, characterized in that in step 3), taking any two users u and v in the training set as an example, the similarity of user u to user v is computed as follows:
3.1) Compute the Pearson similarity: as shown in formula (2), the Pearson similarity $sim_{(u,v)}$ of user u and user v is measured with the Pearson similarity formula, where the set S is the set of items rated by both user u and user v.
[Formula (2) is missing from this text.]
3.2) Compute the evaluation weight of the user vector distance: as shown in formula (3), compute the evaluation weight $D_{(u,v)}$ of the user vector distance between $Z_u$ and $Z_v$, where S is the set of items rated by both user u and user v, N(S) is the number of elements in S, and α is the threshold on the rating gap for a single item. The larger α is, the closer $D_{(u,v)}$ is to 1; the smaller α is, the closer $D_{(u,v)}$ is to 0.
[Formula (3) is missing from this text.]
3.3) Compute the asymmetric similarity weight: as shown in formula (4), compute the asymmetric similarity weight $w_{(u,v)}$ of user u to user v, where S is the set of items rated by both user u and user v, $I_u$ is the set of items rated by user u, N(S) is the number of elements in S, and $N(I_u)$ is the number of elements in $I_u$:

$$w_{(u,v)} = \frac{N(S)}{N(I_u)} \tag{4}$$

3.4) User similarity formula: as shown in formula (5), combining formulas (2), (3), and (4) gives the optimized similarity $sim'_{(u,v)}$ of user u to user v:

$$sim'_{(u,v)} = D_{(u,v)} \cdot w_{(u,v)} \cdot sim_{(u,v)} \tag{5}$$

3.5) Compute the user similarity matrix: compute the similarity between different users with formula (5), finally obtaining the user similarity matrix.
4. The collaborative filtering method that optimizes user similarity according to claim 1, characterized in that in step 4), taking an item a not yet rated by an arbitrary user u in the training set as an example, the predicted rating of user u for item a is computed as follows:
4.1) Compute the neighbor set: in the training set, find the set of users who have rated item a, denoted $U_a = \{u_{(1,a)}, u_{(2,a)}, \dots, u_{(q,a)}\}$, where $u_{(q,a)}$ is the q-th user who has rated item a; sort these q users in descending order of their similarity to user u, giving $U'_a = \{u'_{(1,a)}, u'_{(2,a)}, \dots, u'_{(q,a)}\}$; then select the first k users of the sorted set $U'_a$ as the neighbor set of user u, denoted $U = \{u'_{(1,a)}, u'_{(2,a)}, \dots, u'_{(k,a)}\}$;
4.2) Predict the rating of user u for item a: compute the predicted rating $p_{(u,a)}$ of user u for item a with formula (6), where the set U is the neighbor set of user u, $\bar{r}_u$ is the mean of the components of $R_u$, $\sigma_u$ is the standard deviation of the components of $R_u$, $z_{(v,a)}$ is the standardized rating of user v for item a, and $sim'_{(u,v)}$ is the similarity of user u to user v.
[Formula (6) is missing from this text.]
As shown in formula (7), recommendation accuracy is measured with the mean absolute error (MAE). A smaller MAE means a smaller error and a higher accuracy. Here $p_i$ is the predicted rating of the user for item i, $r_i$ is the actual rating of the user for item i in the test set, and n is the number of ratings in the test set:

$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| p_i - r_i \right| \tag{7}$$
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910312071.8A CN110134874A (en) | 2019-04-18 | 2019-04-18 | A kind of collaborative filtering method optimizing user's similarity |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910312071.8A CN110134874A (en) | 2019-04-18 | 2019-04-18 | A kind of collaborative filtering method optimizing user's similarity |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110134874A true CN110134874A (en) | 2019-08-16 |
Family
ID=67570203
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910312071.8A Pending CN110134874A (en) | 2019-04-18 | 2019-04-18 | A kind of collaborative filtering method optimizing user's similarity |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110134874A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH04124782A (en) * | 1990-09-14 | 1992-04-24 | N T T Data Tsushin Kk | Method and device for feature extraction |
CN103559622A (en) * | 2013-07-31 | 2014-02-05 | 焦点科技股份有限公司 | Characteristic-based collaborative filtering recommendation method |
US20160314501A1 (en) * | 2015-03-24 | 2016-10-27 | Mxm Nation Inc. | Scalable networked computing system for scoring user influence in an internet-based social network |
CN106021558A (en) * | 2016-05-27 | 2016-10-12 | 天津大学 | Calculation method for user availability in collaborative filtering recommendation system |
CN107943948A (en) * | 2017-11-24 | 2018-04-20 | 中国科学院电子学研究所苏州研究院 | A kind of improved mixing collaborative filtering recommending method |
Non-Patent Citations (3)
Title |
---|
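TASNIM ZAYET et al.: "A new weighting algorithm for collaborative filtering", IEEE Xplore * |
He Wenkun (何汶坤) et al.: "Collaborative filtering recommendation algorithm based on the number of common ratings and degree of difference" (基于共同评分数量及差异度的协同过滤推荐算法), Journal of Yili Normal University (Natural Science Edition) (《伊犁师范学院学报(自然科学版)》) * |
Li Rong (李容) et al.: "Research on a collaborative filtering algorithm based on improved similarity" (基于改进相似度的协同过滤算法研究), Computer Science (《计算机科学》) * |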
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110727876A (en) * | 2019-09-02 | 2020-01-24 | 南京理工大学 | Individual recommendation algorithm for intelligent retail system |
CN110727876B (en) * | 2019-09-02 | 2022-09-30 | 南京理工大学 | Individual recommendation algorithm for intelligent retail system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109241405B (en) | Learning resource collaborative filtering recommendation method and system based on knowledge association | |
CN108256093B (en) | Collaborative filtering recommendation algorithm based on multiple interests and interest changes of users | |
US9706008B2 (en) | Method and system for efficient matching of user profiles with audience segments | |
TWI636416B (en) | Method and system for multi-phase ranking for content personalization | |
Li et al. | Improving one-class collaborative filtering by incorporating rich user information | |
CN101641697B (en) | Related search queries for a webpage and their applications | |
US10491694B2 (en) | Method and system for measuring user engagement using click/skip in content stream using a probability model | |
KR101003045B1 (en) | Apparatus and method for presenting personalized advertisements information based on artificial intelligence, and recording medium thereof | |
CN108520450B (en) | Recommendation method and system for local low-rank matrix approximation based on implicit feedback information | |
CN107833117B (en) | Bayesian personalized sorting recommendation method considering tag information | |
CN109918563B (en) | Book recommendation method based on public data | |
CN107424043A (en) | A kind of Products Show method and device, electronic equipment | |
CN106528643B (en) | Multi-dimensional comprehensive recommendation method based on social network | |
CN106471491A (en) | A kind of collaborative filtering recommending method of time-varying | |
US20120314941A1 (en) | Accurate text classification through selective use of image data | |
CN103745100A (en) | Item-based explicit and implicit feedback mixing collaborative filtering recommendation algorithm | |
CN103365839A (en) | Recommendation search method and device for search engines | |
CN102411754A (en) | Personalized recommendation method based on commodity property entropy | |
CN103886487A (en) | Individualized recommendation method and system based on distributed B2B platform | |
CN107016122B (en) | Knowledge recommendation method based on time migration | |
Yazdanfar et al. | Link recommender: Collaborative-filtering for recommending urls to twitter users | |
CN109165367B (en) | News recommendation method based on RSS subscription | |
CN102750334A (en) | Agricultural information accurate propelling method based on data mining (DM) | |
CN103324645A (en) | Method and device for recommending webpage | |
CN110348930A (en) | Business object data processing method, the recommended method of business object information and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190816 |