CN110134874A - A collaborative filtering method for optimizing user similarity - Google Patents

A collaborative filtering method for optimizing user similarity

Info

Publication number
CN110134874A
Authority
CN
China
Prior art keywords
user
similarity
rating
item
formula
Prior art date
Legal status
Pending
Application number
CN201910312071.8A
Other languages
Chinese (zh)
Inventor
安彦涵
张新鹏
吴汉舟
余江
王子驰
Current Assignee
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date
2019-04-18
Filing date
2019-04-18
Publication date
2019-08-16
Application filed by University of Shanghai for Science and Technology
Priority to CN201910312071.8A
Publication of CN110134874A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention proposes a collaborative filtering method that optimizes user similarity, improving the accuracy of the recommendation algorithm without increasing server latency. The method standardizes users' rating data as a preprocessing step; computes the Pearson similarity, an evaluation weight based on the distance between user vectors, and an asymmetric similarity weight; and uses the two weights to optimize the Pearson similarity, thereby improving the recommendation accuracy of traditional collaborative filtering. The method is applicable to user-item rating datasets.

Description

A collaborative filtering method for optimizing user similarity
Technical field
For recommender systems based on collaborative filtering, the invention proposes a collaborative filtering method that optimizes user similarity.
Background art
The rapid development and spread of the Internet have made it far easier for users to obtain, share, and disseminate information. At the same time, the dramatic growth in the volume of information lowers its utilization: users find it hard to obtain information that is genuinely useful to them in a timely way, which is known as the information overload problem. One effective way to cope with information overload is to design a recommender system, which recommends content and products to users according to their needs, interests, and other information. Compared with a search engine, a recommender system performs personalized computation by studying each user's interests and preferences to discover the user's points of interest, thereby guiding users to discover their own information needs and obtain information useful to them. A good recommender system not only provides users with personalized service but also establishes close connections among different users, so that users come to rely on its recommendations.
Recommender systems mainly comprise content-based filtering and collaborative filtering. A content-based recommender system derives the features of the items a user cares about from the user's past browsing or purchase records, and recommends new items that best match the user's interest profile. A collaborative filtering recommender system instead measures the similarity between users by computing the similarity of their historical records, finds other users whose preferences resemble the target user's, and recommends the items those users are interested in to the target user.
A content-based recommender system considers only the target user, whereas a collaborative filtering recommender system makes full use of collective intelligence, that is, it finds answers in the behavior and data of a large crowd, so its recommendations are more highly personalized. Collaborative filtering is therefore the most widely used and most effective recommendation algorithm in personalized recommender systems.
Collaborative filtering recommender systems are further divided into model-based and memory-based systems. The former mainly uses machine learning, data mining, statistics, and similar methods to train on users' historical data, builds a corresponding user model, and uses that model to provide predictions and recommendations; related techniques include matrix factorization and latent semantic analysis. The latter is divided into user-based and item-based collaborative filtering recommender systems.
The traditional user-based collaborative filtering recommender system measures similarity with the Pearson formula, but it does not preprocess the dataset, does not consider the distance between users' rating vectors, and does not consider the asymmetry of the similarity relationship between users, all of which can degrade recommendation quality. The present invention therefore optimizes these three aspects of the user-based collaborative filtering recommendation algorithm to improve recommendation quality.
Summary of the invention
The invention aims to reduce the mean absolute error (MAE) of the traditional user-based collaborative filtering recommendation algorithm and to effectively improve the recommendation quality of recommender systems by providing a collaborative filtering method that optimizes user similarity.
To achieve the above objective, the invention proposes the following technical solution:
A collaborative filtering method that optimizes user similarity standardizes each user's rating vector, optimizes the Pearson similarity by combining it with an evaluation weight of user vector distance and an asymmetric similarity weight, and finally predicts user ratings. The specific steps are as follows:
1) Prepare the experimental database: collect the ratings given by a number of users to different items and build the experimental database;
2) Standardization preprocessing: standardize each user's rating vector with the Z-score method, and generate the user-item rating matrix from the standardized rating vectors;
3) Compute the user similarity matrix: from the user-item rating matrix generated in step 2), compute the Pearson similarity, the evaluation weight of user vector distance, and the asymmetric similarity weight; combine the Pearson similarity with the two weights to obtain the optimized similarity formula; use the optimized formula to compute the similarity between each user and every other user, finally producing the similarity matrix;
4) Predict ratings: from the similarities between the target user and the other users, determine the target user's set of neighbor users, and predict the target user's ratings for unrated items with the rating prediction formula.
Compared with the prior art, the invention has the following advantage:
The method optimizes user similarity in the recommendation module of collaborative filtering, so that the recommendation quality of the recommender system improves effectively without increasing server latency.
Description of the drawings
Fig. 1 is a flowchart of the method of the invention.
Detailed embodiments
Specific embodiments of the invention are further described below with reference to the accompanying drawing.
This embodiment uses the MovieLens-100k dataset (downloadable from https://movielens.org/) for an example analysis. The dataset contains 100,000 ratings given by 943 users to 1682 movies; ratings are integers from 1 to 5, where 1 is the lowest rating and 5 the highest. Each user rated at least 20 movies. 80% of the data is used as the training set and 20% as the test set.
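The 80/20 split can be sketched as below. This is an illustrative sketch, not the patent's procedure: the text does not say whether the split is random or per user, so a seeded random shuffle of (user, item, rating) triples is assumed, and `train_test_split` is a hypothetical helper name. The MovieLens-100k distribution also ships predefined 80%/20% splits (`u1.base`/`u1.test` through `u5.base`/`u5.test`), which could be used instead.

```python
import random
from typing import List, Tuple

Rating = Tuple[int, int, float]  # (user_id, item_id, rating)


def train_test_split(ratings: List[Rating], train_frac: float = 0.8,
                     seed: int = 0) -> Tuple[List[Rating], List[Rating]]:
    """Randomly split rating records into a training set and a test set."""
    shuffled = list(ratings)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded, hence reproducible, shuffle
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]
```

For the 100,000 MovieLens-100k records this yields 80,000 training and 20,000 test ratings.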
As shown in Fig. 1, the collaborative filtering method that optimizes user similarity standardizes each user's rating vector, optimizes the Pearson similarity by combining it with an evaluation weight of user vector distance and an asymmetric similarity weight, and finally predicts user ratings. The specific steps are as follows:
1) Prepare the experimental database: collect the ratings given by a number of users to different items and build the experimental database.
2) Standardization preprocessing: standardize each user's rating vector with the Z-score method, and generate the user-item rating matrix from the standardized rating vectors. The specific steps are as follows:
Let the rating vector of the u-th user in the training set be $R_u = (r_{(u,1)}, r_{(u,2)}, \ldots, r_{(u,m)})$, where $r_{(u,m)}$ is user u's rating of item m. As shown in formula (1), $R_u$ is standardized with the Z-score method, where $z_{(u,m)}$ is user u's standardized rating of item m, $\bar{r}_u$ is the mean of the components of $R_u$, and $\sigma_u$ is their standard deviation:

$$z_{(u,m)} = \frac{r_{(u,m)} - \bar{r}_u}{\sigma_u} \qquad (1)$$
The standardized rating vector of user u is denoted $Z_u = (z_{(u,1)}, z_{(u,2)}, \ldots, z_{(u,m)})$; $Z_u$ has mean 0 and standard deviation 1. A user-item rating matrix of size 943 × 1682 is then generated, where 943 is the number of users and 1682 the number of items. $Z_u$ is recorded in row u of the user-item rating matrix, and the entries for items that user u has not rated are set to 0.
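A minimal NumPy sketch of step 2 follows. It assumes a dense ratings matrix `R` in which 0 marks an unrated item, and it assumes that the mean and (population) standard deviation in formula (1) are taken over each user's rated items only. The function name `standardize_ratings` is hypothetical. Users whose ratings are all identical have $\sigma_u = 0$; the sketch divides by 1 for them so their standardized ratings are all 0, a guard the patent does not discuss.

```python
import numpy as np


def standardize_ratings(R: np.ndarray):
    """Z-score each row (user) of R over that user's rated items (R > 0).

    Returns (Z, mean, std): Z holds standardized ratings in rated cells and 0
    in unrated cells; mean and std are per-user statistics of rated items.
    """
    rated = R > 0
    n_rated = rated.sum(axis=1)
    # Per-user mean over rated items (0 for users with no ratings).
    mean = np.where(n_rated > 0, R.sum(axis=1) / np.maximum(n_rated, 1), 0.0)
    # Deviations from the mean, kept only for rated cells.
    diff = np.where(rated, R - mean[:, None], 0.0)
    # Population standard deviation over rated items.
    std = np.sqrt((diff ** 2).sum(axis=1) / np.maximum(n_rated, 1))
    # Guard constant raters (std == 0): divide by 1 so their z-scores are all 0.
    safe_std = np.where(std > 0, std, 1.0)
    Z = np.where(rated, diff / safe_std[:, None], 0.0)
    return Z, mean, std
```

Because a rated item equal to the user's mean also standardizes to 0, a 0 in `Z` alone cannot distinguish "rated at the mean" from "not rated"; keeping the mask `R > 0` alongside `Z` avoids that ambiguity in later steps.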
3) Compute the user similarity matrix: from the user-item rating matrix generated in step 2), compute the Pearson similarity, the evaluation weight of user vector distance, and the asymmetric similarity weight; combine the Pearson similarity with the two weights to obtain the optimized similarity formula; use the optimized formula to compute the similarity between each user and every other user, finally producing the similarity matrix. Taking any two users u and v in the MovieLens-100k training set as an example, the similarity of user u to user v is computed as follows:
3.1) Compute the Pearson similarity: as shown in formula (2), the Pearson similarity $\mathrm{sim}_{(u,v)}$ between users u and v is measured with the Pearson similarity formula, where S is the set of items rated by both user u and user v.
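Formula (2) is not reproduced in this text. The sketch below assumes it is the standard Pearson correlation computed over the co-rated set S, with each user's mean taken over S; `pearson_sim` is a hypothetical name, and returning 0 when fewer than two co-rated items exist is an added convention. This variant is invariant to a positive affine rescaling of either user's ratings, so it returns the same value on the raw ratings as on the Z-score-standardized ratings, provided the same rated-item mask is used.

```python
import numpy as np


def pearson_sim(X: np.ndarray, rated: np.ndarray, u: int, v: int) -> float:
    """Pearson correlation of users u and v over their co-rated items S.

    X may hold raw or Z-score-standardized ratings; `rated` is the boolean
    mask marking which cells are actual ratings.
    """
    S = rated[u] & rated[v]          # items rated by both users
    if S.sum() < 2:
        return 0.0                   # too few co-rated items to correlate
    x = X[u, S] - X[u, S].mean()     # centre on each user's mean over S
    y = X[v, S] - X[v, S].mean()
    denom = np.sqrt((x ** 2).sum() * (y ** 2).sum())
    return float((x * y).sum() / denom) if denom > 0 else 0.0
```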
3.2) Compute the evaluation weight of user vector distance: as shown in formula (3), compute the evaluation weight $D_{(u,v)}$ of the distance between $Z_u$ and $Z_v$, where S is the set of items rated by both user u and user v, $N(S)$ is the number of elements in S, and $\alpha$ is the threshold on the rating gap for a single item; the larger $\alpha$ is, the closer $D_{(u,v)}$ is to 1, and the smaller $\alpha$ is, the closer $D_{(u,v)}$ is to 0.
3.3) Compute the asymmetric similarity weight: as shown in formula (4), compute the asymmetric similarity weight $w_{(u,v)}$ of user u with respect to user v, where S is the set of items rated by both user u and user v, $I_u$ is the set of items rated by user u, $N(S)$ is the number of elements in S, and $N(I_u)$ is the number of elements in $I_u$.
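Formula (4) is not reproduced here either. Given the quantities it is said to use, a common asymmetric weight is $w_{(u,v)} = N(S)/N(I_u)$, the share of user u's rated items that user v has also rated. The sketch below assumes that form, which is a guess rather than the patent's stated formula; `asymmetric_weight` is a hypothetical name. Under this form, a user who rated few items can place high weight on a prolific user while the reverse weight stays low.

```python
import numpy as np


def asymmetric_weight(rated: np.ndarray, u: int, v: int) -> float:
    """Assumed form of w_(u,v): N(S) / N(I_u), with S = items rated by both."""
    n_Iu = rated[u].sum()                # N(I_u): number of items u rated
    if n_Iu == 0:
        return 0.0                       # u has no ratings, so no evidence
    n_S = (rated[u] & rated[v]).sum()    # N(S): number of co-rated items
    return float(n_S / n_Iu)
```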
3.4) User similarity formula: as shown in formula (5), combining formulas (2), (3), and (4) gives the optimized similarity $\mathrm{sim}'_{(u,v)}$ of user u to user v:

$$\mathrm{sim}'_{(u,v)} = D_{(u,v)} \cdot w_{(u,v)} \cdot \mathrm{sim}_{(u,v)} \qquad (5)$$
3.5) Compute the user similarity matrix: compute the similarity between every pair of different users with formula (5), finally obtaining a 943 × 943 user similarity matrix.
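Formula (5) multiplies three user-by-user quantities, so the whole 943 × 943 matrix of step 3.5 can be computed element-wise. The sketch below assumes the Pearson matrix `sim` and the distance-weight matrix `D` (formula (3), whose exact form is not given in this text) are already available as arrays, and it builds the asymmetric weight matrix with one matrix product under the assumed form $N(S)/N(I_u)$ for formula (4). Function names are hypothetical, and zeroing the diagonal so that a user is never its own neighbor is an added convention.

```python
import numpy as np


def asymmetric_weight_matrix(rated: np.ndarray) -> np.ndarray:
    """W[u, v] = N(S_uv) / N(I_u), an assumed form of formula (4)."""
    r = rated.astype(float)
    co = r @ r.T                          # co[u, v] = number of items both rated
    n_u = r.sum(axis=1, keepdims=True)    # N(I_u) as a column vector
    return np.divide(co, n_u, out=np.zeros_like(co), where=n_u > 0)


def fused_similarity(sim: np.ndarray, D: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Formula (5) for all pairs at once: sim' = D * W * sim (element-wise)."""
    out = D * W * sim
    np.fill_diagonal(out, 0.0)            # exclude self-similarity
    return out
```

Because `W` is not symmetric, neither is the fused matrix: row u holds the similarities of user u to every other user, which is the direction used when selecting u's neighbors.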
4) Predict ratings: from the similarities between the target user and the other users, determine the target user's set of neighbor users, and predict the target user's ratings for unrated items with the rating prediction formula. In this example, taking an item a that a user u in the training set has not rated, user u's predicted rating for item a is computed as follows:
4.1) Compute the neighbor user set: in the training set, find the set of users who have rated item a, denoted $U_a = \{u_{(1,a)}, u_{(2,a)}, \ldots, u_{(q,a)}\}$, where $u_{(q,a)}$ is the q-th user who rated item a. Sort these q users by their similarity to user u in descending order, giving $U'_a = \{u'_{(1,a)}, u'_{(2,a)}, \ldots, u'_{(q,a)}\}$. Then take the first k users of the sorted set $U'_a$ as the neighbor user set of user u, denoted $U = \{u'_{(1,a)}, u'_{(2,a)}, \ldots, u'_{(k,a)}\}$.
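Step 4.1 amounts to a top-k selection over the users who rated item a. A minimal sketch, assuming a precomputed fused similarity matrix `sim_prime` (row u holds the similarities of u to the other users) and a boolean rated-item mask; `neighbours` is a hypothetical name. Excluding u itself is only a safeguard, since u has not rated item a. Python's sort is stable, so tied similarities keep ascending user order.

```python
import numpy as np


def neighbours(sim_prime: np.ndarray, rated: np.ndarray, u: int, a: int,
               k: int = 10) -> list:
    """Top-k users by sim_prime[u, v] among the users who rated item a."""
    U_a = [v for v in np.flatnonzero(rated[:, a]) if v != u]  # users who rated a
    U_a.sort(key=lambda v: sim_prime[u, v], reverse=True)     # descending similarity
    return [int(v) for v in U_a[:k]]                          # keep the first k
```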
4.2) Predict user u's rating of item a: compute user u's predicted rating $p_{(u,a)}$ for item a with formula (6), where U is the neighbor user set of user u, $\bar{r}_u$ is the mean of the components of $R_u$, $\sigma_u$ is their standard deviation, $z_{(v,a)}$ is user v's standardized rating of item a, and $\mathrm{sim}'_{(u,v)}$ is the similarity of user u to user v.
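Formula (6) is not reproduced in this text. Its variable list (the mean and standard deviation of $R_u$, the neighbors' standardized ratings $z_{(v,a)}$, and $\mathrm{sim}'_{(u,v)}$) suggests the usual Z-score prediction, which the sketch below assumes:

$$p_{(u,a)} = \bar{r}_u + \sigma_u \frac{\sum_{v \in U} \mathrm{sim}'_{(u,v)}\, z_{(v,a)}}{\sum_{v \in U} \lvert \mathrm{sim}'_{(u,v)} \rvert}$$

The absolute value in the denominator, the fallback to $\bar{r}_u$ when all neighbor weights are 0, and the name `predict` are assumptions, not details confirmed by the patent.

```python
import numpy as np


def predict(u: int, a: int, nbrs: list, Z: np.ndarray, mean: np.ndarray,
            std: np.ndarray, sim_prime: np.ndarray) -> float:
    """Assumed formula (6): de-standardize a similarity-weighted mean of z-scores."""
    num = sum(sim_prime[u, v] * Z[v, a] for v in nbrs)
    den = sum(abs(sim_prime[u, v]) for v in nbrs)
    if den == 0:
        return float(mean[u])             # no usable neighbors: fall back to u's mean
    return float(mean[u] + std[u] * num / den)
```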
As shown in formula (7), recommendation accuracy is measured with the mean absolute error (MAE); a smaller MAE means a smaller error and higher accuracy. Here $p_i$ is the predicted rating for item i, $r_i$ is the user's actual rating of item i in the test set, and n is the number of ratings in the test set:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert p_i - r_i \rvert \qquad (7)$$
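Formula (7) translates directly into a few lines; a sketch with a hypothetical `mae` helper that takes the predicted and actual test-set ratings in matching order:

```python
import numpy as np


def mae(predicted, actual) -> float:
    """Mean absolute error between predicted and actual ratings, formula (7)."""
    p = np.asarray(predicted, dtype=float)
    r = np.asarray(actual, dtype=float)
    return float(np.mean(np.abs(p - r)))
```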
In this embodiment the neighborhood size is 10. The MAE of the invention is 0.74086, 3.06% lower than with the original Pearson similarity. Computing the user similarity matrix with this method takes 61.3 seconds, and rating prediction takes 7.04 seconds. In practical applications the user similarity matrix can be computed offline, so when users receive online item-rating predictions from the proposed algorithm, the real-time computation time is almost the same as with the original method, and users do not wait any longer online.

Claims (4)

1. A collaborative filtering method for optimizing user similarity, which standardizes each user's rating vector, optimizes the Pearson similarity by combining it with an evaluation weight of user vector distance and an asymmetric similarity weight, and finally predicts user ratings, characterized in that the specific steps are as follows:
1) Prepare the experimental database: collect the ratings given by a number of users to different items and build the experimental database;
2) Standardization preprocessing: standardize each user's rating vector with the Z-score method, and generate the user-item rating matrix from the standardized rating vectors;
3) Compute the user similarity matrix: from the user-item rating matrix generated in step 2), compute the Pearson similarity, the evaluation weight of user vector distance, and the asymmetric similarity weight; combine the Pearson similarity with the two weights to obtain the optimized similarity formula; use the optimized formula to compute the similarity between each user and every other user, finally producing the similarity matrix;
4) Predict ratings: from the similarities between the target user and the other users, determine the target user's set of neighbor users, and predict the target user's ratings for unrated items with the rating prediction formula.
2. The collaborative filtering method for optimizing user similarity according to claim 1, characterized in that step 2) specifically comprises the following: let the rating vector of the u-th user in the training set be $R_u = (r_{(u,1)}, r_{(u,2)}, \ldots, r_{(u,m)})$, where $r_{(u,m)}$ is user u's rating of item m. As shown in formula (1), $R_u$ is standardized with the Z-score method, where $z_{(u,m)}$ is user u's standardized rating of item m, $\bar{r}_u$ is the mean of the components of $R_u$, and $\sigma_u$ is their standard deviation:

$$z_{(u,m)} = \frac{r_{(u,m)} - \bar{r}_u}{\sigma_u} \qquad (1)$$

The standardized rating vector of user u is denoted $Z_u = (z_{(u,1)}, z_{(u,2)}, \ldots, z_{(u,m)})$; $Z_u$ has mean 0 and standard deviation 1. The user-item rating matrix is then generated: $Z_u$ is recorded in row u of the user-item rating matrix, and the entries for items that user u has not rated are set to 0.
3. The collaborative filtering method for optimizing user similarity according to claim 1, characterized in that, in step 3), taking any two users u and v in the training set as an example, the similarity of user u to user v is computed as follows:
3.1) Compute the Pearson similarity: as shown in formula (2), the Pearson similarity $\mathrm{sim}_{(u,v)}$ between users u and v is measured with the Pearson similarity formula, where S is the set of items rated by both user u and user v.
3.2) Compute the evaluation weight of user vector distance: as shown in formula (3), compute the evaluation weight $D_{(u,v)}$ of the distance between $Z_u$ and $Z_v$, where S is the set of items rated by both user u and user v, $N(S)$ is the number of elements in S, and $\alpha$ is the threshold on the rating gap for a single item; the larger $\alpha$ is, the closer $D_{(u,v)}$ is to 1, and the smaller $\alpha$ is, the closer $D_{(u,v)}$ is to 0.
3.3) Compute the asymmetric similarity weight: as shown in formula (4), compute the asymmetric similarity weight $w_{(u,v)}$ of user u with respect to user v, where S is the set of items rated by both user u and user v, $I_u$ is the set of items rated by user u, $N(S)$ is the number of elements in S, and $N(I_u)$ is the number of elements in $I_u$.
3.4) User similarity formula: as shown in formula (5), combining formulas (2), (3), and (4) gives the optimized similarity $\mathrm{sim}'_{(u,v)}$ of user u to user v:

$$\mathrm{sim}'_{(u,v)} = D_{(u,v)} \cdot w_{(u,v)} \cdot \mathrm{sim}_{(u,v)} \qquad (5)$$

3.5) Compute the user similarity matrix: compute the similarity between every pair of different users with formula (5), finally obtaining the user similarity matrix.
4. The collaborative filtering method for optimizing user similarity according to claim 1, characterized in that, in step 4), taking an item a that a user u in the training set has not rated, user u's predicted rating for item a is computed as follows:
4.1) Compute the neighbor user set: in the training set, find the set of users who have rated item a, denoted $U_a = \{u_{(1,a)}, u_{(2,a)}, \ldots, u_{(q,a)}\}$, where $u_{(q,a)}$ is the q-th user who rated item a. Sort these q users by their similarity to user u in descending order, giving $U'_a = \{u'_{(1,a)}, u'_{(2,a)}, \ldots, u'_{(q,a)}\}$. Then take the first k users of the sorted set $U'_a$ as the neighbor user set of user u, denoted $U = \{u'_{(1,a)}, u'_{(2,a)}, \ldots, u'_{(k,a)}\}$;
4.2) Predict user u's rating of item a: compute user u's predicted rating $p_{(u,a)}$ for item a with formula (6), where U is the neighbor user set of user u, $\bar{r}_u$ is the mean of the components of $R_u$, $\sigma_u$ is their standard deviation, $z_{(v,a)}$ is user v's standardized rating of item a, and $\mathrm{sim}'_{(u,v)}$ is the similarity of user u to user v.
As shown in formula (7), recommendation accuracy is measured with the mean absolute error (MAE); a smaller MAE means a smaller error and higher accuracy. Here $p_i$ is the predicted rating for item i, $r_i$ is the user's actual rating of item i in the test set, and n is the number of ratings in the test set:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert p_i - r_i \rvert \qquad (7)$$
CN201910312071.8A 2019-04-18 2019-04-18 A collaborative filtering method for optimizing user similarity Pending CN110134874A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910312071.8A CN110134874A (en) 2019-04-18 2019-04-18 A collaborative filtering method for optimizing user similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910312071.8A CN110134874A (en) 2019-04-18 2019-04-18 A collaborative filtering method for optimizing user similarity

Publications (1)

Publication Number Publication Date
CN110134874A (en) 2019-08-16

Family

ID=67570203

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910312071.8A Pending CN110134874A (en) 2019-04-18 2019-04-18 A collaborative filtering method for optimizing user similarity

Country Status (1)

Country Link
CN (1) CN110134874A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04124782A (en) * 1990-09-14 1992-04-24 N T T Data Tsushin Kk Method and device for feature extraction
CN103559622A (en) * 2013-07-31 2014-02-05 焦点科技股份有限公司 Characteristic-based collaborative filtering recommendation method
CN106021558A (en) * 2016-05-27 2016-10-12 天津大学 Calculation method for user availability in collaborative filtering recommendation system
US20160314501A1 (en) * 2015-03-24 2016-10-27 Mxm Nation Inc. Scalable networked computing system for scoring user influence in an internet-based social network
CN107943948A * 2017-11-24 2018-04-20 中国科学院电子学研究所苏州研究院 An improved hybrid collaborative filtering recommendation method

Non-Patent Citations (3)

Title
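TASNIM ZAYET et al.: "A new weighting algorithm for collaborative filtering", IEEE Xplore *
何汶坤 et al.: "Collaborative filtering recommendation algorithm based on the number of co-ratings and the degree of difference", Journal of Yili Normal University (Natural Science Edition) *
李容 et al.: "Research on collaborative filtering algorithms based on improved similarity", Computer Science *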

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN110727876A (en) * 2019-09-02 2020-01-24 南京理工大学 Individual recommendation algorithm for intelligent retail system
CN110727876B (en) * 2019-09-02 2022-09-30 南京理工大学 Individual recommendation algorithm for intelligent retail system

Similar Documents

Publication Publication Date Title
CN109241405B (en) Learning resource collaborative filtering recommendation method and system based on knowledge association
CN108256093B (en) Collaborative filtering recommendation algorithm based on multiple interests and interest changes of users
US9706008B2 (en) Method and system for efficient matching of user profiles with audience segments
TWI636416B (en) Method and system for multi-phase ranking for content personalization
Li et al. Improving one-class collaborative filtering by incorporating rich user information
CN101641697B (en) Related search queries for a webpage and their applications
US10491694B2 (en) Method and system for measuring user engagement using click/skip in content stream using a probability model
KR101003045B1 (en) Apparatus and method for presenting personalized advertisements information based on artificial intelligence, and recording medium thereof
CN108520450B (en) Recommendation method and system for local low-rank matrix approximation based on implicit feedback information
CN107833117B (en) Bayesian personalized sorting recommendation method considering tag information
CN109918563B (en) Book recommendation method based on public data
CN107424043A A product recommendation method and device, and electronic equipment
CN106528643B (en) Multi-dimensional comprehensive recommendation method based on social network
CN106471491A A time-varying collaborative filtering recommendation method
US20120314941A1 (en) Accurate text classification through selective use of image data
CN103745100A (en) Item-based explicit and implicit feedback mixing collaborative filtering recommendation algorithm
CN103365839A (en) Recommendation search method and device for search engines
CN102411754A (en) Personalized recommendation method based on commodity property entropy
CN103886487A (en) Individualized recommendation method and system based on distributed B2B platform
CN107016122B (en) Knowledge recommendation method based on time migration
Yazdanfar et al. Link recommender: Collaborative-filtering for recommending urls to twitter users
CN109165367B (en) News recommendation method based on RSS subscription
CN102750334A (en) Agricultural information accurate propelling method based on data mining (DM)
CN103324645A (en) Method and device for recommending webpage
CN110348930A (en) Business object data processing method, the recommended method of business object information and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190816