CN114647773A - Improved collaborative filtering method based on multiple linear regression and third-party credit - Google Patents

Publication number
CN114647773A
Authority
CN
China
Prior art keywords
user
nat
credit
collaborative filtering
users
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011504132.XA
Other languages
Chinese (zh)
Other versions
CN114647773B (en)
Inventor
朱赟
于士浩
郑闻悦
高连峰
陈剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gannan Normal University
Original Assignee
Gannan Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-12-17
Filing date
2020-12-17
Publication date
2022-06-21
Application filed by Gannan Normal University
Priority to CN202011504132.XA
Publication of CN114647773A
Application granted
Publication of CN114647773B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0609 Buyer or seller confidence or verification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Physics & Mathematics (AREA)
  • Finance (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Development Economics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Because its steps are simple and easy to understand and its computing performance is good, the collaborative filtering algorithm has become a popular research topic in recommendation systems. The traditional collaborative filtering algorithm nevertheless has many defects, such as matrix sparsity, the cold start problem, the user trust problem, and flaws in the similarity calculation formula. By analyzing these problems, an improved collaborative filtering algorithm based on multiple linear regression and third-party credit is provided, which implements the construction of a third-party credit model and the calculation of credit similarity. The implementation results show that the improved algorithm is superior to the traditional collaborative filtering algorithm: it effectively alleviates the sparsity problem, the user trust problem, and the flaws in the similarity calculation formula.

Description

Improved collaborative filtering method based on multiple linear regression and third-party credit
Technical Field
The invention belongs to the field of intelligent recommendation algorithms, and particularly relates to an improved collaborative filtering algorithm based on multiple linear regression and third-party credit.
Background
In a rapidly developing information age, it is increasingly important to mine the commonalities in massive data and discover the latent patterns within it. The improved collaborative filtering algorithm identifies users' latent needs from data and, given the rapid growth of the Internet, can be applied to many aspects of life. When you want to find a favorite song early in the morning to start the day in a good mood, the collaborative filtering algorithm can help; when you want to find favorite food to replenish your energy, it can help; when you have overeaten, want to lose weight, and are torn over which sports equipment to choose, it can help. In short, collaborative filtering has penetrated every aspect of life, and its applications are too numerous to list.
However, the traditional collaborative filtering algorithm has many disadvantages, such as matrix sparsity, the cold start problem, the user trust problem, and flaws in the similarity calculation formula. How to alleviate or resolve these defects is a major problem, and many scholars have proposed solutions. Yaojingbo, Yuyicheng and others proposed PCA dimension reduction to address the sparse matrix problem; the similarity measurement method proposed by Jinming, Mengjun and others effectively addresses problems such as system cold start; the multi-similarity fusion proposed by Wangbosheng, Haowangbo and others effectively alleviates data sparsity and cold start. These approaches, however, tend to overlook the fusion of multiple kinds of information based on third-party trust. Without it, the system generates recommendations from false information, the recommendation error grows, and other users may also be affected to some degree. This poses a challenge to the recommendation algorithm, and taking trust in the form of third-party personal credit into account may effectively address these issues. The improved collaborative filtering algorithm based on multiple linear regression and third-party credit works as follows: after fields such as item attribute characteristics and the user's credit value are quantized, a multiple linear regression equation relating the score value to this information is established and the user's preference (favorite) vector is solved, which alleviates the matrix sparsity problem; the Euclidean distance is then used to find the neighbor users of the target user; finally, a recommendation formula is used to generate recommendations for the user.
Disclosure of Invention
In the big data era, the number of users handled by a system can reach the hundreds of millions, and such a large number of users and items inevitably makes the user-item matrix highly sparse, which poses a great challenge to the recommendation system. To address the sparse user-item matrix, the invention provides an improved recommendation algorithm based on multiple linear regression and third-party credit agencies.
The improved recommendation algorithm integrates information such as the location where the user previously acted on an item, the credit values assigned to the user by third-party organizations, and the attribute characteristics of the item. It mainly addresses the sparse user-item matrix and malicious user scores. The method comprises the following steps:
(1) Construct a weighted credit model based on third-party individuals or credit agencies.
(2) Construct an item feature vector associated with the user from the features of the item in the system and the credit obtained in step (1):
$nat_i = (nat_{i1}, nat_{i2}, \ldots, nat_{ij}, \ldots, nat_{in})$
(3) From the $nat_i$ vectors obtained in step (2) and the users' scores for items in the system, construct a multiple linear regression equation for each user:
$y_i = b_0 + b_1 x_1 + b_2 x_2 + \ldots + b_n x_n + \mu_i$
(4) Solve, from the multiple linear regression equation of step (3), each user's score vector that combines the multi-source information:
$user_i = (y_{i1}, y_{i2}, \ldots, y_{ij}, \ldots, y_{in})$
(5) Use the vector obtained in step (4) to represent each user's preferences, compute the Euclidean distance between the target user and every other user from these vectors, compute the similarity between the target user and the other users with the following formula, and, following the KNN idea, select the N users most similar to the target user as its neighbors.
Figure BDA0002844388570000021
(6) Generate recommendations for the user with a recommendation formula, using the user similarity obtained in step (5).
Figure BDA0002844388570000022
Step one, the construction of the weighted credit model based on third-party organizations, is as follows:
The user's credit is an important factor. On the Internet, some users are unwilling to give genuine scores or maliciously inflate scores, so much of the data is to some extent false. The concept of a third-party trust authority is therefore introduced. Suppose there are m third-party trust authorities CA and n users; the authorities, the users, and the credit value matrix of each authority for each user are expressed as follows:
$CA = \{CA_1, CA_2, \ldots, CA_i, \ldots, CA_m\}$
$user = (user_1, user_2, \ldots, user_i, \ldots, user_n)$
Figure BDA0002844388570000031
First, the third-party trust authorities CA are sorted in descending order of credibility according to known official data. The sorted third-party trust authorities ACA and the credit value matrix of each sorted authority for each user are as follows:
$ACA = \{ACA_1, ACA_2, \ldots, ACA_i, \ldots, ACA_m\}$
Figure BDA0002844388570000032
Then, the sorted third-party trust authorities ACA are graded; the grading is shown in the following table:
Figure BDA0002844388570000033
Finally, to obtain the user's comprehensive credit CCRE, the computation is split into cases by grade. For trust authorities of the same grade, the processed comprehensive credit is obtained with a trimmed (truncated) mean, as follows:
Figure BDA0002844388570000034
where c denotes the number of third-party trust authorities in a given evaluation grade, m denotes a row, q denotes the n users, max denotes the most approved third-party trust authority in the current evaluation grade, and min denotes the least approved third-party trust authority in the current evaluation grade.
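The same-grade aggregation described above can be sketched in Python. This is only an illustrative sketch, not the patented formula, which appears as an image in the text: it assumes the trimmed mean drops the single highest and the single lowest credit score before averaging. The function name `trimmed_mean_credit`, the fallback for fewer than three authorities, and the sample scores are all hypothetical.

```python
def trimmed_mean_credit(scores):
    """Average one user's credit scores from authorities of the same grade.

    Assumption: one highest and one lowest score are discarded before
    averaging, i.e. (sum - max - min) / (c - 2) for c authorities.
    """
    c = len(scores)
    if c == 0:
        raise ValueError("at least one credit score is required")
    if c < 3:
        # Too few authorities to trim; fall back to the plain mean (assumption).
        return sum(scores) / c
    return (sum(scores) - max(scores) - min(scores)) / (c - 2)


# Hypothetical scores given to one user by four same-grade authorities.
grade_a_scores = [90, 70, 80, 100]
print(trimmed_mean_credit(grade_a_scores))
```

Discarding the extremes limits the influence of a single unusually generous or harsh authority within a grade.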
After this transformation, the credit matrix can be represented as:
Figure BDA0002844388570000041
For trust authorities of different grades, the user's comprehensive credit is obtained from the trust weights assigned to the grades, expressed as follows:
Figure BDA0002844388570000042
where $Credit^{T}_{ac}(m, n)$ denotes the transpose of the credit value matrix. This formula yields the n × 1 comprehensive user credit based on the third-party trust authorities.
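The cross-grade combination can be illustrated with a weighted sum. The sketch below assumes that each grade contributes one credit value per user and that the composite credit is the weight-by-grade sum of those values; the weights 0.6, 0.3 and 0.1 mirror the case described in the Detailed Description, while the function name `composite_credit` and the credit values are hypothetical.

```python
def composite_credit(level_credit, weights):
    """Combine per-grade credit values into one composite credit per user.

    level_credit[g][u] is the credit of user u at trust grade g
    (for example, the trimmed mean over that grade's authorities).
    weights[g] is the trust weight of grade g.
    """
    if len(level_credit) != len(weights):
        raise ValueError("one weight is needed per trust grade")
    n_users = len(level_credit[0])
    return [
        sum(w * row[u] for w, row in zip(weights, level_credit))
        for u in range(n_users)
    ]


weights = [0.6, 0.3, 0.1]          # grades A, B, C
level_credit = [
    [80.0, 50.0],                  # grade A credit for user 0 and user 1
    [70.0, 90.0],                  # grade B
    [60.0, 100.0],                 # grade C
]
ccre = composite_credit(level_credit, weights)
print(ccre)
```

The result is one value per user, matching the n × 1 shape of the comprehensive credit described in the text.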
The nat vector in step two is composed of the attributes of the item and the comprehensive credit from the third-party individuals or organizations. Every item a user has scored has its own nat vector, so a user who has scored m items has m corresponding nat vectors. Suppose a nat vector has n fields in total. If an item lacks a given field, the value of that field is normally 0; if an item has its own attribute field, the value of that field is obtained from the coefficients of the multiple linear regression equation. In addition, every vector contains the user's comprehensive credit value, region, and season information, where the region value is represented by the zip code and the season value is shown in the following table:
Figure BDA0002844388570000043
These vectors and the known score values are then used to establish a multiple linear regression equation relating the score value to each field of the nat vector.
Step three, constructing a multiple linear regression equation for each user, is as follows:
For each user, a multiple linear regression equation relating the score value to each field of the nat vector is established from the user's item feature (nat) vectors and the corresponding score values. The formula is:
$y_i = b_0 + b_1 nat_{i1} + b_2 nat_{i2} + \ldots + b_n nat_{in} + \mu_i$
where $b_n$ denotes the coefficient of the n-th factor influencing the score value y, $b_0$ denotes the constant term, $\mu_i$ denotes the random error, and $y_i$ denotes the value obtained from the regression equation for the i-th user given a nat vector.
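One way to estimate the coefficients $b_0, \ldots, b_n$ for a single user is ordinary least squares. The text does not say which estimator is used, so the following is a sketch under that assumption, written with the standard library only; the function name `fit_user_regression` and the sample data are hypothetical.

```python
def fit_user_regression(nat_vectors, scores):
    """Least-squares fit of score = b0 + b1*nat_1 + ... + bn*nat_n for one user.

    nat_vectors: one list of n field values per item the user has scored.
    scores: the user's score for each of those items.
    Returns the coefficients [b0, b1, ..., bn].
    """
    # Design matrix with a leading 1 for the constant term b0.
    X = [[1.0] + [float(v) for v in row] for row in nat_vectors]
    p = len(X[0])
    # Normal equations (X^T X) b = X^T y, stored as an augmented matrix.
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(p)]
        + [sum(r[i] * y for r, y in zip(X, scores))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few or collinear scored items")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical user: scores generated as 1 + 2*nat_1 + 3*nat_2.
nat_vectors = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
scores = [1, 3, 4, 6, 8]
coeffs = fit_user_regression(nat_vectors, scores)
print(coeffs)
```

A user needs at least as many scored items as there are coefficients, and the fields must not be collinear; otherwise the system is singular and the sketch raises an error.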
Once the coefficients of the multiple linear regression equation have been obtained from this formula, the user's preference vector can be computed and used in place of the user-item matrix, which addresses the matrix sparsity problem. Let the user's preference vector be Preference, expressed as follows:
Preferencei=(bi0+bi1i1,bi0+bi2i2,...,bi0+binin)
where $Preference_i$ denotes the preference vector of the i-th user; the other variables are as defined above.
Step four, finding the neighbor users of the target user with the Euclidean distance, is as follows:
The Euclidean distance is a commonly used distance definition that is simple to compute and gives fairly accurate results. The idea is as follows: in the m-dimensional space, the preference vectors of the current user and the target user are subtracted; the smaller the resulting distance(m, n), the more similar the item preferences of the two users, and the larger it is, the less similar they are. The calculation formulas are:
Figure BDA0002844388570000051
Figure BDA0002844388570000052
where $x_{mn}$ denotes the n-th attribute of user m, k denotes the total number of users, distance(m, n) denotes the distance between the m-th user and the n-th user, and Sim(m, n) denotes the similarity between the m-th user and the n-th user.
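The distance step can be sketched as follows. The Euclidean distance is standard; the mapping from distance to similarity is an assumption, because the Sim(m, n) formula appears only as an image in the text and, according to the definitions above, also involves the total number of users k. The sketch uses the common choice 1 / (1 + distance), and the function names are hypothetical.

```python
import math


def euclidean_distance(p, q):
    """Euclidean distance between two preference vectors of equal length."""
    if len(p) != len(q):
        raise ValueError("preference vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def similarity(p, q):
    """Assumed similarity: 1 / (1 + distance), so identical vectors give 1.0."""
    return 1.0 / (1.0 + euclidean_distance(p, q))


# Hypothetical preference vectors of two users.
user_m = [0.0, 0.0]
user_n = [3.0, 4.0]
print(euclidean_distance(user_m, user_n), similarity(user_m, user_n))
```

Any monotonically decreasing mapping would preserve the neighbor ranking; only the magnitude of the weights in the later prediction step depends on the exact form.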
Using the Sim(m, n) formula, the preference similarity between every pair of users is computed to build the user preference similarity matrix, as follows:
Figure BDA0002844388570000061
After the similarity matrix has been computed, unnecessary calculation can be avoided by following the KNN idea: the m largest values in each row of the Sim similarity matrix are taken and substituted into the recommendation formula below to obtain the final result.
Step five, generating recommendations for the user, is as follows:
After the similarity matrix has been obtained in the preceding step, recommendation results can be produced with the following recommendation formula:
Figure BDA0002844388570000062
where $R_{ij}$ denotes user i's score for item j in the score-item matrix, Sim(m, i) is the closeness of preference between the two users m and i,
Figure BDA0002844388570000063
denotes the average score that the current user m has given to the items it has historically scored,
Figure BDA0002844388570000064
denotes the average score that user i has given to the items it has historically scored, k denotes the number of users selected as most similar to user m, and pre(m, j) denotes the predicted score of user m for item j.
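The symbols defined above match the widely used mean-centered neighborhood prediction, so the sketch below assumes that form: the target user's average score plus the similarity-weighted average of each neighbor's deviation from its own mean. The recommendation formula itself appears only as an image in the text, so this may differ from it in detail; `predict_score` and the sample data are hypothetical.

```python
def predict_score(m, j, ratings, sim, neighbors):
    """Predicted score pre(m, j) of user m for item j.

    ratings: dict mapping user -> dict mapping item -> score.
    sim: sim[m][i] is the preference similarity between users m and i.
    neighbors: the k users selected as most similar to m.
    """
    def mean(u):
        return sum(ratings[u].values()) / len(ratings[u])

    num = 0.0
    den = 0.0
    for i in neighbors:
        if j in ratings[i]:  # only neighbors who scored item j contribute
            num += sim[m][i] * (ratings[i][j] - mean(i))
            den += abs(sim[m][i])
    if den == 0.0:
        # No usable neighbor: fall back to the user's own average (assumption).
        return mean(m)
    return mean(m) + num / den


# Hypothetical scores and similarities.
ratings = {
    0: {"a": 4, "b": 2},
    1: {"a": 5, "b": 3, "c": 4},
    2: {"a": 1, "c": 5},
}
sim = {0: {1: 0.5, 2: 0.25}}
print(predict_score(0, "c", ratings, sim, [1, 2]))
```

Centering on each neighbor's mean compensates for users who habitually score high or low.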
Drawings
FIG. 1 is a flow chart of an improved collaborative filtering algorithm based on multiple linear regression and third party credit.
FIG. 2 shows the users' comprehensive credit values based on the weighted credit model of third-party individuals.
FIG. 3 shows, for each of the 20 users, the similarity values of the 3 users with the highest similarity.
FIG. 4 shows the predicted scores of each user for unscored items, produced by the improved collaborative filtering algorithm based on multiple linear regression and third-party credit.
Detailed Description
To show the working steps of the invention clearly and intuitively, the improved collaborative filtering algorithm based on multiple linear regression and third-party credit is described in detail below using the MATLAB tool together with a practical case.
The case is to recommend clothing according to the target user's preferences using the improved collaborative filtering algorithm based on multiple linear regression and third-party credit. Suppose 3 third-party individuals are available, with weights assigned as follows based on official data:
Third-party individual    Evaluation grade    Trust weight
CA1                       A                   0.6
CA2                       B                   0.3
CA3                       C                   0.1
The third-party individuals score each user, and each user's comprehensive credit value is calculated as follows.
Figure BDA0002844388570000071
The ordinate of FIG. 2 is the users' comprehensive credit value. After the comprehensive credit values are obtained, the item feature vectors associated with each user are constructed from the features of the items in the system and the credit obtained above; a multiple linear regression equation is then constructed for each user from these feature vectors, and the user's preference vector is obtained from the solved coefficients. Next, the similarity between users is calculated using the Euclidean distance described above and the Sim similarity matrix is constructed. Following the KNN idea, the m largest values in each row of the Sim similarity matrix are taken and substituted into the recommendation formula; FIG. 3 shows the 3 largest similarity values in each row of the Sim similarity matrix. Once the similarity values are obtained, recommendations are generated for the user; the recommendation results are shown in FIG. 4.

Claims (1)

1. An improved collaborative filtering method based on multiple linear regression and third-party credit, characterized in that: first, a weighted credit model based on third-party individuals is constructed; second, a credit value matrix for the user behavior generated on an item is collected; the preference vectors are then quantized to obtain a linear regression equation, and the N users most similar to the target user are selected as the neighbors of the target user; finally, recommendations are generated for the user with a recommendation formula.
(1) The weighted credit model calculation is described as:
Figure FDA0002844388560000011
where $Credit^{T}_{ac}(m, n)$ denotes the transpose of the credit value matrix, and $\omega_a$, $\omega_b$, $\omega_c$ respectively denote the trust weights of trust authorities at different grades.
(2) The credit value matrix calculation is described as:
Figure FDA0002844388560000012
where a is a constant, c denotes the number of third-party individuals in a given evaluation grade, m denotes a row, q denotes the n users, max denotes the most approved third party in the current evaluation grade, and min denotes the least approved third-party individual in the current evaluation grade.
(3) The favorites vector calculation is described as:
$nat_i = (nat_{i1}, nat_{i2}, \ldots, nat_{ij}, \ldots, nat_{in})$
where $nat_{in}$ denotes the n-th attribute of the i-th user, and $nat_i$ denotes the preference vector of the i-th user.
(4) The quantitative construction of each user's preference vector is described as:
$y_i = b_0 + b_1 nat_{i1} + b_2 nat_{i2} + \ldots + b_n nat_{in} + \mu_i$
where $b_n$ denotes the coefficient of the n-th factor influencing the score value y, $b_0$ denotes the constant term, $\mu_i$ denotes the random error, and $y_i$ denotes the value obtained for the i-th user given a nat vector.
(5) The calculation of the similarity between the target user and other users in the collaborative filtering method is described as:
Figure FDA0002844388560000013
where distance(m, n) denotes the distance between the target user m and user n, k denotes the total number of users in the system, and Sim(m, n) denotes the similarity between the target user m and user n; the N users most similar to the target user can then be selected as its neighbors.
(6) The recommendation formula is described as:
Figure FDA0002844388560000021
where $R_{ij}$ denotes user i's score for item j in the score-item matrix, Sim(m, i) is the closeness of preference between the two users m and i,
Figure FDA0002844388560000022
denotes the average score that the current user m has given to the items it has historically scored,
Figure FDA0002844388560000023
denotes the average score that user i has given to the items it has historically scored, k denotes the number of users selected as most similar to user m, and pre(m, j) denotes the predicted score of user m for item j. Recommendations are generated for the user with this recommendation formula according to the obtained user similarity.
CN202011504132.XA 2020-12-17 2020-12-17 Improved collaborative filtering method based on multiple linear regression and third party credit Active CN114647773B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011504132.XA CN114647773B (en) 2020-12-17 2020-12-17 Improved collaborative filtering method based on multiple linear regression and third party credit

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011504132.XA CN114647773B (en) 2020-12-17 2020-12-17 Improved collaborative filtering method based on multiple linear regression and third party credit

Publications (2)

Publication Number Publication Date
CN114647773A (en) 2022-06-21
CN114647773B (en) 2024-03-22

Family

ID=81990182

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011504132.XA Active CN114647773B (en) 2020-12-17 2020-12-17 Improved collaborative filtering method based on multiple linear regression and third party credit

Country Status (1)

Country Link
CN (1) CN114647773B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007101278A2 (en) * 2006-03-04 2007-09-07 Davis Iii John S Behavioral trust rating filtering system
WO2012162873A1 (en) * 2011-05-27 2012-12-06 Nokia Corporation Method and apparatus for role-based trust modeling and recommendation
CN106484876A (en) * 2016-10-13 2017-03-08 中山大学 A kind of based on typical degree and the collaborative filtering recommending method of trust network
CN106940801A (en) * 2016-01-04 2017-07-11 中国科学院声学研究所 A kind of deeply for Wide Area Network learns commending system and method
CN109815402A (en) * 2019-01-23 2019-05-28 北京工业大学 Collaborative Filtering Recommendation Algorithm based on user characteristics
CN111324807A (en) * 2020-01-13 2020-06-23 北京工业大学 Collaborative filtering recommendation method based on trust degree

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
王占;林岩;: "基于信任与用户兴趣变化的协同过滤方法研究", 情报学报, no. 02, 24 February 2017 (2017-02-24) *
蒋伟;秦志光;: "耦合社会信任信息的矩阵分解协同过滤模型", 电子科技大学学报, no. 03, 30 May 2019 (2019-05-30) *

Also Published As

Publication number Publication date
CN114647773B (en) 2024-03-22

Similar Documents

Publication Publication Date Title
Eliyas et al. Recommendation systems: Content-based filtering vs collaborative filtering
CN113268669B (en) Relation mining-oriented interest point recommendation method based on joint neural network
CN111651678B (en) Personalized recommendation method based on knowledge graph
CN105069666A (en) E-commerce personalized recommendation method integrated with user implicit information
CN111753215B (en) Multi-objective recommendation optimization method and readable medium
CN106886559A (en) The collaborative filtering method of good friend's feature and similar users feature is incorporated simultaneously
Suriati et al. Weighted hybrid technique for recommender system
Wasid et al. Multi-criteria clustering-based recommendation using Mahalanobis distance
Hassan et al. Performance analysis of neural networks-based multi-criteria recommender systems
Ujjin et al. Learning user preferences using evolution
Gao et al. A robust collaborative filtering approach based on user relationships for recommendation systems
Song et al. Research on personalized hybrid recommendation system
CN114647773A (en) Improved collaborative filtering method based on multiple linear regression and third-party credit
Lee Fuzzy clustering with optimization for collaborative filtering-based recommender systems
Wasid et al. Particle swarm optimisation-based contextual recommender systems
CN114611013A (en) Collaborative filtering recommendation algorithm based on adaptive combination of user interest and scoring preference difference
Ju et al. Personal recommendation via heterogeneous diffusion on bipartite network
Chen et al. Research of collaborative filtering recommendation algorithm based on trust propagation model
Agagu et al. Context-aware recommendation methods
Farsani et al. A semantic recommendation procedure for electronic product catalog
Huang et al. Collaborative filtering algorithm based on rating difference and user interest
Chen et al. A new method to generate fuzzy rules from relational database systems for estimating null values
Lin Fuzzy similarity matching method for interior design drawing recommendation
Wang et al. A Privacy-Aware Multi-Preference-Based Collaborative Filtering Recommendation System with LSH
Seo et al. The Method of Personalized Recommendation with Ensemble Combination.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant