CN112100512A - Collaborative filtering recommendation method based on user clustering and project association analysis - Google Patents


Info

Publication number
CN112100512A
Authority
CN
China
Prior art keywords
user
item
matrix
similarity
preference
Legal status
Withdrawn
Application number
CN202010278287.XA
Other languages
Chinese (zh)
Inventor
赵学健
邱钟成
孙知信
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202010278287.XA
Publication of CN112100512A
Legal status: Withdrawn

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9536: Search customisation based on social or collaborative filtering
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/12: Computing arrangements based on biological models using genetic models
    • G06N3/126: Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/06: Buying, selling or leasing transactions
    • G06Q30/0601: Electronic shopping [e-shopping]
    • G06Q30/0631: Item recommendations

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Artificial Intelligence (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Physiology (AREA)
  • Genetics & Genomics (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a collaborative filtering recommendation method based on user clustering and item association analysis, addressing the cold-start, data-sparsity and low-accuracy problems of traditional collaborative filtering recommendation algorithms. The method uses an improved fuzzy C-means clustering algorithm to mine users' degrees of preference for hidden features, and an association analysis strategy based on pre-judgment screening to mine frequent itemsets. On this basis, the algorithm computes user-user similarity from the user feature preference matrix and the user rating matrix, computes item-item similarity from the frequent itemset matrix and the user rating matrix, and combines the two similarities to predict each user's ratings of unrated items, from which Top-K recommendations are produced. Compared with traditional user-based and item-based collaborative filtering recommendation algorithms, the method can effectively avoid the cold-start and data-sparsity problems and achieves better recommendation quality.

Description

Collaborative filtering recommendation method based on user clustering and project association analysis
Technical field:
The invention relates to collaborative filtering recommendation methods, in particular to a collaborative filtering recommendation method based on user clustering and item association analysis, and belongs to the technical field of computer data mining and information processing.
Technical background:
With the rapid development of e-commerce, the variety and quantity of goods offered by e-commerce platforms have grown rapidly, ushering in an era of commodity information overload. Faced with massive amounts of product information, a user with clear requirements can locate the desired product through the platform's search function. However, when a user's needs are uncertain or vague and hard to express as search keywords, helping the user quickly find goods of interest becomes very important. Recommender systems emerged as an effective information-processing tool: they link users with goods through users' historical behavior, thereby addressing information overload. Recommender systems have been successfully applied in many fields, such as e-commerce, online music, video websites and social platforms. According to Amazon statistics, only 16% of the customers who buy on its website do so with a clear purchasing intention, while 20% to 30% of its sales are generated by its recommender system.
The recommendation algorithm is the core component of a recommender system and largely determines its performance. There are many types of recommendation algorithms; common ones include demographic-based, content-based, association-rule-based, collaborative filtering and hybrid recommendation algorithms. Collaborative filtering is currently one of the most mature and widely used personalized recommendation techniques, and mainly comprises user-based and item-based collaborative filtering. However, both of these algorithms, and most improved algorithms built on them, suffer from cold start, data sparsity and low recommendation accuracy.
Disclosure of Invention
To address the cold-start, data-sparsity and low-accuracy problems of traditional collaborative filtering recommendation algorithms, the invention discloses a collaborative filtering recommendation method based on user clustering and item association analysis. As shown in FIG. 2, the method comprises the following steps:
Step 1, data preprocessing: extract user-item rating data and item feature data from the raw data and clean them to obtain a data set in a specific format, then construct the user-item rating matrix $UI_{n\times m}$ and the item feature membership matrix $IF_{m\times k}$, where n is the number of users and m the number of items; the number of features k is usually much smaller than m;
Step 2, construct the user feature preference matrix $UFP_{n\times k}$ from the user-item rating matrix and the item feature membership matrix. Because k is much smaller than m, this matrix has far lower dimensionality than the rating matrix, which helps reduce the time and space complexity of the recommendation algorithm;
Step 3, apply min-max normalization to the UFP matrix, mapping each element to the interval [0, 1];
Step 4, cluster the users with the FCM (fuzzy C-means) algorithm fused with a genetic algorithm, so that FCM converges quickly and efficiently and avoids getting trapped in local optima;
Step 5, compute user similarity by combining the user feature preference matrix and the user-item rating matrix, so that it captures both the explicit information in the original rating matrix and the implicit information about users' preferences for item features;
Step 6, generate a transaction data set D from the user-item rating matrix $UI_{n\times m}$;
Step 7, mine frequent itemsets from D with a frequent itemset mining strategy based on pre-judgment screening, and construct the frequent itemset matrix $FIS_{t\times m}$, where t is the number of frequent itemsets;
Step 8, compute item similarity by combining the frequent itemset matrix and the user-item rating matrix, so that it captures both the users' explicit ratings of items and the intrinsic associations among items;
Step 9, determine the nearest neighbor users of user u and the nearest neighbor items of item i, and perform Top-K recommendation by combining user similarity and item similarity.
Further, step 2 comprises: constructing the user feature preference matrix $UFP_{n\times k}$ from the user-item rating matrix $UI_{n\times m}$ and the item feature membership matrix $IF_{m\times k}$. Each element $R_{ui}$ of the user feature preference matrix is calculated by formula (1):
[Image: formula (1)]
where $r_u = (r_{u1}, r_{u2}, r_{u3}, \ldots, r_{um})$ is the vector of user u's ratings of the items and $f_i = (f_{1i}, f_{2i}, f_{3i}, \ldots, f_{mi})$ is the membership vector of feature i over the m items. The construction process is shown in FIG. 1.
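Because formula (1) survives only as an image, the following Python sketch shows one plausible way to build $UFP_{n\times k}$ from $UI_{n\times m}$ and $IF_{m\times k}$. It assumes, as a hypothesis rather than the patented formula, that $R_{ui}$ is the membership-weighted mean rating user u gave to the items carrying feature i, and that a 0 in UI means "not rated"; the function name `build_ufp` is illustrative.

```python
import numpy as np


def build_ufp(ui, if_matrix):
    """Build the n x k user feature preference matrix UFP from UI (n x m) and IF (m x k).

    Assumption: R_ui is the membership-weighted mean rating user u gave to the
    items belonging to feature i (0 when u rated none of them); 0 in UI = not rated.
    """
    ui = np.asarray(ui, dtype=float)
    if_matrix = np.asarray(if_matrix, dtype=float)
    rated = (ui > 0).astype(float)            # 1 where user u rated item j
    rating_sums = ui @ if_matrix               # sum_j r_uj * f_ji
    rated_weights = rated @ if_matrix          # sum_j [r_uj > 0] * f_ji
    ufp = np.zeros_like(rating_sums)
    np.divide(rating_sums, rated_weights, out=ufp, where=rated_weights > 0)
    return ufp


ui = np.array([[5, 0, 3],
               [0, 4, 0]])
if_matrix = np.array([[1, 0],
                      [1, 1],
                      [0, 1]])
print(build_ufp(ui, if_matrix))   # expected values: [[5, 3], [4, 4]]
```

Each column of IF collapses many items into one feature, so the result has k columns instead of m, which is the dimensionality reduction step 2 relies on.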
Further, in step 3, min-max normalization is applied to the user feature preference matrix UFP, mapping each element to the interval [0, 1] by formula (2):
$$x'_{ij} = \frac{x_{ij} - x_{\min}}{x_{\max} - x_{\min}} \tag{2}$$
where $x_{ij}$, the element in row i and column j of the user feature preference matrix, is the degree of preference of user i for item feature j; $x_{\min}$ is the minimum and $x_{\max}$ the maximum of all users' preference degrees for the item features.
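A minimal sketch of formula (2), using the global minimum and maximum over the whole matrix as described above; the guard for a constant matrix (where $x_{\max} = x_{\min}$) is an added assumption.

```python
import numpy as np


def min_max_normalize(x):
    """Map every element of x to [0, 1] using the global min and max (formula (2))."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:                 # assumed handling: a constant matrix maps to 0
        return np.zeros_like(x)
    return (x - x_min) / (x_max - x_min)


print(min_max_normalize([[5, 3], [4, 4]]))   # expected values: [[1, 0], [0.5, 0.5]]
```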
Further, in step 4, users are clustered with the FCM algorithm fused with a genetic algorithm, so that FCM converges quickly and efficiently and avoids local optima. The procedure is as follows:
(1) Parameter initialization: initialize the population size M, crossover probability $P_c$, mutation probability $P_m$, maximum number of iterations $t_{\max}$, number of clusters c, membership factor m and convergence precision;
(2) Encoding and population initialization: encode according to the formula and randomly generate a population X, with the n research objects as initial individuals, i.e. $X = [x_1, x_2, x_3, \ldots, x_n]$;
(3) Compute the individual fitness $fit_m$ by formula (3):
[Image: formula (3)]
where $c_j$ ($j = 1, 2, 3, \ldots, k$) is the center of each cluster and $\mu_{i,j}$ is the membership of the i-th sample in the j-th class;
(4) Apply selection, crossover and mutation to the current population to generate a new generation of individuals;
(5) If $t = t_{\max}$, the genetic algorithm terminates, outputs the final result and goes to (6); otherwise set $t = t + 1$ and return to (3);
(6) Fuzzily partition the whole data set according to the global optimal solution, output the cluster center matrix, and complete the user clustering.
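The sketch below illustrates the GA-FCM idea under several assumptions, since formula (3) and the encoding are not reproduced here: each chromosome encodes the c cluster centers, fitness is taken as $1/(1+J_m)$ with $J_m$ the standard FCM objective, selection is a binary tournament, crossover is arithmetic, mutation is Gaussian, and the best individual seeds ordinary FCM iterations. All function names and default parameter values are hypothetical.

```python
import numpy as np


def memberships(x, centers, m=2.0):
    """Standard FCM update: mu_ij = 1 / sum_l (d_ij / d_il) ** (2 / (m - 1))."""
    d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)  # n x c distances
    d = np.fmax(d, 1e-12)                                            # guard against zero distance
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)


def objective(x, centers, m=2.0):
    """FCM objective J_m = sum_i sum_j mu_ij^m * ||x_i - c_j||^2."""
    mu = memberships(x, centers, m)
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(((mu ** m) * d2).sum())


def ga_fcm(x, c, m=2.0, pop_size=20, pc=0.8, pm=0.1, t_max=30,
           fcm_iters=100, eps=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = x.min(axis=0), x.max(axis=0)
    # Assumed encoding: each individual is a c x k matrix of cluster centers.
    pop = rng.uniform(lo, hi, size=(pop_size, c, x.shape[1]))
    for _ in range(t_max):
        fit = np.array([1.0 / (1.0 + objective(x, ind, m)) for ind in pop])  # assumed fitness
        a, b = rng.integers(pop_size, size=(2, pop_size))                  # binary tournament
        parents = pop[np.where(fit[a] >= fit[b], a, b)]
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):                                # arithmetic crossover
            if rng.random() < pc:
                w = rng.random()
                children[i] = w * parents[i] + (1 - w) * parents[i + 1]
                children[i + 1] = w * parents[i + 1] + (1 - w) * parents[i]
        mutate = rng.random(children.shape) < pm                           # Gaussian mutation
        children = np.clip(children + mutate * rng.normal(0.0, 0.1, children.shape), lo, hi)
        children[0] = pop[np.argmax(fit)]                                  # elitism
        pop = children
    centers = min(pop, key=lambda ind: objective(x, ind, m)).copy()
    for _ in range(fcm_iters):                                             # plain FCM refinement
        w = memberships(x, centers, m) ** m
        new_centers = (w.T @ x) / w.sum(axis=0)[:, None]
        done = np.abs(new_centers - centers).max() < eps
        centers = new_centers
        if done:
            break
    return centers, memberships(x, centers, m)


x = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]])
centers, mu = ga_fcm(x, c=2)
print(mu.argmax(axis=1))   # hard cluster label of each user
```

Seeding FCM with the GA's best centers, rather than random ones, is the point of the fusion: the population explores several starting points before FCM's local update takes over.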
Further, in step 5, user similarity is computed by combining the user feature preference matrix and the user-item rating matrix, so that it captures both the explicit information in the original rating matrix and the implicit information about users' preferences for item features. It is calculated by formula (4):
$$Sim(u,v) = \lambda\, Sim_1(u,v) + (1-\lambda)\, Sim_2(u,v) \tag{4}$$
where $\lambda \in (0, 1)$ is a weighting factor and $Sim(u,v)$ is the combined similarity of users u and v; $Sim_1(u,v)$ is the similarity computed from the original user-item rating matrix by formula (5):
[Image: formula (5)]
where $I_{uv}$ is the set of items rated by both user u and user v, $r_{ui}$ is user u's rating of item i, and $\bar{r}_u$ is the average of all of user u's ratings; $Sim_2(u,v)$ is the similarity computed from the user feature preference matrix by formula (6):
[Image: formula (6)]
where $F_{uv}$ is the set of features preferred by both user u and user v, $R_{ui}$ and $R_{vi}$ are the degrees of preference of users u and v for feature i, $\bar{R}_u$ is the average of user u's preferences over all features, and $\bar{R}_v$ is the average of user v's preferences over all features.
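Formulas (5) and (6) appear only as images. Given the symbols defined above (co-rated set $I_{uv}$, user mean $\bar{r}_u$, shared feature set $F_{uv}$, feature-preference means $\bar{R}_u$ and $\bar{R}_v$), the sketch below assumes both are Pearson-style correlations and combines them with formula (4). Treating 0 as "not rated" and taking $F_{uv}$ as the features with non-zero preference for both users are assumptions; the function names are hypothetical.

```python
import numpy as np


def centred_correlation(a, b, mask, mean_a, mean_b):
    """Pearson-style correlation restricted to the positions where mask is True."""
    if not mask.any():
        return 0.0
    da, db = a[mask] - mean_a, b[mask] - mean_b
    denom = np.sqrt((da ** 2).sum()) * np.sqrt((db ** 2).sum())
    return float((da * db).sum() / denom) if denom > 0 else 0.0


def user_similarity(ui, ufp, u, v, lam=0.5):
    """Formula (4): lam * Sim1 (ratings) + (1 - lam) * Sim2 (feature preferences)."""
    ru, rv = np.asarray(ui[u], dtype=float), np.asarray(ui[v], dtype=float)
    mean_u = ru[ru > 0].mean() if (ru > 0).any() else 0.0   # mean of all of u's ratings
    mean_v = rv[rv > 0].mean() if (rv > 0).any() else 0.0
    sim1 = centred_correlation(ru, rv, (ru > 0) & (rv > 0), mean_u, mean_v)  # over I_uv
    pu, pv = np.asarray(ufp[u], dtype=float), np.asarray(ufp[v], dtype=float)
    # Assumed F_uv: features with non-zero preference for both users.
    sim2 = centred_correlation(pu, pv, (pu > 0) & (pv > 0), pu.mean(), pv.mean())
    return lam * sim1 + (1 - lam) * sim2


ui = np.array([[5, 3, 0, 1], [4, 0, 0, 1], [1, 1, 0, 5]])
ufp = np.array([[0.9, 0.2, 0.5], [0.8, 0.1, 0.4], [0.1, 0.9, 0.3]])
print(user_similarity(ui, ufp, 0, 2))
```

The weight `lam` plays the role of λ: values near 1 trust the raw ratings, values near 0 trust the denser feature-preference profile, which is what lets the method still compare users who share few rated items.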
Further, in step 6, the transaction data set D is generated from the user-item rating matrix $UI_{n\times m}$: if user u has rated item i, i.e. $r_{u,i} \neq 0$, item i is added to the transaction corresponding to user u.
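A minimal Python sketch of step 6, assuming a 0 entry in UI means the item was not rated; each transaction is returned as a frozenset of item indices, and the function name is illustrative.

```python
def build_transactions(ui):
    """Step 6: user u's transaction holds every item i with r_ui != 0 (0 = not rated)."""
    return [frozenset(i for i, r in enumerate(row) if r != 0) for row in ui]


ui = [[5, 0, 3],
      [0, 4, 0]]
print(build_transactions(ui))   # [frozenset({0, 2}), frozenset({1})]
```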
Further, in step 7, the frequent itemsets $S_{FI} = (FS_1, FS_2, \ldots, FS_t)$ are generated from the transaction data set D using the frequent itemset mining strategy based on pre-judgment screening proposed by Zhao Zhi et al. (Journal of Electronics & Information Technology, 2016, 38(7), 1654-), where FS denotes a frequent itemset and t is the number of frequent itemsets. The frequent itemset matrix $FIS_{t\times m}$ is then constructed by formula (7):
$$F_{ij} = \begin{cases} 1, & \text{if item } j \in FS_i \\ 0, & \text{otherwise} \end{cases} \tag{7}$$
where $F_{ij}$ is the element in row i and column j of $FIS_{t\times m}$, with $i = 1, \ldots, t$ and $j = 1, \ldots, m$. An example of the frequent itemset matrix $FIS_{t\times m}$ is shown below:
[Image: example frequent itemset matrix]
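The cited pre-judgment screening strategy is not described in this text, so the sketch below substitutes plain Apriori (level-wise candidate generation with subset pruning) to produce frequent itemsets from D, then builds the FIS matrix as a binary indicator, the reading of formula (7) suggested by the definition of $F_{si}$ in step 8. The absolute-count `min_support` threshold and the function names are assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Plain Apriori (stand-in for pre-judgment screening); min_support is a count."""
    transactions = [frozenset(t) for t in transactions]
    current = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    frequent = []
    while current:
        level = [c for c in current if sum(c <= t for t in transactions) >= min_support]
        frequent.extend(level)
        if not level:
            break
        size = len(level[0]) + 1
        known = set(level)
        # Join frequent itemsets into larger candidates, then prune any candidate
        # that has an infrequent subset (the Apriori property).
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == size}
        current = [c for c in joined
                   if all(frozenset(s) in known for s in combinations(c, size - 1))]
    return sorted((tuple(sorted(s)) for s in frequent), key=lambda s: (len(s), s))


def build_fis(itemsets, m):
    """F_sj = 1 if item j belongs to the s-th frequent itemset, else 0."""
    return [[1 if j in s else 0 for j in range(m)] for s in itemsets]


tx = [{0, 1, 2}, {0, 1}, {0, 2}, {1, 2}, {0, 1, 2}]
fs = frequent_itemsets(tx, min_support=3)
print(fs)                 # [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
print(build_fis(fs, 3))
```

Here {0, 1, 2} appears in only two transactions, so it is pruned while all three pairs survive; each surviving itemset becomes one row of the FIS matrix.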
Further, in step 8, item similarity is computed by combining the frequent itemset matrix and the user-item rating matrix, so that it captures both the users' explicit ratings of items and the intrinsic associations among items. It is calculated by formula (8):
$$Sim'(i,j) = \beta\, Sim'_1(i,j) + (1-\beta)\, Sim'_2(i,j) \tag{8}$$
where $\beta \in (0, 1)$ is a weighting factor and $Sim'(i,j)$ is the combined similarity of items i and j; $Sim'_1(i,j)$ is the item similarity computed from the original user-item rating matrix by formula (9):
[Image: formula (9)]
where $U_{ij}$ is the set of users who rated both item i and item j, $r_{ui}$ is user u's rating of item i, and $\bar{r}_i$ is the average rating of item i; $Sim'_2(i,j)$ is the item similarity computed from the frequent itemset matrix by formula (10):
[Image: formula (10)]
where t is the number of frequent itemsets and $F_{si}$ indicates whether item i is contained in the s-th frequent itemset.
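Formulas (9) and (10) are also images. The sketch below assumes $Sim'_1$ is a Pearson correlation over the co-rating users $U_{ij}$ centered on the item means $\bar{r}_i$, and, as a stand-in for formula (10), takes $Sim'_2$ to be the cosine similarity of columns i and j of the binary FIS matrix, which rewards items that co-occur in frequent itemsets. Both choices and all names are hypothetical.

```python
import numpy as np


def item_pearson(ui, i, j):
    """Assumed Sim'_1: Pearson correlation over U_ij, centered on item means."""
    ri, rj = ui[:, i].astype(float), ui[:, j].astype(float)
    common = (ri > 0) & (rj > 0)                  # users who rated both items
    if not common.any():
        return 0.0
    di = ri[common] - ri[ri > 0].mean()
    dj = rj[common] - rj[rj > 0].mean()
    denom = np.sqrt((di ** 2).sum() * (dj ** 2).sum())
    return float((di * dj).sum() / denom) if denom > 0 else 0.0


def itemset_similarity(fis, i, j):
    """Assumed Sim'_2: cosine similarity of binary FIS columns i and j."""
    fi, fj = fis[:, i].astype(float), fis[:, j].astype(float)
    denom = np.sqrt(fi.sum() * fj.sum())          # for 0/1 columns, sum == squared norm
    return float((fi * fj).sum() / denom) if denom > 0 else 0.0


def item_similarity(ui, fis, i, j, beta=0.5):
    """Formula (8): beta * Sim'_1 + (1 - beta) * Sim'_2."""
    return beta * item_pearson(ui, i, j) + (1 - beta) * itemset_similarity(fis, i, j)


fis = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1]])
ui = np.array([[5, 3, 4], [3, 1, 2], [4, 3, 5], [1, 5, 3]])
print(itemset_similarity(fis, 0, 1))   # items 0 and 1 share 1 of 3 itemsets each
```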
Further, in step 9, the nearest neighbor users of user u and the nearest neighbor items of item i are determined, the predicted ratings of user u for all unrated items are computed, and Top-K recommendation is performed. The predicted rating of user u for an unrated item i is computed as follows:
(1) Sort the user similarities computed by formula (4) to obtain the nearest neighbor set $N_u$ of user u, and sort the item similarities computed by formula (8) to obtain the nearest neighbor set $N_i$ of item i;
(2) Compute the predicted rating $\hat{r}_{ui}$ of user u for the unrated item i by formula (11):
[Image: formula (11)]
where ω is a weighting coefficient, $N_u$ is the nearest neighbor set of user u, $N_i$ is the nearest neighbor set of item i, $\bar{r}_u$ and $\bar{r}_p$ are the average ratings of users u and p, $\bar{r}_i$ and $\bar{r}_q$ are the average ratings of items i and q, $Sim(u,p)$ is the similarity between users u and p, and $Sim'(i,q)$ is the similarity between items i and q. The predicted ratings of user u for all unrated items are computed by formula (11) and sorted in descending order, and the K items with the highest predicted ratings are recommended (Top-K recommendation).
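Formula (11) is not reproduced, but its listed symbols suggest a weighted blend of a mean-centered user-based prediction and a mean-centered item-based prediction. The sketch below implements that assumed form,

$$\hat{r}_{ui} = \omega\Big(\bar{r}_u + \frac{\sum_{p\in N_u} Sim(u,p)(r_{pi}-\bar{r}_p)}{\sum_{p\in N_u} |Sim(u,p)|}\Big) + (1-\omega)\Big(\bar{r}_i + \frac{\sum_{q\in N_i} Sim'(i,q)(r_{uq}-\bar{r}_q)}{\sum_{q\in N_i} |Sim'(i,q)|}\Big),$$

restricting $N_u$ to neighbors who rated i and $N_i$ to items that u rated, and then ranks the unrated items for Top-K recommendation. `user_sim` and `item_sim` are assumed precomputed Sim and Sim' matrices; all names are hypothetical.

```python
import numpy as np


def _mean_rated(v):
    """Mean of the non-zero (rated) entries; 0 if nothing is rated."""
    rated = v[v > 0]
    return float(rated.mean()) if rated.size else 0.0


def predict(ui, user_sim, item_sim, u, i, n_users=10, n_items=10, omega=0.5):
    """Assumed formula (11): omega * user-based term + (1 - omega) * item-based term."""
    ui = ui.astype(float)
    # N_u: most similar users (excluding u) who rated item i.
    users = [p for p in np.argsort(-user_sim[u]) if p != u and ui[p, i] > 0][:n_users]
    num = sum(user_sim[u, p] * (ui[p, i] - _mean_rated(ui[p])) for p in users)
    den = sum(abs(user_sim[u, p]) for p in users)
    user_term = _mean_rated(ui[u]) + (num / den if den > 0 else 0.0)
    # N_i: most similar items (excluding i) that user u rated.
    items = [q for q in np.argsort(-item_sim[i]) if q != i and ui[u, q] > 0][:n_items]
    num = sum(item_sim[i, q] * (ui[u, q] - _mean_rated(ui[:, q])) for q in items)
    den = sum(abs(item_sim[i, q]) for q in items)
    item_term = _mean_rated(ui[:, i]) + (num / den if den > 0 else 0.0)
    return omega * user_term + (1 - omega) * item_term


def top_k(ui, user_sim, item_sim, u, k=3, **kw):
    """Predict every item u has not rated, sort descending, return the K best item ids."""
    unrated = [i for i in range(ui.shape[1]) if ui[u, i] == 0]
    scores = {i: predict(ui, user_sim, item_sim, u, i, **kw) for i in unrated}
    return sorted(scores, key=scores.get, reverse=True)[:k]


ui = np.array([[5, 0, 0, 1], [4, 5, 1, 1], [5, 4, 2, 2]])
print(top_k(ui, np.eye(3), np.eye(4), u=0, k=2))   # [1, 2]
```

With identity similarity matrices every neighbor weight is zero, so the prediction reduces to $\omega\bar{r}_u + (1-\omega)\bar{r}_i$; that degenerate case is a convenient sanity check before plugging in real Sim and Sim' values.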
Beneficial effects:
The invention computes user-user similarity from the user feature preference matrix and the user rating matrix, computes item-item similarity from the frequent itemset matrix and the user rating matrix, and combines the two similarities to predict users' ratings of unrated items, thereby realizing Top-K recommendation. Compared with traditional user-based and item-based collaborative filtering recommendation algorithms, the method can effectively avoid the cold-start and data-sparsity problems and achieves better recommendation quality.
Drawings
FIG. 1 is a schematic diagram of a user characteristic preference matrix according to the present invention.
FIG. 2 is a flow chart of the present invention.
Detailed Description
The embodiment provides a collaborative filtering recommendation method based on user clustering and project association analysis, which comprises the following steps:
Step 1, data preprocessing: extract user-item rating data and item feature data from the raw data and clean them to obtain a data set in a specific format, then construct the user-item rating matrix $UI_{n\times m}$ and the item feature membership matrix $IF_{m\times k}$, where n is the number of users and m the number of items; the number of features k is usually much smaller than m;
Step 2, construct the user feature preference matrix $UFP_{n\times k}$ from the user-item rating matrix and the item feature membership matrix. Because k is much smaller than m, this matrix has far lower dimensionality than the rating matrix, which helps reduce the time and space complexity of the recommendation algorithm;
Step 3, apply min-max normalization to the UFP matrix, mapping each element to the interval [0, 1];
Step 4, cluster the users with the FCM (fuzzy C-means) algorithm fused with a genetic algorithm, so that FCM converges quickly and efficiently and avoids getting trapped in local optima;
Step 5, compute user similarity by combining the user feature preference matrix and the user-item rating matrix, so that it captures both the explicit information in the original rating matrix and the implicit information about users' preferences for item features;
Step 6, generate a transaction data set D from the user-item rating matrix $UI_{n\times m}$;
Step 7, mine frequent itemsets from D with a frequent itemset mining strategy based on pre-judgment screening, and construct the frequent itemset matrix $FIS_{t\times m}$, where t is the number of frequent itemsets;
Step 8, compute item similarity by combining the frequent itemset matrix and the user-item rating matrix, so that it captures both the users' explicit ratings of items and the intrinsic associations among items;
Step 9, determine the nearest neighbor users of user u and the nearest neighbor items of item i, and perform Top-K recommendation by combining user similarity and item similarity.
Further, step 2 comprises: constructing the user feature preference matrix $UFP_{n\times k}$ from the user-item rating matrix $UI_{n\times m}$ and the item feature membership matrix $IF_{m\times k}$. Each element $R_{ui}$ of the user feature preference matrix is calculated by formula (1):
[Image: formula (1)]
where $r_u = (r_{u1}, r_{u2}, r_{u3}, \ldots, r_{um})$ is the vector of user u's ratings of the items and $f_i = (f_{1i}, f_{2i}, f_{3i}, \ldots, f_{mi})$ is the membership vector of feature i over the m items. The construction process is shown in FIG. 1.
Further, in step 3, min-max normalization is applied to the user feature preference matrix UFP, mapping each element to the interval [0, 1] by formula (2):
$$x'_{ij} = \frac{x_{ij} - x_{\min}}{x_{\max} - x_{\min}} \tag{2}$$
where $x_{ij}$, the element in row i and column j of the user feature preference matrix, is the degree of preference of user i for item feature j; $x_{\min}$ is the minimum and $x_{\max}$ the maximum of all users' preference degrees for the item features.
Further, in step 4, users are clustered with the FCM algorithm fused with a genetic algorithm, so that FCM converges quickly and efficiently and avoids local optima. The procedure is as follows:
(1) Parameter initialization: initialize the population size M, crossover probability $P_c$, mutation probability $P_m$, maximum number of iterations $t_{\max}$, number of clusters c, membership factor m and convergence precision;
(2) Encoding and population initialization: encode according to the formula and randomly generate a population X, with the n research objects as initial individuals, i.e. $X = [x_1, x_2, x_3, \ldots, x_n]$;
(3) Compute the individual fitness $fit_m$ by formula (3):
[Image: formula (3)]
where $c_j$ ($j = 1, 2, 3, \ldots, k$) is the center of each cluster and $\mu_{i,j}$ is the membership of the i-th sample in the j-th class;
(4) Apply selection, crossover and mutation to the current population to generate a new generation of individuals;
(5) If $t = t_{\max}$, the genetic algorithm terminates, outputs the final result and goes to (6); otherwise set $t = t + 1$ and return to (3);
(6) Fuzzily partition the whole data set according to the global optimal solution, output the cluster center matrix, and complete the user clustering.
Further, in step 5, user similarity is computed by combining the user feature preference matrix and the user-item rating matrix, so that it captures both the explicit information in the original rating matrix and the implicit information about users' preferences for item features. It is calculated by formula (4):
$$Sim(u,v) = \lambda\, Sim_1(u,v) + (1-\lambda)\, Sim_2(u,v) \tag{4}$$
where $\lambda \in (0, 1)$ is a weighting factor and $Sim(u,v)$ is the combined similarity of users u and v; $Sim_1(u,v)$ is the similarity computed from the original user-item rating matrix by formula (5):
[Image: formula (5)]
where $I_{uv}$ is the set of items rated by both user u and user v, $r_{ui}$ is user u's rating of item i, and $\bar{r}_u$ is the average of all of user u's ratings; $Sim_2(u,v)$ is the similarity computed from the user feature preference matrix by formula (6):
[Image: formula (6)]
where $F_{uv}$ is the set of features preferred by both user u and user v, $R_{ui}$ and $R_{vi}$ are the degrees of preference of users u and v for feature i, $\bar{R}_u$ is the average of user u's preferences over all features, and $\bar{R}_v$ is the average of user v's preferences over all features.
Further, in step 6, the transaction data set D is generated from the user-item rating matrix $UI_{n\times m}$: if user u has rated item i, i.e. $r_{u,i} \neq 0$, item i is added to the transaction corresponding to user u. The resulting transaction data set D is shown in Table 1.
[Image: Table 1, transaction data set D]
Further, in step 7, the frequent itemsets $S_{FI} = (FS_1, FS_2, \ldots, FS_t)$ are generated from the transaction data set D using the frequent itemset mining strategy based on pre-judgment screening proposed by Zhao Zhi et al. (Journal of Electronics & Information Technology, 2016, 38(7), 1654-), where FS denotes a frequent itemset and t is the number of frequent itemsets. The frequent itemset matrix $FIS_{t\times m}$ is then constructed by formula (7):
$$F_{ij} = \begin{cases} 1, & \text{if item } j \in FS_i \\ 0, & \text{otherwise} \end{cases} \tag{7}$$
where $F_{ij}$ is the element in row i and column j of $FIS_{t\times m}$, with $i = 1, \ldots, t$ and $j = 1, \ldots, m$. An example of the frequent itemset matrix $FIS_{t\times m}$ is shown below:
[Image: example frequent itemset matrix]
Further, in step 8, item similarity is computed by combining the frequent itemset matrix and the user-item rating matrix, so that it captures both the users' explicit ratings of items and the intrinsic associations among items. It is calculated by formula (8):
$$Sim'(i,j) = \beta\, Sim'_1(i,j) + (1-\beta)\, Sim'_2(i,j) \tag{8}$$
where $\beta \in (0, 1)$ is a weighting factor and $Sim'(i,j)$ is the combined similarity of items i and j; $Sim'_1(i,j)$ is the item similarity computed from the original user-item rating matrix by formula (9):
[Image: formula (9)]
where $U_{ij}$ is the set of users who rated both item i and item j, $r_{ui}$ is user u's rating of item i, and $\bar{r}_i$ is the average rating of item i; $Sim'_2(i,j)$ is the item similarity computed from the frequent itemset matrix by formula (10):
[Image: formula (10)]
where t is the number of frequent itemsets and $F_{si}$ indicates whether item i is contained in the s-th frequent itemset.
Further, in step 9, the nearest neighbor users of user u and the nearest neighbor items of item i are determined, the predicted ratings of user u for all unrated items are computed, and Top-K recommendation is performed. The predicted rating of user u for an unrated item i is computed as follows:
(1) Sort the user similarities computed by formula (4) to obtain the nearest neighbor set $N_u$ of user u, and sort the item similarities computed by formula (8) to obtain the nearest neighbor set $N_i$ of item i;
(2) Compute the predicted rating $\hat{r}_{ui}$ of user u for the unrated item i by formula (11):
[Image: formula (11)]
where ω is a weighting coefficient, $N_u$ is the nearest neighbor set of user u, $N_i$ is the nearest neighbor set of item i, $\bar{r}_u$ and $\bar{r}_p$ are the average ratings of users u and p, $\bar{r}_i$ and $\bar{r}_q$ are the average ratings of items i and q, $Sim(u,p)$ is the similarity between users u and p, and $Sim'(i,q)$ is the similarity between items i and q. The predicted ratings of user u for all unrated items are computed by formula (11) and sorted in descending order, and the K items with the highest predicted ratings are recommended (Top-K recommendation).

Claims (9)

1. A collaborative filtering recommendation method based on user clustering and project association analysis is characterized in that:
the method comprises the following steps:
step 1, data preprocessing: extracting user-item rating data and item feature data from raw data and cleaning them to construct a user-item rating matrix $UI_{n\times m}$ and an item feature membership matrix $IF_{m\times k}$;
step 2, constructing a user feature preference matrix $UFP_{n\times k}$ from the user-item rating matrix and the item feature membership matrix;
step 3, performing min-max normalization on the UFP matrix, mapping each element of the matrix to the interval [0, 1];
step 4, clustering users with an FCM algorithm fused with a genetic algorithm;
step 5, calculating user similarity by combining the user feature preference matrix and the user-item rating matrix, so that the user similarity captures both the explicit information of the original user-item rating matrix and the implicit information about users' preferences for item features;
step 6, generating a transaction data set D from the user-item rating matrix $UI_{n\times m}$;
step 7, generating frequent itemsets from the transaction data set D with a frequent itemset mining strategy based on pre-judgment screening, and constructing a frequent itemset matrix $FIS_{t\times m}$, t being the number of frequent itemsets;
step 8, calculating item similarity by combining the frequent itemset matrix and the user-item rating matrix, so that the item similarity captures both the users' explicit ratings of items and the intrinsic associations among items;
step 9, determining the nearest neighbor users of user u and the nearest neighbor items of item i, and performing Top-K recommendation by combining user similarity and item similarity.
2. The collaborative filtering recommendation method based on user clustering and item association analysis according to claim 1, wherein step 2 further comprises: using the user-item rating matrix $UI_{n\times m}$ and the item feature membership matrix $IF_{m\times k}$ to construct the user feature preference matrix $UFP_{n\times k}$, where element $R_{ui}$ of the user feature preference matrix is calculated as shown in formula (1):
[Formula (1): image not reproduced]
wherein $r_u = (r_{u1}, r_{u2}, r_{u3}, \ldots, r_{um})$ is the vector of user u's ratings for the items, and $f_i = (f_{1i}, f_{2i}, f_{3i}, \ldots, f_{mi})$ is the membership vector of the items for feature i.
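Formula (1) survives only as an image, so its exact expression is not recoverable here. The Python sketch below assumes one common construction consistent with the definitions of $r_u$ and $f_i$: a user's preference for a feature is the membership-weighted average of that user's ratings over the rated items carrying the feature. The function name `user_feature_preference` and the "0 means unrated" convention are illustrative assumptions, not the patent's.

```python
from typing import List

Matrix = List[List[float]]


def user_feature_preference(ui: Matrix, item_features: Matrix) -> Matrix:
    """Build a UFP (n x k) matrix from UI (n x m) and IF (m x k).

    ASSUMED form of formula (1): the preference of user u for feature f is the
    membership-weighted average of u's ratings over the items u rated:
        R_uf = sum_j r_uj * f_jf / sum_{j : r_uj > 0} f_jf
    A user who rated no item carrying feature f gets 0.
    """
    n, m, k = len(ui), len(item_features), len(item_features[0])
    ufp = [[0.0] * k for _ in range(n)]
    for u in range(n):
        for f in range(k):
            # Unrated cells are 0, so they add nothing to the numerator.
            num = sum(ui[u][j] * item_features[j][f] for j in range(m))
            # Only the memberships of items the user actually rated count here.
            den = sum(item_features[j][f] for j in range(m) if ui[u][j] > 0)
            ufp[u][f] = num / den if den > 0 else 0.0
    return ufp
```

For example, with `ui = [[5, 3, 0], [0, 4, 1]]` and `item_features = [[1, 0], [1, 1], [0, 1]]` the sketch returns `[[4.0, 3.0], [4.0, 2.5]]`.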
3. The collaborative filtering recommendation method based on user clustering and item association analysis according to claim 1, wherein in step 3, min-max normalization is performed on the user feature preference matrix UFP, mapping each element of the matrix to the interval [0, 1] as shown in formula (2):
$$x'_{ij} = \frac{x_{ij} - x_{\min}}{x_{\max} - x_{\min}} \quad (2)$$
wherein $x_{ij}$ is the element in row i, column j of the user feature preference matrix, representing the degree of preference of user i for item feature j; $x_{\min}$ is the minimum preference degree over all users and all item features, and $x_{\max}$ is the maximum preference degree over all users and all item features.
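Formula (2) follows directly from the definitions in claim 3: one global minimum and one global maximum are taken over the whole UFP matrix, not per row or per column. A minimal Python sketch (the function name and the handling of a constant matrix are our own assumptions):

```python
from typing import List

Matrix = List[List[float]]


def min_max_normalize(matrix: Matrix) -> Matrix:
    """Formula (2): x' = (x - x_min) / (x_max - x_min), with a single global min and max."""
    flat = [x for row in matrix for x in row]
    x_min, x_max = min(flat), max(flat)
    if x_max == x_min:
        # Degenerate case (all preferences equal); mapping to 0 is our own choice.
        return [[0.0 for _ in row] for row in matrix]
    return [[(x - x_min) / (x_max - x_min) for x in row] for row in matrix]
```

Using a global range keeps preference values comparable across users, which matters because the normalized matrix is later clustered as a whole.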
4. The collaborative filtering recommendation method based on user clustering and item association analysis according to claim 1, wherein in step 4, users are partitioned into clusters by the FCM algorithm fused with a genetic algorithm, as follows:
First, parameter initialization: initialize the relevant parameters, including the population size M, crossover probability $P_c$, mutation probability $P_m$, maximum number of iterations $t_{\max}$, number of clusters c, membership exponent m, and convergence precision;
Second, encoding and population initialization: perform encoding and randomly generate a population X, taking the n objects of study as the initial individuals, i.e. $X = [x_1, x_2, x_3, \ldots, x_n]$;
Third, calculate the fitness $fit_m$ of each individual as shown in formula (3):
[Formula (3): image not reproduced]
in the above formula, $c_j$ (j = 1, 2, 3, ..., k) is the center of each cluster, and $\mu_{i,j}$ is the membership of the i-th sample in the j-th class;
Fourth, apply selection, crossover, and mutation to the current population to generate a new generation of individuals;
Fifth, if $t = t_{\max}$, the genetic algorithm ends, the final data are output, and the method proceeds to the sixth step; otherwise, let t = t + 1 and return to the third step;
Sixth, fuzzily partition the whole data set according to the global optimal solution, output the cluster center matrix, and complete the user clustering.
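Claim 4 fixes the overall GA-FCM loop but not the encoding, the genetic operators, or formula (3). The sketch below is one plausible instantiation under stated assumptions: each individual encodes c candidate cluster centers, fitness is taken as $1/(1+J_m)$ where $J_m=\sum_i\sum_j \mu_{ij}^m \lVert x_i-c_j\rVert^2$ is the standard FCM objective, and selection, crossover, and mutation are binary tournament, arithmetic crossover, and clamped Gaussian mutation with elitism. It stops after $t_{\max}$ generations and then partitions the data fuzzily with the best centers, as in the sixth step. All function names are hypothetical.

```python
import random
from typing import List, Tuple

Point = List[float]


def memberships(data: List[Point], centers: List[Point], m: float) -> List[List[float]]:
    """Standard FCM memberships: mu_ij = 1 / sum_l (d_ij / d_il) ** (2 / (m - 1))."""
    power = 2.0 / (m - 1.0)
    result = []
    for x in data:
        d = [sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5 for c in centers]
        if min(d) == 0.0:
            # The sample coincides with a center: give that center full membership.
            hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
            result.append([h / sum(hits) for h in hits])
        else:
            result.append([1.0 / sum((dj / dl) ** power for dl in d) for dj in d])
    return result


def fitness(data: List[Point], centers: List[Point], m: float) -> float:
    """ASSUMED form of formula (3): 1 / (1 + J_m), where J_m is the FCM objective."""
    mu = memberships(data, centers, m)
    j_m = sum(mu[i][j] ** m * sum((a - b) ** 2 for a, b in zip(x, c))
              for i, x in enumerate(data) for j, c in enumerate(centers))
    return 1.0 / (1.0 + j_m)


def tournament(pop: List[List[Point]], fits: List[float], rng: random.Random) -> List[Point]:
    """Binary tournament selection."""
    a, b = rng.randrange(len(pop)), rng.randrange(len(pop))
    return pop[a] if fits[a] >= fits[b] else pop[b]


def ga_fcm(data: List[Point], c: int = 2, m: float = 2.0, pop_size: int = 20,
           pc: float = 0.8, pm: float = 0.1, t_max: int = 50,
           seed: int = 0) -> Tuple[List[Point], List[List[float]], List[int]]:
    rng = random.Random(seed)
    # Individual = c candidate centers, seeded from randomly chosen samples.
    pop = [[list(p) for p in rng.sample(data, c)] for _ in range(pop_size)]
    for _ in range(t_max):
        fits = [fitness(data, ind, m) for ind in pop]
        elite = max(range(pop_size), key=fits.__getitem__)
        new_pop = [[list(ctr) for ctr in pop[elite]]]  # elitism: keep the best unchanged
        while len(new_pop) < pop_size:
            p1, p2 = tournament(pop, fits, rng), tournament(pop, fits, rng)
            if rng.random() < pc:
                # Arithmetic crossover, center by center.
                w = rng.random()
                child = [[w * a + (1 - w) * b for a, b in zip(c1, c2)]
                         for c1, c2 in zip(p1, p2)]
            else:
                child = [list(ctr) for ctr in p1]
            for ctr in child:
                for d in range(len(ctr)):
                    if rng.random() < pm:
                        # Gaussian mutation, clamped to [0, 1] (data normalized in step 3).
                        ctr[d] = min(1.0, max(0.0, ctr[d] + rng.gauss(0.0, 0.1)))
            new_pop.append(child)
        pop = new_pop
    fits = [fitness(data, ind, m) for ind in pop]
    best = pop[max(range(pop_size), key=fits.__getitem__)]
    mu = memberships(data, best, m)  # fuzzy partition from the global best centers
    labels = [max(range(c), key=row.__getitem__) for row in mu]  # hard labels
    return best, mu, labels
```

Because centers are not aligned between parents, arithmetic crossover can mix unrelated clusters; elitism keeps the best solution from degrading, which is adequate for a sketch, but a fuller version might match centers before crossing them.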
5. The collaborative filtering recommendation method based on user clustering and item association analysis according to claim 1, wherein in step 5, the user similarity is calculated by combining the user feature preference matrix and the user-item rating matrix, so that it contains both the explicit information of the original user-item rating matrix and the implicit information of users' preferences for item features; the calculation is shown in formula (4):
$$Sim(u,v) = \lambda\, Sim_1(u,v) + (1-\lambda)\, Sim_2(u,v) \quad (4)$$
wherein λ is a weight factor with value range (0, 1), and Sim(u, v) is the combined similarity of user u and user v; $Sim_1(u,v)$ is the similarity obtained from the original user-item rating matrix, calculated as shown in formula (5):
[Formula (5): image not reproduced]
wherein $I_{uv}$ is the set of items rated by both user u and user v, $r_{ui}$ is user u's rating of item i, and $\bar{r}_u$ is the average of all of user u's ratings; $Sim_2(u,v)$ is the similarity obtained from the user preference matrix for item features, calculated as shown in formula (6):
[Formula (6): image not reproduced]
wherein $F_{uv}$ is the set of features preferred by both user u and user v, $R_{ui}$ is the degree of preference of user u for feature i, $R_{vi}$ is the degree of preference of user v for feature i, $\bar{R}_u$ is the average of user u's preferences over all features, and $\bar{R}_v$ is the average of user v's preferences over all features.
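Formulas (5) and (6) are images, but their symbol definitions (a co-rated set, ratings, and per-user means) match a Pearson-style correlation. The sketch below assumes that form for both $Sim_1$ and $Sim_2$ and combines them with formula (4). Treating a rating of 0 as "unrated" and taking $F_{uv}$ as the features with non-zero preference for both users are assumptions; the helper names are hypothetical.

```python
from math import sqrt
from typing import Dict, List


def _mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


def _centered_corr(x: Dict[int, float], y: Dict[int, float],
                   mean_x: float, mean_y: float) -> float:
    """Pearson-style correlation over the keys x and y share, centered on the given means."""
    common = x.keys() & y.keys()
    num = sum((x[k] - mean_x) * (y[k] - mean_y) for k in common)
    den = (sqrt(sum((x[k] - mean_x) ** 2 for k in common))
           * sqrt(sum((y[k] - mean_y) ** 2 for k in common)))
    return num / den if den else 0.0


def user_similarity(ui: List[List[float]], ufp: List[List[float]],
                    u: int, v: int, lam: float = 0.5) -> float:
    """Formula (4): lam * Sim1 + (1 - lam) * Sim2 (forms of Sim1 and Sim2 ASSUMED)."""
    # Sim1 over I_uv (items both users rated); r_bar uses all of a user's ratings.
    ru = {i: r for i, r in enumerate(ui[u]) if r > 0}
    rv = {i: r for i, r in enumerate(ui[v]) if r > 0}
    sim1 = _centered_corr(ru, rv, _mean(list(ru.values())), _mean(list(rv.values())))
    # Sim2 over F_uv, ASSUMED to be the features both users prefer (non-zero);
    # R_bar is the mean over all features, as claim 5 defines it.
    fu = {f: p for f, p in enumerate(ufp[u]) if p > 0}
    fv = {f: p for f, p in enumerate(ufp[v]) if p > 0}
    sim2 = _centered_corr(fu, fv, _mean(ufp[u]), _mean(ufp[v]))
    return lam * sim1 + (1 - lam) * sim2
```

With `ui = [[5, 3, 1], [5, 3, 1], [1, 3, 5]]` and `ufp = [[0.2, 0.8], [0.2, 0.8], [0.8, 0.2]]`, users 0 and 1 come out at 1.0 and users 0 and 2 at -1.0 (up to rounding), since both component similarities agree in each case.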
6. The collaborative filtering recommendation method based on user clustering and item association analysis according to claim 1, wherein in step 6, the transaction data set D is generated from the user-item rating matrix $UI_{n\times m}$ as follows: if user u has rated item i, i.e. $r_{u,i} \neq 0$, item i is added to the transaction corresponding to user u.
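Step 6 is a direct mapping from rows of $UI_{n\times m}$ to transactions. A minimal Python sketch, assuming unrated cells hold 0 (the function name is hypothetical):

```python
from typing import FrozenSet, List


def build_transactions(ui: List[List[float]]) -> List[FrozenSet[int]]:
    """Step 6: one transaction per user u, containing every item i with r_ui != 0."""
    return [frozenset(i for i, r in enumerate(row) if r != 0) for row in ui]
```

Frozen sets make each transaction hashable and support the subset test (`itemset <= transaction`) that support counting needs.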
7. The collaborative filtering recommendation method based on user clustering and item association analysis according to claim 1, wherein in step 7, for the transaction data set D, frequent itemsets $S_{FI} = (FS_1, FS_2, \ldots, FS_t)$ are generated with a frequent itemset mining strategy based on pre-judgment screening, where FS denotes a frequent itemset and t is the number of frequent itemsets, and a frequent itemset matrix $FIS_{t\times m}$ is constructed as shown in formula (7):
$$F_{ij} = \begin{cases} 1, & \text{item } j \in FS_i \\ 0, & \text{otherwise} \end{cases} \quad (7)$$
in the above formula, $F_{ij}$ is the element in row i, column j of the frequent itemset matrix $FIS_{t\times m}$, with $1 \le i \le t$ and $1 \le j \le m$; the frequent itemset matrix $FIS_{t\times m}$ is:
$$FIS_{t\times m} = \begin{bmatrix} F_{11} & F_{12} & \cdots & F_{1m} \\ F_{21} & F_{22} & \cdots & F_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ F_{t1} & F_{t2} & \cdots & F_{tm} \end{bmatrix}$$
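The pre-judgment screening strategy itself is not specified in this section, so the sketch below substitutes plain Apriori (join, subset pruning, support counting) to produce $S_{FI}$, then builds $FIS_{t\times m}$ with formula (7). Keeping only itemsets of two or more items and using an absolute support threshold `min_support` are our assumptions; the function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List


def apriori(transactions: List[FrozenSet[int]], min_support: int) -> List[FrozenSet[int]]:
    """Plain Apriori, standing in for the pre-judgment screening strategy.

    Returns the frequent itemsets with two or more items (support count >= min_support),
    ordered by size and then by their sorted members.
    """
    def support(itemset: FrozenSet[int]) -> int:
        return sum(itemset <= t for t in transactions)

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    result: List[FrozenSet[int]] = []
    k = 2
    while level:
        prev = set(level)
        # Join: unions of two frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = sorted((c for c in candidates if support(c) >= min_support), key=sorted)
        result.extend(level)
        k += 1
    return result


def fis_matrix(itemsets: List[FrozenSet[int]], m: int) -> List[List[int]]:
    """Formula (7): F_sj = 1 if item j belongs to frequent itemset FS_s, else 0 (t x m)."""
    return [[1 if j in fs else 0 for j in range(m)] for fs in itemsets]
```

For transactions `{0, 1, 2}`, `{0, 1}`, `{1, 2}`, `{0, 1, 3}` and `min_support=2`, the frequent itemsets are `{0, 1}` and `{1, 2}`, giving the 2 x 4 matrix `[[1, 1, 0, 0], [0, 1, 1, 0]]`.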
8. The collaborative filtering recommendation method based on user clustering and item association analysis according to claim 1, wherein in step 8, the item similarity is calculated by combining the frequent itemset matrix and the user-item rating matrix, so that it contains both the users' explicit rating information for the items and the intrinsic relations among the items; the calculation is shown in formula (8):
$$Sim'(i,j) = \beta\, Sim'_1(i,j) + (1-\beta)\, Sim'_2(i,j) \quad (8)$$
wherein β is a weight factor with value range (0, 1), and Sim'(i, j) is the combined similarity of item i and item j; $Sim'_1(i,j)$ is the item similarity obtained from the original user-item rating matrix, calculated as shown in formula (9):
[Formula (9): image not reproduced]
wherein $U_{ij}$ is the set of users who rated both item i and item j, $r_{ui}$ is user u's rating of item i, and $\bar{r}_i$ is the average rating of item i; $Sim'_2(i,j)$ is the item similarity obtained from the frequent itemset matrix, calculated as shown in formula (10):
[Formula (10): image not reproduced]
where t is the number of frequent itemsets and $F_{si}$ indicates whether item i is contained in the s-th frequent itemset.
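Formulas (9) and (10) are images. Assuming $Sim'_1$ is a Pearson-style correlation over $U_{ij}$ centered on item means (matching its symbol definitions) and $Sim'_2$ is the cosine similarity between columns i and j of $FIS_{t\times m}$ (one natural reading of "whether item i is contained in the s-th frequent itemset"), formula (8) can be sketched as follows. Both component forms and the function name are assumptions.

```python
from math import sqrt
from typing import List


def item_similarity(ui: List[List[float]], fis: List[List[int]],
                    i: int, j: int, beta: float = 0.5) -> float:
    """Formula (8): beta * Sim'_1 + (1 - beta) * Sim'_2 (both component forms ASSUMED)."""
    # Sim'_1: Pearson-style correlation over U_ij, centered on each item's mean rating.
    col_i = {u: row[i] for u, row in enumerate(ui) if row[i] > 0}
    col_j = {u: row[j] for u, row in enumerate(ui) if row[j] > 0}
    mean_i = sum(col_i.values()) / len(col_i) if col_i else 0.0
    mean_j = sum(col_j.values()) / len(col_j) if col_j else 0.0
    common = col_i.keys() & col_j.keys()
    num = sum((col_i[u] - mean_i) * (col_j[u] - mean_j) for u in common)
    den = (sqrt(sum((col_i[u] - mean_i) ** 2 for u in common))
           * sqrt(sum((col_j[u] - mean_j) ** 2 for u in common)))
    sim1 = num / den if den else 0.0
    # Sim'_2: cosine similarity between columns i and j of FIS (t frequent itemsets).
    fi = [row[i] for row in fis]
    fj = [row[j] for row in fis]
    norm = sqrt(sum(a * a for a in fi)) * sqrt(sum(b * b for b in fj))
    sim2 = sum(a * b for a, b in zip(fi, fj)) / norm if norm else 0.0
    return beta * sim1 + (1 - beta) * sim2
```

Under this reading, two items that never co-occur in any frequent itemset get $Sim'_2 = 0$, so their combined similarity rests entirely on the rating term weighted by β.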
9. The collaborative filtering recommendation method based on user clustering and item association analysis according to claim 1, wherein in step 9, the nearest-neighbor users of user u and the nearest-neighbor items of item i are determined, user u's predicted ratings for all unrated items are calculated, and Top-K recommendation is performed; user u's predicted rating for an unrated item i is calculated as follows:
First, sort the user similarities calculated by formula (4) to obtain the nearest-neighbor set $N_u$ of user u, and sort the item similarities calculated by formula (8) to obtain the nearest-neighbor set $N_i$ of item i;
Second, calculate user u's predicted rating for the unrated item i as shown in formula (11):
[Formula (11): image not reproduced]
in the above formula, ω is a weight coefficient, $N_u$ is the set of nearest neighbors of user u, $N_i$ is the set of nearest neighbors of item i, $\bar{r}_u$ and $\bar{r}_p$ are the average ratings of user u and user p respectively, $\bar{r}_i$ and $\bar{r}_q$ are the average ratings received by item i and item q respectively, Sim(u, p) is the similarity between user u and user p, and Sim'(i, q) is the similarity between item i and item q. User u's predicted ratings for all unrated items are calculated according to formula (11) and sorted in descending order, and the K items with the highest predicted ratings are selected for Top-K recommendation.
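Formula (11) is also an image. Its listed symbols (ω, $N_u$, $N_i$, the four mean ratings, Sim and Sim') fit a weighted blend of the usual mean-centered user-based and item-based neighborhood predictors, so the sketch below assumes
$\hat r_{ui} = \omega\big[\bar r_u + \tfrac{\sum_{p\in N_u} Sim(u,p)(r_{pi}-\bar r_p)}{\sum_{p\in N_u}|Sim(u,p)|}\big] + (1-\omega)\big[\bar r_i + \tfrac{\sum_{q\in N_i} Sim'(i,q)(r_{uq}-\bar r_q)}{\sum_{q\in N_i}|Sim'(i,q)|}\big]$
and then performs the Top-K ranking described above. Restricting $N_u$ to users who rated i and $N_i$ to items u rated, the neighborhood size, and all function names are assumptions.

```python
import heapq
from typing import Callable, List, Tuple

Sim = Callable[[int, int], float]


def _mean_rated(values: List[float]) -> float:
    rated = [v for v in values if v > 0]  # 0 means "not rated"
    return sum(rated) / len(rated) if rated else 0.0


def _offset(pairs: List[Tuple[float, float]]) -> float:
    """Similarity-weighted mean deviation; 0 when there are no usable neighbors."""
    den = sum(abs(s) for s, _ in pairs)
    return sum(s * dev for s, dev in pairs) / den if den else 0.0


def predict(ui: List[List[float]], u: int, i: int, user_sim: Sim, item_sim: Sim,
            n_neighbors: int = 2, omega: float = 0.5) -> float:
    """ASSUMED form of formula (11): omega * user-based + (1 - omega) * item-based estimate."""
    n, m = len(ui), len(ui[0])
    user_mean = [_mean_rated(row) for row in ui]
    item_mean = [_mean_rated([row[q] for row in ui]) for q in range(m)]
    # N_u: most similar users who rated i; N_i: most similar items that u rated.
    n_u = heapq.nlargest(n_neighbors, (p for p in range(n) if p != u and ui[p][i] > 0),
                         key=lambda p: user_sim(u, p))
    n_i = heapq.nlargest(n_neighbors, (q for q in range(m) if q != i and ui[u][q] > 0),
                         key=lambda q: item_sim(i, q))
    user_part = user_mean[u] + _offset([(user_sim(u, p), ui[p][i] - user_mean[p]) for p in n_u])
    item_part = item_mean[i] + _offset([(item_sim(i, q), ui[u][q] - item_mean[q]) for q in n_i])
    return omega * user_part + (1 - omega) * item_part


def top_k(ui: List[List[float]], u: int, user_sim: Sim, item_sim: Sim,
          k: int = 10, **kwargs) -> List[int]:
    """Predict every item u has not rated, sort in descending order, keep the K best."""
    unrated = [i for i, r in enumerate(ui[u]) if r == 0]
    scores = {i: predict(ui, u, i, user_sim, item_sim, **kwargs) for i in unrated}
    return sorted(unrated, key=lambda i: scores[i], reverse=True)[:k]
```

Passing the similarity functions as callables keeps the sketch independent of how formulas (4) and (8) are computed; in practice they would wrap the user and item similarity routines, ideally with cached results.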
CN202010278287.XA 2020-04-10 2020-04-10 Collaborative filtering recommendation method based on user clustering and project association analysis Withdrawn CN112100512A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010278287.XA CN112100512A (en) 2020-04-10 2020-04-10 Collaborative filtering recommendation method based on user clustering and project association analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010278287.XA CN112100512A (en) 2020-04-10 2020-04-10 Collaborative filtering recommendation method based on user clustering and project association analysis

Publications (1)

Publication Number Publication Date
CN112100512A CN112100512A (en) 2020-12-18

Family

ID=73749592

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010278287.XA Withdrawn CN112100512A (en) 2020-04-10 2020-04-10 Collaborative filtering recommendation method based on user clustering and project association analysis

Country Status (1)

Country Link
CN (1) CN112100512A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112052392A (en) * 2020-09-10 2020-12-08 江苏电力信息技术有限公司 Online service recommendation method based on LFM model
CN113076478A (en) * 2021-04-14 2021-07-06 同济大学 Technical resource and service recommendation system based on hybrid recommendation algorithm
CN113094542A (en) * 2021-03-24 2021-07-09 西安交通大学 Set ordering music recommendation method aiming at user implicit feedback data
CN113221003A (en) * 2021-05-20 2021-08-06 北京建筑大学 Mixed filtering recommendation method and system based on dual theory
CN113704608A (en) * 2021-08-26 2021-11-26 武汉卓尔数字传媒科技有限公司 Personalized item recommendation method and device, electronic equipment and storage medium
CN114461899A (en) * 2021-12-24 2022-05-10 新奥新智科技有限公司 Collaborative filtering recommendation method and device for user, electronic equipment and storage medium
CN114638443A (en) * 2022-05-19 2022-06-17 安徽数智建造研究院有限公司 Construction equipment intelligent type selection and allocation method based on improved genetic algorithm
CN115713432A (en) * 2022-09-21 2023-02-24 湖南科技大学 Production element-oriented service recommendation method in industrial Internet environment
CN117952726A (en) * 2024-03-27 2024-04-30 摘星社信息科技(浙江)股份有限公司 Personalized equity package recommendation system based on operator data analysis

Cited By (11)

Publication number Priority date Publication date Assignee Title
CN112052392A (en) * 2020-09-10 2020-12-08 江苏电力信息技术有限公司 Online service recommendation method based on LFM model
CN113094542A (en) * 2021-03-24 2021-07-09 西安交通大学 Set ordering music recommendation method aiming at user implicit feedback data
CN113094542B (en) * 2021-03-24 2023-08-15 西安交通大学 Set ordering music recommendation method for implicit feedback data of user
CN113076478A (en) * 2021-04-14 2021-07-06 同济大学 Technical resource and service recommendation system based on hybrid recommendation algorithm
CN113221003A (en) * 2021-05-20 2021-08-06 北京建筑大学 Mixed filtering recommendation method and system based on dual theory
CN113704608A (en) * 2021-08-26 2021-11-26 武汉卓尔数字传媒科技有限公司 Personalized item recommendation method and device, electronic equipment and storage medium
CN114461899A (en) * 2021-12-24 2022-05-10 新奥新智科技有限公司 Collaborative filtering recommendation method and device for user, electronic equipment and storage medium
CN114638443A (en) * 2022-05-19 2022-06-17 安徽数智建造研究院有限公司 Construction equipment intelligent type selection and allocation method based on improved genetic algorithm
CN114638443B (en) * 2022-05-19 2022-08-23 安徽数智建造研究院有限公司 Construction equipment intelligent type selection and allocation method based on improved genetic algorithm
CN115713432A (en) * 2022-09-21 2023-02-24 湖南科技大学 Production element-oriented service recommendation method in industrial Internet environment
CN117952726A (en) * 2024-03-27 2024-04-30 摘星社信息科技(浙江)股份有限公司 Personalized equity package recommendation system based on operator data analysis

Similar Documents

Publication Publication Date Title
CN112100512A (en) Collaborative filtering recommendation method based on user clustering and project association analysis
CN106844787B (en) Recommendation method for searching target users and matching target products for automobile industry
CN107833117B (en) Bayesian personalized sorting recommendation method considering tag information
US20080208652A1 (en) Method and system utilizing online analytical processing (olap) for making predictions about business locations
CN107220365A (en) Accurate commending system and method based on collaborative filtering and correlation rule parallel processing
CN109710835B (en) Heterogeneous information network recommendation method with time weight
Cintia Ganesha Putri et al. Design of an unsupervised machine learning-based movie recommender system
CN114880486A (en) Industry chain identification method and system based on NLP and knowledge graph
CN108563690A (en) A kind of collaborative filtering recommending method based on object-oriented cluster
WO2020095357A1 (en) Search needs assessment device, search needs assessment system, and search needs assessment method
CN105868422B (en) A kind of collaborative filtering recommending method based on elastic dimensional feature vector Optimizing Extraction
CN116431931A (en) Real-time incremental data statistical analysis method
Fareed et al. A collaborative filtering recommendation framework utilizing social networks
Zheng et al. Graph-convolved factorization machines for personalized recommendation
CN115829683A (en) Power integration commodity recommendation method and system based on inverse reward learning optimization
CN111612583A (en) Individualized shopping guide system based on clustering
Agustyaningrum et al. Online shopper intention analysis using conventional machine learning and deep neural network classification algorithm
Alsalama A hybrid recommendation system based on association rules
Chou et al. The RFM Model Analysis for VIP Customer: A case study of golf clothing brand
Lu et al. Artificial immune network with feature selection for bank term deposit recommendation
Sun et al. A Dynamic Collaborative Filtering Algorithm based on Convolutional Neural Networks and Multi-layer Perceptron
CN114429384A (en) Intelligent product recommendation method and system based on e-commerce platform
CN115114517A (en) Collaborative filtering recommendation algorithm based on user attributes and item scores
CN114238758A (en) User portrait prediction method based on multi-source cross-border data fusion
Wang et al. NAUI: Neural attentive user interest model for cross-domain CTR prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20201218
