CN112100512A - Collaborative filtering recommendation method based on user clustering and project association analysis - Google Patents
- Publication number
- CN112100512A (application CN202010278287.XA)
- Authority
- CN
- China
- Prior art keywords
- user
- item
- matrix
- similarity
- preference
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/12—Computing arrangements based on biological models using genetic models
- G06N3/126—Evolutionary algorithms, e.g. genetic algorithms or genetic programming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0631—Item recommendations
Abstract
The invention discloses a collaborative filtering recommendation method based on user clustering and project association analysis, which addresses the cold start, data sparsity, and low recommendation accuracy problems of traditional collaborative filtering recommendation algorithms. The method uses an improved fuzzy C-means clustering algorithm to mine users' preference degrees for hidden features, and uses an association analysis strategy based on pre-judgment screening to mine frequent item sets. On this basis, the method calculates the similarity between users from the user feature preference matrix and the user scoring matrix, calculates the similarity between items from the frequent item set matrix and the user scoring matrix, and combines the user similarity and the item similarity to calculate the user's prediction scores for unscored items, thereby realizing Top-K recommendation. Compared with the traditional user-based and item-based collaborative filtering recommendation algorithms, the method can effectively avoid the cold start problem and the data sparsity problem, and has better recommendation quality.
Description
Technical Field
The invention relates to a collaborative filtering recommendation method, in particular to a collaborative filtering recommendation method based on user clustering and project association analysis, and belongs to the technical field of computer data mining and information processing.
Background
With the rapid development of e-commerce, the variety and quantity of commodities offered by e-commerce platforms have grown rapidly, and an era of commodity information overload has arrived. Faced with massive commodity information, a user with clear requirements can locate the commodity to be purchased through the search function provided by the platform. However, when the user's needs are uncertain or ambiguous and are hard to express as search keywords, helping the user quickly find commodities of interest becomes very important. The recommendation system emerged as an effective information processing tool: it associates users with commodities through the users' historical behavior information, thereby alleviating the information overload problem. Recommendation systems have been successfully applied in many fields such as e-commerce, online music, video websites, and social platforms. According to Amazon's statistics, only 16% of the customers who shop on its website have a clear purchasing intention, while 20% to 30% of its sales are generated by the recommendation system.
The recommendation algorithm is an important component of the recommendation system and largely determines its performance. Common recommendation algorithms include demographic-based, content-based, association-rule-based, collaborative filtering, and hybrid recommendation algorithms. Collaborative filtering is currently one of the most mature and widely applied personalized recommendation technologies, and mainly comprises the user-based collaborative filtering recommendation algorithm and the item-based collaborative filtering recommendation algorithm. However, these two algorithms, and most of the improved algorithms derived from them, suffer from cold start, data sparsity, and low recommendation accuracy.
Disclosure of Invention
To address the cold start, data sparsity, and low recommendation accuracy problems of traditional collaborative filtering recommendation algorithms, a collaborative filtering recommendation method based on user clustering and project association analysis is disclosed. As shown in FIG. 2, the method comprises the following steps:
Step 1, data preprocessing: extract the user-item scoring data and the item feature data from the raw data and clean them to obtain a data set in a specific format, then construct the user-item scoring matrix UI_{n×m} and the item feature membership matrix IF_{m×k}, where the number of features k is usually much smaller than the number of items m;
Step 2, construct the user feature preference matrix: use the user-item scoring matrix and the item feature membership matrix to construct the user feature preference matrix UFP_{n×k}; its dimensionality is much lower than that of the user-item scoring matrix, which helps reduce the time and space complexity of the recommendation algorithm;
Step 3, perform min-max normalization on the UFP matrix, mapping each element value of the matrix to the interval [0, 1];
Step 4, realize user clustering through the FCM algorithm, fusing a genetic algorithm with the FCM algorithm so that FCM converges quickly and efficiently and avoids falling into a local optimum;
Step 5, calculate the user similarity by combining the user feature preference matrix and the user-item scoring matrix, so that the user similarity contains the explicit information of the original user-item scoring matrix and also reflects the implicit information of the users' preferences for item features;
Step 6, generate the transaction data set D from the user-item scoring matrix UI_{n×m};
Step 7, for the transaction data set D, generate the frequent item sets using a frequent item set mining strategy based on pre-judgment screening, and construct the frequent item set matrix FIS_{t×m};
Step 8, calculate the item similarity by combining the frequent item set matrix and the user-item scoring matrix, so that the item similarity contains the users' explicit scoring information for the items and also reflects the internal associations among the items;
Step 9, determine the nearest-neighbor users of user u and the nearest-neighbor items of item i, and perform Top-K recommendation by combining the user similarity and the item similarity.
Further, in step 2, the user feature preference matrix UFP_{n×k} is constructed from the user-item scoring matrix UI_{n×m} and the item feature membership matrix IF_{m×k}; the element R_{ui} of the user feature preference matrix is calculated as shown in the following formula (1):
wherein r_u = (r_{u1}, r_{u2}, r_{u3}, ..., r_{um}) is the vector of user u's scores for the m items, and f_i = (f_{1i}, f_{2i}, f_{3i}, ..., f_{mi}) is the membership vector of the m items with respect to feature i. The construction process is shown in FIG. 1.
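The image of formula (1) is not reproduced in this text, so the following Python sketch rests on an assumption: R_{ui} is taken to be the plain dot product of the score vector r_u and the feature membership vector f_i. The function name `build_ufp` and the sample matrices are hypothetical; the patent's formula may apply additional weighting or normalization.

```python
from typing import List

Matrix = List[List[float]]


def build_ufp(ui: Matrix, item_features: Matrix) -> Matrix:
    """Return the n x k user feature preference matrix.

    ui            : n x m user-item scoring matrix (0 means "not scored")
    item_features : m x k item feature membership matrix (1 = item has feature)
    Assumption: R_ui is the dot product of r_u with column i of item_features.
    """
    n_features = len(item_features[0])
    return [
        [sum(score * row[f] for score, row in zip(scores, item_features))
         for f in range(n_features)]
        for scores in ui
    ]


ui = [[5, 0, 3],
      [0, 4, 1]]
item_features = [[1, 0],
                 [0, 1],
                 [1, 1]]
print(build_ufp(ui, item_features))  # [[8, 3], [1, 5]]
```

With n users, m items, and k features, the result has n × k entries instead of n × m, which is the dimensionality reduction that step 2 relies on.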
Further, in step 3, min-max normalization is performed on the user feature preference matrix UFP, mapping each element value of the matrix to the interval [0, 1], as shown in the following formula (2):
$$x'_{ij} = \frac{x_{ij} - x_{\min}}{x_{\max} - x_{\min}} \qquad (2)$$
wherein x'_{ij} is the normalized value, x_{ij} is the element in row i and column j of the user feature preference matrix and represents the preference degree of user i for item feature j, x_{min} is the minimum of all users' preference degrees for the item feature, and x_{max} is the maximum of all users' preference degrees for the item feature.
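A minimal Python sketch of the min-max mapping in step 3 follows. It assumes that x_{min} and x_{max} are taken per feature (per column) across all users; if a single global minimum and maximum is intended, the two list comprehensions would be replaced by one global `min` and `max`. The function name and the handling of constant columns (mapped to 0.0) are illustrative choices, not taken from the patent.

```python
from typing import List


def min_max_normalize(ufp: List[List[float]]) -> List[List[float]]:
    """Map every column of the matrix to [0, 1] by min-max scaling."""
    n_cols = len(ufp[0])
    mins = [min(row[j] for row in ufp) for j in range(n_cols)]
    maxs = [max(row[j] for row in ufp) for j in range(n_cols)]
    return [
        [(row[j] - mins[j]) / (maxs[j] - mins[j]) if maxs[j] > mins[j] else 0.0
         for j in range(n_cols)]
        for row in ufp
    ]


print(min_max_normalize([[8, 3], [1, 5], [4.5, 4]]))
# [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
```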
Further, in step 4, user clustering is realized through the FCM algorithm, and a genetic algorithm is fused with the FCM algorithm so that FCM converges quickly and efficiently and avoids falling into a local optimum. The sub-steps are as follows:
(1) Parameter initialization: initialize the relevant parameters, including the population size M, the crossover probability P_c, the mutation probability P_m, the maximum number of iterations t_max, the number of clusters c, the value of the membership factor m, and the convergence precision;
(2) Encoding and population initialization: encode according to the formula and randomly generate a population X, in which n research objects serve as the initial individuals, namely X = [x_1, x_2, x_3, ..., x_n];
(3) Calculate the individual fitness fit_m, as shown in the following formula (3):
In the above formula, c_j (j = 1, 2, 3, ..., k) is the center of each cluster, and μ_{i,j} is the membership of the ith sample in the jth class;
(4) Apply selection, crossover, and mutation to the current population to generate a new generation of individuals;
(5) If t = t_max, the genetic algorithm terminates, the final data are output, and the process goes to sub-step (6); otherwise, let t = t + 1 and return to sub-step (3);
(6) Fuzzily partition the whole data set according to the global optimal solution and output the cluster center matrix, thereby realizing user clustering.
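The sub-steps above can be sketched in Python as follows. This is only an illustration under several assumptions, because the encoding formula and the fitness formula (3) are not reproduced in this text: an individual is encoded as a list of c candidate cluster centers, fitness is taken as 1 / (1 + J_m) where J_m is the standard FCM objective, selection is truncation selection with elitism, crossover is one-point crossover over the center list, and mutation adds Gaussian noise. All function names and default values are hypothetical.

```python
import random
from typing import List, Tuple

Point = List[float]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(data: List[Point], centers: List[Point], m: float = 2.0) -> List[List[float]]:
    """Standard FCM membership of every sample in every cluster."""
    power = 1.0 / (m - 1.0)
    rows = []
    for x in data:
        d = [max(sq_dist(x, c), 1e-12) for c in centers]  # squared distances
        rows.append([1.0 / sum((dj / dl) ** power for dl in d) for dj in d])
    return rows


def objective(data: List[Point], centers: List[Point], m: float = 2.0) -> float:
    """FCM objective J_m = sum_i sum_j u_ij^m * ||x_i - c_j||^2."""
    u = memberships(data, centers, m)
    return sum(u[i][j] ** m * sq_dist(x, c)
               for i, x in enumerate(data) for j, c in enumerate(centers))


def ga_fcm(data: List[Point], c: int, pop_size: int = 20, p_c: float = 0.8,
           p_m: float = 0.1, t_max: int = 30, m: float = 2.0,
           seed: int = 0) -> Tuple[List[Point], List[List[float]]]:
    rng = random.Random(seed)

    def fitness(ind: List[Point]) -> float:  # assumed form of formula (3)
        return 1.0 / (1.0 + objective(data, ind, m))

    def clone(ind: List[Point]) -> List[Point]:
        return [list(center) for center in ind]

    # Sub-step (2): each individual holds c centers drawn from the samples.
    pop = [[list(rng.choice(data)) for _ in range(c)] for _ in range(pop_size)]
    for _ in range(t_max):                      # sub-steps (3) to (5)
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: max(2, pop_size // 2)]   # truncation selection
        children = [clone(ranked[0])]               # keep the best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, c) if c > 1 and rng.random() < p_c else c
            child = clone(a[:cut] + b[cut:])        # one-point crossover
            for center in child:                    # Gaussian mutation
                for k in range(len(center)):
                    if rng.random() < p_m:
                        center[k] += rng.gauss(0.0, 0.05)
            children.append(child)
        pop = children
    best = max(pop, key=fitness)                # sub-step (6)
    return best, memberships(data, best, m)


data = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05], [0.9, 1.0], [1.0, 0.9], [0.95, 0.95]]
centers, u = ga_fcm(data, c=2)
print([row.index(max(row)) for row in u])  # hard labels derived from the fuzzy memberships
```

In the method itself the rows of `data` would be the normalized rows of the UFP matrix, one per user. The mutation scale of 0.05 presumes inputs already mapped to [0, 1] by step 3.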
Further, in step 5, the user similarity is calculated by combining the user feature preference matrix and the user-item scoring matrix, so that the user similarity contains the explicit information of the original user-item scoring matrix and also reflects the implicit information of the users' preferences for item features, as shown in the following formula (4):
Sim(u, v) = λ Sim_1(u, v) + (1 - λ) Sim_2(u, v)    (4)
wherein λ is a weight factor with value range (0, 1), and Sim(u, v) is the comprehensive similarity of user u and user v; Sim_1(u, v) is the similarity obtained from the original user-item scoring matrix, calculated as shown in the following formula (5):
wherein I_{uv} is the set of items scored by both user u and user v; r_{ui} is user u's score for item i; \bar{r}_u is the average of all scores of user u; Sim_2(u, v) is the similarity obtained from the user feature preference matrix, calculated as shown in the following formula (6):
wherein F_{uv} is the set of features preferred by both user u and user v, R_{ui} is the preference degree of user u for feature i, R_{vi} is the preference degree of user v for feature i, \bar{R}_u is the average of user u's preference degrees over all features, and \bar{R}_v is the average of user v's preference degrees over all features.
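Formula (4) can be sketched in Python as shown below. The images of formulas (5) and (6) are not reproduced in this text; since their symbol definitions (a common set, individual values, and per-user means) match the Pearson correlation coefficient, the sketch assumes Pearson correlation for both Sim_1 and Sim_2. The function names, the treatment of 0 as "not scored", and the default λ = 0.5 are assumptions.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float], idx: List[int],
            mean_x: float, mean_y: float) -> float:
    """Pearson-style correlation restricted to the positions in idx."""
    num = sum((x[i] - mean_x) * (y[i] - mean_y) for i in idx)
    den = (sqrt(sum((x[i] - mean_x) ** 2 for i in idx))
           * sqrt(sum((y[i] - mean_y) ** 2 for i in idx)))
    return num / den if den else 0.0


def mean_nonzero(values: Sequence[float]) -> float:
    rated = [v for v in values if v != 0]
    return sum(rated) / len(rated) if rated else 0.0


def user_similarity(r_u, r_v, f_u, f_v, lam: float = 0.5) -> float:
    """Sim(u, v) = lam * Sim_1 + (1 - lam) * Sim_2, as in formula (4)."""
    co_rated = [i for i in range(len(r_u)) if r_u[i] != 0 and r_v[i] != 0]  # I_uv
    sim1 = pearson(r_u, r_v, co_rated, mean_nonzero(r_u), mean_nonzero(r_v))
    shared = [i for i in range(len(f_u)) if f_u[i] != 0 and f_v[i] != 0]    # F_uv
    sim2 = pearson(f_u, f_v, shared,
                   sum(f_u) / len(f_u), sum(f_v) / len(f_v))
    return lam * sim1 + (1 - lam) * sim2


# r_* are rows of the scoring matrix, f_* are rows of the normalized UFP matrix.
print(user_similarity([5, 3, 0, 1], [4, 2, 5, 1], [0.2, 0.8, 0.5], [0.3, 0.9, 0.1]))
```

Because Sim_2 is computed on the dense n × k preference matrix, it can still separate two users who have few or no co-rated items, which is where Sim_1 alone degrades.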
Further, in step 6, the transaction data set D is generated from the user-item scoring matrix UI_{n×m}: if user u has scored item i, i.e., r_{u,i} ≠ 0, item i is added to the transaction corresponding to user u.
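As a minimal Python sketch of step 6, each row of the scoring matrix becomes one transaction listing the indices of the items that the user has scored. The function name and the use of zero-based item indices are illustrative assumptions.

```python
from typing import List


def build_transactions(ui: List[List[float]]) -> List[List[int]]:
    """One transaction per user: the indices of all items with a non-zero score."""
    return [[i for i, score in enumerate(row) if score != 0] for row in ui]


print(build_transactions([[5, 0, 3], [0, 4, 1], [0, 0, 0]]))
# [[0, 2], [1, 2], []]
```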
Further, in step 7, for the transaction data set D, the frequent item set mining strategy based on pre-judgment screening proposed by Zhao Zhi et al. (Journal of Electronics & Information Technology, 2016, 38(7), 1654-) is used to generate the collection of frequent item sets S_{FI} = (FS_1, FS_2, …, FS_t), where FS denotes a frequent item set and t the number of frequent item sets, and the frequent item set matrix FIS_{t×m} is constructed as shown in the following formula (7):
$$F_{ij} = \begin{cases} 1, & \text{item } j \in FS_i \\ 0, & \text{otherwise} \end{cases} \qquad (7)$$
In the above formula, F_{ij} is the element in row i and column j of the frequent item set matrix FIS_{t×m}, i ∈ (0, t), j ∈ (0, m). An example of the frequent item set matrix FIS_{t×m} is shown below.
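The following Python sketch illustrates how a frequent item set matrix of this shape can be built. The pre-judgment screening miner cited in step 7 is not described in this text, so `frequent_itemsets` is a brute-force stand-in that simply counts support; it is not the cited strategy. Restricting the search to item sets of size 2 to `max_size` and using an absolute support count are further assumptions.

```python
from itertools import combinations
from typing import List, Tuple


def frequent_itemsets(transactions: List[List[int]], min_support: int,
                      max_size: int = 3) -> List[Tuple[int, ...]]:
    """Brute-force stand-in miner: item sets contained in >= min_support transactions."""
    items = sorted({i for t in transactions for i in t})
    tx_sets = [set(t) for t in transactions]
    found = []
    for size in range(2, max_size + 1):
        for candidate in combinations(items, size):
            if sum(1 for t in tx_sets if t.issuperset(candidate)) >= min_support:
                found.append(candidate)
    return found


def build_fis(itemsets: List[Tuple[int, ...]], n_items: int) -> List[List[int]]:
    """F[s][j] = 1 if item j belongs to the s-th frequent item set, else 0."""
    return [[1 if j in fs else 0 for j in range(n_items)] for fs in itemsets]


transactions = [[0, 1, 2], [0, 1], [1, 2], [0, 1, 2]]
fsets = frequent_itemsets(transactions, min_support=3)
print(fsets)                # [(0, 1), (1, 2)]
print(build_fis(fsets, 3))  # [[1, 1, 0], [0, 1, 1]]
```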
Further, in step 8, the item similarity is calculated by combining the frequent item set matrix and the user-item scoring matrix, so that the item similarity contains the users' explicit scoring information for the items and also reflects the internal associations among the items, as shown in the following formula (8):
Sim′(i, j) = β Sim′_1(i, j) + (1 - β) Sim′_2(i, j)    (8)
wherein β is a weight factor with value range (0, 1), and Sim′(i, j) is the comprehensive similarity of item i and item j;
Sim′_1(i, j) is the item similarity obtained from the original user-item scoring matrix, calculated as shown in the following formula (9):
wherein U_{ij} is the set of users who have scored both item i and item j; r_{ui} is user u's score for item i; \bar{r}_i is the average score of item i; Sim′_2(i, j) is the item similarity obtained from the frequent item set matrix, calculated as shown in the following formula (10):
wherein t is the number of frequent item sets, and F_{si} indicates whether item i is included in the s-th frequent item set.
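A Python sketch of formula (8) is given below. Formulas (9) and (10) are not reproduced in this text, so both components are assumptions: Sim′_1 is taken as a Pearson-style correlation over the users in U_{ij} using item means, and Sim′_2 is taken as the cosine similarity between columns i and j of the frequent item set matrix. The function name and the default β = 0.5 are hypothetical.

```python
from math import sqrt
from typing import List


def item_similarity(ui: List[List[float]], fis: List[List[int]],
                    i: int, j: int, beta: float = 0.5) -> float:
    """Sim'(i, j) = beta * Sim'_1 + (1 - beta) * Sim'_2, as in formula (8)."""
    def column(matrix, c):
        return [row[c] for row in matrix]

    def mean_nonzero(values):
        rated = [v for v in values if v != 0]
        return sum(rated) / len(rated) if rated else 0.0

    # Sim'_1: correlation over the users who scored both items (assumed form).
    r_i, r_j = column(ui, i), column(ui, j)
    users = [u for u in range(len(ui)) if r_i[u] != 0 and r_j[u] != 0]  # U_ij
    m_i, m_j = mean_nonzero(r_i), mean_nonzero(r_j)
    num = sum((r_i[u] - m_i) * (r_j[u] - m_j) for u in users)
    den = (sqrt(sum((r_i[u] - m_i) ** 2 for u in users))
           * sqrt(sum((r_j[u] - m_j) ** 2 for u in users)))
    sim1 = num / den if den else 0.0

    # Sim'_2: cosine similarity of the two 0/1 columns of FIS (assumed form).
    f_i, f_j = column(fis, i), column(fis, j)
    together = sum(a * b for a, b in zip(f_i, f_j))
    norm = sqrt(sum(f_i)) * sqrt(sum(f_j))  # valid because entries are 0 or 1
    sim2 = together / norm if norm else 0.0

    return beta * sim1 + (1 - beta) * sim2


ui = [[5, 4, 0], [3, 2, 1], [1, 0, 5], [4, 5, 2]]
fis = [[1, 1, 0], [0, 1, 1]]
print(item_similarity(ui, fis, 0, 1))
```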
Further, in step 9, the nearest-neighbor users of user u and the nearest-neighbor items of item i are determined, the prediction scores of user u for all unscored items are calculated, and Top-K recommendation is performed. The prediction score of user u for an unscored item i is calculated as follows:
(1) Sort the user similarities calculated according to formula (4) to obtain the nearest-neighbor set N_u of user u, and sort the item similarities calculated according to formula (8) to obtain the nearest-neighbor set N_i of item i;
(2) Calculate the prediction score of user u for the unscored item i, as shown in the following formula (11):
In the above formula, ω is a weight coefficient, N_u is the nearest-neighbor set of user u, N_i is the nearest-neighbor set of item i, \bar{r}_u and \bar{r}_p are the average scores of user u and user p respectively, \bar{r}_i and \bar{r}_q are the average scores obtained by item i and item q respectively, Sim(u, p) is the similarity between user u and user p, and Sim′(i, q) is the similarity between item i and item q. The prediction scores of user u for all unscored items are calculated according to formula (11) and sorted in descending order, and the K items with the highest prediction scores are selected for Top-K recommendation.
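The prediction and Top-K selection can be sketched in Python as follows. Formula (11) is not reproduced in this text; based on the symbols it defines, the sketch assumes the common mean-centered form, in which a user-based estimate and an item-based estimate are each a mean plus a similarity-weighted average of deviations, blended by ω. Restricting N_u to users who scored item i and N_i to items that user u scored, as well as all names and defaults, are assumptions.

```python
from typing import List


def mean_rated(values: List[float]) -> float:
    rated = [v for v in values if v != 0]
    return sum(rated) / len(rated) if rated else 0.0


def predict(ui, u, i, user_sim, item_sim, k_neighbors=2, omega=0.5) -> float:
    """Blend a user-based and an item-based estimate of user u's score for item i."""
    n_users, n_items = len(ui), len(ui[0])
    user_mean = [mean_rated(row) for row in ui]
    item_mean = [mean_rated([ui[p][q] for p in range(n_users)]) for q in range(n_items)]
    # N_u: most similar users who scored item i; N_i: most similar items scored by u.
    n_u = sorted((p for p in range(n_users) if p != u and ui[p][i] != 0),
                 key=lambda p: user_sim[u][p], reverse=True)[:k_neighbors]
    n_i = sorted((q for q in range(n_items) if q != i and ui[u][q] != 0),
                 key=lambda q: item_sim[i][q], reverse=True)[:k_neighbors]

    def centred(base, pairs):
        den = sum(abs(sim) for sim, _ in pairs)
        return base + (sum(sim * dev for sim, dev in pairs) / den if den else 0.0)

    by_users = centred(user_mean[u],
                       [(user_sim[u][p], ui[p][i] - user_mean[p]) for p in n_u])
    by_items = centred(item_mean[i],
                       [(item_sim[i][q], ui[u][q] - item_mean[q]) for q in n_i])
    return omega * by_users + (1 - omega) * by_items


def top_k(ui, u, user_sim, item_sim, k=2, **kwargs) -> List[int]:
    """Indices of the k unscored items with the highest predicted score."""
    unrated = [i for i, score in enumerate(ui[u]) if score == 0]
    scored = [(predict(ui, u, i, user_sim, item_sim, **kwargs), i) for i in unrated]
    return [i for _, i in sorted(scored, reverse=True)[:k]]


ui = [[5, 0, 0], [5, 4, 1], [4, 5, 2]]
user_sim = [[1.0, 0.9, 0.8], [0.9, 1.0, 0.7], [0.8, 0.7, 1.0]]
item_sim = [[1.0, 0.5, 0.1], [0.5, 1.0, 0.2], [0.1, 0.2, 1.0]]
print(top_k(ui, 0, user_sim, item_sim, k=2))  # [1, 2]
```

In this toy data both neighbors score item 1 above their own average and item 2 below it, so item 1 is ranked first for user 0.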
Advantageous Effects
The invention calculates the similarity between users from the user feature preference matrix and the user scoring matrix, calculates the similarity between items from the frequent item set matrix and the user scoring matrix, and combines the user similarity and the item similarity to calculate the user's prediction scores for unscored items, thereby realizing Top-K recommendation. Compared with the traditional user-based and item-based collaborative filtering recommendation algorithms, the method can effectively avoid the cold start problem and the data sparsity problem, and has better recommendation quality.
Drawings
FIG. 1 is a schematic diagram of a user characteristic preference matrix according to the present invention.
FIG. 2 is a flow chart of the present invention.
Detailed Description
The embodiment provides a collaborative filtering recommendation method based on user clustering and project association analysis, which comprises the following steps:
Step 1, data preprocessing: extract the user-item scoring data and the item feature data from the raw data and clean them to obtain a data set in a specific format, then construct the user-item scoring matrix UI_{n×m} and the item feature membership matrix IF_{m×k}, where the number of features k is usually much smaller than the number of items m;
Step 2, construct the user feature preference matrix: use the user-item scoring matrix and the item feature membership matrix to construct the user feature preference matrix UFP_{n×k}; its dimensionality is much lower than that of the user-item scoring matrix, which helps reduce the time and space complexity of the recommendation algorithm;
Step 3, perform min-max normalization on the UFP matrix, mapping each element value of the matrix to the interval [0, 1];
Step 4, realize user clustering through the FCM algorithm, fusing a genetic algorithm with the FCM algorithm so that FCM converges quickly and efficiently and avoids falling into a local optimum;
Step 5, calculate the user similarity by combining the user feature preference matrix and the user-item scoring matrix, so that the user similarity contains the explicit information of the original user-item scoring matrix and also reflects the implicit information of the users' preferences for item features;
Step 6, generate the transaction data set D from the user-item scoring matrix UI_{n×m};
Step 7, for the transaction data set D, generate the frequent item sets using a frequent item set mining strategy based on pre-judgment screening, and construct the frequent item set matrix FIS_{t×m};
Step 8, calculate the item similarity by combining the frequent item set matrix and the user-item scoring matrix, so that the item similarity contains the users' explicit scoring information for the items and also reflects the internal associations among the items;
Step 9, determine the nearest-neighbor users of user u and the nearest-neighbor items of item i, and perform Top-K recommendation by combining the user similarity and the item similarity.
Further, in step 2, the user feature preference matrix UFP_{n×k} is constructed from the user-item scoring matrix UI_{n×m} and the item feature membership matrix IF_{m×k}; the element R_{ui} of the user feature preference matrix is calculated as shown in the following formula (1):
wherein r_u = (r_{u1}, r_{u2}, r_{u3}, ..., r_{um}) is the vector of user u's scores for the m items, and f_i = (f_{1i}, f_{2i}, f_{3i}, ..., f_{mi}) is the membership vector of the m items with respect to feature i. The construction process is shown in FIG. 1.
Further, in step 3, min-max normalization is performed on the user feature preference matrix UFP, mapping each element value of the matrix to the interval [0, 1], as shown in the following formula (2):
$$x'_{ij} = \frac{x_{ij} - x_{\min}}{x_{\max} - x_{\min}} \qquad (2)$$
wherein x'_{ij} is the normalized value, x_{ij} is the element in row i and column j of the user feature preference matrix and represents the preference degree of user i for item feature j, x_{min} is the minimum of all users' preference degrees for the item feature, and x_{max} is the maximum of all users' preference degrees for the item feature.
Further, in step 4, user clustering is realized through the FCM algorithm, and a genetic algorithm is fused with the FCM algorithm so that FCM converges quickly and efficiently and avoids falling into a local optimum. The sub-steps are as follows:
(1) Parameter initialization: initialize the relevant parameters, including the population size M, the crossover probability P_c, the mutation probability P_m, the maximum number of iterations t_max, the number of clusters c, the value of the membership factor m, and the convergence precision;
(2) Encoding and population initialization: encode according to the formula and randomly generate a population X, in which n research objects serve as the initial individuals, namely X = [x_1, x_2, x_3, ..., x_n];
(3) Calculate the individual fitness fit_m, as shown in the following formula (3):
In the above formula, c_j (j = 1, 2, 3, ..., k) is the center of each cluster, and μ_{i,j} is the membership of the ith sample in the jth class;
(4) Apply selection, crossover, and mutation to the current population to generate a new generation of individuals;
(5) If t = t_max, the genetic algorithm terminates, the final data are output, and the process goes to sub-step (6); otherwise, let t = t + 1 and return to sub-step (3);
(6) Fuzzily partition the whole data set according to the global optimal solution and output the cluster center matrix, thereby realizing user clustering.
Further, in step 5, the user similarity is calculated by combining the user feature preference matrix and the user-item scoring matrix, so that the user similarity contains the explicit information of the original user-item scoring matrix and also reflects the implicit information of the users' preferences for item features, as shown in the following formula (4):
Sim(u, v) = λ Sim_1(u, v) + (1 - λ) Sim_2(u, v)    (4)
wherein λ is a weight factor with value range (0, 1), and Sim(u, v) is the comprehensive similarity of user u and user v; Sim_1(u, v) is the similarity obtained from the original user-item scoring matrix, calculated as shown in the following formula (5):
wherein I_{uv} is the set of items scored by both user u and user v; r_{ui} is user u's score for item i; \bar{r}_u is the average of all scores of user u; Sim_2(u, v) is the similarity obtained from the user feature preference matrix, calculated as shown in the following formula (6):
wherein F_{uv} is the set of features preferred by both user u and user v, R_{ui} is the preference degree of user u for feature i, R_{vi} is the preference degree of user v for feature i, \bar{R}_u is the average of user u's preference degrees over all features, and \bar{R}_v is the average of user v's preference degrees over all features.
Further, in step 6, the transaction data set D is generated from the user-item scoring matrix UI_{n×m}: if user u has scored item i, i.e., r_{u,i} ≠ 0, item i is added to the transaction corresponding to user u. The transaction data set D is shown in Table 1.
TABLE 1
Further, in step 7, for the transaction data set D, the frequent item set mining strategy based on pre-judgment screening proposed by Zhao Zhi et al. (Journal of Electronics & Information Technology, 2016, 38(7), 1654-) is used to generate the collection of frequent item sets S_{FI} = (FS_1, FS_2, …, FS_t), where FS denotes a frequent item set and t the number of frequent item sets, and the frequent item set matrix FIS_{t×m} is constructed as shown in the following formula (7):
$$F_{ij} = \begin{cases} 1, & \text{item } j \in FS_i \\ 0, & \text{otherwise} \end{cases} \qquad (7)$$
In the above formula, F_{ij} is the element in row i and column j of the frequent item set matrix FIS_{t×m}, i ∈ (0, t), j ∈ (0, m). An example of the frequent item set matrix FIS_{t×m} is shown below.
Further, in step 8, the item similarity is calculated by combining the frequent item set matrix and the user-item scoring matrix, so that the item similarity contains the users' explicit scoring information for the items and also reflects the internal associations among the items, as shown in the following formula (8):
Sim′(i, j) = β Sim′_1(i, j) + (1 - β) Sim′_2(i, j)    (8)
wherein β is a weight factor with value range (0, 1), and Sim′(i, j) is the comprehensive similarity of item i and item j;
Sim′_1(i, j) is the item similarity obtained from the original user-item scoring matrix, calculated as shown in the following formula (9):
wherein U_{ij} is the set of users who have scored both item i and item j; r_{ui} is user u's score for item i; \bar{r}_i is the average score of item i; Sim′_2(i, j) is the item similarity obtained from the frequent item set matrix, calculated as shown in the following formula (10):
wherein t is the number of frequent item sets, and F_{si} indicates whether item i is included in the s-th frequent item set.
Further, in step 9, the nearest-neighbor users of user u and the nearest-neighbor items of item i are determined, the prediction scores of user u for all unscored items are calculated, and Top-K recommendation is performed. The prediction score of user u for an unscored item i is calculated as follows:
(1) Sort the user similarities calculated according to formula (4) to obtain the nearest-neighbor set N_u of user u, and sort the item similarities calculated according to formula (8) to obtain the nearest-neighbor set N_i of item i;
(2) Calculate the prediction score of user u for the unscored item i, as shown in the following formula (11):
In the above formula, ω is a weight coefficient, N_u is the nearest-neighbor set of user u, N_i is the nearest-neighbor set of item i, \bar{r}_u and \bar{r}_p are the average scores of user u and user p respectively, \bar{r}_i and \bar{r}_q are the average scores obtained by item i and item q respectively, Sim(u, p) is the similarity between user u and user p, and Sim′(i, q) is the similarity between item i and item q. The prediction scores of user u for all unscored items are calculated according to formula (11) and sorted in descending order, and the K items with the highest prediction scores are selected for Top-K recommendation.
Claims (9)
1. A collaborative filtering recommendation method based on user clustering and project association analysis is characterized in that:
the method comprises the following steps:
Step 1, data preprocessing: extract the user-item scoring data and the item feature data from the raw data and clean them, then construct the user-item scoring matrix UI_{n×m} and the item feature membership matrix IF_{m×k};
Step 2, construct the user feature preference matrix: use the user-item scoring matrix and the item feature membership matrix to construct the user feature preference matrix UFP_{n×k};
Step 3, perform min-max normalization on the UFP matrix, mapping each element value of the matrix to the interval [0, 1];
Step 4, realize user clustering through the FCM algorithm, fusing a genetic algorithm with the FCM algorithm;
Step 5, calculate the user similarity by combining the user feature preference matrix and the user-item scoring matrix, so that the user similarity contains the explicit information of the original user-item scoring matrix and also reflects the implicit information of the users' preferences for item features;
Step 6, generate the transaction data set D from the user-item scoring matrix UI_{n×m};
Step 7, for the transaction data set D, generate the frequent item sets using a frequent item set mining strategy based on pre-judgment screening, and construct the frequent item set matrix FIS_{t×m};
Step 8, calculate the item similarity by combining the frequent item set matrix and the user-item scoring matrix, so that the item similarity contains the users' explicit scoring information for the items and also reflects the internal associations among the items;
Step 9, determine the nearest-neighbor users of user u and the nearest-neighbor items of item i, and perform Top-K recommendation by combining the user similarity and the item similarity.
2. The collaborative filtering recommendation method based on user clustering and item association analysis according to claim 1, wherein: in step 2, the user feature preference matrix UFP_{n×k} is constructed from the user-item scoring matrix UI_{n×m} and the item feature membership matrix IF_{m×k}; the element R_{ui} of the user feature preference matrix is calculated as shown in the following formula (1):
wherein r_u = (r_{u1}, r_{u2}, r_{u3}, ..., r_{um}) is the vector of user u's scores for the m items, and f_i = (f_{1i}, f_{2i}, f_{3i}, ..., f_{mi}) is the membership vector of the m items with respect to feature i.
3. The collaborative filtering recommendation method based on user clustering and item association analysis according to claim 1, wherein: in step 3, min-max normalization is performed on the user feature preference matrix UFP, mapping each element value of the matrix to the interval [0, 1], as shown in the following formula (2):
$$x'_{ij} = \frac{x_{ij} - x_{\min}}{x_{\max} - x_{\min}} \qquad (2)$$
wherein x'_{ij} is the normalized value, x_{ij} is the element in row i and column j of the user feature preference matrix and represents the preference degree of user i for item feature j, x_{min} is the minimum of all users' preference degrees for the item feature, and x_{max} is the maximum of all users' preference degrees for the item feature.
4. The collaborative filtering recommendation method based on user clustering and item association analysis according to claim 1, wherein in step 4 the users are clustered by the FCM algorithm fused with a genetic algorithm, comprising the following steps:
first, parameter initialization: initializing the relevant parameters, including the population size M, the crossover probability $P_c$, the mutation probability $P_m$, the maximum number of iterations $t_{max}$, the number of clusters c, the value of the membership factor m, and the convergence precision;
second, encoding and population initialization: encoding according to the formula and randomly generating a population X, wherein the n research objects in X serve as the initial individuals, namely $X = [x_1, x_2, x_3, \ldots, x_n]$;
third, calculating the individual fitness $fit_m$, as shown in formula (3):
wherein $c_j$ ($j = 1, 2, 3, \ldots, k$) is the center of each cluster, and $\mu_{i,j}$ is the membership of the i-th sample in the j-th class;
fourth, applying selection, crossover and mutation to the current population to generate a new generation of individuals;
fifth, if $t = t_{max}$, terminating the genetic algorithm, outputting the final data and proceeding to step 7; otherwise, letting $t = t + 1$ and returning to the third step;
sixth, fuzzily partitioning the whole data set according to the global optimal solution and outputting the cluster center matrix, thereby achieving the user clustering.
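The loop described in claim 4 can be sketched roughly as below. Formula (3) and the encoding are not reproduced in this text, so several choices are assumptions made only for illustration: each individual encodes c cluster centers, fitness is 1 / (1 + J) where J is the standard FCM objective, selection is roulette-wheel, crossover is arithmetic, mutation is Gaussian, and the best individual is carried over unchanged. All function names are hypothetical.

```python
import numpy as np

def fcm_memberships(x, centers, m=2.0):
    """Standard FCM membership of each sample (row of x) in each cluster."""
    d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)                      # guard against zero distance
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fcm_objective(x, centers, m=2.0):
    """Standard FCM objective J = sum_i sum_j mu_ij^m * ||x_i - c_j||^2."""
    u = fcm_memberships(x, centers, m)
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(((u ** m) * d2).sum())

def fitness(x, centers, m=2.0):
    # ASSUMPTION: fitness grows as the FCM objective shrinks.
    return 1.0 / (1.0 + fcm_objective(x, centers, m))

def ga_fcm(x, c=2, pop_size=20, p_c=0.8, p_m=0.1, t_max=50, m=2.0, seed=0):
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    # Each individual is a (c, dim) array of centers drawn from the samples.
    pop = x[rng.integers(0, n, size=(pop_size, c))]
    best, best_fit = None, -1.0
    for _ in range(t_max):
        fits = np.array([fitness(x, ind, m) for ind in pop])
        i_best = int(fits.argmax())
        if fits[i_best] > best_fit:
            best, best_fit = pop[i_best].copy(), float(fits[i_best])
        # Selection: roulette wheel proportional to fitness.
        idx = rng.choice(pop_size, size=pop_size, p=fits / fits.sum())
        children = pop[idx].copy()
        # Crossover: arithmetic blend of consecutive pairs.
        for a in range(0, pop_size - 1, 2):
            if rng.random() < p_c:
                w = rng.random()
                ca, cb = children[a].copy(), children[a + 1].copy()
                children[a] = w * ca + (1 - w) * cb
                children[a + 1] = w * cb + (1 - w) * ca
        # Mutation: small Gaussian perturbation of randomly chosen coordinates.
        mask = rng.random(children.shape) < p_m
        children = children + mask * rng.normal(0.0, 0.1 * x.std(), children.shape)
        children[0] = best                     # keep the best individual found so far
        pop = children
    # Fuzzy partition of the whole data set from the best centers found.
    return best, fcm_memberships(x, best, m)

x = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centers, memberships = ga_fcm(x, c=2)
labels = memberships.argmax(axis=1)
```

Here the genetic search only supplies the cluster centers; the membership matrix returned at the end corresponds to the fuzzy partition of the sixth step.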
5. The collaborative filtering recommendation method based on user clustering and item association analysis according to claim 1, wherein in step 5 the user similarity is calculated by combining the user feature preference matrix with the user item scoring matrix, so that the user similarity both contains the explicit information of the original user item scoring matrix and embodies the implicit information of the users' preferences for item features, as shown in formula (4):
$$Sim(u,v) = \lambda\, Sim_1(u,v) + (1-\lambda)\, Sim_2(u,v) \tag{4}$$
wherein λ is a weight factor with value range (0, 1), and $Sim(u, v)$ denotes the comprehensive similarity of user u and user v; $Sim_1(u, v)$ denotes the similarity obtained from the original user item scoring matrix, calculated as shown in formula (5):
wherein $I_{uv}$ denotes the set of items scored by both user u and user v; $r_{ui}$ is the score of user u for item i; $\bar{r}_u$ denotes the average of all scores given by user u; $Sim_2(u, v)$ denotes the similarity obtained from the user preference matrix for item features, calculated as shown in formula (6):
wherein $F_{uv}$ denotes the set of features preferred by both user u and user v, $R_{ui}$ is the degree of preference of user u for feature i, $R_{vi}$ is the degree of preference of user v for feature i, $\bar{R}_u$ denotes the average preference of user u over all features, and $\bar{R}_v$ denotes the average preference of user v over all features.
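A possible reading of claim 5 is sketched below. Formulas (5) and (6) are not reproduced in this text; the sketch assumes both are mean-centered (Pearson-style) correlations, with $Sim_1$ restricted to co-scored items and $Sim_2$ computed over all features. A score of 0 is assumed to mean "not scored", and the function names are hypothetical.

```python
import numpy as np

def centered_similarity(a, b, common, mean_a, mean_b):
    """Mean-centered correlation of a and b over the positions in `common`."""
    da, db = a[common] - mean_a, b[common] - mean_b
    denom = np.sqrt((da ** 2).sum()) * np.sqrt((db ** 2).sum())
    return float((da * db).sum() / denom) if denom > 0 else 0.0

def user_similarity(ui, ufp, u, v, lam=0.5):
    """Sim(u, v) = lam * Sim1 + (1 - lam) * Sim2, as in formula (4)."""
    ru, rv = ui[u], ui[v]
    common = (ru > 0) & (rv > 0)               # items scored by both users
    sim1 = (centered_similarity(ru, rv, common,
                                ru[ru > 0].mean(), rv[rv > 0].mean())
            if common.any() else 0.0)
    fu, fv = ufp[u], ufp[v]
    all_features = np.ones(fu.shape, dtype=bool)  # ASSUMPTION: every feature counts
    sim2 = centered_similarity(fu, fv, all_features, fu.mean(), fv.mean())
    return lam * sim1 + (1 - lam) * sim2

ui = np.array([[5.0, 3.0, 0.0],
               [5.0, 3.0, 0.0]])
ufp_same = np.array([[1.0, 0.2], [1.0, 0.2]])
ufp_opposite = np.array([[1.0, 0.2], [0.2, 1.0]])
sim_same = user_similarity(ui, ufp_same, 0, 1)
sim_mixed = user_similarity(ui, ufp_opposite, 0, 1)
```

In the second call the two users agree on scores but have opposite feature preferences, so with λ = 0.5 the two components cancel.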
6. The collaborative filtering recommendation method based on user clustering and item association analysis according to claim 1, wherein in step 6 the transaction data set D is generated from the user item scoring matrix $UI_{n\times m}$: if user u has scored item i, that is, $r_{u,i}$ is not empty, item i is added to the transaction corresponding to user u.
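The construction in claim 6 amounts to one transaction per user. A minimal sketch, assuming that a score of 0 encodes "not scored" (the function name `build_transactions` is hypothetical):

```python
import numpy as np

def build_transactions(ui: np.ndarray) -> list:
    """One transaction per user: the set of item indices that user has scored.

    ASSUMPTION: a zero entry means the user has not scored the item.
    """
    return [set(np.flatnonzero(row).tolist()) for row in ui]

ui = np.array([[5, 0, 3],
               [0, 4, 1]])
transactions = build_transactions(ui)
```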
7. The collaborative filtering recommendation method based on user clustering and item association analysis according to claim 1, wherein in step 7, for the transaction data set D, a frequent item set mining strategy based on prejudgment screening is used to generate the frequent item sets $S_{FI} = (FS_1, FS_2, \ldots, FS_t)$, wherein FS denotes a frequent item set and t denotes the number of frequent item sets, and a frequent item set matrix $FIS_{t\times m}$ is constructed as shown in formula (7):
wherein $F_{ij}$ denotes the element in the i-th row and j-th column of the frequent item set matrix $FIS_{t\times m}$, $i \in (0, t)$, $j \in (0, m)$, and the frequent item set matrix $FIS_{t\times m}$ is as follows:
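To make the shape of the frequent item set matrix concrete, the sketch below builds it from a list of transactions. The prejudgment-screening mining strategy is not described in this text, so a brute-force enumeration with a minimum support count stands in for it; the indicator form of $F_{ij}$ (1 when item j belongs to frequent item set i) follows the definition of $F_{si}$ given in claim 8. The function names and the `max_size` limit are hypothetical.

```python
from itertools import combinations
import numpy as np

def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force stand-in for the mining step (NOT prejudgment screening).

    Returns every item set of up to `max_size` items that is contained in
    at least `min_support` transactions.
    """
    items = sorted(set().union(*transactions))
    found = []
    for size in range(1, max_size + 1):
        for cand in combinations(items, size):
            support = sum(1 for t in transactions if set(cand) <= t)
            if support >= min_support:
                found.append(frozenset(cand))
    return found

def fis_matrix(itemsets, m):
    """t x m indicator matrix: entry (i, j) is 1 if item j is in item set i."""
    fis = np.zeros((len(itemsets), m), dtype=int)
    for row, fs in enumerate(itemsets):
        fis[row, list(fs)] = 1
    return fis

transactions = [{0, 2}, {1, 2}, {0, 2}]
itemsets = frequent_itemsets(transactions, min_support=2)
fis = fis_matrix(itemsets, m=3)
```

With these three transactions and a support threshold of 2, the frequent item sets are {0}, {2} and {0, 2}, giving a 3 x 3 matrix.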
8. The collaborative filtering recommendation method based on user clustering and item association analysis according to claim 1, wherein in step 8 the item similarity is calculated by combining the frequent item set matrix with the user item scoring matrix, so that the item similarity both contains the explicit scoring information of the original users for the items and reflects the internal relations among the items, as shown in formula (8):
$$Sim'(i,j) = \beta\, Sim'_1(i,j) + (1-\beta)\, Sim'_2(i,j) \tag{8}$$
wherein β is a weight factor with value range (0, 1), and $Sim'(i, j)$ denotes the comprehensive similarity of item i and item j; $Sim'_1(i, j)$ denotes the item similarity obtained from the original user item scoring matrix, calculated as shown in formula (9):
wherein $U_{ij}$ denotes the set of users who have scored both item i and item j; $r_{ui}$ is the score of user u for item i; $\bar{r}_i$ denotes the average score of item i; $Sim'_2(i, j)$ denotes the item similarity obtained from the frequent item set matrix, calculated as shown in formula (10):
wherein t denotes the number of frequent item sets, and $F_{si}$ indicates whether item i is included in the s-th frequent item set.
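The blend in formula (8) can be sketched as follows. Formulas (9) and (10) are not reproduced in this text, so the sketch assumes that $Sim'_1$ is a correlation centered on each item's average score over the users in $U_{ij}$, and that $Sim'_2$ is the cosine of the two items' columns in the frequent item set matrix. A score of 0 is assumed to mean "not scored", and the function name `item_similarity` is hypothetical.

```python
import numpy as np

def item_similarity(ui, fis, i, j, beta=0.5):
    """Sim'(i, j) = beta * Sim'_1 + (1 - beta) * Sim'_2, as in formula (8)."""
    ri, rj = ui[:, i], ui[:, j]
    both = (ri > 0) & (rj > 0)                 # users who scored both items
    sim1 = 0.0
    if both.any():
        di = ri[both] - ri[ri > 0].mean()      # centered on item i's average score
        dj = rj[both] - rj[rj > 0].mean()
        denom = np.linalg.norm(di) * np.linalg.norm(dj)
        sim1 = float(di @ dj / denom) if denom > 0 else 0.0
    # ASSUMPTION: Sim'_2 is the cosine between the items' FIS columns.
    fi, fj = fis[:, i].astype(float), fis[:, j].astype(float)
    denom = np.linalg.norm(fi) * np.linalg.norm(fj)
    sim2 = float(fi @ fj / denom) if denom > 0 else 0.0
    return beta * sim1 + (1 - beta) * sim2

ui = np.array([[5.0, 5.0, 1.0],
               [3.0, 3.0, 0.0],
               [1.0, 1.0, 5.0]])
fis = np.array([[1, 1, 0],
                [1, 1, 1]])
sim_identical = item_similarity(ui, fis, 0, 1)
sim_scores_only = item_similarity(ui, fis, 0, 2, beta=1.0)
sim_itemsets_only = item_similarity(ui, fis, 0, 2, beta=0.0)
```

Setting β to 1 or 0 isolates the score-based and the item-set-based components, which is convenient when tuning the weight.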
9. The collaborative filtering recommendation method based on user clustering and item association analysis according to claim 1, wherein in step 9 the nearest-neighbor users of user u and the nearest-neighbor items of item i are determined, the prediction scores of user u for all unscored items are calculated, and Top-K recommendation is performed, the prediction score of user u for an unscored item i being calculated as follows:
first, sorting the user similarities calculated according to formula (4) to obtain the nearest-neighbor set $N_u$ of user u, and sorting the item similarities calculated according to formula (8) to obtain the nearest-neighbor set $N_i$ of item i;
second, calculating the prediction score of user u for the unscored item i, as shown in formula (11):
wherein ω is a weight coefficient, $N_u$ is the nearest-neighbor set of user u, $N_i$ is the nearest-neighbor set of item i, $\bar{r}_u$ and $\bar{r}_p$ denote the average scores of user u and user p respectively, $\bar{r}_i$ and $\bar{r}_q$ denote the average scores received by item i and item q respectively, $Sim(u, p)$ denotes the similarity between user u and user p, and $Sim'(i, q)$ denotes the similarity between item i and item q; the prediction scores of user u for all unscored items are calculated according to formula (11) and arranged in descending order, and the K items with the highest prediction scores are selected for Top-K recommendation.
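The prediction and ranking of claim 9 can be sketched as below. Formula (11) is not reproduced in this text; the sketch assumes the common mean-centered form, in which a user-based estimate and an item-based estimate are each built from the nearest neighbors and then blended with weight ω. The similarity matrices are taken as precomputed inputs, a score of 0 is assumed to mean "not scored", and the names `predict`, `top_k` and `k_nn` are hypothetical.

```python
import numpy as np

def predict(ui, user_sim, item_sim, u, i, k_nn=2, omega=0.5):
    """Blend a user-based and an item-based mean-centered estimate."""
    rated = ui > 0
    user_mean = np.array([r[r > 0].mean() if (r > 0).any() else 0.0 for r in ui])
    item_mean = np.array([c[c > 0].mean() if (c > 0).any() else 0.0 for c in ui.T])

    # User-based part: most similar users to u that have scored item i.
    n_u = [p for p in np.argsort(-user_sim[u]) if p != u and rated[p, i]][:k_nn]
    num = sum(user_sim[u, p] * (ui[p, i] - user_mean[p]) for p in n_u)
    den = sum(abs(user_sim[u, p]) for p in n_u)
    user_part = user_mean[u] + (num / den if den > 0 else 0.0)

    # Item-based part: most similar items to i that user u has scored.
    n_i = [q for q in np.argsort(-item_sim[i]) if q != i and rated[u, q]][:k_nn]
    num = sum(item_sim[i, q] * (ui[u, q] - item_mean[q]) for q in n_i)
    den = sum(abs(item_sim[i, q]) for q in n_i)
    item_part = item_mean[i] + (num / den if den > 0 else 0.0)

    return omega * user_part + (1 - omega) * item_part

def top_k(ui, user_sim, item_sim, u, k=2, **kwargs):
    """Indices of the k unscored items with the highest predicted score."""
    unrated = np.flatnonzero(ui[u] == 0)
    scores = {int(i): predict(ui, user_sim, item_sim, u, int(i), **kwargs)
              for i in unrated}
    return sorted(scores, key=scores.get, reverse=True)[:k]

ui = np.array([[5.0, 0.0, 0.0, 1.0],
               [5.0, 4.0, 2.0, 1.0],
               [4.0, 5.0, 1.0, 0.0]])
user_sim = np.array([[1.0, 0.9, 0.8],
                     [0.9, 1.0, 0.7],
                     [0.8, 0.7, 1.0]])
item_sim = np.full((4, 4), 0.5) + 0.5 * np.eye(4)
recommended = top_k(ui, user_sim, item_sim, u=0, k=2)
```

In this toy data the neighbors of user 0 score item 1 highly and item 2 poorly, so item 1 is ranked first.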
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010278287.XA CN112100512A (en) | 2020-04-10 | 2020-04-10 | Collaborative filtering recommendation method based on user clustering and project association analysis |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010278287.XA CN112100512A (en) | 2020-04-10 | 2020-04-10 | Collaborative filtering recommendation method based on user clustering and project association analysis |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112100512A (en) | 2020-12-18 |
Family
ID=73749592
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010278287.XA Withdrawn CN112100512A (en) | 2020-04-10 | 2020-04-10 | Collaborative filtering recommendation method based on user clustering and project association analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112100512A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112052392A (en) * | 2020-09-10 | 2020-12-08 | 江苏电力信息技术有限公司 | Online service recommendation method based on LFM model |
CN113094542A (en) * | 2021-03-24 | 2021-07-09 | 西安交通大学 | Set ordering music recommendation method aiming at user implicit feedback data |
CN113094542B (en) * | 2021-03-24 | 2023-08-15 | 西安交通大学 | Set ordering music recommendation method for implicit feedback data of user |
CN113076478A (en) * | 2021-04-14 | 2021-07-06 | 同济大学 | Technical resource and service recommendation system based on hybrid recommendation algorithm |
CN113221003A (en) * | 2021-05-20 | 2021-08-06 | 北京建筑大学 | Mixed filtering recommendation method and system based on dual theory |
CN113704608A (en) * | 2021-08-26 | 2021-11-26 | 武汉卓尔数字传媒科技有限公司 | Personalized item recommendation method and device, electronic equipment and storage medium |
CN114461899A (en) * | 2021-12-24 | 2022-05-10 | 新奥新智科技有限公司 | Collaborative filtering recommendation method and device for user, electronic equipment and storage medium |
CN114638443A (en) * | 2022-05-19 | 2022-06-17 | 安徽数智建造研究院有限公司 | Construction equipment intelligent type selection and allocation method based on improved genetic algorithm |
CN114638443B (en) * | 2022-05-19 | 2022-08-23 | 安徽数智建造研究院有限公司 | Construction equipment intelligent type selection and allocation method based on improved genetic algorithm |
CN115713432A (en) * | 2022-09-21 | 2023-02-24 | 湖南科技大学 | Production element-oriented service recommendation method in industrial Internet environment |
CN117952726A (en) * | 2024-03-27 | 2024-04-30 | 摘星社信息科技(浙江)股份有限公司 | Personalized equity package recommendation system based on operator data analysis |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112100512A (en) | Collaborative filtering recommendation method based on user clustering and project association analysis | |
CN106844787B (en) | Recommendation method for searching target users and matching target products for automobile industry | |
CN107833117B (en) | Bayesian personalized sorting recommendation method considering tag information | |
US20080208652A1 (en) | Method and system utilizing online analytical processing (olap) for making predictions about business locations | |
CN107220365A (en) | Accurate commending system and method based on collaborative filtering and correlation rule parallel processing | |
CN109710835B (en) | Heterogeneous information network recommendation method with time weight | |
Cintia Ganesha Putri et al. | Design of an unsupervised machine learning-based movie recommender system | |
CN114880486A (en) | Industry chain identification method and system based on NLP and knowledge graph | |
CN108563690A (en) | A kind of collaborative filtering recommending method based on object-oriented cluster | |
WO2020095357A1 (en) | Search needs assessment device, search needs assessment system, and search needs assessment method | |
CN105868422B (en) | A kind of collaborative filtering recommending method based on elastic dimensional feature vector Optimizing Extraction | |
CN116431931A (en) | Real-time incremental data statistical analysis method | |
Fareed et al. | A collaborative filtering recommendation framework utilizing social networks | |
Zheng et al. | Graph-convolved factorization machines for personalized recommendation | |
CN115829683A (en) | Power integration commodity recommendation method and system based on inverse reward learning optimization | |
CN111612583A (en) | Individualized shopping guide system based on clustering | |
Agustyaningrum et al. | Online shopper intention analysis using conventional machine learning and deep neural network classification algorithm | |
Alsalama | A hybrid recommendation system based on association rules | |
Chou et al. | The RFM Model Analysis for VIP Customer: A case study of golf clothing brand | |
Lu et al. | Artificial immune network with feature selection for bank term deposit recommendation | |
Sun et al. | A Dynamic Collaborative Filtering Algorithm based on Convolutional Neural Networks and Multi-layer Perceptron | |
CN114429384A (en) | Intelligent product recommendation method and system based on e-commerce platform | |
CN115114517A (en) | Collaborative filtering recommendation algorithm based on user attributes and item scores | |
CN114238758A (en) | User portrait prediction method based on multi-source cross-border data fusion | |
Wang et al. | NAUI: Neural attentive user interest model for cross-domain CTR prediction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | WW01 | Invention patent application withdrawn after publication | Application publication date: 20201218 |