CN112100512A

CN112100512A - A collaborative filtering recommendation method based on user clustering and item association analysis

Info

Publication number: CN112100512A
Application number: CN202010278287.XA
Authority: CN
Inventors: 赵学健; 邱钟成; 孙知信
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2020-04-10
Filing date: 2020-04-10
Publication date: 2020-12-18

Abstract

The invention discloses a collaborative filtering recommendation method based on user clustering and item association analysis, addressing the cold-start, data-sparsity and low-accuracy problems of traditional collaborative filtering recommendation algorithms. The method uses an improved fuzzy C-means (FCM) clustering algorithm to mine users' latent feature preferences, and uses an association analysis strategy based on pre-judgment screening to screen out frequent itemsets. On this basis, it uses the user feature preference matrix and the user rating matrix to calculate the similarity between users, uses the frequent itemset matrix and the user rating matrix to calculate the similarity between items, and combines user similarity and item similarity to calculate each user's predicted ratings for unrated items, thereby achieving Top-K recommendation. Compared with traditional user-based and item-based collaborative filtering recommendation algorithms, the method can effectively avoid the cold-start and data-sparsity problems and offers better recommendation quality.

Description

A collaborative filtering recommendation method based on user clustering and item association analysis

Technical field:

The invention relates to a collaborative filtering recommendation method, in particular to one based on user clustering and item association analysis, and belongs to the technical field of computer data mining and information processing.

Technical background:

With the rapid development of e-commerce, the variety and quantity of goods offered by e-commerce platforms have grown sharply, and an era of product information overload has arrived. Faced with massive amounts of product information, users with clear needs can locate the products they want through the search function provided by the platform. However, when a user's needs are uncertain or vague and hard to express as search keywords, helping the user quickly find products of interest becomes extremely important. Recommender systems emerged to meet this need. As an effective information processing tool, a recommender system links users with products through the users' historical behavior and thereby addresses information overload. Recommender systems have been applied successfully in many fields, including e-commerce, online music, video websites and social platforms. According to Amazon's statistics, only 16% of the customers shopping on its website have a clear purchase intention, and upwards of 20% to 30% of sales come from the recommender system.

The recommendation algorithm is a core component of a recommender system and largely determines its performance. There are many kinds of recommendation algorithms. Commonly used ones include demographic-based, content-based, association-rule-based, collaborative filtering and hybrid recommendation algorithms. Among them, collaborative filtering is one of the most mature and widely used personalized recommendation techniques, and mainly comprises user-based collaborative filtering and item-based collaborative filtering. However, these two algorithms, and most improved algorithms built on them, suffer from cold start, data sparsity and low recommendation accuracy.

Summary of the invention

To address the cold-start, data-sparsity and low-accuracy problems of traditional collaborative filtering recommendation algorithms, a collaborative filtering recommendation method based on user clustering and item association analysis is disclosed. As shown in Figure 1, it comprises the following steps:

Step 1, data preprocessing: extract user-item rating data and item feature data from the raw data and clean them to obtain a dataset in a specific format, then construct the user-item rating matrix UI^{n×m} and the item feature membership matrix IF^{m×k}. The number of features k is usually much smaller than the number of items m.

Step 2, construct the user feature preference matrix: use the user-item rating matrix and the item category feature matrix to construct the user feature preference matrix UFP^{n×k}. Compared with the user-item rating matrix, the user feature preference matrix has greatly reduced dimensionality, which helps reduce the time and space complexity of the recommendation algorithm.

Step 3, apply min-max normalization to the UFP matrix, mapping the value of every element to the interval [0, 1].

Step 4, cluster the users with the FCM algorithm, combining a genetic algorithm with FCM so that FCM converges quickly and efficiently and avoids becoming trapped in a local optimum.

Step 5, calculate user similarity by combining the user feature preference matrix and the user-item rating matrix, so that user similarity contains both the explicit information of the original user-item rating matrix and the implicit information of users' preferences for item features.

Step 6, generate the transaction dataset D from the user-item rating matrix UI^{n×m}.

Step 7, for the transaction dataset D, generate frequent itemsets using the frequent itemset mining strategy based on pre-judgment screening, and construct the frequent itemset matrix FIS^{f×m}.

Step 8, calculate item similarity by combining the frequent itemset matrix and the user-item rating matrix, so that item similarity contains both the users' original explicit rating information and the intrinsic associations between items.

Step 9, determine the nearest-neighbor users of user u and the nearest-neighbor items of item i, and perform Top-K recommendation by combining user similarity and item similarity.

Further, step 2 also includes: constructing the user feature preference matrix UFP^{n×k} from the user-item rating matrix UI^{n×m} and the item feature membership matrix IF^{m×k}, where each element R_ui of the user feature preference matrix is computed by formula (1),

in which r_u = (r_u1, r_u2, r_u3, ..., r_um) is the vector of ratings given by user u to the m items, and f_i = (f_1i, f_2i, f_3i, ..., f_mi) is the membership vector of the m items with respect to feature i. The construction process is shown in Figure 1.
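The image of formula (1) is not present in this text, so its exact form is unknown. The following is a minimal sketch under the assumption that R_ui is the inner product of the rating vector r_u and the membership vector f_i, with unrated items stored as 0. The function name `build_ufp` and the sample data are hypothetical.

```python
def build_ufp(ui, item_features):
    """Build an n x k user feature preference matrix.

    ui: n x m rating matrix (0 means "not rated"; an assumption).
    item_features: m x k item feature membership matrix.
    Assumed form of formula (1): R_ui = sum over items j of r_uj * f_ji.
    """
    n_features = len(item_features[0])
    ufp = []
    for ratings in ui:
        row = [
            sum(r * feats[f] for r, feats in zip(ratings, item_features))
            for f in range(n_features)
        ]
        ufp.append(row)
    return ufp


# Two users, three items, two features (illustrative data only).
ui = [[5, 0, 3], [0, 4, 0]]
item_features = [[1, 0], [0, 1], [1, 1]]
ufp = build_ufp(ui, item_features)
```

The result has one row per user and one column per feature, so its width is k rather than m, which is the dimensionality reduction described in step 2.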

Further, in step 3, min-max normalization is applied to the user feature preference matrix UFP, mapping the value of every element to the interval [0, 1]. The mapping is given by formula (2),

in which x_ij is the element in row i and column j of the user feature preference matrix and expresses the degree of preference of user i for item feature j, x_min is the minimum degree of preference of all users for item features, and x_max is the maximum degree of preference of all users for item features.
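Min-max normalization conventionally maps each value to (x_ij - x_min) / (x_max - x_min). The sketch below assumes formula (2) has this standard form and, following the definitions above, takes x_min and x_max over the whole matrix. The function name and the handling of a constant matrix are assumptions.

```python
def min_max_normalize(matrix):
    """Map every element of the matrix to [0, 1] using the global min and max."""
    flat = [x for row in matrix for x in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Degenerate case (all values equal): return zeros by convention.
        return [[0.0 for _ in row] for row in matrix]
    return [[(x - lo) / (hi - lo) for x in row] for row in matrix]


normalized = min_max_normalize([[8, 3], [0, 4]])
```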

Further, in step 4, the users are clustered with the FCM algorithm, and a genetic algorithm is combined with FCM so that FCM converges quickly and efficiently and avoids becoming trapped in a local optimum. The steps are as follows:

① Parameter initialization: initialize the relevant parameters, including the population size M, crossover probability P_c, mutation probability P_m, maximum number of iterations t_max, number of clusters c, membership factor m, and convergence precision ε.

② Encoding and population initialization: encode according to the formula and randomly generate a population X containing n objects as the initial individuals, i.e. X = [x₁, x₂, x₃, ..., x_n].

③ Calculate the individual fitness fit_m according to formula (3), in which c_j (j = 1, 2, 3, ..., k) is the center of each cluster and μ_{i,j} denotes the degree of membership of the i-th sample in the j-th cluster.

④ Apply selection, crossover and mutation to the current population to produce a new generation of individuals.

⑤ If t = t_max, the genetic algorithm terminates, outputs the final data and proceeds to step ⑥; otherwise, set t = t + 1 and return to step ③.

⑥ Fuzzily partition the whole dataset according to the global optimal solution and output the cluster center matrix, completing the user clustering.
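The image of formula (3) is missing from this text. As an illustration only, the sketch below evaluates one candidate set of cluster centers using the standard FCM membership update and objective J = Σ_i Σ_j μ_{i,j}^m ‖x_i − c_j‖², and assumes a fitness of 1 / (1 + J), a common choice in GA-FCM hybrids that the patent does not confirm. Selection, crossover and mutation are omitted, and all function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(samples, centers, m=2.0):
    """Standard FCM membership degrees for fixed cluster centers."""
    power = 1.0 / (m - 1.0)
    result = []
    for x in samples:
        d = [sq_dist(x, c) for c in centers]
        if 0.0 in d:
            # The sample coincides with one or more centers.
            hits = d.count(0.0)
            result.append([1.0 / hits if v == 0.0 else 0.0 for v in d])
            continue
        inv = [(1.0 / v) ** power for v in d]
        total = sum(inv)
        result.append([v / total for v in inv])
    return result


def objective(samples, centers, m=2.0):
    """FCM objective J for the given centers."""
    mu = memberships(samples, centers, m)
    return sum(
        (mu[i][j] ** m) * sq_dist(x, c)
        for i, x in enumerate(samples)
        for j, c in enumerate(centers)
    )


def fitness(samples, centers, m=2.0):
    """Assumed fitness: larger when the FCM objective is smaller."""
    return 1.0 / (1.0 + objective(samples, centers, m))


samples = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
good = [[0.0, 0.5], [10.0, 10.5]]   # centers near the two groups
bad = [[5.0, 5.0], [5.0, 6.0]]      # centers far from both groups
```

Under this assumption a genetic algorithm would keep individuals such as `good`, whose fitness is higher than that of `bad`, and the best individual found would seed the final fuzzy partition in step ⑥.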

Further, in step 5, user similarity is calculated by combining the user feature preference matrix and the user-item rating matrix, so that it contains both the explicit information of the original user-item rating matrix and the implicit information of users' preferences for item features. It is calculated by formula (4):

Sim(u, v) = λ Sim₁(u, v) + (1 − λ) Sim₂(u, v)    (4)

where λ is a weight factor in the range (0, 1), and Sim(u, v) denotes the combined similarity between user u and user v. Sim₁(u, v) denotes the similarity obtained from the original user-item rating matrix and is computed by formula (5), in which I_uv denotes the set of items rated by both user u and user v, r_ui is the rating of user u on item i, and r̄_u denotes the mean of all ratings of user u.

Sim₂(u, v) denotes the similarity obtained from the user feature preference matrix and is computed by formula (6), in which F_uv denotes the set of features preferred by both user u and user v, R_ui is the degree of preference of user u for feature i, R_vi is the degree of preference of user v for feature i, R̄_u denotes the mean degree of preference of user u over all features, and R̄_v denotes the mean degree of preference of user v over all features.
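The images of formulas (5) and (6) are missing, but the quantities defined around them (a common set, individual values and per-user means) match the Pearson correlation coefficient. The sketch below therefore assumes both are Pearson correlations, one over co-rated items and one over shared features, and combines them as in formula (4). Treating a feature as "preferred" when its preference value is greater than 0 is also an assumption, as are all function names.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean, or 0.0 for an empty collection."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def pearson(a, b, common, mean_a, mean_b):
    """Pearson-style similarity restricted to the keys in `common`."""
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = sqrt(sum((a[i] - mean_a) ** 2 for i in common)) * sqrt(
        sum((b[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0


def user_similarity(ratings_u, ratings_v, prefs_u, prefs_v, lam=0.5):
    """Formula (4): lam * Sim1 + (1 - lam) * Sim2.

    ratings_*: dict mapping item id to rating (only rated items present).
    prefs_*: list of k normalized feature preference values.
    """
    common_items = set(ratings_u) & set(ratings_v)
    sim1 = pearson(ratings_u, ratings_v, common_items,
                   mean(ratings_u.values()), mean(ratings_v.values()))
    common_feats = [f for f in range(len(prefs_u))
                    if prefs_u[f] > 0 and prefs_v[f] > 0]
    sim2 = pearson(prefs_u, prefs_v, common_feats,
                   mean(prefs_u), mean(prefs_v))
    return lam * sim1 + (1 - lam) * sim2


same = user_similarity({0: 5, 1: 3, 2: 1}, {0: 5, 1: 3, 2: 1},
                       [1.0, 0.5, 0.2], [1.0, 0.5, 0.2])
opposite = user_similarity({0: 5, 1: 1}, {0: 1, 1: 5},
                           [1.0, 0.2], [0.2, 1.0])
```

Because Sim₂ depends only on the k-dimensional preference vectors, two users with no co-rated items can still receive a non-zero similarity, which is how the combined measure eases data sparsity.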

Further, in step 6, the transaction dataset D is generated from the user-item rating matrix UI^{n×m} as follows: if user u has rated item i, i.e. r_{u,i} is not empty, item i is added to the transaction corresponding to user u.
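The rule above can be sketched as follows, assuming missing ratings are stored as `None` and items are identified by their column index. The function name is hypothetical.

```python
def build_transactions(ui):
    """Turn each user's row into a transaction: the set of items the user rated.

    ui: n x m rating matrix with None for "not rated" (an assumption).
    """
    return [
        {item for item, rating in enumerate(row) if rating is not None}
        for row in ui
    ]


transactions = build_transactions([
    [5, None, 3],
    [None, 4, None],
    [2, 1, None],
])
```

Each user contributes exactly one transaction, so D has n transactions regardless of how many ratings each user has given.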

Further, in step 7, for the transaction dataset D, the frequent itemset mining strategy based on pre-judgment screening proposed by Zhao Xuejian et al. (Journal of Electronics & Information Technology, 2016, 38(7), 1654-1659) is used to generate the set of frequent itemsets S_FI = (FS₁, FS₂, ..., FS_t), where FS denotes a frequent itemset and t is the number of frequent itemsets. The frequent itemset matrix FIS^{t×m} is then constructed according to formula (7),

in which F_ij denotes the element in row i and column j of the frequent itemset matrix FIS^{t×m}, with i ∈ (0, t) and j ∈ (0, m).
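The image of formula (7) is missing. Given that F_si is later described as indicating whether item i belongs to the s-th frequent itemset, the sketch below assumes F_ij = 1 when item j is in FS_i and 0 otherwise. The miner shown is plain exhaustive support counting, used only as a stand-in; it is not the pre-judgment screening strategy cited above. Function names, `min_support` and `max_size` are hypothetical.

```python
from itertools import combinations


def mine_frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force stand-in miner: keep itemsets contained in at least
    `min_support` transactions (absolute count)."""
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            cset = set(candidate)
            support = sum(1 for t in transactions if cset <= t)
            if support >= min_support:
                frequent.append(cset)
    return frequent


def build_fis_matrix(frequent_itemsets, n_items):
    """Assumed formula (7): F[i][j] = 1 if item j is in frequent itemset i."""
    return [
        [1 if j in fs else 0 for j in range(n_items)]
        for fs in frequent_itemsets
    ]


transactions = [{0, 2}, {0, 1, 2}, {1, 2}]
frequent = mine_frequent_itemsets(transactions, min_support=2)
fis = build_fis_matrix(frequent, n_items=3)
```

Whether single-item itemsets should be kept as rows is not stated in the text; they are retained here for simplicity.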

Further, in step 8, item similarity is calculated by combining the frequent itemset matrix and the user-item rating matrix, so that it contains both the users' original explicit rating information and the intrinsic associations between items. It is calculated by formula (8):

Sim′(i, j) = β Sim′₁(i, j) + (1 − β) Sim′₂(i, j)    (8)

where β is a weight factor in the range (0, 1), and Sim′(i, j) denotes the combined similarity between item i and item j.

Sim′₁(i, j) denotes the item similarity obtained from the original user-item rating matrix and is computed by formula (9), in which U_ij denotes the set of users who have rated both item i and item j, r_ui is the rating of user u on item i, and r̄_i denotes the mean rating of item i.

Sim′₂(i, j) denotes the item similarity obtained from the frequent itemset matrix and is computed by formula (10), in which t is the number of frequent itemsets and F_si indicates whether the s-th frequent itemset contains item i.
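Formulas (9) and (10) are missing from this text. The sketch below assumes that Sim′₁ is a Pearson-style correlation over the users who rated both items, centered on each item's mean rating, and that Sim′₂ is the cosine similarity between the two items' columns of the frequent itemset matrix. Both forms are consistent with the symbols defined above but are not confirmed by the source. All names are hypothetical.

```python
from math import sqrt


def item_rating_similarity(ui, i, j):
    """Assumed formula (9). ui: list of dicts mapping item id to rating."""
    ratings_i = [r[i] for r in ui if i in r]
    ratings_j = [r[j] for r in ui if j in r]
    if not ratings_i or not ratings_j:
        return 0.0
    mean_i = sum(ratings_i) / len(ratings_i)
    mean_j = sum(ratings_j) / len(ratings_j)
    common = [r for r in ui if i in r and j in r]
    num = sum((r[i] - mean_i) * (r[j] - mean_j) for r in common)
    den = sqrt(sum((r[i] - mean_i) ** 2 for r in common)) * sqrt(
        sum((r[j] - mean_j) ** 2 for r in common)
    )
    return num / den if den else 0.0


def itemset_similarity(fis, i, j):
    """Assumed formula (10): cosine similarity of columns i and j of FIS."""
    dot = sum(row[i] * row[j] for row in fis)
    norm = sqrt(sum(row[i] ** 2 for row in fis)) * sqrt(
        sum(row[j] ** 2 for row in fis)
    )
    return dot / norm if norm else 0.0


def item_similarity(ui, fis, i, j, beta=0.5):
    """Formula (8): beta * Sim'1 + (1 - beta) * Sim'2."""
    return (beta * item_rating_similarity(ui, i, j)
            + (1 - beta) * itemset_similarity(fis, i, j))


ui = [{0: 5, 1: 5}, {0: 1, 1: 1}, {0: 3, 1: 3, 2: 4}]
fis = [[1, 1, 0], [1, 1, 1]]
combined = item_similarity(ui, fis, 0, 1)
```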

Further, in step 9, the nearest-neighbor users of user u and the nearest-neighbor items of item i are determined, the predicted ratings of user u for all unrated items are calculated, and Top-K recommendation is performed. The predicted rating of user u for an unrated item i is calculated as follows:

① Sort the user similarities calculated by formula (4) to obtain the nearest-neighbor set N_u of user u, and sort the item similarities calculated by formula (8) to obtain the nearest-neighbor set N_i of item i.

② Calculate the predicted rating of user u for the unrated item i according to formula (11), in which ω is a weight coefficient, N_u is the nearest-neighbor set of user u, N_i is the nearest-neighbor set of item i, r̄_u and r̄_p denote the mean ratings of user u and user p respectively, r̄_i and r̄_q denote the mean ratings received by item i and item q respectively, Sim(u, p) denotes the similarity between user u and user p, and Sim′(i, q) denotes the similarity between item i and item q. The predicted ratings of user u for all unrated items are calculated by formula (11) and sorted in descending order, and the K items with the highest predicted ratings are selected for Top-K recommendation.
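The image of formula (11) is missing. The sketch below assumes the usual mean-centered neighborhood form, weighted by ω: a user-based term r̄_u + Σ_{p∈N_u} Sim(u, p)(r_pi − r̄_p) / Σ|Sim(u, p)| and an item-based term r̄_i + Σ_{q∈N_i} Sim′(i, q)(r_uq − r̄_q) / Σ|Sim′(i, q)|. This matches the symbols listed above but is not confirmed by the source. Function names and data layout are hypothetical.

```python
def predict(u, i, ratings, user_sim, item_sim, user_nbrs, item_nbrs, omega=0.5):
    """Assumed formula (11): omega * user-based + (1 - omega) * item-based.

    ratings: dict user -> dict item -> rating.
    user_sim / item_sim: dicts keyed by (u, p) and (i, q) pairs.
    """
    def user_mean(x):
        vals = list(ratings[x].values())
        return sum(vals) / len(vals) if vals else 0.0

    def item_mean(y):
        vals = [r[y] for r in ratings.values() if y in r]
        return sum(vals) / len(vals) if vals else 0.0

    # User-based part: neighbors p of u who have rated item i.
    num = den = 0.0
    for p in user_nbrs:
        if i in ratings[p]:
            s = user_sim[(u, p)]
            num += s * (ratings[p][i] - user_mean(p))
            den += abs(s)
    user_part = user_mean(u) + (num / den if den else 0.0)

    # Item-based part: neighbors q of i that user u has rated.
    num = den = 0.0
    for q in item_nbrs:
        if q in ratings[u]:
            s = item_sim[(i, q)]
            num += s * (ratings[u][q] - item_mean(q))
            den += abs(s)
    item_part = item_mean(i) + (num / den if den else 0.0)

    return omega * user_part + (1 - omega) * item_part


def top_k(predictions, k):
    """Return the k item ids with the highest predicted ratings."""
    return sorted(predictions, key=predictions.get, reverse=True)[:k]


ratings = {"u": {"a": 4, "b": 2}, "p": {"a": 5, "b": 1, "c": 5}}
pred = predict("u", "c", ratings,
               user_sim={("u", "p"): 1.0}, item_sim={("c", "a"): 0.5},
               user_nbrs=["p"], item_nbrs=["a"])
best = top_k({"c": 4.4, "d": 3.0, "e": 4.9}, 2)
```

In this example the user-based term is 13/3 and the item-based term is 9/2, so with ω = 0.5 the prediction is 53/12.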

Beneficial effects:

The invention uses the user feature preference matrix and the user rating matrix to calculate the similarity between users, uses the frequent itemset matrix and the user rating matrix to calculate the similarity between items, and combines user similarity and item similarity to calculate users' predicted ratings for unrated items, thereby achieving Top-K recommendation. Compared with traditional user-based and item-based collaborative filtering recommendation algorithms, the method can effectively avoid the cold-start and data-sparsity problems and offers better recommendation quality.

Description of drawings

FIG. 1 is a schematic diagram of the construction of the user feature preference matrix in the present invention.

FIG. 2 is a flowchart of the present invention.

Detailed description

This embodiment provides a collaborative filtering recommendation method based on user clustering and item association analysis, comprising the following steps:

Further, in step 6, the transaction dataset D is generated from the user-item rating matrix UI^{n×m} as follows: if user u has rated item i, i.e. r_{u,i} is not empty, item i is added to the transaction corresponding to user u. The transaction dataset D is shown in Table 1.

Table 1

Claims

1. A collaborative filtering recommendation method based on user clustering and item association analysis, characterized in that it comprises the following steps:

Step 1, data preprocessing: extract user-item rating data and item feature data from the raw data and clean them, and construct the user-item rating matrix UI^{n×m} and the item feature membership matrix IF^{m×k};

Step 2, construct the user feature preference matrix: use the user-item rating matrix and the item category feature matrix to construct the user feature preference matrix UFP^{n×k};

Step 3, apply min-max normalization to the UFP matrix, mapping the value of every element to the interval [0, 1];

Step 4, cluster the users with the FCM algorithm, combining a genetic algorithm with FCM;

Step 5, calculate user similarity by combining the user feature preference matrix and the user-item rating matrix, so that user similarity contains both the explicit information of the original user-item rating matrix and the implicit information of users' preferences for item features;

Step 6, generate the transaction dataset D from the user-item rating matrix UI^{n×m};

Step 7, for the transaction dataset D, generate frequent itemsets using the frequent itemset mining strategy based on pre-judgment screening, and construct the frequent itemset matrix FIS^{f×m};

Step 8, calculate item similarity by combining the frequent itemset matrix and the user-item rating matrix, so that item similarity contains both the users' original explicit rating information and the intrinsic associations between items;

Step 9, determine the nearest-neighbor users of user u and the nearest-neighbor items of item i, and perform Top-K recommendation by combining user similarity and item similarity.

2. The collaborative filtering recommendation method based on user clustering and item association analysis according to claim 1, characterized in that step 2 also includes: constructing the user feature preference matrix UFP^{n×k} from the user-item rating matrix UI^{n×m} and the item feature membership matrix IF^{m×k}, where each element R_ui of the user feature preference matrix is computed by formula (1),

in which r_u = (r_u1, r_u2, r_u3, ..., r_um) is the vector of ratings given by user u to the m items, and f_i = (f_1i, f_2i, f_3i, ..., f_mi) is the membership vector of the m items with respect to feature i.

3. The collaborative filtering recommendation method based on user clustering and item association analysis according to claim 1, characterized in that in step 3, min-max normalization is applied to the user feature preference matrix UFP, mapping the value of every element to the interval [0, 1], the mapping being given by formula (2),

in which x_ij is the element in row i and column j of the user feature preference matrix and expresses the degree of preference of user i for item feature j, x_min is the minimum degree of preference of all users for item features, and x_max is the maximum degree of preference of all users for item features.

4. The collaborative filtering recommendation method based on user clustering and item association analysis according to claim 1, characterized in that in step 4, the users are clustered with the FCM algorithm and a genetic algorithm is combined with FCM, the steps being as follows:

① Parameter initialization: initialize the relevant parameters, including the population size M, crossover probability P_c, mutation probability P_m, maximum number of iterations t_max, number of clusters c, membership factor m, and convergence precision ε;

② Encoding and population initialization: encode according to the formula and randomly generate a population X containing n objects as the initial individuals, i.e. X = [x₁, x₂, x₃, ..., x_n];

③ Calculate the individual fitness fit_m according to formula (3), in which c_j (j = 1, 2, 3, ..., k) is the center of each cluster and μ_{i,j} denotes the degree of membership of the i-th sample in the j-th cluster;

④ Apply selection, crossover and mutation to the current population to produce a new generation of individuals;

⑤ If t = t_max, the genetic algorithm terminates, outputs the final data and proceeds to step ⑥; otherwise, set t = t + 1 and return to step ③;

⑥ Fuzzily partition the whole dataset according to the global optimal solution and output the cluster center matrix, completing the user clustering.

5. The collaborative filtering recommendation method based on user clustering and item association analysis according to claim 1, characterized in that in step 5, user similarity is calculated by combining the user feature preference matrix and the user-item rating matrix, so that it contains both the explicit information of the original user-item rating matrix and the implicit information of users' preferences for item features, and is calculated by formula (4):

Sim(u, v) = λ Sim₁(u, v) + (1 − λ) Sim₂(u, v)    (4)

where λ is a weight factor in the range (0, 1), and Sim(u, v) denotes the combined similarity between user u and user v; Sim₁(u, v) denotes the similarity obtained from the original user-item rating matrix and is computed by formula (5), in which I_uv denotes the set of items rated by both user u and user v, r_ui is the rating of user u on item i, and r̄_u denotes the mean of all ratings of user u;

Sim₂(u, v) denotes the similarity obtained from the user feature preference matrix and is computed by formula (6), in which F_uv denotes the set of features preferred by both user u and user v, R_ui is the degree of preference of user u for feature i, R_vi is the degree of preference of user v for feature i, R̄_u denotes the mean degree of preference of user u over all features, and R̄_v denotes the mean degree of preference of user v over all features.

6. The collaborative filtering recommendation method based on user clustering and item association analysis according to claim 1, characterized in that in step 6, the transaction dataset D is generated from the user-item rating matrix UI^{n×m} as follows: if user u has rated item i, i.e. r_{u,i} is not empty, item i is added to the transaction corresponding to user u.

7. The collaborative filtering recommendation method based on user clustering and item association analysis according to claim 1, characterized in that in step 7, for the transaction dataset D, the frequent itemset mining strategy based on pre-judgment screening is used to generate the set of frequent itemsets S_FI = (FS₁, FS₂, ..., FS_t), where FS denotes a frequent itemset and t is the number of frequent itemsets, and the frequent itemset matrix FIS^{t×m} is constructed according to formula (7),

in which F_ij denotes the element in row i and column j of the frequent itemset matrix FIS^{t×m}, with i ∈ (0, t) and j ∈ (0, m).

8. The collaborative filtering recommendation method based on user clustering and item association analysis according to claim 1, characterized in that in step 8, item similarity is calculated by combining the frequent itemset matrix and the user-item rating matrix, so that it contains both the users' original explicit rating information and the intrinsic associations between items, and is calculated by formula (8):

Sim′(i, j) = β Sim′₁(i, j) + (1 − β) Sim′₂(i, j)    (8)

where β is a weight factor in the range (0, 1), and Sim′(i, j) denotes the combined similarity between item i and item j; Sim′₁(i, j) denotes the item similarity obtained from the original user-item rating matrix and is computed by formula (9), in which U_ij denotes the set of users who have rated both item i and item j, r_ui is the rating of user u on item i, and r̄_i denotes the mean rating of item i;

Sim′₂(i, j) denotes the item similarity obtained from the frequent itemset matrix and is computed by formula (10), in which t is the number of frequent itemsets and F_si indicates whether the s-th frequent itemset contains item i.

9. The collaborative filtering recommendation method based on user clustering and item association analysis according to claim 1, characterized in that in step 9, the nearest-neighbor users of user u and the nearest-neighbor items of item i are determined, the predicted ratings of user u for all unrated items are calculated, and Top-K recommendation is performed, the predicted rating of user u for an unrated item i being calculated as follows:

① Sort the user similarities calculated by formula (4) to obtain the nearest-neighbor set N_u of user u, and sort the item similarities calculated by formula (8) to obtain the nearest-neighbor set N_i of item i;

② Calculate the predicted rating of user u for the unrated item i according to formula (11), in which ω is a weight coefficient, N_u is the nearest-neighbor set of user u, N_i is the nearest-neighbor set of item i, r̄_u and r̄_p denote the mean ratings of user u and user p respectively, r̄_i and r̄_q denote the mean ratings received by item i and item q respectively, Sim(u, p) denotes the similarity between user u and user p, and Sim′(i, q) denotes the similarity between item i and item q; the predicted ratings of user u for all unrated items are calculated by formula (11) and sorted in descending order, and the K items with the highest predicted ratings are selected for Top-K recommendation.