CN106354886A - Method for screening nearest neighbor by using potential neighbor relation graph in recommendation system

Info

Publication number: CN106354886A
Application number: CN201610909600.9A
Authority: CN
Inventors: 王晓军
Original assignee: Nanjing Post and Telecommunication University
Current assignee: Nanjing Post and Telecommunication University
Priority date: 2016-10-18
Filing date: 2016-10-18
Publication date: 2017-01-25
Anticipated expiration: 2036-10-18
Also published as: CN106354886B

Abstract

The invention discloses a method for screening nearest neighbors in a recommendation system by using a potential neighbor relationship graph, comprising the following steps: (1) generating a set C of object clusters with redundancy; (2) constructing the potential neighbor relationship graph corresponding to the cluster set C; (3) quantifying the weight of each edge in the graph, where the weight of an edge indicates how likely the two objects joined by that edge are to be nearest neighbors; (4) pruning the potential neighbor relationship graph to eliminate superfluous comparisons; (5) using the pruned graph to screen the nearest neighbors of an object. While preserving recommendation accuracy, the method maps recommendation over the complete large-scale data set to recommendation over a smaller data set, which reduces the scale of the recommendation problem and keeps the recommendation method efficient.

Description

A Method for Screening Nearest Neighbors Using Potential Neighbor Graphs in Recommender Systems

Technical Field

The invention relates to the technical field of recommendation, and in particular to a method for screening nearest neighbors in a recommendation system by using a potential neighbor relationship graph.

Background

The ever-growing volume of data means that users must spend a great deal of time to find valuable information. Collaborative filtering is regarded as one of the effective techniques for addressing information overload and has been widely applied to recommending movies, music, books, travel, news, and so on. Its main idea is to predict the target user's opinion of an item from the preferences of similar users, or to make a recommendation for the target item based on the user's opinions of similar items. A key problem for collaborative filtering is therefore how to select the nearest neighbors of a target user or item effectively. However, recommendation systems hold millions of users and items and keep growing, so searching massive data for nearest neighbors with traditional methods can hardly deliver reasonably accurate recommendations within a reasonable time.

At its core, collaborative filtering compares all items (or users) in the recommendation system pairwise and selects nearest neighbors by computing the similarity between them. The more pairwise comparisons are made, the lower the running efficiency but the higher the chance of finding the nearest neighbors; conversely, the fewer comparisons are made, the higher the efficiency but the higher the chance of missing nearest neighbors, which hurts recommendation accuracy. Moreover, the data in a recommendation system are closely and intricately interrelated and their value density is very unevenly distributed, so computation over such massive data cannot rely on statistical analysis and iterative computation over the global data as it can for small-sample data sets; methods for reducing the data on demand need to be explored.

Summary of the Invention

The technical problem to be solved by the invention is to overcome the deficiencies of the prior art by providing a method for screening nearest neighbors in a recommendation system using a potential neighbor relationship graph. The method eliminates redundant and superfluous comparisons, so that recommendation efficiency is ensured without sacrificing accuracy and reasonably accurate recommendations are provided within a reasonable time.

To solve the above technical problem, the invention adopts the following technical solution:

The method proposed by the invention for screening nearest neighbors in a recommendation system using a potential neighbor relationship graph comprises the following steps:

Step 1. Let O be the set of objects whose nearest neighbors are to be screened and let i ∈ O be an object. Using a fuzzy clustering technique on the objects' feature vectors, assign object i to multiple clusters according to a preset probability, thereby producing a cluster set C containing K object clusters;

Step 2. Construct the potential neighbor relationship graph $G_C = \{V_C, E_C\}$ corresponding to the cluster set C, where $V_C$ is the vertex set and $E_C$ is the set of undirected edges, as follows:

If objects i and j both appear in the same cluster c of the cluster set C, they are called a co-occurrence pair, denoted <i, j>. For each co-occurrence pair <i, j> in C, first add the vertices $v_i$ and $v_j$ corresponding to objects i and j to the graph $G_C$; then, if no undirected edge yet exists between the two objects, connect $v_i$ and $v_j$ with an edge $e_{i,j}$. Each edge $e_{i,j}$ in $G_C$ represents a potential neighbor relationship, and the objects i and j corresponding to its two endpoint vertices $v_i$ and $v_j$ are called adjacent objects; here j ∈ O, c ∈ C, $v_i \in V_C$, $v_j \in V_C$, $e_{i,j} \in E_C$;

Step 3. Quantify the weight of each edge in the graph $G_C$;

Step 4. Prune the graph $G_C$: delete every edge of the potential neighbor relationship graph $G_C$ whose weight is lower than $w_{min}$; the remaining edges form a new graph $G_C'$, where $w_{min}$ is the preset minimum weight threshold;

Step 5. Select an object i as the target and use the pruned potential neighbor relationship graph $G_C'$ to screen the target's nearest neighbors: for each edge $e_{i,j}$ incident to target i in $G_C'$, compare the utility vectors $R^i$ and $R^j$ and compute their similarity, then select the nearest neighbors from among all adjacent objects of target i according to the neighbor selection condition; here $R^i$ denotes the utility vector of object i and $R^j$ that of object j.
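
The graph construction of Step 2 can be sketched as follows. This is only an illustration under stated assumptions: clusters are given as Python sets of object ids, edges are stored as `frozenset` pairs, and the function name `build_potential_neighbor_graph` is hypothetical rather than taken from the patent.

```python
from itertools import combinations

def build_potential_neighbor_graph(clusters):
    """Return (vertices, edges) of the potential neighbor graph G_C.

    `clusters` is an iterable of sets of object ids; clusters may overlap.
    Each edge is a frozenset {i, j}, so a pair that co-occurs in several
    clusters still contributes only one undirected edge.
    """
    vertices, edges = set(), set()
    for cluster in clusters:
        # Every unordered pair inside one cluster is a co-occurrence pair <i, j>.
        for i, j in combinations(sorted(cluster), 2):
            vertices.update((i, j))
            edges.add(frozenset((i, j)))
    return vertices, edges

# Toy cluster set: "b" and "c" co-occur twice but share a single edge.
example_clusters = [{"a", "b", "c"}, {"b", "c", "d"}, {"e"}]
vertices, edges = build_potential_neighbor_graph(example_clusters)
```

Only the edges produced here are ever compared in Step 5, which is where the saving over an all-pairs comparison comes from. An object that never co-occurs with another object (such as `"e"` above) gets no vertex and therefore no comparisons.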

As a further refinement of the method of the invention, in Step 3 the weight $e_{i,j}.\mathrm{weight}$ of edge $e_{i,j}$ is calculated by the following formula:

$e_{i,j}.\mathrm{weight} = \frac{|C_{i,j}|}{|C_i| + |C_j| - |C_{i,j}|} \cdot \log\frac{|E_C|}{d(v_i)} \cdot \log\frac{|E_C|}{d(v_j)}$

where $C_i$ denotes the set of clusters to which object i belongs, $C_j$ denotes the set of clusters to which object j belongs, $C_{i,j} = C_i \cap C_j$ is the set of clusters shared by objects i and j, |*| is the number of members of the set *, and d(*) denotes the degree of vertex *.
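
A small worked example may help read the formula. The sketch below assumes natural logarithms (the text does not state the base) and uses a hypothetical function name `edge_weight`; cluster memberships are passed as sets of cluster ids.

```python
import math

def edge_weight(clusters_i, clusters_j, degree_i, degree_j, num_edges):
    """Weight of edge e_{i,j}: cluster overlap times two log(|E_C| / degree) factors.

    clusters_i, clusters_j: sets of cluster ids containing objects i and j.
    degree_i, degree_j:     degrees d(v_i), d(v_j) in the graph G_C.
    num_edges:              |E_C|, the number of edges in G_C.
    """
    shared = clusters_i & clusters_j                      # C_{i,j}
    overlap = len(shared) / (len(clusters_i) + len(clusters_j) - len(shared))
    # Natural log is an assumption; another base only rescales all weights.
    return overlap * math.log(num_edges / degree_i) * math.log(num_edges / degree_j)

# i is in clusters {0, 1}, j is in clusters {1, 2}; they share cluster 1.
w = edge_weight({0, 1}, {1, 2}, degree_i=2, degree_j=4, num_edges=8)
```

Here the overlap factor is 1/3 and the two log factors are log(4) and log(2). A vertex whose degree equals $|E_C|$ contributes log(1) = 0, so every edge touching such a hub gets weight zero.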

As a further refinement of the method of the invention, the neighbor selection condition in Step 5 is to select the k objects with the highest similarity to the target to form the target's neighbor set.
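
The top-k selection can be sketched as below. The patent does not name a similarity measure, so cosine similarity over utility vectors is an assumption here, as is treating unrated entries as 0; `cosine_similarity` and `top_k_neighbors` are illustrative names.

```python
import heapq
import math

def cosine_similarity(x, y):
    """Cosine similarity of two equal-length utility vectors (0.0 if either is all zero)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def top_k_neighbors(target, adjacent, utility, k):
    """Return the k adjacent objects most similar to `target`, best first."""
    scored = ((cosine_similarity(utility[target], utility[j]), j) for j in adjacent)
    return [j for _, j in heapq.nlargest(k, scored)]

# Utility vectors (ratings by three users); 0 stands for "not rated" (assumption).
utility = {"i": [5, 3, 0], "a": [5, 3, 0], "b": [0, 0, 4], "c": [4, 3, 1]}
neighbors = top_k_neighbors("i", ["a", "b", "c"], utility, k=2)
```

Only the objects adjacent to the target in the pruned graph are scored, so the cost of this step grows with the target's degree rather than with the size of the whole object set.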

As a further refinement of the method of the invention, K≥1.

As a further refinement of the method of the invention, the weight of edge $e_{i,j}$ depends on the clusters shared by the objects i and j that the edge joins in the relationship graph $G_C$.

Compared with the prior art, the invention, by adopting the above technical solution, has the following technical effects:

While preserving recommendation accuracy, recommendation over the complete large-scale data set is mapped to recommendation over a smaller data set, which significantly reduces the scale of the recommendation problem and keeps the method efficient. Specifically:

(1) Fuzzy clustering assigns each object to multiple clusters with a certain probability; this redundancy provides an indispensable and reliable way to avoid missing nearest neighbors;

(2) During clustering, highly similar objects are assigned to the same cluster and individual outlier objects are excluded. This not only helps recommendation accuracy but also removes the superfluous comparisons that outliers would cause, improving recommendation efficiency;

(3) During construction of the potential neighbor relationship graph, each co-occurrence pair corresponds to at most one edge, and each edge represents one potential neighbor relationship. Subsequent steps compare only the adjacent objects of each edge and compute their similarity, so duplicate and superfluous co-occurrence pairs are eliminated, the number of pairwise comparisons over the object set is reduced, and recommendation efficiency improves;

(4) By setting an appropriate edge-weight threshold, edges whose weight falls below the threshold are removed from the weighted potential neighbor relationship graph, further eliminating superfluous comparisons.

Description of the Drawings

Fig. 1 shows the processing flow of the invention.

Detailed Description

The technical solution of the invention is described in further detail below with reference to the drawing:

Fig. 1 shows the processing flow of the invention, with the following steps: step (1), generate a set C of object clusters with redundancy; step (2), construct the potential neighbor relationship graph corresponding to the cluster set C; step (3), quantify the weight of each edge in the graph, where the weight of an edge indicates how likely the two objects joined by that edge are to be nearest neighbors; step (4), prune the potential neighbor relationship graph to eliminate superfluous comparisons; step (5), use the pruned graph to screen the nearest neighbors of the target.
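
Step (1) leaves the choice of fuzzy clustering technique open. As a minimal sketch, assuming fuzzy c-means style memberships computed against already known cluster centers, and a hypothetical membership `threshold` standing in for the preset probability, each object can be placed in every cluster where its membership is high enough:

```python
import math

def fuzzy_memberships(x, centers, m=2.0):
    """Fuzzy c-means membership of point x in each cluster (fuzzifier m > 1)."""
    dists = [math.dist(x, c) for c in centers]
    zero = [d == 0.0 for d in dists]
    if any(zero):
        # x coincides with one or more centers: split full membership among them.
        return [1.0 / sum(zero) if z else 0.0 for z in zero]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_k / d_j) ** power for d_j in dists) for d_k in dists]

def assign_to_clusters(features, centers, threshold):
    """Put each object into every cluster where its membership >= threshold."""
    clusters = [set() for _ in centers]
    for obj, x in features.items():
        for k, u in enumerate(fuzzy_memberships(x, centers)):
            if u >= threshold:
                clusters[k].add(obj)
    return clusters

centers = [(0.0, 0.0), (10.0, 0.0)]   # assumed known, e.g. from an earlier clustering run
features = {"a": (1.0, 0.0), "b": (5.0, 0.0), "c": (9.0, 0.0)}
clusters = assign_to_clusters(features, centers, threshold=0.3)
```

In this toy data, object `"b"` lies midway between the two centers, has membership 0.5 in each, and therefore lands in both clusters. That overlap is the redundancy the later steps rely on to avoid missing a nearest neighbor.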

[Embodiment 1]

In a movie recommendation system, the user set U and the item set I are known, and each item is a movie. $F_i$ denotes the feature vector of item i, describing the movie's genre, awards, actors, director, and so on; $R^i$ denotes the utility vector of item i ∈ I, and $r_{u,i} \in R^i$ is the rating given to item i by user u ∈ U. The recommendation system uses the potential neighbor relationship graph to screen the nearest neighbors of the target item i, and then predicts user u's rating of item i from the target user u's ratings of all nearest neighbors of item i. In this Embodiment 1 the object set is the item set, and the nearest neighbors of an item are screened as follows:

(1) Generate the cluster set. Using a fuzzy clustering technique on the item features, assign each item i ∈ I to multiple clusters with a certain probability, thereby producing the item cluster set C.

(2) Construct the potential neighbor relationship graph $G_C = \{V_C, E_C\}$ corresponding to the cluster set C, where $V_C$ is the vertex set and $E_C$ is the set of undirected edges. The graph $G_C$ is built as follows: for each co-occurrence pair <i, j> in C, first add the vertices $v_i$ and $v_j$ corresponding to items i ∈ I and j ∈ I to $G_C$, then connect $v_i$ and $v_j$ with an edge $e_{i,j} \in E_C$. If an undirected edge already exists between two items while the graph is being built, no further edge is added between them. Each edge in the graph represents a potential neighbor relationship; whether the two items are in fact nearest neighbors is determined by the similarity computation in the later steps.

(3) Quantify the weight of each edge of $G_C$. Once the relationship graph $G_C$ is finalized, each edge must be weighted so that the neighbor relationships can be screened further. The weight is calculated as follows. Let $C_i$ and $C_j$ denote the sets of clusters to which items i and j belong, respectively; $C_{i,j} = C_i \cap C_j$ is the set of clusters shared by items i and j. The more clusters items i and j share (that is, the larger $|C_{i,j}|$), the more likely they are to be each other's nearest neighbors. Besides $|C_{i,j}|$, the edge weight also takes into account the total number of clusters to which the adjacent items belong and the degrees of the edge's endpoint vertices: the fewer clusters the adjacent items of an edge belong to, the higher the edge's weight should be, and the smaller the degrees of its endpoint vertices, the higher its weight should be. The weight of edge $e_{i,j} \in E_C$ is calculated by the following formula:

$e_{i,j}.\mathrm{weight} = \frac{|C_{i,j}|}{|C_i| + |C_j| - |C_{i,j}|} \cdot \log\frac{|E_C|}{d(v_i)} \cdot \log\frac{|E_C|}{d(v_j)}$

where d(*) denotes the degree of vertex *.

(4) Prune superfluous comparisons. Using the preset minimum weight threshold $w_{min}$, delete from $G_C$ every edge whose weight is lower than $w_{min}$; the remaining edges form the pruned graph $G_C'$.

(5) Select item i as the target and screen its nearest neighbors. For each edge $e_{i,j}$ incident to target i in $G_C'$, compare the utility vector $R^i$ of item i with the utility vector $R^j$ of item j in the user-item utility matrix and compute their similarity; then select the nearest neighbors from among all items adjacent to target i according to the neighbor selection condition. In this Embodiment 1, the k items most similar to the target item form the target's neighbor set. Finally, the target user u's ratings of the neighbors in this set are used to predict the target user u's rating of the target item.
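
The embodiment does not fix the formula for the final prediction. One common choice, assumed here purely for illustration, is a similarity-weighted average of the target user's ratings over the neighbor set; `predict_rating` and its arguments are hypothetical names.

```python
def predict_rating(user_ratings, neighbor_sims):
    """Predict the target user's rating of the target item.

    user_ratings:  {item: rating given by the target user}.
    neighbor_sims: {neighbor item: similarity to the target item}.
    Neighbors the user has not rated are skipped; returns None if none remain.
    """
    rated = [(sim, user_ratings[j]) for j, sim in neighbor_sims.items() if j in user_ratings]
    total = sum(abs(sim) for sim, _ in rated)
    if total == 0:
        return None
    # Similarity-weighted average (an assumed prediction rule, not from the patent).
    return sum(sim * rating for sim, rating in rated) / total

user_ratings = {"a": 4, "c": 2}                 # target user's known ratings
neighbor_sims = {"a": 0.9, "c": 0.3, "d": 0.5}  # neighbor set of the target item
prediction = predict_rating(user_ratings, neighbor_sims)
```

With these numbers the unrated neighbor `"d"` is ignored and the prediction is (0.9 × 4 + 0.3 × 2) / (0.9 + 0.3) = 3.5.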

[Embodiment 2]

In a movie recommendation system, the user set U and the item set I are known, and each item is a movie. $F_q$ denotes the feature vector of user q, describing the user's gender, age, occupation, and so on; $R_q$ denotes the utility vector of user q ∈ U, and $r_{q,m} \in R_q$ is the rating given by user q to item m ∈ I. The recommendation system uses the potential neighbor relationship graph to screen the nearest neighbors of the target user q, and then predicts user q's rating of item m from the ratings that all nearest neighbors of user q gave to item m. In this Embodiment 2 the object set is the user set, and the nearest neighbors of a user are screened as follows:

(1) Generate the cluster set. Using a fuzzy clustering technique on the user features, assign each user u ∈ U to multiple clusters with a certain probability, thereby producing the user cluster set C.

(2) Construct the potential neighbor relationship graph $G_C = \{V_C, E_C\}$ corresponding to the cluster set C, where $V_C$ is the vertex set and $E_C$ is the set of undirected edges. The graph $G_C$ is built as follows: for each co-occurrence pair <q, h> in C, first add the vertices $v_q$ and $v_h$ corresponding to users q ∈ U and h ∈ U to $G_C$, then connect $v_q$ and $v_h$ with an edge $e_{q,h} \in E_C$. If an undirected edge already exists between two users while the graph is being built, no further edge is added between them. Each edge in the graph represents a potential neighbor relationship; whether the two users are in fact nearest neighbors is determined by the similarity computation in the later steps.

(3) Quantify the weight of each edge of $G_C$. Once the relationship graph $G_C$ is finalized, each edge must be weighted so that the neighbor relationships can be screened further. The weight is calculated as follows. Let $C_q$ and $C_h$ denote the sets of clusters to which users q and h belong, respectively; $C_{q,h} = C_q \cap C_h$ is the set of clusters shared by users q and h. The more clusters users q and h share (that is, the larger $|C_{q,h}|$), the more likely they are to be each other's nearest neighbors. Besides $|C_{q,h}|$, the edge weight also takes into account the total number of clusters to which the adjacent users belong and the degrees of the edge's endpoint vertices: the fewer clusters the adjacent users of an edge belong to, the higher the edge's weight should be, and the smaller the degrees of its endpoint vertices, the higher its weight should be. The weight of edge $e_{q,h}$ can therefore be calculated by the following formula:

$\mathrm{weight}(q,h) = \frac{|C_{q,h}|}{|C_q| + |C_h| - |C_{q,h}|} \cdot \log\frac{|E_C|}{d(v_q)} \cdot \log\frac{|E_C|}{d(v_h)},$

where d(*) denotes the degree of vertex *.

(5) Select user q as the target and screen the target user's nearest neighbors. For each edge $e_{q,h}$ incident to target q in $G_C'$, compare the utility vector $R_q$ of user q with the utility vector $R_h$ of user h in the user-item utility matrix and compute their similarity; then select the nearest neighbors from among all users adjacent to target q according to the neighbor selection condition. In this Embodiment 2, the k users most similar to the target user q form the target's neighbor set. Finally, the ratings given to the item by the neighbors in target user q's neighbor set are used to predict target user q's rating of the target item.
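
Step (5) operates on the pruned graph G_C'. The sketch below shows that pruning and the lookup of the target user's remaining adjacent users, assuming the same threshold rule as step (4) of Embodiment 1 and edge weights kept in a dictionary keyed by `frozenset` pairs; the function names and the sample weights are illustrative only.

```python
def prune_graph(edge_weights, w_min):
    """Keep only edges whose weight is at least w_min (edges below w_min are deleted)."""
    return {edge: w for edge, w in edge_weights.items() if w >= w_min}

def adjacent_objects(pruned_edges, target):
    """Objects still joined to `target` by an edge of the pruned graph."""
    return {other
            for edge in pruned_edges if target in edge
            for other in edge if other != target}

# Hypothetical edge weights for three users q, h1, h2.
edge_weights = {
    frozenset(("q", "h1")): 0.8,
    frozenset(("q", "h2")): 0.1,
    frozenset(("h1", "h2")): 0.5,
}
pruned = prune_graph(edge_weights, w_min=0.3)
candidates = adjacent_objects(pruned, "q")
```

After pruning, only `"h1"` remains adjacent to `"q"`, so the similarity between the utility vectors of q and h2 is never computed.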

The specific embodiments described above explain the purpose, technical solutions, and beneficial effects of the invention in further detail. It should be understood that they are only specific embodiments of the invention and are not intended to limit its scope; any equivalent changes and modifications made by a person skilled in the art without departing from the concept and principles of the invention shall fall within its scope of protection.

Claims

1. A method for screening nearest neighbors in a recommendation system by using a potential neighbor relationship graph, characterized by comprising the following steps:

Step 1. Let O be the set of objects whose nearest neighbors are to be screened and let i ∈ O be an object; using a fuzzy clustering technique on the objects' feature vectors, assign object i to multiple clusters according to a preset probability, thereby producing a cluster set C containing K object clusters;

Step 2. Construct the potential neighbor relationship graph $G_C = \{V_C, E_C\}$ corresponding to the cluster set C, where $V_C$ is the vertex set and $E_C$ is the set of undirected edges, as follows:

If objects i and j both appear in the same cluster c of the cluster set C, they are called a co-occurrence pair, denoted <i, j>; for each co-occurrence pair <i, j> in C, first add the vertices $v_i$ and $v_j$ corresponding to objects i and j to the graph $G_C$, and then, if no undirected edge yet exists between the two objects, connect $v_i$ and $v_j$ with an edge $e_{i,j}$; each edge $e_{i,j}$ in $G_C$ represents a potential neighbor relationship, and the objects i and j corresponding to its two endpoint vertices $v_i$ and $v_j$ are called adjacent objects, where j ∈ O, c ∈ C, $v_i \in V_C$, $v_j \in V_C$, $e_{i,j} \in E_C$;

Step 3. Quantify the weight of each edge in the graph $G_C$;

Step 4. Prune the graph $G_C$: delete every edge of the potential neighbor relationship graph $G_C$ whose weight is lower than $w_{min}$, the remaining edges forming a new graph $G_C'$, where $w_{min}$ is the preset minimum weight threshold;

Step 5. Select an object i as the target and use the pruned potential neighbor relationship graph $G_C'$ to screen the target's nearest neighbors: for each edge $e_{i,j}$ incident to target i in $G_C'$, compare the utility vectors $R^i$ and $R^j$ and compute their similarity, then select the nearest neighbors from among all adjacent objects of target i according to the neighbor selection condition, where $R^i$ denotes the utility vector of object i and $R^j$ that of object j.

2. The method for screening nearest neighbors in a recommendation system by using a potential neighbor relationship graph according to claim 1, wherein in Step 3 the weight $e_{i,j}.\mathrm{weight}$ of edge $e_{i,j}$ is calculated by the following formula:

$e_{i,j}.\mathrm{weight} = \frac{|C_{i,j}|}{|C_i| + |C_j| - |C_{i,j}|} \cdot \log\frac{|E_C|}{d(v_i)} \cdot \log\frac{|E_C|}{d(v_j)}$

where $C_i$ denotes the set of clusters to which object i belongs, $C_j$ denotes the set of clusters to which object j belongs, $C_{i,j} = C_i \cap C_j$ is the set of clusters shared by objects i and j, |*| is the number of members of the set *, and d(*) denotes the degree of vertex *.

3. The method for screening nearest neighbors in a recommendation system by using a potential neighbor relationship graph according to claim 1, characterized in that the neighbor selection condition in Step 5 is to select the k objects with the highest similarity to the target to form the target's neighbor set.

4. The method for screening nearest neighbors using a potential neighbor relationship graph in a recommender system according to claim 1, wherein K≥1.

5. The method for screening nearest neighbors in a recommendation system by using a potential neighbor relationship graph according to claim 1, wherein the weight of edge $e_{i,j}$ depends on the clusters shared by the objects i and j that the edge joins in the relationship graph $G_C$.