CN103049488B - Collaborative filtering processing method and system - Google Patents
Collaborative filtering processing method and system
- Publication number
- CN103049488B CN103049488B CN201210518155.5A CN201210518155A CN103049488B CN 103049488 B CN103049488 B CN 103049488B CN 201210518155 A CN201210518155 A CN 201210518155A CN 103049488 B CN103049488 B CN 103049488B
- Authority
- CN
- China
- Prior art keywords
- matrix
- calculation server
- component
- row
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention discloses a collaborative filtering processing method and system, and relates to the field of computer technology. The method includes: receiving a request to recommend at least one item for a subject, and, according to the request, starting at least two calculation servers to perform a recommendation calculation process. The process includes: each calculation server obtains block matrix data, including block matrix data of a subject-item weight matrix and of a collaborative filtering matrix; each calculation server determines the calculation servers on which it depends and the components of the block matrix data held by those servers; each calculation server sends the components that others depend on to the calculation servers that depend on them, and receives the components sent by the other calculation servers; and each calculation server recommends at least one item to each local subject. The invention can quickly calculate recommended items over very large matrix data, lowers the hardware requirements of the computing system, and reduces overall hardware cost.
Description
Technical Field

The present invention relates to the field of computer technology, and in particular to a collaborative filtering processing method and system.
Background Art

The exponential growth of information resources on the Internet has brought about the so-called problems of "information overload" and "information disorientation": it is difficult for people to find the information they are interested in, and even when they find some, it is often mixed with a great deal of "noise". Technologies such as Internet-oriented information retrieval, information filtering and collaborative filtering have therefore emerged. Information retrieval, however, is not intelligent and cannot learn a user's interests; in particular, users with specific professional interests who enter the same keywords will only obtain the same retrieval results. Information filtering cannot distinguish good results from bad ones for the same topic, and as information resources grow rapidly, more effective filtering needs to incorporate people's quality evaluations. To meet this need, recommendation systems appeared. A recommendation system is an intelligent agent system proposed to solve the information overload problem; it can automatically recommend, from a large amount of information, resources that match a user's interests, preferences or needs. With the popularization and rapid development of the Internet, recommendation systems have been widely applied in many fields; especially in e-commerce, they have received more and more research and application. At present, almost all large e-commerce websites use some form of recommendation system to varying degrees, for example Amazon, CDNOW, eBay and the Dangdang online bookstore. Among the techniques used, collaborative filtering has been particularly successful in current recommendation systems.

Collaborative filtering is a class of recommendation algorithms that takes users' evaluation information into account. Collaborative filtering analyzes user interests, finds users in the user group whose interests are similar to those of a specified user, and combines those similar users' evaluations of a piece of information to form the system's prediction of how much the specified user will like that information. It is mainly divided into two types: user-based collaborative filtering (user-based) and item-based collaborative filtering (item-based). The basic idea of the user-based collaborative filtering algorithm, for example, is: to recommend to a user items the user does not yet have, first calculate how similar the other users' preferences are to this user's preferences, and then recommend, from the several users whose preferences are most similar, the items this user does not have. Item-based collaborative filtering works on a similar principle.

In the Internet environment, both users and items constitute massive data, and timeliness imposes certain time requirements on the algorithm. In the prior art, however, the collaborative filtering calculation is basically performed on a single calculation server. When the number of users is extremely large, for example on the order of millions, the various matrices are also very large; calculating on a single server is then too slow and places relatively high demands on the hardware.
Summary of the Invention

In view of the above problems, the present invention is proposed in order to provide a collaborative filtering processing system and a corresponding collaborative filtering processing method that overcome the above problems or at least partially solve them.

According to one aspect of the present invention, a collaborative filtering processing method is provided, including:

receiving a request to recommend at least one item for a subject, and starting, according to the request, at least two calculation servers to perform a recommendation calculation process, the process including:

for the at least two calculation servers, each calculation server participating in the calculation obtaining block matrix data, the block matrix data including block matrix data of a subject-item weight matrix and block matrix data of a collaborative filtering matrix;

each calculation server participating in the calculation determining, according to the subject-item weight matrix and the collaborative filtering matrix, the calculation servers on which the current calculation server depends, as well as the components of the block matrix data in each of the depended-on calculation servers;

each calculation server participating in the calculation sending the components that are depended on to the calculation servers that depend on those components, and receiving the components sent by the other calculation servers;

each calculation server participating in the calculation, for each subject in the block matrix data of its local subject-item weight matrix, recommending at least one item to the subject by using the correspondence among the block matrix data of the local subject-item weight matrix, the block matrix data of the local collaborative filtering matrix, and the received components.
According to another aspect of the present invention, a collaborative filtering processing system is provided, including:

a request receiving module and at least two calculation servers;

the request receiving module being adapted to receive a request to recommend at least one item for a subject and to start the at least two calculation servers according to the request;

each of the at least two calculation servers including:

a block matrix data obtaining module, adapted for each calculation server participating in the calculation to obtain block matrix data, the block matrix data including block matrix data of a subject-item weight matrix and block matrix data of a collaborative filtering matrix;

a depended-on calculation server determining module, adapted for each calculation server participating in the calculation to determine, according to the subject-item weight matrix and the collaborative filtering matrix, the calculation servers on which the current calculation server depends, as well as the components of the block matrix data in each of the depended-on calculation servers;

a component sending and receiving module, adapted for each calculation server participating in the calculation to send the components that are depended on to the calculation servers that depend on those components, and to receive the components sent by the other calculation servers;

a recommended item calculation module, adapted for each calculation server participating in the calculation, for each subject in the block matrix data of its local subject-item weight matrix, to recommend at least one item to the subject by using the correspondence among the block matrix data of the local subject-item weight matrix, the block matrix data of the local collaborative filtering matrix, and the received components.
The collaborative filtering processing method according to the present invention can use multiple computing nodes to perform the collaborative filtering recommendation calculation in parallel. This solves the problems in the prior art that the calculation over very large matrix data is slow and that the hardware requirements are relatively high. The invention thereby meets the need for fast calculation of recommended items: it can quickly calculate recommended items over very large matrix data, lowers the hardware requirements of the computing system, and reduces hardware cost as a whole.

The above description is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be understood more clearly and implemented according to the contents of the specification, and in order to make the above and other objects, features and advantages of the present invention more apparent and comprehensible, specific embodiments of the present invention are set forth below.
Brief Description of the Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered as limiting the invention. Throughout the drawings, the same reference numerals denote the same components. In the drawings:
Fig. 1 shows a schematic flowchart of Embodiment 1 of a collaborative filtering processing method according to an embodiment of the present invention;

Fig. 2 shows a schematic flowchart of Embodiment 2 of a collaborative filtering processing method according to an embodiment of the present invention;

Fig. 3 shows a schematic matrix comparison diagram of the user-based calculation principle according to an embodiment of the present invention;

Fig. 4 shows an example matrix comparison for two calculation servers under the user-based calculation principle according to an embodiment of the present invention;

Fig. 5 shows a schematic flowchart of Embodiment 3 of a collaborative filtering processing method according to an embodiment of the present invention;

Fig. 6 shows a schematic diagram of the item-based calculation principle according to an embodiment of the present invention;

Fig. 7 shows a schematic structural diagram of Embodiment 1 of a collaborative filtering processing system according to an embodiment of the present invention;

Fig. 8 shows a schematic structural diagram of Embodiment 2 of a collaborative filtering processing system according to an embodiment of the present invention; and

Fig. 9 shows a schematic structural diagram of Embodiment 3 of a collaborative filtering processing system according to an embodiment of the present invention.
Detailed Description of the Embodiments

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
Referring to Fig. 1, it shows a schematic flowchart of Embodiment 1 of a collaborative filtering processing method of the present invention, which may specifically include:

Step 100: receiving a request to recommend at least one item for a subject, and starting, according to the request, at least two calculation servers to perform a recommendation calculation process.

In the embodiment of the present invention, the subject may be, for example, a user identifier in a network. For the various items that a user identifier has used or not used in the network, the system or a user may then request that one or several items be recommended for each user identifier, for example recommending related products to a user based on the products purchased on the network.

The recommendation calculation process performed by the at least two calculation servers then includes:

Step 110: for the at least two calculation servers, each calculation server participating in the calculation obtains block matrix data; the block matrix data includes block matrix data of the subject-item weight matrix and block matrix data of the collaborative filtering matrix.

In the present invention, the subject-item weight matrix may be a user identifier-item scoring matrix, for example Table 1:
Table 1

          Item1   Item2   Item3   Item4   Item5
  File1     70      60      80      90      -
  File2     40      90      50      -       70
  File3     -       70      80      80      -
In Table 1, if the product categories are the items and the user identifiers are File1, File2 and File3, then the user corresponding to File1 scored the items Item1 to Item4 that it has used as 70, 60, 80 and 90 respectively, and did not score Item5; the user corresponding to File2 scored Item1 to Item3 as 40, 90 and 50 respectively, did not score Item4, and scored Item5 as 70; and the user corresponding to File3 scored Item2 to Item4 as 70, 80 and 80 respectively, and did not score Item1 or Item5.
The collaborative filtering matrix is the user similarity matrix for the users in the matrix of Table 1. For example, to calculate Sim(File1, File2) with File1 = (70, 60, 80, 90, 0) and File2 = (40, 90, 50, 0, 70), the sim calculation may be the cosine of the angle between the two vectors, or another function; the matrix composed of the similarities between the components of every two rows is then as shown in Table 2. Alternatively, the collaborative filtering matrix may be the item similarity matrix corresponding to the items, i.e. the similarity between every two columns.
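As an illustration only, the cosine similarity between the two user vectors above could be computed as in the following Python sketch; numpy and the convention that 0 means "not scored" are assumptions, not requirements of the patent.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between two rating vectors (0 denotes an unscored item)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

file1 = [70, 60, 80, 90, 0]
file2 = [40, 90, 50, 0, 70]
print(cosine_sim(file1, file2))   # one entry, Sim(File1, File2), of the Table 2 matrix
```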
Table 2
Then, when calculating the recommended items for a user, the matrices of Table 1 and Table 2 can be used for the calculation.

In the embodiment of the present invention, for the N calculation servers of the parallel computing system, each server can obtain the block matrix data assigned to it, for example the block matrix data of the subject-item weight matrix and the block matrix data of the collaborative filtering matrix.

In the embodiment of the present invention, a matrix such as that of Table 1 may be partitioned by rows and the row blocks sent to the calculation servers, or the matrix may be transposed and partitioned by columns and the column blocks sent to the calculation servers.
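A minimal sketch of such row-wise block partitioning, assuming numpy and N calculation servers identified by ranks 0 to N-1 (the round-robin assignment matches the Fig. 4 example below; the helper name is illustrative, not from the patent):

```python
import numpy as np

def row_blocks(matrix, n_servers):
    """Assign row i of the matrix to server i % n_servers (round-robin by row)."""
    matrix = np.asarray(matrix)
    return {rank: {"rows": list(range(rank, matrix.shape[0], n_servers)),
                   "data": matrix[rank::n_servers]}
            for rank in range(n_servers)}

R = np.array([[70, 60, 80, 90, 0],
              [40, 90, 50, 0, 70],
              [0, 70, 80, 80, 0]])
blocks = row_blocks(R, 2)    # blocks[0]["rows"] == [0, 2], blocks[1]["rows"] == [1]
```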
Before this step, the embodiment of the present invention further includes:

each calculation server calculating the collaborative filtering matrix by using the subject-item weight matrix.

In the embodiment of the present invention, there may be N calculation servers participating in the calculation, where N is greater than or equal to 2.

Step 120: each calculation server participating in the calculation determines, according to the subject-item weight matrix and the collaborative filtering matrix, the calculation servers on which the current calculation server depends, as well as the components of the block matrix data in each of the depended-on calculation servers.

Each calculation server participating in the calculation then determines, based on the subject-item weight matrix and the collaborative filtering matrix, the components of the block matrices in the one or more calculation servers on which it depends.

Optionally, the step in which each calculation server participating in the calculation determines, according to the subject-item weight matrix and the collaborative filtering matrix, the calculation servers on which the current calculation server depends and the components of the block matrix data in each of the depended-on calculation servers includes:

Step S121: each calculation server participating in the calculation obtains the component identifiers of the block matrix data processed by all of the other calculation servers.

In the embodiment of the present invention, each calculation server needs to determine the calculation servers on which it depends, so it needs to know the row identifiers of the row components of each matrix (in the case where the matrix data obtained in step 110 is partitioned by rows), or the column identifiers of the column components of each matrix (in the case where the matrix data obtained in step 110 is partitioned by columns).

Therefore, in the present invention, the server that stores the source matrix data may send, to every calculation server participating in the calculation, the identifier of the calculation server processing each block matrix together with the identifiers of the corresponding components; alternatively, each calculation server may itself send the identifiers of the components it processes to all of the other calculation servers. When sending, each calculation server may call MPI (Message Passing Interface, a message-passing programming interface that also provides multi-language function libraries implementing its set of interfaces) to perform the sending.
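As an illustrative sketch only (the patent does not specify an implementation), exchanging the locally held row identifiers among all calculation servers could look as follows in Python with the mpi4py binding of MPI; allgather is used here merely as one convenient way for every server to learn which row components every other server holds.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Row identifiers of the R and A blocks held locally (round-robin split, as in Fig. 4).
total_rows = 5
local_rows = list(range(rank, total_rows, size))

# Every server receives a list holding the row identifiers of every other server.
rows_by_rank = comm.allgather(local_rows)
# rows_by_rank[k] now tells the current server which row components server k processes.
```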
Step S122: each calculation server participating in the calculation determines, according to the block matrix data of the collaborative filtering distance matrix and/or the block matrix data of the subject-item weight matrix, together with the component identifiers of the block matrix data processed by all of the other calculation servers, the calculation servers on which the current calculation server depends, as well as the component identifiers of the block matrix data in each of the depended-on calculation servers.

In the user-based case, after each calculation server participating in the calculation has obtained the block matrix data of the aforementioned user similarity matrix that it processes and the block matrix data of the user identifier-item weight matrix, and has obtained the component identifiers of the block matrix data processed by all of the other calculation servers, it can determine from these three the components in the one or more calculation servers on which it depends.

In the item-based case, after each calculation server participating in the calculation has obtained the block matrix data of the aforementioned item similarity matrix that it processes and the block matrix data of the user identifier-item weight matrix, and has obtained the component identifiers of the block matrix data processed by all of the other calculation servers, it can likewise determine from these three the components in the one or more calculation servers on which it depends.

Step 130: each calculation server participating in the calculation sends the components that are depended on to the calculation servers that depend on those components, and receives the components sent by the other calculation servers.

After each calculation server has determined the components of the block matrices in the one or more calculation servers on which it depends, each depended-on server sends the components that are depended on to the calculation servers that depend on those components.

Optionally, the method further includes:

Step S131: each calculation server participating in the calculation sends the component identifiers of the block matrix data in the calculation servers on which it depends to those depended-on calculation servers.

After each calculation server participating in the calculation has determined the calculation servers on which it depends and the components in those calculation servers, it notifies the depended-on calculation servers that the vectors on which the current calculation server depends need to be sent to the current server. That is, each calculation server participating in the calculation sends the component identifiers of the block matrix data in the depended-on calculation servers to the calculation servers on which the current calculation server depends.

Here, each calculation server participating in the calculation also sends and receives the data by calling MPI when sending and receiving the component identifiers.

Further, the step in which each calculation server participating in the calculation sends the components that are depended on to the calculation servers that depend on those components and receives the components sent by the other calculation servers includes:

Step S132: each calculation server participating in the calculation sends, according to the identifiers of the depended-on components, the corresponding components to the calculation servers that depend on those components, and receives the components sent by the other calculation servers.

Each calculation server participating in the calculation calls MPI to send the depended-on components to the calculation servers that depend on them, and receives the components sent by the other servers.
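A hedged sketch of this point-to-point exchange with mpi4py, continuing the illustrative setup above; the tag value and the dictionary layout are assumptions, not taken from the patent.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD

def exchange_components(needed_from, owed_to, local_block):
    """needed_from: {rank: row ids to fetch}; owed_to: {rank: row ids requested from us};
    local_block: {row id: row vector held locally}. Returns the received rows."""
    # Post non-blocking sends of the depended-on components, then receive what we depend on.
    requests = [comm.isend({r: local_block[r] for r in rows}, dest=peer, tag=1)
                for peer, rows in owed_to.items()]
    received = {}
    for peer in needed_from:
        received.update(comm.recv(source=peer, tag=1))
    for req in requests:
        req.wait()
    return received
```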
Step 160: each calculation server participating in the calculation, for each subject in the block matrix data of its local subject-item weight matrix, recommends at least one item to the subject by using the correspondence among the block matrix data of the local subject-item weight matrix, the block matrix data of the local collaborative filtering matrix, and the received components.

After each calculation server participating in the calculation has received the components it needs, it performs transposition, arrangement, summation and other processing to obtain the final recommendation data.

Referring to Fig. 2, it shows a schematic flowchart of Embodiment 2 of a collaborative filtering processing method of the present invention, which may specifically include:

Step 200: receiving a request to recommend at least one item for a subject, and starting, according to the request, at least two calculation servers to perform a recommendation calculation process.

In the embodiment of the present invention, the subject may be, for example, a user identifier in a network. For the various items that a user identifier has used or not used in the network, the system or a user may then request that one or several items be recommended for each user identifier, for example recommending related products to a user based on the products purchased on the network.

The recommendation calculation process performed by the at least two calculation servers then includes:

Step 210: for the at least two servers, each calculation server participating in the calculation obtains block matrix data; the block matrix data includes block data of the user identifier-item weight matrix (the subject-item weight matrix whose subjects are user identifiers) and of the user identifier-weight mean matrix, and block matrix data of the user identifier similarity matrix.

This embodiment of the present invention addresses the specific parallel process for the user-based case. To facilitate the description of that parallel process, the user-based collaborative filtering recommendation calculation is first introduced.
Referring to Fig. 3, it is a comparison diagram of the user-based calculation matrices, in which 201 is the user identifier-item weight matrix R in which the users score the items, 202 is the transposed matrix R' of R, 203 is the vector of each user's mean score over the items it has scored, i.e. the user identifier-weight mean matrix A, whose transposed matrix is A', and 204 is the user identifier similarity matrix S, i.e. the similarity between every two users. Then:
The formula for predicting the score of user u on item i from the similarity matrix is as follows:

pred(u, i) = avg(r_u) + ( Σ_{u'∈U} sim(u, u') · (r_{u'i} − avg(r_{u'})) ) / ( Σ_{u'∈U} |sim(u, u')| )    (1)

where sim(u, u') is the similarity between users u and u', which can be calculated with the cosine, the Pearson coefficient or other algorithms.
The calculation process is roughly as follows:

1. Obtain the row corresponding to u from R and find the items that u has not scored, i.e. the full candidate set I of recommendation candidates;

2. Obtain the column corresponding to u' from R' and the mean value corresponding to u' from A'. For each item i in I, calculate sim(u, u') · (r_{u'i} − avg(r_{u'}));

3. According to 2, calculate sim(u, u') · (r_{u'i} − avg(r_{u'})) and sim(u, u') for all u' ∈ U, and sum them;

4. Take the mean value corresponding to u from A and substitute the results of 3 to obtain the predicted values for all items in I;

5. Select, as required, the several items with the highest predicted scores of u over I to obtain the final recommended items. A sketch of this prediction procedure is given after this list.
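The following Python sketch is only an illustration of formula (1) and the five steps above, under assumptions not stated in the patent: dense numpy arrays, a precomputed similarity matrix S, and the value 0 meaning "not scored"; the function name and parameters are illustrative.

```python
import numpy as np

def user_based_recommend(R, S, u, top_k=2):
    """User-based prediction in the spirit of formula (1) for every item user u has not scored."""
    means = np.array([row[row > 0].mean() if (row > 0).any() else 0.0 for row in R])
    candidates = np.where(R[u] == 0)[0]                                # step 1: candidate set I
    preds = {}
    for i in candidates:
        raters = [v for v in range(R.shape[0]) if v != u and R[v, i] > 0]
        num = sum(S[u, v] * (R[v, i] - means[v]) for v in raters)      # steps 2 and 3
        den = sum(abs(S[u, v]) for v in raters)
        preds[i] = means[u] + (num / den if den else 0.0)              # step 4
    return sorted(preds, key=preds.get, reverse=True)[:top_k]          # step 5
```

As a design note, the sums here run only over the users u' who have actually scored item i, which is a common refinement; formula (1) as written sums over all u' ∈ U.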
With respect to the above description, in step 210 each calculation server participating in the calculation obtains the block data, assigned to it for processing, of the user identifier-item weight matrix and of the user identifier-weight mean matrix, as well as the block matrix data of the user identifier similarity matrix.

The block data of the user identifier-weight mean matrix can be calculated from the block data of the user identifier-item weight matrix, or can be calculated in advance. The user identifier similarity matrix can likewise be calculated from the user identifier-item weight matrix.

For convenience of the following description, the present invention preferably obtains the row-partitioned block matrices shown in Fig. 3.

Step 220: each calculation server participating in the calculation obtains the component identifiers of the block matrix data processed by all of the other calculation servers.

In the embodiment of the present invention, each calculation server participating in the calculation obtains the row identifiers of the row components of the user identifier-item block matrices in the other calculation servers, as well as the row identifiers of the user identifier-weight block matrices in the other calculation servers.

Step 230: each calculation server participating in the calculation performs pruning on the block matrix data of the user identifier similarity matrix.

Since there may be a massive number of users, in order to reduce the impact on calculation efficiency of users whose correlation with the users of the current rows is low, this step prunes away the users with low correlation.

Optionally, this step includes:

Step S11: for each dimension of each row or column of matrix data in the block matrix data of the user identifier similarity matrix, sort the values of the dimensions and retain at least one of the highest-ranked dimensions in each row or column.

For example, in the aforementioned matrix S, the first row is the similarity component corresponding to user u0, i.e. Sim(u0, u0), Sim(u0, u1), ..., Sim(u0, uM); the similarity values of the top n dimensions of this row component can then be retained, and the other values are set to empty. A sketch of such top-n pruning is given below.
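A minimal sketch of this per-row top-n pruning, assuming a dense numpy similarity block and treating "empty" as 0 (both are assumptions, not specified by the patent):

```python
import numpy as np

def prune_top_n(S_block, n):
    """Keep only the n largest similarity values in each row; set the rest to 0 (empty)."""
    pruned = np.zeros_like(S_block)
    for r, row in enumerate(S_block):
        keep = np.argsort(row)[-n:]        # column indices of the n largest values in this row
        pruned[r, keep] = row[keep]
    return pruned
```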
Step 240: each calculation server participating in the calculation determines, according to the pruned block matrix data of the user identifier similarity matrix and the component identifiers of the block matrix data processed by all of the other calculation servers, the calculation servers on which the current calculation server depends, as well as the component identifiers of the block matrix data in each of the depended-on calculation servers.

For the current calculation server, its user similarity block matrix S contains all of the column identifiers, but the components of each row contain values for only some of the columns. According to the calculation principle described above, the user identifier-item weight matrix and the user identifier-weight mean matrix therefore need to be transposed, that is, the row identifiers of the user identifier-item weight matrix and of the user identifier-weight mean matrix are transposed so that the resulting row vector is aligned with the row vectors of S; the row identifiers that need to be calculated are then determined, and from the correspondence between those row identifiers and the calculation servers, the calculation servers on which the current calculation server depends and the component identifiers of the block matrix data in those depended-on calculation servers can be determined.

Optionally, this step includes:

Step S21: each calculation server participating in the calculation transposes the row component identifiers or the column component identifiers of the component identifiers of the user identifier-item weight matrix and of the user identifier-weight mean matrix.
Referring to Fig. 4, it is an example, according to an embodiment of the present invention, of determining the depended-on calculation servers and the depended-on vectors. Fig. 4 is a schematic diagram for two calculation servers, with the data distributed by rows as illustrated above: the odd-numbered rows are assigned to calculation server N0 and the even-numbered rows to calculation server N1. Because of the pruning during the calculation, some data in the relationship matrix S may have been discarded, making the matrix a sparse matrix. On the left side of the figure are the matrix R and the vector A. The transposes R' and A' are used in the calculation; since the data itself does not change but only its position, the data does not need to be stored additionally, and the row-column conversion is only performed when the data is fetched. For convenience of description, R and A are placed at the top of the figure in transposed form.

The embodiment of the present invention is described with the data assigned to the calculation servers by rows; the calculation process when the data is assigned to the calculation servers by columns is similar to the row-wise case and only requires transposition, so it is not described in detail here.

As shown in Fig. 4, the original user identifier-item weight matrix R includes 5 rows, each row having a row identifier (row number), namely u0, u1, u2, u3, u4; the user identifier-weight mean matrix A correspondingly also includes 5 rows, each row having a row identifier, namely u0, u1, u2, u3, u4; and the user similarity matrix S includes 5 columns, each column having a column identifier (column number), namely u0, u1, u2, u3, u4.

Then, after N0 obtains the row component identifiers u1 and u3 of R calculated by N1, it has all of the row component identifiers of R; and after N0 obtains the row component identifiers u1 and u3 of A calculated by N1, it has all of the row component identifiers of A. N0 then transposes the row component identifiers (u0, u1, u2, u3, u4)' of R to obtain (u0, u1, u2, u3, u4), and transposes the row component identifiers (u0, u1, u2, u3, u4)' of A to obtain (u0, u1, u2, u3, u4).
Step S22: align the result obtained by transposing the row component identifiers with the column component identifiers of the user identifier similarity matrix, or align the result obtained by transposing the column component identifiers with the row component identifiers of the user identifier similarity matrix.

The results obtained by transposing the row component identifiers of R and A are aligned with the column component identifiers (u0, u1, u2, u3, u4) of S.

Step S23: for the dimensions currently retained in each row or column, mark the row component identifiers or column component identifiers corresponding to the retained dimensions.

Step S24: compare the marked row component identifiers or column component identifiers with the row component identifiers or column component identifiers in the locally obtained block matrix data of the user identifier-item weight matrix and of the user identifier-weight mean matrix, and determine the row component identifiers or column component identifiers that do not exist locally.

Step S25: according to the calculation servers to which the locally non-existent row component identifiers or column component identifiers belong, determine the calculation servers on which the current server depends, as well as the component identifiers of the user identifier-item weight matrix and of the user identifier-weight mean matrix in those depended-on calculation servers.

For calculation server N0, the rows of the S matrix it processes are the first, third and fifth rows, and each of these rows retains (u0, u1, u2, -, u4). The row vectors that need to be calculated in the R matrix and the A matrix are therefore rows u0, u1, u2 and u4, and since, as described above, u1 is located in N1, N0 depends on the row component u1 in N1.
For calculation server N1, the rows of the S matrix it processes are the second and fourth rows, and each of these rows retains (-, u1, u2, u3, u4). The row vectors that need to be calculated in the R matrix and the A matrix are therefore rows u1, u2, u3 and u4, and since, as described above, u2 and u4 are located in N0, N1 depends on the row components u2 and u4 in N0.
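An illustrative sketch of this dependency determination (steps S21 to S25), assuming the pruned similarity block is a dense numpy array in which 0 marks a pruned entry, local_rows lists the row identifiers held locally, and rows_by_rank is the result of the identifier exchange sketched earlier; all names are illustrative.

```python
import numpy as np

def find_dependencies(pruned_S_block, local_rows, rows_by_rank, my_rank):
    """Return {rank: row identifiers of R/A that must be fetched from that calculation server}."""
    needed = set(np.where(pruned_S_block.any(axis=0))[0])   # columns retained in any local S row
    missing = needed - set(local_rows)                      # row components not held locally
    deps = {}
    for rank, rows in enumerate(rows_by_rank):
        if rank == my_rank:
            continue
        wanted = missing & set(rows)
        if wanted:
            deps[rank] = wanted
    return deps
```

With the Fig. 4 data, server N0 (holding rows u0, u2, u4) would obtain the dependency {N1: {u1}} and server N1 would obtain {N0: {u2, u4}}, matching the dependencies described above.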
Step 250: each calculation server participating in the calculation sends the component identifiers of the block matrix data in the calculation servers on which it depends to those depended-on calculation servers.

N0 then notifies N1 that the u1 row component needs to be sent to N0, and N1 notifies N0 that the u2 and u4 row components need to be sent to N1.

Step 260: each calculation server participating in the calculation sends, according to the identifiers of the depended-on components, the corresponding components to the calculation servers that depend on those components, and receives the components sent by the other calculation servers.

As described above, N0 sends (u2, u4) to N1 and N1 receives (u2, u4); N1 sends (u1) to N0 and N0 receives (u1).

Step 270: each calculation server participating in the calculation, for each subject in the block matrix data of its local subject-item weight matrix, recommends at least one item to the subject by using the correspondence among the block matrix data of the local subject-item weight matrix, the block matrix data of the local collaborative filtering matrix, and the received components.
Each computing node can then calculate, according to formula (1), the recommended items for each user ui held by the current computing node; that is, N0 calculates the recommended items for u0, u2 and u4, and N1 calculates the recommended items for u1 and u3.
Here, the calculation servers send and receive data between one another by calling the message passing interface.

This embodiment is a preferred embodiment for the user-based case; the order of certain steps may be changed according to the specific situation, and is not limited here.

Referring to Fig. 5, it shows a schematic flowchart of Embodiment 3 of a collaborative filtering processing method of the present invention, which may specifically include:

Step 300: receiving a request to recommend at least one item for a subject, and starting, according to the request, at least two calculation servers to perform a recommendation calculation process.

In the embodiment of the present invention, the subject may be, for example, a user identifier in a network. For the various items that a user identifier has used or not used in the network, the system or a user may then request that one or several items be recommended for each user identifier, for example recommending related products to a user based on the products purchased on the network.

The recommendation calculation process performed by the at least two calculation servers then includes:

Step 310: for the N calculation servers, each calculation server participating in the calculation obtains block matrix data; the block matrix data includes block matrix data of the user identifier-item weight matrix and block matrix data of the item-item similarity matrix.

Referring to Fig. 6, it shows a schematic diagram of the item-based calculation principle of an embodiment of the present invention, in which 301 and 302 are the user identifier-item weight matrix R and 303 is the item-item similarity matrix S.
The calculation formula is as follows:

pred(u, i) = ( Σ_j sim(i, j) · r_{uj} ) / ( Σ_j |sim(i, j)| )    (2)

where the sum runs over the items j similar to i that user u has scored.
The calculation process is as follows:

1. Obtain the row corresponding to u from R and find the items that u has not scored, i.e. the full candidate set I of recommendation candidates;

2. For each i in I, obtain the corresponding column of the sim matrix to obtain the set of items similar to i, and sum according to the formula to obtain the predicted score;

3. Select, as required, the several items with the highest predicted scores of u over I to obtain the final recommended items. A sketch of this item-based prediction is given after this list.
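The following Python sketch illustrates formula (2) and the three steps above under the same illustrative assumptions as before (dense numpy arrays, 0 meaning "not scored"); the function name and parameters are assumptions, not part of the patent.

```python
import numpy as np

def item_based_recommend(R, S_items, u, top_k=2):
    """Item-based prediction in the spirit of formula (2) for every item user u has not scored."""
    scored = np.where(R[u] > 0)[0]             # items u has already scored
    candidates = np.where(R[u] == 0)[0]        # step 1: candidate set I
    preds = {}
    for i in candidates:
        sims = S_items[i, scored]              # step 2: similarities of i to the items u scored
        den = np.abs(sims).sum()
        preds[i] = float(sims @ R[u, scored]) / den if den else 0.0
    return sorted(preds, key=preds.get, reverse=True)[:top_k]   # step 3
```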
In the embodiment of the present invention, the item-item similarity matrix S can be calculated from the user identifier-item weight matrix R.
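For illustration only, one way to obtain such an item-item similarity matrix from R is a column-wise cosine computation, as in the following numpy sketch (the patent does not prescribe a particular similarity function):

```python
import numpy as np

def item_similarity(R):
    """Item-item cosine similarity matrix computed from the columns of R."""
    R = np.asarray(R, dtype=float)
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0            # avoid division by zero for items nobody has scored
    normalized = R / norms             # scale each column (item) to unit length
    return normalized.T @ normalized   # S[i, j] is the cosine of columns i and j
```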
For convenience of description, this embodiment of the present invention also partitions the matrices into blocks by rows and provides them to the calculation servers for calculation; the calculation principle and process when the matrices are partitioned by columns and provided to the calculation servers are similar to the row-wise case and only require the corresponding transposition, so they are not described in detail here or in the subsequent steps.

Step 320: each calculation server participating in the calculation obtains the component identifiers of the block matrix data processed by all of the other calculation servers.

That is, each calculation server sends the row identifiers of the S block matrix it has obtained to the other N-1 servers.

Step 330: each calculation server participating in the calculation calculates the candidate recommendation set according to the user identifier-item weight matrix.

Each calculation server participating in the calculation obtains the row corresponding to u from its current R block matrix and finds the items that u has not scored, i.e. the candidate recommendation set I; a sketch is given below.
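A one-function sketch of this candidate-set computation, again assuming a dense numpy row in which 0 means "not scored" (an assumption, not specified by the patent):

```python
import numpy as np

def candidate_set(R_row):
    """Items the user of this row has not scored, i.e. the candidate recommendation set I."""
    return np.where(np.asarray(R_row) == 0)[0]

# Example with the File1 row of Table 1: candidate_set([70, 60, 80, 90, 0]) -> array([4])
```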
Step 340: each calculation server participating in the calculation determines, according to the candidate recommendation set, the block matrix data of the item-item similarity matrix and the user identifier-item weight matrix, the calculation servers on which the current calculation server depends, as well as the component identifiers of the block matrix data in each of the depended-on calculation servers.

The vectors corresponding to the row identifiers of the whole S matrix can then be transposed and aligned with the column identifiers of the R matrix; according to the dimensions of the R matrix corresponding to the candidate recommendation set I, it is determined which row components of the S matrix are needed and in which calculation server each of those row components is located.

Optionally, this step includes:

Step S31: each calculation server participating in the calculation transposes the row component identifiers or the column component identifiers of the component identifiers of the item-item similarity matrix.

For example, calculation server N0 obtains rows u0, u2 and u4 of R and rows i0, i2 and i4 of S, and calculation server N1 obtains rows u1 and u3 of R and rows i1 and i3 of S. After N0 obtains the row component identifiers of N1, it transposes the vector (i0, i1, i2, i3, i4)' corresponding to the row component identifiers.
Step S32: align the result obtained by transposing the row component identifiers with the column component identifiers of the user identifier-item weight matrix, or align the result obtained by transposing the column component identifiers with the row component identifiers of the user identifier-item weight matrix.
Calculation server N0 aligns the transposed result of the row component identifiers of S with a row of the R matrix, i.e. with the column components (i0, i1, i2, i3, i4) of R.
Step S33: compare the marked row component identifiers or column component identifiers with the row component identifiers or column component identifiers in the locally obtained block matrix data of the item-item similarity matrix, and determine the row component identifiers or column component identifiers that do not exist locally.
For example, for row u0 in N0, the corresponding recommendation candidate set is (i2, i3, i4), and i3 is in N1; for row u2, the corresponding recommendation candidate set is (i2, i3, i4), and i3 is in N1; for row u4, the corresponding recommendation candidate set is (i0, i3, i4), and i3 is in N1.

For row u1 in N1, the corresponding recommendation candidate set is (i1, i3, i4), and i4 is in N0; for row u3, the corresponding recommendation candidate set is (i0, i3, i4), and i0 and i4 are in N0.
Step S34: according to the calculation servers to which the locally non-existent row component identifiers or column component identifiers belong, determine the calculation servers on which the current server depends, as well as the component identifiers of the item-item similarity matrix in those depended-on calculation servers.
Then N0 depends on i3 of N1, and N1 depends on i0 and i4 of N0.

Step 350: each calculation server participating in the calculation sends the component identifiers of the block matrix data in the calculation servers on which it depends to those depended-on calculation servers.

N0 then notifies N1 to send the i3 row vector to N0, and N1 notifies N0 to send the i0 and i4 row vectors to N1.

Step 360: each calculation server participating in the calculation sends, according to the identifiers of the depended-on components, the corresponding components to the calculation servers that depend on those components, and receives the components sent by the other calculation servers.

N1 then sends the i3 row vector to N0, and N0 sends the i0 and i4 row vectors to N1.

Step 370: each calculation server participating in the calculation, for each subject in the block matrix data of its local subject-item weight matrix, recommends at least one item to the subject by using the correspondence among the block matrix data of the local subject-item weight matrix, the block matrix data of the local collaborative filtering matrix, and the received components.

The recommended items for each user ui are then calculated according to formula (2) from the local weight block matrix data, the local collaborative filtering block matrix data and the received components.

This embodiment is a preferred embodiment for the item-based case; the order of certain steps may be changed according to the specific situation, and is not limited here.
Referring to FIG. 7, which shows a schematic structural diagram of Embodiment 1 of a collaborative filtering processing system according to the present invention, the system may specifically include:
a request receiving module 700 and at least two computing servers;
the request receiving module 700 is adapted to receive a request to recommend at least one item for a subject and to start at least two computing servers according to the request;
each of the at least two computing servers includes:
a block matrix data acquisition module 710, adapted so that, for the at least two computing servers, each participating computing server obtains the various block matrix data, which include block matrix data of the subject-item weight matrix and block matrix data of the collaborative filtering matrix;
a dependent computing server confirmation module 720, adapted so that each participating computing server determines, from the subject-item weight matrix and the collaborative filtering matrix, the computing servers the current server depends on and the components of the block matrix data on each of those servers;
a component sending and receiving module 730, adapted so that each participating computing server sends the components it is depended on for to the servers that depend on them, and receives the components sent by the other servers;
a recommended item calculation module 740, adapted so that, for each subject in its local block of the subject-item weight matrix, each participating computing server recommends at least one item to that subject using the correspondence between the local block of the subject-item weight matrix, the local block of the collaborative filtering matrix, and the received components.
Optionally, the dependent computing server confirmation module includes:
a component identifier acquisition module, adapted so that each participating computing server obtains the component identifiers of the block matrix data processed by all other computing servers;
a first dependent server confirmation module, adapted so that each participating computing server determines, from the block matrix data of the collaborative filtering distance matrix and/or the block matrix data of the subject-item weight matrix, together with the component identifiers of the block matrix data processed by all other computing servers, the computing servers the current server depends on and the component identifiers of the block matrix data on each of those servers.
Optionally, the system further includes:
an identifier sending module, adapted so that each participating computing server sends the component identifiers of the block matrix data it needs from the servers it depends on to those servers;
further, the component sending and receiving module includes:
a first component sending and receiving module, adapted so that, according to the component identifiers it has been asked for, each participating computing server sends the corresponding components to the servers that depend on them and receives the components sent by the other servers.
Optionally, the subject-item weight matrix includes a user ID-item subject-item weight matrix and a user ID-weight mean matrix;
the collaborative filtering matrix is the user ID similarity matrix corresponding to the user ID-item subject-item weight matrix;
further, the components include components of the user ID-item subject-item weight matrix and components of the user ID-weight mean matrix.
Optionally, the first dependent server confirmation module includes:
a pruning module, adapted so that each participating computing server performs a pruning calculation on the block matrix data of the user ID similarity matrix;
a second dependent server confirmation module, adapted so that each participating computing server determines, from the pruned block matrix data of the user ID similarity matrix and the component identifiers of the block matrix data processed by all other computing servers, the computing servers the current server depends on and the component identifiers of the block matrix data on each of those servers.
Optionally, the pruning module includes:
a first pruning module, adapted to sort, for each dimension of each row or column of matrix data in the block matrix data of the user ID similarity matrix, the values of the dimensions, and to retain at least one top-ranked dimension in each row or column.
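A minimal sketch of such top-k pruning (not from the patent; the cut-off k and the dictionary representation are assumptions): each row of a similarity block keeps only its k largest entries.

```python
# Minimal sketch (not from the patent): keep only the top-k entries of each
# row of a similarity block, as the first pruning module does.
import heapq

def prune_row(row, k=2):
    """row: {column_id: similarity value}; keep the k largest entries."""
    return dict(heapq.nlargest(k, row.items(), key=lambda kv: kv[1]))

def prune_block(block, k=2):
    return {row_id: prune_row(row, k) for row_id, row in block.items()}

block = {"u0": {"u1": 0.9, "u2": 0.1, "u3": 0.6},
         "u1": {"u0": 0.9, "u2": 0.4, "u3": 0.2}}
print(prune_block(block))  # each row keeps its two most similar users
```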
Optionally, the second dependent server confirmation module includes:
a first row/column component identifier transposition module, adapted so that each participating computing server performs a row-component-identifier or column-component-identifier transposition on the component identifiers of the user ID-item subject-item weight matrix and the user ID-weight mean matrix;
a first row/column component identifier alignment module, adapted to align the result of the row component identifier transposition with the column component identifiers of the user ID similarity matrix, or the result of the column component identifier transposition with the row component identifiers of the user ID similarity matrix;
a first retention module, adapted to mark, for the dimensions currently retained in each row or column, the row or column component identifier corresponding to each retained dimension;
a first judgement module, adapted to compare the marked row or column component identifiers with the row or column component identifiers in the locally obtained block matrix data of the user ID-item subject-item weight matrix and the user ID-weight mean matrix, and to determine which row or column component identifiers do not exist locally;
a third dependent server confirmation module, adapted to determine, from the computing servers that hold the row or column component identifiers missing locally, the computing servers the current server depends on, together with the component identifiers of the user ID-item subject-item weight matrix and the user ID-weight mean matrix on each of those servers.
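A minimal sketch of the bookkeeping these modules perform (not from the patent; the identifier-to-server mapping and data layout are assumptions): after pruning, gather the identifiers referenced by the retained dimensions, diff them against the identifiers held locally, and group the missing ones by the server that stores them.

```python
# Minimal sketch (not from the patent): from a pruned similarity block,
# collect the referenced identifiers, subtract the local ones, and group
# what is missing by owning server.

def missing_identifiers(pruned_sim_block, local_ids, id_owner):
    """pruned_sim_block: {row_id: {col_id: similarity}} after pruning
       local_ids:        identifiers of the locally stored weight/mean blocks
       id_owner:         {identifier: server that stores it}
       returns           {server: set of identifiers to request}"""
    referenced = {col for row in pruned_sim_block.values() for col in row}
    missing = referenced - set(local_ids)
    by_server = {}
    for ident in missing:
        by_server.setdefault(id_owner[ident], set()).add(ident)
    return by_server

pruned = {"u0": {"u1": 0.9, "u3": 0.6}}
print(missing_identifiers(pruned, {"u0", "u1"}, {"u3": "N1"}))  # {'N1': {'u3'}}
```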
Optionally, the subject-item weight matrix includes a user ID-item subject-item weight matrix;
the collaborative filtering matrix is the item-item similarity matrix corresponding to the user ID-item subject-item weight matrix;
further, the components include components of the item-item similarity matrix.
Optionally, the dependent server confirmation module includes:
a first candidate recommendation set calculation module, adapted so that each participating computing server computes a candidate recommendation set from the user ID-item subject-item weight matrix;
a fourth dependent server confirmation module, adapted so that each participating computing server determines, from the candidate recommendation set, the block matrix data of the item-item similarity matrix, and the user ID-item subject-item weight matrix, the computing servers the current server depends on and the component identifiers of the block matrix data on each of those servers.
Optionally, the fourth dependent server confirmation module includes:
a second row/column component identifier transposition module, adapted so that each participating computing server performs a row-component-identifier or column-component-identifier transposition on the component identifiers of the item-item similarity matrix;
a second row/column component identifier alignment module, adapted to align the result of the row component identifier transposition with the column component identifiers of the user ID-item subject-item weight matrix, or the result of the column component identifier transposition with the row component identifiers of the user ID similarity matrix;
a second judgement module, adapted to compare the marked row or column component identifiers with the row or column component identifiers in the locally obtained block matrix data of the user ID-item subject-item weight matrix and the user ID-weight mean matrix, and to determine which row or column component identifiers do not exist locally;
a fifth dependent server confirmation module, adapted to determine, from the computing servers that hold the row or column component identifiers missing locally, the computing servers the current server depends on, together with the component identifiers of the user ID-item subject-item weight matrix and the user ID-weight mean matrix on each of those servers.
Optionally, the computing servers send and receive data between one another by calling a message passing interface.
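A minimal sketch of such an exchange (not from the patent), assuming the mpi4py binding of the message passing interface; `needed_from` and `owed_to` are hypothetical names for the dependency maps produced by the confirmation modules above.

```python
# Minimal sketch (not from the patent): every rank sends the components it
# owns to the ranks that depend on them and receives the ones it depends on.
from mpi4py import MPI

def exchange(needed_from, owed_to, local_rows):
    """needed_from: {rank: set of ids this rank must receive from that rank}
       owed_to:     {rank: set of ids this rank must send to that rank}
       local_rows:  {id: row vector stored on this rank}"""
    comm = MPI.COMM_WORLD
    # Post the non-blocking sends first, then collect the matching receives.
    sends = [comm.isend({i: local_rows[i] for i in ids}, dest=rank, tag=0)
             for rank, ids in owed_to.items()]
    received = {}
    for rank in needed_from:
        received.update(comm.recv(source=rank, tag=0))
    for s in sends:
        s.wait()
    return received
```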
Referring to FIG. 8, which shows a schematic structural diagram of Embodiment 2 of a collaborative filtering processing system according to an embodiment of the present invention, the system may specifically include:
a request receiving module 800 and at least two computing servers;
the request receiving module 800 is adapted to receive a request to recommend at least one item for a subject and to start at least two computing servers according to the request;
each of the at least two computing servers includes:
a first block matrix data acquisition module 810, adapted so that each participating computing server obtains the various block matrix data, which include block data of the user ID-item subject-item weight matrix and the user ID-weight mean matrix, and block matrix data of the user ID similarity matrix;
a first component identifier acquisition module 820, adapted so that each participating computing server obtains the component identifiers of the block matrix data processed by all other computing servers;
a pruning module 830, adapted so that each participating computing server performs a pruning calculation on the block matrix data of the user ID similarity matrix;
a second dependent server confirmation module 840, adapted so that each participating computing server determines, from the pruned block matrix data of the user ID similarity matrix and the component identifiers of the block matrix data processed by all other computing servers, the computing servers the current server depends on and the component identifiers of the block matrix data on each of those servers;
a first identifier sending module 850, adapted so that each participating computing server sends the component identifiers of the block matrix data it needs from the servers it depends on to those servers;
a first component sending and receiving module 860, adapted so that, according to the component identifiers it has been asked for, each participating computing server sends the corresponding components to the servers that depend on them and receives the components sent by the other servers;
a first recommended item calculation module 870, adapted so that, for each subject in its local block of the subject-item weight matrix, each participating computing server recommends at least one item to that subject using the correspondence between the local block of the subject-item weight matrix, the local block of the collaborative filtering matrix, and the received components.
Referring to FIG. 9, which shows a schematic structural diagram of Embodiment 3 of a collaborative filtering processing system according to an embodiment of the present invention, the system may specifically include:
a request receiving module 900 and at least two computing servers;
the request receiving module 900 is adapted to receive a request to recommend at least one item for a subject and to start at least two computing servers according to the request;
each of the at least two computing servers includes:
a second block matrix data acquisition module 910, adapted so that each participating computing server obtains the various block matrix data, which include block data of the user ID-item subject-item weight matrix and the user ID-weight mean matrix, and block matrix data of the user ID similarity matrix;
a second component identifier acquisition module 920, adapted so that each participating computing server obtains the component identifiers of the block matrix data processed by all other computing servers;
a first candidate recommendation set calculation module 930, adapted so that each participating computing server computes a candidate recommendation set from the user ID-item subject-item weight matrix (a sketch of one way to build such a candidate set follows this embodiment);
a fourth dependent server confirmation module 940, adapted so that each participating computing server determines, from the candidate recommendation set, the block matrix data of the item-item similarity matrix, and the user ID-item subject-item weight matrix, the computing servers the current server depends on and the component identifiers of the block matrix data on each of those servers;
a second identifier sending module 950, adapted so that each participating computing server sends the component identifiers of the block matrix data it needs from the servers it depends on to those servers;
a second component sending and receiving module 960, adapted so that, according to the component identifiers it has been asked for, each participating computing server sends the corresponding components to the servers that depend on them and receives the components sent by the other servers;
a second recommended item calculation module 970, adapted so that, for each subject in its local block of the subject-item weight matrix, each participating computing server recommends at least one item to that subject using the correspondence between the local block of the subject-item weight matrix, the local block of the collaborative filtering matrix, and the received components.
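As referenced in the description of module 930 above, the following is a minimal sketch (not from the patent) of one way to build a candidate recommendation set, under the assumption — consistent with the item-based example earlier — that a user's candidates are the items the user has not yet weighted in the local block.

```python
# Minimal sketch (not from the patent): for each local user row of the
# user ID-item weight block, treat the items the user has not yet weighted
# as that user's recommendation candidates.

def candidate_sets(weight_block, all_items):
    """weight_block: {user_id: {item_id: weight}} (the local block)
       all_items:    iterable of every item identifier"""
    items = set(all_items)
    return {user: items - set(rated) for user, rated in weight_block.items()}

block = {"u0": {"i0": 5.0, "i1": 3.0}, "u2": {"i0": 4.0, "i1": 1.0}}
print(candidate_sets(block, ["i0", "i1", "i2", "i3", "i4"]))
# each user's candidates are {'i2', 'i3', 'i4'}
```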
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein, and the structure required to construct such systems is apparent from the above description. Moreover, the present invention is not directed to any particular programming language; it should be understood that the content of the invention described herein may be implemented in a variety of programming languages, and the description of a specific language above is given to disclose the best mode of the invention.
The specification provided here sets out numerous specific details. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be appreciated that, in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in order to streamline the disclosure and aid the understanding of one or more of the inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules of the devices in an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment may be combined into one module, unit, or component, and may furthermore be divided into a plurality of sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
Furthermore, those skilled in the art will understand that, although some embodiments described herein include certain features included in other embodiments but not in others, combinations of features from different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the collaborative filtering processing device according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the methods described herein. Such a program implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals; such signals may be downloaded from an Internet site, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.
This document discloses A1, a collaborative filtering processing method, comprising: receiving a request to recommend at least one item for a subject, and starting at least two computing servers according to the request to perform a recommendation calculation process, the process comprising: for the at least two computing servers, each participating computing server obtaining various block matrix data, the block matrix data comprising block matrix data of a subject-item weight matrix and block matrix data of a collaborative filtering matrix; each participating computing server determining, from the subject-item weight matrix and the collaborative filtering matrix, the computing servers the current server depends on and the components of the block matrix data on each of those servers; each participating computing server sending the components it is depended on for to the servers that depend on them, and receiving the components sent by the other servers; and, for each subject in its local block of the subject-item weight matrix, each participating computing server recommending at least one item to that subject using the correspondence between the local block of the subject-item weight matrix, the local block of the collaborative filtering matrix, and the received components.
A2. The method of A1, wherein each participating computing server determining, from the subject-item weight matrix and the collaborative filtering matrix, the computing servers the current server depends on and the components of the block matrix data on each of those servers comprises: each participating computing server obtaining the component identifiers of the block matrix data processed by all other computing servers; and each participating computing server determining, from the local block matrix data of the collaborative filtering distance matrix, the block matrix data of the subject-item weight matrix, and the component identifiers of the block matrix data processed by all other computing servers, the computing servers the current server depends on and the component identifiers of the block matrix data on each of those servers.
A3. The method of A2, further comprising: each participating computing server sending the component identifiers of the block matrix data it needs from the servers it depends on to those servers; further, each participating computing server sending the components it is depended on for to the servers that depend on them and receiving the components sent by the other servers comprises: each computing server, according to the component identifiers it has been asked for, sending the corresponding components to the servers that depend on them and receiving the components sent by the other servers.
A4. The method of A3, wherein the subject-item weight matrix comprises a user ID-item subject-item weight matrix and a user ID-weight mean matrix; the collaborative filtering matrix is the user ID similarity matrix corresponding to the user ID-item subject-item weight matrix; and, further, the components comprise components of the user ID-item subject-item weight matrix and components of the user ID-weight mean matrix.
A5. The method of A4, wherein the determining of A2 comprises: each participating computing server performing a pruning calculation on the block matrix data of the user ID similarity matrix; and each participating computing server determining, from the pruned block matrix data of the user ID similarity matrix and the component identifiers of the block matrix data processed by all other computing servers, the computing servers the current server depends on and the component identifiers of the block matrix data on each of those servers.
A6. The method of A5, wherein the pruning calculation comprises: for each dimension of each row or column of matrix data in the block matrix data of the user ID similarity matrix, sorting the values of the dimensions and retaining at least one top-ranked dimension in each row or column.
A7. The method of A6, wherein the determining of A5 comprises: each participating computing server performing a row-component-identifier or column-component-identifier transposition on the component identifiers of the user ID-item subject-item weight matrix and the user ID-weight mean matrix; aligning the result of the row component identifier transposition with the column component identifiers of the user ID similarity matrix, or the result of the column component identifier transposition with the row component identifiers of the user ID similarity matrix; marking, for the dimensions currently retained in each row or column, the row or column component identifier corresponding to each retained dimension; comparing the marked row or column component identifiers with the row or column component identifiers in the locally obtained block matrix data of the user ID-item subject-item weight matrix and the user ID-weight mean matrix, and determining which row or column component identifiers do not exist locally; and determining, from the computing servers that hold the row or column component identifiers missing locally, the computing servers the current server depends on, together with the component identifiers of the user ID-item subject-item weight matrix and the user ID-weight mean matrix on each of those servers.
A8. The method of A3, wherein the subject-item weight matrix comprises a user ID-item subject-item weight matrix; the collaborative filtering matrix is the item-item similarity matrix corresponding to the user ID-item subject-item weight matrix; and, further, the components comprise components of the item-item similarity matrix.
A9. The method of A8, wherein the determining of A2 comprises: each participating computing server computing a candidate recommendation set from the user ID-item subject-item weight matrix; and each participating computing server determining, from the candidate recommendation set, the block matrix data of the item-item similarity matrix, and the user ID-item subject-item weight matrix, the computing servers the current server depends on and the component identifiers of the block matrix data on each of those servers.
A10. The method of A9, wherein the determining of A9 comprises: each participating computing server performing a row-component-identifier or column-component-identifier transposition on the component identifiers of the item-item similarity matrix; aligning the result of the row component identifier transposition with the column component identifiers of the user ID-item subject-item weight matrix, or the result of the column component identifier transposition with the row component identifiers of the user ID similarity matrix; comparing the marked row or column component identifiers with the row or column component identifiers in the locally obtained block matrix data of the user ID-item subject-item weight matrix and the user ID-weight mean matrix, and determining which row or column component identifiers do not exist locally; and determining, from the computing servers that hold the row or column component identifiers missing locally, the computing servers the current server depends on, together with the component identifiers of the user ID-item subject-item weight matrix and the user ID-weight mean matrix on each of those servers.
A11. The method of any one of A1-A10, wherein the computing servers send and receive data between one another by calling a message passing interface.
This document also discloses B12, a collaborative filtering processing system, comprising a request receiving module and at least two computing servers; the request receiving module is adapted to receive a request to recommend at least one item for a subject and to start at least two computing servers according to the request; each of the at least two computing servers comprises: a block matrix data acquisition module, adapted so that each participating computing server obtains the various block matrix data, which include block matrix data of the subject-item weight matrix and block matrix data of the collaborative filtering matrix; a dependent computing server confirmation module, adapted so that each participating computing server determines, from the subject-item weight matrix and the collaborative filtering matrix, the computing servers the current server depends on and the components of the block matrix data on each of those servers; a component sending and receiving module, adapted so that each participating computing server sends the components it is depended on for to the servers that depend on them and receives the components sent by the other servers; and a recommended item calculation module, adapted so that, for each subject in its local block of the subject-item weight matrix, each participating computing server recommends at least one item to that subject using the correspondence between the local block of the subject-item weight matrix, the local block of the collaborative filtering matrix, and the received components.
B13. The system of B12, wherein the dependent computing server confirmation module comprises: a component identifier acquisition module, adapted so that each participating computing server obtains the component identifiers of the block matrix data processed by all other computing servers; and a first dependent server confirmation module, adapted so that each participating computing server determines, from the block matrix data of the collaborative filtering distance matrix and/or the block matrix data of the subject-item weight matrix, together with the component identifiers of the block matrix data processed by all other computing servers, the computing servers the current server depends on and the component identifiers of the block matrix data on each of those servers.
B14. The system of B13, further comprising: an identifier sending module, adapted so that each participating computing server sends the component identifiers of the block matrix data it needs from the servers it depends on to those servers; further, the component sending and receiving module comprises: a first component sending and receiving module, adapted so that, according to the component identifiers it has been asked for, each participating computing server sends the corresponding components to the servers that depend on them and receives the components sent by the other servers.
B15. The system of B14, wherein the subject-item weight matrix comprises a user ID-item subject-item weight matrix and a user ID-weight mean matrix; the collaborative filtering matrix is the user ID similarity matrix corresponding to the user ID-item subject-item weight matrix; and, further, the components comprise components of the user ID-item subject-item weight matrix and components of the user ID-weight mean matrix.
B16. The system of B15, wherein the first dependent server confirmation module comprises: a pruning module, adapted so that each participating computing server performs a pruning calculation on the block matrix data of the user ID similarity matrix; and a second dependent server confirmation module, adapted so that each participating computing server determines, from the pruned block matrix data of the user ID similarity matrix and the component identifiers of the block matrix data processed by all other computing servers, the computing servers the current server depends on and the component identifiers of the block matrix data on each of those servers.
B17. The system of B16, wherein the pruning module comprises: a first pruning module, adapted to sort, for each dimension of each row or column of matrix data in the block matrix data of the user ID similarity matrix, the values of the dimensions, and to retain at least one top-ranked dimension in each row or column.
B18. The system of B17, wherein the second dependent server confirmation module comprises: a first row/column component identifier transposition module, adapted so that each participating computing server performs a row-component-identifier or column-component-identifier transposition on the component identifiers of the user ID-item subject-item weight matrix and the user ID-weight mean matrix; a first row/column component identifier alignment module, adapted to align the result of the row component identifier transposition with the column component identifiers of the user ID similarity matrix, or the result of the column component identifier transposition with the row component identifiers of the user ID similarity matrix; a first retention module, adapted to mark, for the dimensions currently retained in each row or column, the row or column component identifier corresponding to each retained dimension; a first judgement module, adapted to compare the marked row or column component identifiers with the row or column component identifiers in the locally obtained block matrix data of the user ID-item subject-item weight matrix and the user ID-weight mean matrix, and to determine which row or column component identifiers do not exist locally; and a third dependent server confirmation module, adapted to determine, from the computing servers that hold the row or column component identifiers missing locally, the computing servers the current server depends on, together with the component identifiers of the user ID-item subject-item weight matrix and the user ID-weight mean matrix on each of those servers.
B19. The system of B14, wherein the subject-item weight matrix comprises a user ID-item subject-item weight matrix; the collaborative filtering matrix is the item-item similarity matrix corresponding to the user ID-item subject-item weight matrix; and, further, the components comprise components of the item-item similarity matrix.
B20. The system of B19, wherein the dependent server confirmation module comprises: a first candidate recommendation set calculation module, adapted so that each participating computing server computes a candidate recommendation set from the user ID-item subject-item weight matrix; and a fourth dependent server confirmation module, adapted so that each participating computing server determines, from the candidate recommendation set, the block matrix data of the item-item similarity matrix, and the user ID-item subject-item weight matrix, the computing servers the current server depends on and the component identifiers of the block matrix data on each of those servers.
B21. The system of B20, wherein the fourth dependent server confirmation module comprises: a second row/column component identifier transposition module, adapted so that each participating computing server performs a row-component-identifier or column-component-identifier transposition on the component identifiers of the item-item similarity matrix; a second row/column component identifier alignment module, adapted to align the result of the row component identifier transposition with the column component identifiers of the user ID-item subject-item weight matrix, or the result of the column component identifier transposition with the row component identifiers of the user ID similarity matrix; a second judgement module, adapted to compare the marked row or column component identifiers with the row or column component identifiers in the locally obtained block matrix data of the user ID-item subject-item weight matrix and the user ID-weight mean matrix, and to determine which row or column component identifiers do not exist locally; and a fifth dependent server confirmation module, adapted to determine, from the computing servers that hold the row or column component identifiers missing locally, the computing servers the current server depends on, together with the component identifiers of the user ID-item subject-item weight matrix and the user ID-weight mean matrix on each of those servers.
B22. The system of any one of B12-B21, wherein the computing servers send and receive data between one another by calling a message passing interface.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210518155.5A CN103049488B (en) | 2012-12-05 | 2012-12-05 | A kind of collaborative filtering disposal route and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103049488A CN103049488A (en) | 2013-04-17 |
CN103049488B true CN103049488B (en) | 2015-11-25 |
Family
ID=48062129
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210518155.5A Active CN103049488B (en) | 2012-12-05 | 2012-12-05 | A kind of collaborative filtering disposal route and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103049488B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103309967B (en) * | 2013-06-05 | 2016-10-26 | 清华大学 | Collaborative filtering method based on similarity transmission and system |
CN105718488A (en) * | 2014-12-04 | 2016-06-29 | 阿里巴巴集团控股有限公司 | Computer system based recommendation method and apparatus |
CN105426392B (en) * | 2015-10-28 | 2019-03-26 | 浪潮软件集团有限公司 | Collaborative filtering recommendation method and system |
CN112268560B (en) * | 2020-10-09 | 2022-09-09 | 广州杰赛科技股份有限公司 | AGV moving path monitoring method and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1547351A (en) * | 2003-12-04 | 2004-11-17 | 上海交通大学 | Collaborative filtering recommendation method for dealing with a large number of users |
CN102346751A (en) * | 2010-08-03 | 2012-02-08 | 阿里巴巴集团控股有限公司 | Information transmitting method and equipment |
CN103019860A (en) * | 2012-12-05 | 2013-04-03 | 北京奇虎科技有限公司 | Collaborative-filtering-based processing method and system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2447868A (en) * | 2007-03-29 | 2008-10-01 | Motorola Inc | A distributed content item recommendation system |
Also Published As
Publication number | Publication date |
---|---|
CN103049488A (en) | 2013-04-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11847612B2 (en) | Social media profiling for one or more authors using one or more social media platforms | |
US7752198B2 (en) | Method and device for efficiently ranking documents in a similarity graph | |
CN102508870B (en) | Individualized recommending method in combination of rating data and label data | |
CN103019860B (en) | Based on disposal route and the system of collaborative filtering | |
CN105912669A (en) | Method and device for complementing search terms and establishing individual interest model | |
CN106407420B (en) | Multimedia resource recommendation method and system | |
CN105247507A (en) | Brand Impact Score | |
CN102467709A (en) | Product information sending method and device | |
CN104838376A (en) | Generating snippets for prominent users for information retrieval queries | |
CN103049488B (en) | A kind of collaborative filtering disposal route and system | |
CN103455613A (en) | Interest aware service recommendation method based on MapReduce model | |
CN102682046A (en) | Member searching and analyzing method in social network and searching system | |
CN104199836B (en) | A kind of mark user model constructing method divided based on sub- interest | |
CN105808541A (en) | Information matching processing method and apparatus | |
CN104462327A (en) | Computing method, search processing method, computing device and search processing device for sentence similarity | |
CN104915860A (en) | Commodity recommendation method and device | |
CN103294692A (en) | Information recommendation method and system | |
CN105095474A (en) | Method and device for establishing recommendation relation between searching terms and application data | |
CN104050243A (en) | Network searching method and system combined with searching and social contact | |
CN103049486B (en) | A kind of disposal route of collaborative filtering distance and system | |
CN109635072B (en) | Public opinion data distributed storage method, public opinion data distributed storage device, storage medium and terminal equipment | |
CN102737038B (en) | Degree of association defining method and device, information providing method and device | |
CN104156375A (en) | Message inputting method and system based on crowdsourcing | |
CN101639856B (en) | Webpage correlation evaluation device for detecting internet information spreading | |
CN111242239B (en) | Training sample selection method, training sample selection device and computer storage medium |
Legal Events
Date | Code | Title
---|---|---
 | C06 | Publication
 | PB01 | Publication
 | C10 | Entry into substantive examination
 | SE01 | Entry into force of request for substantive examination
 | C14 | Grant of patent or utility model
 | GR01 | Patent grant
2022-07-25 | TR01 | Transfer of patent right

TR01 Transfer of patent right — Effective date of registration: 2022-07-25. Patentee after: BEIJING QIHOO TECHNOLOGY Co.,Ltd., Room 801, 8th floor, No. 104, floors 1-19, building 2, yard 6, Jiuxianqiao Road, Chaoyang District, Beijing 100015. Patentees before: BEIJING QIHOO TECHNOLOGY Co.,Ltd. and Qizhi software (Beijing) Co.,Ltd., 100088 room 112, block D, 28 new street, Xicheng District, Beijing (Desheng Park).