TW201942834A - Item recommendation

Info

Publication number: TW201942834A
Application number: TW108101008A
Authority: TW
Inventors: 陳超超; 周俊
Original assignee: 香港商阿里巴巴集團服務有限公司
Priority date: 2018-03-27
Filing date: 2019-01-10
Publication date: 2019-11-01
Also published as: CN108647985A; WO2019184480A1; CN108647985B

Abstract

A method and a device for predicting a user's rating of an item, and a method and a device for item recommendation. The method for predicting a rating comprises: acquiring a plurality of sample pairs, each sample pair comprising any user identifier selected from a plurality of user identifiers and any item identifier selected from a plurality of item identifiers; acquiring a plurality of existing ratings, the existing ratings corresponding to some of the sample pairs; acquiring multiple sets of contextual features corresponding respectively to the sample pairs; clustering the sample pairs into a plurality of sub-categories on the basis of the multiple sets of contextual features, each sub-category containing first sample pairs made up of first user identifiers and first item identifiers; and, for each sub-category, predicting by means of a collaborative filtering algorithm the ratings of the first items not yet rated by the first users, on the basis of the first user identifiers, the first item identifiers, and the existing ratings of the first items by the first users.

Description

Item recommendation method and device

The embodiments of the present specification relate to the field of data processing and, more particularly, to a method and device for predicting a user's rating of an item, and to an item recommendation method and device.

Recommendation is a frequently used function on the Internet. Existing recommendation systems generally make recommendations based on users' existing ratings of items. However, such systems hold a wide variety of information besides ratings. Taking movie recommendation as an example, in addition to users' ratings of movies there are many latent contextual features, such as the time of the rating (whether it falls on a holiday; morning, noon, or evening), the user's age (teenager, middle-aged, or elderly), and the genre of the movie (such as romance, action, or horror). Therefore, a more effective recommendation scheme is needed, one that uses these contextual features in addition to explicit rating information.

The embodiments of the present specification aim to provide a more effective item recommendation scheme that addresses deficiencies in the prior art.
To achieve the above object, one aspect of the present specification provides a method for predicting a user's rating of an item, including: obtaining a plurality of sample pairs, each sample pair including any user identifier selected from a plurality of user identifiers and any item identifier selected from a plurality of item identifiers; obtaining a plurality of existing ratings, the existing ratings corresponding to some of the sample pairs; obtaining multiple sets of contextual features respectively corresponding to the sample pairs, wherein a set of contextual features includes at least one of the following types of features: user features, item features, and interaction features; clustering the sample pairs into a plurality of subclasses based on the multiple sets of contextual features, wherein each subclass includes a plurality of first sample pairs taken from the plurality of sample pairs, each first sample pair including a first user identifier and a first item identifier, the first user identifier being the identifier of a first user and the first item identifier being the identifier of a first item; and, for each subclass, predicting, through a collaborative filtering algorithm, each first user's rating of the first items that the user has not rated, based on the first user identifiers, the first item identifiers, and the existing ratings given by the first users to the first items.
In one embodiment, in the method for predicting a user's rating of an item, a set of contextual features includes at least one of the following types of features: user features, item features, and interaction features.
In one embodiment, in the method for predicting a user's rating of an item, the user features include user attribute features and/or user rating statistical features, and the item features include item attribute features and/or item rating statistical features.
In one embodiment, in the method for predicting a user's rating of an item, the clustering algorithm is a k-means algorithm or a GMM (Gaussian mixture model) algorithm.
In one embodiment, in the method for predicting a user's rating of an item, clustering the sample pairs into a plurality of subclasses based on the multiple sets of contextual features includes: randomly selecting a predetermined number of initial centroids among the sample pairs; calculating, based on the contextual features, the distance from each non-centroid sample pair to each centroid; assigning each non-centroid sample pair to its nearest centroid according to the distances; calculating the same number of new centroids from the predetermined number of centroids and their corresponding non-centroid sample pairs; determining whether the new centroids satisfy a predetermined condition; and, if the predetermined condition is satisfied, outputting the clustering result for the sample pairs.
In one embodiment, in the method for predicting a user's rating of an item, the collaborative filtering algorithm is a matrix factorization algorithm or a kNN algorithm.
In one embodiment, in the method for predicting a user's rating of an item, predicting through a collaborative filtering algorithm each first user's rating of the first items that the user has not rated includes: for each subclass, obtaining a user-item rating matrix based on the first user identifiers, the first item identifiers, and the existing ratings given by the first users to the first items; factorizing the user-item rating matrix into two low-dimensional matrices such that the product of the two low-dimensional matrices is as close as possible to the user-item rating matrix; and predicting, from the matrix obtained by multiplying the two low-dimensional matrices, each first user's rating of the first items in the user-item rating matrix that the user has not rated.
In one embodiment, in the method for predicting a user's rating of an item, the existing rating is a direct rating by a user or a rating obtained based on a user operation.
Another aspect of this specification provides an item recommendation method, including: obtaining a plurality of second sample pairs, each second sample pair including a second user identifier and a second item identifier, wherein the second user identifier is the user identifier of the user to receive recommendations and the second item identifier is any one of a plurality of item identifiers corresponding to a plurality of items to be recommended; determining, among the subclasses obtained through the above rating prediction method, the subclass in which each second sample pair is located; obtaining, from the ratings predicted through the above rating prediction method, the predicted rating corresponding to each second sample pair in the subclass to which it belongs; ranking the second item identifiers included in the second sample pairs according to the predicted ratings; and recommending the second items to the second user according to the ranking.
Another aspect of the present specification provides a device for predicting a user's rating of an item, including: a sample pair obtaining unit configured to obtain a plurality of sample pairs, each sample pair including any user identifier selected from a plurality of user identifiers and any item identifier selected from a plurality of item identifiers; a rating obtaining unit configured to obtain a plurality of existing ratings, the existing ratings corresponding to some of the sample pairs; a contextual feature obtaining unit configured to obtain multiple sets of contextual features respectively corresponding to the sample pairs, wherein a set of contextual features includes at least one of the following types of features: user features, item features, and interaction features; a clustering unit configured to cluster the sample pairs into a plurality of subclasses based on the multiple sets of contextual features, wherein each subclass includes a plurality of first sample pairs taken from the plurality of sample pairs, each first sample pair including a first user identifier and a first item identifier, the first user identifier being the identifier of a first user and the first item identifier being the identifier of a first item; and a rating prediction unit configured to, for each subclass, predict through a collaborative filtering algorithm each first user's rating of the first items that the user has not rated, based on the first user identifiers, the first item identifiers, and the existing ratings given by the first users to the first items.
In one embodiment, in the device for predicting a user's rating of an item, the clustering unit includes: a selection unit configured to randomly select a predetermined number of initial centroids among the sample pairs; a first calculation unit configured to calculate, based on the contextual features, the distance from each non-centroid sample pair to each centroid; a classification unit configured to assign each non-centroid sample pair to its nearest centroid according to the distances; a second calculation unit configured to calculate the same number of new centroids from the predetermined number of centroids and their corresponding non-centroid sample pairs; a judgment unit configured to determine whether the new centroids satisfy a predetermined condition; and an output unit configured to output the clustering result for the sample pairs if the predetermined condition is satisfied.
In one embodiment, in the device for predicting a user's rating of an item, the rating prediction unit includes: an obtaining unit configured to, for each subclass, obtain a user-item rating matrix based on the first user identifiers, the first item identifiers, and the existing ratings given by the first users to the first items; a factorization unit configured to factorize the user-item rating matrix into two low-dimensional matrices such that the product of the two low-dimensional matrices is as close as possible to the user-item rating matrix; and a prediction unit configured to predict, from the matrix obtained by multiplying the two low-dimensional matrices, each first user's rating of the first items in the user-item rating matrix that the user has not rated.
Another aspect of the present specification provides an item recommendation device, including: a sample pair obtaining unit configured to obtain a plurality of second sample pairs, each second sample pair including a second user identifier and a second item identifier, wherein the second user identifier is the user identifier of the user to receive recommendations and the second item identifier is any one of a plurality of item identifiers corresponding to a plurality of items to be recommended; a determining unit configured to determine, among the subclasses obtained through the above rating prediction method, the subclass in which each second sample pair is located; a predicted rating obtaining unit configured to obtain, from the ratings predicted through the rating prediction method, the predicted rating corresponding to each second sample pair in the subclass to which it belongs; a ranking unit configured to rank the second item identifiers included in the second sample pairs according to the predicted ratings; and a recommendation unit configured to recommend the second items to the second user according to the ranking.
In the item recommendation method according to the embodiments of the present specification, user-item pairs are clustered using the contextual features of each user-item pair, so that the ratings within each subclass are less noisy and more strongly correlated. Applying collaborative filtering within each subclass can therefore achieve better recommendation performance.

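The embodiments of this specification are described below with reference to the drawings.
FIG. 1 shows a schematic diagram of a system 100 according to an embodiment of the present specification. As shown in FIG. 1, the system 100 includes a clustering module 11, a rating prediction module 12, and a recommendation module 13. First, a plurality of user-item pairs and their corresponding sets of contextual features are input to the clustering module 11. The clustering module 11 clusters the feature vectors formed from each set of contextual features and thereby clusters the user-item pairs, that is, it assigns every user-item pair to a corresponding subclass. The clustering module 11 then sends the subclasses obtained through clustering to the rating prediction module 12. At the same time, the existing user ratings of items included in each subclass are sent to the rating prediction module 12. Within each subclass, the rating prediction module 12 uses the existing ratings to predict, through a collaborative filtering algorithm, the missing ratings of items by the users in that subclass. When making a recommendation to a user, the recommendation module 13 uses the user identifier and the identifiers of the items to be recommended to determine the subclass containing each user-candidate item pair, obtains from the rating prediction module 12 the predicted rating of that pair within that subclass, and recommends items to the user according to the ranking of the predicted ratings of the candidate items.
FIG. 2 schematically shows a flowchart of a method for predicting a user's rating of an item according to an embodiment of the present specification. The method includes: in step S21, obtaining a plurality of sample pairs, each sample pair including any user identifier selected from a plurality of user identifiers and any item identifier selected from a plurality of item identifiers; in step S22, obtaining a plurality of existing ratings, the existing ratings corresponding to some of the sample pairs; in step S23, obtaining multiple sets of contextual features respectively corresponding to the sample pairs, wherein a set of contextual features includes at least one of the following types of features: user features, item features, and interaction features; in step S24, clustering the sample pairs into a plurality of subclasses based on the multiple sets of contextual features, wherein each subclass includes a plurality of first sample pairs taken from the plurality of sample pairs, each first sample pair including a first user identifier and a first item identifier, the first user identifier being the identifier of a first user and the first item identifier being the identifier of a first item; and in step S25, for each subclass, predicting, through a collaborative filtering algorithm, each first user's rating of the first items that the user has not rated, based on the first user identifiers, the first item identifiers, and the existing ratings given by the first users to the first items.
First, in step S21, a plurality of sample pairs are obtained, each sample pair including any user identifier selected from a plurality of user identifiers and any item identifier selected from a plurality of item identifiers. The sample pairs are user-item pairs, which can be expressed as (user identifier, item identifier). The users may be all users of a recommendation system, for example all users of the Douban Movie app or all users of Taobao. Of course, the plurality of users need not be all users of the recommendation system; they may, for example, be the subset of users involved in one unit of the recommendation system. The items may be all items included in the recommendation system, for example the movies on Douban Movie or the products on Taobao. Likewise, the plurality of items need not be all items in the system; they may be a subset of items within a certain scope. The plurality of user-item pairs is obtained by combining each of the users with each of the items.
In step S22, a plurality of existing ratings are obtained, the existing ratings corresponding to some of the sample pairs. Here, an existing rating may be a rating given directly by the user; for example, on Douban Movie users rate each movie with a score from 1 to 5. In another example, the existing rating is obtained indirectly from user operations; for example, on Taobao a user's rating of an item can be computed from operations such as clicking on or purchasing the item. In a recommendation system, usually only some users have rated some items. On Douban Movie, for instance, some users only browse without rating movies, and some movies are too obscure for any user to have rated them. Therefore, only some of the sample pairs have a corresponding existing rating.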
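The pairing in step S21 and the partial rating coverage in step S22 can be sketched as follows. This is a minimal illustration only; the identifiers and rating values are made up and do not come from the specification.

```python
from itertools import product

# Hypothetical identifiers, for illustration only.
user_ids = ["u1", "u2", "u3"]
item_ids = ["v1", "v2"]

# S21: every user is combined with every item.
sample_pairs = list(product(user_ids, item_ids))

# S22: only some of the pairs carry an existing rating (made-up values).
existing_ratings = {("u1", "v1"): 5, ("u2", "v2"): 3}

# The remaining pairs are the ones whose ratings must later be predicted.
unrated_pairs = [pair for pair in sample_pairs if pair not in existing_ratings]
```

Storing the existing ratings in a dictionary keyed by (user, item) keeps the "rated" and "unrated" pairs easy to separate, which is all the later steps need.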
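In step S23, multiple sets of contextual features respectively corresponding to the sample pairs are obtained, wherein a set of contextual features includes at least one of the following types of features: user features, item features, and interaction features. Different recommendation scenarios have different feature types. On Douban Movie, for example, the contextual features corresponding to a user-item pair can generally be divided into the following types: static user features, such as the user's age group (teenager, middle-aged, or elderly) and the user's gender; static item features, such as the movie genre (romance, action, horror, and so on); user rating statistical features, such as the mean and variance of the user's ratings; item rating statistical features, such as the mean and variance of the movie's ratings; and interaction features, such as whether the rating was given on a holiday and whether it was given in the morning, at noon, or in the evening. The contextual features can be obtained from user profiles, item attributes, and user-item interaction information.
FIG. 3 schematically shows the sets of contextual features corresponding to user-item pairs. In the figure, u₁, u₂, u₃, and u₄ are user identifiers, v₁, v₂, v₃, and v₄ are item identifiers, the cell where u_i and v_j intersect represents one user-item pair, and the numbers 3, 4, 5, and so on in the cells are the corresponding users' ratings of the items. Behind each user-item cell there is a column of blocks that schematically represents the contextual feature set corresponding to that user-item pair. The contextual feature set includes at least one feature related to the user, the item, and their interaction in that user-item pair.
In step S24, the sample pairs are clustered into a plurality of subclasses based on the multiple sets of contextual features, wherein each subclass includes a plurality of first sample pairs taken from the plurality of sample pairs, each first sample pair including a first user identifier and a first item identifier, the first user identifier being the identifier of a first user and the first item identifier being the identifier of a first item.
A contextual feature set can be represented as a feature vector. The dimension of the feature vector equals the number of features in the set, and each component of the vector is the feature value in the corresponding feature dimension. For example, a set of contextual features may consist of: age, middle-aged; movie genre, romance. By quantizing the values of the age dimension as 1 (teenager), 2 (middle-aged), and 3 (elderly), and the values of the movie genre dimension as 1 (romance), 2 (action), and 3 (horror), the feature vector corresponding to this set of contextual features is (2, 1), where the first component is the age dimension and the second component is the movie genre dimension. The feature vector corresponding to each contextual feature set can thus be located as a point in the feature space spanned by the feature dimensions. The feature vectors of different user-item pairs may be equal, in which case they coincide at a single point in the space; that is, one point may correspond to multiple user-item pairs.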
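The quantization described above can be sketched as a simple lookup. The code tables mirror the example in the text; the function name `encode_context` and the sample pairs are hypothetical and serve only as an illustration.

```python
# Code tables following the quantization example in the text.
AGE_CODES = {"teenager": 1, "middle-aged": 2, "elderly": 3}
GENRE_CODES = {"romance": 1, "action": 2, "horror": 3}

def encode_context(age_group: str, genre: str) -> tuple:
    """Map one set of contextual features to an (age, genre) feature vector."""
    return (AGE_CODES[age_group], GENRE_CODES[genre])

# Hypothetical contextual features for three user-item pairs.
pair_context = {
    ("u1", "v1"): ("middle-aged", "romance"),
    ("u2", "v3"): ("middle-aged", "romance"),
    ("u3", "v2"): ("teenager", "horror"),
}
feature_vectors = {pair: encode_context(*ctx) for pair, ctx in pair_context.items()}
```

Here the pairs (u1, v1) and (u2, v3) map to the same vector, which illustrates how one point in the feature space can correspond to several user-item pairs.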
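Once the contextual feature sets have been represented as vector points in the feature space in this way, the points can be clustered with any of various clustering algorithms, such as the K-means algorithm, the GMM (Gaussian mixture model) algorithm, the BIRCH algorithm, or the OPTICS algorithm.
The clustering process according to an embodiment of the present specification is described below using K-means as an example. FIG. 4 shows a flowchart of clustering with the K-means algorithm according to an embodiment of the present specification. In step S41, a predetermined number of initial centroids are randomly selected among the feature vector points. This predetermined number is the value k that must be fixed in advance in the K-means algorithm. In the embodiments of the present specification, k can be determined from the estimated number of subclasses. For Douban Movie, for example, the estimated subclasses may include (teenager, romance), (teenager, action), (teenager, horror), (middle-aged, romance), (middle-aged, action), (middle-aged, horror), (elderly, romance), (elderly, action), and (elderly, horror), so k can be set to 9. In other words, the value of k depends on the number of features and their combinations. Once k has been determined, it is preferable to select k initial centroids that are well dispersed.
In step S42, the distance from each non-centroid point to each centroid is calculated based on the feature vector points. The distance can be computed in various ways; for example, it may be the Euclidean distance, the Minkowski distance, or the Manhattan distance. In step S43, each non-centroid point is assigned to its nearest centroid according to the distances, yielding k clusters.
In step S44, the same number of new centroids are calculated from the predetermined number of centroids and their corresponding non-centroid points, so that the sum of squared distances from all points to the centers of their own clusters is minimized; that is, as shown in formula (1), each new centroid is the mean vector of all vector points in its cluster.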

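$$c_k = \frac{1}{|C_k|} \sum_{x \in C_k} x \qquad (1)$$
Here, in notation assumed for this reconstruction, $C_k$ denotes the set of vector points in the $k$-th cluster and $c_k$ its new centroid.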
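In step S45, it is determined whether the new centroids satisfy a predetermined condition; for example, the predetermined condition may be that the new centroids have not changed relative to the previous centroids.
If the predetermined condition is not satisfied, the flow returns to step S42 and steps S42 to S45 are repeated; if the predetermined condition is satisfied, the flow proceeds to step S46. In step S46, the clustering result is output. The clustering result includes a plurality of clusters and the points in each cluster, where the points correspond to feature vectors and therefore to user-item pairs. In this way, the user-item pairs are clustered into a plurality of subclasses based on their contextual features. Each subclass includes a plurality of first user-item pairs taken from the plurality of user-item pairs, each first user-item pair including a first user identifier and a first item identifier, the first user identifier being the identifier of a first user and the first item identifier being the identifier of a first item.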
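Steps S41 to S46 can be sketched with NumPy as follows. This is an illustrative sketch rather than the implementation of the specification: it assumes Euclidean distance, assigns every point (including the ones chosen as initial centroids) to its nearest centroid, and the function name `kmeans`, the `max_iter` safeguard, and the sample points are assumptions made for the example.

```python
import numpy as np

def kmeans(points: np.ndarray, k: int, max_iter: int = 100, seed: int = 0):
    """Cluster the rows of `points` into k clusters; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # S41: pick k distinct points at random as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(max_iter):
        # S42: Euclidean distance from every point to every centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        # S43: assign each point to its nearest centroid.
        labels = dists.argmin(axis=1)
        # S44: each new centroid is the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # S45: stop once the centroids no longer change.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # S46: output the clustering result.
    return labels, centroids

# Hypothetical feature vectors forming two well-separated groups.
points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans(points, k=2)
```

Each label identifies the subclass of the corresponding user-item pair, so the pairs that share a label form one subclass for the rating prediction step.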
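Referring again to FIG. 2, in step S25, for each subclass, each first user's rating of the first items that the user has not rated is predicted through a collaborative filtering algorithm, based on the first user identifiers, the first item identifiers, and the existing ratings given by the first users to the first items.
Various collaborative filtering algorithms can be used here, for example the kNN algorithm or a matrix factorization algorithm. The rating prediction process according to an embodiment of the present specification is described below using matrix factorization as an example. FIG. 5 shows a flowchart of a method for predicting ratings through a collaborative filtering algorithm according to an embodiment of the present specification.
As shown in FIG. 5, first, in step S51, for each subclass, a user-item rating matrix is obtained based on the first user identifiers, the first item identifiers, and the existing ratings given by the first users to the first items. FIG. 6 schematically shows the matrix factorization process. The matrix on the left of FIG. 6 schematically shows a user-item rating matrix, where u₁, u₂, u₃, and u₄ are user identifiers, v₁, v₂, v₃, v₄, and v₅ are item identifiers, the number in the cell where u_i and v_j intersect is the rating of v_j by u_i, and a "?" indicates that u_i has not rated v_j.
In step S52, the user-item rating matrix is factorized into two low-dimensional matrices such that the product of the two low-dimensional matrices is as close as possible to the user-item rating matrix. Let the user-item rating matrix be R; it can be factorized into the transpose U^T of the user matrix and the item matrix V, that is, R ≈ U^T V. Making the product of the two low-dimensional matrices as close as possible to the user-item rating matrix means minimizing the difference between that product and the user-item rating matrix. The objective function can therefore be set as formula (2) below: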
(2)
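Formula (2) itself is not reproduced in this text. A common objective consistent with the description, offered here as an assumption rather than as the exact formula of the specification, is the squared error over the observed ratings, where $\mathcal{K}$ is the set of (user, item) index pairs with an existing rating and $u_i$, $v_j$ are the columns of $U$ and $V$:

```latex
\min_{U,\,V} \; \sum_{(i,j) \in \mathcal{K}} \left( R_{ij} - u_i^{T} v_j \right)^{2}
```

Practical implementations often add a regularization term such as $\lambda \left( \lVert U \rVert_F^2 + \lVert V \rVert_F^2 \right)$; whether the original formula includes one cannot be determined from the surrounding text.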
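U and V can be computed iteratively, for example with a gradient descent algorithm, to obtain the two low-dimensional matrices U and V that minimize the objective function. For example, the two matrices multiplied together in the middle of FIG. 6 are the two low-dimensional matrices U^T and V obtained through such an algorithm.
In step S53, each user's rating of the items in the user-item rating matrix that the user has not rated is predicted from the matrix obtained by multiplying the two low-dimensional matrices. For example, as shown in FIG. 6, multiplying U^T by V yields the prediction matrix shown on the right of FIG. 6. Comparing the rating matrix with the prediction matrix in FIG. 6 shows that the ratings in the gray cells of the prediction matrix equal (or are as close as possible to) the existing ratings in the rating matrix, while the ratings in the white cells of the prediction matrix are the ratings predicted by the matrix factorization algorithm.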
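Steps S51 to S53 can be sketched with plain gradient descent in NumPy. This is a minimal sketch under stated assumptions: it minimizes the squared error over observed entries only (with an optional L2 penalty), and the function name `factorize`, the hyperparameters, and the rating values are hypothetical rather than taken from the specification.

```python
import numpy as np

def factorize(R, mask, rank=2, lr=0.01, epochs=2000, reg=0.0, seed=0):
    """Approximate R by U.T @ V using only the entries where mask is True."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(rank, n_users))
    V = rng.normal(scale=0.1, size=(rank, n_items))
    for _ in range(epochs):
        # Error on observed entries only; unrated cells contribute nothing.
        err = np.where(mask, R - U.T @ V, 0.0)
        # Gradients of the squared error, plus the optional L2 penalty.
        grad_U = -2 * V @ err.T + 2 * reg * U
        grad_V = -2 * U @ err + 2 * reg * V
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V

# S51: a hypothetical 4 x 5 user-item rating matrix; NaN marks an unrated cell.
R = np.array([
    [5, 3, np.nan, 1, 4],
    [4, np.nan, 2, 1, np.nan],
    [1, 1, np.nan, 5, 2],
    [np.nan, 1, 5, 4, np.nan],
])
mask = ~np.isnan(R)

U, V = factorize(R, mask)            # S52: factorize into two low-dimensional matrices
predicted = U.T @ V                  # S53: the product gives a rating for every cell
filled = np.where(mask, R, predicted)  # keep existing ratings, fill in the missing ones
```

Restricting the error to the masked entries is what lets the product generalize to unrated cells; fitting the NaN positions directly would be meaningless.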
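FIG. 7 shows a flowchart of an item recommendation method according to an embodiment of the present specification. The method includes: in step S71, obtaining a plurality of sample pairs, each sample pair including a user identifier and an item identifier, wherein the user identifier is the user identifier of the user to receive recommendations and the item identifier is any one of a plurality of item identifiers corresponding to a plurality of items to be recommended; in step S72, determining, among the subclasses obtained through the above rating prediction method, the subclass in which each sample pair is located; in step S73, obtaining, from the ratings predicted through the above rating prediction method, the predicted rating corresponding to each sample pair in the subclass to which it belongs; in step S74, ranking the item identifiers included in the sample pairs according to the predicted ratings; and in step S75, recommending items to the user according to the ranking.
First, in step S71, a plurality of sample pairs are obtained, each sample pair including a user identifier and an item identifier, wherein the user identifier is the user identifier of the user to receive recommendations and the item identifier is any one of a plurality of item identifiers corresponding to a plurality of items to be recommended. For example, after user u₁ opens the page for movie v₁ on Douban Movie, or after user u₁ opens the purchase page for product v₁ on Taobao, the system starts the item recommendation flow. At this point, the system uses the user identifier u₁ and the identifier v₁ of the item the user is acting on to recall a candidate set of items to recommend to user u₁. Recall here means a coarse screening of recommendable items according to predetermined conditions, for example generating the candidate set from the user's initial preferences or from item attributes (for example, when the recommended items are restaurants, the attribute may be geographic location). Combining the user identifier u₁ with the item identifier v_i of each item in the candidate set yields the plurality of sample pairs.
In step S72, among the subclasses obtained through the above rating prediction method, the subclass in which each sample pair is located is determined. From the above rating prediction method it is clear that one sample pair corresponds to one feature vector, that is, to one point in the vector space. A sample pair can therefore be assigned to only one subclass. Accordingly, the user identifier and item identifier of a sample pair can be used to look up that pair among the subclasses obtained above and thereby determine the subclass in which it is located. The subclass of each sample pair here can be obtained in the same way.
In step S73, from the ratings predicted through the above rating prediction method, the predicted rating corresponding to each sample pair in the subclass to which it belongs is obtained. As described above with reference to FIG. 5, within each subclass a collaborative filtering algorithm predicts the rating that each user in the subclass would give to the items in the subclass that the user has not rated. Therefore, once the subclass of a sample pair has been determined, the predicted rating corresponding to that sample pair can be obtained from all the predicted ratings associated with that subclass.
In step S74, the item identifiers included in the sample pairs are ranked according to the predicted ratings. A higher predicted rating indicates a higher estimated preference of the user for the item, so items with higher predicted ratings can be ranked nearer the top.
In step S75, items are recommended to the user according to the ranking. Based on the ranking, items can be recommended in several ways. For example, only the top-ranked items may be recommended to the user, the top-ranked items may be recommended with priority, or items may be recommended to the user sequentially (in temporal or spatial order) according to the ranking.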
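Steps S71 to S75 can be sketched as a lookup followed by a sort. This is a hedged illustration: the function name `recommend`, the dictionary layout, and all identifiers and rating values are hypothetical, and it implements only the "recommend the top-ranked items" variant mentioned above.

```python
def recommend(user_id, candidate_items, pair_to_subclass, predicted, top_n=3):
    """Rank candidate items for one user by predicted rating, highest first."""
    scored = []
    for item_id in candidate_items:
        pair = (user_id, item_id)                # S71: form the sample pair
        subclass = pair_to_subclass[pair]        # S72: subclass containing the pair
        rating = predicted[subclass][pair]       # S73: predicted rating in that subclass
        scored.append((item_id, rating))
    scored.sort(key=lambda s: s[1], reverse=True)  # S74: rank by predicted rating
    return [item_id for item_id, _ in scored[:top_n]]  # S75: recommend the top items

# Hypothetical clustering output and per-subclass predictions.
pair_to_subclass = {("u1", "v2"): 0, ("u1", "v3"): 1, ("u1", "v4"): 0}
predicted = {
    0: {("u1", "v2"): 3.2, ("u1", "v4"): 4.6},
    1: {("u1", "v3"): 4.1},
}
top_items = recommend("u1", ["v2", "v3", "v4"], pair_to_subclass, predicted, top_n=2)
```

Because each pair belongs to exactly one subclass, a single dictionary from pair to subclass is enough to route the lookup.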
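FIG. 8 shows a device 800 for predicting a user's rating of an item according to an embodiment of the present specification, including: a sample pair obtaining unit 81 configured to obtain a plurality of sample pairs, each sample pair including any user identifier selected from a plurality of user identifiers and any item identifier selected from a plurality of item identifiers; a rating obtaining unit 82 configured to obtain a plurality of existing ratings, the existing ratings corresponding to some of the sample pairs; a contextual feature obtaining unit 83 configured to obtain multiple sets of contextual features respectively corresponding to the sample pairs, wherein a set of contextual features includes at least one of the following types of features: user features, item features, and interaction features; a clustering unit 84 configured to cluster the sample pairs into a plurality of subclasses based on the multiple sets of contextual features, wherein each subclass includes a plurality of first sample pairs taken from the plurality of sample pairs, each first sample pair including a first user identifier and a first item identifier, the first user identifier being the identifier of a first user and the first item identifier being the identifier of a first item; and a rating prediction unit 85 configured to, for each subclass, predict through a collaborative filtering algorithm each first user's rating of the first items that the user has not rated, based on the first user identifiers, the first item identifiers, and the existing ratings given by the first users to the first items.
In one embodiment, in the above device 800 for predicting a user's rating of an item, the clustering unit 84 includes: a selection unit 841 configured to randomly select a predetermined number of initial centroids among the sample pairs; a first calculation unit 842 configured to calculate, based on the contextual features, the distance from each non-centroid sample pair to each centroid; a classification unit 843 configured to assign each non-centroid sample pair to its nearest centroid according to the distances; a second calculation unit 844 configured to calculate the same number of new centroids from the predetermined number of centroids and their corresponding non-centroid sample pairs; a judgment unit 845 configured to determine whether the new centroids satisfy a predetermined condition; and an output unit 846 configured to output the clustering result for the sample pairs if the predetermined condition is satisfied.
In one embodiment, in the above device for predicting a user's rating of an item, the rating prediction unit 85 includes: an obtaining unit 851 configured to, for each subclass, obtain a user-item rating matrix based on the first user identifiers, the first item identifiers, and the existing ratings given by the first users to the first items; a factorization unit 852 configured to factorize the user-item rating matrix into two low-dimensional matrices such that the product of the two low-dimensional matrices is as close as possible to the user-item rating matrix; and a prediction unit 853 configured to predict, from the matrix obtained by multiplying the two low-dimensional matrices, each first user's rating of the first items in the user-item rating matrix that the user has not rated.
FIG. 9 shows an item recommendation device 900 according to an embodiment of the present specification, including: a sample pair obtaining unit 91 configured to obtain a plurality of second sample pairs, each second sample pair including a second user identifier and a second item identifier, wherein the second user identifier is the user identifier of the user to receive recommendations and the second item identifier is any one of a plurality of item identifiers corresponding to a plurality of items to be recommended; a determining unit 92 configured to determine, among the subclasses obtained through the above rating prediction method, the subclass in which each second sample pair is located; a predicted rating obtaining unit 93 configured to obtain, from the ratings predicted through the rating prediction method, the predicted rating corresponding to each second sample pair in the subclass to which it belongs; a ranking unit 94 configured to rank the second item identifiers included in the second sample pairs according to the predicted ratings; and a recommendation unit 95 configured to recommend the second items to the second user according to the ranking.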
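In the item recommendation method according to the embodiments of the present specification, user-item pairs are clustered using the contextual features of each user-item pair, so that the ratings within each subclass are less noisy and more strongly correlated. Applying collaborative filtering within each subclass can therefore achieve better recommendation performance.
Those of ordinary skill in the art to which the present invention pertains will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms according to their functions. Whether these functions are executed in hardware or in software depends on the specific application and design constraints of the technical solution. Those of ordinary skill in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.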
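The specific implementations described above explain the objectives, technical solutions, and beneficial effects of the present invention in further detail. It should be understood that the above are merely specific implementations of the present invention and are not intended to limit its scope of protection; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.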
FIG. 1 shows a schematic diagram of a system 100 according to an embodiment of the present specification. As shown in FIG. 1, the system 100 includes a clustering module 11, a prediction scoring module 12, and a recommendation module 13. First, a plurality of user-item pairs and their corresponding sets of context features are input to the clustering module 11. The clustering module 11 obtains clusters of user-item pairs by clustering multiple feature vectors composed of each set of contextual features, that is, each user-item pair is clustered into a corresponding subclass in. Then, the clustering module 11 sends a plurality of subclasses obtained through clustering to the prediction scoring module 12. At the same time, the existing scores of the users included in each sub-category are sent to the predictive scoring module 12. The prediction scoring module 12 uses the existing scoring in each sub-category to predict the user's missing scoring of the item in the sub-category through a collaborative filtering algorithm. When recommending the user through the recommendation module 13, the recommendation module 13 determines the sub-category of the user-item to-be-recommended item pair through the user identification and the identification of the item to be recommended, and obtains the sub-category from the prediction scoring module 12. The predicted scores of the user-to-be-recommended item pair, and the items are recommended to the user according to the ranking of the predicted scores of multiple to-be-recommended items.
FIG. 2 schematically shows a flowchart of a method for predicting a user's rating of an item according to an embodiment of the present specification, including: in step S21, obtaining a plurality of sample pairs, the sample pair including any one selected from a plurality of user identifiers; A user identifier and any item identifier selected from a plurality of item identifiers; in step S22, obtaining a plurality of existing scores, the plurality of existing scores corresponding to a part of the plurality of sample pairs; in step S23: Obtain multiple sets of context features corresponding to each sample pair, where a set of context features includes at least one of the following types of features: user features, item features, and interaction features. In step S24, based on the multiple sets of context features, The plurality of sample pairs are clustered into a plurality of subclasses, wherein each subclass includes a plurality of first sample pairs taken from the plurality of sample pairs, and each of the first sample pairs includes a first A user identification and a first item identification, wherein the first user identification is an identification of the first user and the first item identification is an identification of the first item; and in step S25, regarding each Sub-category, based on a plurality of first user identifications and a plurality of first item identifications, and a plurality of existing ratings of the plurality of first users with respect to a plurality of first items, through a collaborative filtering algorithm The method predicts the rating of each unrated first item by each first user.
First, in step S21, a plurality of sample pairs are obtained, and the sample pairs include any user ID selected from a plurality of user IDs and any item ID selected from a plurality of item IDs. The sample pair is a user-item pair, which can be expressed as (user ID, item ID). The users may be all users in the recommendation system, for example, all users included in the Douban Movie APP, all users included in Taobao, and the like. Of course, the multiple users do not have to be all users in the recommendation system, for example, they may also be part of the users of the system involved in one unit in the recommendation system. The items may be all items included in the recommendation system, for example, movies in Douban movies, products in Taobao, and the like. Similarly, the plurality of items need not be all the items in the system, and they may also be part of the items in a certain range in the system. Multiple user-item pairs are obtained by combining each of the multiple users with each of the multiple items.
In step S22, a plurality of existing scores are obtained, and the plurality of existing scores correspond to a part of the sample pairs in the plurality of sample pairs. Here, the existing rating may be a direct rating of the user. For example, in Douban movies, the user will rate each movie with a score of 1 to 5. In another example, the existing score is obtained indirectly through a user operation. For example, in Taobao, the user's rating of an item can be calculated based on the user's click on or purchase of the item. In a recommendation system, usually only some users rate some items. For example, in Douban movies, some users just browse and do not rate movies, or some movies are too infrequent and no users rate them. Therefore, only some sample pairs have a corresponding user's existing score on the item.
In step S23, multiple sets of context features corresponding to each sample pair are obtained, where a set of context features includes at least one of the following features: user features, item features, and interaction features. Different recommendation scenarios have different feature types. For example, in Douban movies, the context features corresponding to user-item pairs can generally be divided into the following types of features: user static features, such as the user ’s age characteristics, teens, middle-aged, and Age, gender characteristics of users, etc .; static characteristics of items, such as movie category, love, action, horror, etc .; statistical characteristics of user ratings, such as average score and variance of user ratings; statistical characteristics of item ratings, such as movie average Ratings, variances, etc .; interactive features, such as whether the time of rating is a holiday, morning, noon, evening, etc. The contextual features can be obtained from user data, item attributes, and user-item interaction information.
FIG. 3 schematically shows a plurality of sets of context features corresponding to user-items. In the figure, u ₁ , u ₂ , u _3, and u ₄ are user IDs, v ₁ , v ₂ , v _3, and v ₄ are item IDs. The square where u _i and v _j intersect represents a user-item pair. The numbers 3, 4, 5 and so on are the corresponding user's rating of the item. Behind each user-item checkbox, a column is included, which schematically represents the context feature group corresponding to the user-item pair. The context feature set includes at least one feature related to the users, items, and interactions included in the user-item pair.
In step S24, the plurality of sample pairs are clustered into a plurality of subclasses based on the plurality of sets of context features, wherein each subclass includes a plurality of first sample pairs taken from the plurality of sample pairs. Each of the first sample pairs includes a first user ID and a first item ID, wherein the first user ID is an ID of the first user and the first item ID is an ID of the first item.
The context feature group may be represented in the form of a feature vector, the dimension of the feature vector is the number of features included in a set of context features, and each component in the feature vector represents a feature value in a corresponding feature dimension. For example, a set of contextual characteristics may include: age, middle age; movie type, love. By quantifying the values in the age dimension as: 1 (teen), 2 (middle-aged), and 3 (elderly), the values in the movie genre dimension are quantified as: 1 (love), 2 (action), 3 ( Horror) to obtain a feature vector corresponding to the set of contextual features: (2, 1), where the first component represents the dimension of the age feature and the second component represents the dimension of the movie type feature. Therefore, a feature vector corresponding to the context feature group can be located with a vector point in a feature space composed of each feature dimension. The feature vectors corresponding to different user-item pairs may be equal, that is, they coincide at a point in the dimensional space, that is, the point corresponds to multiple user-item pairs.
After the context feature group is represented as vector points in the feature space in the above manner, these vector points can be clustered through various clustering algorithms, such as K-means algorithm, gmm (Gaussian mixture model) algorithm, BIRCH Algorithms, OPTICS algorithms, and more.
In the following, K-means will be taken as an example to explain the clustering process according to the embodiment of the present specification. FIG. 4 shows a flowchart of clustering through a K-means algorithm according to an embodiment of the present specification. In step S41, a predetermined number of initial centroids are randomly selected among the plurality of feature vector points. The predetermined number is a predetermined k in the K-means algorithm. In the embodiment of the present specification, k may be determined through the number of estimated sub-categories. For example, for Douban movies, the estimated sub-categories may include: (teen, love), (teen, action), (teen, horror), (Middle age, love), (Middle age, action), (Middle age, horror), (Old age, love), (Old age, Action), (Old age, horror), so k can be set to 9. That is, the value of k is related to the number of features and their combinations. After determining k, when selecting the initial centroids, it is better to select the scattered k initial centroids.
In step S42, the distance from each non-centroid point to each centroid is calculated from the feature vector points. Various distance measures may be used, for example the Euclidean distance, the Minkowski distance, or the Manhattan distance. In step S43, each non-centroid point is assigned to its closest centroid according to these distances, thereby obtaining k clusters.
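The three distance measures named above are closely related: the Manhattan and Euclidean distances are the Minkowski distance of order 1 and 2 respectively. A small sketch, with hypothetical function names:

```python
def minkowski(p, q, order):
    """Minkowski distance of the given order between two equal-length vectors."""
    return sum(abs(a - b) ** order for a, b in zip(p, q)) ** (1.0 / order)


def manhattan(p, q):
    """Manhattan distance: Minkowski distance of order 1."""
    return minkowski(p, q, 1)


def euclidean(p, q):
    """Euclidean distance: Minkowski distance of order 2."""
    return minkowski(p, q, 2)


def nearest_centroid(point, centroids, distance=euclidean):
    """Index of the centroid closest to `point` (the assignment of step S43)."""
    return min(range(len(centroids)), key=lambda j: distance(point, centroids[j]))
```

For instance, `nearest_centroid((2, 1), [(1, 1), (3, 3)])` compares the point (2, 1) with two candidate centroids and returns the index of the closer one.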
In step S44, the same number of new centroids are calculated from the predetermined number of centroids and the non-centroid points assigned to them, so that the sum of the squared distances from all points to the center of the cluster to which they belong is minimized. That is, as in formula (1), the new centroid is the mean vector of all vector points in the cluster:

$$\mu_j = \frac{1}{|C_j|} \sum_{x \in C_j} x \qquad (1)$$

where $C_j$ denotes the set of vector points in the j-th cluster, $|C_j|$ is the number of points in that cluster, and $\mu_j$ is its new centroid.
In step S45, it is determined whether the new centroid satisfies a predetermined condition. For example, the predetermined condition is that the new centroid has not changed from the original centroid.
If the predetermined condition is not satisfied, the flow returns to step S42 and steps S42 to S45 are repeated. If the predetermined condition is satisfied, the flow proceeds to step S46. In step S46, a clustering result is output. The clustering result includes a plurality of clusters and the points included in each cluster, where the points correspond to feature vectors and thus to user-item pairs. In this way, the multiple user-item pairs are clustered into multiple subclasses based on the context features. Each subclass includes a plurality of first user-item pairs taken from the plurality of user-item pairs, and each first user-item pair includes a first user identifier and a first item identifier, where the first user identifier is the identifier of a first user and the first item identifier is the identifier of a first item.
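Steps S41 to S46 can be put together in a short sketch. It is a plain K-means loop under stated assumptions, not the patent's implementation: squared Euclidean distance is used, an empty cluster keeps its old centroid, and the iteration cap `max_iter` and all names are hypothetical.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_vector(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Cluster `points` (tuples) into k clusters; assumes max_iter >= 1
    and at least k distinct points."""
    rng = random.Random(seed)
    # S41: randomly pick k distinct points as initial centroids.
    distinct = sorted(set(points))
    centroids = [tuple(float(x) for x in p) for p in rng.sample(distinct, k)]
    for _ in range(max_iter):
        # S42 and S43: assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[j].append(p)
        # S44: the new centroid is the mean vector of each cluster.
        new_centroids = [
            mean_vector(members) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        # S45: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # S46: output the clusters and their centroids.
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = k_means(points, k=2)
```

Each point stands for a feature vector, so mapping the points of a cluster back to their user-item pairs gives the subclass membership used in the later steps.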
Referring again to FIG. 2, in step S25, for each subclass, each first user's score for the first items that user has not rated is predicted through a collaborative filtering algorithm, based on the plurality of first user identifiers, the plurality of first item identifiers, and the plurality of existing scores given by the plurality of first users to the plurality of first items.
Various collaborative filtering algorithms can be used here, such as a KNN algorithm or a matrix factorization algorithm. In the following, a matrix factorization algorithm is used as an example to explain the process of predicting a score according to an embodiment of the present specification. FIG. 5 shows a flowchart of a method for predicting a score through a collaborative filtering algorithm according to an embodiment of the present specification.
As shown in FIG. 5, first, in step S51, for each subclass, a user-item scoring matrix is obtained based on the plurality of first user identifiers, the plurality of first item identifiers, and the plurality of existing scores given by the plurality of first users to the plurality of first items. FIG. 6 schematically shows the process of matrix factorization. The matrix on the left in FIG. 6 schematically shows a user-item scoring matrix, where u₁, u₂, u₃, and u₄ are user identifiers, and v₁, v₂, v₃, v₄, and v₅ are item identifiers. The number in the box where u_i intersects v_j represents the score given by u_i to v_j, and "?" indicates that u_i has not scored v_j.
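Building the user-item scoring matrix of step S51 for one subclass can be sketched as follows. The rating values and the name `build_rating_matrix` are made up for illustration; `None` plays the role of "?" in FIG. 6.

```python
def build_rating_matrix(ratings):
    """Build a dense user-item matrix from {(user_id, item_id): score}.

    Returns the sorted user IDs (rows), the sorted item IDs (columns), and
    the matrix itself, where None marks an item the user has not rated.
    """
    users = sorted({user for user, _ in ratings})
    items = sorted({item for _, item in ratings})
    matrix = [[ratings.get((user, item)) for item in items] for user in users]
    return users, items, matrix


# Hypothetical existing scores inside one subclass.
existing = {
    ("u1", "v1"): 5, ("u1", "v3"): 3,
    ("u2", "v1"): 4, ("u2", "v2"): 2,
}
users, items, matrix = build_rating_matrix(existing)
```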
In step S52, the user-item scoring matrix is decomposed into two low-dimensional matrices such that the product of the two low-dimensional matrices is closest to the user-item scoring matrix. Let the user-item scoring matrix be R. It can be decomposed into the transpose U^T of the user matrix and the item matrix V, that is, R = U^T V. Making the product of the two low-dimensional matrices closest to the user-item scoring matrix means minimizing the difference between that product and the user-item scoring matrix. The objective function can therefore be set as the following formula (2):
(2)
U and V can be calculated iteratively by, for example, a gradient descent algorithm to obtain two low-dimensional matrices U and V that minimize the objective function. For example, as shown in FIG. 6, the two matrices multiplied in the middle of FIG. 6 are two low-dimensional matrices U ^T and V obtained through, for example, a gradient descent algorithm.
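As an illustration of such an objective, a squared-error form commonly used for matrix factorization is given below. It is an assumption and not necessarily identical to formula (2); the symbols $\Omega$ (the set of index pairs with an existing score), $u_i$ and $v_j$ (the i-th column of U and the j-th column of V), and the step size $\eta$ are introduced here for the sketch.

```latex
\min_{U,V} \; \sum_{(i,j) \in \Omega} \left( R_{ij} - u_i^{T} v_j \right)^2

% One gradient-descent step for an observed entry (i, j),
% with the constant factor 2 absorbed into the step size \eta:
e_{ij} = R_{ij} - u_i^{T} v_j, \qquad
u_i \leftarrow u_i + \eta \, e_{ij} \, v_j, \qquad
v_j \leftarrow v_j + \eta \, e_{ij} \, u_i
```

Only the observed entries enter the sum, so the unrated entries of R place no constraint on U and V; a regularization term on the norms of U and V is often added in practice, which is likewise an assumption beyond the text.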
In step S53, each user's score for the items that user has not rated in the user-item scoring matrix is predicted from the matrix obtained by multiplying the two low-dimensional matrices. For example, as shown in FIG. 6, multiplying U^T and V yields the prediction matrix shown on the right side of FIG. 6. Comparing the scoring matrix in FIG. 6 with the prediction matrix shows that the scores in the gray boxes of the prediction matrix are equal to (or as close as possible to) the existing scores in the scoring matrix, while the scores in the white boxes of the prediction matrix are the scores predicted by the matrix factorization algorithm.
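Steps S52 and S53 can be sketched with a small gradient-descent factorization. This is a minimal illustration under assumptions: it minimizes the squared error over the observed entries only, updates one observed entry at a time (stochastic gradient descent), and uses hypothetical names and hyperparameters (`rank`, `steps`, `lr`). The factors are stored row-wise, so row i of `U` plays the role of column u_i in the text's notation.

```python
import random


def factorize(matrix, rank=1, steps=5000, lr=0.01, seed=0):
    """Factor a rating matrix (None = unrated) into user and item factors."""
    rng = random.Random(seed)
    n_users, n_items = len(matrix), len(matrix[0])
    U = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_items)]
    observed = [
        (i, j, matrix[i][j])
        for i in range(n_users)
        for j in range(n_items)
        if matrix[i][j] is not None
    ]
    for _ in range(steps):
        for i, j, rating in observed:
            # Error of the current prediction for this observed entry.
            err = rating - sum(a * b for a, b in zip(U[i], V[j]))
            for f in range(rank):
                u_if, v_jf = U[i][f], V[j][f]
                U[i][f] += lr * err * v_jf
                V[j][f] += lr * err * u_if
    return U, V


def predict(U, V):
    """Multiply the factors to get a score for every user-item cell (S53)."""
    return [[sum(a * b for a, b in zip(u, v)) for v in V] for u in U]


# Toy matrix that is exactly rank 1 except for one unrated cell.
R = [[1, 2, 3],
     [2, 4, None]]
U, V = factorize(R)
predicted = predict(U, V)
```

In this toy case the second row is twice the first, so a rank-1 fit should fill the unrated cell with a value near 6, while the observed cells stay close to their existing scores. Real rating matrices are rarely this regular, and the rank and step size would need tuning.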
FIG. 7 shows a flowchart of an item recommendation method according to an embodiment of the present specification. The method includes: in step S71, obtaining a plurality of sample pairs, each sample pair including a user identifier and an item identifier, wherein the user identifier is the identifier of the user to whom items are to be recommended, and the item identifier is any one of a plurality of item identifiers corresponding to a plurality of items to be recommended; in step S72, determining, among the plurality of subclasses obtained through the above method for predicting scores, the subclass in which each sample pair is located; in step S73, obtaining, from the scores predicted by the above method, the prediction score corresponding to each sample pair in the subclass to which it belongs; in step S74, sorting the item identifiers included in the respective sample pairs according to the prediction scores; and in step S75, recommending items to the user according to the ranking.
First, in step S71, a plurality of sample pairs are obtained. Each sample pair includes a user identifier and an item identifier, wherein the user identifier is the identifier of the user to whom items are to be recommended, and the item identifier is any one of a plurality of item identifiers corresponding to a plurality of items to be recommended. For example, when user u₁ opens the page of movie v₁ on Douban Movie, or when user u₁ opens the purchase page of product v₁ on Taobao, the system starts the item recommendation procedure. At this point, the system recalls a candidate set of items to recommend to user u₁ according to the user identifier u₁ and the item identifier v₁ of the item the user operated on. Recall here means a rough screening of recommendable items according to predetermined conditions, for example generating a candidate set according to the user's initial preferences, or according to item attributes (for example, when the items to be recommended are restaurants, the attribute may be geographic location). The user identifier u₁ is then combined with the item identifier v_i of each item in the candidate set, yielding the plurality of sample pairs.
In step S72, the subclass in which each sample pair is located is determined among the plurality of subclasses obtained through the above method for predicting scores. As described for that method, a sample pair corresponds to one feature vector, that is, to one point in the vector space, so a sample pair can be classified into only one subclass. The sample pair can therefore be looked up, by its user identifier and item identifier, in the multiple subclasses obtained above, which determines the subclass in which it is located. The subclass of every sample pair can be obtained in the same way.
In step S73, the prediction score corresponding to each sample pair in the subclass to which it belongs is obtained from the scores predicted by the above method for predicting scores. As described above with reference to FIG. 5, within each subclass, the score of each user in the subclass for each item in the subclass that the user has not rated is predicted through a collaborative filtering algorithm. Therefore, after the subclass in which a sample pair is located has been determined, the prediction score corresponding to the sample pair can be obtained from all the prediction scores associated with that subclass.
In step S74, the item identifiers included in the respective sample pairs are sorted according to the prediction score. The higher the predicted score, the greater the degree of user preference for the item. Therefore, items with a high prediction score can be ranked higher.
In step S75, items are recommended to the user according to the ranking. Based on the ranking, items can be recommended in a variety of ways. For example, only the top-ranked items may be recommended to the user; the top-ranked items may be recommended first; or the items may be recommended in the sorted order (in temporal or spatial order); and so on.
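Steps S71 to S75 can be sketched as a lookup followed by a sort. The data structures are assumptions made for illustration: `subclass_of_pair` maps a (user ID, item ID) pair to its subclass, `predicted_scores` maps each subclass to the scores predicted within it, and all IDs and score values are made up.

```python
def recommend(user_id, candidate_items, subclass_of_pair, predicted_scores, top_n=3):
    """Return up to top_n candidate item IDs, highest prediction score first."""
    # S71: pair the user with every candidate item from the recall stage.
    sample_pairs = [(user_id, item_id) for item_id in candidate_items]
    scored = []
    for pair in sample_pairs:
        # S72: find the single subclass the sample pair belongs to.
        subclass = subclass_of_pair.get(pair)
        if subclass is None:
            continue  # pair was not part of the clustered data
        # S73: read the prediction score from that subclass.
        score = predicted_scores[subclass].get(pair)
        if score is not None:
            scored.append((pair[1], score))
    # S74: sort item identifiers by prediction score, highest first.
    scored.sort(key=lambda entry: entry[1], reverse=True)
    # S75: recommend the top-ranked items.
    return [item_id for item_id, _ in scored[:top_n]]


subclass_of_pair = {("u1", "v2"): 0, ("u1", "v3"): 1, ("u1", "v4"): 0}
predicted_scores = {
    0: {("u1", "v2"): 3.5, ("u1", "v4"): 4.8},
    1: {("u1", "v3"): 4.1},
}
top_items = recommend("u1", ["v2", "v3", "v4", "v9"],
                      subclass_of_pair, predicted_scores, top_n=2)
```

Recommending only the first `top_n` items is one of the options mentioned above; returning the whole sorted list would correspond to recommending in ranked order.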
FIG. 8 shows a device 800 for predicting a user's rating of an item according to an embodiment of the present specification, including: a sample pair obtaining unit 81 configured to obtain a plurality of sample pairs, each sample pair including any user identifier selected from a plurality of user identifiers and any item identifier selected from a plurality of item identifiers; a score obtaining unit 82 configured to obtain a plurality of existing ratings, the plurality of existing ratings corresponding to some of the plurality of sample pairs; a context feature obtaining unit 83 configured to obtain multiple sets of context features respectively corresponding to the sample pairs, wherein a set of context features includes at least one of the following types of features: user features, item features, and interaction features; a clustering unit 84 configured to cluster the plurality of sample pairs into a plurality of subclasses based on the multiple sets of context features, wherein each subclass includes a plurality of first sample pairs taken from the plurality of sample pairs, and each first sample pair includes a first user identifier and a first item identifier, the first user identifier being an identifier of a first user and the first item identifier being an identifier of a first item; and a score prediction unit 85 configured to, for each subclass, predict through a collaborative filtering algorithm each first user's rating of the first items not rated by that first user, based on the plurality of first user identifiers, the plurality of first item identifiers, and the plurality of existing ratings of the plurality of first items by the plurality of first users.
In one embodiment, in the above device 800 for predicting a user's rating of an item, the clustering unit 84 includes: a selection unit 841 configured to randomly select a predetermined number of initial centroids among the plurality of sample pairs; a first calculation unit 842 configured to calculate, based on the context features, the distance from each non-centroid sample pair to each centroid; a classification unit 843 configured to assign each non-centroid sample pair to its nearest centroid according to the distance; a second calculation unit 844 configured to calculate the same number of new centroids from the predetermined number of centroids and their corresponding non-centroid sample pairs; a judging unit 845 configured to determine whether the new centroids satisfy a predetermined condition; and an output unit 846 configured to output a clustering result of the plurality of sample pairs if the predetermined condition is satisfied.
In one embodiment, in the above device for predicting a user's rating of an item, the score prediction unit 85 includes: an obtaining unit 851 configured to, for each subclass, obtain a user-item scoring matrix based on the plurality of first user identifiers, the plurality of first item identifiers, and the plurality of existing ratings of the plurality of first items by the plurality of first users; a decomposition unit 852 configured to decompose the user-item scoring matrix into two low-dimensional matrices such that the product of the two low-dimensional matrices is closest to the user-item scoring matrix; and a prediction unit 853 configured to predict, from the matrix obtained by multiplying the two low-dimensional matrices, each first user's rating of the unrated first items in the user-item scoring matrix.
FIG. 9 shows an item recommendation device 900 according to an embodiment of the present specification, including: a sample pair obtaining unit 91 configured to obtain a plurality of second sample pairs, each second sample pair including a second user identifier and a second item identifier, wherein the second user identifier is the identifier of the user to whom items are to be recommended, and the second item identifier is any one of a plurality of item identifiers corresponding to a plurality of items to be recommended; a determining unit 92 configured to determine, among the plurality of subclasses obtained through the above method for predicting scores, the subclass in which each second sample pair is located; a prediction score obtaining unit 93 configured to obtain, from the scores predicted by that method, the prediction score corresponding to each second sample pair in the subclass to which it belongs; a sorting unit 94 configured to sort the second item identifiers included in the respective second sample pairs according to the prediction scores; and a recommendation unit 95 configured to recommend second items to the second user according to the ranking.
In the item recommendation method according to the embodiment of the present specification, the user-item pairs are clustered using the context features of the user-item pairs, so that the scores within each subclass are less noisy and more strongly correlated. As a result, better recommendation performance can be obtained when the collaborative filtering method is applied.
Those with ordinary knowledge in the technical field to which the present invention pertains should further realize that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of their functions. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Those with ordinary knowledge in the technical field to which the present invention pertains may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein can be implemented by hardware, by a software module executed by a processor, or by a combination of the two. The software module can be placed in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The specific embodiments described above further describe the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall be included in the scope of protection of the present invention.

10‧‧‧Risk control client
11‧‧‧Clustering module
12‧‧‧Prediction scoring module
13‧‧‧Recommendation module
S21‧‧‧Method step
S22‧‧‧Method step
S23‧‧‧Method step
S24‧‧‧Method step
S25‧‧‧Method step
S41‧‧‧Method step
S42‧‧‧Method step
S43‧‧‧Method step
S44‧‧‧Method step
S45‧‧‧Method step
S46‧‧‧Method step
S51‧‧‧Method step
S52‧‧‧Method step
S53‧‧‧Method step
S71‧‧‧Method step
S72‧‧‧Method step
S73‧‧‧Method step
S74‧‧‧Method step
S75‧‧‧Method step
81‧‧‧Sample pair obtaining unit
82‧‧‧Score obtaining unit
83‧‧‧Context feature obtaining unit
84‧‧‧Clustering unit
85‧‧‧Score prediction unit
91‧‧‧Sample pair obtaining unit
92‧‧‧Determining unit
93‧‧‧Prediction score obtaining unit
94‧‧‧Sorting unit
95‧‧‧Recommendation unit
100‧‧‧System
800‧‧‧Device for predicting a user's rating of an item
841‧‧‧Selection unit
842‧‧‧First calculation unit
843‧‧‧Classification unit
844‧‧‧Second calculation unit
845‧‧‧Judging unit
846‧‧‧Output unit
851‧‧‧Obtaining unit
852‧‧‧Decomposition unit
853‧‧‧Prediction unit
900‧‧‧Item recommendation device

The embodiments of this specification can be made clearer by describing them with reference to the drawings:

FIG. 1 shows a schematic diagram of a system 100 according to an embodiment of the present specification;

FIG. 2 schematically shows a flowchart of a method for predicting a user's rating of an item according to an embodiment of the present specification;

FIG. 3 schematically shows multiple sets of context features corresponding to user-item pairs;

FIG. 4 shows a flowchart of clustering through a K-means algorithm according to an embodiment of the present specification;

FIG. 5 shows a flowchart of a method for predicting a score through a collaborative filtering algorithm according to an embodiment of the present specification;

FIG. 6 schematically shows the process of matrix factorization;

FIG. 7 shows a flowchart of an item recommendation method according to an embodiment of the present specification;

FIG. 8 shows a device 800 for predicting a user's rating of an item according to an embodiment of the present specification;

FIG. 9 shows an item recommendation device 900 according to an embodiment of the present specification.

Claims

A method for predicting a user's rating of an item, comprising:
obtaining a plurality of sample pairs, each sample pair including any user identifier selected from a plurality of user identifiers and any item identifier selected from a plurality of item identifiers;
obtaining a plurality of existing ratings, the plurality of existing ratings corresponding to some of the plurality of sample pairs;
obtaining multiple sets of context features respectively corresponding to the sample pairs, wherein a set of context features includes at least one of the following types of features: user features, item features, and interaction features;
clustering the plurality of sample pairs into a plurality of subclasses based on the multiple sets of context features, wherein each subclass includes a plurality of first sample pairs taken from the plurality of sample pairs, and each first sample pair includes a first user identifier and a first item identifier, the first user identifier being an identifier of a first user and the first item identifier being an identifier of a first item; and
for each subclass, predicting, through a collaborative filtering algorithm, each first user's rating of the first items not rated by that first user, based on the plurality of first user identifiers, the plurality of first item identifiers, and the plurality of existing ratings of the plurality of first items by the plurality of first users.

The method for predicting a user's rating of an item according to claim 1, wherein the user features include user attribute features and/or user rating statistical features, and the item features include item attribute features and/or item rating statistical features.

The method for predicting a user's rating of an item according to claim 1, wherein the clustering algorithm is a K-means algorithm or a GMM algorithm.

The method for predicting a user's rating of an item according to claim 1, wherein clustering the plurality of sample pairs into a plurality of subclasses based on the multiple sets of context features comprises:
randomly selecting a predetermined number of initial centroids among the plurality of sample pairs;
calculating, based on the multiple sets of context features, the distance from each non-centroid sample pair to each centroid;
assigning each non-centroid sample pair to its nearest centroid according to the distance;
calculating, based on the multiple sets of context features, the same number of new centroids from the predetermined number of centroids and their corresponding non-centroid sample pairs;
determining whether the new centroids satisfy a predetermined condition; and
outputting a clustering result of the plurality of sample pairs if the predetermined condition is satisfied.

The method for predicting a user's rating of an item according to claim 1, wherein the collaborative filtering algorithm is a matrix factorization algorithm or a KNN algorithm.

The method for predicting a user's rating of an item according to claim 1, wherein predicting, through a collaborative filtering algorithm, each first user's rating of the first items not rated by that first user comprises:
for each subclass, obtaining a user-item scoring matrix based on the plurality of first user identifiers, the plurality of first item identifiers, and the plurality of existing ratings of the plurality of first items by the plurality of first users;
decomposing the user-item scoring matrix into two low-dimensional matrices such that the product of the two low-dimensional matrices is closest to the user-item scoring matrix; and
predicting, from the matrix obtained by multiplying the two low-dimensional matrices, each first user's rating of the unrated first items in the user-item scoring matrix.

The method for predicting a user's rating of an item according to claim 1, wherein the existing rating is a rating given directly by a user or a rating derived from a user operation.

An item recommendation method, comprising:
obtaining a plurality of second sample pairs, each second sample pair including a second user identifier and a second item identifier, wherein the second user identifier is the identifier of the user to whom items are to be recommended, and the second item identifier is any one of a plurality of item identifiers corresponding to a plurality of items to be recommended;
determining, among the plurality of subclasses obtained through the method according to any one of claims 1 to 7, the subclass in which each second sample pair is located;
obtaining, from the ratings predicted through the method according to any one of claims 1 to 7, the prediction score corresponding to each second sample pair in the subclass to which it belongs;
sorting the second item identifiers included in the respective second sample pairs according to the prediction scores; and
recommending second items to the second user according to the ranking.

A device for predicting a user's rating of an item, comprising:
a sample pair obtaining unit configured to obtain a plurality of sample pairs, each sample pair including any user identifier selected from a plurality of user identifiers and any item identifier selected from a plurality of item identifiers;
a score obtaining unit configured to obtain a plurality of existing ratings, the plurality of existing ratings corresponding to some of the plurality of sample pairs;
a context feature obtaining unit configured to obtain multiple sets of context features respectively corresponding to the sample pairs, wherein a set of context features includes at least one of the following types of features: user features, item features, and interaction features;
a clustering unit configured to cluster the plurality of sample pairs into a plurality of subclasses based on the multiple sets of context features, wherein each subclass includes a plurality of first sample pairs taken from the plurality of sample pairs, and each first sample pair includes a first user identifier and a first item identifier, the first user identifier being an identifier of a first user and the first item identifier being an identifier of a first item; and
a score prediction unit configured to, for each subclass, predict through a collaborative filtering algorithm each first user's rating of the first items not rated by that first user, based on the plurality of first user identifiers, the plurality of first item identifiers, and the plurality of existing ratings of the plurality of first items by the plurality of first users.

The device for predicting a user's rating of an item according to claim 9, wherein the user features include user attribute features and/or user rating statistical features, and the item features include item attribute features and/or item rating statistical features.

The device for predicting a user's rating of an item according to claim 9, wherein the clustering algorithm is a K-means algorithm or a GMM algorithm.

The device for predicting a user's rating of an item according to claim 9, wherein the clustering unit comprises:
a selection unit configured to randomly select a predetermined number of initial centroids among the plurality of sample pairs;
a first calculation unit configured to calculate, based on the context features, the distance from each non-centroid sample pair to each centroid;
a classification unit configured to assign each non-centroid sample pair to its nearest centroid according to the distance;
a second calculation unit configured to calculate the same number of new centroids from the predetermined number of centroids and their corresponding non-centroid sample pairs;
a judging unit configured to determine whether the new centroids satisfy a predetermined condition; and
an output unit configured to output a clustering result of the plurality of sample pairs if the predetermined condition is satisfied.

The device for predicting a user's rating of an item according to claim 9, wherein the collaborative filtering algorithm is a matrix factorization algorithm or a KNN algorithm.

The device for predicting a user's rating of an item according to claim 9, wherein the score prediction unit comprises:
an obtaining unit configured to, for each subclass, obtain a user-item scoring matrix based on the plurality of first user identifiers, the plurality of first item identifiers, and the plurality of existing ratings of the plurality of first items by the plurality of first users;
a decomposition unit configured to decompose the user-item scoring matrix into two low-dimensional matrices such that the product of the two low-dimensional matrices is closest to the user-item scoring matrix; and
a prediction unit configured to predict, from the matrix obtained by multiplying the two low-dimensional matrices, each first user's rating of the unrated first items in the user-item scoring matrix.

The device for predicting a user's rating of an item according to claim 9, wherein the existing rating is a rating given directly by a user or a rating derived from a user operation.

An item recommendation device, comprising:
a sample pair obtaining unit configured to obtain a plurality of second sample pairs, each second sample pair including a second user identifier and a second item identifier, wherein the second user identifier is the identifier of the user to whom items are to be recommended, and the second item identifier is any one of a plurality of item identifiers corresponding to a plurality of items to be recommended;
a determining unit configured to determine, among the plurality of subclasses obtained through the method according to any one of claims 1 to 7, the subclass in which each second sample pair is located;
a prediction score obtaining unit configured to obtain, from the ratings predicted through the method according to any one of claims 1 to 7, the prediction score corresponding to each second sample pair in the subclass to which it belongs;
a sorting unit configured to sort the second item identifiers included in the respective second sample pairs according to the prediction scores; and
a recommendation unit configured to recommend second items to the second user according to the ranking.