JP6566515B2

JP6566515B2 - Item recommendation system and item recommendation method

Info

Publication number: JP6566515B2
Application number: JP2015147269A
Authority: JP
Inventors: Kazuo Hara (原 一夫); Ikumi Suzuki (鈴木 郁美)
Original assignee: Inter University Research Institute Corp Research Organization of Information and Systems
Current assignee: Inter University Research Institute Corp Research Organization of Information and Systems
Priority date: 2015-07-24
Filing date: 2015-07-24
Publication date: 2019-08-28
Anticipated expiration: 2035-07-24
Also published as: JP2017027480A

Description

The present invention relates to an item recommendation system and an item recommendation method. More specifically, it relates to an item recommendation system and an item recommendation method for collaborative filtering (CF), typified by user-based and item-based CF, that are robust against shilling attacks, that is, attacks in which fake users are injected in order to interfere with the process by which the system decides which items to recommend to a user.

User-based CF is a system that recommends items to a user by referring to the past evaluation values of other users with similar item preferences, using, for example, the k-nearest neighbor method for the similarity calculation. That is, k other users who assign evaluation values to items in a similar way are selected, the evaluation value of an item is predicted from them, and items with high predicted evaluation values are recommended to the user.
However, when, for example, the items are products and the evaluation values are preference levels, injecting fake users designed to have average preferences into a user-based CF system (a shilling attack called an average attack) turns the fake users into hub users, that is, influencers, that show high similarity to every user. The products favored by the fake users may then be recommended all the time.
For item-based CF, that is, recommendation systems and methods that decide which items to recommend to a user by referring to the user's past evaluations of other similar items, shilling attacks called segment attacks or popular attacks are effective.

Meanwhile, the inventors have proposed methods for reducing hubs in the k-nearest neighbor method: applying a Laplacian-based kernel to the similarity measure for large-scale, high-dimensional data sets (see Non-Patent Document 1), applying centering (see Non-Patent Document 2), and applying localized centering (see Non-Patent Document 3).
Applying these hub-reduction methods to user-based or item-based CF is expected to yield an item recommendation system and an item recommendation method in which the evaluation of a target item does not fluctuate even when an attacker injects fake users to illegitimately raise that evaluation.

Non-Patent Document 1: Ikumi Suzuki, Kazuo Hara, Masashi Shimbo, Yuji Matsumoto, Marco Saerens, "Investigating the Effectiveness of Laplacian-based Kernels in Hub Reduction", In Proc. 26th AAAI Conference on Artificial Intelligence, pp. 1112-1118, 2012
Non-Patent Document 2: Ikumi Suzuki, Kazuo Hara, Masashi Shimbo, "A Similarity Measure for Reducing Hubs in the k-Nearest Neighbor Method" (ｋ近傍法でハブを軽減する類似度尺度), IPSJ SIG Technical Report, Natural Language Processing, 2012-NL-209, No. 11, pp. 1-8, 2012
Non-Patent Document 3: Kazuo Hara, Ikumi Suzuki, Masashi Shimbo, Kei Kobayashi, Kenji Fukumizu, Milos Radovanovic, "Localized Centering: Reducing Hubness in Large-Sample Data", In Proc. 29th AAAI Conference on Artificial Intelligence, pp. 2645-2651, 2015

When user-based CF is subjected to an attack called an average attack, that is, an attack that injects a large number of fake users showing high similarity to every user, the products favored by the fake users may be recommended all the time.
When item-based CF is subjected to an attack called a segment attack or a popular attack, that is, an attack that injects a large number of fake users who give high evaluation values to both the target item and popular items so that the target item becomes highly similar to the popular items in a particular topic (for example, action movies or horror movies), the target item becomes more likely to be recommended to users who like items belonging to that topic.
Such recommendations are unnatural and impair the intended function of the recommendation system.

The present invention uses a similarity measure in which the appearance of hubs is suppressed, or converts a given similarity measure so that hubs are less likely to appear. This suppresses the appearance of users or items that act as influencers and reduces their influence, so that the evaluation value of the target item is ultimately not changed as the attacker intends.
An object of the present invention is to provide an item recommendation system and an item recommendation method that are little affected by an attack even when fake users are injected.

To solve the above problem, the item recommendation system 1 according to the first aspect of the present invention comprises, as shown for example in FIG. 5: an evaluation matrix storage unit 21 that stores an evaluation matrix R in which the evaluation value R(u, i) of user u for item i is entered; a first similarity calculation unit 131 that calculates similarities between users using a similarity measure that suppresses the appearance of hubs; a first neighborhood data extraction unit 141 that, using the similarities calculated by the first similarity calculation unit 131, extracts the k users most similar to the target user; a first evaluation value prediction unit 151 that, using the evaluation values given to items by the k users extracted by the first neighborhood data extraction unit 141, predicts the evaluation values to be entered in the target user's unfilled cells; and an item recommendation unit 16 that extracts items to be recommended from among the items with high evaluation values predicted by the first evaluation value prediction unit 151 and recommends them to the target user.

Here, an item is typically a product or service. Conditions may also be set that restrict the type, time of provision, region of provision, or price range of the product or service (summer fruits, movies released in month X, and so on). Items are not limited to products or services, however; anything that can be evaluated will do, such as animals and plants, mountains and rivers, cities, architecture, paintings, music, theater, martial arts, academic disciplines, productivity, or effects.
The matrix R is typically an evaluation matrix of size (number of users) × (number of items). The evaluation value R(u, i) is typically the degree of preference of user u for item i. It is not limited to preference, however; anything that can be evaluated quantitatively will do, for example the contribution to health, the value of real estate, or the travel time to a destination. The quantitative evaluation may also be expressed as a rank or level.

A similarity measure here includes anything that can be used to measure the similarity between two data objects. Typically, the inner product, the cosine, the Pearson correlation, or a distance is used. The inner product is the scalar product of two data vectors, and the cosine is the inner product of data vectors normalized to unit length. The Pearson correlation is the inner product of data vectors normalized to unit length after the element mean has been subtracted from each element so that the elements sum to zero. Kernels (of the various kinds so called mainly in the field of machine learning), which can be regarded as generalizations of the inner product, are also included. The typical distance is the Euclidean distance (L2 norm) between two data vectors, but generalizations of the Euclidean distance (such as the Manhattan distance and the Lp norm) are also included. Similarities output by similarity scoring methods that people with domain knowledge have defined to suit the purpose of each task (such as BLAST) are included as well. These are referred to below as general similarity measures.
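As a rough illustration of the three typical measures named above, the following Python sketch computes the inner product, cosine, and Pearson correlation of two fully observed vectors. NumPy, the function names, and the sample vectors are assumptions made for illustration only; the patent does not prescribe an implementation, and the handling of unfilled cells is left out.

```python
import numpy as np

def inner_product(x, y):
    """Scalar product of two data vectors."""
    return float(np.dot(x, y))

def cosine(x, y):
    """Inner product of the vectors after scaling each to unit length."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def pearson(x, y):
    """Cosine of the vectors after subtracting each vector's own mean."""
    return cosine(x - np.mean(x), y - np.mean(y))

u = np.array([5.0, 3.0, 4.0, 1.0])  # hypothetical evaluation values of user u
v = np.array([4.0, 2.0, 5.0, 1.0])  # hypothetical evaluation values of user v
print(inner_product(u, v), cosine(u, v), pearson(u, v))
```

Written this way, the cosine is the inner product after length normalization and the Pearson correlation is the cosine after mean removal, which mirrors the definitions in the text.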

Similarity measures that suppress the appearance of hubs include, for example, similarity measures converted so that all data objects are equally similar to the data center, that is, similarity measures without spatial centrality. Examples are the general similarity measures above converted by applying "(global) centering", which moves the origin to the mean of the data set (the global centroid), or by applying "localized centering", which moves the origin to a local centroid, the center of a local subset. Measures converted by applying a Laplacian-based kernel, such as the "commute time kernel" (Marco Saerens, Francois Fouss, Luh Yen, and Pierre Dupont, "The Principal Components Analysis of a Graph, and Its Relationships to Spectral Clustering", In Proc. 15th European Conference on Machine Learning (ECML), pp. 371-383, 2004), also qualify.
Besides similarity measures converted so that all data objects are equally similar to the data center, similarity measures that suppress the appearance of hubs also include mutual proximity, local scaling, and the like.
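The (global) centering mentioned above can be sketched in a few lines. The sketch below assumes an inner-product similarity matrix K and uses the standard double-centering form H K H with H = I - (1/n) 11^T; the function name and the random test data are hypothetical, and the patent text itself does not give this formula.

```python
import numpy as np

def center_similarity(K):
    """Global centering of a similarity matrix: K -> H K H, H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

rng = np.random.default_rng(0)
X = rng.random((6, 4))          # 6 hypothetical data objects in 4 dimensions
K = X @ X.T                     # inner-product similarity matrix
K_cent = center_similarity(K)

# Same result as moving the origin to the global centroid first.
Xc = X - X.mean(axis=0)
print(np.allclose(K_cent, Xc @ Xc.T))
# Each row averages to zero: every object is equally (zero) similar to the centroid.
print(np.allclose(K_cent.mean(axis=1), 0.0))
```

Working on the similarity matrix rather than on the raw vectors means the same conversion can be applied to any precomputed similarity, which fits the idea of converting a given general similarity measure.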

"Extracting the k users most similar to the target user" typically means extracting them with the k-nearest neighbor method. Any value of k may be used, but for the MovieLens data set, for example, supervised learning showed that 30 <= k <= 100 is preferable, 40 <= k <= 70 is more preferable, and k = 50 is most preferable for high prediction accuracy (see FIG. 7).
When an evaluation value is predicted, the average over the k users can be used. It is preferable to use a weighted average, as described later. The weights can be, for example, the similarities between users or seasonal coefficients (the quality of fruit depends on the season).
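A minimal sketch of this prediction step follows, assuming a precomputed user-user similarity matrix S and np.nan as the marker for unfilled cells. Restricting the neighbor candidates to users who have evaluated the item, and normalizing by the sum of absolute weights, are assumptions of this sketch rather than details stated in the text; all names and data are hypothetical.

```python
import numpy as np

def predict_user_based(R, S, u, i, k):
    """Predict R[u, i] as a similarity-weighted mean over the k nearest users.

    R: (users x items) array with np.nan marking unfilled cells.
    S: (users x users) similarity matrix.
    """
    # Assumption: only users who have evaluated item i are neighbor candidates.
    rated = [v for v in range(R.shape[0]) if v != u and not np.isnan(R[v, i])]
    neighbors = sorted(rated, key=lambda v: S[u, v], reverse=True)[:k]
    weights = np.array([S[u, v] for v in neighbors], dtype=float)
    values = np.array([R[v, i] for v in neighbors], dtype=float)
    denom = np.abs(weights).sum()
    if denom == 0.0:
        return float("nan")           # no usable neighbor
    return float(weights @ values / denom)

R = np.array([[5.0, np.nan],
              [4.0, 4.0],
              [1.0, 2.0],
              [5.0, 5.0]])
S = np.array([[1.0, 0.9, 0.1, 0.8],
              [0.9, 1.0, 0.2, 0.7],
              [0.1, 0.2, 1.0, 0.3],
              [0.8, 0.7, 0.3, 1.0]])
pred = predict_user_based(R, S, u=0, i=1, k=2)
print(pred)
```

With k = 2 the two users most similar to user 0 (users 1 and 3) contribute, weighted 0.9 and 0.8 respectively.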

User-based CF is a form of recommendation system that predicts a user's evaluation values for unevaluated items by referring to the past evaluation values of the k nearest users (the k data objects that are closest, that is, most similar, as extracted by the k-nearest neighbor (kNN) method). A conceivable weakness of user-based CF is its vulnerability to shilling attacks. A shilling attack injects fake users into the recommendation system to force it to make recommendations biased (arbitrarily) by the attacker. User-based CF makes recommendations not from the features of items but from other users' past evaluation values for items. It is therefore vulnerable to attacks that design fake users to resemble every user and inject them in order to change the system's choice of recommended items.

Meanwhile, hub data have been found to appear readily in high-dimensional data sets as a consequence of the so-called "curse of dimensionality" (Milos Radovanovic, Alexandros Nanopoulos, Mirjana Ivanovic, "Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data", Journal of Machine Learning Research, pp. 2487-2531, 2010). That is, in high dimensions a small number of data objects called hubs appear frequently in the kNN of other data objects. When kNN is computed in a user-based CF system, each user is represented as a vector whose dimension is the number of items; since there are generally many items, these vectors are high-dimensional. Hub data therefore appear. Hub users contribute to the recommendation process as influencers and so strongly affect the recommendation system's choice of recommended items.
A shilling attack can be viewed as an attack that exploits hubs. An attack on user-based CF injects fake users that become influencers, that is, hubs, with the aim of deliberately controlling the system's choice of recommended items. Specifically, the fake users are designed to be similar to the data center of the users and are then injected.
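Hubness of this kind is usually quantified by counting how often each data object appears in the kNN lists of the others (the N_k value that appears in the figure descriptions as N_10 and N_50). The sketch below assumes a precomputed similarity matrix; the function name and the small example matrix are hypothetical.

```python
import numpy as np

def k_occurrence(S, k):
    """N_k: how often each object appears in the kNN lists of the other objects."""
    S = np.array(S, dtype=float)      # copy so the caller's matrix is untouched
    np.fill_diagonal(S, -np.inf)      # an object is not its own neighbor
    knn = np.argsort(-S, axis=1, kind="stable")[:, :k]
    return np.bincount(knn.ravel(), minlength=S.shape[0])

# Object 0 is similar to everyone, like a fake user placed near the data center.
S = np.array([[1.0, 0.9, 0.8, 0.7],
              [0.9, 1.0, 0.1, 0.2],
              [0.8, 0.1, 1.0, 0.3],
              [0.7, 0.2, 0.3, 1.0]])
print(k_occurrence(S, k=1))
print(k_occurrence(S, k=2))
```

In this toy matrix object 0 is the nearest neighbor of all three other objects, so its N_1 value dominates; a strongly skewed N_k distribution is the signature of hubs.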

To avoid the effects of such attacks, we propose converting the similarity measure used to find the kNN so that all users are equally similar to the data center, thereby suppressing the very appearance of hub users, that is, influencers, and neutralizing the attacker's attempt to send fake users into the system as influencers. Several methods for suppressing the appearance of hubs have been proposed; suppression can be achieved, for example, by computing the commute time kernel from a given similarity matrix or, more simply, by centering the similarity matrix. Using the MovieLens data set, we confirmed that fake users tend to be less likely to become hub users after these methods are applied (see FIGS. 8 and 9). As a result, such conversion of the similarity measure provides a system that is resistant to attacks without degrading the accuracy of item recommendation (see FIG. 7).
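For the commute time kernel route, a minimal sketch is given here under stated assumptions: the similarity matrix W is treated as a symmetric, non-negative weighted adjacency matrix, the graph Laplacian is L = D - W, and the kernel is the Moore-Penrose pseudo-inverse of L, as in Saerens et al. How negative similarities (such as negative Pearson correlations) are mapped to non-negative weights is not specified in this section and is left out; the function name and example matrix are hypothetical.

```python
import numpy as np

def commute_time_kernel(W):
    """Pseudo-inverse of the graph Laplacian L = D - W."""
    W = np.array(W, dtype=float)      # copy so the caller's matrix is untouched
    np.fill_diagonal(W, 0.0)          # ignore self-similarity
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.pinv(L)

W = np.array([[0.0, 0.9, 0.8, 0.7],
              [0.9, 0.0, 0.1, 0.2],
              [0.8, 0.1, 0.0, 0.3],
              [0.7, 0.2, 0.3, 0.0]])
K_ct = commute_time_kernel(W)
print(np.round(K_ct, 3))
```

The rows of the resulting kernel sum to zero, so, as with centering, no object is privileged by closeness to the data center. The pseudo-inverse costs roughly cubic time in the number of users, which is one reason the text calls centering the simpler option.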

With the configuration of this aspect, similarities are calculated using a similarity measure that suppresses the appearance of hubs, so the appearance of hubs can be suppressed, and an item recommendation system can be provided that is little affected by an attack even when fake users are injected.

To solve the above problem, the item recommendation system 1 according to the second aspect of the present invention comprises, as shown for example in FIG. 5: an evaluation matrix storage unit 21 that stores an evaluation matrix R in which the evaluation value R(u, i) of user u for item i is entered; a second similarity calculation unit 132 that calculates similarities between items using a similarity measure that suppresses the appearance of hubs; a second neighborhood data extraction unit 142 that, using the similarities calculated by the second similarity calculation unit 132, extracts the k items most similar to the target item; a second evaluation value prediction unit 152 that, using the target user's evaluation values for the k items extracted by the second neighborhood data extraction unit 142, predicts the evaluation values to be entered in the target user's unfilled cells; and an item recommendation unit 16 that extracts items to be recommended from among the items with high evaluation values predicted by the second evaluation value prediction unit 152 and recommends them to the target user.

In the first aspect the evaluation values are obtained from similarities between users, whereas in this aspect they are obtained from similarities between items. The first aspect uses the average over k users, whereas this aspect uses the average over k items. The rest of the system configuration is the same as in the first aspect, and, as in the first aspect, the appearance of hubs can be reduced. Consequently, even if fake users are injected, the choice of recommended items is little affected by them; that is, an item recommendation system that is robust against attacks can be provided.
With this configuration, a similarity measure that suppresses the appearance of hubs is used, so an item recommendation system can be provided that is little affected by an attack even when fake users are injected.
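The item-based variant can be sketched by swapping the roles of rows and columns: the neighbors are items, and the values averaged are the target user's own evaluation values for those items. The sketch assumes a precomputed item-item similarity matrix and np.nan for unfilled cells, and it uses a plain (unweighted) average; names and data are hypothetical.

```python
import numpy as np

def predict_item_based(R, S_item, u, i, k):
    """Predict R[u, i] as the mean of user u's values for the k items nearest to i."""
    # Assumption: only items already evaluated by user u are neighbor candidates.
    rated = [j for j in range(R.shape[1]) if j != i and not np.isnan(R[u, j])]
    neighbors = sorted(rated, key=lambda j: S_item[i, j], reverse=True)[:k]
    if not neighbors:
        return float("nan")
    return float(np.mean([R[u, j] for j in neighbors]))

R = np.array([[5.0, 4.0, 1.0, np.nan],
              [2.0, 3.0, 4.0, 4.0]])
S_item = np.eye(4)
S_item[3, :3] = S_item[:3, 3] = [0.2, 0.9, 0.8]   # similarity of item 3 to items 0..2
print(predict_item_based(R, S_item, u=0, i=3, k=2))
```

Here items 1 and 2 are the two nearest neighbors of item 3, so the prediction for user 0 is the mean of that user's values for those two items.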

The item recommendation system 1 according to the third aspect of the present invention, in the first or second aspect, comprises a similarity measure storage unit 22 that stores the similarity measure that suppresses the appearance of hubs.
With this configuration, similarities are calculated using the hub-suppressing similarity measure stored in the system, so the appearance of hubs can be suppressed and the influence of fake users on item recommendation can be reduced.

The item recommendation system according to the fourth aspect of the present invention, in the third aspect, comprises a similarity measure conversion unit 135 that converts a similarity based on a general similarity measure into a similarity based on the similarity measure that suppresses the appearance of hubs.
Here, the similarity may be converted, for example, by converting the formula of a general similarity measure into the formula of a hub-suppressing similarity measure and then computing the similarity, or by using a matrix operation to convert a similarity computed with a general similarity measure into a similarity based on a hub-suppressing similarity measure.
With this configuration, a general similarity measure is converted into a hub-suppressing similarity measure before use, so the appearance of hubs can be suppressed and the influence of fake users on item recommendation can be reduced.

The item recommendation system according to the fifth aspect of the present invention, in any of the first to fourth aspects, uses a weighted average as the evaluation value to be entered when predicting the evaluation values to be entered in the target user's unfilled cells.
Here, the weights can be, for example, similarities between users or between items, or seasonal coefficients (the quality of fruit depends on the season). This configuration can improve prediction accuracy.

To solve the above problem, the item recommendation method according to the sixth aspect of the present invention comprises, as shown for example in FIG. 6(a): an evaluation matrix storage step (S104) of storing an evaluation matrix R in which the evaluation value R(u, i) of user u for item i is entered; a first similarity calculation step (S107) of calculating similarities between users using a similarity measure that suppresses the appearance of hubs; a first neighborhood data extraction step (S108) of extracting, using the similarities calculated in the first similarity calculation step (S107), the k users most similar to the target user; a first evaluation value prediction step (S109) of predicting, using the evaluation values given to items by the k users extracted in the first neighborhood data extraction step (S108), the evaluation values to be entered in the target user's unfilled cells; and an item recommendation step (S110) of extracting items to be recommended from among the items with high evaluation values predicted in the first evaluation value prediction step (S109) and recommending them to the target user.

This aspect is an item recommendation method corresponding to the item recommendation system according to the first aspect.
With the configuration of this aspect, the appearance of hub users can be reduced, so an item recommendation method can be provided that is little affected by an attack even when fake users are injected.
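The flow of steps S104 to S110 can be put together in one small end-to-end sketch. It is only an illustration under several assumptions that the text does not fix: unfilled cells are np.nan, the base similarity is a Pearson-like cosine of mean-removed rows with unfilled cells set to zero, hub suppression is done by centering the similarity matrix, and the prediction is a plain average. The function name, parameters, and data are hypothetical.

```python
import numpy as np

def recommend(R, target, k=2, top_n=1):
    """User-based recommendation with a centered similarity matrix (illustrative)."""
    mask = ~np.isnan(R)                                   # S104: stored matrix, nil = np.nan
    means = np.nanmean(R, axis=1, keepdims=True)
    Z = np.where(mask, R - means, 0.0)                    # assumption: nil -> 0 after mean removal
    Z = Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)
    S = Z @ Z.T                                           # S107: Pearson-like similarity ...
    n = S.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    S = H @ S @ H                                         # ... converted by centering
    ranked = [int(v) for v in np.argsort(-S[target], kind="stable") if v != target]
    neighbors = ranked[:k]                                # S108: k most similar users
    preds = {}
    for i in np.flatnonzero(~mask[target]):               # S109: fill the target's nil cells
        vals = [R[v, i] for v in neighbors if mask[v, i]]
        if vals:
            preds[int(i)] = float(np.mean(vals))
    return sorted(preds, key=preds.get, reverse=True)[:top_n]   # S110: highest predictions

R = np.array([[5.0, 4.0, np.nan, np.nan],
              [5.0, 4.0, 5.0, 1.0],
              [4.0, 5.0, 5.0, 1.0],
              [1.0, 2.0, 5.0, 1.0]])
print(recommend(R, target=0, k=2, top_n=1))
```

In this toy matrix every other user values item 2 highly and item 3 poorly, so item 2 is recommended to user 0 whichever neighbors are selected.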

To solve the above problem, the item recommendation method according to the seventh aspect of the present invention comprises, as shown for example in FIG. 6(b): an evaluation matrix storage step (S104) of storing an evaluation matrix R in which the evaluation value R(u, i) of user u for item i is entered; a second similarity calculation step (S207) of calculating similarities between items using a similarity measure that suppresses the appearance of hubs; a second neighborhood data extraction step (S208) of extracting, using the similarities calculated in the second similarity calculation step (S207), the k items most similar to the target item; a second evaluation value prediction step (S209) of predicting, using the target user's evaluation values for the k items extracted in the second neighborhood data extraction step (S208), the evaluation values to be entered in the target user's unfilled cells; and an item recommendation step (S110) of extracting items to be recommended from among the items with high evaluation values predicted in the second evaluation value prediction step (S209) and recommending them to the target user.

This aspect is an item recommendation method corresponding to the item recommendation system according to the second aspect.
With the configuration of this aspect, the appearance of hub items can be reduced, so an item recommendation method can be provided that is little affected by an attack even when fake users are injected.

The item recommendation method according to the eighth aspect of the present invention, in the sixth or seventh aspect, comprises a similarity measure conversion step (S106) of converting a similarity based on a general similarity measure into a similarity based on the similarity measure that suppresses the appearance of hubs.
With this configuration, a similarity based on a general similarity measure is converted into a similarity based on a hub-suppressing similarity measure before use, so the appearance of hubs can be suppressed and the influence of fake users on item recommendation can be reduced.

The program according to the ninth aspect of the present invention is a program for causing a computer to execute the item recommendation method of any of the sixth to eighth aspects.

The recording medium according to the tenth aspect of the present invention is a computer-readable recording medium on which the program according to the ninth aspect is recorded.

According to the present invention, an item recommendation system and an item recommendation method can be provided that are little affected by an attack even when fake users are injected.

FIG. 1 is a diagram showing an example of the evaluation matrix R.
FIG. 2 is a diagram showing the correlation between users. FIG. 2(a) is a diagram for explaining the correlation between user u and user v, FIG. 2(b) is a scatter plot for a strong correlation, and FIG. 2(c) is a scatter plot for a weak correlation.
FIG. 3 is a diagram showing the distribution of N_10 in low and high dimensions. FIG. 3(a) is a histogram of N_10 in low dimensions, and FIG. 3(b) is a histogram of N_10 in high dimensions.
FIG. 4 is a scatter plot showing the relationship between the N_10 value and the similarity to the data center. FIG. 4(a) is for low dimensions, and FIG. 4(b) is for high dimensions.
FIG. 5 is a diagram showing a configuration example of the item recommendation system 1 in Example 1 and Example 2.
FIG. 6 is a diagram showing examples of the processing flow of the item recommendation method in Example 1 and Example 2. FIG. 6(a) shows an example of the processing flow in Example 1, and FIG. 6(b) shows an example of the processing flow in Example 2.
FIG. 7 is a diagram comparing item recommendation systems that use different similarity measures, with the nearest-neighbor parameter k on the horizontal axis and the mean absolute error (MAE) on the vertical axis.
FIG. 8 is a diagram (part 1) showing that injected fake users become hubs when the general Pearson correlation is used as the inter-user similarity measure; it is a scatter plot of N_50 against the similarity to the data center for all users, including honest users (users other than fake users).
FIG. 9 is a diagram (part 2) showing that, when the general Pearson correlation is used as the inter-user similarity measure, (a) injected fake users become hubs and (b), (c) conversion of the similarity measure makes them less likely to become hubs. FIG. 9(a) is a histogram of N_50 when the general Pearson correlation is used as the inter-user similarity measure, FIG. 9(b) is a histogram of N_50 after conversion by centering, and FIG. 9(c) is a histogram of N_50 after conversion by the commute time kernel.

Embodiments of the present invention are described below with reference to the drawings. In the drawings, identical or corresponding parts are given the same reference numerals, and redundant description is omitted.

[User-based CF]
Let the matrix R, of size N_user (number of users) × N_item (number of items), be a data set consisting of users' past responses (evaluation values) to items. R(u, i) denotes the evaluation value given by the u-th user to the i-th item. The matrix R contains empty cells with no value, referred to as nil. A nil value means that the user has not yet evaluated the item. In general, the matrix R is sparse, and most of its cells are nil. User-based CF (like the item-based CF described later) predicts these values using the k-nearest neighbor method.

Equation (1) gives the prediction function Pred(u, i) that predicts the evaluation value R(u, i) of the u-th user for the i-th item. Let Sim(u, v) be the similarity between user u and user v, and let U be the set of the k users nearest to user u under the similarity Sim. The evaluation values used for a user n belonging to U are those satisfying R(n, i) ≠ nil.

Further,

$$\bar{r}_u = \frac{\sum_i R(u,i)\,\delta[R(u,i) \neq \mathrm{nil}]}{\sum_i \delta[R(u,i) \neq \mathrm{nil}]}$$

where $\bar{r}_u$ is the average evaluation value over the items evaluated by user u, and δ[.] is an indicator function that equals 1 if the condition in the brackets is satisfied and 0 otherwise.
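The k-nearest-neighbor prediction described above can be sketched as follows. The equation image for Equation (1) is not reproduced in this text, so the sketch assumes the widely used mean-centered weighted form, Pred(u, i) = r̄_u + Σ Sim(u, n)(R(n, i) − r̄_n) / Σ |Sim(u, n)|, summed over the neighbors n in U with R(n, i) ≠ nil. That form, the function names, and the use of `None` for nil are assumptions of this sketch, not the source's definition.

```python
from typing import Callable, List, Optional

Rating = Optional[float]  # None stands in for nil (an assumption of this sketch)


def mean_rating(row: List[Rating]) -> float:
    """Average of the ratings a user actually gave; nil entries are skipped."""
    rated = [r for r in row if r is not None]
    return sum(rated) / len(rated) if rated else 0.0


def predict(R: List[List[Rating]], sim: Callable[[int, int], float],
            u: int, i: int, k: int) -> float:
    """Predict R[u][i] from the k users most similar to user u."""
    others = [n for n in range(len(R)) if n != u]
    # U: the k nearest users under sim, chosen before looking at item i.
    U = sorted(others, key=lambda n: sim(u, n), reverse=True)[:k]
    # Only neighbors that actually rated item i contribute.
    used = [n for n in U if R[n][i] is not None]
    num = sum(sim(u, n) * (R[n][i] - mean_rating(R[n])) for n in used)
    den = sum(abs(sim(u, n)) for n in used)
    base = mean_rating(R[u])
    return base if den == 0 else base + num / den
```

When no neighbor has rated the item, the sketch falls back to the user's own average, which is one possible choice among several.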

FIG. 1 shows an example of the matrix R. FIG. 1(a) shows an example before fake users are injected, and FIG. 1(b) shows an example after they are injected. Items i are arranged along the columns and users u along the rows, and the evaluation value R(u, i) is entered in the cell where they intersect. Here, each evaluation value R(u, i) is an integer on a five-level scale from 1 to 5. In an Average attack aimed at raising the evaluation value of a target item, fake users are injected as shown in the lower part of FIG. 1(b): each gives a high rating to the target item, item 1, and gives every other item a value close to the average of the ratings given by the users who previously rated that item.
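The fake-user profile described for FIG. 1(b) can be sketched as follows. This is a minimal illustration, not the attack generator used in the source: the function name `average_attack_profile`, the use of `None` for nil, the rounding to the nearest integer, and the default maximum rating of 5 are all assumptions.

```python
from typing import List, Optional


def average_attack_profile(R: List[List[Optional[float]]], target: int,
                           max_rating: float = 5.0) -> List[Optional[float]]:
    """Build one fake-user row: top rating on the target, item means elsewhere."""
    n_items = len(R[0])
    profile: List[Optional[float]] = []
    for i in range(n_items):
        if i == target:
            profile.append(max_rating)  # push the target item
            continue
        rated = [row[i] for row in R if row[i] is not None]
        # Items nobody has rated stay nil for the fake user as well.
        profile.append(round(sum(rated) / len(rated)) if rated else None)
    return profile
```

Appending such rows to R reproduces the situation in the lower part of FIG. 1(b), where the fake rows look average everywhere except on the target item.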

With reference to FIG. 2, a method of measuring the similarity between users by the Pearson correlation of the evaluation values they give to items is described. FIG. 2(a) is a diagram for explaining the correlation between user u and user v. Item 2 (R(u, 2) = 1, R(v, 2) = 1) and item 3 (R(u, 3) = 5, R(v, 3) = 4) shown in FIG. 1(a) are plotted in FIG. 2(a). FIGS. 2(b) and 2(c) are scatter plots of the points (R(u, i), R(v, i)) for all items i. FIG. 2(b) is an example in which the (positive) correlation between user u and user v is strong, and FIG. 2(c) is an example in which it is weak. When the correlation is strong, the points for all items lie along a straight line. When it is weak, they are scattered at random.

[Inter-user similarity measures commonly used in user-based CF]
In user-based CF, it is important to choose the function Sim(., .) that gives the inter-user similarity appropriately, because the similarity function determines both which users enter the kNN set and the weights those users receive in Equation (1).
A common similarity measure is cosine similarity, which replaces every nil in the matrix R with 0 and then computes the cosine (cos) of the angle between row vectors (the vectors of evaluation values given by each user). This is shown in Equation (2).

$$\mathrm{Sim}(u,v) = \frac{\langle x_u, x_v \rangle}{\|x_u\|\,\|x_v\|} \qquad (2)$$

Here, x_u is an N_item-dimensional vector whose components are x_u(i) = R(u, i) if R(u, i) ≠ nil and x_u(i) = 0 otherwise. One drawback of this function is that it ignores the bias arising from differences in the average evaluation value $\bar{r}_u$ that each user u gives to items. A common correction is therefore to subtract $\bar{r}_u$ from each vector component.
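As a small illustration of the cosine similarity described above (nil replaced by 0, then the cosine of the angle between the two rating rows), a sketch might look like this. The function name `cosine_sim` and the use of `None` for nil are assumptions made for the example.

```python
import math
from typing import List, Optional


def cosine_sim(ru: List[Optional[float]], rv: List[Optional[float]]) -> float:
    """Cosine of the angle between two rating rows, with nil replaced by 0."""
    xu = [0.0 if r is None else float(r) for r in ru]
    xv = [0.0 if r is None else float(r) for r in rv]
    dot = sum(a * b for a, b in zip(xu, xv))
    norm = math.sqrt(sum(a * a for a in xu)) * math.sqrt(sum(b * b for b in xv))
    # A user with no ratings has a zero vector; similarity is defined as 0 here.
    return dot / norm if norm > 0 else 0.0
```

Because the rows are not mean-centered, a user who rates everything high and a user who rates everything low can still look similar, which is the bias the text goes on to correct.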

The similarity computed with vectors from which the user bias $\bar{r}_u$ has been subtracted is calculated as in Equation (3), and is called the Pearson correlation between users.

$$\mathrm{Sim}(u,v) = \frac{\langle x'_u, x'_v \rangle}{\|x'_u\|\,\|x'_v\|} \qquad (3)$$

Here, if R(u, i) ≠ nil and R(v, i) ≠ nil, then

$$x'_u(i) = R(u,i) - \bar{r}_u, \qquad x'_v(i) = R(v,i) - \bar{r}_v$$

and otherwise x'_u(i) = 0 and x'_v(i) = 0.
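The Pearson correlation between users can be sketched in the same style. The sketch assumes, following the definitions in the text, that each user's mean is taken over all items that user rated, while only items rated by both users contribute nonzero components. The function name `pearson_sim` and the `None`-for-nil convention are assumptions.

```python
import math
from typing import List, Optional


def pearson_sim(ru: List[Optional[float]], rv: List[Optional[float]]) -> float:
    """Pearson correlation between two users over their co-rated items."""
    def mean(row: List[Optional[float]]) -> float:
        rated = [r for r in row if r is not None]
        return sum(rated) / len(rated) if rated else 0.0

    mu_u, mu_v = mean(ru), mean(rv)
    # Components are centered only where both users rated; all others are 0.
    pairs = [(a - mu_u, b - mu_v) for a, b in zip(ru, rv)
             if a is not None and b is not None]
    dot = sum(a * b for a, b in pairs)
    norm = (math.sqrt(sum(a * a for a, _ in pairs))
            * math.sqrt(sum(b * b for _, b in pairs)))
    return dot / norm if norm > 0 else 0.0
```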

[Attacks on CF]
A shilling attack called an Average attack is effective against user-based CF, that is, against recommendation systems and methods that decide which items to recommend to a user by referring to the past evaluations of other users similar to that user. If the evaluation matrix R held by the system is tampered with by adding fraudulent evaluation values, the recommended items change. An attack that injects fake users for this purpose is called a shilling attack, and the Average attack is one such attack. Under this attack, every user becomes highly similar to the fake users. In other words, the fake users become influencers with power over the choice of recommended items, that is, hub users.
A fake user injected in an Average attack behaves as though it likes the target item (the item under attack) and behaves toward all other items like an average genuine user (a user who is not a fake user). That is, the fake user gives the target item a high evaluation value and gives the remaining items average evaluation values. As a result, the fake user both favors the target item and is highly similar to any genuine user. Consequently, a user-based CF system under an Average attack tends to recommend the target item, which the fake users rate highly, to all genuine users.

Shilling attacks called Segment attacks or Popular attacks are effective against item-based CF, that is, against recommendation systems and methods that decide which items to recommend to a user by referring to that user's past evaluations of other, similar items. Under such an attack, the target item becomes highly similar to popular items that every user rates highly. The attacker exploits the fact that popular items are influencers with power over the choice of recommended items, that is, hub items, and tries to make the system's evaluation value for the target item unduly high.

[Hub phenomenon]
The hub phenomenon is one of the phenomena that arise from the "curse of dimensionality". Let D be a set of d-dimensional data, and let N_k(x) denote the number of times a data point x ∈ D appears in the kNN lists of the other data points in D. As the dimension d increases, the distribution of N_k becomes skewed with a long right tail (see FIG. 3). In other words, a small number of data points come to take large N_k values. Data points with such large N_k values are called hubs, and the phenomenon is called hubness (the hub phenomenon).
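The quantity N_k(x) defined above can be computed directly from a similarity matrix. The following is a minimal sketch; the function name `k_occurrences` and the representation of the similarity matrix as a list of lists are assumptions for illustration.

```python
from typing import List


def k_occurrences(S: List[List[float]], k: int) -> List[int]:
    """N_k: how often each point appears in the kNN lists of the other points.

    S[q][j] is the similarity between points q and j (larger means closer).
    """
    n = len(S)
    counts = [0] * n
    for q in range(n):
        others = [j for j in range(n) if j != q]
        # The k most similar points to q form q's kNN list.
        for j in sorted(others, key=lambda j: S[q][j], reverse=True)[:k]:
            counts[j] += 1
    return counts
```

The counts always sum to n × k, so a long right tail means that a few points absorb occurrences that would otherwise be spread evenly.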

Here, the hub phenomenon is explained using a synthetic data set. In a recommendation system, each user generally gives evaluation values to only a few items, so the evaluation matrix R is a sparse matrix with many empty cells. A sparse data set was generated artificially to imitate this situation. The data set consists of 2000 data points, each a d-dimensional vector. Each data point is generated as follows. First, for each dimension i = 1, ..., d, a positive real number drawn from a Lognormal(5; 1) distribution is rounded to obtain an integer n_i. Then n_i of the 2000 data points are selected at random, and for each of them a random number drawn uniformly from the range [0, 1] is set as the i-th element (the i-th dimensional component) of its vector.
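The generation procedure above can be sketched with the standard library alone. The sketch assumes that Lognormal(5; 1) means a log-normal distribution whose underlying normal has mean 5 and standard deviation 1, that components not selected are 0, and that n_i is clipped to the number of data points. The function name `make_sparse_data` and the `seed` parameter are likewise assumptions.

```python
import random
from typing import List


def make_sparse_data(n_points: int = 2000, d: int = 50,
                     seed: int = 0) -> List[List[float]]:
    """Generate n_points sparse d-dimensional vectors, one dimension at a time."""
    rng = random.Random(seed)
    X = [[0.0] * d for _ in range(n_points)]
    for i in range(d):
        # Rounded log-normal draw gives the number of nonzero entries n_i.
        n_i = min(n_points, round(rng.lognormvariate(5.0, 1.0)))
        for p in rng.sample(range(n_points), n_i):
            X[p][i] = rng.uniform(0.0, 1.0)
    return X
```

Calling `make_sparse_data(d=50)` and `make_sparse_data(d=1000)` would correspond to the low- and high-dimensional cases compared in FIG. 3, under the assumptions stated above.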

FIG. 3 shows histograms of the N_10 distribution when the similarity between data points is measured by the angle between vectors, that is, by cos (cosine similarity). FIG. 3(a) is the histogram for the low-dimensional case, and FIG. 3(b) is the histogram for the high-dimensional case. To illustrate the emergence of the hub phenomenon, the N_10 distributions were compared for two cases, a low dimension (d = 50) and a high dimension (d = 1000). FIG. 3 shows that, in high dimensions, data points with large N_10 values appear and, as a result, the distribution of N_10 becomes skewed (no longer symmetric). The maximum N_10 is 38 in FIG. 3(a) and 133 in FIG. 3(b).

FIG. 4 shows scatter plots of the relationship between the N_10 value and the similarity to the data centroid. FIG. 4(a) is for low dimensions, and FIG. 4(b) is for high dimensions. Because a strong correlation between the N_10 value and the similarity to the data centroid is observed in high dimensions, the hub phenomenon can be seen to originate from a bias toward the data centroid that arises in high dimensions, that is, from spatial centrality.

[Relationship between shilling attacks and the hub phenomenon]
Nanopoulos et al. (A. Nanopoulos, M. Radovanovic, and M. Ivanovic, "How does high dimensionality affect collaborative filtering?", In Proc. 3rd ACM Conf. on Recommender Systems (RecSys), pages 293-296, 2009) and Knees et al. (P. Knees, D. Schnitzer, and A. Flexer, "Improving neighborhood-based collaborative filtering by reducing hubness", In Proc. ICMR '14, pages 161-168, 2014) reported that the hub phenomenon appears in user-based and item-based CF because kNN is computed in a high-dimensional space. Since the numbers of users and items are usually large, the vectors used to compute similarities such as cosine similarity and Pearson correlation are high-dimensional, and the hub phenomenon occurs. Hub data, which appear frequently in the kNN lists of other data, influence many recommendation decisions. Yet hub data carry little meaning for most data: they appear frequently in kNN lists only because, in high dimensions, they are similar to the data centroid, and they do not help characterize individual data. Indeed, according to Knees et al., the performance of a recommendation system deteriorates in the presence of hub data. Moreover, because hub data strongly influence the system's choice of recommended items, anyone able to manipulate hub data from outside the system can attack it effectively.
In fact, the hub phenomenon makes recommendation systems vulnerable to attack. For example, fake users injected into the system by an Average attack have a large effect on the system by becoming hub data. Suppressing the hub phenomenon is therefore expected to help avert such attacks.

[Suppressing hubs by reducing the bias toward the data centroid]
If a small number of data points that are highly similar to the data centroid become hubs, then the hub phenomenon should be suppressible by transforming the similarity measure into one under which all data points are equally similar to the data centroid. Such a transformation of the similarity (measure) is obtained by computing the commute time kernel from the given similarity or, more simply, by centering the given similarity.
Let N be the number of data points and let K be the similarity matrix of size N. The commute time kernel K^CT for K is given by Equation (4).

K^CT = L^+ (the generalized inverse of L) ... (4)

Here, L = D − K is called the graph Laplacian, and D is the diagonal matrix with D_ii = Σ_j K_ij.
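Equation (4) can be illustrated with NumPy. The sketch below assumes that K is a symmetric similarity matrix and uses the Moore-Penrose pseudoinverse (`numpy.linalg.pinv`) as the generalized inverse; the source does not say which generalized inverse or numerical routine was used, and the function name `commute_time_kernel` is hypothetical.

```python
import numpy as np


def commute_time_kernel(K: np.ndarray) -> np.ndarray:
    """Return L^+, where L = D - K and D is diagonal with D_ii = sum_j K_ij."""
    D = np.diag(K.sum(axis=1))   # degree matrix from row sums of K
    L = D - K                    # graph Laplacian
    return np.linalg.pinv(L)     # Moore-Penrose pseudoinverse (assumed choice)
```

For a two-node graph with a single unit-weight edge, `commute_time_kernel(np.array([[0.0, 1.0], [1.0, 0.0]]))` gives entries of magnitude 0.25 with positive diagonal and negative off-diagonal. The pseudoinverse of a dense N × N matrix is costly for large N, so this form is meant only to make the definition concrete.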

Next, let I be the identity matrix and let $\mathbf{1}$ be the N-dimensional vector whose elements are all 1. The similarity matrix K^CENT obtained by centering K is computed as in Equation (5).

$$K^{\mathrm{CENT}} = \left(I - \tfrac{1}{N}\mathbf{1}\mathbf{1}^{\top}\right) K \left(I - \tfrac{1}{N}\mathbf{1}\mathbf{1}^{\top}\right) \qquad (5)$$
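Centering a similarity matrix can be sketched in a few lines of NumPy. The equation image for Equation (5) is not reproduced in this text, so the sketch assumes the standard double-centering form K^CENT = (I − (1/N)·1·1ᵀ) K (I − (1/N)·1·1ᵀ), which matches the symbols I and 1 defined in the text. The function name `center_similarity` is hypothetical.

```python
import numpy as np


def center_similarity(K: np.ndarray) -> np.ndarray:
    """Double-center K so that every row and column sums to zero."""
    N = K.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N  # centering matrix I - (1/N) 1 1^T
    return H @ K @ H
```

After centering, each data point's average similarity to all points is zero, which is the sense in which all points become equally similar to the data centroid.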

FIG. 5 shows a configuration example of the item recommendation system 1 in Embodiment 1.
In this embodiment, a user-based CF is described as the item recommendation system 1. That is, it is a system that uses, for example, the k-nearest neighbor method for similarity calculation and recommends items by referring to the past evaluation values of other users whose evaluations resemble those of the target user. When the items are products and the evaluation values are preference levels, it is a system that recommends products by referring to the past preference levels of other users whose preferences resemble those of the target user.
The configuration of FIG. 5 is applicable to both Embodiment 1 (user-based CF) and Embodiment 2 (item-based CF). For this reason, matters common to user-based CF and item-based CF are described in this embodiment, and Embodiment 2 describes only the differences.

The item recommendation system 1 comprises a personal computer (PC) 10 that processes data and commands, a display unit 18 that displays data, commands and the like processed or input/output by each unit, an input/output unit 19 for inputting and outputting data and commands, and a storage unit 20 that stores data, commands and the like processed or input/output by each unit.
The personal computer (PC) 10 includes: a registration unit 11 that registers users and items; an evaluation unit 12 that enters, in the evaluation matrix, evaluation values representing the degree of a user's evaluation of an item; a similarity calculation unit 13 that calculates the similarity between users and/or the similarity between items using a similarity measure; a neighborhood data extraction unit 14 that extracts, for example, the k target data (users and/or items) with the highest similarity; an evaluation value prediction unit 15 that, when no evaluation value has been entered in a cell of the evaluation matrix R, predicts the evaluation value expected to be entered in that cell based on the evaluation values extracted by the neighborhood data extraction unit 14; an item recommendation unit 16 that recommends items for which the evaluation value prediction unit 15 has predicted high evaluation values; and a control unit 17 that controls each unit of the item recommendation system 1 so that it functions as an item recommendation system.

Here, the registration unit 11 has a user registration unit 111 that registers users and an item registration unit 112 that registers items. In this embodiment, the similarity calculation unit 13 calculates the similarity between users and/or the similarity between items using, as the similarity measure, a similarity measure that suppresses hubs. Specifically, the similarity calculation unit 13 has a first similarity calculation unit 131 that, using the hub-suppressing similarity measure, calculates the similarity between users by focusing on the evaluation values of the user in each row of the evaluation matrix R, and a second similarity calculation unit 132 that calculates the similarity between items by focusing on the evaluation values of the item in each column. It also has a similarity measure conversion unit 135 that converts similarity based on a general similarity measure into similarity based on a hub-suppressing similarity measure. The neighborhood data extraction unit 14 has a first neighborhood data extraction unit 141 that extracts, for example, the k users with the highest similarity, and a second neighborhood data extraction unit 142 that extracts, for example, the k items with the highest similarity. The evaluation value prediction unit 15 has a first evaluation value prediction unit 151 that predicts the evaluation value of the target user based on the evaluation values extracted by the first neighborhood data extraction unit 141, and a second evaluation value prediction unit 152 that predicts the evaluation value of the target user based on the evaluation values extracted by the second neighborhood data extraction unit 142. When predicting the evaluation value to be entered in an empty cell for the target user, the first evaluation value prediction unit 151 and the second evaluation value prediction unit 152 can use, for example, an average value as that evaluation value. Using a weighted average is preferable because it improves recommendation accuracy.

The storage unit 20 has an evaluation matrix storage unit 21 that stores the evaluation matrix R, and a similarity measure storage unit 22 that stores the similarity measure together with the similarity data on users calculated by the first similarity calculation unit 131 and/or the similarity data on items calculated by the second similarity calculation unit 132. The data calculated by the first similarity calculation unit 131 and the second similarity calculation unit 132 are computed using a similarity measure under which hub data are unlikely to appear. The similarity measure storage unit 22 may store a general similarity measure. The storage unit 20 also has a neighborhood data storage unit 23 that stores the data (users and/or items) extracted by the neighborhood data extraction unit 14, and a recommended item storage unit 24 that stores the items estimated by the evaluation value prediction unit 15.
In addition to the similarity data, the similarity measure storage unit 22 stores a general similarity measure and/or a similarity measure that suppresses the appearance of hubs. When it stores a general similarity measure but no hub-suppressing similarity measure, the similarity measure conversion unit 135 converts the general similarity measure into a hub-suppressing similarity measure, and the resulting measure is stored in the similarity measure storage unit 22. Methods of converting the similarity include, for example, converting the formula of a general similarity measure into the formula of a hub-suppressing similarity measure and then computing the similarity, and using a matrix operation to convert similarity obtained with a general similarity measure into similarity based on a hub-suppressing similarity measure. In this case, the general similarity measure used before conversion may be retained rather than deleted. If both a general similarity measure and a hub-suppressing similarity measure are stored, the similarity calculation results and the evaluation value prediction results for item recommendation can be used in combination or compared.
The neighborhood data storage unit 23 has a first neighborhood data storage unit 231 that stores the users extracted by the first neighborhood data extraction unit 141, and a second neighborhood data storage unit 232 that stores the items extracted by the second neighborhood data extraction unit 142. The recommended item storage unit 24 stores the content to be displayed to the user when an item is recommended, for example the item name, a description of the item, instructions for using the item, and an image of the item. This content is displayed on the display unit 18 at the time of recommendation.

In this embodiment, the k users most similar to the target user are extracted, and the target user's evaluation value is predicted as the average of the evaluation values of those k users. It is therefore sufficient to be able to use the first similarity calculation unit 131 as the similarity calculation unit 13, the first neighborhood data extraction unit 141 as the neighborhood data extraction unit 14, the first evaluation value prediction unit 151 as the evaluation value prediction unit 15, and the first neighborhood data storage unit 231 (which stores neighboring users) as the neighborhood data storage unit 23. The second similarity calculation unit 132, the second neighborhood data extraction unit 142, the second evaluation value prediction unit 152, and the second neighborhood data storage unit 232 (which stores neighboring items) may be omitted. These are used in Embodiment 2.

FIG. 6 shows a processing flow example of the item recommendation method in Embodiment 1. FIG. 6(a) shows a processing flow example in Embodiment 1, and FIG. 6(b) shows a processing flow example in Embodiment 2, described later.
First, items i (S101: item registration step) and users u (S102: user registration step) are registered in the evaluation matrix R. Either the items i or the users u may be registered first, or the two may be registered in parallel. Next, evaluation values R(u, i), each representing the degree of user u's evaluation of item i, are registered in the evaluation matrix R (S103: evaluation value registration step). In this embodiment, the items are products, the evaluation values are preference levels, and the evaluation matrix R is a preference matrix. Evaluation uses, for example, a five-level scale (integers from 1 to 5). The user may register the values personally, or the system may register them by referring to the user's past behavior with respect to the item. The whole matrix R need not be filled in: empty cells are allowed, and usually most cells are empty. The evaluation matrix R is stored in the evaluation matrix storage unit 21 (S104: evaluation matrix storage step).

Next, the items to be recommended to each user are determined based on the evaluation matrix R. First, a similarity calculation is performed to find other users who resemble the user in question. For this calculation, it is assumed that the similarity measure storage unit 22 already stores, as the similarity measure, either a general similarity measure or a similarity measure that suppresses the appearance of hubs. It is first determined whether the similarity measure storage unit 22 holds a hub-suppressing similarity measure (S105: step of determining the presence or absence of a hub-suppressing similarity measure). If the similarity measure storage unit 22 stores a general similarity measure but no hub-suppressing similarity measure (No in S105), similarity based on the general similarity measure is converted into similarity based on a hub-suppressing similarity measure (S106: similarity measure conversion step).
As a hub-suppressing similarity measure, it is possible to use, for example, a similarity measure under which all data points are equally similar to the data centroid, that is, a similarity measure free of spatial centrality. Specifically, for example, centering or conversion to the commute time kernel is performed. The converted, hub-suppressing similarity measure is stored in the similarity measure storage unit 22. Next, the similarity between users is calculated using the hub-suppressing similarity measure (S107: first similarity calculation step). When the similarity is converted by a matrix operation, the similarity measure conversion step (S106) and the first similarity calculation step (S107) are performed together. In that case the similarity measure does not necessarily remain as a formula, but it remains implicit in the resulting similarity data, which are based on the hub-suppressing similarity measure. If the similarity measure storage unit 22 already stores a hub-suppressing similarity measure (Yes in S105), the similarity measure conversion step (S106) is skipped, and the similarity between users is calculated using the hub-suppressing similarity measure (S107: first similarity calculation step). The results calculated with the hub-suppressing similarity measure are stored in the similarity measure storage unit 22.

Next, for example, the k users with the highest similarity to the target user are extracted on the basis of the similarities stored in the similarity measure storage unit 22 (S108: first neighborhood data extraction step). Then the target user's blank evaluation values are predicted on the basis of, for example, the average of the evaluation values given by the k extracted users (S109: first evaluation value prediction step). In the first evaluation value prediction step (S109), the evaluation value to be entered in a blank cell of the target user can be predicted using, for example, an average; a weighted average is preferable. Finally, items with high predicted values are recommended (S110: item recommendation step). For example, when a user visits a website that offers the items, the items are presented to that user in descending order of predicted value. The recommendations may also be delivered to the user by e-mail.
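Steps S108 to S110 can be sketched as follows. This is a minimal illustration, not the prediction formula of equation (1), which is not reproduced in this passage: it assumes a plain similarity-weighted average over those of the k nearest users who have rated the item, and the names `top_k_users`, `predict_rating` and `recommend` are hypothetical.

```python
def top_k_users(S, user, k):
    """S108: the k users most similar to `user` (S maps user -> user -> similarity)."""
    others = [v for v in S[user] if v != user]
    return sorted(others, key=lambda v: S[user][v], reverse=True)[:k]


def predict_rating(R, S, user, item, k):
    """S109: similarity-weighted average of the neighbors' ratings for `item`.

    R maps user -> item -> rating. Neighbors who have not rated the item are
    skipped; None is returned when no neighbor has rated it (an assumption).
    """
    neighbors = [v for v in top_k_users(S, user, k) if item in R.get(v, {})]
    weight_sum = sum(abs(S[user][v]) for v in neighbors)
    if weight_sum == 0:
        return None
    return sum(S[user][v] * R[v][item] for v in neighbors) / weight_sum


def recommend(R, S, user, items, k, top_n):
    """S110: unrated items sorted by predicted rating, highest first."""
    predictions = {}
    for item in items:
        if item in R[user]:
            continue  # only blank cells are predicted
        value = predict_rating(R, S, user, item, k)
        if value is not None:
            predictions[item] = value
    return sorted(predictions, key=predictions.get, reverse=True)[:top_n]


# Toy data (illustrative values only).
R = {
    "a": {"x": 5, "y": 1},
    "b": {"x": 4, "y": 2, "z": 5},
    "c": {"x": 1, "y": 5, "z": 1},
}
S = {"a": {"a": 1.0, "b": 0.9, "c": 0.1}}
predicted_z = predict_rating(R, S, "a", "z", k=2)
top_items = recommend(R, S, "a", ["x", "y", "z"], k=2, top_n=1)
```

In this toy case user "b" dominates the prediction because of its higher similarity to "a", which is exactly why a hub user that is similar to everyone can steer many users' recommendations.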

[Experiment]
An experiment was conducted to confirm that suppressing the appearance of hub data makes user-based CF robust against the average attack. The experiment used the MovieLens 1M data set (ml-1m), a standard benchmark for recommendation tasks. The data set consists of 1,000,209 ratings (integers from 1 to 5) given by 6,040 users to 3,706 items, and every user has rated at least 20 items. As baseline similarity measures between users, the commonly used cosine similarity (Cos) and Pearson correlation (Pearson) were used, and ratings were predicted with equation (1). Because the shrunken Pearson correlation is known to be more accurate than the plain Pearson correlation, the shrunken Pearson correlation is used here and is hereafter referred to simply as the Pearson correlation (Pearson). Following earlier research reports, its parameter β was set to β = 100. To suppress the appearance of hub data, the baseline similarity was either converted to a commute-time kernel by equation (4) (CT) or centered by equation (5) (C_ENT). The main purpose of the experiment is to compare the robustness of the system against attacks before and after the conversion.
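The shrunken Pearson correlation with parameter β can be sketched as below. The exact definition used in the experiment is not given in this passage, so the code assumes the commonly used shrinkage form n / (n + β) × ρ, where n is the number of items rated by both users and ρ is their Pearson correlation over those items. The function names are hypothetical.

```python
import math


def pearson(ru, rv):
    """Pearson correlation over co-rated items; returns (rho, number of co-rated items).

    ru and rv map item -> rating. Returns rho = 0.0 when it is undefined.
    """
    common = sorted(set(ru) & set(rv))
    n = len(common)
    if n < 2:
        return 0.0, n
    mean_u = sum(ru[i] for i in common) / n
    mean_v = sum(rv[i] for i in common) / n
    cov = sum((ru[i] - mean_u) * (rv[i] - mean_v) for i in common)
    var_u = sum((ru[i] - mean_u) ** 2 for i in common)
    var_v = sum((rv[i] - mean_v) ** 2 for i in common)
    if var_u == 0 or var_v == 0:
        return 0.0, n
    return cov / math.sqrt(var_u * var_v), n


def shrunken_pearson(ru, rv, beta=100.0):
    """Assumed shrinkage form: correlations backed by few co-rated items are pulled toward 0."""
    rho, n = pearson(ru, rv)
    return n / (n + beta) * rho


# Two users whose ratings are perfectly correlated on only three items.
u = {1: 1, 2: 2, 3: 3}
v = {1: 2, 2: 4, 3: 6}
raw, overlap = pearson(u, v)
shrunk = shrunken_pearson(u, v, beta=100.0)
```

With β = 100, a perfect correlation supported by only three co-rated items is shrunk to about 0.03, so sparse overlaps contribute little to neighbor selection.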

[Prediction accuracy when there is no attack]
Before examining robustness against attacks, we verified that the CT and C_ENT conversions do not degrade rating prediction accuracy when there is no attack. To simulate a recommendation task, the 1,000,209 ratings in the data set were split into 939,809 training ratings (used to predict the test data) and 60,400 test ratings (the prediction targets). The similarity measures before and after conversion were compared using the mean absolute error (MAE), which is commonly used to evaluate CF algorithms. The MAE was calculated as

$$\mathrm{MAE} = \frac{1}{|T|} \sum_{(u,i) \in T} \left| \mathrm{Pred}(u,i) - R(u,i) \right|$$

where T is the set of user-item pairs given as test data (|T| = 60,400).
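The MAE defined above is straightforward to compute. The following sketch assumes predictions and true ratings are held in dictionaries keyed by (user, item) pairs; the function name and the toy values are illustrative only.

```python
def mean_absolute_error(pred, truth, test_pairs):
    """MAE over the test set T: the mean of |Pred(u, i) - R(u, i)|."""
    return sum(abs(pred[p] - truth[p]) for p in test_pairs) / len(test_pairs)


# Toy test set with two (user, item) pairs.
T = [("u1", "i1"), ("u2", "i3")]
pred = {("u1", "i1"): 4.0, ("u2", "i3"): 2.5}
truth = {("u1", "i1"): 5, ("u2", "i3"): 2}
mae = mean_absolute_error(pred, truth, T)
```

A lower MAE means the predicted ratings are closer to the held-out ratings, so a conversion that lowers the MAE improves prediction accuracy.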

FIG. 7 compares the mean absolute error (MAE) obtained when the baseline similarity measures, cosine similarity (Cos) and Pearson correlation (Pearson), are converted by the commute-time kernel conversion (CT) or the centering conversion (C_ENT), with the nearest-neighbor parameter k varied from 10 to 100. FIG. 7 shows that C_ENT reduces the MAE (improves prediction accuracy) in most cases and that CT reduces the MAE for the Pearson correlation. Thus the C_ENT conversion, and the CT conversion applied to the Pearson correlation, improve rather than degrade prediction accuracy when there is no attack.
In the following evaluation of robustness against the average attack, k is set to 50, which gave roughly the best MAE in the above experiment.

[Robustness against attacks]
Twenty-one movie items were selected as target items of the average attack. They were chosen to be as close as possible, in terms of number of raters and average rating, to the items used in the experiments of Lam and Riedl, who carried out the first study of the average attack (S. K. Lam and J. Riedl. Shilling Recommender Systems for Fun and Profit. In Proc. WWW '04, pages 393-402, 2004). For the average attack, 100 fake users were injected. Each fake user gave the target item a high rating (that is, 5) and gave every other item an average rating with added noise: for each remaining item, a random number was drawn from a normal distribution with mean μ equal to the average rating and standard deviation σ = 1.0, and was rounded to the nearest integer from 1 to 5. The similarity measures before and after conversion were compared using the prediction shift, that is, the difference between the predicted ratings before and after the attack. More precisely, the prediction shift was calculated for every pair of an honest user (any user other than the fake users) and a target item, excluding pairs contained in the training data, and the average was used for comparison.
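The construction of a fake user for the average attack, and the prediction shift used to measure its effect, can be sketched as follows. The sketch assumes that μ is the per-item average rating and that out-of-range values are clipped to the 1 to 5 scale after rounding; the function names and the toy item means are hypothetical.

```python
import random


def make_average_attack_profile(item_means, target_item, rng, sigma=1.0):
    """Build one fake user: rating 5 on the target, noisy average ratings elsewhere.

    item_means maps item -> average rating (assumed to be the per-item mean).
    rng is a random.Random instance so that the profile is reproducible.
    """
    profile = {target_item: 5}
    for item, mu in item_means.items():
        if item == target_item:
            continue
        noisy = rng.gauss(mu, sigma)  # draw from N(mu, sigma)
        profile[item] = min(5, max(1, int(round(noisy))))  # nearest integer in 1..5
    return profile


def mean_prediction_shift(pred_before, pred_after):
    """Average of (prediction after attack - prediction before attack) over the same pairs."""
    return sum(pred_after[p] - pred_before[p] for p in pred_before) / len(pred_before)


rng = random.Random(0)
item_means = {"target": 3.2, "a": 3.9, "b": 2.1, "c": 4.6}
fake_user = make_average_attack_profile(item_means, "target", rng)

before = {("u1", "target"): 3.0, ("u2", "target"): 4.0}
after = {("u1", "target"): 3.5, ("u2", "target"): 4.5}
shift = mean_prediction_shift(before, after)
```

Because the fake profile tracks the average ratings, it resembles many honest users at once; a positive mean prediction shift indicates that the attack succeeded in raising predictions for the target item.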

Table 1 shows the prediction shift caused by the average attack together with the skewness of the N_k distribution. The more data with large N_k, that is, hub data, are present, the larger the skewness of the N_k distribution. The skewness is given by

$$S_{N_k} = \frac{E\left[(N_k - \mu_{N_k})^3\right]}{\sigma_{N_k}^3}$$

where E[·] is the expectation operator and μ_{N_k} and σ_{N_k} are the mean and standard deviation of the N_k distribution, respectively. Both the skewness of the N_k distribution and the prediction shift decreased after the C_ENT or CT conversion. This indicates that using the converted similarity measures suppresses the appearance of hub data and, as a result, makes the recommendation system robust against the attack.
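The N_k values and the skewness of their distribution can be computed as in the sketch below. It assumes that N_k of an object is the number of other objects whose k-nearest-neighbor list contains it, and that the population (biased) moments are used for the skewness; the function names and the toy matrix are illustrative.

```python
def n_k_counts(S, k):
    """For each object, count how often it appears in other objects' k-nearest-neighbor lists.

    S is a square similarity matrix (list of lists); larger values mean more similar.
    """
    n = len(S)
    counts = [0] * n
    for u in range(n):
        others = sorted((v for v in range(n) if v != u),
                        key=lambda v: S[u][v], reverse=True)
        for v in others[:k]:
            counts[v] += 1
    return counts


def skewness(values):
    """Skewness E[(x - mu)^3] / sigma^3 using population moments."""
    n = len(values)
    mu = sum(values) / n
    var = sum((x - mu) ** 2 for x in values) / n
    if var == 0:
        return 0.0
    third = sum((x - mu) ** 3 for x in values) / n
    return third / var ** 1.5


# Toy matrix in which object 0 is the nearest neighbor of every other object (a hub).
S = [
    [1.0, 0.5, 0.4, 0.3],
    [0.9, 1.0, 0.2, 0.1],
    [0.9, 0.1, 1.0, 0.2],
    [0.9, 0.2, 0.1, 1.0],
]
counts = n_k_counts(S, k=1)
skew = skewness(counts)
```

A strongly positive skewness signals a long right tail in the N_k distribution, that is, a few objects that appear in very many neighbor lists.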

FIGS. 8 and 9 show that, when the ordinary Pearson correlation is used as the similarity measure between users, the injected fake users become hubs, and that converting the similarity measure makes it harder for fake users to become hubs. FIG. 8 is a scatter plot of all users, drawn to examine the relationship between each user's N_50 value and the user's similarity to the data center. The horizontal axis shows N_50 and the vertical axis shows similarity to the data center. FIG. 8 shows that the injected fake users are highly similar to the data center and have therefore become hubs. FIGS. 9(a), 9(b) and 9(c) are histograms of users by N_50 value, obtained with the original Pearson correlation, the Pearson correlation after centering, and the Pearson correlation after commute-time kernel conversion, respectively.
Next, we analyze, in terms of the hub phenomenon, why C_ENT or CT provides robustness.

The scatter plot in FIG. 8 shows that there are hub users with N_50 values of up to 961 and that N_50 is strongly correlated with similarity to the data center. Honest users are plotted as circles (○) and injected fake users as crosses (×). Because the fake users were built to imitate an average honest user, they are highly similar to the data center of the users. Consequently, as the N_50 distribution in FIG. 9(a) shows, the fake users are hub users (influencers) whose N_50 values (minimum 465, maximum 961) are far larger than those of most honest users. The C_ENT and CT conversions, by contrast, suppress the hub phenomenon. As FIG. 9(b) shows, with C_ENT the N_50 values of the fake users fell to a minimum of 101 and a maximum of 156, and as FIG. 9(c) shows, with CT they fell to a minimum of 0 and a maximum of 4. This clearly shows that, compared with the original Pearson correlation, the similarity measures obtained by C_ENT or CT conversion make the fake users appear far less often among the k nearest neighbors (kNN) of other users. In other words, the fake users no longer have much influence on which items are recommended; they are no longer influencers.

From the above, it was found that converting the similarity measure so as to suppress the hub phenomenon yields a recommendation system that is robust against attacks and whose prediction accuracy is equal to or better than that obtained with the original similarity measure.

[Conclusion]
We proposed a method that makes collaborative filtering (CF) robust, by suppressing the hub phenomenon, against attacks that inject fake users from outside in order to change which items are recommended.
Our approach rests on the premise that the hub phenomenon is one of the main factors exploited by shilling attacks. We applied two conversions that suppress the appearance of hub data (centering and conversion to a commute-time kernel) to commonly used similarity measures (cosine similarity and Pearson correlation). Using the MovieLens data set, we showed that these conversions make the recommendation system robust against shilling attacks without degrading recommendation accuracy.

As described above, this example provides an item recommendation system and an item recommendation method that are little affected even when attacked by injected fake users.

Example 1 (user-based CF) predicted evaluation values from the similarity between users; this example predicts them from the similarity between items. That is, in Example 2 (item-based CF), the k items most similar to the target item are extracted, and the target user's evaluation value for the target item is predicted as the average of the evaluation values that the target user gave to those k items in the past.

Compared with Example 1, the similarity calculation unit 13 uses a second similarity calculation unit 132, which calculates the similarity between items, in place of the first similarity calculation unit 131, which calculates the similarity between users. As the similarity measure between items, for example, a measure under which all items are equally similar to the data center of the items is used. The neighborhood data extraction unit 14 uses, in place of the first neighborhood data extraction unit 141, a second neighborhood data extraction unit 142 that extracts the k items most similar to the target item using the item similarities calculated by the second similarity calculation unit 132. The evaluation value prediction unit 15 uses, in place of the first evaluation value prediction unit 151, a second evaluation value prediction unit 152 that predicts the evaluation value to be entered in the target user's blank cell for the target item, using the evaluation values of the k items extracted by the second neighborhood data extraction unit 142. The rest of the configuration is the same as in Example 1.

Compared with Example 1, the similarity calculation step performs a second similarity calculation step (S207), which calculates the similarity between items, in place of the first similarity calculation step (S107), which calculates the similarity between users. As the similarity measure between items, for example, a measure under which all items are equally similar to the data center of the items is used. The neighborhood data extraction step performs, in place of the first neighborhood data extraction step (S108), a second neighborhood data extraction step (S208) that extracts the k items most similar to the target item using the item similarities calculated in the second similarity calculation step (S207). The evaluation value prediction step performs, in place of the first evaluation value prediction step (S109), a second evaluation value prediction step (S209) that predicts the evaluation value to be entered in the target user's blank cell for the target item, using the evaluation values of the k items extracted in the second neighborhood data extraction step (S208). The rest of the processing flow is the same as in Example 1.
In this way, the k items most similar to the target item are extracted, and the user's evaluation value is predicted as the (weighted) average of the evaluation values of those k items.
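The item-based prediction of steps S207 to S209 can be sketched as follows. This is a minimal illustration under stated assumptions: the item-item similarities are taken as already computed (and, in the invention, already converted to a hub-suppressing measure), the prediction is a plain similarity-weighted average, and the function name `predict_item_based` is hypothetical.

```python
def predict_item_based(R, item_sim, user, target_item, k):
    """Predict `user`'s rating of `target_item` from the k most similar items.

    R maps user -> item -> rating; item_sim maps item -> item -> similarity.
    Neighbor items the user has not rated are skipped; None is returned when
    the user has rated none of them (an assumption for this sketch).
    """
    sims = item_sim[target_item]
    neighbors = sorted((j for j in sims if j != target_item),
                       key=lambda j: sims[j], reverse=True)[:k]
    rated = [j for j in neighbors if j in R[user]]
    weight_sum = sum(abs(sims[j]) for j in rated)
    if weight_sum == 0:
        return None
    return sum(sims[j] * R[user][j] for j in rated) / weight_sum


# Toy data (illustrative values only).
R = {"u": {"a": 4, "b": 2}, "w": {}}
item_sim = {"t": {"t": 1.0, "a": 0.75, "b": 0.25, "c": 0.1}}
prediction = predict_item_based(R, item_sim, "u", "t", k=2)
```

The structure mirrors the user-based case with the roles of users and items exchanged: neighbors are now items, and the ratings being averaged are the target user's own past ratings.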

Like Example 1, this example provides an item recommendation system and an item recommendation method that are little affected even when attacked by injected fake users.

The above examples dealt with attacks that inject fake users, but hub data tend to arise naturally in high-dimensional or large-scale data sets even when there is no attack. In that case too, converting the similarity measure so as to suppress the appearance of hubs suppresses the appearance of data that act as influencers. This makes it possible to recommend the items that should properly be recommended (see the research report by Nice et al. cited in [Relationship between Attack Shilling Attack and Hub Phenomenon] above).
This example provides an item recommendation system and an item recommendation method whose recommendations are not biased by influencers even when there is no attack by injected fake users.

The above examples recommended items matching each user's preferences. Consider instead advertising to a large number of people, that is, to the general public, and suppose that it is best to advertise for the average user. In this case, instead of calculating the similarity between every pair of users, a virtual user having average evaluation values is generated, the similarity between this virtual user and the other users is calculated, k users are extracted, and evaluation values are predicted. Items can then suitably be recommended to the general public.
This example also provides an item recommendation system and an item recommendation method that are little affected even when attacked by injected fake users.
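Generating the virtual user with average evaluation values can be sketched as below. The sketch assumes that "average" means the per-item mean over the users who rated that item; the function name `make_average_user` is hypothetical. The resulting profile can then be treated like any other row of the evaluation matrix when similarities are calculated.

```python
def make_average_user(R):
    """Return a virtual user whose rating for each item is that item's mean rating.

    R maps user -> item -> rating. Items nobody has rated are simply absent.
    """
    sums, counts = {}, {}
    for ratings in R.values():
        for item, value in ratings.items():
            sums[item] = sums.get(item, 0.0) + value
            counts[item] = counts.get(item, 0) + 1
    return {item: sums[item] / counts[item] for item in sums}


# Toy evaluation matrix (illustrative values only).
R = {"u1": {"a": 4, "b": 2}, "u2": {"a": 2}}
average_user = make_average_user(R)
```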

The present invention can also be realized as a program that causes a computer to execute the item recommendation method described in the flowcharts and elsewhere in the above examples. The program may be stored in the storage unit of the item recommendation system, stored in an external storage device, or downloaded from the Internet for use. The invention can also be realized as a recording medium on which the program is recorded.

Embodiments of the present invention have been described above, but the invention is not limited to these examples, and it is clear that various modifications can be made without departing from the spirit of the invention.

For example, the present invention can be applied to items and evaluation values not listed in this specification, provided they can be evaluated quantitatively. The similarity measure may be made adjustable through parameters. In item recommendation, an image or a description can be attached to the item name. Recommendations may be presented when a user accesses a web page, made accessible from an e-mail sent to the user, or delivered directly by e-mail. The dimensions of the evaluation matrix and the parameter k of the k-nearest-neighbor method can be chosen as appropriate for the purpose and circumstances.

The present invention is used in recommendation systems based on collaborative filtering, typified by user-based and item-based methods.

DESCRIPTION OF SYMBOLS
1 Item recommendation system
10 Personal computer (PC)
11 Registration unit
12 Evaluation unit
13 Similarity calculation unit
14 Neighborhood data extraction unit
15 Evaluation value prediction unit
16 Item recommendation unit
17 Control unit
18 Display unit
19 Input/output unit
20 Storage unit
21 Evaluation matrix storage unit
22 Hub-suppressing similarity measure storage unit
23 Neighborhood data storage unit
24 Recommended item storage unit
111 User registration unit
112 Item registration unit
131 First similarity calculation unit
132 Second similarity calculation unit
135 Similarity measure conversion unit
141 First neighborhood data extraction unit
142 Second neighborhood data extraction unit
151 First evaluation value prediction unit
152 Second evaluation value prediction unit
231 Neighborhood user storage unit
232 Neighborhood item storage unit
i Item
R Evaluation matrix
R(u, i) Evaluation value
u User

Claims

An item recommendation system comprising:
an evaluation matrix storage unit that stores an evaluation matrix in which users' evaluation values for items are entered;
a first similarity calculation unit that calculates the similarity between users using a similarity measure that suppresses the appearance of hubs;
a first neighborhood data extraction unit that extracts the k users most similar to a target user using the similarity calculated by the first similarity calculation unit;
a first evaluation value prediction unit that predicts the evaluation value to be entered in a blank cell of the target user, using the evaluation values for items given by the k users extracted by the first neighborhood data extraction unit; and
an item recommendation unit that extracts an item to be recommended to the target user from items for which the first evaluation value prediction unit predicted a high evaluation value, and recommends the item to the target user.

An item recommendation system comprising:
an evaluation matrix storage unit that stores an evaluation matrix in which users' evaluation values for items are entered;
a second similarity calculation unit that calculates the similarity between items using a similarity measure that suppresses the appearance of hubs;
a second neighborhood data extraction unit that extracts the k items most similar to a target item using the similarity calculated by the second similarity calculation unit;
a second evaluation value prediction unit that predicts the evaluation value to be entered in a blank cell of a target user, using the user's evaluation values for the k items extracted by the second neighborhood data extraction unit; and
an item recommendation unit that extracts an item to be recommended to the target user from items for which the second evaluation value prediction unit predicted a high evaluation value, and recommends the item to the target user.

The item recommendation system according to claim 1 or 2, further comprising a similarity measure storage unit that stores the similarity measure that suppresses the appearance of hubs.

The item recommendation system according to any one of claims 1 to 3, further comprising a similarity measure conversion unit that converts a similarity based on a general similarity measure into a similarity based on the similarity measure that suppresses the appearance of hubs.

The item recommendation system according to any one of claims 1 to 4, wherein a weighted average is used as the evaluation value when predicting the evaluation value to be entered in a blank cell of the target user.

An item recommendation method comprising:
an evaluation matrix storage step of storing an evaluation matrix in which users' evaluation values for items are entered;
a first similarity calculation step of calculating the similarity between users using a similarity measure that suppresses the appearance of hubs;
a first neighborhood data extraction step of extracting the k users most similar to a target user using the similarity calculated in the first similarity calculation step;
a first evaluation value prediction step of predicting the evaluation value to be entered in a blank cell of the target user, using the evaluation values for items given by the k users extracted in the first neighborhood data extraction step; and
an item recommendation step of extracting an item to be recommended to the target user from items for which a high evaluation value was predicted in the first evaluation value prediction step, and recommending the item to the target user.

An item recommendation method comprising:
an evaluation matrix storage step of storing an evaluation matrix in which users' evaluation values for items are entered;
a second similarity calculation step of calculating the similarity between items using a similarity measure that suppresses the appearance of hubs;
a second neighborhood data extraction step of extracting the k items most similar to a target item using the similarity calculated in the second similarity calculation step;
a second evaluation value prediction step of predicting the evaluation value to be entered in a blank cell of a target user, using the user's evaluation values for the k items extracted in the second neighborhood data extraction step; and
an item recommendation step of extracting an item to be recommended to the target user from items for which a high evaluation value was predicted in the second evaluation value prediction step, and recommending the item to the target user.

The item recommendation method according to claim 6 or 7, further comprising a similarity measure conversion step of converting a similarity based on a general similarity measure into a similarity based on the similarity measure that suppresses the appearance of hubs.

A program for causing a computer to execute the item recommendation method according to any one of claims 6 to 8.

A computer-readable recording medium on which the program according to claim 9 is recorded.