JP2017182724A

JP2017182724A - Item recommendation program, item recommendation method, and item recommendation apparatus

Info

Publication number: JP2017182724A
Application number: JP2016073265A
Authority: JP
Inventors: 康紀深堀; Yasunori Fukabori; 博巳楠本; Hiromi Kusumoto; 則生岸田; Norio Kishida
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2016-03-31
Filing date: 2016-03-31
Publication date: 2017-10-05
Anticipated expiration: 2036-03-31
Also published as: JP6668892B2

Abstract

PROBLEM TO BE SOLVED: To reduce the calculation load of item recommendation.
SOLUTION: An item recommendation program causes a computer to execute generation processing, clustering processing, and extraction processing. The generation processing takes an item characteristic matrix, which holds the characteristics of each item together with group information that organizes those characteristics, and reduces it using at least one piece of the group information. For each user, it multiplies the reduced item characteristic matrix by the history information of the items selected by the user to determine a vector indicating the user's preference for each item characteristic, and appends the user's profile to the vector to generate a user preference characteristic vector. The clustering processing clusters the users on the basis of their user preference characteristic vectors. The extraction processing aggregates the history information of the users belonging to each cluster and extracts frequently selected items as recommended items.
SELECTED DRAWING: Figure 1

Description

Embodiments of the present invention relate to an item recommendation program, an item recommendation method, and an item recommendation device.

Conventionally, there are item recommendation devices that search a large number of items, such as programs, for items suited to a user and recommend them. Approaches to searching for recommended items include user-based approaches, item-based approaches, and hybrid approaches that combine the two.

A user-based item search extracts users with similar interests from users' behavior histories and recommends items that the extracted users have accessed. An item-based item search recommends items similar to those the user has accessed in the past or to a selected item. A hybrid item search recommends items using a similarity that combines user profiles, item contents, and user behavior histories. With the user-based approach, it can be difficult to recommend items to new users who have little behavior history. With the item-based approach, information from users similar to the target user is not reflected in the recommendation. The hybrid approach, by contrast, can recommend items to new users with little behavior history and can reflect information from similar users.

JP 2002-369090 A
JP 2000-13708 A

In hybrid item recommendation, however, the multidimensional matrix (higher-order tensor, multidimensional array) built from each user's profile, the item contents, and the users' behavior histories holds an enormous amount of data, so the calculation load of item recommendation is large. When the calculation load is this large, the calculation time becomes long and it is difficult to finish the processing within a practical time.

In one aspect, the object is to provide an item recommendation program, an item recommendation method, and an item recommendation device that can reduce the calculation load of item recommendation.

In a first aspect, an item recommendation program causes a computer to execute a creating process, a clustering process, and an extracting process. The creating process, for each user, takes an item characteristic matrix that holds the characteristics of each item together with group information that groups those characteristics, reduces it using at least one piece of the group information, multiplies the reduced item characteristic matrix by the history information of the items selected by the user to obtain a vector indicating the user's preference for each item characteristic, and appends the user's profile to that vector to create a user preference characteristic vector. The clustering process clusters the users based on the user preference characteristic vectors created for each user. The extracting process, for each cluster, aggregates the history information of the users belonging to the cluster and extracts frequently selected items as items to recommend.

According to one embodiment of the present invention, the calculation load of item recommendation can be reduced.

FIG. 1 is a block diagram illustrating the functional configuration of the item recommendation device according to the embodiment.
FIG. 2 is a flowchart illustrating an operation example of the item recommendation device according to the embodiment.
FIG. 3 is an explanatory diagram illustrating the creation of a user preference characteristic vector.
FIG. 4 is an explanatory diagram illustrating the user behavior analysis table.
FIG. 5-1 is a graph illustrating the relationship between the number of clusters and SD(K).
FIG. 5-2 is a graph illustrating the relationship between the number of clusters and SSW(K).
FIG. 6 is a flowchart illustrating an example of the clustering process.
FIG. 7 is an explanatory diagram illustrating the creation of the basic recommendation list table.
FIG. 8 is an explanatory diagram illustrating the creation of the user recommendation list table.
FIG. 9 is a block diagram illustrating an example of the hardware configuration of the item recommendation device according to the embodiment.

An item recommendation program, an item recommendation method, and an item recommendation device according to the embodiments are described below with reference to the drawings. In the embodiments, components having the same function are given the same reference numerals, and duplicate descriptions are omitted. The item recommendation program, item recommendation method, and item recommendation device described in the following embodiments are merely examples and do not limit the embodiments. The following embodiments may also be combined as appropriate to the extent that they do not contradict one another.

FIG. 1 is a block diagram illustrating the functional configuration of the item recommendation device according to the embodiment. As shown in FIG. 1, the item recommendation device 1 includes an analysis unit 10, a clustering unit 20, and an output unit 30. It searches the large number of items (programs and the like) in the item DB 11 for items suited to a user and outputs a user recommendation list table 17.

In this embodiment, video programs (content) provided by video on demand (VOD) are used as an example of the items recommended to users. The type of item recommended is not limited to video programs and may be, for example, products sold on a mail-order site.

The analysis unit 10 receives three inputs: an item keyword table 12 for each video program, in which the contents (characteristics) of the items (video programs) have been extracted from the item DB 11; viewing history data 13 for each user, indicating the history of programs the user has selected (viewed) in the past; and a user profile 14 for each user, loaded from the user profile DB 18.

The contents (characteristics) recorded for each video program in the item keyword table 12 include the genre of the video program, its performers, keywords contained in its description, and the regions that appear in it. The characteristics of a video program are not limited to these types and may be anything that indicates a feature of the video program.

The user profile 14 includes the user's gender, age, location, and the like. The user profile is not limited to these types and may be anything that indicates a characteristic (feature) of the user.

The analysis unit 10 performs a hybrid analysis of three similarities: the similarity of user preferences, using the user profile 14; the similarity of video program contents, using the item keyword table 12; and the similarity of viewing histories, using the viewing history data 13. The analysis unit 10 outputs the result of this analysis as the user behavior analysis table 15.

The clustering unit 20 creates clusters of similar users based on the user behavior analysis table 15. For each cluster, the clustering unit 20 then aggregates the viewing histories of the users belonging to the cluster and extracts frequently selected video programs as the video programs to recommend. The clustering unit 20 sorts the video programs extracted for each cluster in descending order of selection frequency and outputs them as the basic recommendation list table 16.
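The per-cluster aggregation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the dictionary-based inputs, the `top_n` cutoff, and the choice to count each item at most once per user are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def basic_recommendation_lists(cluster_of_user, viewed_by_user, top_n=10):
    """Rank items per cluster by how many member users selected them.

    cluster_of_user: dict mapping user id -> cluster id (hypothetical input)
    viewed_by_user:  dict mapping user id -> iterable of item ids
    """
    counts = defaultdict(Counter)
    for user, cluster in cluster_of_user.items():
        # Assumption: an item counts once per user, however often it was viewed.
        counts[cluster].update(set(viewed_by_user.get(user, ())))

    lists = {}
    for cluster, counter in counts.items():
        # Descending frequency; ties broken by item id for a stable order.
        ranked = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        lists[cluster] = [item for item, _ in ranked[:top_n]]
    return lists
```

The returned dictionary plays the role of the basic recommendation list table 16: one ranked item list per cluster.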

Based on the basic recommendation list table 16, the output unit 30 outputs the user recommendation list table 17, which lists the video programs extracted from the cluster to which the user belongs. Specifically, the output unit 30 refers to the user's viewing history data 13 and removes the video programs the user has already viewed (selected) from the list of video programs extracted from the user's cluster. The output unit 30 then outputs the list of video programs the user has not yet viewed (not yet selected) as the user recommendation list table 17. In this way, the item recommendation device 1 can recommend video programs that the user has not viewed.
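As a hedged sketch of this filtering step, the following function removes already-viewed items from the ranked list of the user's cluster while preserving the ranking. The function and parameter names are hypothetical and the data structures are assumed for illustration only.

```python
def user_recommendation_list(user, cluster_of_user, basic_lists, viewed_by_user):
    """Return the cluster's ranked items minus those the user already viewed.

    basic_lists: dict mapping cluster id -> ranked list of item ids
                 (standing in for the basic recommendation list table 16)
    """
    already_viewed = set(viewed_by_user.get(user, ()))
    cluster = cluster_of_user[user]
    # Keep the cluster ranking order; drop viewed items only.
    return [item for item in basic_lists[cluster] if item not in already_viewed]
```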

In hybrid item recommendation, the calculation load grows as the number of elements in the multidimensional matrix increases. For example, consider a video-on-demand service with 10 million users that recommends programs from a catalog of about 50,000. The two-dimensional matrix (array) holding the viewing history of every user then has (item, user) → 5·10^4 × 1·10^7 = 5·10^11 elements, a huge number (the "^" symbol denotes exponentiation, here and below). Adding user profile information such as gender (2 levels, male and female) and age (20 levels, stratified in 5-year bands) gives a multidimensional matrix (higher-order tensor, multidimensional array) whose element count reaches the order of 10 tera, as follows.
(item, user, gender) → 5·10^11 × 2 = 1·10^12
(item, user, gender, age) → 1·10^12 × 20 = 2·10^13
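The element counts above can be reproduced with a few lines of arithmetic. The sketch below uses the figures from the example in the text; the variable and function names are introduced here purely for illustration.

```python
# Figures from the example in the text.
n_items = 5 * 10**4   # about 50,000 programs
n_users = 1 * 10**7   # 10 million users
n_gender = 2          # male / female
n_age = 20            # 5-year age bands


def tensor_elements(*dims):
    """Number of elements of a dense array with the given axis sizes."""
    total = 1
    for d in dims:
        total *= d
    return total


history_2d = tensor_elements(n_items, n_users)
with_gender = tensor_elements(n_items, n_users, n_gender)
with_age = tensor_elements(n_items, n_users, n_gender, n_age)
```

Each added profile axis multiplies the element count by its number of levels, which is why a dense tensor grows so quickly.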

Adding the item contents as well pushes the data volume of the multidimensional matrix beyond the peta order. To compute the similarity between users from such a multidimensional matrix, tensor singular value decomposition, which extends the singular value decomposition of a matrix to tensors, is used. Because the computational cost of this singular value decomposition is proportional to the square of the number of elements, when the data volume is of peta order (10^12) the cost of the decomposition is (10^12)^4 = 10^48, which is a heavy calculation load.

The analysis unit 10 of the item recommendation device 1 therefore multiplies the item keyword table 12 (item characteristic matrix), which holds the characteristics of each video program (item), by the user's viewing history data 13 to obtain a vector indicating the user's preference for each characteristic of the video programs, and appends the user profile 14 to create a user preference characteristic vector. In other words, the analysis unit 10 contracts the multidimensional matrix (tensor) formed by the item keyword table 12, the viewing history data 13, and the user profile 14.

The analysis unit 10 obtains a user preference characteristic vector for each user and outputs the vectors as the user behavior analysis table 15. The clustering unit 20 refers to the user behavior analysis table 15 and clusters the users based on the user preference characteristic vectors, which hold each user's profile and preferences. By contracting the multidimensional matrix (tensor) formed by the item keyword table 12, the viewing history data 13, and the user profile 14 in this way, the item recommendation device 1 reduces the amount of data involved in the calculation and thus the calculation load.

The operation of the item recommendation device 1 is now described in detail. The symbols used in the following description are defined as follows.
I: number of items (video programs)
J: number of users
K: number of item characteristic groups / number of user clusters
L[k]: number of entries in the k-th item characteristic group
M: number of user profile entries
N: total number of entries over all characteristics of the items

FIG. 2 is a flowchart illustrating an operation example of the item recommendation device 1 according to the embodiment. As shown in FIG. 2, when processing starts, the analysis unit 10 accepts the item keyword table 12, the viewing history data 13, and the user profile 14 as input. Next, based on the viewing history data 13, the analysis unit 10 creates, for all users, a viewing time ratio vector whose elements are the viewing time ratios of the individual video programs (S1).

Specifically, the analysis unit 10 creates a viewing time ratio matrix R for all users from each user's viewing history data 13. In R, the column index identifies the user and the row index identifies the item (video program). It is written as follows.
Viewing time ratio matrix: R[i, j] = vtime[i, j] / ptime[i]
User index: j = 1, 2, ..., J
Item index: i = 1, 2, ..., I

Here, vtime[i, j] is the length of time for which user (j) viewed item (i), and ptime[i] is the running time of item (i). The viewing history data 13 used to calculate the viewing time ratio matrix R[i, j] covers a fixed period (for example, one year) going back from the present. This period can be set arbitrarily, for example to one month, three months, or half a year. The analysis unit 10 also recalculates the viewing time ratio matrix R[i, j] at fixed intervals (for example, every three months). This recalculation interval can likewise be set arbitrarily, for example to one month or two weeks.
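The definition R[i, j] = vtime[i, j] / ptime[i] can be sketched as follows, with nested lists standing in for the matrices. The function name and the list-of-lists layout (rows are items, columns are users, 0-based) are assumptions for the example; selecting the history window is left out.

```python
def viewing_time_ratio(vtime, ptime):
    """Build R with R[i][j] = vtime[i][j] / ptime[i].

    vtime[i][j]: time user j spent viewing item i
    ptime[i]:    running time of item i (must be positive)
    """
    if len(vtime) != len(ptime):
        raise ValueError("vtime needs one row per item in ptime")
    return [[v / ptime[i] for v in row] for i, row in enumerate(vtime)]
```

For instance, a user who watched 30 minutes of a 60-minute program gets a ratio of 0.5 for that item.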

In this embodiment, the viewing time of video programs is calculated from the item history information (the viewing history of video programs). When the items are products, however, the number of units of item (i) purchased by user (j) may be used as the element value.

Next, based on the item keyword table 12, the analysis unit 10 creates an item characteristic matrix X[i, g[k, l]] that expresses the contents (characteristics) of all items as a matrix (S2). An element of X[i, g[k, l]] is 1 if item (i) includes the item content entry G[k, l] and 0 otherwise. The matrix is defined as follows.
Item characteristic matrix: X[i, g[k, l]] = 0 or 1
Item characteristic entry matrix: G[k, l]
Item characteristic serial number index: g[k, l]
Item characteristic group index: k = 1, 2, ..., K
In-group index: l = 1, 2, ..., L[k]
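A minimal sketch of building such a 0/1 matrix is shown below. It assumes that each item's characteristics are available as a set of labels and that `feature_list` already holds all entries G[k, l] in serial-number order; both the names and this data layout are hypothetical.

```python
def item_characteristic_matrix(item_features, feature_list):
    """Return X with X[i][g] = 1 if item i has feature_list[g], else 0.

    item_features: list with one set of feature labels per item
    feature_list:  all content entries, ordered by serial number (0-based here)
    """
    return [
        [1 if feature in features else 0 for feature in feature_list]
        for features in item_features
    ]
```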

L[k] differs from group to group. A group is a characteristic used to classify items (video programs), such as genre (k = 1), performer (k = 2), director (k = 3), keywords in the description (k = 4), featured region (k = 5), release year (k = 6), and so on. In this embodiment, for ease of explanation, the groups are simplified into large groups containing a very large number of flags, such as performer and director. Each group may, however, be subdivided finely enough for user characteristics to emerge when clustering is performed.

Item content entries are the specific entries that characterize an item: for genre, G[1, ] = [SF, sports, historical drama, documentary, music, ...]; for performers, G[2, ] = [○○○○, ××××, ...]; and in general, for group (k), G[k, ] = [AAAA, BBBB, ...]. Keywords are the nouns, adjectives, verbs, and the like contained in the item's description.

Item keywords are collected by carrying out the following steps (1) to (4).
(1) Collect proper nouns, common nouns, verbs, and adjectives by text mining from the text describing the item content (the EPG (electronic program guide) for television programs, or the content introduction text for on-demand video programs).
(2) Search for review text about the item on SNS (social networking services) and apply text mining to collect new keywords that were not collected in (1).
(3) Add keywords that the item provider has judged worth adding and has registered.
(4) Add keywords obtained by text mining the comments that viewers have made about the item.

The item characteristic serial number index indicates the position of an item content entry when all groups are viewed as one sequence. For example, if there are 100 genre entries, 1000 performer entries, and 100 director entries, then g[1, 100] = 100, g[2, 1] = 100 + 1 = 101, g[2, 2] = 100 + 2 = 102, g[3, 1] = 100 + 1000 + 1 = 1101, and so on.
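The serial number index is the sum of the sizes of all preceding groups plus the in-group index. A short sketch, using 1-based k and l as in the text (the function name and the `group_sizes` list are assumptions for the example):

```python
def serial_index(k, l, group_sizes):
    """Return g[k, l] (1-based), where group_sizes[k - 1] equals L[k]."""
    if not 1 <= k <= len(group_sizes):
        raise ValueError("group index k out of range")
    if not 1 <= l <= group_sizes[k - 1]:
        raise ValueError("in-group index l out of range")
    # Entries of groups 1 .. k-1 come first, then the l-th entry of group k.
    return sum(group_sizes[: k - 1]) + l
```

With `group_sizes = [100, 1000, 100]` this reproduces the values in the example above.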

When creating the item characteristic matrix X[i, g[k, l]] from the item keyword table 12 by the method described above, the analysis unit 10 may limit the number of content entries. Specifically, the analysis unit 10 can reduce the number of entries by creating the union of the keywords representing the characteristics of all items and using it as the set of common content keywords.

Because the number of common content keywords can become enormous, the analysis unit 10 defines selection rules in advance to keep the number of elements from exploding. Examples of such rules are discarding old reviews, discarding keywords with a low frequency of appearance when the union is created, and discarding keywords related to items that have been viewed only a few times. The number of characteristic entries may also need to be limited so that clustering of the user preference characteristic vectors finishes within a practical calculation time.
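One of the example rules, discarding low-frequency keywords when forming the union, might look like the sketch below. The thresholds `min_items` and `max_keywords` are hypothetical parameters; the source does not specify concrete values or how frequency is measured, so counting the number of items containing a keyword is an assumption.

```python
from collections import Counter


def common_keywords(item_keywords, min_items=2, max_keywords=None):
    """Union of item keywords, keeping only sufficiently frequent ones.

    item_keywords: list with one iterable of keywords per item
    min_items:     keep a keyword only if at least this many items contain it
    max_keywords:  optional cap on the size of the resulting list
    """
    counts = Counter()
    for keywords in item_keywords:
        counts.update(set(keywords))  # count each keyword once per item

    kept = [(kw, c) for kw, c in counts.items() if c >= min_items]
    kept.sort(key=lambda kc: (-kc[1], kc[0]))  # most frequent first
    if max_keywords is not None:
        kept = kept[:max_keywords]
    return [kw for kw, _ in kept]
```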

Next, the analysis unit 10 multiplies the item characteristic matrix X[i, g[k, l]] created in S2 by the viewing time ratio matrix R[i, j] to create the preference characteristic matrix V0[j, g[k, l]], each row of which is a vector indicating one user's preferences (S3).

Specifically, the analysis unit 10 creates the preference characteristic matrix V0[j, g[k, l]] using the following equation (1).
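Equation (1) itself does not appear in this text. Judging from the surrounding description (the item characteristic matrix is multiplied by the viewing time ratios and summed per column), a plausible reading is V0[j, g] = Σ_i R[i, j] × X[i, g], summed over all items i. The sketch below implements that assumed form; it should not be taken as the exact equation of the publication.

```python
def preference_matrix(R, X):
    """Assumed form of equation (1): V0[j][g] = sum over i of R[i][j] * X[i][g].

    R: viewing time ratio matrix, rows = items, columns = users
    X: item characteristic matrix, rows = items, columns = characteristics
    """
    n_items, n_users, n_feats = len(R), len(R[0]), len(X[0])
    V0 = [[0.0] * n_feats for _ in range(n_users)]
    for i in range(n_items):
        for j in range(n_users):
            ratio = R[i][j]
            if ratio == 0:
                continue  # item i not viewed by user j; contributes nothing
            for g in range(n_feats):
                V0[j][g] += ratio * X[i][g]
    return V0
```

Under this reading, the full tensor over (item, user, characteristic) is never materialized; only a J × (L1 + ... + LK) matrix is kept.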

Next, for each user, the analysis unit 10 appends the user characteristics (user profile 14) to extend the vector into a user preference characteristic vector (S4). Specifically, the analysis unit 10 creates a preference characteristic matrix V[j, n] by appending a user profile characteristic matrix, based on the user profile 14, to the preference characteristic matrix V0[j, g[k, l]]. The user profile characteristic matrix identifies each user by the index j and each profile entry by the index m, and is defined as follows.
User profile characteristic matrix: P[j, m]
User index: j = 1, 2, ..., J
Profile index: m = 1, 2, ..., M

The user profile values held in the elements of the user profile characteristic matrix may be non-numeric, such as gender or prefecture name, so they are converted to numbers to allow distance calculation during clustering. For example, gender is encoded as male = 1 and female = 0. Prefecture names are encoded with serial numbers running from Hokkaido to Okinawa, such as Hokkaido = 1, Aomori = 2, and so on. Elements whose numeric range is large, such as age and annual income, are divided into suitable bands and numbered by band. For example, age is encoded in 10-year bands and annual income in bands of 1 million yen. These encoding rules can be set arbitrarily.
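The encoding rules in this paragraph can be sketched as follows. The prefecture list is deliberately truncated, and the function name, argument names, and the use of integer division for banding are assumptions for the example; the text notes that the actual rules can be set arbitrarily.

```python
# Truncated for illustration; a full list would continue through Okinawa.
PREFECTURES = ["Hokkaido", "Aomori"]


def encode_profile(gender, prefecture, age, income_yen):
    """Encode one user's profile as a list of numbers.

    gender:     "male" -> 1, anything else -> 0 (as in the example in the text)
    prefecture: serial number starting at 1 (Hokkaido = 1, Aomori = 2, ...)
    age:        10-year band, via integer division (assumed banding rule)
    income_yen: band of 1 million yen, via integer division (assumed)
    """
    return [
        1 if gender == "male" else 0,
        PREFECTURES.index(prefecture) + 1,
        age // 10,
        income_yen // 1_000_000,
    ]
```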

The preference characteristic matrix V[j, n] is created, for example, by placing the user profile elements (user profile characteristic matrix P[j, m]) in front of the item content elements (preference characteristic matrix V0[j, g[k, l]]). That is, each row of the preference characteristic matrix V[j, n] is as follows.
(P[j, 1], P[j, 2], ..., P[j, M],
V0[j, g[1, 1]], V0[j, g[1, 2]], ..., V0[j, g[1, L1]], V0[j, g[2, 1]], V0[j, g[2, 2]], ..., V0[j, g[2, L2]], ...,
V0[j, g[K, 1]], V0[j, g[K, 2]], ..., V0[j, g[K, LK]])

Here, the number of user profile characteristics is M, the number of item groups of the item content characteristics is K, and the numbers of entries in the groups are L1, L2, ..., LK. The matrix whose rows are these per-user preference characteristic vectors is the preference characteristic matrix V[j, n] (j = 1, ..., J; n = 1, ..., N = M + L1 + ... + LK). The analysis unit 10 outputs the created preference characteristic matrix V[j, n] as the user behavior analysis table 15.
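The row layout described above, profile elements first and item content elements after them, amounts to a row-wise concatenation. A minimal sketch with hypothetical names, using nested lists with one row per user:

```python
def extend_with_profile(P, V0):
    """Return V whose row j is P[j] followed by V0[j].

    P:  user profile characteristic matrix, J rows of M numbers
    V0: preference characteristic matrix, J rows of L1 + ... + LK numbers
    """
    if len(P) != len(V0):
        raise ValueError("P and V0 must have one row per user")
    return [list(profile) + list(prefs) for profile, prefs in zip(P, V0)]
```

Each resulting row has N = M + L1 + ... + LK elements, matching the definition in the text.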

FIG. 3 is an explanatory diagram illustrating the creation of the user preference characteristic vector. The data table T10 shown in FIG. 3 is the item characteristic matrix of the items (Item1, Item2, ..., ItemI) multiplied by the viewing time ratio vector of one user. As shown in FIG. 3, the analysis unit 10 multiplies the item characteristic matrix by the user's viewing time ratio vector and totals each column to obtain the vector indicating the user's preferences (bottom row).

FIG. 4 is an explanatory diagram illustrating the user behavior analysis table 15. As shown in FIG. 4, the user behavior analysis table 15 consists of the vector indicating each user's preferences (the bottom row of FIG. 3), obtained for each user, with the user profile (characteristics 1 to M) appended. The user profile values are assumed to have been converted to numbers in practice.

In the user preference characteristic vector to which the user characteristics were appended in S4, the user profile elements and the item preference elements differ in numerical magnitude and are therefore not on a common scale. If clustering is performed on such vectors, the clusters may be determined by the characteristics with large numerical values. The analysis unit 10 therefore normalizes the user preference characteristic vectors in S4 before outputting them as the user behavior analysis table 15.

Specifically, denoting the normalized user preference characteristic matrix by NV[j, n] (j = 1, ..., J; n = 1, ..., N = M + L1 + ... + LK), the analysis unit 10 normalizes using the following equation (2).
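Equation (2) does not appear in this text, so the exact normalization is not known here. As an illustration only, the sketch below applies per-column min-max scaling, one common way of putting columns with different magnitudes on a common 0 to 1 scale. The choice of min-max scaling and the function name are assumptions, not the method of the publication.

```python
def min_max_normalize(V):
    """Scale each column of V to the range [0, 1] (assumed normalization).

    A column whose values are all equal is mapped to 0.0 to avoid
    division by zero.
    """
    n_cols = len(V[0])
    lows = [min(row[n] for row in V) for n in range(n_cols)]
    highs = [max(row[n] for row in V) for n in range(n_cols)]
    return [
        [
            (row[n] - lows[n]) / (highs[n] - lows[n]) if highs[n] > lows[n] else 0.0
            for n in range(n_cols)
        ]
        for row in V
    ]
```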

When creating the list of items to recommend, it is sometimes desirable to adjust manually which characteristic groups are emphasized when the users are clustered. To allow such adjustment, a weight coefficient vector may be introduced for each group of characteristics.

Specifically, the following weight coefficient vectors are introduced for the user profile and the item contents. For example, if wc[k = 4] = 0, clustering is performed without taking the fourth characteristic group into account.
User profile weight coefficient: wu[m], m = 1, ..., M (0 ≦ wu[m] ≦ 1)
Item content group weight coefficient: wc[k], k = 1, ..., K (0 ≦ wc[k] ≦ 1)

Next, the clustering unit 20 refers to the user behavior analysis table 15 created by the analysis unit 10 in the processing up to S4, and performs clustering that divides the users into user groups (clusters) (S5).

Specifically, in clustering users, the clustering unit 20 clusters the row vectors of the user preference characteristic matrix NV[j, n] in the user behavior analysis table 15. To do so, it first calculates the user preference characteristic matrix WV[j, n] multiplied by the weight coefficients described above. The element values of each row of the weighted user preference characteristic matrix WV[j, n] are calculated as follows.
User profile part:
WV[j, n] = wu[n] × NV[j, n], n = 1, …, M

Item content part:
Group 1: WV[j, n] = wc[1] × NV[j, n], n = M + 1, …, M + L1
Group 2: WV[j, n] = wc[2] × NV[j, n], n = M + L1 + 1, …, M + L1 + L2
:
Group K: WV[j, n] = wc[K] × NV[j, n], n = N − LK + 1, …, N

That is, the elements of the row vector are as follows.
(wu[1]NV[j, 1], wu[2]NV[j, 2], …, wu[M]NV[j, M],
wc[1]NV[j, M + 1], wc[1]NV[j, M + 2], …, wc[1]NV[j, M + L1],
wc[2]NV[j, M + L1 + 1], wc[2]NV[j, M + L1 + 2], …, wc[2]NV[j, M + L1 + L2],
:
wc[K]NV[j, N − LK + 1], wc[K]NV[j, N − LK + 2], …, wc[K]NV[j, N])
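The weighting above can be sketched as follows. The function name `apply_weights` and the list-of-rows representation are illustrative assumptions; the logic scales the M profile columns by wu and every column of group k by wc[k], as the text describes.

```python
def apply_weights(nv, wu, wc, group_sizes):
    """Return WV, where column n of NV is scaled by its profile or group weight.

    nv          : J x N normalized matrix (list of rows), N = M + L1 + ... + LK
    wu          : M user-profile weights, each in [0, 1]
    wc          : K item-content group weights, each in [0, 1]
    group_sizes : [L1, ..., LK], number of characteristics in each group
    """
    # Build one weight per column: profile weights first, then each group
    # weight repeated once per characteristic in that group.
    col_weights = list(wu)
    for w, size in zip(wc, group_sizes):
        col_weights.extend([w] * size)
    if any(len(row) != len(col_weights) for row in nv):
        raise ValueError("row length must equal M + L1 + ... + LK")
    return [[w * x for w, x in zip(col_weights, row)] for row in nv]


# M = 1 profile element, K = 2 groups with L1 = 2 and L2 = 1.
wv = apply_weights([[1.0, 0.5, 0.5, 1.0]], wu=[1.0], wc=[0.5, 0.0], group_sizes=[2, 1])
print(wv)  # the second group contributes nothing because its weight is 0
```

Setting a group weight to 0 zeroes out that group's columns, so the group no longer affects any distance between users.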

The clustering unit 20 performs clustering using the weighted user preference characteristic matrix WV[j, n]. Many clustering methods exist. In this embodiment, the number of users (J) is one million or more and may reach the order of ten million, so hierarchical clustering methods whose computational cost is O(J^2) or more are not practical. Therefore, a non-hierarchical partitioning-optimization clustering method, which can cluster with a cost of O(number of clusters × number of users (J)) or less, is adopted.

The clustering calculation uses the distance between multidimensional vectors as the measure of similarity, so the distance calculation is defined first. Writing wv_j for the row vector formed by the elements of the j-th row of the weighted user preference characteristic matrix WV[j, n], it can be expressed as the following equation (3).

Writing d_ji(wv_j, wv_i) for the distance between the row vectors wv_j and wv_i, the squared Euclidean distance and the Manhattan distance are defined as in the following equation (4). Which of the two distances to adopt may be decided by the user after examining the quality of the clustering results.
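Equation (4) is not reproduced in this text. The standard definitions of the two distances named above, written in the notation of this section (N is the dimension of the row vectors), would be as follows; the superscript labels are added here only to tell the two apart.

```latex
d_{ji}^{\mathrm{Euclid}}(wv_j, wv_i) = \sum_{n=1}^{N} \bigl( WV[j,n] - WV[i,n] \bigr)^{2},
\qquad
d_{ji}^{\mathrm{Manhattan}}(wv_j, wv_i) = \sum_{n=1}^{N} \bigl| WV[j,n] - WV[i,n] \bigr|
```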

As the partitioning-optimization clustering method, the k-means method is adopted, for example. The k-means method minimizes the objective function of the following equation (5).

Here, wv_j ∈ c_k means that the sum is taken over the users belonging to the k-th cluster. The number of clusters is K, and c_k is the center coordinate vector of the k-th cluster.
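Equation (5) is not reproduced in this text. Assuming the usual k-means objective, a form consistent with the description above (a sum over clusters of the distances between each member and its cluster center) would be the following, where the symbol E for the objective is an assumption:

```latex
E = \sum_{k=1}^{K} \; \sum_{wv_j \in c_k} d_{jk}(wv_j, c_k)
```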

The actual cluster division proceeds through the following steps (1) to (4).
(1) Randomly generate K center coordinates c_k.
(2) Except in the first iteration, calculate the center coordinates by the following equation (6).

(3) Assign each user vector wv_j to the cluster k that gives min_k d_jk(wv_j, c_k).
(4) If the cluster membership has not changed from the previous iteration, end the calculation. If it has changed, return to (2).
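Steps (1) to (4) can be sketched in plain Python as below. This is a minimal illustration, not the embodiment's implementation: it assumes the squared Euclidean distance, computes each center as the mean of its members (equation (6) is not reproduced in this text, so the mean is an assumption based on standard k-means), and draws the initial centers from the data points rather than from arbitrary random coordinates.

```python
import random


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, seed=0, max_iter=100):
    """Cluster row vectors following steps (1) to (4); returns (labels, centers)."""
    rng = random.Random(seed)
    # (1) Initial centers. Assumption: sampled from the data so that every
    # center starts near at least one user vector.
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(max_iter):
        # (3) Assign every vector to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_euclidean(v, centers[c]))
            for v in vectors
        ]
        # (4) Stop when the membership did not change.
        if new_labels == labels:
            break
        labels = new_labels
        # (2) Recompute each center as the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a cluster became empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


data = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
labels, centers = kmeans(data, k=2)
print(labels)
```

Each iteration touches every user once per cluster, which matches the O(number of clusters × number of users) cost per pass mentioned earlier.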

Because the k-means method chooses the initial center points at random, it does not guarantee that the global minimum is reached. In practice, therefore, clustering is run with several sets of initial center points, and the partition with the smallest objective function value is adopted.

If the computation time is a problem in practice, about 10% of the users are sampled at random, the k-means method is applied to the sample, and all users are then clustered with the center points c_k held fixed. In this case, clustering all users requires only a single cluster assignment pass, which shortens the computation time.
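The sampling shortcut can be sketched as follows. The helper names `sample_users` and `assign_to_fixed_centers` are hypothetical, and the centers in the example are made-up values standing in for the result of running k-means on the sample.

```python
import random


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sample_users(vectors, fraction=0.1, seed=0):
    """Randomly pick about `fraction` of the user vectors (at least one)."""
    size = max(1, int(len(vectors) * fraction))
    return random.Random(seed).sample(vectors, size)


def assign_to_fixed_centers(vectors, centers):
    """One assignment pass: label each vector with its nearest fixed center."""
    return [
        min(range(len(centers)), key=lambda c: squared_euclidean(v, centers[c]))
        for v in vectors
    ]


all_users = [[0.0, 0.1], [0.2, 0.0], [4.9, 5.0], [5.0, 5.2], [0.1, 0.1]]
# Hypothetical centers, standing in for the result of k-means on the sample.
fixed_centers = [[0.0, 0.0], [5.0, 5.0]]
sample = sample_users(all_users * 4)  # 20 vectors, about 10% are kept
assignment = assign_to_fixed_centers(all_users, fixed_centers)
print(len(sample), assignment)
```

Only the sample goes through the iterative loop; the full user set needs a single nearest-center pass.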

The description so far implicitly assumes that the number of clusters (K) is given in advance by the person who creates the item list. In practice, however, it can be difficult to specify the number of clusters beforehand. Where it can be set from domain knowledge, as in video on demand, the number of clusters may be set directly. For example, genres of movies and video programs have well over one hundred combinations of major and middle categories, so a value of that order is set. In fields where the number of clusters cannot be determined from such knowledge, it is determined using one of several methods for estimating the optimal number of clusters.

Many methods for estimating the number of clusters exist, for example "Tsunenori Ishioka (2006): A proposal for improving the x-means method: successive iteration of the k-means method and re-merging of clusters, Keisanki Tokeigaku (Computational Statistics), 18(1), pp. 3-13". Here, the integrated classification likelihood (ICL) criterion method and the elbow-point method are used, but other methods may be used without restriction. For the ICL criterion method, "Biernacki, C., et al. (1998): Assessing a Mixture Model for Clustering with the Integrated Classification Likelihood, Rapport de Recherche 3521, INRIA." is known. For the elbow-point method, "Salvador, S. and Chan, P. (2004): Determining the Number of Clusters/Segments in Hierarchical Clustering/Segmentation Algorithms, Proc. of 16th IEEE Int. Conf. on Tools with Artificial Intelligence, pp. 576-584" is known.

In the ICL criterion method, the number of clusters that maximizes the ICL value is adopted as the optimal number of clusters. The ICL value is calculated with the following approximate expression (7). Here, K is the number of clusters, J is the number of users, J_K is the number of users belonging to cluster K, and N is the dimension of the user preference characteristic vector.

The elbow-point method, also called the knee-point method or the L method, treats the total intra-cluster distance SSW(K) as a function of the number of clusters and takes as the optimal number of clusters the change point at which SSW(K) passes from a region of rapid decrease to a region where it hardly decreases. Because SSW(K) is not a continuous function of the number of clusters, the second derivative is obtained by a difference approximation, and the number of clusters at which this value is most negative is taken as the optimal number of clusters. That is, the K that minimizes the following equation (8) is found. Because only the relative magnitude of SD matters, the divisor 2, a common factor of the difference approximation, is omitted.
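Equation (8) is not reproduced in this text, so its exact form and sign convention are unknown. The sketch below is only an illustration under a stated assumption: SD(K) is taken to be the negated central second difference, 2·SSW(K) − SSW(K−1) − SSW(K+1), so that the sharpest elbow gives the most negative value and the optimal K is the minimizer, as the text describes. The function names and the SSW values are made up.

```python
def second_difference(ssw):
    """ssw maps cluster count K -> SSW(K). Returns {K: SD(K)} for interior K.

    Assumption: SD(K) = 2*SSW(K) - SSW(K-1) - SSW(K+1), i.e. the negated
    central second difference, so the sharpest elbow is the most negative.
    """
    return {
        k: 2 * ssw[k] - ssw[k - 1] - ssw[k + 1]
        for k in ssw
        if k - 1 in ssw and k + 1 in ssw
    }


def elbow_point(ssw):
    """Return the cluster count K with the smallest SD(K)."""
    sd = second_difference(ssw)
    return min(sd, key=sd.get)


# Made-up SSW values that drop quickly up to K = 4 and flatten afterwards.
ssw = {2: 100.0, 3: 60.0, 4: 20.0, 5: 18.0, 6: 17.0, 7: 16.5}
print(elbow_point(ssw))
```

The end points of the scanned range have no value of SD because the difference needs both neighbors, so the range of K should extend at least one step beyond any plausible elbow.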

FIG. 5A is a graph showing the relationship between the number of clusters and SD(K). In the example of FIG. 5A, K = 6, where SD(K) is smallest, is taken as the optimal number of clusters. However, the user may check the relationship between the number of clusters and SSW(K) and judge whether this value is appropriate as the optimal number of clusters.

FIG. 5B is a graph showing the relationship between the number of clusters and SSW(K). As shown in FIG. 5B, a graph of this relationship may be displayed on the monitor 103 (see FIG. 9) or the like for the user to check visually. If the user judges from the graph that the estimated value is not correct, the user inputs the optimal change point from the input device 102 (see FIG. 9).

To suppress the so-called "curse of dimensionality" that arises when clustering high-dimensional data, the clustering unit 20 may use a dimension reduction method in clustering.

For example, the clustering unit 20 uses one of the following three methods i to iii as the dimension reduction method.
i. The first method gathers the characteristics contained in the user preference characteristic vector into predetermined groups based on, for example, the type of characteristic, and obtains a reduced user preference characteristic vector by removing groups designated in advance as not useful for clustering.
ii. The second method gathers the characteristics contained in the user preference characteristic vector into predetermined groups based on, for example, the type of characteristic, reduces them, and clusters users on the reduced vectors; then, within each resulting cluster, it clusters the users on the basis of the user preference characteristic vector before reduction (details are given later). The first pass uses all group characteristics but cannot reflect fine differences in item content characteristics, which is why the clustering is executed in two stages.
iii. The third method creates the vector sumX[g[k, l]] by summing each column of the item characteristic matrix X[i, g[k, l]] (each element is the number of occurrences of the corresponding characteristic item) and excludes the items whose occurrence count is relatively small within each group.

The second method (ii) is now described in detail. In the second method (ii), the clustering unit 20 gathers the characteristics contained in the user preference characteristic vector into predetermined groups based on, for example, the type of characteristic, and reduces them. Next, the clustering unit 20 clusters users based on the reduced user preference characteristic vector (first clustering). Then, within each cluster produced by the first clustering, the clustering unit 20 clusters users based on the user preference characteristic vector before reduction (second clustering). By clustering in multiple stages in this way, the item recommendation device 1 can suppress the so-called "curse of dimensionality" that arises when clustering high-dimensional data.

FIG. 6 is a flowchart illustrating an example of the clustering process. As shown in FIG. 6, when the process starts, the clustering unit 20 gathers the elements of the user preference characteristic vector wv_j by characteristic group (S10).

The method of calculating the vector elements gathered by characteristic group for the user preference characteristic vector wv_j is as follows. The element values of wv_j are summed within each characteristic group, converting the vector into the dimension-reduced vector rv_j shown in the following equation (9).

For the user profile part (NV[j, 1], NV[j, 2], …, NV[j, M]), each group has exactly one characteristic, so no sum needs to be taken. Because the number of characteristic items differs from group to group, each group's element value is divided by the corresponding count (L1, L2, …, LK) for normalization.
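Equation (9) is not reproduced in this text, but the reduction it describes (profile elements kept as they are, each item-content group replaced by its sum divided by the group size) can be sketched as below. The function name `reduce_by_group` and the argument layout are illustrative assumptions.

```python
def reduce_by_group(wv_row, m, group_sizes):
    """Collapse one user vector: keep the M profile elements as they are and
    replace each item-content group by the mean of its elements."""
    if len(wv_row) != m + sum(group_sizes):
        raise ValueError("row length must equal M + L1 + ... + LK")
    reduced = list(wv_row[:m])
    start = m
    for size in group_sizes:
        group = wv_row[start:start + size]
        reduced.append(sum(group) / size)  # sum over the group, divided by Lk
        start += size
    return reduced


# M = 2 profile elements, groups of size L1 = 3 and L2 = 2.
rv = reduce_by_group([0.5, 1.0, 0.0, 0.25, 0.5, 1.0, 0.0], m=2, group_sizes=[3, 2])
print(rv)
```

The reduced vector has M + K elements instead of M + L1 + … + LK, which is what makes the first clustering stage cheaper.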

However, simply using this dimension-reduced vector rv_j for clustering as it is can cause problems. For example, in the director group each item has exactly one director, so the vector element gathered for that group is not useful for clustering. In the performer group, clustering may end up being driven by whether an item has many performers. The same applies when the items are products rather than video programs. Suppose, for example, that the raw materials of processed foods form a group and each raw material is a characteristic item. Taking the sum within the group then expresses how many raw materials a processed food uses and no longer reflects which raw materials the purchaser likes. In this way, dimension reduction can discard information about user preferences.

Therefore, in this embodiment, clustering is performed after introducing the users' evaluation weights for each group. These evaluation weights reflect which groups' characteristics users tend to emphasize when selecting items (video programs). Specifically, the clustering unit 20 sets a weight for each characteristic group based on user settings from the input device 102 (see FIG. 9) (S11). With these weights, users can be classified into groups with similar selection criteria even when the dimension is reduced.

Based on the user settings, the clustering unit 20 introduces the evaluation weights wg[k], k = 1, …, K (0 ≦ wg[k] ≦ 1) for each group. The clustering unit 20 further takes a mixing ratio δ (0 ≦ δ ≦ 1), creates the vector shown in the following equation (10), and performs clustering (first stage) (S12). The mixing ratio δ is set by the user in S11.

Next, for the users in each cluster obtained in the first-stage clustering, the clustering unit 20 performs clustering (second stage) using all the characteristic items before reduction (the user preference characteristic vector wv_j) (S13).

Returning to FIG. 2, for each cluster, the clustering unit 20 aggregates the viewing histories of the users belonging to the cluster and extracts the video programs with a high selection frequency as video programs to recommend. The clustering unit 20 arranges the video programs extracted for each cluster in descending order of selection frequency to create the basic recommendation list table 16 (S6).

FIG. 7 is an explanatory diagram of the creation of the basic recommendation list table 16. As shown in FIG. 7, the clustering unit 20 generates, through clustering, a data table T11 indicating for each user the cluster to which the user belongs. Next, based on the data table T11, the clustering unit 20 aggregates, for each cluster, the viewing histories of the users belonging to the cluster and extracts the video programs with a high selection frequency. In the illustrated example, for cluster "1", the viewing histories of "user5, user7" are aggregated, the frequently selected video programs "11, 25, 98, 4, 62" are extracted, and recommendation ranks are assigned in descending order of selection frequency. In this way, the clustering unit 20 obtains the ranked recommendation lists for all clusters and outputs them as the basic recommendation list table 16.
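The per-cluster aggregation can be sketched with a frequency counter. The function name, the dictionary layout, and the sample data are illustrative assumptions (the sample is not the data of FIG. 7); ties are broken by item id here only to keep the output reproducible.

```python
from collections import Counter, defaultdict


def build_basic_recommendations(user_cluster, viewing_history, top_n=5):
    """Return {cluster: [item, ...]} ordered by descending selection frequency.

    user_cluster    : {user: cluster id}, the role played by data table T11
    viewing_history : {user: iterable of viewed item ids}
    """
    counts = defaultdict(Counter)
    for user, cluster in user_cluster.items():
        counts[cluster].update(viewing_history.get(user, ()))
    result = {}
    for cluster, counter in counts.items():
        # Highest count first; equal counts are ordered by item id.
        ranked = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        result[cluster] = [item for item, _ in ranked[:top_n]]
    return result


clusters = {"user5": 1, "user7": 1, "user2": 2}
history = {"user5": [11, 25, 98], "user7": [11, 25, 4], "user2": [7]}
basic = build_basic_recommendations(clusters, history)
print(basic)
```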

Next, the output unit 30 outputs the user recommendation list table 17 for each user based on the basic recommendation list table 16 and the viewing history data 13 (S7). FIG. 8 is an explanatory diagram of the creation of the user recommendation list table 17.

As shown in FIG. 8, the output unit 30 refers to the user's viewing history data 13 and excludes, from the entries of the basic recommendation list table 16 extracted for the cluster to which the user belongs, the video programs that the user has already viewed (selected). The output unit 30 then outputs the list of video programs that the user has not yet viewed (not selected) as the user recommendation list table 17.

For example, in FIG. 8, user "1" belongs to cluster "4" and has already viewed programs "6, 12, 25, 62, 99". For cluster "4", "6, 88, 62, 13, 78" has been compiled as the recommendation list. Therefore, for user "1", the unviewed programs "88, 13, 78" from the compiled recommendation list are output. In this way, the item recommendation device 1 can recommend video programs that the user has not yet viewed.
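The filtering step in this example can be reproduced in a few lines. The function name `user_recommendations` is a hypothetical helper; the item ids are those of the FIG. 8 example above.

```python
def user_recommendations(cluster_list, viewed):
    """Drop already-viewed items while preserving the recommendation order."""
    seen = set(viewed)
    return [item for item in cluster_list if item not in seen]


# Values from the FIG. 8 example: user "1" belongs to cluster "4".
unviewed = user_recommendations([6, 88, 62, 13, 78], [6, 12, 25, 62, 99])
print(unviewed)  # [88, 13, 78]
```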

The output unit 30 may also calculate and output evaluation values for the items recommended to each user in the user recommendation list table 17. Specifically, the output unit 30 calculates precision, recall, and F-measure as the evaluation values of the recommended items. Each is defined as in the following equation (11).

For example, the evaluation values are calculated using the top N items of each user's user recommendation list table 17, both over all users and over the users of each cluster. The value of N in "top N" may be set freely by the user, with a default of, for example, 10.
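Equation (11) is not reproduced in this text. The sketch below assumes the standard definitions: precision is the share of the top-N recommended items that the user actually selected, recall is the share of the user's selected items that appear in the top N, and F-measure is their harmonic mean. The function name and the sample item ids are made up.

```python
def evaluate_top_n(recommended, selected, n=10):
    """Return (precision, recall, f_measure) for the top-n recommended items.

    recommended : ranked list of recommended item ids
    selected    : item ids the user actually selected in the evaluation period
    """
    top = recommended[:n]
    relevant = set(selected)
    hits = sum(1 for item in top if item in relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


scores = evaluate_top_n([88, 13, 78, 40], selected=[13, 40, 99, 5], n=4)
print(scores)
```

Averaging these per-user values over all users, or over the users of one cluster, gives the two aggregation levels mentioned above.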

The viewing history data 13 used to calculate the evaluation values differs before and after the user recommendation list table 17 is created. Before the table is created, the viewing history data 13 may be divided into suitable evaluation periods for the evaluation. The split ratio is set freely by the user. After the table is created, new viewing history data 13 becomes available, so the evaluation is performed once viewing history data 13 for a certain period has accumulated. The length of this accumulation period is set freely by the user.

By checking the output evaluation values, the user can update the various settings from the input device 102 (see FIG. 9) and try to improve the processing. For example, if the evaluation values are low and the user judges the recommendations to be ineffective, the user can try to improve them by adjusting the group characteristic weights or adding new characteristic items through the input device 102 (see FIG. 9).

All or any part of the various processing functions performed by the item recommendation device 1 may be executed on a CPU (or a microcomputer such as an MPU or an MCU (Micro Controller Unit)). Needless to say, all or any part of the various processing functions may also be executed on a program analyzed and executed by a CPU (or a microcomputer such as an MPU or MCU), or on hardware using wired logic. The various processing functions performed by the item recommendation device 1 may also be executed by a plurality of computers cooperating through cloud computing.

The various processes described in the above embodiment can be realized by executing a program prepared in advance on a computer. An example of a computer (hardware) that executes a program having the same functions as the above embodiment is described below. FIG. 9 is a block diagram illustrating an example of the hardware configuration of the item recommendation device 1 according to the embodiment.

As shown in FIG. 9, the item recommendation device 1 includes a CPU 101 that executes various arithmetic processes, an input device 102 that accepts data input, a monitor 103, and a speaker 104. The item recommendation device 1 also includes a medium reading device 105 that reads programs and the like from a storage medium, an interface device 106 for connecting to various devices, and a communication device 107 for wired or wireless communication with external devices. The item recommendation device 1 further includes a RAM 108 that temporarily stores various information and a hard disk device 109. The units (101 to 109) in the item recommendation device 1 are connected to a bus 110.

The hard disk device 109 stores a program 111 for executing the various processes of the analysis unit 10, the clustering unit 20, and the output unit 30 described in the above embodiment. The hard disk device 109 also stores various data 112 referred to by the program 111 (the item keyword table 12, the viewing history data 13, and so on). The input device 102 accepts, for example, operation information input by the operator of the item recommendation device 1. The monitor 103 displays, for example, the various screens that the operator works with. The interface device 106 is connected to, for example, a printing device. The communication device 107 is connected to a communication network such as a LAN (Local Area Network) and exchanges various information with external devices over the network.

The CPU 101 performs the various processes by reading the program 111 stored in the hard disk device 109, loading it into the RAM 108, and executing it. The program 111 need not be stored in the hard disk device 109. For example, the item recommendation device 1 may read and execute the program 111 stored in a storage medium that it can read. Such storage media include, for example, portable recording media such as a CD-ROM, a DVD disc, or a USB (Universal Serial Bus) memory, semiconductor memory such as flash memory, and hard disk drives. Alternatively, the program 111 may be stored in a device connected to a public line, the Internet, a LAN, or the like, and the item recommendation device 1 may read the program 111 from that device and execute it.

1 ... Item recommendation device
10 ... Analysis unit
11 ... Item DB
12 ... Item keyword table
13 ... Viewing history data
14, 18 ... User profile
15 ... User behavior analysis table
16 ... Basic recommendation list table
17 ... User recommendation list table
20 ... Clustering unit
30 ... Output unit
101 ... CPU
102 ... Input device
103 ... Monitor
104 ... Speaker
105 ... Medium reading device
106 ... Interface device
107 ... Communication device
108 ... RAM
109 ... Hard disk device
110 ... Bus
111 ... Program
112 ... Various data
T10, T11 ... Data table

Claims

1. An item recommendation program that causes a computer to execute a process comprising:
for each user, obtaining a vector indicating the user's preference for each characteristic of items by multiplying history information on the items selected by the user by an item characteristic matrix that has been reduced using at least one piece of group information, from among an item characteristic matrix holding, for each item, the characteristics of the item and group information grouping those characteristics, and creating a user preference characteristic vector by appending the user's profile to the vector;
clustering users based on the user preference characteristic vectors created for the users; and
for each cluster, aggregating the history information of the users belonging to the cluster and extracting items with a high selection frequency as items to recommend.

2. The item recommendation program according to claim 1, wherein the clustering process performs first clustering, which reduces the characteristics included in the user preference characteristic vector by gathering them into predetermined groups and clusters users based on the reduced user preference characteristic vector, and second clustering, which clusters users within each cluster produced by the first clustering based on the user preference characteristic vector before reduction.

3. The item recommendation program according to claim 1, wherein the clustering process performs first clustering, which reduces the characteristics included in the user preference characteristic vector by excluding a predetermined group and clusters users based on the reduced user preference characteristic vector, and second clustering, which clusters users within each cluster produced by the first clustering based on the user preference characteristic vector before reduction.

4. The item recommendation program according to claim 2 or 3, wherein, in the clustering process, a weight is set for each group to be reduced in the first clustering.

5. The item recommendation program according to claim 1, wherein the creating process normalizes the elements indicating the user's preference and the user's profile that are included in the user preference characteristic vector.

6. The item recommendation program according to any one of claims 1 to 5, wherein the computer is further caused to execute a process of outputting, from among the recommended items extracted for the cluster to which a user belongs, the items that the user has not yet selected according to the history information.

7. An item recommendation method in which a computer executes a process comprising:
for each user, obtaining a vector indicating the user's preference for each characteristic of items by multiplying history information on the items selected by the user by an item characteristic matrix that has been reduced using at least one piece of group information, from among an item characteristic matrix holding, for each item, the characteristics of the item and group information grouping those characteristics, and creating a user preference characteristic vector by appending the user's profile to the vector;
clustering users based on the user preference characteristic vectors created for the users; and
for each cluster, aggregating the history information of the users belonging to the cluster and extracting items with a high selection frequency as items to recommend.

8. An item recommendation device comprising:
an analysis unit that, for each user, obtains a vector indicating the user's preference for each characteristic of items by multiplying history information on the items selected by the user by an item characteristic matrix that has been reduced using at least one piece of group information, from among an item characteristic matrix holding, for each item, the characteristics of the item and group information grouping those characteristics, and creates a user preference characteristic vector by appending the user's profile to the vector; and
a clustering unit that clusters users based on the user preference characteristic vectors created for the users and, for each cluster, aggregates the history information of the users belonging to the cluster and extracts items with a high selection frequency as items to recommend.